### Minimum Energy Operation of Clustered Island-Style FPGAs

Peter Grossmann, Miriam Leeser, Marvin Onabajo

(grossmann@ll.mit.edu, mel@coe.neu.edu, monabajo@ece.neu.edu)



The Lincoln Laboratory portion of this work was sponsored by the Department of the Air Force under Air Force contract number FA8721-05-C-0002. The opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

FPGA13-1 PJG 3/6/2013





- Motivation for minimum energy FPGAs
- Introduction to minimum energy digital circuits
- Subthreshold FPGA test chip measurements
- Subsonic Mini FPGA and minimum energy analysis technique
- Minimum energy point variation across multiple benchmark circuits on Subsonic Mini
- Summary





- Low power systems benefit from FPGAs
  - Improved energy efficiency/performance vs. microcontroller
  - Improved design via reconfigurability
  - Lower cost vs. ASIC
- Lowest power systems (< 1 mW) have been slow to adopt FPGAs
  - Limited logic resources
  - High static power vs. microcontrollers
- Lowest power systems need a minimum energy FPGA
  - Prioritize energy consumption over performance
  - Maximize voltage scaling benefit for static power reduction





- Voltage scaling causes competing static energy penalty and dynamic energy savings
- Minimum energy point:
  - is usually subthreshold
  - depends on circuit design and input activity
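A toy model makes this trade-off concrete. The sketch below is not the authors' analysis, and every parameter value in it is hypothetical. It makes four assumptions. Dynamic energy per cycle is α·C·V². Leakage energy per cycle is V·I<sub>leak</sub>·T<sub>clk</sub>. The clock period is set by a chain of `DEPTH` gates. Drive current follows an EKV-style interpolation, which is exponential below threshold and roughly square-law above it. Sweeping V<sub>DD</sub> then exposes a minimum, and the minimum moves with input activity α.

```python
import math

V_T = 0.0259    # thermal voltage kT/q at ~300 K (V)
N_SLOPE = 1.5   # subthreshold slope factor (hypothetical)
V_TH = 0.4      # threshold voltage (V, hypothetical)
DEPTH = 50      # logic depth of the critical path in gates (hypothetical)


def drive_current(v_gs: float) -> float:
    """Normalized drive current, EKV-style interpolation (assumption).

    ~exp((Vgs - Vth) / (n*vT)) below threshold, ~square-law above it.
    """
    x = (v_gs - V_TH) / (2 * N_SLOPE * V_T)
    return math.log1p(math.exp(x)) ** 2


def energy_per_cycle(vdd: float, activity: float) -> float:
    """Energy per cycle, normalized to (gate count * gate capacitance).

    dynamic: activity * V^2
    leakage: V * I_off * T_clk, with T_clk = DEPTH * C * V / I_on(V)
    """
    i_off = drive_current(0.0)  # leakage current of a gate that is off
    leak_term = DEPTH * i_off / drive_current(vdd)
    return vdd ** 2 * (activity + leak_term)


def minimum_energy_point(activity: float, v_lo=0.1, v_hi=1.2, steps=1101):
    """Brute-force V_DD sweep; returns (vdd, energy) at the lowest energy."""
    best = None
    for k in range(steps):
        v = v_lo + (v_hi - v_lo) * k / (steps - 1)
        e = energy_per_cycle(v, activity)
        if best is None or e < best[1]:
            best = (v, e)
    return best


for alpha in (0.1, 0.01):
    v_opt, _ = minimum_energy_point(alpha)
    print(f"activity = {alpha}: minimum energy at V_DD ~ {v_opt:.2f} V")
```

Lowering the activity factor shrinks the dynamic term relative to leakage. In this model that pushes the minimum toward a higher V<sub>DD</sub>.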







#### Minimum energy predictions for FPGAs:

- Higher minimum energy voltages than ASICs
- Dependency on FPGA architecture and utilization



#### Subthreshold vs. Superthreshold Circuits





- Delay increases, power decreases by orders of magnitude
- Low speed supports many ultra-low power applications
- Minimizing power-delay product maximizes energy efficiency

- Exponential I-V relationship below V<sub>TH</sub> drastically increases sensitivity to process variation
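To see why, compare how the same threshold-voltage shift changes drive current in each regime. This sketch assumes a textbook exponential subthreshold model and a long-channel square-law model above threshold. The 30 mV shift and the device parameters are hypothetical.

```python
import math

V_T = 0.0259   # thermal voltage at ~300 K (V)
N_SLOPE = 1.5  # subthreshold slope factor (hypothetical)
V_TH = 0.4     # nominal threshold voltage (V, hypothetical)


def current_ratio_subthreshold(delta_vth: float) -> float:
    """I(Vth + dVth) / I(Vth) for I ~ exp((Vgs - Vth) / (n * vT)).

    Independent of V_DD as long as Vds is several vT.
    """
    return math.exp(-delta_vth / (N_SLOPE * V_T))


def current_ratio_square_law(vdd: float, delta_vth: float) -> float:
    """Same ratio for a long-channel square-law device, I ~ (Vgs - Vth)^2."""
    return ((vdd - V_TH - delta_vth) / (vdd - V_TH)) ** 2


shift = 0.030  # 30 mV threshold shift (hypothetical)
print(f"above threshold (V_DD = 1.5 V): {current_ratio_square_law(1.5, shift):.3f}")
print(f"below threshold (V_DD = 0.26 V): {current_ratio_subthreshold(shift):.3f}")
```

Under these assumptions, a 30 mV shift costs about 5% of drive current at 1.5 V. The same shift costs more than half of the drive current in subthreshold.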






- Our work: IBM 0.18 µm SOI test chip with a 4×4 array of tiles
- Differences from previous work:
  - Single supply voltage
  - Latches instead of SRAM for configuration bits
  - Unidirectional routing fabric
  - Static CMOS instead of pass transistors for multiplexers
- Previous work:
  - J. F. Ryan and B. H. Calhoun, "A sub-threshold FPGA with low-swing dual-VDD interconnect in 90nm CMOS," in *Proc. Custom Integrated Circuits Conference (CICC)*, Sep. 2010, pp. 1-4.



**Test Chip Die Photograph** 



- Clustered, island-style FPGA
- Clusters of 8 basic logic elements (BLEs), a good BLE count for low-power FPGAs
- Directional, single-driver routing

**Connection Block (CB)** 








# Sample Test Chip Measurements



|                           | V<sub>DD</sub> = 1.5 V  | V<sub>DD</sub> = 0.26 V  |
|---------------------------|-------------------------|--------------------------|
| Max frequency             | 16.7 MHz                | 322 kHz                  |
| Power @ F<sub>max</sub>   | 76.5 mW                 | 34.6 µW                  |
| Power-delay product       | 4.6 × 10<sup>-9</sup> J | 0.11 × 10<sup>-9</sup> J |

- FPGA programmed as an array of 16 4-bit counters
- Data collected on an Agilent SoC 93000 ATE
- Minimum operating voltage across all dies: 0.26 V
- Average minimum operating voltage: 300 mV
- Lowest voltage at which an FPGA has been successfully programmed
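The power-delay product row in the table follows from dividing the power by the maximum frequency, because PDP = P · T<sub>clk</sub> = P / F<sub>max</sub>. The short check below uses only the measured values from the table. The helper name `pdp` is illustrative.

```python
# Measured values from the table: V_DD (V) -> (power at Fmax in W, Fmax in Hz)
measurements = {
    1.5: (76.5e-3, 16.7e6),
    0.26: (34.6e-6, 322e3),
}


def pdp(power_w: float, freq_hz: float) -> float:
    """Power-delay product, i.e. average energy per clock cycle, in joules."""
    return power_w / freq_hz


for vdd, (p, f) in measurements.items():
    print(f"V_DD = {vdd} V: PDP = {pdp(p, f) * 1e9:.2f} nJ")

# How much less energy each cycle costs at 0.26 V than at 1.5 V
ratio = pdp(*measurements[1.5]) / pdp(*measurements[0.26])
print(f"energy per cycle reduced by ~{ratio:.0f}x at 0.26 V")
```

The counter array spends roughly 43 times less energy per cycle at 0.26 V than at 1.5 V.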



#### Sample Shmoo Plot







$$PDP = V_{DD} \cdot I \cdot T_{clk}$$

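- Power-delay product (PDP) = average energy per clock cycle, where I is the average supply current and T<sub>clk</sub> is the clock period
- If the FPGA can be kept busy enough, the minimum energy point is below threshold
- Practical circuits are expected to have a higher minimum energy point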








- Design a VPR-compatible FPGA (Subsonic Mini)
- Devise a simulation-based approach for estimating minimum energy point
- Investigate FPGA PDP and minimum energy point sensitivity across multiple benchmark circuits

#### **Approach taken**

Combine Cadence IC design and verification tools with VPR and custom scripts to plot PDP vs. voltage



### Subsonic Mini FPGA



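| VPR<br>Parameter  | Description                                                   | Value |
|-------------------|---------------------------------------------------------------|-------|
| K                 | LUT inputs                                                    | 4     |
| N                 | BLEs per cluster                                              | 8     |
| I                 | Cluster input pins                                            | 18    |
| F<sub>c,in</sub>  | Fraction of channel tracks connected to each cluster input    | 0.333 |
| F<sub>c,out</sub> | Fraction of channel tracks connected to each cluster output   | 1     |
| F<sub>s</sub>     | Switch block flexibility                                      | 3     |
| W                 | Channel width (tracks)                                        | 20    |
| L                 | Wire segment length (tiles)                                   | 1     |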

- Fully connected input crossbar
- 6x6 array of tiles (288 total 4-LUT/flip-flop pairs)
- IOBs with 2 I/Os per block, no level shifters
- Added logic to read configuration bits off-chip
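The parameters in the table imply several derived sizes. The sketch below computes them. `ClusterArch` is a hypothetical helper, not part of VPR. The cluster-input heuristic I = K/2 · (N + 1) is a commonly used sizing rule. The slides do not say it was used, although it does reproduce I = 18. Tracks per input pin is an approximation, because VPR's exact rounding is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ClusterArch:
    k: int        # LUT inputs
    n: int        # BLEs per cluster
    i: int        # cluster input pins
    fc_in: float  # fraction of channel tracks each input pin connects to
    w: int        # routing channel width (tracks)
    rows: int     # tiles in the array, vertically
    cols: int     # tiles in the array, horizontally

    def total_bles(self) -> int:
        return self.rows * self.cols * self.n

    def crossbar_mux_inputs(self) -> int:
        # Fully connected crossbar: each LUT input may select any cluster
        # input or any BLE output fed back inside the cluster.
        return self.i + self.n

    def lut_config_bits_per_cluster(self) -> int:
        return self.n * 2 ** self.k

    def tracks_per_input_pin(self) -> int:
        # Approximate: rounds Fc_in * W to the nearest whole track
        return round(self.fc_in * self.w)

    def heuristic_cluster_inputs(self) -> int:
        # Sizing rule I = K/2 * (N + 1) (assumption, not from the slides)
        return self.k * (self.n + 1) // 2


subsonic_mini = ClusterArch(k=4, n=8, i=18, fc_in=0.333, w=20, rows=6, cols=6)
print("BLEs:", subsonic_mini.total_bles())
print("crossbar mux width:", subsonic_mini.crossbar_mux_inputs())
print("LUT config bits per cluster:", subsonic_mini.lut_config_bits_per_cluster())
print("tracks per input pin:", subsonic_mini.tracks_per_input_pin())
print("heuristic I:", subsonic_mini.heuristic_cluster_inputs())
```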

- Process Details
  - IBM 65 nm low power bulk process
  - Standard-V<sub>TH</sub> transistors used throughout











- IC CAD details
  - Leaf cells characterized with Cadence Liberate from extracted layout
  - Benchmarks simulated with gate-level Verilog and SDF back-annotation
  - 1000 random input test vectors applied
- FPGA CAD details:
  - Ran 10 VPR trials for each benchmark at each V<sub>DD</sub>
  - Place and route solution with smallest critical path delay used
- Benchmarks used: 21 ISCAS '85 circuits
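The trial-selection step can be sketched as follows. For each (benchmark, V<sub>DD</sub>) test case, the sketch keeps the place-and-route trial with the smallest critical path delay. The record layout and the delay values are made up for illustration. Treating trials as runs with different placement seeds is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class VprTrial:
    benchmark: str
    vdd: float               # supply voltage of the timing model (V)
    trial: int               # trial index (e.g. a placement seed; assumption)
    critical_path_ns: float  # critical path delay of the routed design


def best_per_test_case(trials):
    """For each (benchmark, vdd) pair, keep the trial with the smallest delay."""
    best = {}
    for t in trials:
        key = (t.benchmark, t.vdd)
        if key not in best or t.critical_path_ns < best[key].critical_path_ns:
            best[key] = t
    return best


# Made-up delays, for illustration only
trials = [
    VprTrial("c432", 0.4, 0, 812.0),
    VprTrial("c432", 0.4, 1, 790.5),
    VprTrial("c432", 0.4, 2, 845.2),
    VprTrial("c880", 0.4, 0, 655.1),
]
for (bench, vdd), t in best_per_test_case(trials).items():
    print(f"{bench} @ {vdd} V: trial {t.trial}, {t.critical_path_ns} ns")
```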







- Minimum energy point slightly above threshold
- Local minima likely caused by variation in routing solution quality
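When a PDP-vs-V<sub>DD</sub> sweep is noisy, it helps to report the global minimum separately from any interior local minima. The sketch below does this for a sorted sweep. The sample values are hypothetical. In this data, the dip at 0.5 V is the kind of local minimum that could come from a better or worse routing result at a neighboring voltage.

```python
def find_minima(vdd, pdp):
    """Return (V_DD at the global minimum, list of V_DD at interior local minima).

    vdd must be sorted ascending. A local minimum is an interior sample
    lower than both of its neighbours.
    """
    if len(vdd) != len(pdp) or len(vdd) < 3:
        raise ValueError("need matching vdd/pdp lists with at least 3 points")
    local = [vdd[i] for i in range(1, len(pdp) - 1)
             if pdp[i] < pdp[i - 1] and pdp[i] < pdp[i + 1]]
    g = min(range(len(pdp)), key=pdp.__getitem__)
    return vdd[g], local


# Hypothetical sweep: V_DD in volts, PDP in arbitrary energy units
vdd = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60]
pdp = [5.0, 3.8, 3.1, 3.4, 3.2, 3.6, 4.1]
print(find_minima(vdd, pdp))
```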



# Sample Benchmark PDP vs. V<sub>DD</sub> Plots






































#### Minimum Energy V<sub>DD</sub> vs. Benchmark Input Count and LUT Utilization for 21 ISCAS '85 Benchmarks














- First multi-benchmark FPGA minimum energy point study
  - Analysis and results differ from those of ASIC studies
  - Multiple benchmarks produce a range of minimum energy points

- The energy efficiency of a circuit mapped to an FPGA influences the FPGA's minimum energy point

- A programmable supply voltage is required to optimize FPGA energy efficiency across a range of use cases





- Investigate minimum energy point analysis accuracy vs. Spectre simulations for a single tile
- Perform analysis on 30x30 array with Toronto20 benchmarks
- Incorporate power model into VPR 6.0 and compare results
- Explore minimum energy point sensitivity to FPGA architecture parameters





- Measurement results from fabricated FPGA test chip
  - Single 260 mV supply: lowest programming voltage reported for an FPGA
  - Subthreshold minimum energy point under high-activity conditions
- Minimum energy point estimation using ASIC verification techniques and FPGA CAD tools
  - Minimum energy point slightly above threshold for real benchmark circuits
  - Minimum energy point depends on the benchmark circuit
  - Optimizing FPGA energy efficiency requires tuning the supply voltage to an application-specific value







- MIT Lincoln Laboratory
  - Lincoln Scholars Program
  - **Group 83**
  - Group 88: WeiLin Hu, Tony Soares
  - LLCAD

#### Peter Grossmann (grossmann@ll.mit.edu)








# Minimum Energy Analysis—Detailed







### **Bitstream and Schematic Generation**








# Adding Timing to VPR Architecture Files











# **Complete VPR Post-Processing Flow**








# **Complete FPGA CAD Flow**





- Use the bitstream file to generate Verilog testbench task calls for programming
- One test case per benchmark per V<sub>DD</sub>
- 10 VPR trials per test case
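The bitstream-to-testbench step might look like the sketch below, which turns a string of configuration bits into Verilog task calls. Several details are hypothetical: a text bitstream with one `0`/`1` per configuration bit in address order, a testbench task named `write_config_word(addr, data)`, and a 16-bit word width. None of these are taken from the slides.

```python
def bits_to_task_calls(bits: str, word_width: int = 16,
                       task: str = "write_config_word"):
    """Pack '0'/'1' configuration bits MSB-first into words and emit one
    Verilog task call per word. The last word is zero-padded.

    The task name and word layout are hypothetical.
    """
    bits = "".join(bits.split())  # drop whitespace and newlines
    if set(bits) - {"0", "1"}:
        raise ValueError("bitstream must contain only '0' and '1'")
    hex_digits = (word_width + 3) // 4
    calls = []
    for addr, start in enumerate(range(0, len(bits), word_width)):
        chunk = bits[start:start + word_width].ljust(word_width, "0")
        value = int(chunk, 2)
        calls.append(f"{task}({addr}, {word_width}'h{value:0{hex_digits}x});")
    return calls


for line in bits_to_task_calls("1010000011111111" + "0001"):
    print(line)
```

The emitted lines could be pasted into an `initial` block of the testbench, with one generated file per test case.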









### Subthreshold Digital Circuit Design Considerations







- Static CMOS still effective
- P:N sizing ratio typically increases (from 2:1 to 10:1)
- Higher uncertainty in delay
- Some circuit styles limit minimum functional V<sub>DD</sub> and should be avoided
  - Parallel off transistors without corresponding on transistors
  - Series stacks with more than two transistors
  - Circuits dependent on transistor sizing ratios



### **Multiplexer Leaf Cells**











#### **NMOS-controlled Output Transitions**

- PMOS transistors more vulnerable to process variation
- DTMOS (dynamic-threshold MOS) configuration more robust, but has a 2.6× area penalty

- 0.18 µm IBM SOI process
- Minimum W and L, P:N = 2:1
- Post-layout circuit netlists
- 100 Monte Carlo iterations
- Variation applied to testbench drivers and loads as well as the DUT
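The slides' Monte Carlo runs use post-layout netlists. The sketch below only illustrates the underlying idea with a one-parameter model. It draws 100 threshold voltages from a Gaussian and compares the relative delay spread above and below threshold. It assumes gate delay ∝ V<sub>DD</sub> / I<sub>on</sub>, an EKV-style drive-current interpolation, and σ<sub>VTH</sub> = 20 mV. All of these are hypothetical.

```python
import math
import random
import statistics

V_T = 0.0259        # thermal voltage at ~300 K (V)
N_SLOPE = 1.5       # subthreshold slope factor (hypothetical)
VTH_NOMINAL = 0.4   # nominal threshold voltage (V, hypothetical)
SIGMA_VTH = 0.02    # threshold-voltage standard deviation (V, hypothetical)


def drive_current(vdd: float, vth: float) -> float:
    """Normalized on-current, EKV-style interpolation (assumption)."""
    x = (vdd - vth) / (2 * N_SLOPE * V_T)
    return math.log1p(math.exp(x)) ** 2


def delay_spread(vdd: float, iterations: int = 100, seed: int = 1) -> float:
    """Monte Carlo estimate of sigma/mean of gate delay ~ V_DD / I_on."""
    rng = random.Random(seed)
    delays = [vdd / drive_current(vdd, rng.gauss(VTH_NOMINAL, SIGMA_VTH))
              for _ in range(iterations)]
    return statistics.stdev(delays) / statistics.mean(delays)


for vdd in (1.2, 0.3):
    print(f"V_DD = {vdd} V: sigma/mean of delay = {delay_spread(vdd):.3f}")
```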



#### **PMOS-controlled Output Transitions**



### **Configuration Bit Storage**





#### Standard 6T SRAM Bit Cell

- Exploit FPGA use case: slow writes, no random-access reads
- Improve robustness by eliminating contention during writes
- Minimize area by eliminating extra transistors and supplies



#### Subthreshold 6T SRAM Bit Cell







- 0.18 µm IBM SOI process
- Post-layout circuit netlists
- 100 Monte Carlo iterations
- Variation applied to testbench drivers and loads as well as the DUT









- IBM 0.18 µm SOI process
- Two test chips fabricated: conventional and DTMOS multiplexers
- Both chips use a 6T latch for configuration bits





**Test Chip #1—Conventional Multiplexers** 

**Test Chip #2—DTMOS Multiplexers**