

# Rethinking FPGAs: Elude the Flexibility Excess of LUTs with And-Inverter Cones

Hadi P. Afshar

Joint work with: David Novo, Paolo Ienne, and Hind Benbihi

#### **Motivation**



# Outline

- And-Inverter Cone (AIC)
- Mapping AIG Subgraphs to AICs
- Technology Mapping
- AIC Clustering
- Experiments
- Conclusions

### And-Inverter Cone (AIC)



#### LUT vs. AIC



#### Multi-Output AICs



**4-AIC**: multiple outputs, fracturable structure
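A minimal behavioral sketch of an n-AIC may help make the cone concrete. It assumes the structure suggested by the name and by the specification table: a complete binary tree of 2-input AND nodes over 2^n inputs, each node followed by an inverter enabled by one configuration bit (one 2:1 mux per node). It further assumes that the outputs are all nodes above the first AND level, which reproduces the output counts in the table. The function `eval_aic` and its bit ordering are hypothetical.

```python
from typing import List, Sequence


def eval_aic(inputs: Sequence[bool], invert: Sequence[bool]) -> List[bool]:
    """Behavioral model of a hypothetical n-AIC.

    inputs: the 2**n primary inputs.
    invert: one configuration bit per AND node (2**n - 1 in total), ordered
            level by level from the input side up to the root.
    Returns the value of every node above the first AND level, level by
    level, with the root last (2**(n-1) - 1 outputs).
    """
    size = len(inputs)
    if size < 4 or size & (size - 1):
        raise ValueError("an n-AIC with n >= 2 has 2**n inputs")
    if len(invert) != size - 1:
        raise ValueError("expected one configuration bit per AND node")

    level = [bool(x) for x in inputs]
    bit = 0
    depth = 0
    outputs: List[bool] = []
    while len(level) > 1:
        next_level = []
        for a, b in zip(level[0::2], level[1::2]):
            # AND node followed by its configurable inverter (XOR with the bit)
            next_level.append((a and b) != bool(invert[bit]))
            bit += 1
        depth += 1
        if depth > 1:  # first-level nodes are assumed to stay internal
            outputs.extend(next_level)
        level = next_level
    return outputs
```

With every inverter enabled, a 2-AIC evaluates NAND(NAND(x0, x1), NAND(x2, x3)) = x0·x1 + x2·x3, so the inverters let an AND tree realize OR levels as well.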

# Multi-Output: LUTs vs. AICs



#### **AICs Specifications**

| Block | Inputs | Outputs | 2:1 muxes | Config bits |
|-------|--------|---------|---------|-------------|
| 2-AIC | 4      | 1       | 3       | 3           |
| 3-AIC | 8      | 3       | 7       | 7           |
| 4-AIC | 16     | 7       | 15      | 15          |
| 5-AIC | 32     | 15      | 31      | 31          |
| 6-AIC | 64     | 31      | 63      | 63          |
| 6-LUT | 6      | 1       | 63      | 64          |

6-AIC ≈ 6-LUT in configuration bits (63 vs. 64)

6-AIC can implement <u>larger</u> functions (up to 64 inputs) and <u>multiple</u> functions (up to 31 outputs)

3-AIC ≈ 6-LUT in input count (8 vs. 6), yet the 3-AIC is much smaller (7 vs. 63 2:1 muxes)
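The rows of the specification table follow closed forms in n, which this short sketch reproduces. The formulas are inferred from the listed numbers rather than quoted from the authors: an n-AIC is taken as a complete binary tree with 2^n leaves, and a k-LUT as 2^k configuration bits selected by a mux tree. `aic_spec` and `lut_spec` are hypothetical helpers.

```python
def aic_spec(n: int) -> dict:
    """Resource counts of an n-AIC (closed forms inferred from the table)."""
    nodes = 2 ** n - 1  # AND nodes in a complete binary tree with 2**n leaves
    return {
        "inputs": 2 ** n,
        "outputs": 2 ** (n - 1) - 1,  # every node above the first level
        "muxes": nodes,               # one 2:1 mux per node
        "config_bits": nodes,         # one configuration bit per node
    }


def lut_spec(k: int) -> dict:
    """Resource counts of a k-LUT: 2**k SRAM bits read through a mux tree."""
    return {"inputs": k, "outputs": 1, "muxes": 2 ** k - 1, "config_bits": 2 ** k}


for n in range(2, 7):
    print(f"{n}-AIC", aic_spec(n))
print("6-LUT", lut_spec(6))
```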

## Mapping AIG Subgraphs to AICs

**Graph-based Transformations** 



# **Technology Mapping**



## What is different from LUT mapping?

- <u>Depth</u>-feasible cones
  - Rather than k-feasible cones
- Multi-output cones
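The difference between the two feasibility notions can be sketched on a toy AIG. A k-feasible cut bounds the number of cut leaves, whereas a cone fits an n-AIC when its AND depth between the root and the cut is at most n. This sketch assumes a dictionary-based AIG encoding, ignores complemented edges, and does not handle reconvergent fanout, all of which a real mapper must address; all names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical AIG encoding: each AND node maps to its two fanins;
# primary inputs have no entry.
AIG = Dict[str, Tuple[str, str]]


def cone_depth(aig: AIG, root: str, cut: Set[str]) -> int:
    """Number of AND levels between `root` and the leaves in `cut`."""
    if root in cut or root not in aig:
        return 0
    a, b = aig[root]
    return 1 + max(cone_depth(aig, a, cut), cone_depth(aig, b, cut))


def is_k_feasible(cut: Set[str], k: int) -> bool:
    """LUT-style criterion: at most k cut leaves."""
    return len(cut) <= k


def is_depth_feasible(aig: AIG, root: str, cut: Set[str], n: int) -> bool:
    """AIC-style criterion: cone depth at most n, whatever the leaf count."""
    return cone_depth(aig, root, cut) <= n


# An 8-input AND tree: too many leaves for a 6-LUT, but only 3 levels deep.
example_aig: AIG = {
    "n1": ("a", "b"), "n2": ("c", "d"), "n3": ("e", "f"), "n4": ("g", "h"),
    "n5": ("n1", "n2"), "n6": ("n3", "n4"), "out": ("n5", "n6"),
}
cut = set("abcdefgh")
print(is_k_feasible(cut, 6), is_depth_feasible(example_aig, "out", cut, 3))
```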



## LUT/AIC Breakdown



Average over all benchmarks

# **Conventional LUT Cluster**



## **AIC Cluster**



# Experiments

- Altera Stratix III as the reference architecture
  - LAB with 10 ALMs
- Area model
  - Transistor level
  - In terms of minimum-width transistors
- Delay model
  - SPICE simulation
  - Delays fed back to the technology mapper
- VPR-6 with AAPack

# Experiments

- Scenarios
  - Normal FPGA
    - LUT clusters (LAB)
  - Hybrid FPGA
    - Both LUT and AIC clusters
    - Fixed ratio of clusters (1:4)
  - AIC-based FPGA
    - Only AIC clusters
- MCNC Benchmarks



#### **Block Level**



#### Wire Delay Computation

| Logic block | Intra-cluster wires (%) |
|-------------|-------------------------|
| LUT         | 50                      |
| 6-AIC       | 34                      |
| LUT/6-AIC   | 35                      |
| LUT/5-AIC   | 37                      |
| LUT/4-AIC   | 38                      |
| LUT/3-AIC   | 40                      |

$$Delay(wire) = Delay(intra) \times Percentage(intra) + Delay(inter) \times Percentage(inter)$$

with $Percentage(inter) = 100\% - Percentage(intra)$.
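The weighted average of intra- and inter-cluster delay can be evaluated per architecture with the intra-cluster shares from the table. The two delay values used here are arbitrary placeholders, not numbers from the experiments, so only the relative ordering is meaningful; `wire_delay` is a hypothetical helper.

```python
# Share of intra-cluster wires per architecture, from the table.
INTRA_SHARE = {
    "LUT": 0.50, "6-AIC": 0.34, "LUT/6-AIC": 0.35,
    "LUT/5-AIC": 0.37, "LUT/4-AIC": 0.38, "LUT/3-AIC": 0.40,
}


def wire_delay(delay_intra: float, delay_inter: float, pct_intra: float) -> float:
    """Average wire delay, weighting intra- and inter-cluster delays by their shares."""
    if not 0.0 <= pct_intra <= 1.0:
        raise ValueError("pct_intra must be a fraction in [0, 1]")
    return delay_intra * pct_intra + delay_inter * (1.0 - pct_intra)


# Placeholder delays in ps, NOT values from the experiments.
DELAY_INTRA_PS, DELAY_INTER_PS = 100.0, 300.0
for arch, share in INTRA_SHARE.items():
    print(f"{arch:10s} {wire_delay(DELAY_INTRA_PS, DELAY_INTER_PS, share):6.1f} ps")
```

Whenever intra-cluster wires are faster than inter-cluster ones, a larger intra-cluster share lowers the average wire delay, so under this estimate the LUT clusters get the lowest wire delay of the listed architectures.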

#### Total Delay (Geometric Mean)



# Area (Clusters)



## Conclusions

- Post-synthesis-inspired logic block
  - AIC: maps arbitrary AIG subgraphs
- 32% saving in delay
  - Rough estimation of routing delay
- 16% area reduction
- Only a few design points explored so far
  - Routing network tailored for AICs
  - Logic matching targeting AICs

# Thanks for your attention.

hadi.parandehafshar@epfl.ch



#### **Future Explorations**

- AICs as shadow logic of FPGA blocks

#### Shadow Logic





Shadow logic of a DSP block

## **Future Explorations**

- AICs as shadow logic of FPGA blocks
- Optimized routing
  - Clustering
  - Cluster bandwidth
  - Crossbar
- Area recovery during mapping

#### **AIC Cluster Example**




#### Input Crossbar Scenarios



#### Area of LUT and AIC Clusters

| Component                     | Area (minimum-width transistors) |
|-------------------------------|----------------------------------|
| 6-AIC block                   | 1,512                            |
| 6-AIC output Xbar             | 217                              |
| 6-AIC FFs and muxes           | 1,104                            |
| AIC cluster input Xbar        | 22,072                           |
| AIC cluster output Xbar       | 2,660                            |
| AIC cluster buffers           | 1,447                            |
| AIC cluster with three 6-AICs | 34,678                           |
| ALM                           | 1,751                            |
| LAB input Xbar                | 16,251                           |
| LAB buffers                   | 470                              |
| LAB with ten ALMs             | 34,231                           |
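The cluster totals in the area table are consistent with a simple additive model: per-block components times the number of blocks, plus the cluster-level crossbars and buffers. The sketch below checks that arithmetic. Treating the crossbar and buffer areas as fixed when the block count changes is an assumption of the sketch only, since real crossbar area grows with cluster size; the function names are hypothetical.

```python
# Component areas from the table, in minimum-width transistors.
AIC6_BLOCK, AIC6_OUT_XBAR, AIC6_FF_MUX = 1_512, 217, 1_104
AIC_IN_XBAR, AIC_OUT_XBAR, AIC_BUFFERS = 22_072, 2_660, 1_447
ALM, LAB_IN_XBAR, LAB_BUFFERS = 1_751, 16_251, 470


def aic_cluster_area(num_aics: int) -> int:
    """Per-6-AIC resources replicated, plus cluster-level Xbars and buffers."""
    per_aic = AIC6_BLOCK + AIC6_OUT_XBAR + AIC6_FF_MUX
    return num_aics * per_aic + AIC_IN_XBAR + AIC_OUT_XBAR + AIC_BUFFERS


def lab_area(num_alms: int) -> int:
    """ALMs replicated, plus the LAB input Xbar and buffers."""
    return num_alms * ALM + LAB_IN_XBAR + LAB_BUFFERS


print(aic_cluster_area(3), lab_area(10))
```

`aic_cluster_area(3)` gives 34,678 and `lab_area(10)` gives 34,231, matching the table, so an AIC cluster with three 6-AICs occupies roughly the same area as a LAB with ten ALMs.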

# **Delay Paths of AIC Cluster**

| Path  | Description                | Delay (ps) |
|-------|----------------------------|------------|
| A → B | 6-AIC main output          | 496        |
| B → C | Crossbar and FF mux        | 75         |
| C → D | Output crossbar of cluster | 50         |
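Assuming the three segments are traversed in series from point A to point D, the path delay is the sum of the segment delays. The point names follow the table; the series assumption and the helper `path_delay` are ours.

```python
# Segment delays in ps, from the table, keyed by (start, end) point.
PATH_DELAYS_PS = {("A", "B"): 496, ("B", "C"): 75, ("C", "D"): 50}


def path_delay(points: list) -> int:
    """Sum the delays of consecutive segments along a path of named points."""
    return sum(PATH_DELAYS_PS[(u, v)] for u, v in zip(points, points[1:]))


print(path_delay(["A", "B", "C", "D"]))
```

Under this assumption the A → D path takes 496 + 75 + 50 = 621 ps.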

# Wire Length

| Benchmark | LUT  | LUT/5-AIC | LUT/6-AIC |
|-----------|------|-----------|-----------|
| alu4      | 14.9 | 10.59     | 11.32     |
| apex2     | 16.4 | 15.2      | 12.9      |
| apex4     | 15.5 | 16.1      | 14.1      |
| bigkey    | 14.3 | 12.6      | 11.6      |
| clma      | 20.8 | 22.9      | 25.5      |
| des       | 14.6 | 16.1      | 15.1      |
| diffeq    | 10.4 | 13.4      | 13.8      |
| dsip      | 18.6 | 17.4      | 12.5      |
| elliptic  | 15.5 | 16.6      | 16.7      |
| ex5p      | 11.2 | 15.9      | 23.2      |
| ex1010    | 23.8 | 18.2      | 30.3      |
| frisc     | 18.8 | 19.35     | 23.2      |
| misex3    | 14   | 12        | 13        |
| pdc       | 22.8 | 23.4      | 21.2      |
| s298      | 13.2 | 9.7       | 15.8      |
| s38417    | 12.5 | 18.2      | 19        |
| s38584.1  | 11.5 | 18.4      | 17.5      |
| seq       | 17.1 | 15.5      | 15.5      |
| spla      | 21.5 | 18.8      | 21.1      |
| tseng     | 8.3  | 13.1      | 12.5      |
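One way to summarize the per-benchmark wire lengths is a geometric mean per architecture, the same statistic the deck uses for total delay. This sketch only shows the computation over the table values; it relies on Python's `statistics.geometric_mean` (Python 3.8+), and `column_geomean` is a hypothetical helper.

```python
from statistics import geometric_mean

# Wire length per benchmark: (LUT, LUT/5-AIC, LUT/6-AIC), copied from the table.
WIRE_LENGTH = {
    "alu4": (14.9, 10.59, 11.32), "apex2": (16.4, 15.2, 12.9),
    "apex4": (15.5, 16.1, 14.1), "bigkey": (14.3, 12.6, 11.6),
    "clma": (20.8, 22.9, 25.5), "des": (14.6, 16.1, 15.1),
    "diffeq": (10.4, 13.4, 13.8), "dsip": (18.6, 17.4, 12.5),
    "elliptic": (15.5, 16.6, 16.7), "ex5p": (11.2, 15.9, 23.2),
    "ex1010": (23.8, 18.2, 30.3), "frisc": (18.8, 19.35, 23.2),
    "misex3": (14.0, 12.0, 13.0), "pdc": (22.8, 23.4, 21.2),
    "s298": (13.2, 9.7, 15.8), "s38417": (12.5, 18.2, 19.0),
    "s38584.1": (11.5, 18.4, 17.5), "seq": (17.1, 15.5, 15.5),
    "spla": (21.5, 18.8, 21.1), "tseng": (8.3, 13.1, 12.5),
}
COLUMNS = ("LUT", "LUT/5-AIC", "LUT/6-AIC")


def column_geomean(index: int) -> float:
    """Geometric mean of one architecture column over all benchmarks."""
    return geometric_mean([row[index] for row in WIRE_LENGTH.values()])


baseline = column_geomean(0)
for i, name in enumerate(COLUMNS):
    g = column_geomean(i)
    print(f"{name:10s} geomean {g:5.2f}  ratio vs. LUT {g / baseline:.3f}")
```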