

# FPGAs in the Data Center

FPGA 2014 Workshop

P.K. Gupta, Intel Data Center Group, 2014-02-26, p.k.gupta@intel.com

# **Accelerators Motivation**

- Enhanced Performance: accelerators complement CPU cores to meet the market's need for performance across diverse Data Center workloads:
  - Enhance single-thread performance with tightly coupled accelerators, or complement multi-core performance with loosely coupled accelerators attached via PCIe or QPI
- Move to Heterogeneous Computing: Moore's Law continues but demands radical changes in architecture and software.
  - Architectures will go beyond homogeneous parallelism, embrace heterogeneity, and exploit the bounty of transistors to incorporate application-customized hardware.



#### **FPGA Market**

- Total FPGA market in 2014: ~\$5.1B
- Data Processing: \$304m, ~6% of total

|                          | 2011  | 2012  | 2013  | 2014  | 2015  | 2016  | 2017  |
|--------------------------|-------|-------|-------|-------|-------|-------|-------|
| Communications           | 2,253 | 2,065 | 2,158 | 2,339 | 2,565 | 2,790 | 2,986 |
| Consumer                 | 406   | 406   | 378   | 456   | 519   | 598   | 660   |
| Data Processing          | 359   | 260   | 270   | 304   | 352   | 387   | 418   |
| Automotive               | 154   | 148   | 164   | 185   | 229   | 285   | 361   |
| Industrial               | 1,033 | 976   | 1,000 | 1,158 | 1,311 | 1,462 | 1,584 |
| Military/Civil Aerospace | 615   | 571   | 596   | 664   | 735   | 802   | 867   |
| Total FPGA/PLD           | 4,820 | 4,426 | 4,566 | 5,105 | 5,710 | 6,324 | 6,877 |

Estimated Worldwide FPGA/PLD Consumption by Application Market, 2011-2017 (Millions of Dollars)

- Servers: \$71m

Source: Gartner
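The "~6% of total" figure can be checked directly against the table. The short Python sketch below recomputes Data Processing's share of each year's total, checks that the segment rows add up to the total row (up to rounding), and derives compound annual growth rates for 2014 to 2017. The numbers are copied from the table; the helper names `share` and `cagr` are illustrative.

```python
# Figures from the Gartner table above, in millions of US dollars.
# The source labels all of these figures as estimates.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
segments = {
    "Communications":           [2253, 2065, 2158, 2339, 2565, 2790, 2986],
    "Consumer":                 [406, 406, 378, 456, 519, 598, 660],
    "Data Processing":          [359, 260, 270, 304, 352, 387, 418],
    "Automotive":               [154, 148, 164, 185, 229, 285, 361],
    "Industrial":               [1033, 976, 1000, 1158, 1311, 1462, 1584],
    "Military/Civil Aerospace": [615, 571, 596, 664, 735, 802, 867],
}
total = [4820, 4426, 4566, 5105, 5710, 6324, 6877]


def share(part: float, whole: float) -> float:
    """Fraction of the whole contributed by one segment."""
    return part / whole


def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate over `periods` years."""
    return (end / start) ** (1 / periods) - 1


# Column-wise sums of the segment rows; these should match the total row
# to within rounding of the published figures.
segment_sums = [sum(col) for col in zip(*segments.values())]

dp = segments["Data Processing"]
for year, d, t in zip(years, dp, total):
    print(f"{year}: Data Processing = {share(d, t):.1%} of total")

i14, i17 = years.index(2014), years.index(2017)
print(f"Data Processing CAGR 2014-2017: {cagr(dp[i14], dp[i17], 3):.1%}")
print(f"Total market CAGR 2014-2017:    {cagr(total[i14], total[i17], 3):.1%}")
```

For 2014 this gives about 6%, matching the slide. On these estimates, Data Processing grows at roughly 11% per year from 2014 to 2017, slightly faster than the roughly 10% for the total market.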



# **Which Data Center Applications?**

#### Berkeley's 13 Dwarfs

| Category              | Examples                                       | Data Center ? |
|-----------------------|------------------------------------------------|---------------|
| Dense Linear Algebra  | Gaussian Elimination, K-means                  |               |
| Sparse Linear Algebra | Finite Element Analysis, PDE                   |               |
| Spectral Methods      | FFT                                            |               |
| N-Body Methods        | Molecular Dynamics                             |               |
| Structured Grids      | Image Processing, Physics                      |               |
| Unstructured Grids    | Computational Fluid Dynamics                   |               |
| MapReduce             | Distributed Searching, Monte Carlo Simulations |               |
| Combinational Logic   | CRC, Checksums, AES, Hashing                   |               |
| Graph Traversal       | Search, Sort                                   |               |
| Dynamic Programming   | Genome string matching                         |               |
| Graphical Models      | Neural Networks, HMM, Viterbi                  |               |
| Backtrack / Branch & Bound | Integer Linear Programming                |               |
| Finite State Machines | Video codecs, Data Mining                      |               |
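The Combinational Logic row lists CRC among its examples. A CRC is built entirely from shifts and XORs, which is one reason such kernels map naturally onto FPGA logic. As an illustration only (not code from this talk), here is a bit-at-a-time CRC-32 in Python using the reflected polynomial 0xEDB88320, the variant computed by zlib, so its output can be checked against `zlib.crc32`.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial (IEEE 802.3, as used by zlib)


def crc32_bitwise(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32; each inner step is one shift plus a conditional XOR.

    Passing a previous result as `crc` continues the checksum, matching
    zlib.crc32(data, value) chaining.
    """
    crc ^= 0xFFFFFFFF                      # initial inversion
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right; if the bit shifted out was 1, XOR in the polynomial.
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF                # final inversion


print(hex(crc32_bitwise(b"123456789")))   # standard check value: 0xcbf43926
print(crc32_bitwise(b"123456789") == zlib.crc32(b"123456789"))  # True
```

In software the eight per-bit steps for each byte run one after another. In hardware they can be unrolled into a single XOR network (wider still for multi-byte data paths) that is evaluated in one clock cycle.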



### **Accelerator Architecture**



Accelerator options are compared on:

- Performance efficiency: performance/watt, performance/\$
- Programming complexity: effort, cost

### **Accelerator Attach**



The best attach technology may depend on the application, or even on the algorithm.



### **Programming Model**

- Data Movement
  - In-line
    - Accelerator processes data, fully or partially, directly from I/O
  - Shared Virtual Memory:
    - Virtual addressing eliminates the need to pin memory buffers
    - Zero-copy data buffers
- Interaction between Core and Accelerator
  - Off-load: the core hands work to the accelerator
  - Hybrid: algorithm partitioned between host and accelerator

Shared Virtual Memory (SVM) and hybrid processing are enabled by cache coherency between core and accelerator.
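To make the off-load, zero-copy and hybrid models concrete, here is a minimal Python sketch. It is an analogy, not Intel's API: the "accelerator" is a hypothetical function `accel_square` (run in a worker thread for the hybrid case), and Python's single address space stands in for coherent shared virtual memory. The names `offload_with_copy`, `offload_zero_copy` and `hybrid` are illustrative.

```python
from array import array
from concurrent.futures import ThreadPoolExecutor


def accel_square(buf: memoryview, lo: int, hi: int) -> None:
    """Hypothetical accelerator kernel: squares buf[lo:hi] in place."""
    for i in range(lo, hi):
        buf[i] = buf[i] * buf[i]


def offload_with_copy(host: array) -> array:
    """Off-load without shared memory: copy in, compute on the copy, copy out."""
    device = array(host.typecode, host)                 # copy into a "device" buffer
    accel_square(memoryview(device), 0, len(device))
    return array(host.typecode, device)                 # copy the result back


def offload_zero_copy(host: array) -> array:
    """Off-load with shared virtual memory: the kernel works on the host buffer itself."""
    accel_square(memoryview(host), 0, len(host))
    return host


def hybrid(host: array, accel_fraction: float = 0.5) -> array:
    """Hybrid: accelerator and host each process a slice of the same shared buffer."""
    view = memoryview(host)
    split = int(len(host) * accel_fraction)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(accel_square, view, 0, split)  # "accelerator" share
        for i in range(split, len(host)):                    # host share
            view[i] = view[i] * view[i]
        pending.result()                                     # wait for the accelerator
    view.release()
    return host


data = array("q", range(8))
print(list(hybrid(data, accel_fraction=0.25)))
```

The copy-based path pays for two buffer copies. The zero-copy and hybrid paths avoid them only because both sides address the same memory, which on real hardware is what SVM and coherent attach provide.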



# Intel® QuickAssist Technology – Comprehensive Approach to Acceleration



- Multiple accelerator and attach options with software and ecosystem support
- Performance and scalability based on customer needs and priorities

**Includes Xeon + FPGA platforms for acceleration of workloads in the Data Center**



# **QPI-FPGA Romley-EP 2S Platform**





#### **Platform Details**

| Server       | Canoe Pass (Production Platform from EPSD)                                         |
|--------------|------------------------------------------------------------------------------------|
| IA           | Sandy Bridge-EP (E5-2600)<br>Ivy Bridge-EP (E5-2600 v2)                            |
| Chipset      | Patsburg                                                                           |
| Interconnect | **QPI 1.1 @ 6.4 GT/s full width** (target 8.0 GT/s at full width)                  |
| FPGA Module  | Altera: Stratix V<br>Xilinx: Virtex-7                                              |
| Features     | Config Agent, Caching Agent, Home Agent,<br>Memory Controller                      |
| Availability | SNB-EP, IVB-EP                                                                     |
| Optional     | 10G Ethernet port to FPGA<br>PCIe connection to Socket R                           |
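The interconnect row quotes link speed in GT/s. Converting that to bytes per second takes one assumption: a full-width QPI link carries 16 bits (2 bytes) of payload per transfer in each direction, as in the standard QPI framing. The back-of-the-envelope Python sketch below applies that factor; the function name is illustrative, and the result ignores protocol overhead.

```python
def qpi_bandwidth_gb_per_s(gt_per_s: float, payload_bytes_per_transfer: int = 2):
    """Return (per-direction, bidirectional) raw QPI payload bandwidth in GB/s.

    Assumes a full-width link carrying 2 bytes of payload per transfer
    in each direction; protocol overhead is not subtracted.
    """
    per_direction = gt_per_s * payload_bytes_per_transfer
    return per_direction, 2 * per_direction


for rate in (6.4, 8.0):  # current and target link speeds from the table
    one_way, both = qpi_bandwidth_gb_per_s(rate)
    print(f"{rate} GT/s -> {one_way:.1f} GB/s per direction, {both:.1f} GB/s total")
```

At 6.4 GT/s this gives 12.8 GB/s per direction (25.6 GB/s both ways), and 16 GB/s per direction at the 8.0 GT/s target. These are raw link figures; the bandwidth an accelerator actually sees will be lower once protocol overhead is included.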

# Intel® QuickAssist Technology – FPGA Reference Stack



Published interfaces (the Accelerator Abstraction Layer, AAL, on the CPU and the Core Cache Interface, CCI, on the FPGA) make applications portable across platforms and FPGA technologies.
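The portability argument rests on the familiar split between a published interface and platform-specific implementations behind it. The Python sketch below shows only that pattern. `ChecksumService`, `CpuBackend` and `FpgaStandInBackend` are hypothetical names, not AAL or CCI types, and the "FPGA" backend is a CPU stand-in so the example runs anywhere.

```python
import zlib
from abc import ABC, abstractmethod


class ChecksumService(ABC):
    """Hypothetical accelerator-service interface (illustrative, NOT the AAL API)."""

    @abstractmethod
    def checksum(self, data: bytes) -> int:
        """Return the CRC-32 of `data`."""


class CpuBackend(ChecksumService):
    """Software implementation for platforms without an accelerator."""

    def checksum(self, data: bytes) -> int:
        return zlib.crc32(data)


class FpgaStandInBackend(ChecksumService):
    """Stand-in for an FPGA-backed implementation.

    A real backend would pass `data` to the accelerator through a driver;
    this one only counts requests and computes the result on the CPU.
    """

    def __init__(self) -> None:
        self.requests = 0

    def checksum(self, data: bytes) -> int:
        self.requests += 1
        return zlib.crc32(data)


def application(service: ChecksumService, records: list) -> list:
    # The application depends only on the interface, so it runs unchanged
    # whichever backend the platform provides.
    return [service.checksum(r) for r in records]


records = [b"alpha", b"beta"]
print(application(CpuBackend(), records) == application(FpgaStandInBackend(), records))
```

Swapping the backend changes nothing in `application`, which is the property the published interfaces are meant to guarantee for real accelerator code.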



### Intel® Commitment to Intel® QuickAssist Technology

Enabling FPGA vendors to support QPI attach:

- QPI/KTI Reference RTL:
  - Enabling FPGA vendors (Xilinx, Altera, ...) to implement the QPI/KTI PHY
  - Providing validated QPI RTL for the Link and Protocol layers, for integration with the PHY
- Software and Applications:
  - Providing a software layer (AAL, Accelerator Abstraction Layer) to FPGA accelerator vendors to ease migration and protect end users' software investment
  - Sample RTL and SW applications
- Verification Environment
  - Complete OVM-based verification environment
- Simulation Environment
  - VCS-based simulator for developing SW and RTL
- Validation Environment
  - In-socket QPI-FPGA modules with expansion boards



#### **Summary**

- Xeon + FPGA platforms are available today for Data Center applications.
- Work continues to enhance the performance and programmability of these platforms.





