Heterogeneous CPU-FPGA systems are gaining momentum in the embedded systems sector and in the data center market. While the programming abstractions for implementing the data transfer between CPU and FPGA (and vice versa) that are available in today's commercial programming tools are well-suited for certain types of applications, the CPU-FPGA communication for applications that share complex pointer-based data structures between the CPU and FPGA remains difficult to implement.
This repository provides the infrastructure and building blocks to enable the programming abstraction of a virtual address space that is shared between the host CPU and one (or potentially several) FPGA devices. One example of shared virtual memory (SVM) is defined by the recent OpenCL 2.0 standard. SVM allows the software and hardware portion of a hybrid application to seamlessly (and concurrently) share complex data structures by simply passing a pointer, which can be dereferenced from both the CPU and the FPGA side and which greatly eases programming heterogeneous systems.
To give researchers a tool for experimenting with OpenCL SVM in the context of FPGAs, this repository contains a framework that automatically adds the physical infrastructure for SVM to a commercial OpenCL tool for FPGAs (targeting the Intel FPGA SDK for OpenCL and an Intel Cyclone V CPU-FPGA heterogeneous system). Please refer to the companion paper [1] for more information.
OpenCL 2.0 defines three SVM modes: coarse-grained buffer SVM, fine-grained buffer SVM, and fine-grained system SVM. This repository provides code for supporting the third mode, which has the highest degree of hardware abstraction: the entire CPU host address space is shared directly with the FPGA. While the software/firmware stacks on top of Intel's Xeon+FPGA Multi-Chip Package and IBM's CAPI can support some SVM functionality, this repository gives researchers a low-cost tool to assess the performance impact of fine-grained system SVM.
The companion paper to this repository explores the design space for these building blocks and studies the performance impact. It shows that SVM-enabled implementations can achieve up to 2x speed-up over an OpenCL design without SVM support, because they avoid artificially sizing dynamic data structures and can fetch data on the fly.
- The code in this repository has been developed for the Intel Cyclone V SoC Development Kit [2]. This platform was chosen because it is a generally available, low-cost platform with hardware support for cache-coherent through-memory communication between CPU and FPGA. Other platforms (including non-SoC platforms such as Intel's Xeon+FPGA multi-chip package) are possible, but have not been tested and will likely require minor code modifications (planned for future work).
- The code is compatible with, and has been tested with, the Intel FPGA SDK for OpenCL version 16.0.0.211 (the Pro edition is not required).
- The Cyclone V SoC Development Kit runs Linux (the OpenCL SDK for Cyclone V SoC comes with a Linux SD card image).
- Set up Cyclone V Development Kit: Set up the OpenCL run-time environment on the Cyclone V SoC as described in [3]. After completion, the SoC runs Linux, and the Intel FPGA SDK for OpenCL and the SoC Embedded Design Suite (required for cross-compiling the OpenCL host code for the SoC) are installed on your workstation.
- Download linux-socfpga sources: The SVM driver provided in this repository must be compiled against the Linux kernel on the board. Download the Linux kernel from https://github.com/altera-opensource/linux-socfpga and save it on your workstation.
- Compile the SVM driver: Set the cross compiler for the SoC platform:
export CROSS_COMPILE=<path-to-SoC-Embedded-Design-Suite-installation>/ds-5/sw/gcc/bin/arm-linux-gnueabihf-
Open svm_common/svm_driver/Makefile and set KDIR to the path of the linux-socfpga sources.
- Build the custom RTL library for SVM: The SVM functionality at the hardware end is implemented in a custom RTL library, which is integrated into the OpenCL compilation flow. Ensure that $ALTERAOCLSDKROOT points to your Intel FPGA OpenCL installation and source ./init_opencl_env.sh to point to the correct board support package. Build the custom RTL library by running the scripts svm_common/rtl_src/generate_aocl_interface.sh and svm_common/rtl_src/package_ip.sh (in this order).
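The setup steps above can be sketched as a single shell script. This is a minimal sketch under stated assumptions, not the project's own tooling: the variables SOCEDS_DIR, REPO_DIR and DRY_RUN and the helper functions run and setup_svm are hypothetical names, the kernel checkout location is arbitrary, and passing KDIR on the make command line (instead of editing the Makefile as described above) relies on GNU make letting command-line variables override Makefile assignments. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the setup steps. Commands are printed, not executed,
# unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Assumed locations (hypothetical defaults); adjust to your installation.
SOCEDS_DIR="${SOCEDS_DIR:-<path-to-SoC-Embedded-Design-Suite-installation>}"
KDIR="${KDIR:-${HOME:-.}/linux-socfpga}"   # where the kernel sources are saved
REPO_DIR="${REPO_DIR:-.}"                  # root of this repository

# Cross compiler prefix, as given in the setup instructions.
CROSS_COMPILE="$SOCEDS_DIR/ds-5/sw/gcc/bin/arm-linux-gnueabihf-"
export CROSS_COMPILE

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_svm() {
    # The RTL library build expects the SDK environment to be prepared
    # beforehand (ALTERAOCLSDKROOT set, init_opencl_env.sh sourced).
    [ -n "${ALTERAOCLSDKROOT:-}" ] || echo "warning: ALTERAOCLSDKROOT is not set" >&2

    # 1. Kernel sources; they must match the kernel running on the board.
    run git clone https://github.com/altera-opensource/linux-socfpga "$KDIR"

    # 2. SVM driver. Overriding KDIR on the command line is an assumption;
    #    the instructions above edit KDIR in the Makefile instead.
    run make -C "$REPO_DIR/svm_common/svm_driver" KDIR="$KDIR"

    # 3. Custom RTL library: the two scripts, in the stated order.
    run "$REPO_DIR/svm_common/rtl_src/generate_aocl_interface.sh"
    run "$REPO_DIR/svm_common/rtl_src/package_ip.sh"
}

setup_svm
```

To execute the commands instead of printing them, set SOCEDS_DIR and KDIR to real paths and run the script with DRY_RUN=0.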
Once the setup is complete, the code examples (./examples) provide information on how to use the framework. We provide three examples:
- filtering_algorithm: an optimized SVM-enabled implementation of the filtering algorithm [4] for K-means clustering
- filtering_algorithm_no_svm: an implementation of the filtering algorithm without SVM
- atomicity_test: a micro-benchmark to test the host-device lock service
The two implementations of the filtering algorithm can be used to reproduce the results presented in the companion paper [1].
Build and run filtering_algorithm:
- Build the hardware: Change into ./examples/filtering_algorithm. Ensure you have completed all setup steps from the previous section. Build the FPGA design by running the scripts ./generate_system_files.sh and ./generate_hardware.sh (in this order). The first script generates the RTL and QSYS design files, calls the SVM scripts in ../../svm_common/scripts and the custom RTL library in ../../svm_common/rtl_src, and then stops the build flow. The second script continues the build flow with the modified RTL and QSYS sources.
- Build the host software: Add the ARM cross compiler to the PATH environment variable:
export PATH=<path-to-SoC-Embedded-Design-Suite-installation>/ds-5/sw/gcc/bin:$PATH
Run make.
- Run the example: Copy the files bin/filter_stream_opt1.aocx and bin/host to the Cyclone V SoC (e.g. via SSH). Set up the OpenCL run-time environment on the SoC and run ./host.
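The build-and-run steps above can be sketched as one script. This is an illustrative sketch only: the variables SOCEDS_DIR, BOARD and DRY_RUN and the helpers run and build_and_deploy are hypothetical names, the SSH target is a placeholder, and copying with scp is just one way to transfer the files "via SSH". By default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch of building and deploying filtering_algorithm. Commands are
# printed, not executed, unless DRY_RUN=0 is set in the environment.
DRY_RUN="${DRY_RUN:-1}"

# Assumed values (hypothetical defaults); adjust to your setup.
SOCEDS_DIR="${SOCEDS_DIR:-<path-to-SoC-Embedded-Design-Suite-installation>}"
BOARD="${BOARD:-root@<board-address>}"     # placeholder SSH target
EXAMPLE_DIR="${EXAMPLE_DIR:-./examples/filtering_algorithm}"

# Print a command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_and_deploy() {
    # 1. Hardware: the two scripts, in the stated order.
    run cd "$EXAMPLE_DIR"
    run ./generate_system_files.sh
    run ./generate_hardware.sh

    # 2. Host software: put the ARM cross compiler on PATH, then build.
    run export PATH="$SOCEDS_DIR/ds-5/sw/gcc/bin:$PATH"
    run make

    # 3. Copy the FPGA image and the host binary to the board.
    run scp bin/filter_stream_opt1.aocx bin/host "$BOARD:"

    # 4. Running the example happens on the board itself.
    echo "# next: on the SoC, set up the OpenCL run-time environment and run ./host"
}

build_and_deploy
```

The final step is left as a reminder because the run-time environment has to be set up in the shell session on the SoC before ./host is started.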
We are planning to integrate this framework with the software/firmware stacks on top of Intel's Xeon+FPGA Multi-Chip Package and IBM's CAPI.
Write to me: http://cas.ee.ic.ac.uk/people/fw1811
[1] F. Winterstein and G. Constantinides, "Pass a Pointer: Exploring Shared Virtual Memory Abstractions in OpenCL Tools for FPGAs," in Proc. ICFPT 2017, http://cas.ee.ic.ac.uk/people/fw1811/papers/Felix_ICFPT17.pdf
[2] Intel Corp, Cyclone® V SoC Development Kit, https://www.altera.com/products/boards_and_kits/dev-kits/altera/kit-cyclone-v-soc.html
[3] Intel Corp, Altera SDK for OpenCL - Cyclone V SoC Getting Started Guide, UG-OCL006, 2016.05.02, https://www.altera.com/en_US/pdfs/literature/hb/opencl-sdk/aocl_c5soc_getting_started.pdf
[4] F. Winterstein, S. Bayliss, and G. Constantinides, "High-level synthesis of dynamic data structures: a case study using Vivado HLS," in Proc. ICFPT 2013, http://cas.ee.ic.ac.uk/people/fw1811/papers/FelixFPT13.pdf
The source code is distributed under an Apache-2.0 license (see LICENSE). If you use it, please cite F. Winterstein and G. Constantinides: "Pass a Pointer: Exploring Shared Virtual Memory Abstractions in OpenCL Tools for FPGAs," Proceedings of the International Conference on Field Programmable Technology (ICFPT), 2017.