Binary Search Thresholding #1019

azizb-xlnx · 2024-03-28T15:57:43Z

azizb-xlnx
Mar 28, 2024
Collaborator

Binary Search Thresholding Vs Classic Thresholding

The classic thresholding module implements a basic threshold comparison of the input value against a set of pre-defined threshold values. The total number of thresholds equals 2^k-1, with k being the number of output bits. The required threshold memory bandwidth therefore grows exponentially with output precision, which severely limits the scalability of this approach. While it works well for binary or 4-bit outputs with very few quantization levels, the cost for wider datatypes quickly becomes prohibitive.

Binary search thresholding (BST) is an updated approach that uses the binary search algorithm to perform the thresholding more efficiently and overcome the limitations of the classic approach. With BST, the required memory bandwidth grows only linearly with output precision, which makes wider output precisions economical to support. BST is also implemented in RTL, eliminating the need for HLS and its limitations altogether. Most notably, this removes the very long HLS synthesis times for larger thresholding instantiations from the implementation process.
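The difference between the two approaches can be illustrated with a small software model. This is only a sketch, not the RTL implementation: the function names are made up for illustration, the thresholds are assumed to be sorted in ascending order, and the output is modeled as an unsigned count (a signed output such as INT8 would add a fixed offset). The classic version compares against all 2^k-1 thresholds, while the binary search version reads one threshold per stage and resolves one output bit at a time, so k reads suffice.

```python
def classic_threshold(x, thresholds):
    """Compare x against every threshold and count how many it reaches."""
    return sum(1 for t in thresholds if x >= t)


def bst_threshold(x, thresholds, k):
    """Binary search over sorted thresholds: one threshold read per stage."""
    assert len(thresholds) == 2**k - 1, "expects 2^k - 1 sorted thresholds"
    result = 0
    for stage in range(k):
        # Tentatively set the next output bit, most significant bit first.
        candidate = result | (1 << (k - 1 - stage))
        # The output is at least `candidate` iff x reaches threshold number
        # `candidate` (stored at index candidate - 1).
        if x >= thresholds[candidate - 1]:
            result = candidate
    return result


if __name__ == "__main__":
    k = 3
    thresholds = [10, 20, 30, 40, 50, 60, 70]  # 2^3 - 1 = 7 thresholds
    for x in (0, 10, 35, 70, 99):
        print(x, classic_threshold(x, thresholds), bst_threshold(x, thresholds, k))
```

In this model each loop iteration corresponds to one pipeline stage, which is why the per-input threshold reads scale with k rather than with 2^k-1.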

The table below compares the resource usage of the classic thresholding approach and the new RTL BST approach for a 32-channel thresholding module with UINT16 input and INT8 output:

Style	LUT	BRAM	Time to stitched IP (Minutes)
HLS Classic	23438	0	120
RTL BST	3272	64	4

As the table shows, the BST approach saves a large share of the LUTs and of the time needed to synthesize the IP, at the cost of 64 BRAMs. While the HLS approach uses 23438 LUTs and takes 120 minutes to synthesize, the BST approach uses only 3272 LUTs and takes a mere 4 minutes to create an IP.
These efficiency gains mean that we can handle higher-precision outputs while using a fraction of the LUT resources of the previous approach.

BST memory configuration

The RTL BST implementation is heavily pipelined and instantiates local memories in the individual pipeline stages. Depth trigger parameters define which memory type is used for which memory depth, where the depth of a memory is the number of thresholds it holds. The Thresholding RTL customOp class offers a helper function, get_pe_mem_geometries, that reports the memory geometries and can be used to choose trigger values. A second helper function, get_memory_estimate, estimates the BST memory usage for a given set of trigger parameters.

The triggers are set in the folding configuration JSON file, as shown in the following example. Leaving both triggers at 0 lets the synthesis tool decide how to implement the RAM.

  "Thresholding_rtl_0": {
    "PE": 32,
    "runtime_writeable_weights": 0,
    "depth_trigger_uram": 512,
    "depth_trigger_bram": 128
  },
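One way to reason about the effect of the two triggers is the small model below. It is a sketch under stated assumptions, not the actual RTL logic: the function name and the returned labels are hypothetical, a memory is assumed to be mapped to a type once its depth reaches the corresponding trigger, URAM is assumed to take priority over BRAM, and memories below every active trigger are assumed to be left to the tool. Only the behavior for two zero triggers is taken from the description above.

```python
def choose_memory_style(depth, depth_trigger_uram=0, depth_trigger_bram=0):
    """Pick a memory style for one pipeline-stage memory (illustrative only).

    depth is the number of thresholds the memory holds.
    A trigger value of 0 disables that trigger.
    """
    # Assumption: URAM is checked first and wins if both triggers match.
    if depth_trigger_uram and depth >= depth_trigger_uram:
        return "uram"
    if depth_trigger_bram and depth >= depth_trigger_bram:
        return "bram"
    # No trigger applies (or both are 0): let the synthesis tool decide.
    return "auto"


if __name__ == "__main__":
    # Trigger values from the folding configuration example above.
    for depth in (64, 128, 256, 512, 1024):
        print(depth, choose_memory_style(depth, 512, 128))
```

With the example triggers, this model leaves memories shallower than 128 thresholds to the tool, maps depths from 128 up to 511 to BRAM, and maps depths of 512 or more to URAM.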

Switching between BST and Classic

FINN underwent a major restructuring in the v0.10 release; see #1020 and PR #928. A new transformation was introduced that allows selecting the HLS or RTL variant of all customOps, and RTL BST is selected by default for thresholding.
To switch to the classic HLS variant, set preferred_impl_style of the thresholding layer to hls in the specialize_layers_config file.
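As a rough illustration, the snippet below builds such an override entry and prints it as JSON. It assumes that the specialize_layers_config file uses the same layout as the folding configuration shown earlier, that is, a mapping from node name to attributes. The node name Thresholding_0 and the helper function are hypothetical, so check the node names in your own model before using them.

```python
import json


def hls_override(node_names):
    """Return config entries that request the HLS variant for the given nodes."""
    return {name: {"preferred_impl_style": "hls"} for name in node_names}


if __name__ == "__main__":
    # "Thresholding_0" is a placeholder node name, not taken from a real model.
    config = hls_override(["Thresholding_0"])
    print(json.dumps(config, indent=2))
```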
