Binary Search Thresholding #1019
azizb-xlnx started this conversation in Show and tell
Binary Search Thresholding Vs Classic Thresholding
The classic thresholding module compares the input value against a full set of pre-defined threshold values. The total number of thresholds is 2^k-1, with k being the number of output bits, so the required threshold memory bandwidth grows exponentially with the output precision. This severely limits the scalability of the approach. While it works well for binary or 4-bit outputs with very few quantization levels, the cost for wider datatypes quickly becomes prohibitive.
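The comparison described above can be sketched in a few lines of Python. This is only an illustrative software model, not the FINN implementation: the function name, the `>=` comparison, and the unsigned output level are assumptions made for the example.

```python
from typing import Sequence


def classic_threshold(x: int, thresholds: Sequence[int]) -> int:
    """Compare x against every threshold and count how many it reaches.

    Assumption for this sketch: a threshold counts as reached when x >= t,
    and the output is the unsigned count (0 .. 2**k - 1).
    """
    return sum(1 for t in thresholds if x >= t)


# Illustrative numbers: k output bits need 2**k - 1 thresholds per channel.
k = 4
thresholds = [16 * (i + 1) for i in range(2**k - 1)]  # 16, 32, ..., 240
level = classic_threshold(100, thresholds)  # all 15 thresholds are read
```

Every evaluation touches all 2^k-1 thresholds, which is where the exponential bandwidth requirement comes from: 15 values for 4 bits, but 255 for 8 bits.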
Binary search thresholding (BST) is an updated approach that uses the binary search algorithm to perform thresholding more efficiently and overcome the limitations of the classic approach. With BST, the required memory bandwidth grows only linearly with output precision, which makes wider output precisions economical to support. BST is also implemented in RTL, eliminating the need for HLS and its limitations altogether. Most notably, this removes the very long HLS synthesis times for larger thresholding instantiations from the implementation process.
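A minimal software sketch of the binary search idea follows. It is not the RTL design itself; it only assumes that the thresholds of a channel are sorted in non-decreasing order and that a threshold counts as reached when `x >= t`. The function name is hypothetical.

```python
from typing import Sequence


def bst_threshold(x: int, thresholds: Sequence[int], k: int) -> int:
    """Resolve one output bit per step, most significant bit first.

    Assumes `thresholds` holds 2**k - 1 values in non-decreasing order.
    Only k thresholds are read per input instead of all 2**k - 1.
    """
    assert len(thresholds) == 2**k - 1
    level = 0
    for bit in reversed(range(k)):
        candidate = level | (1 << bit)
        # thresholds[candidate - 1] must be reached for the output to be
        # at least `candidate`.
        if x >= thresholds[candidate - 1]:
            level = candidate
    return level


k = 8
thresholds = [3 * i + 5 for i in range(2**k - 1)]  # illustrative sorted values
result = bst_threshold(100, thresholds, k)
```

Each loop iteration corresponds to one comparison, so an 8-bit output needs 8 threshold reads per input rather than 255. In a pipelined design each iteration could map to one stage with its own small memory.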
The table below compares the resource usage of the classic thresholding approach and the new RTL BST approach for a 32-channel thresholding module with UINT16 input and INT8 output:
As the table shows, the BST approach substantially reduces both the LUT count and the time needed to synthesize the IP. The HLS approach uses 23438 LUTs and takes over 120 minutes to synthesize, whereas the BST approach uses only 3272 LUTs (roughly 7x fewer) and takes about 4 minutes to create an IP.
These efficiency improvements mean that we can handle higher-precision outputs while using a fraction of the resources of the previous approach.
BST memory configuration
The RTL BST implementation is heavily pipelined and instantiates local memories in the individual pipeline stages. Depth trigger parameters define which memory type is used for which memory size. The depth of a memory is the number of thresholds it holds. The Thresholding RTL customOP class offers a helper function, get_pe_mem_geometries, that reports the memory geometry and can be used to choose trigger values. In addition, the get_memory_estimate helper function estimates the memory usage for BST based on the trigger parameters.
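To get a feel for how depth triggers might interact with the per-stage memories, here is a rough standalone model. It does not call FINN and does not reproduce get_pe_mem_geometries; the depth formula (stage i holds 2^i thresholds per channel, multiplied by the channels folded onto one PE), the trigger semantics (a memory at or above a trigger depth maps to that type), and all function names are assumptions made for illustration.

```python
from typing import List


def stage_depths(num_channels: int, pe: int, out_bits: int) -> List[int]:
    """Hypothetical per-PE memory depth of each binary search stage."""
    fold = num_channels // pe  # channels sharing one processing element
    return [fold * (2**stage) for stage in range(out_bits)]


def pick_memory(depth: int, trigger_bram: int, trigger_uram: int) -> str:
    """Hypothetical trigger rule; 0 for both leaves the choice to the tool."""
    if trigger_bram == 0 and trigger_uram == 0:
        return "auto"
    if trigger_uram > 0 and depth >= trigger_uram:
        return "ultra"
    if trigger_bram > 0 and depth >= trigger_bram:
        return "block"
    return "distributed"


# 32 channels on a single PE with an 8-bit output (illustrative values).
depths = stage_depths(num_channels=32, pe=1, out_bits=8)
plan = [pick_memory(d, trigger_bram=128, trigger_uram=2048) for d in depths]
```

Under these assumptions the stage depths add up to 32 * 255 thresholds in total, the same number the classic approach stores, but each stage reads only one entry per input. The real geometry should be taken from get_pe_mem_geometries.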
The triggers are set in the folding configuration JSON file, as shown in the following example. Leaving both triggers at 0 lets the synthesis tool decide how to implement the RAM.
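A folding configuration entry of this kind could be generated as sketched here. The attribute names depth_trigger_uram and depth_trigger_bram, the node name Thresholding_rtl_0, the PE value, and the output file name are assumptions for illustration; check them against the node attributes and node names in your own model before use.

```python
import json

# Hypothetical folding configuration fragment. Node and attribute names
# are assumptions and must match the actual model.
folding_config = {
    "Defaults": {},
    "Thresholding_rtl_0": {
        "PE": 1,
        # 0 for both triggers: let the synthesis tool choose the RAM type.
        "depth_trigger_uram": 0,
        "depth_trigger_bram": 0,
    },
}

config_text = json.dumps(folding_config, indent=2)
# To write it out: open("folding_config.json", "w").write(config_text)
```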
Switching between BST and Classic
FINN underwent major restructuring in the v0.10 release; see #1020 and PR #928. A new transformation allows selecting the HLS or RTL variant of each customOp. For thresholding, RTL BST is selected by default.
To switch to the classic HLS variant, set preferred_impl_style of the thresholding layer to hls in the specialize_layers_config file.
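As a sketch, such a configuration could be produced like this. Only preferred_impl_style and the value hls come from the description above; the node name Thresholding_0, the "Defaults" key, and the output file name are assumptions and should be adapted to the actual model and build flow.

```python
import json

# Hypothetical specialize_layers_config content. The node name is an
# assumption; use the name of the thresholding layer in your model.
specialize_layers_config = {
    "Defaults": {},
    "Thresholding_0": {"preferred_impl_style": "hls"},
}

specialize_text = json.dumps(specialize_layers_config, indent=2)
# To write it out:
# open("specialize_layers_config.json", "w").write(specialize_text)
```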