-
Hi @hleblevec, sorry for the late response. The reason behind this constraint is that the 1x1 kernel case is implemented using the "parallel" implementation style, due to the simplicity of the operation. Implementing it with the "default" style would have required more complexity (i.e., edge-case handling) in the addressing logic of the circular buffer. However, the "parallel" style always implies full SIMD unrolling.

We could tackle this in multiple ways: a) take the constraint into account in the automatic folding, which is in urgent need of a rework anyway. I would prefer a), because the hardware cost should boil down to a single SIMD-wide stream register (plus DWCs) in this case, which is probably not a huge waste of resources. What values of maximum SIMD = #channels are we talking about for your use case? Is the resource consumption bothering you?

For your second question: the goal was to combine the functionality of the many existing HLS functions into a single RTL implementation. I didn't want to introduce more fragmentation on the backend by handling cases like this separately. Do you have resource consumption figures for the Downsampler vs. the RTL SWG in this use case?
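To illustrate option a): a minimal sketch (hypothetical, not actual FINN code — `pick_simd` and its signature are invented for illustration) of how an automatic folding pass could respect the 1x1-kernel constraint, assuming it otherwise picks the largest divisor of the channel count within some budget:

```python
# Hypothetical sketch of constraint-aware SIMD selection for the RTL
# ConvolutionInputGenerator. Not actual FINN code.
def pick_simd(channels, kernel_size, max_simd):
    """Choose a SIMD value for the input generator."""
    if kernel_size == 1:
        # "parallel" implementation style: full unrolling is mandatory,
        # so SIMD must equal the number of input channels.
        return channels
    # Otherwise: largest divisor of channels not exceeding max_simd.
    for simd in range(min(max_simd, channels), 0, -1):
        if channels % simd == 0:
            return simd
    return 1
```

With a check like this in the folding pass, the 1x1 case would simply be pinned to SIMD = #channels instead of erroring out later.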
-
Hi @hleblevec, @maltanar, this should be fixed by #922, which lifts the SIMD = C constraint in the first place. Could you please check if this works for you?
-
Hi @fpjentzsch,
I have been trying to use the RTL version of the ConvolutionInputGenerator with a network I'm working on. Notably, I sometimes use convolutions with a 1x1 kernel and a stride of 2 for downsampling, in the same manner as in some ResNet blocks, which means I still need an Im2col even though the kernel is 1x1.
There seems to be a constraint set in the custom_op (https://github.com/Xilinx/finn/blob/main/src/finn/custom_op/fpgadataflow/convolutioninputgenerator_rtl.py#L680C1-L680C1) that, in the case of a 1x1 kernel, the SIMD parameter must equal the number of input channels. However, this constraint is not taken into account by the SetFolding transformation (https://github.com/Xilinx/finn/blob/main/src/finn/transformation/fpgadataflow/set_folding.py), which tries to optimize the SIMD parameter anyway, causing an error because the constraint is no longer satisfied.
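To make the use case concrete, here is an illustrative sketch (plain Python, not FINN code; `im2col_1x1` is a made-up helper) of why a 1x1 kernel with stride 2 still needs an input generator: the feature map must be subsampled rather than streamed through unchanged.

```python
# Illustrative sketch: Im2col for a 1x1 kernel with stride 2.
def im2col_1x1(x, stride):
    """Lower a 1x1 conv window to rows, one row (= one pixel's channels)
    per output position. x is an H x W x C nested list."""
    return [x[i][j] for i in range(0, len(x), stride)
                    for j in range(0, len(x[0]), stride)]

# 4x4 image, 3 channels: stride-2 downsampling keeps 2x2 = 4 pixels
x = [[[i, j, 0] for j in range(4)] for i in range(4)]
cols = im2col_1x1(x, stride=2)
print(len(cols))   # 4 output pixels
print(cols[0])     # [0, 0, 0] -> pixel (0, 0)
```

Each retained pixel then becomes one row of the matrix fed to the matrix-vector unit, exactly as in the k > 1 case.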
Before looking into proposing a fix for this issue, I have a few questions:
Hope you can clarify this.