-
Hi @hleblevec, sorry for the late response. The reason behind this constraint is that the 1x1 kernel case is implemented using the "parallel" implementation style, due to the simplicity of the operation. Implementing it with the "default" style would have required more complexity (i.e., edge-case handling) in the addressing logic of the circular buffer. However, the "parallel" style always implies full SIMD unrolling.

We could tackle this in multiple ways: a) take the constraint into account in the automatic folding, which is in urgent need of a rework anyway. I would prefer a), because the hardware cost should boil down to a single SIMD-wide stream register (plus DWCs) in this case, which is probably not a huge waste of resources. What values of maximum SIMD = #channels are we talking about for your use case? Is the resource consumption bothering you?

For your second question: the goal was to combine the functionality of the many existing HLS functions into a single RTL implementation. I didn't want to introduce more fragmentation on the backend by handling cases like this separately. Do you have resource consumption figures for the Downsampler vs. the RTL SWG in this use case?
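To illustrate option a): a minimal sketch (hypothetical, not actual FINN code — `pick_simd` and its signature are invented for illustration) of how an automatic folding pass could respect the 1x1-kernel constraint, assuming it otherwise picks the largest divisor of the channel count within some budget:

```python
# Hypothetical sketch of constraint-aware SIMD selection for the RTL
# ConvolutionInputGenerator. Not actual FINN code.
def pick_simd(channels, kernel_size, max_simd):
    """Choose a SIMD value for the input generator."""
    if kernel_size == 1:
        # "parallel" implementation style: full unrolling is mandatory,
        # so SIMD must equal the number of input channels.
        return channels
    # Otherwise: largest divisor of channels not exceeding max_simd.
    for simd in range(min(max_simd, channels), 0, -1):
        if channels % simd == 0:
            return simd
    return 1
```

With a check like this in the folding pass, the 1x1 case would simply be pinned to SIMD = #channels instead of erroring out later.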
-
Hi @hleblevec, @maltanar, this should be fixed by #922, which lifts the SIMD = C constraint in the first place. Could you please check if this works for you?
-
Hi @fpjentzsch,
I have been trying to use the RTL version of the ConvolutionInputGenerator with a network I'm working on. Notably, I sometimes use convolutions with a 1x1 kernel and a stride of 2 for downsampling, in the same manner as in some ResNet blocks, which means I still need an Im2col even though the kernel is 1x1.
There seems to be a constraint set in the custom_op (https://github.com/Xilinx/finn/blob/main/src/finn/custom_op/fpgadataflow/convolutioninputgenerator_rtl.py#L680C1-L680C1) that, in the case of a 1x1 kernel, the SIMD parameter must equal the number of input channels. However, this constraint is not taken into account by the SetFolding transformation (https://github.com/Xilinx/finn/blob/main/src/finn/transformation/fpgadataflow/set_folding.py), which tries to optimize the SIMD parameter anyway, causing an error because the constraint is no longer satisfied.
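To make the use case concrete, here is an illustrative sketch (plain Python, not FINN code; `im2col_1x1` is a made-up helper) of why a 1x1 kernel with stride 2 still needs an input generator: the feature map must be subsampled rather than streamed through unchanged.

```python
# Illustrative sketch: Im2col for a 1x1 kernel with stride 2.
def im2col_1x1(x, stride):
    """Lower a 1x1 conv window to rows, one row (= one pixel's channels)
    per output position. x is an H x W x C nested list."""
    return [x[i][j] for i in range(0, len(x), stride)
                    for j in range(0, len(x[0]), stride)]

# 4x4 image, 3 channels: stride-2 downsampling keeps 2x2 = 4 pixels
x = [[[i, j, 0] for j in range(4)] for i in range(4)]
cols = im2col_1x1(x, stride=2)
print(len(cols))   # 4 output pixels
print(cols[0])     # [0, 0, 0] -> pixel (0, 0)
```

Each retained pixel then becomes one row of the matrix fed to the matrix-vector unit, exactly as in the k > 1 case.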
Before looking into proposing a fix for this issue, I have a few questions:
Hope you can clarify this.