All Verilog, C/C++ and Python code used for the "EE5332: Mapping digital signal processing algorithms to hardware" course at IIT Madras is available here, along with use cases and summaries. The course content (reading material, videos, etc.) can be found here.
- Read multiple papers on efficient FFT implementation and finally followed the approach in this paper.
- The paper proposes a novel, systematic approach to designing parallel and pipelined (retimed) architectures for computing complex and real FFTs. It applies folding transformations to radix-2^n algorithms to improve throughput and reduce total area.
- My short review of that paper can be found here.
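
For orientation, the radix-2 butterfly decomposition that radix-2^n algorithms build on can be sketched in software. The snippet below is a minimal recursive Python illustration only; it is not the folded, pipelined hardware architecture from the paper, and the function name `fft_radix2` is a hypothetical one chosen for this example.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT (illustrative sketch).

    Assumes len(x) is a power of two; returns a list of complex values.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]

    # Split into even- and odd-indexed samples and transform each half.
    even = fft_radix2(x[0::2])
    odd = fft_radix2(x[1::2])

    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor W_n^k = exp(-2*pi*j*k/n) applied to the odd half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        # One butterfly: produces outputs k and k + n/2.
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Each loop iteration is one butterfly. In a hardware mapping, it is these butterfly operations that get time-multiplexed onto shared units by folding, which this software sketch does not attempt to model.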