-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using an FPGA to check if proposed sorting networks are compliant #14
Comments
Thanks for sharing your thoughts about this. I think that indeed it makes a lot of sense to try to use FPGA(s) as accelerator.
The mode of operation would need to be chosen as some tradeoff between versatility and performance. |
Hi, Based on your description I agree, synthesizing generated networks would not make too much sense, except maybe for very large networks which already passed a few rounds of SW checks. So reconfigurable networks would probably be the way to go. I mentioned UART as the communication protocol, since it is the easiest to implement. PCIe, Ethernet, ... require a combination of hardware and software, in general they take more time to implement, maybe too much time for a hobby project. My answer is a bit short, since I lost most interest. My research was focused on using sorting networks as interconnect inside chips, but it seems they are not a good compromise for modern devices. I was only able to find one mention of a sorting network used in an old ATM network device (probably about 30 years old now). Even other non-blocking rearrangeable networks for unicast traffic like the Clos network seem to be in the past. A book mentioned a SONET chip using them internally, I could not find the documentation for the chip, only for the next device in the family and it did not seem to use Clos networks internally, but would still use them to connect multiple devices. But again SONET is also a thing of the past. Modern global networks use Ethernet and some server farms use Infiniband, those seem to use fat tree and supercomputer specific network topologies. In general it seems simple crossbars have replaced all "non-blocking rearrangeable networks for unicast traffic" inside chips, what remains are either simple crossbar switches or some other blocking structures which are easy to control (interconnected rings, ...). |
Hi,
I was thinking whether it would make sense to check the sorting network proposals generated by the evolutionary algorithm using an FPGA. I will try to explain what can be done on an FPGA and how much time it would take to perform the steps, so you could estimate if the process would provide a speedup compared to running it in SW, and what network size would be the tipping point where FPGA is faster than SW. I could do the estimation myself, but I would have to know, some estimates on how long it takes for the SW approach to check a proposed network.
n=64
might fit into an FPGA, depending on the FPGA size. After thinking about the network being very simple for ones and zeros input, I would update the estimate to 1000 networks fitting into an FPGA.n=32
network in 8.6 seconds, an=48
network in 6.5 days, ...The largest overhead is the slow conversion from generated HDL code to FPGA bitstream.
The FPGA can process all input combinations for small networks fast compared to SW, so using prefixes might not be needed. This would enable the evolutionary algorithm to cover a larger optimization space.
It might be possible to write efficient generic HDL code for a fixed
n
(and some max depth), so that new network proposals could be loaded into the FPGA over UART. This approach would allow skipping the slow FPGA synthesis step. I did not think about such generic code yet, so I am unsure how efficient it could be in therms of FPGA area and clock speed. It would probably make sense forn<32
.Going even further, for small
n
, the FPGA could just go through all possible routing networks, I am not going to calculate right now, how many would this be. As a drawback, the algorithm for generating network proposals would have to be rather simple, as implementing something like an evolutionary algorithm on FPGA would take time.The text was updated successfully, but these errors were encountered: