Question About CU Selection Logic. #73
Replies: 1 comment
-
Yes, we use round-robin when selecting a CU for a WG. We have researched some more complex thread-block scheduling strategies, but we currently have no plans to apply them in practice. Perhaps we will work on this in the future. Of course, some less complex strategies are relatively easy to implement, such as BFS and DFS (see this). There is evidence suggesting that Nvidia uses similar strategies. As for the second question: in most cases, the CTA scheduler is not the throughput bottleneck of the whole GPGPU, so we do not use a fully parallel structure, which keeps the hardware area cost down.
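For readers following along, here is a minimal software sketch of round-robin CU selection with a serial scan. It is not the project's actual code: the names `CU`, `Workgroup`, `RoundRobinSelector`, and the two tracked resources (wavefront slots and LDS bytes) are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CU:
    # Hypothetical per-CU resource counters (assumed for this sketch).
    free_wavefront_slots: int
    free_lds_bytes: int


@dataclass
class Workgroup:
    # Hypothetical resource demand of one workgroup.
    wavefronts: int
    lds_bytes: int


def fits(cu: CU, wg: Workgroup) -> bool:
    """Return True if the CU has enough free resources for the workgroup."""
    return (cu.free_wavefront_slots >= wg.wavefronts
            and cu.free_lds_bytes >= wg.lds_bytes)


class RoundRobinSelector:
    def __init__(self, cus: List[CU]) -> None:
        self.cus = cus
        self.next_cu = 0  # index where the next scan starts

    def select(self, wg: Workgroup) -> Optional[int]:
        """Scan CUs one at a time, starting after the last CU chosen.

        Returns the chosen CU index, or None if no CU can host the WG.
        """
        n = len(self.cus)
        for step in range(n):
            idx = (self.next_cu + step) % n
            cu = self.cus[idx]
            if fits(cu, wg):
                # Reserve the resources and advance the round-robin pointer.
                cu.free_wavefront_slots -= wg.wavefronts
                cu.free_lds_bytes -= wg.lds_bytes
                self.next_cu = (idx + 1) % n
                return idx
        return None
```

In this sketch the scan checks at most one CU per step, so the worst case is one check per CU for each workgroup. That is the cost the question refers to, and it only matters if workgroup dispatch, rather than execution on the CUs, limits throughput.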
-
When we assign a CU to a workgroup, do we use a simple round-robin algorithm? The code looks like this. Have we considered other prioritization algorithms?
And why was the serial scanning method chosen? As the number of CUs increases, the performance overhead of this part will become very large.
Thanks!