I am eager to contribute to this project, and this might be something I could contribute if it doesn't already exist.
The basic idea is that if I create a SIMD vector of length x containing elements of type y, it should run on SIMD hardware if the number and type of registers are right. Otherwise it should run sequentially, in a for loop over each element of the vector, using ordinary addition, multiplication, etc. This way, you can write code once using SIMD, compile it for any platform, and know that if the CPU supports that kind of SIMD it will run as SIMD, and if not, your software will still work, just a bit slower.
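For the case where the target is known at build time, the "SIMD if available, otherwise a scalar loop" behaviour described above can be sketched as follows. This is a minimal illustration, not the project's API: `F32x4` is a hypothetical type, and x86_64 with SSE is assumed as the example SIMD target. The `std::arch::x86_64` intrinsics and `cfg(target_feature = ...)` are real Rust features.

```rust
/// Hypothetical 4-lane f32 vector used only for illustration (not `std::simd`).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x4(pub [f32; 4]);

impl F32x4 {
    /// SIMD path: selected at compile time when the build target enables SSE.
    #[cfg(all(target_arch = "x86_64", target_feature = "sse"))]
    pub fn add(self, other: F32x4) -> F32x4 {
        use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};
        let mut out = [0.0f32; 4];
        // SAFETY: the cfg guarantees SSE is enabled for this build, and the
        // unaligned load/store intrinsics touch exactly 4 f32s of valid arrays.
        unsafe {
            let a = _mm_loadu_ps(self.0.as_ptr());
            let b = _mm_loadu_ps(other.0.as_ptr());
            _mm_storeu_ps(out.as_mut_ptr(), _mm_add_ps(a, b));
        }
        F32x4(out)
    }

    /// Scalar fallback: a plain loop over the lanes on every other target.
    #[cfg(not(all(target_arch = "x86_64", target_feature = "sse")))]
    pub fn add(self, other: F32x4) -> F32x4 {
        let mut out = [0.0f32; 4];
        for i in 0..4 {
            out[i] = self.0[i] + other.0[i];
        }
        F32x4(out)
    }
}

fn main() {
    let a = F32x4([1.0, 2.0, 3.0, 4.0]);
    let b = F32x4([10.0, 20.0, 30.0, 40.0]);
    println!("{:?}", a.add(b)); // F32x4([11.0, 22.0, 33.0, 44.0])
}
```

Only one of the two `add` definitions exists in any given build, so there is no runtime cost; the catch is that the choice is fixed by the compile target rather than by the CPU the binary actually runs on.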
However, this has complications. If you could know the CPU you are compiling for at compile time, this would be a trivial matter of conditional compilation flags in the simd module. But of course that is not the case. So my question is: can this be done with reasonable or no runtime performance cost? One option is to detect the available SIMD registers once, when the program starts, and store the result in a global variable. Then every time a SIMD vector operation is performed, a branch checks that global variable: if the required amount and type of SIMD registers are available, it goes down the SIMD route, and if not, the non-SIMD route. But I can see two unanswered questions with this: how do you handle multi-CPU systems, and is the performance cost of a branch on every single SIMD operation worth it?
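The runtime scheme proposed above (detect once, store in a global, branch on every operation) might look roughly like this. It is a sketch under stated assumptions: x86_64 and AVX are used as the example target and feature, and `has_avx`, `add8`, `add8_avx` and `add8_scalar` are hypothetical names. `std::sync::OnceLock`, `is_x86_feature_detected!` and `#[target_feature]` are real standard-library features.

```rust
use std::sync::OnceLock;

// Global flag, filled in once on first use (the "check at start-up" idea).
static HAS_AVX: OnceLock<bool> = OnceLock::new();

#[cfg(target_arch = "x86_64")]
fn detect_avx() -> bool {
    is_x86_feature_detected!("avx")
}

#[cfg(not(target_arch = "x86_64"))]
fn detect_avx() -> bool {
    false
}

fn has_avx() -> bool {
    *HAS_AVX.get_or_init(detect_avx)
}

// Non-SIMD route: a plain loop over the 8 lanes.
fn add8_scalar(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

// SIMD route: compiled with AVX enabled; only safe to call on AVX CPUs.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn add8_avx(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_storeu_ps};
    let mut out = [0.0f32; 8];
    unsafe {
        let va = _mm256_loadu_ps(a.as_ptr());
        let vb = _mm256_loadu_ps(b.as_ptr());
        _mm256_storeu_ps(out.as_mut_ptr(), _mm256_add_ps(va, vb));
    }
    out
}

/// Per-operation dispatch: every single call re-reads the flag and branches.
fn add8(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    if has_avx() {
        #[cfg(target_arch = "x86_64")]
        {
            // SAFETY: AVX support was confirmed at runtime by `has_avx`.
            return unsafe { add8_avx(a, b) };
        }
    }
    add8_scalar(a, b)
}

fn main() {
    let mut acc = [0.0f32; 8];
    for _ in 0..1000 {
        acc = add8(acc, [1.0; 8]); // one flag load + branch per 8-lane add
    }
    println!("{:?}", acc);
}
```

In this shape the check sits inside the innermost operation. Also, because `add8_avx` is compiled with a target feature its caller lacks, the compiler will not inline it into `add8`, so each 8-lane add also pays for a function call.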
So how is this kind of issue solved in other programming languages? That seems like the first place to look anyway.
A load and branch on every SIMD operation is way too expensive.
Instead, what is generally done is to have several different versions of the whole algorithm, each compiled with different target features, and then pick which version to run. That way each version can perform many SIMD operations without having to re-check which target features to use.
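A minimal sketch of that approach, assuming x86_64 and AVX as the example target and feature: the whole algorithm (here a hypothetical slice `sum`) exists in a portable version and an AVX version, and the CPU check happens once per call of the whole algorithm rather than once per vector operation. The function names are illustrative only; `is_x86_feature_detected!`, `#[target_feature]` and the `std::arch::x86_64` intrinsics are real.

```rust
/// Portable version of the whole algorithm.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// AVX version of the whole algorithm: the entire loop is compiled with AVX
/// enabled, so no feature check happens inside it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn sum_avx(xs: &[f32]) -> f32 {
    use std::arch::x86_64::{_mm256_add_ps, _mm256_loadu_ps, _mm256_setzero_ps, _mm256_storeu_ps};
    let chunks = xs.chunks_exact(8);
    let tail = chunks.remainder(); // the last < 8 elements
    let mut lanes = [0.0f32; 8];
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for c in chunks {
            // Each chunk has exactly 8 elements, so the unaligned load is in bounds.
            acc = _mm256_add_ps(acc, _mm256_loadu_ps(c.as_ptr()));
        }
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
    }
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

/// Dispatcher: checks CPU features once, then runs one version to completion.
pub fn sum(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed at runtime.
            return unsafe { sum_avx(xs) };
        }
    }
    sum_scalar(xs)
}

fn main() {
    let data: Vec<f32> = (1..=100).map(|i| i as f32).collect();
    println!("{:?}", sum(&data)); // 5050.0
}
```

The branch cost is now paid once per slice instead of once per 8 elements. Note that the two versions add in a different order, so for general inputs their floating-point results can differ slightly. Other toolchains offer the same idea; for example, GCC's `target_clones` function attribute generates several versions of a function and selects one when the program is loaded.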