Seamless software/hardware computation interoperation #403

Open
TirushOne opened this issue Mar 7, 2024 · 1 comment
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@TirushOne

Hi

I am eager to contribute to this project and this might be something I contribute if it doesn't already exist.

The basic idea is that if I create a SIMD vector of length x containing elements of type y, it should run on SIMD hardware if the CPU has registers of the right number and type; otherwise it should run sequentially, in a for loop over each element of the vector, using ordinary scalar addition, multiplication, etc. This way you can write code once using SIMD, compile it for any platform, and know that it will use SIMD if the CPU supports that kind of SIMD and will otherwise still work, just a bit slower.
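To make the fallback half of this idea concrete, here is a minimal sketch of the "for loop over each element" path. The `Vector` type is hypothetical and invented for illustration; it is not this project's API, and it only shows lane-wise addition.

```rust
use std::ops::Add;

/// Hypothetical fixed-length vector type, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Vector<T, const N: usize>(pub [T; N]);

impl<T, const N: usize> Add for Vector<T, N>
where
    T: Copy + Add<Output = T>,
{
    type Output = Self;

    fn add(self, rhs: Self) -> Self {
        // Scalar fallback: one ordinary addition per lane, in order.
        let mut out = self.0;
        for i in 0..N {
            out[i] = self.0[i] + rhs.0[i];
        }
        Vector(out)
    }
}
```

With this sketch, `Vector([1, 2, 3, 4]) + Vector([10, 20, 30, 40])` adds the lanes one at a time. Whether the compiler turns such a loop into SIMD instructions depends on the target features it is allowed to use.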

However, this has complications. If you could know at compile time which CPU you are compiling for, this would be a trivial matter of conditional compilation flags in the simd module. But that is of course not the case. So my question is: can this be done with reasonable or no runtime performance cost? One option is to query the availability of SIMD registers on the CPU when the program starts and store the result in a global variable. Then, every time a SIMD vector operation is performed, a branch checks that global variable, which indicates the number and type of SIMD registers: if there are enough, you take the SIMD route, and if not, the non-SIMD route. But I can imagine this leaving two questions unanswered: how do you handle multi-CPU systems, and is the performance cost of a branch on every single SIMD operation worth it?
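The runtime option described above could look roughly like the following sketch. It assumes an x86 or x86_64 target and picks AVX2 as an example feature; the names `HAS_AVX2`, `detect_avx2`, `add8`, `add8_avx2`, and `add8_scalar` are hypothetical. The SIMD route here is just the same loop compiled with AVX2 enabled, which the compiler may or may not vectorize.

```rust
use std::sync::OnceLock;

/// Hypothetical global flag, filled in the first time it is read.
static HAS_AVX2: OnceLock<bool> = OnceLock::new();

/// Ask the CPU once whether AVX2 is available (always false off x86).
fn detect_avx2() -> bool {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            return true;
        }
    }
    false
}

/// Non-SIMD route: one scalar addition per element.
#[inline(always)]
fn add8_scalar(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    let mut out = [0.0f32; 8];
    for i in 0..8 {
        out[i] = a[i] + b[i];
    }
    out
}

/// SIMD route: the same loop, compiled with AVX2 enabled.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn add8_avx2(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    add8_scalar(a, b)
}

/// A single vector operation that checks the global flag on every call.
pub fn add8(a: [f32; 8], b: [f32; 8]) -> [f32; 8] {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        // This load and branch are paid again for each operation.
        if *HAS_AVX2.get_or_init(detect_avx2) {
            // SAFETY: AVX2 support was confirmed at runtime.
            return unsafe { add8_avx2(a, b) };
        }
    }
    add8_scalar(a, b)
}
```

Both routes produce the same result, so callers only ever see `add8`. The cost in question is the flag check inside `add8`, which sits in front of a very small amount of real work.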

So how is this kind of issue solved in other programming languages? That seems like the first place to look, anyway.

@TirushOne TirushOne added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Mar 7, 2024
@programmerjake
Member

A load and branch on every SIMD operation is way too expensive.
Instead, what is generally done is to have several versions of the whole algorithm that uses SIMD, each compiled with different target features, and then to pick which version to run. Inside each version, many SIMD operations can run without re-checking which target features to use.
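A minimal sketch of that approach in stable Rust, assuming an x86 or x86_64 target and using AVX2 as the example feature. The function names (`scale_add`, `scale_add_impl`, `scale_add_avx2`, `scale_add_fallback`) are hypothetical, and the algorithm is a stand-in for whatever loop actually needs SIMD.

```rust
/// The whole algorithm, written once. `#[inline(always)]` lets each wrapper
/// below get its own copy, compiled with that wrapper's target features.
#[inline(always)]
fn scale_add_impl(out: &mut [f32], x: &[f32], k: f32) {
    for (o, v) in out.iter_mut().zip(x) {
        *o += k * *v;
    }
}

/// Version compiled with AVX2 enabled; only safe to call on CPUs that have it.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx2")]
unsafe fn scale_add_avx2(out: &mut [f32], x: &[f32], k: f32) {
    scale_add_impl(out, x, k)
}

/// Version compiled with only the baseline target features.
fn scale_add_fallback(out: &mut [f32], x: &[f32], k: f32) {
    scale_add_impl(out, x, k)
}

/// Picks a version once per call; the loop inside never re-checks.
pub fn scale_add(out: &mut [f32], x: &[f32], k: f32) {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed at runtime just above.
            return unsafe { scale_add_avx2(out, x, k) };
        }
    }
    scale_add_fallback(out, x, k)
}
```

The feature check runs once per call to `scale_add`, however long the slices are, so its cost is spread over the whole loop instead of being paid per vector operation.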
