Make 2D parallelization a run time choice #894

Open
rengolin opened this issue Feb 29, 2024 · 0 comments

Comments

@rengolin
Contributor

Currently, we select the 2D parallelization blocking on the command line. The default, {2,8}, is optimal for 16 threads.

In our benchmarks, we pick the best blocking for each number of threads, but the compiler can't do that, because the value of OpenMP's OMP_NUM_THREADS can change at run time.

We need to lower code that can read that environment variable (via the OpenMP dialect) and create the loop blocking dynamically from run time values, so that we only need to generate the code once and it can run on any number of threads.
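
As a rough illustration of what blocking from run time values could look like, here is a minimal C++ sketch. It is not the lowering itself and does not use the OpenMP dialect: it reads `OMP_NUM_THREADS` with `std::getenv` as a stand-in, and all names (`threadsFromEnv`, `chunk`, `blockFor`, `Range`, `Block2D`) are hypothetical. It assumes the work is a 2D space of tiles and that each thread owns one contiguous block of it.

```cpp
#include <algorithm>
#include <climits>
#include <cstdlib>

// Hypothetical helper: read the thread count from OMP_NUM_THREADS.
// Returns `fallback` when the variable is unset or does not start
// with a positive integer.
inline int threadsFromEnv(int fallback = 1) {
  const char *s = std::getenv("OMP_NUM_THREADS");
  if (!s)
    return fallback;
  char *end = nullptr;
  long v = std::strtol(s, &end, 10);
  return (end != s && v > 0 && v <= INT_MAX) ? static_cast<int>(v) : fallback;
}

struct Range {
  int begin, end; // half-open interval [begin, end)
};

// Split [0, n) into `parts` contiguous chunks and return chunk `idx`.
// The first (n % parts) chunks get one extra element.
inline Range chunk(int n, int parts, int idx) {
  int base = n / parts, rem = n % parts;
  int begin = idx * base + std::min(idx, rem);
  int end = begin + base + (idx < rem ? 1 : 0);
  return {begin, end};
}

struct Block2D {
  Range rows, cols;
};

// Map a flat thread id onto a gridM x gridN thread grid (row-major)
// and return the block of tiles that thread owns. The grid factors
// are plain run time values, so the same code works for any grid.
inline Block2D blockFor(int tid, int gridM, int gridN, int tilesM, int tilesN) {
  int tm = tid / gridN, tn = tid % gridN;
  return {chunk(tilesM, gridM, tm), chunk(tilesN, gridN, tn)};
}
```

With the {2,8} grid and a 32x32 tile space, thread 9 lands at grid position (1,1) and owns rows 16 to 32 and columns 4 to 8.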

We also need to know the best factors for each number of threads (cost model, per arch) and have a generated dispatch table so that we can choose them at run time.
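
A dispatch table of that kind could be sketched as below. Only the 16 threads to {2,8} entry comes from this issue; every other entry is a placeholder that a per-arch cost model would have to fill in, and the names (`Grid`, `Entry`, `kTable`, `fallbackGrid`, `lookupGrid`) are hypothetical. The fallback for thread counts missing from the table is an assumption too: it picks the factor pair closest to square, which is not necessarily the best choice.

```cpp
struct Grid {
  int m, n; // 2D parallelization factors, m * n == number of threads
};

struct Entry {
  int threads;
  Grid grid;
};

// Placeholder table. Only 16 -> {2,8} is taken from the issue; the
// other rows are made-up values standing in for cost model output.
static const Entry kTable[] = {
    {4, {2, 2}},
    {8, {2, 4}},
    {16, {2, 8}},
    {32, {4, 8}},
};

// Assumed fallback: the factor pair {m, n} with m <= n and m as large
// as possible, i.e. the pair closest to a square grid.
inline Grid fallbackGrid(int threads) {
  if (threads < 1)
    return {1, 1};
  int m = 1;
  for (int d = 1; d * d <= threads; ++d)
    if (threads % d == 0)
      m = d;
  return {m, threads / m};
}

// Look up the factors for a run time thread count, using the fallback
// when the table has no entry for it.
inline Grid lookupGrid(int threads) {
  for (const Entry &e : kTable)
    if (e.threads == threads)
      return e.grid;
  return fallbackGrid(threads);
}
```

Here `lookupGrid(16)` returns {2,8} from the table, while `lookupGrid(12)` has no entry and falls back to {3,4}.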
