Make 2D parallelization a run time choice #894

rengolin · 2024-02-29T23:59:44Z

Currently, we select the blocking factors on the command line, with a default of {2,8}, which is optimal for 16 threads.

In our benchmarks, we pick the best factors for each number of threads, but the compiler can't do that at compile time, as OpenMP's OMP_NUM_THREADS changes at run time.

We need to lower code that reads that environment variable (via the OpenMP dialect) and creates a dynamic loop blocking based on run-time values, so that we only need to generate the code once and it can run on any number of threads.
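To make the idea concrete, here is a minimal Python sketch of the logic the generated code would need to perform, not the actual lowering. All function names (`runtime_num_threads`, `block_bounds`, `tiles_2d`) are hypothetical, and the fallback to the CPU count when the variable is unset is an assumption. The blocking grid is passed in as a parameter; how to pick it is a separate question.

```python
import os


def runtime_num_threads(env=None):
    """Thread count from OMP_NUM_THREADS, falling back to the CPU count (assumed default)."""
    env = os.environ if env is None else env
    raw = env.get("OMP_NUM_THREADS", "")
    # OMP_NUM_THREADS may be a comma-separated list (one value per nesting
    # level); only the outermost level is used here.
    first = raw.split(",")[0].strip()
    try:
        value = int(first)
    except ValueError:
        value = 0
    return value if value > 0 else (os.cpu_count() or 1)


def block_bounds(extent, parts, index):
    """Half-open [lo, hi) range of block `index` when `extent` iterations
    are split into `parts` near-equal contiguous blocks."""
    base, rem = divmod(extent, parts)
    lo = index * base + min(index, rem)
    hi = lo + base + (1 if index < rem else 0)
    return lo, hi


def tiles_2d(rows, cols, grid):
    """Yield (thread_id, row_range, col_range) for grid = (grid_rows, grid_cols)."""
    grid_rows, grid_cols = grid
    for tid in range(grid_rows * grid_cols):
        yield (
            tid,
            block_bounds(rows, grid_rows, tid // grid_cols),
            block_bounds(cols, grid_cols, tid % grid_cols),
        )


if __name__ == "__main__":
    n = runtime_num_threads()
    # Placeholder grid: a 1 x n split, used only to exercise the sketch.
    for tid, row_range, col_range in tiles_2d(64, 64, (1, n)):
        print(tid, row_range, col_range)
```

Because the tile bounds are computed from values that are only known when the program starts, the same code covers {2,8} on 16 threads and any other grid without regenerating anything.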

We also need to know the best factors for each number of threads (cost model, per arch) and have a generated dispatch table so that we can choose them at run time.
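A dispatch table of this kind could look roughly like the Python sketch below. Only the 16-thread entry, {2,8}, comes from this issue; the names `BLOCKING_TABLE`, `balanced_factors` and `select_blocking` are hypothetical, and the fallback (the most balanced factor pair) is an assumption standing in for a real per-arch cost model.

```python
import math

# Hypothetical dispatch table: thread count -> (outer, inner) blocking factors.
# Only the 16-thread entry is taken from the issue; a real table would be
# generated from a per-architecture cost model.
BLOCKING_TABLE = {16: (2, 8)}


def balanced_factors(n):
    """Fallback (assumption, not a cost model): the factor pair (a, b) with
    a * b == n and a <= b, where a is as large as possible."""
    a = math.isqrt(n)
    while a > 1 and n % a != 0:
        a -= 1
    return a, n // a


def select_blocking(num_threads, table=BLOCKING_TABLE):
    """Look up the blocking factors for a thread count at run time."""
    if num_threads < 1:
        raise ValueError("num_threads must be positive")
    factors = table.get(num_threads)
    return factors if factors is not None else balanced_factors(num_threads)


if __name__ == "__main__":
    for threads in (7, 12, 16):
        print(threads, select_blocking(threads))
```

Note that the fallback alone would give {4,4} for 16 threads rather than the {2,8} reported as optimal above, which is the reason for having a measured table instead of a purely arithmetic rule.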
