
MPICH + CUDA #10

Open
mkstoyanov opened this issue Mar 13, 2023 · 4 comments

@mkstoyanov
Collaborator

Some tests, e.g., long long, fail when using MPICH and CUDA-aware MPI.

mkstoyanov self-assigned this Mar 13, 2023
@mkstoyanov
Collaborator Author

Some issues were resolved in #11, but alltoall (no-v) still fails when using empty boxes.

The test is disabled, since it is a fringe use case (a subcommunicator implies few ranks, so point-to-point should work better).

Tests should now pass under MPICH + CUDA-aware MPI, but the alltoall behavior needs further investigation.
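
For context, the point-to-point alternative mentioned above could look roughly like the sketch below (hypothetical names, not the library's actual code): ranks whose boxes are empty are simply skipped instead of being padded.

```cpp
// Sketch: exchange boxes over a small sub-communicator with point-to-point
// messages, skipping ranks whose boxes are empty (counts == 0).
#include <mpi.h>
#include <vector>

void exchange_p2p(MPI_Comm subcomm,
                  const std::vector<double>& send, const std::vector<int>& send_counts,
                  const std::vector<int>& send_offsets,
                  std::vector<double>& recv, const std::vector<int>& recv_counts,
                  const std::vector<int>& recv_offsets) {
    int nranks;
    MPI_Comm_size(subcomm, &nranks);
    std::vector<MPI_Request> requests;
    requests.reserve(2 * static_cast<size_t>(nranks));
    for (int r = 0; r < nranks; r++)
        if (recv_counts[r] > 0) {  // post receives only for non-empty boxes
            requests.emplace_back();
            MPI_Irecv(recv.data() + recv_offsets[r], recv_counts[r], MPI_DOUBLE,
                      r, 0, subcomm, &requests.back());
        }
    for (int r = 0; r < nranks; r++)
        if (send_counts[r] > 0) {  // empty boxes send nothing at all
            requests.emplace_back();
            MPI_Isend(send.data() + send_offsets[r], send_counts[r], MPI_DOUBLE,
                      r, 0, subcomm, &requests.back());
        }
    MPI_Waitall(static_cast<int>(requests.size()), requests.data(), MPI_STATUSES_IGNORE);
}
```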

@ax3l
Contributor

ax3l commented Aug 21, 2024

Thanks for testing this. We (WarpX & ImpactX) use GPU-aware MPI heavily on DOE Exascale machines, which are currently all HPE/Cray and thus MPICH. With the current releases, is there anything we should look out for?

We do R2C forward and C2R backward FFTs in 1D to 3D.

@mkstoyanov
Collaborator Author

This should not affect you. The problem happens when we use alltoall (no-v), which means we pad the MPI messages to the same size. There appears to be an MPI-specific issue when the boxes are empty and we only send padding (i.e., we push around fake data). I doubt it will affect you, and it may be a non-issue on newer installations of MPICH; we found this in the version installed from apt on Ubuntu 22.04.
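
To illustrate the pattern (a sketch, not the library's actual code): with alltoall (no-v) every pair of ranks exchanges exactly the same padded count, so a rank whose box for some destination is empty still ships a full chunk of placeholder values that the receiver ignores.

```cpp
// Sketch of the padded alltoall: every pair of ranks exchanges exactly
// pad_count doubles; an empty box (boxes[r].empty()) still contributes
// pad_count placeholder values.
#include <mpi.h>
#include <algorithm>
#include <vector>

void exchange_padded(MPI_Comm comm, const std::vector<std::vector<double>>& boxes,
                     int pad_count, std::vector<double>& recv) {
    int nranks;
    MPI_Comm_size(comm, &nranks);
    std::vector<double> send(static_cast<size_t>(nranks) * pad_count, 0.0);
    for (int r = 0; r < nranks; r++)  // real data (possibly none) goes at the front of each slot
        std::copy(boxes[r].begin(), boxes[r].end(),
                  send.begin() + static_cast<std::ptrdiff_t>(r) * pad_count);
    recv.resize(static_cast<size_t>(nranks) * pad_count);
    MPI_Alltoall(send.data(), pad_count, MPI_DOUBLE,
                 recv.data(), pad_count, MPI_DOUBLE, comm);
}
```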

Other than that, check the Cray documentation about GPU-aware MPI. ROCm machines require special environment variables and compiler flags to enable it, sometimes at both compile time and runtime.
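
For example (an assumption about the typical Cray MPICH setup, not something specific to this issue), the runtime side is usually controlled by the MPICH_GPU_SUPPORT_ENABLED environment variable, which an application can sanity-check before handing GPU buffers to MPI:

```cpp
// Sketch: warn at startup if Cray MPICH's GPU support was not requested at
// runtime (the compile-time side is typically handled by the craype-accel-*
// modules and linking the GTL library).
#include <mpi.h>
#include <cstdio>
#include <cstdlib>
#include <cstring>

static bool gpu_aware_mpi_requested() {
    const char* flag = std::getenv("MPICH_GPU_SUPPORT_ENABLED");
    return flag != nullptr && std::strcmp(flag, "1") == 0;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0 && !gpu_aware_mpi_requested())
        std::fprintf(stderr, "warning: MPICH_GPU_SUPPORT_ENABLED=1 is not set; "
                             "passing GPU buffers to MPI may crash\n");
    // ... set up and run the FFTs ...
    MPI_Finalize();
    return 0;
}
```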

@ax3l
Contributor

ax3l commented Aug 23, 2024

Thank you for the summary!

> Other than that, check the Cray documentation about GPU-aware MPI. ROCm machines require special environment variables and compiler flags to enable it, sometimes at both compile time and runtime.

Yes, that's correct. For Cray/HPE machines, we control/request it at compile time so we can activate it at runtime.
