-
Notifications
You must be signed in to change notification settings - Fork 98
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
start of some docs describing the GPU flow (#2843)
- Loading branch information
Showing
3 changed files
with
78 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
********************* | ||
GPU Programming Model | ||
********************* | ||
|
||
CPUs and GPUs have separate memory, which means that working on both | ||
the host and device may involve managing the transfer of data between | ||
the memory on the host and that on the GPU. | ||
|
||
In Castro, the core design when running on GPUs is that all of the compute | ||
should be done on the GPU. | ||
|
||
When we compile with ``USE_CUDA=TRUE`` or ``USE_HIP=TRUE``, AMReX will allocate | ||
a pool of memory on the GPUs and all of the ``StateData`` will be stored there. | ||
As long as we then do all of the computation on the GPUs, then we don't need | ||
to manage any of the data movement manually. | ||
|
||
.. note:: | ||
|
||
We can tell AMReX to allocate the data using managed-memory by | ||
setting: | ||
|
||
:: | ||
|
||
amrex.the_arena_is_managed = 1 | ||
|
||
This is generally not needed. | ||
|
||
The programming model used throughout Castro is C++-lambda-capturing | ||
by value. We access the ``FArrayBox`` stored in the ``StateData`` | ||
``MultiFab`` by creating an ``Array4`` object. The ``Array4`` does | ||
not directly store a copy of the data, but instead has a pointer to | ||
the data in the ``FArrayBox``. When we capture the ``Array4`` by | ||
value in the GPU kernel, the GPU gets access to the pointer to the | ||
underlying data. | ||
|
||
|
||
Most AMReX functions will work on the data directly on the GPU (like | ||
``.setVal()``). | ||
|
||
In rare instances where we might need to operate on the data on the | ||
host, we can force a copy to the host, do the work, and then copy | ||
back. For an example, see the reduction done in ``Gravity.cpp``. | ||
|
||
.. note:: | ||
|
||
For a thorough discussion of how the AMReX GPU offloading works | ||
see :cite:`amrex-ecp`. | ||
|
||
|
||
Runtime parameters | ||
------------------ | ||
|
||
The main exception for all data being on the GPUs all the time are the | ||
runtime parameters. At the moment, these are allocated as managed | ||
memory and stored in global memory. This is simply to make it easier | ||
to read them in and initialize them on the CPU at runtime. | ||
|
||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters