GGP Environment Variables
cpviolator edited this page Aug 23, 2024 · 2 revisions
Below is a list of GGP-specific environment variables. A list of useful CUDA-specific variables is included at the bottom of the page.
Variable name | Function |
---|---|
GGP_RESOURCE_PATH | Path where tune cache and profile files will be written |
GGP_PROFILE_OUTPUT_BASE | Filename prefix for profile output. Setting this results in the files $(GGP_PROFILE_OUTPUT_BASE).tsv and $(GGP_PROFILE_OUTPUT_BASE)_async.tsv being written (default is simply profile.tsv and profile_async.tsv) |
GGP_ENABLE_P2P | GGP_ENABLE_P2P=0 disables all p2p transfers; GGP_ENABLE_P2P=1 enables only copy engines; GGP_ENABLE_P2P=2 enables only remote writing; GGP_ENABLE_P2P=3 enables both copy engines and remote writing. Default is 3 (copy engines and remote writing enabled) |
GGP_ENABLE_P2P_MAX_ACCESS_RANK | Set a limit on which GPUs are peer-to-peer connected (use to disable low-bandwidth connections), e.g., GGP_ENABLE_P2P_MAX_ACCESS_RANK=0 would limit peer-to-peer to only the highest-bandwidth connections |
GGP_ENABLE_TUNING | Enable / disable kernel autotuning. Default is enabled; disable with GGP_ENABLE_TUNING=0 |
GGP_REORDER_LOCATION | Set where data should be reordered when transferring CPU<->GPU (default is GPU) |
GGP_RANK_VERBOSITY | Set which global ranks are active in printfQuda calls (default is rank 0) |
GGP_ENABLE_DEVICE_MEMORY_POOL | Enable / disable the device memory allocator (default is enabled; disable with GGP_ENABLE_DEVICE_MEMORY_POOL=0) |
GGP_ENABLE_PINNED_MEMORY_POOL | Enable / disable the pinned memory allocator (default is enabled; disable with GGP_ENABLE_PINNED_MEMORY_POOL=0) |
GGP_ENABLE_MANAGED_MEMORY | Enable / disable using managed memory for allocations (default is disabled; enable with GGP_ENABLE_MANAGED_MEMORY=1). Note: managed memory has some limitations on pre-Pascal architectures |
GGP_ENABLE_MANAGED_PREFETCH | Enable / disable explicit managed-memory prefetching calls (default is disabled; enable with GGP_ENABLE_MANAGED_PREFETCH=1). Does nothing unless GGP_ENABLE_MANAGED_MEMORY is enabled |
GGP_ENABLE_NUMA | Enable NUMA placement. Default is enabled (if NUMA support was enabled in cmake); disable with GGP_ENABLE_NUMA=0 |
GGP_ENABLE_GDR | Enable GPU-Direct RDMA. Default is disabled; enable with GGP_ENABLE_GDR=1 |
GGP_ENABLE_ZERO_COPY | Enable zero-copy policies (can be beneficial on systems without performant GDR). Default is disabled; enable with GGP_ENABLE_ZERO_COPY=1 |
GGP_ENABLE_NVSHMEM | Enable NVSHMEM communication policies if GGP is built with NVSHMEM support. Default is enabled; set to 0 to disable |
GGP_ENABLE_MPS | Enable support for MPS in GGP. Generally not recommended except for testing purposes. Default is disabled; enable with GGP_ENABLE_MPS=1 |
GGP_DEVICE_RESET | Call cudaDeviceReset in endQuda. This legacy behavior can be useful for profiling, but it destroys the CUDA context of other CUDA libraries used outside of GGP (e.g., GPU-aware MPI). Default is disabled; enable with GGP_DEVICE_RESET=1 |
GGP_DETERMINISTIC_REDUCE | Perform all MPI reductions deterministically: with this flag set, GGP runs completely deterministically regardless of rank order once tuning is complete (or tuning is disabled). Default is disabled; enable with GGP_DETERMINISTIC_REDUCE=1 |
GGP_TUNE_VERSION_CHECK | Set GGP_TUNE_VERSION_CHECK=0 to disable the check that prevents using a tunecache.tsv file from a different GGP version |
GGP_ENABLE_TUNING_SHARED | Disable shared-memory autotuning. Useful for checking the effect of shared-memory tuning |
GGP_TUNING_RANK | Set the global default rank for doing kernel autotuning (default is rank 0) |
GGP_MAX_MULTI_RHS | Set the maximum number of RHS per kernel. Default is 64 with large kernel arguments, and 16 otherwise |
GGP_ENABLE_MONITOR | Set GGP_ENABLE_MONITOR=1 to enable device monitoring during execution. The monitor log is dumped to GGP_RESOURCE_PATH when endQuda is called |
GGP_ENABLE_MONITOR_PERIOD | Set the monitoring period in microseconds (default is 1000 microseconds = 1 millisecond) |
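Since these are ordinary environment variables, they are typically exported in a job script before the application is launched. A minimal sketch follows; the specific values chosen are illustrative only, not recommendations:

```shell
# Illustrative settings only; the defaults are usually sensible.
export GGP_RESOURCE_PATH=./tunecache   # where tune cache and profile files go
export GGP_ENABLE_P2P=3                # copy engines + remote writing (the default)
export GGP_DETERMINISTIC_REDUCE=1      # deterministic MPI reductions post-tuning
export GGP_PROFILE_OUTPUT_BASE=myrun   # writes myrun.tsv and myrun_async.tsv

# Confirm what a subsequently launched application will see:
echo "profile prefix: $GGP_PROFILE_OUTPUT_BASE"
```

Because the variables are exported, they are inherited by any GGP application launched later in the same script.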
CUDA environment variables
Variable name | Function |
---|---|
CUDA_LAUNCH_BLOCKING | If set to 0 (the default behaviour), kernels are launched asynchronously. If set to 1, all kernels are launched synchronously, which ensures that any error message pertains to precisely the last kernel called |
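For a one-off debugging run, the variable can be set for a single command without exporting it into the surrounding shell. In this sketch, `sh -c 'echo ...'` stands in for the real application binary, which is a placeholder here:

```shell
# Enable synchronous kernel launches for a single run only.
# The inner command is a stand-in for the real application binary.
CUDA_LAUNCH_BLOCKING=1 sh -c 'echo "blocking=$CUDA_LAUNCH_BLOCKING"'

# The per-command form does not persist in the surrounding shell:
echo "after: ${CUDA_LAUNCH_BLOCKING:-unset}"
```

This per-command form is useful when only one run in a script needs synchronous launches, e.g., to localize a kernel error.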