GGP Build With CMake
Starting with version 0.9, GGP must be built using CMake. To build GGP with CMake you need CMake 3.15 or later for GGP version 1.1, and CMake 3.18 or later for the current develop branch. Run
cmake --version
to check that CMake is installed and that your version is recent enough. If CMake is not available on your system, follow the instructions below; otherwise, skip to the Building GGP using CMake section.
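You can also script this check. The sketch below compares the version reported by cmake --version against a minimum. It assumes a sort that supports -V, which GNU coreutils and recent BSD sort provide. Both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on `sort -V` (version sort) from GNU coreutils / recent BSD sort.
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: extract the version number from `cmake --version`,
# whose first line looks like "cmake version 3.18.4".
installed_cmake_version() {
  cmake --version | head -n 1 | awk '{print $3}'
}

# Usage (3.18 is the minimum for the current develop branch):
#   if version_ge "$(installed_cmake_version)" 3.18; then echo "CMake OK"; fi
```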
For multi-GPU builds with OpenMPI, we recommend using at least version 4.0.x, and compiling OpenMPI with a recent version of UCX (at least 1.10) for CUDA-aware MPI support.
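The Open MPI documentation suggests querying ompi_info to check whether an installation is CUDA-aware. The sketch below wraps that query. It assumes the relevant parsable output line ends in :true or :false. The function name is hypothetical, and the helper does not check the UCX version.

```shell
#!/bin/sh
# Hypothetical helper: read `ompi_info --parsable --all` output on stdin and
# print "true" or "false" depending on whether Open MPI reports CUDA support.
cuda_aware_mpi() {
  line=$(grep 'mpi_built_with_cuda_support:value' | head -n 1)
  case "$line" in
    *:true) echo true ;;
    *) echo false ;;   # missing line or ":false": treat as not CUDA-aware
  esac
}

# Usage on a machine with Open MPI installed:
#   ompi_info --parsable --all | cuda_aware_mpi
```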
You will most likely build GGP on a remote machine that uses a module system. Run
module avail cmake
to see whether a CMake module is available. If there is no CMake module, ask the system administrator to add one. In the meantime, you can download the CMake source code from the CMake website and build it yourself. Once CMake is built and installed, prepend its bin directory to your PATH so that your environment picks up the new binaries.
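Here is a minimal sketch of the PATH step. It assumes CMake was installed under a hypothetical prefix such as $HOME/opt/cmake. The helper name is also hypothetical, and it avoids adding the same directory twice.

```shell
#!/bin/sh
# Hypothetical helper: prepend a directory to PATH unless it is already listed.
prepend_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, nothing to do
    *) PATH="$1:$PATH" ;;
  esac
  export PATH
}

# Usage, assuming the hypothetical install prefix $HOME/opt/cmake:
#   prepend_path "$HOME/opt/cmake/bin"
#   cmake --version
```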
We recommend building GGP in a separate directory (an out-of-source build). This has the advantage that you don't need different copies of the GGP source code on your disk to build separate configurations (e.g. for different GPU architectures), nor do you need to trigger a full rebuild in your local GGP copy to build it for a different architecture. For example, suppose you have a machine with two GPU partitions, one with NVIDIA P100 and the other with NVIDIA V100. You can download one copy of the GGP source code (typically named ggp) and then have two build directories (say, build_p100 and build_v100). When the source code is updated or modified, you only need to change it once and then rebuild each configuration as required.
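Here is a minimal sketch of that two-directory layout. It assumes the source checkout is named ggp and that GGP_GPU_ARCH, described further down, selects the architecture. P100 is sm_60 and V100 is sm_70. The cmake -S/-B options need CMake 3.13 or later, which the required versions satisfy. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: configure one out-of-source build directory per GPU
# architecture from a single source checkout. CMAKE can be overridden
# (e.g. CMAKE=echo) to print the arguments instead of running cmake.
configure_per_arch() {
  src=$1
  shift
  for spec in "$@"; do
    dir=${spec%%:*}    # text before the colon: build directory
    arch=${spec#*:}    # text after the colon: GPU architecture
    ${CMAKE:-cmake} -S "$src" -B "$dir" -DGGP_GPU_ARCH="$arch"
  done
}

# Usage for the P100/V100 example:
#   configure_per_arch ggp build_p100:sm_60 build_v100:sm_70
```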
After downloading GGP from GitHub, create a build directory and cd into it (the name is arbitrary; here we use build):
mkdir build
cd build
There are two ways to configure the build. The first is to use ccmake:
ccmake ../generic-GPU-project
Note: for this to work, you may first need to run
cmake [-DGGP_TARGET_TYPE=<TARGET>] ../generic-GPU-project
and then launch ccmake.
This will bring up a text-based GUI for all the GGP CMake options. If you take this route, note that pressing the t key in the GUI reveals extra CMAKE options. This can seem a little daunting at first, but most of the options you see there are populated automatically. Options are grouped into two main parts, CMAKE options (revealed by pressing t) and GGP options, each prefixed accordingly. CMAKE options are mostly about HOW to build GGP, while GGP options are mostly about WHAT parts to build.
The CMAKE options CMAKE_CUDA_HOST_COMPILER, CMAKE_CXX_COMPILER and CMAKE_C_COMPILER determine which host C++ and C compilers are used. If you want to use a specific compiler, you must set these manually.
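Below is a sketch of setting these on the command line instead of in the GUI. It assumes you want nvcc's host compiler to be the same C++ compiler, which keeps host code built with one toolchain. The compiler names g++-11/gcc-11 and the function name are hypothetical. These variables are best set when a fresh build directory is first configured.

```shell
#!/bin/sh
# Hypothetical helper: print the -D flags that pin the host compilers,
# one per line. The C++ compiler doubles as nvcc's host compiler.
compiler_flags() {
  cxx=$1
  cc=$2
  printf '%s\n' "-DCMAKE_CXX_COMPILER=$cxx" "-DCMAKE_C_COMPILER=$cc" \
    "-DCMAKE_CUDA_HOST_COMPILER=$cxx"
}

# Usage (unquoted substitution splits the lines into separate arguments):
#   cmake $(compiler_flags g++-11 gcc-11) ../generic-GPU-project
```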
After changing the options to your preferences, press c to configure. This makes CMake look for further tools and libraries (for example, it locates MPI if you build with MPI), so new variables may appear and you may need to configure several times. As soon as the "Press [g] to generate and exit" option is shown at the bottom of the screen, you can press g and CMake will generate your build configuration.
If you prefer not to use the text GUI, you can configure GGP directly with cmake. For example:
cmake ../generic-GPU-project -DGGP_MPI=ON
cmake .
This will configure GGP with the default options, except that GGP_MPI will be turned ON. Make sure you set the correct architecture for your GPUs in the first configuration step. The default architecture is sm_70 (Volta, e.g. V100), but you may want to specify a different architecture, such as -DGGP_GPU_ARCH=sm_60 for a Pascal GPU or -DGGP_GPU_ARCH=sm_80 for an A100. The second cmake . command (with no other arguments) is often required to ensure that the configuration is complete; without it, some configuration may be missing (this is equivalent to ccmake requiring multiple configuration passes).
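The two-pass configure can be wrapped in a small script. The sketch below is meant to be run from inside the build directory. It adds the architecture flag to the example command. The function name is hypothetical, and CMAKE=echo gives a dry run.

```shell
#!/bin/sh
# Hypothetical helper: non-interactive two-pass configure with MPI enabled.
# The first pass sets the options; the bare `cmake .` completes configuration.
# CMAKE may be overridden (e.g. CMAKE=echo) to print arguments only.
configure_ggp() {
  src=$1
  arch=${2:-sm_70}   # sm_70 is GGP's documented default architecture
  ${CMAKE:-cmake} "$src" -DGGP_MPI=ON -DGGP_GPU_ARCH="$arch" || return 1
  ${CMAKE:-cmake} .
}

# Usage for an A100 machine, from inside the build directory:
#   configure_ggp ../generic-GPU-project sm_80
```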
In either case, once GGP has been configured, you can build with
make -j N
where N is the number of available CPU cores. Alternatively, run just make -j, which places no limit on the number of parallel jobs and thus oversubscribes the CPU cores. This latter approach typically gives the shortest compile time.
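If you script your builds, N can be filled in automatically. The sketch below tries nproc from GNU coreutils first, then the widely available getconf _NPROCESSORS_ONLN, and falls back to 1. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: number of online CPU cores, for `make -j N`.
ncores() {
  n=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null)
  case "$n" in
    ''|*[!0-9]*) echo 1 ;;   # not a number: fall back to a serial build
    *) echo "$n" ;;
  esac
}

# Usage:
#   make -j "$(ncores)"
```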
The following advanced options can be passed directly to cmake with -D, or set in ccmake under the advanced options:
- GGP_PRECISION=n - where n is a 4-bit number that specifies which precisions are enabled (8 - double, 4 - single, 2 - half, 1 - quarter). The default value is 14, which corresponds to double, single and half enabled and quarter disabled.
- GGP_FAST_COMPILE_REDUCE=ON ** - compiles the reduction kernels only with block size 32, dramatically reducing the compile time of the reduction kernels (reduce_ggp.cu, multi_reduce_ggp.cu, etc.). Additionally, the multi-BLAS kernels will not use the warp-shuffle (shfl) optimization. Because this affects performance, it should only be used for fast debugging or development builds, which is why the default value is OFF.
- GGP_MAX_MULTI_BLAS_N=1 - disables some kernel fusion optimizations for BLAS routines.
** This option is only available after GGP 1.0.
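The GGP_PRECISION value is the sum, or bitwise OR, of the bits listed for each enabled precision. For example, double + single + half = 8 + 4 + 2 = 14. The sketch below computes the mask from precision names. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build the GGP_PRECISION bitmask from precision names
# (8 double, 4 single, 2 half, 1 quarter).
precision_mask() {
  mask=0
  for p in "$@"; do
    case "$p" in
      double)  mask=$((mask | 8)) ;;
      single)  mask=$((mask | 4)) ;;
      half)    mask=$((mask | 2)) ;;
      quarter) mask=$((mask | 1)) ;;
      *) echo "unknown precision: $p" >&2; return 1 ;;
    esac
  done
  echo "$mask"
}

# Usage: a double/single-only build (mask 12).
#   cmake -DGGP_PRECISION="$(precision_mask double single)" ../generic-GPU-project
```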
By default, GGP builds as a shared library and uses rpath so that, in most cases, you do not need to set LD_LIBRARY_PATH. If, for some reason, you would prefer a static library build, you can set GGP_BUILD_SHAREDLIB=OFF. We do not recommend this, because it greatly increases link time and binary size.
You can use Ninja instead of make to improve parallel builds by specifying it as the CMake generator in the initial cmake run:
cmake -GNinja ...
and then build using
ninja
or just use
cmake --build .
The overall build time can be reduced further by using an alternative linker such as LLVM's lld or mold. To use mold, you can simply run
mold -run ninja
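Here is a sketch that combines the Ninja generator with the optional mold wrapper. It should be run from inside the build directory. The function name and the yes/no argument are hypothetical. CMAKE=echo gives a dry run of the non-mold path.

```shell
#!/bin/sh
# Hypothetical helper: configure with Ninja and build, optionally via mold.
# CMAKE may be overridden (e.g. CMAKE=echo) to print arguments only.
build_with_ninja() {
  src=$1
  use_mold=${2:-no}
  ${CMAKE:-cmake} -GNinja "$src" || return 1
  if [ "$use_mold" = yes ]; then
    mold -run ninja              # mold intercepts the linker calls made by ninja
  else
    ${CMAKE:-cmake} --build .    # generator-agnostic build command
  fi
}

# Usage:
#   build_with_ninja ../generic-GPU-project yes
```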
While GGP can be built with clang as the compiler, this support is still considered experimental: it might not work for all possible options, and performance may not be as expected.
The development version of GGP now supports building GGP with clang as the CUDA compiler. This requires:
- CMake >= 3.18
- Clang >= 10 and a compatible CUDA toolkit (see https://www.llvm.org/docs/CompileCudaWithLLVM.html for details)
To use clang as the CUDA compiler, pass the following options to the initial cmake call:
-DCMAKE_CUDA_COMPILER=clang++ -DCMAKE_CXX_COMPILER=clang++
You might need to specify the full path to clang++ and append a version number. If you need to use a specific CUDA toolkit, or it is installed in an uncommon location, you can specify it with
-DCUDAToolkit_ROOT=/some/path
Note: CUDA Toolkit detection is handled by CMake's FindCUDAToolkit module; its documentation has more details on how the CUDA Toolkit is located.
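The clang options can be combined into one initial configure call. In the sketch below, the CLANGXX environment variable, for example CLANGXX=clang++-14 for a versioned binary, and the function name are hypothetical. The optional second argument is passed as CUDAToolkit_ROOT. CMAKE=echo gives a dry run.

```shell
#!/bin/sh
# Hypothetical helper: initial configure with clang++ as both the C++ and the
# CUDA compiler. Optional second argument: CUDA toolkit root directory.
# CMAKE may be overridden (e.g. CMAKE=echo) to print arguments only.
configure_clang_cuda() {
  src=$1
  cuda_root=$2
  clang=${CLANGXX:-clang++}   # hypothetical override for a versioned clang++
  if [ -n "$cuda_root" ]; then
    ${CMAKE:-cmake} -DCMAKE_CUDA_COMPILER="$clang" -DCMAKE_CXX_COMPILER="$clang" \
      -DCUDAToolkit_ROOT="$cuda_root" "$src"
  else
    ${CMAKE:-cmake} -DCMAKE_CUDA_COMPILER="$clang" -DCMAKE_CXX_COMPILER="$clang" "$src"
  fi
}

# Usage with a toolkit in a non-standard location:
#   CLANGXX=clang++-14 configure_clang_cuda ../generic-GPU-project /some/path
```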