-
Notifications
You must be signed in to change notification settings - Fork 6
ETMC-QUDA/quda-QKXTM-Multigrid-PlugIn
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
ETMC-QUDA production code base 07th July 2017 ------------------------------------------------------------------------ This library is an extension to QUDA feature/multigrid This plug-in contains routines to calculate 2- and 3-point hadronic correlation functions, and an All-to-All propagator with a gauge covariant derivative, with a variety of correlation functions. This particular library is a multigrid enabled version of ETMC's deflated solver code base, currently hosted here: https://github.com/ckallidonis/quda/tree/eig-solver-fullOp-0.8.0 Dependencies: In order to compile this code, one must also have the following installed: REQUIRED: hdf5 lime gsl OPTIONAL: Arpack The QKXTM boolean is ON by default and will expect the REQUIRED for dependencies for successful compilation. The OPTIONAL Arpack dependency is needed only for those routines involving deflation. Compilation: One must have the CONTRACT boolean ON in order to ensure that the covariant derivative functions in QUDA are compiled. The MULTIGRID boolean must be switched ON. One must pass to the plug-in the mainline QUDA source code directory and it will recompile all the kernels as specified by your plug-in CMake configuration. This lightweight version of quda-QKXTM-Multigrid is volatile in that succesful compilatin and execution of the code is depenedent on the stability of the QUDA mainline source code against which you compile. One can use Intel compilers by overriding CMake's choice of the `CUDA_HOST_COMPILER` variable. Simply set this explicily to g++ after you have properly configured via CMake. For safe results, please use quda-QKXTM-Multigrid which has been thoroughly tested. For more recent QUDA builds, use this plug-in with caution. LATEST SUCCESSFUL COMPILATION: 07 July 2017 If you have compilation issues, please @cpviolator Overview: The QKXTM (Quda Kepler and Twisted Mass) calculation routines are appended to the interface_quda.cpp file of mainline QUDA, and renamed to qudaQKXTM_interface.cpp. This is so we can use the internal gauge field structures directly, though this should be resolved so that QKXTM files can be compiled separately. The class definitions and header files of auxiliary functions are contained in files prepended like so, qudaQKXTM_<some file>.suffix There is a single .cu file for kernels, a 'code_pieces' directory in lib/, a single qudaQKXTM_utils.cpp and .h file for containers and general functions, and a suite of qudaQKXTM_<CLASS>.cpp files under a single header qudaQKXTM.h. For clarity, we also have a file, qudaQKXTM_Loops.cpp that contains calculation and data write functions specific to All-to-All calculatons. These qudaQKXTM_<CLASS>.cpp (and Loops) files are NOT individually compiled, rather included in qudaQKXTM_interface.cpp as, ... [end interface_quda.cpp code] #include <qudaQKXTM_Field.cpp ... #include <qudaQKXTM_Deflation.cpp ... [begin QKXTM calculation functions] The Loop routines need only a single flavour operator. The 2pt and 3pt functions require both UP and DOWN flavour types. QUDA does not yey support multiple MG preconditioners, but this can be fixed with three minor additions: First, one must add to quda/include/quda.h two extra preconditioner pointers: void *preconditioner; -> void *preconditioner; void *preconditionerUP; void *preconditionerDN; Then, adjust the quda/lib/check_params.h file: P(preconditioner, 0); -> P(preconditioner, 0); P(preconditionerUP, 0); P(preconditionerDN, 0); and finally the quda/lib/quda_fortran.F90 file: integer(8) :: preconditioner -> integer(8) :: preconditioner integer(8) :: preconditionerUP integer(8) :: preconditionerDN This will allow for sucessful compilation of the 2 flavour routines, set by the CMake Flag QKXTM_2FLAVMG. In order to QUDA's tunecache, you must (at present) edit out line 264 of QUDA's tune.cpp file https://github.com/lattice/quda/blob/feature/multigrid/lib/tune.cpp#L264 This is because QUDA has strict checks on versions, and this library has a diffrent git hash to QUDA. Known Issues: One must pass gauge fields to the QKXTM executables in IDLG lime format. It expects a separate smeared gauge field for source construction in the hadronic 2pt3pt functions. QUDA has gauge smearing functionality, which should be utilised. It would be desirable to completely separate out the QKXTM files from QUDA so that this plug-in can be installed alongside any suitable compiled copy of QUDA, without having to recompile all the .o files. This would require some CMake code to identify which .o files from mainline QUDA may be reused, and which must be recompiled. Authors: The majority of the QKXTM code base was authored by, Kyriakos Hadjiyiannakou Christos Kallidonis The multigrid upgrades, plug-in functionality, and QUDA 0.9.0 compatibility were done by Dean Howarth Bug reports, suggestions, comments, and queries are welcome. QUDA's mainline README follows. Release Notes for QUDA v0.8.0 1st February 2016 ----------------------------- Overview: QUDA is a library for performing calculations in lattice QCD on graphics processing units (GPUs), leveraging NVIDIA's CUDA platform. The current release includes optimized Dirac operators and solvers for the following fermion actions: * Wilson * Clover-improved Wilson * Twisted mass (including non-degenerate pairs) * Twisted mass with a clover term * Staggered fermions * Improved staggered (asqtad or HISQ) * Domain wall (4-d or 5-d preconditioned) * Mobius fermion Implementations of CG, multi-shift CG, BiCGstab, and DD-preconditioned GCR are provided, including robust mixed-precision variants supporting combinations of double, single, and half (16-bit "block floating point") precision. The library also includes auxilliary routines necessary for Hybrid Monte Carlo, such as HISQ link fattening, force terms and clover-field construction. Use of many GPUs in parallel is supported throughout, with communication handled by QMP or MPI. Software Compatibility: The library has been tested under Linux (CentOS 5.8 and Ubuntu 14.04) using releases 6.5, 7.0 and 7.5 of the CUDA toolkit. CUDA 6.0 and earlier are not supported (though they may continue to work fine). The library also works on recent 64-bit Intel-based Macs. Due to issues with compilation using LLVM, under Mac OS X 10.9.x it is required to install and use GCC instead of the default clang compiler, though this is unnecesary with Mac OS X 10.10.x. See also "Known Issues" below. Hardware Compatibility: For a list of supported devices, see http://developer.nvidia.com/cuda-gpus Before building the library, you should determine the "compute capability" of your card, either from NVIDIA's documentation or by running the deviceQuery example in the CUDA SDK, and pass the appropriate value to QUDA's configure script. For example, the Tesla C2075 is listed on the above website as having compute capability 2.0, and so to configure the library for this card, you'd run "configure --enable-gpu-arch=sm_20 [other options]" before typing "make". As of QUDA 0.8.0, only devices of compute capability 2.0 or greater are supported. See also "Known Issues" below. Installation: The recommended method for compiling QUDA is to use cmake, and build QUDA in a separate directory from the source directory. For instructions on how to build QUDA using cmake see this page https://github.com/lattice/quda/wiki/Building-QUDA-with-cmake. Alternatively, QUDA can also be built using "configure" and "make", though this build approach is considered deprecated and will be removed in a subsequent QUDA release. See "./configure --help" for a list of configure options. At a minimum, you'll probably want to set the GPU architecture; see "Hardware Compatibility" above. Enabling multi-GPU support requires passing the --enable-multi-gpu flag to configure, as well as --with-mpi=<PATH> and optionally --with-qmp=<PATH>. If the latter is given, QUDA will use QMP for communications; otherwise, MPI will be called directly. By default, it is assumed that the MPI compiler wrappers are <MPI_PATH>/bin/mpicc and <MPI_PATH>/bin/mpicxx for C and C++, respectively. These choices may be overriden by setting the CC and CXX variables on the command line as follows: ./configure --enable-multi-gpu --with-mpi=<MPI_PATH> \ [--with-qmp=<QMP_PATH>] [OTHER_OPTIONS] CC=my_mpicc CXX=my_mpicxx Finally, with some MPI implementations, executables compiled against MPI will not run without "mpirun". This has the side effect of causing the configure script to believe that the compiler is failing to produce a valid executable. To skip these checks, one can trick configure into thinking that it's cross-compiling by setting the --build=none and --host=<HOST> flags. For the latter, "--host=x86_64-linux-gnu" should work on a 64-bit linux system. By default only the QDP and MILC interfaces are enabled. For interfacing support with QDPJIT, BQCD or CPS; this should be enabled at configure time with the appropriate flag, e.g., --enable-bqcd-interface. To keep compilation time to a minimum it is recommended to only enable those interfaces that are used by a given application. The QDP and MILC interfaces can be disabled with the, e.g., --disable-milc-interface flag. The eigen-vector solvers (eigCG and incremental eigCG) require the installation of the MAGMA dense linear algebra package. It is recommended that MAGMA 1.7.x is used, though versions are 1.5.x and 1.6.x should work. MAGMA is available from http://icl.cs.utk.edu/magma/index.html. MAGMA is enabled using the configure option --with-magma=MAGMA_PATH. If Fortran interface support is desired, the F90 environment variable should be set when configure is invoked, and "make fortran" must be run explicitly, since the Fortran interface modules are not built by default. As examples, the scripts "configure.milc.titan" and "configure.chroma.titan" are provided. These configure QUDA for expected use with MILC and Chroma, respectively, on Titan (the Tesla K20X-powered Cray XK7 supercomputer at the Oak Ridge Leadership Computing Facility). Throughout the library, auto-tuning is used to select optimal launch parameters for most performance-critical kernels. This tuning process takes some time and will generally slow things down the first time a given kernel is called during a run. To avoid this one-time overhead in subsequent runs (using the same action, solver, lattice volume, etc.), the optimal parameters are cached to disk. For this to work, the QUDA_RESOURCE_PATH environment variable must be set, pointing to a writeable directory. Note that since the tuned parameters are hardware-specific, this "resource directory" should not be shared between jobs running on different systems (e.g., two clusters with different GPUs installed). Attempting to use parameters tuned for one card on a different card may lead to unexpected errors. This autotuning information can also be used to build up a first-order kernel profile: since the autotuner measures how long a kernel takes to run, if we simply keep track of the number of kernel calls, from the product of these two quantities we have a time profile of a given job run. If QUDA_RESOURCE_PATH is set, then this profiling information is output to the file "profile.tsv" in this specified directory. Optionally, the output filename can be specified using the QUDA_PROFILE_OUTPUT environment variable, to avoid overwriting previously generated profile outputs. Using the Library: Include the header file include/quda.h in your application, link against lib/libquda.a, and study tests/invert_test.cpp (for Wilson, clover, twisted-mass, or domain wall fermions) or tests/staggered_invert_test.cpp (for asqtad/HISQ fermions) for examples of the solver interface. The various solver options are enumerated in include/enum_quda.h. Known Issues: * For compatibility with CUDA, on 32-bit platforms the library is compiled with the GCC option -malign-double. This differs from the GCC default and may affect the alignment of various structures, notably those of type QudaGaugeParam and QudaInvertParam, defined in quda.h. Therefore, any code to be linked against QUDA should also be compiled with this option. * When the auto-tuner is active in a multi-GPU run it may cause issues with binary reproducibility of this run. This is caused by the possibility of different launch configurations being used on different GPUs in the tuning run. If binary reproducibility is strictly required make sure that a run with active tuning has completed. This will ensure that the same launch configurations for a given Kernel is used on all GPUs and binary reproducibility. Getting Help: Please visit http://lattice.github.com/quda for contact information. Bug reports are especially welcome. Acknowledging QUDA: If you find this software useful in your work, please cite: M. A. Clark, R. Babich, K. Barros, R. Brower, and C. Rebbi, "Solving Lattice QCD systems of equations using mixed precision solvers on GPUs," Comput. Phys. Commun. 181, 1517 (2010) [arXiv:0911.3191 [hep-lat]]. When taking advantage of multi-GPU support, please also cite: R. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower, and S. Gottlieb, "Scaling lattice QCD beyond 100 GPUs," International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2011 [arXiv:1109.2935 [hep-lat]]. Several other papers that might be of interest are listed at http://lattice.github.com/quda . Authors: Ronald Babich (NVIDIA) Kipton Barros (Los Alamos National Laboratory) Richard Brower (Boston University) Nuno Cardoso (NCSA) Mike Clark (NVIDIA) Justin Foley (University of Utah) Joel Giedt (Rensselaer Polytechnic Institute) Steven Gottlieb (Indiana University) Dean Howarth (Rensselaer Polytechnic Institute) Balint Joo (Jefferson Laboratory) Hyung-Jin Kim (Samsung Advanced Institute of Technology) Claudio Rebbi (Boston University) Guochun Shi (NCSA) Alexei Strelchenko (Fermi National Accelerator Laboratory) Alejandro Vaquero (INFN Sezione Milano Bicocca) Mathias Wagner (NVIDIA) Portions of this software were developed at the Innovative Systems Lab, National Center for Supercomputing Applications http://www.ncsa.uiuc.edu/AboutUs/Directorates/ISL.html Development was supported in part by the U.S. Department of Energy under grants DE-FC02-06ER41440, DE-FC02-06ER41449, and DE-AC05-06OR23177; the National Science Foundation under grants DGE-0221680, PHY-0427646, PHY-0835713, OCI-0946441, and OCI-1060067; as well as the PRACE project funded in part by the EUs 7th Framework Programme (FP7/2007-2013) under grants RI-211528 and FP7-261557. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the Department of Energy, the National Science Foundation, or the PRACE project.
About
Plug-in version of the ETMC code base
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published