ETMC-QUDA production code base 24th February 2017
------------------------------------------------------------------------
This library is an extension of QUDA mainline (21st September 2016). It
contains routines to calculate 2- and 3-point hadronic correlation functions,
and an All-to-All propagator with a gauge covariant derivative, from which a
variety of correlation functions can be constructed.
This particular library is a multigrid-enabled version of ETMC's deflated
solver code base, currently hosted here:
https://github.com/ckallidonis/quda/tree/eig-solver-fullOp-0.8.0
Dependencies:
In order to compile this code, one must also have the following installed:
QUDA specific
REQUIRED:
magma 1.7.0
ETMC-QUDA specific
REQUIRED:
hdf5
lime
gsl
OPTIONAL:
Arpack
The QKXTM boolean is ON by default and will expect the REQUIRED dependencies
to be present for successful compilation. The OPTIONAL Arpack dependency
is needed only for those routines involving deflation.
Compilation:
One must have the CONTRACT boolean ON in order to ensure that the covariant
derivative functions in QUDA are compiled. The MAGMA and MULTIGRID booleans
must also be switched ON. One must manually give the explicit path to magma.h
in the file blas_magma.cu.
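As a minimal sketch only (the exact option names below are an assumption and
should be checked against this repository's CMakeLists.txt or configure
script), an out-of-source cmake invocation enabling these booleans might look
like:

  mkdir build && cd build
  cmake <QUDA_SOURCE_DIR> -DQUDA_CONTRACT=ON -DQUDA_MULTIGRID=ON \
      -DQUDA_MAGMA=ON -DQUDA_GPU_ARCH=sm_35
  make -j8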
Overview:
The QKXTM (Quda Kepler and Twisted Mass) calculation routines are appended
to the interface_quda.cpp file of mainline QUDA. This is so we can use the
internal gauge field structures directly. The class definitions and header
files of auxiliary functions are contained in files prepended like so,
qudaQKXTM_<some file>.suffix
There is a single .cu file for kernels, a 'code_pieces' directory in lib/,
a single qudaQKXTM_Kepler_utils.cpp and .h file for containers and general
functions, and a suite of qudaQKXTM_<CLASS>_Kepler.cpp files under a single
header qudaQKXTM_Kepler.h. These <CLASS> .cpp files are NOT individually
compiled; rather, they are included in interface_quda.cpp as follows:
... [end interface_quda.cpp code]
#include <qudaQKXTM_Field_Kepler.cpp>
...
#include <qudaQKXTM_Deflation_Kepler.cpp>
... [begin QKXTM calculation functions]
Known Issues:
One must pass gauge fields to the QKXTM executables in ILDG lime format. The
executables expect a separate smeared gauge field for source construction in
the hadronic 2pt/3pt functions. QUDA has gauge smearing functionality, which
should be utilised.
It would be desirable to completely separate out the QKXTM files from QUDA
so that a `plug-in` can be installed alongside any suitable compiled copy
of QUDA.
Authors:
The majority of the QKXTM code base was authored by,
Kyriakos Hadjiyiannakou
Christos Kallidonis
The multigrid upgrades and QUDA 0.9.0 compatibility were done by
Dean Howarth
Bug reports, suggestions, comments, and queries are welcome.
QUDA's mainline README follows.
Release Notes for QUDA v0.8.0 1st February 2016
-----------------------------
Overview:
QUDA is a library for performing calculations in lattice QCD on
graphics processing units (GPUs), leveraging NVIDIA's CUDA platform.
The current release includes optimized Dirac operators and solvers for
the following fermion actions:
* Wilson
* Clover-improved Wilson
* Twisted mass (including non-degenerate pairs)
* Twisted mass with a clover term
* Staggered fermions
* Improved staggered (asqtad or HISQ)
* Domain wall (4-d or 5-d preconditioned)
* Mobius fermion
Implementations of CG, multi-shift CG, BiCGstab, and DD-preconditioned
GCR are provided, including robust mixed-precision variants supporting
combinations of double, single, and half (16-bit "block floating
point") precision. The library also includes auxilliary routines
necessary for Hybrid Monte Carlo, such as HISQ link fattening, force
terms and clover-field construction. Use of many GPUs in parallel is
supported throughout, with communication handled by QMP or MPI.
Software Compatibility:
The library has been tested under Linux (CentOS 5.8 and Ubuntu 14.04)
using releases 6.5, 7.0 and 7.5 of the CUDA toolkit. CUDA 6.0 and
earlier are not supported (though they may continue to work fine).
The library also works on recent 64-bit Intel-based Macs. Due to
issues with compilation using LLVM, under Mac OS X 10.9.x it is
required to install and use GCC instead of the default clang compiler,
though this is unnecessary with Mac OS X 10.10.x.
See also "Known Issues" below.
Hardware Compatibility:
For a list of supported devices, see
http://developer.nvidia.com/cuda-gpus
Before building the library, you should determine the "compute
capability" of your card, either from NVIDIA's documentation or by
running the deviceQuery example in the CUDA SDK, and pass the
appropriate value to QUDA's configure script. For example, the Tesla
C2075 is listed on the above website as having compute capability 2.0,
and so to configure the library for this card, you'd run "configure
--enable-gpu-arch=sm_20 [other options]" before typing "make".
As of QUDA 0.8.0, only devices of compute capability 2.0 or greater
are supported. See also "Known Issues" below.
Installation:
The recommended method for compiling QUDA is to use cmake, and build
QUDA in a separate directory from the source directory. For
instructions on how to build QUDA using cmake see this page
https://github.com/lattice/quda/wiki/Building-QUDA-with-cmake.
Alternatively, QUDA can also be built using "configure" and "make",
though this build approach is considered deprecated and will be
removed in a subsequent QUDA release. See "./configure --help" for a
list of configure options. At a minimum, you'll probably want to set
the GPU architecture; see "Hardware Compatibility" above.
Enabling multi-GPU support requires passing the --enable-multi-gpu
flag to configure, as well as --with-mpi=<PATH> and optionally
--with-qmp=<PATH>. If the latter is given, QUDA will use QMP for
communications; otherwise, MPI will be called directly. By default,
it is assumed that the MPI compiler wrappers are <MPI_PATH>/bin/mpicc
and <MPI_PATH>/bin/mpicxx for C and C++, respectively. These choices
may be overridden by setting the CC and CXX variables on the command
line as follows:
./configure --enable-multi-gpu --with-mpi=<MPI_PATH> \
[--with-qmp=<QMP_PATH>] [OTHER_OPTIONS] CC=my_mpicc CXX=my_mpicxx
Finally, with some MPI implementations, executables compiled against
MPI will not run without "mpirun". This has the side effect of
causing the configure script to believe that the compiler is failing
to produce a valid executable. To skip these checks, one can trick
configure into thinking that it's cross-compiling by setting the
--build=none and --host=<HOST> flags. For the latter,
"--host=x86_64-linux-gnu" should work on a 64-bit linux system.
By default only the QDP and MILC interfaces are enabled. Interfacing
support with QDPJIT, BQCD, or CPS should be enabled at configure time
with the appropriate flag, e.g., --enable-bqcd-interface. To keep
compilation time to a minimum it is recommended to enable only those
interfaces that are used by a given application. The QDP and MILC
interfaces can be disabled with, e.g., the --disable-milc-interface flag.
The eigenvector solvers (eigCG and incremental eigCG) require the
installation of the MAGMA dense linear algebra package. It is
recommended that MAGMA 1.7.x be used, though versions 1.5.x and
1.6.x should also work. MAGMA is available from
http://icl.cs.utk.edu/magma/index.html. MAGMA is enabled using the
configure option --with-magma=MAGMA_PATH.
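For example (the installation path below is purely illustrative):

./configure --with-magma=/opt/magma-1.7.0 [OTHER_OPTIONS]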
If Fortran interface support is desired, the F90 environment variable
should be set when configure is invoked, and "make fortran" must be
run explicitly, since the Fortran interface modules are not built by
default.
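As a sketch (the choice of gfortran is an assumption; substitute whichever
Fortran compiler matches your toolchain):

F90=gfortran ./configure [OPTIONS]
make
make fortran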
As examples, the scripts "configure.milc.titan" and
"configure.chroma.titan" are provided. These configure QUDA for
expected use with MILC and Chroma, respectively, on Titan (the Tesla
K20X-powered Cray XK7 supercomputer at the Oak Ridge Leadership
Computing Facility).
Throughout the library, auto-tuning is used to select optimal launch
parameters for most performance-critical kernels. This tuning
process takes some time and will generally slow things down the first
time a given kernel is called during a run. To avoid this one-time
overhead in subsequent runs (using the same action, solver, lattice
volume, etc.), the optimal parameters are cached to disk. For this
to work, the QUDA_RESOURCE_PATH environment variable must be set,
pointing to a writeable directory. Note that since the tuned parameters
are hardware-specific, this "resource directory" should not be shared
between jobs running on different systems (e.g., two clusters
with different GPUs installed). Attempting to use parameters tuned
for one card on a different card may lead to unexpected errors.
This autotuning information can also be used to build up a first-order
kernel profile: the autotuner measures how long each kernel takes to
run, so by also keeping track of the number of calls to each kernel,
the product of these two quantities gives a time profile of a given
job run. If QUDA_RESOURCE_PATH is set, then this profiling
information is output to the file "profile.tsv" in this specified
directory. Optionally, the output filename can be specified using the
QUDA_PROFILE_OUTPUT environment variable, to avoid overwriting
previously generated profile outputs.
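For example (directory and filename are illustrative):

export QUDA_RESOURCE_PATH=$HOME/quda_resource   # must point to a writeable directory
mkdir -p $QUDA_RESOURCE_PATH
export QUDA_PROFILE_OUTPUT=profile_run01.tsv    # optional; defaults to profile.tsv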
Using the Library:
Include the header file include/quda.h in your application, link
against lib/libquda.a, and study tests/invert_test.cpp (for Wilson,
clover, twisted-mass, or domain wall fermions) or
tests/staggered_invert_test.cpp (for asqtad/HISQ fermions) for
examples of the solver interface. The various solver options are
enumerated in include/enum_quda.h.
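The overall shape of a driver program is sketched below. This is only an
outline, not a complete runnable example: every field of the parameter
structs must be set explicitly, allocation of the host gauge and spinor
fields is omitted, and the QDP-style gauge ordering noted in the comments is
an assumption. See tests/invert_test.cpp for a complete working set-up.

#include <quda.h>

int main(int argc, char **argv)
{
  // Initialize QUDA on GPU device 0.
  initQuda(0);

  // Parameter structs describing the gauge field and the inversion;
  // every member must be filled in before use (omitted here).
  QudaGaugeParam gauge_param = newQudaGaugeParam();
  QudaInvertParam inv_param  = newQudaInvertParam();
  /* ... set lattice dimensions, precisions, Dirac operator, solver ... */

  // Upload the host gauge field; with QDP-style ordering this is an
  // array of four pointers, one per space-time direction.
  void *gauge[4];
  /* ... allocate and fill the gauge field ... */
  loadGaugeQuda((void *)gauge, &gauge_param);

  // Solve D x = b: the solution is returned in the first argument,
  // the source is passed in the second.
  void *spinor_out, *spinor_in;
  /* ... allocate the solution and fill the source ... */
  invertQuda(spinor_out, spinor_in, &inv_param);

  // Release all device memory and shut down the library.
  endQuda();
  return 0;
}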
Known Issues:
* For compatibility with CUDA, on 32-bit platforms the library is
compiled with the GCC option -malign-double. This differs from the
GCC default and may affect the alignment of various structures,
notably those of type QudaGaugeParam and QudaInvertParam, defined in
quda.h. Therefore, any code to be linked against QUDA should also
be compiled with this option.
* When the auto-tuner is active in a multi-GPU run it may break binary
reproducibility of that run, because different launch configurations may
be chosen on different GPUs while tuning. If binary reproducibility is
strictly required, make sure that a run with active tuning has already
completed, so that the cached parameters are reused. This ensures that
the same launch configuration is used for a given kernel on all GPUs,
restoring binary reproducibility.
Getting Help:
Please visit http://lattice.github.com/quda for contact information.
Bug reports are especially welcome.
Acknowledging QUDA:
If you find this software useful in your work, please cite:
M. A. Clark, R. Babich, K. Barros, R. Brower, and C. Rebbi, "Solving
Lattice QCD systems of equations using mixed precision solvers on GPUs,"
Comput. Phys. Commun. 181, 1517 (2010) [arXiv:0911.3191 [hep-lat]].
When taking advantage of multi-GPU support, please also cite:
R. Babich, M. A. Clark, B. Joo, G. Shi, R. C. Brower, and S. Gottlieb,
"Scaling lattice QCD beyond 100 GPUs," International Conference for
High Performance Computing, Networking, Storage and Analysis (SC),
2011 [arXiv:1109.2935 [hep-lat]].
Several other papers that might be of interest are listed at
http://lattice.github.com/quda .
Authors:
Ronald Babich (NVIDIA)
Kipton Barros (Los Alamos National Laboratory)
Richard Brower (Boston University)
Nuno Cardoso (NCSA)
Mike Clark (NVIDIA)
Justin Foley (University of Utah)
Joel Giedt (Rensselaer Polytechnic Institute)
Steven Gottlieb (Indiana University)
Dean Howarth (Rensselaer Polytechnic Institute)
Balint Joo (Jefferson Laboratory)
Hyung-Jin Kim (Samsung Advanced Institute of Technology)
Claudio Rebbi (Boston University)
Guochun Shi (NCSA)
Alexei Strelchenko (Fermi National Accelerator Laboratory)
Alejandro Vaquero (INFN Sezione Milano Bicocca)
Mathias Wagner (NVIDIA)
Portions of this software were developed at the Innovative Systems Lab,
National Center for Supercomputing Applications
http://www.ncsa.uiuc.edu/AboutUs/Directorates/ISL.html
Development was supported in part by the U.S. Department of Energy
under grants DE-FC02-06ER41440, DE-FC02-06ER41449, and
DE-AC05-06OR23177; the National Science Foundation under grants
DGE-0221680, PHY-0427646, PHY-0835713, OCI-0946441, and OCI-1060067;
as well as the PRACE project funded in part by the EU's 7th Framework
Programme (FP7/2007-2013) under grants RI-211528 and FP7-261557. Any
opinions, findings, and conclusions or recommendations expressed in
this material are those of the authors and do not necessarily reflect
the views of the Department of Energy, the National Science
Foundation, or the PRACE project.