diff --git a/README.md b/README.md index f08ae40..f96e1ba 100644 --- a/README.md +++ b/README.md @@ -4,88 +4,219 @@ |-----------------------------------------| MD-Bench is a toolbox for the performance engineering of short-range force -calculation kernels on molecular-dynamics applications. It aims at covering all -available state-of-the-art algorithms from different community codes such as +calculation kernels on molecular-dynamics applications. It aims at covering +state-of-the-art algorithms from different community codes such as LAMMPS and GROMACS. -## Build instructions +## Getting started -Properly configure your building by changing `config.mk` file. The following -options are available: +Clone the repository from GitHub: -- **TOOLCHAIN:** Compiler toolchain (available options: GCC, CLANG, ICC, ONEAPI, NVCC). -- **ISA:** Instruction set (available options: ARM, X86). Only relevant with -SIMD other than NONE. -- **SIMD:** Instruction set (available options: NONE, SSE, AVX, AVX\_FMA, AVX2, AVX512). -- **MASK\_REGISTERS:** Use AVX512 mask registers (always true when ISA is set to AVX512). -- **OPT\_SCHEME:** Optimization algorithm (available options: verletlist, clusterpair). -- **ENABLE\_LIKWID:** Enable likwid to make use of HPM counters. -- **DATA\_TYPE:** Floating-point precision (available options: SP, DP). -- **DATA\_LAYOUT:** Data layout for atom vector properties (available options: AOS, SOA). -- **ASM\_SYNTAX:** Assembly syntax to use when generating assembly files (available options: ATT, INTEL). -- **DEBUG:** Toggle debug mode. -- **ONE\_ATOM\_TYPE:** Simulate only one atom type and do not perform table lookup for parameters. -- **MEM\_TRACER:** Trace memory addresses for cache simulator. -- **INDEX\_TRACER:** Trace indexes and distances for gather-md. -- **COMPUTE\_STATS:** Compute statistics. 
+```shell= +git clone https://github.com/RRZE-HPC/MD-Bench.git +``` + +Edit `config.mk` and configure the compiler toolchain: -Configurations for LAMMPS Verlet Lists optimization scheme: +```makefile= +# Compiler tool chain (GCC/CLANG/ICC/ICX/ONEAPI/NVCC) +TOOLCHAIN ?= CLANG +``` -- **ENABLE\_OMP\_SIMD:** Use omp simd pragma on half neighbor-lists kernels. -- **USE\_SIMD\_KERNEL:** Compile kernel with explicit SIMD intrinsics. +Best supported are ICC (deprecated legacy Intel compiler) and ICX (LLVM-based +Intel compiler). Choose NVCC to enable CUDA GPU kernels. -Configurations for GROMACS MxN optimization scheme: +The toolchain settings are located in the `./make` directory. Review the +settings for the configured toolchain. You can configure different settings in +`config.mk`; on an X86-based system the defaults are a fine starting point. -- **USE\_REFERENCE\_VERSION:** Use reference version (only for correction purposes). -- **XTC\_OUTPUT:** Enable XTC output. +To build the binary, run `make`. On the NHR@FAU clusters, load the compiler +module first (e.g. `module load intel`): -Configurations for CUDA: +```shell= +make +``` -- **USE\_CUDA\_HOST\_MEMORY:** Use CUDA host memory to optimize host-device transfers. +While the Makefile works with any version of GNU make, some features require GNU +make > v4. + +You can run MD-Bench without any arguments: + +```shell= +./MDBench-VL-ICX-X86-AVX2-DP +``` -When done, just use `make` to compile the code. -You can clean intermediate build results with `make clean`, and all build results with `make distclean`. -You have to call `make clean` before `make` if you changed the build settings. +## Build system -## Usage +MD-Bench uses a Makefile with pattern rules and automatic dependency generation. +If you add source files you do not need to change the Makefile as long as the +sources are placed either in the `./src/verletlist/`, `./src/clusterpair/` or +`./src/common` directories. 
If you change a file, all object files that depend +on it are rebuilt. -Use the following command to run a simulation: +All configuration variables can be overridden from the command line, e.g. to +build with ICC without changing `./config.mk`, run: -```bash -./MD-Bench-- [OPTION]... +```shell= +make TOOLCHAIN=ICC ``` -Where `TAG` and `OPT_SCHEME` correspond to the building options with the same -name. Without any options, a Copper FCC lattice system with size 32x32x32 -(131072 atoms) over 200 time-steps using the Lennard-Jones potential (sigma=1.0, -epsilon=1.0) is simulated. - -The default behavior and other options can be changed using the following parameters: - -```sh --p : file to read parameters from (can be specified more than once) --f : force field (lj or eam), default lj --i : input file with atom positions (dump) --e : input file for EAM --n / --nsteps : set number of timesteps for simulation --nx/-ny/-nz : set linear dimension of systembox in x/y/z direction --r / --radius : set cutoff radius --s / --skin : set skin (verlet buffer) ---freq : processor frequency (GHz) ---vtk : VTK file for visualization ---xtc : XTC file for visualization +Multiple configurations can be built at the same time. Every configuration has a +unique binary name `./MDBENCH-`. Intermediate build results are +located in a `./build/build-/` directory. + +All make targets act on the current configuration set in `./config.mk`, but this +can of course be overridden on the command line. + +Supported make targets: + +- `make`: Build the binary for the current configuration. +- `make clean`: Remove intermediate build results. +- `make distclean`: Remove intermediate build results and binary. Also removes +generated tags and clangd files, more on that later. +- `make cleanall`: Remove all generated files. **Note**: This target applies to +all configurations. +- `make info`: Output compiler version, useful for logging in automated benchmark +scripts. 
+- `make asm`: Generate assembly output of all source files. The assembly files +are placed in the intermediate build directory. +- `make format`: Reformat all source files with `clang-format` using the format +specification in `.clang-format`. + +### Build time options + +- `TOOLCHAIN`: Determines which toolchain makefile is included +- `ISA`: No usage apart from tag strings +- `SIMD`: Controls the generation of intrinsic kernels for clusterpair +- `OPT_SCHEME`: Algorithmic variant (verletlist or clusterpair), different +source directories and main routines are used +- `ENABLE_LIKWID`: Turn on LIKWID instrumentation, the LIKWID library has to be available +- `DATA_TYPE`: Switch between single precision and double precision floating +point. This is controlled by defines. +- `DATA_LAYOUT`: Switch between array-of-structures (AOS) and structure-of-arrays +(SOA) layout for atom positions and forces. Tradeoff between better cache +utilisation and easier SIMD vectorization. +- `DEBUG`: Enable additional debug output +- `SORT_ATOMS`: Re-sort atoms to ensure that atoms that are nearby are also close +to each other in the data structures +- `EXPLICIT_TYPES`: By default, the atom properties are stored in scalar variables. +This option enables support for multiple atom types with different properties. +- `ENABLE_OMP_SIMD`: This enforces the use of `#pragma omp simd` for the +verletlist half-neighbour list force kernel. Without it, the Intel compiler (at +least ICC) refuses to do SIMD vectorization. +- `USE_REFERENCE_VERSION`: Enforce usage of the C implementation of the clusterpair +algorithm for validation +- `USE_CUDA_HOST_MEMORY`: Enable pinned host memory for faster host-device transfers +- `ENABLE_MPI`: Turn on the MPI parallel version of the code + +### Build for GPU targets + +MD-Bench currently only supports Nvidia GPUs using CUDA kernels. To enable CUDA +kernels you need to specify `NVCC` as toolchain. 
The CUDA source code is in the +same source directories with a `Cuda` suffix and the `.cu` file extension. If +`NVCC` is set as toolchain, all supported kernels are automatically set to their +CUDA variants at build time. This means a binary supports either CPU kernels or +GPU kernels. + +## Command line arguments + +MD-Bench can be executed without any arguments; in this case the full neighbor +list testcase with LJ force will be computed for 200 steps and a size of +32x32x32 unit cells. + +- `-p / --params `: file to read parameters from (can be specified more +than once). Default initialization sets parameters for the default LJ testcase. +- `-f `: force field (lj, eam), default lj. For anything other than +lj you also need to provide a specific parameter file. +- `-i `: input file with atom positions (dump). MD-Bench supports +Brookhaven protein data bank (.pdb), GROMACS GROMOS87 (.gro), and LAMMPS dump +(.dmp) file formats +- `-e `: input file for EAM parameters +- `-n / --nsteps `: set number of timesteps for simulation (default 200) +- `-nx/-ny/-nz `: set linear dimension of systembox in x/y/z direction +(default 32 in every dimension) +- `-half `: use half (1) or full (0) neighbor lists (default 0 - full +neighbor list) +- `-r / --radius `: set cutoff radius (default 2.5) +- `-s / --skin `: set skin (verlet buffer, default 0.3) +- `-w `: write input atoms to file +- `--freq `: processor frequency (GHz), used to calculate cycle metrics +(default 2.4) +- `--vtk `: VTK output file for visualization + +## Available testcases + +For all variants you can switch between single precision and double precision +and between AOS and SOA data layouts using build time options. You can use +the half neighbour list algorithm instead of the default full neighbour list by +setting `-half 1`. To enforce SIMD vectorization for the half neighbour list +algorithm you can set the option `ENABLE_OMP_SIMD=true`. 
+ +### Lennard-Jones potential for solid copper + +Start MD-Bench without any command line arguments; this is the default testcase. You +may change the number of timesteps using the `-n` option and change the problem +size using the `-nx, -ny, -nz` options. + +### EAM potential for solid copper + +Call MD-Bench as follows: + +```shell= +./MDBench- -n 400 -f eam -e ./data/Cu_u3.eam ``` -## Examples +Two different EAM variants are available: `Cu_u3.eam` and `Cu_u6.eam`. The EAM +potential is currently only available for verletlist. + +### Lennard-Jones potential for melted copper + +The melted copper testcase has only 32000 atoms in the default configuration. +Call MD-Bench as follows: + +```shell=bash +./MDBench- -n 400 -i ./data/copper_melting/input_lj_cu_one_atomtype_20x20x20.dmp +``` + +### Lennard-Jones potential for melted copper with explicit types + +Compile MD-Bench with `EXPLICIT_TYPES=true` in `config.mk`. -TBD +Call MD-Bench as follows: + +```shell=bash +./MDBench- -n 400 -i ./data/copper_melting/input_lj_cu_one_atomtype_20x20x20.dmp +``` + +**This testcase currently segfaults!** + +### EAM potential for melted copper + +Call MD-Bench as follows: + +```shell=bash +./MDBench- -n 400 -f eam -e ./data/Cu_u3.eam -i ./data/copper_melting/input_eam_cu_one_atomtype_20x20x20.dmp +``` + +Two different EAM variants are available: `Cu_u3.eam` and `Cu_u6.eam`. The EAM +potential is currently only available for verletlist. + +### Lennard-Jones potential for argon gas + +Call MD-Bench as follows: + +```shell=bash +./MDBench- -i ./data/argon/input.gro -p ./data/argon/mdbench_params.conf +``` ## Citations -Rafael Ravedutti Lucio Machado, Jan Eitzinger, Jan Laukemann, Georg Hager, Harald -Köstler and Gerhard Wellein: MD-Bench: A performance-focused prototyping harness for -state-of-the-art short-range molecular dynamics algorithms. 
Future Generation -Computer Systems ([FGCS](https://www.sciencedirect.com/journal/future-generation-computer-systems)), Volume 149, 2023, Pages 25-38, ISSN 0167-739X, DOI: +Rafael Ravedutti Lucio Machado, Jan Eitzinger, Jan Laukemann, Georg Hager, +Harald Köstler and Gerhard Wellein: MD-Bench: A performance-focused prototyping +harness for state-of-the-art short-range molecular dynamics algorithms. Future +Generation Computer Systems +([FGCS](https://www.sciencedirect.com/journal/future-generation-computer-systems)), +Volume 149, 2023, Pages 25-38, ISSN 0167-739X, DOI: [https://doi.org/10.1016/j.future.2023.06.023](https://doi.org/10.1016/j.future.2023.06.023) Rafael Ravedutti Lucio Machado, Jan Eitzinger, Harald Köstler, and Gerhard