From be3f4ad1bfaaea6d504e55494968edad2a0a5a20 Mon Sep 17 00:00:00 2001 From: Marco Garten Date: Tue, 20 Nov 2018 13:08:41 +0100 Subject: [PATCH 01/40] Change link to CRP group @ HZDR I propose to change the link to the Computational Radiation Physics group at HZDR. The new link still leads to the same page as the old one but avoids possible issues with doxygen that I encountered. In some cases the ampersand in the link is not automatically escaped by doxygen, which causes an invalid token in the XML file. --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 078e195da7..3ff1208f7c 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ PIConGPU was one of the **finalists** of the 2013 [Gordon Bell Prize](http://sc13.supercomputing.org/content/acm-gordon-bell-prize). PIConGPU is developed and maintained by the -[Computational Radiation Physics Group](http://www.hzdr.de/db/Cms?pNid=132&pOid=30354) +[Computational Radiation Physics Group](https://www.hzdr.de/db/Cms?pNid=2097) at the [Institute for Radiation Physics](http://www.hzdr.de/db/Cms?pNid=132) at [HZDR](http://www.hzdr.de/) in close collaboration with the Center for Information Services and High Performance Computing From a1709ae894b1791a1801dab4c3532080feb7667d Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Tue, 20 Nov 2018 23:37:20 +0100 Subject: [PATCH 02/40] Plugins: ADIOS & PhaseSpace Wterminate Silence the GCC 7.3 `-Wterminate` warning about `throws` of our `MPI_CHECK` macro in destructors. Instead, write to `stderr`.
--- include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp | 2 +- include/picongpu/plugins/adios/ADIOSWriter.hpp | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp b/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp index 96973173c4..ebdab5226b 100644 --- a/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp +++ b/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp @@ -204,7 +204,7 @@ namespace picongpu { // avoid deadlock between not finished pmacc tasks and mpi blocking collectives __getTransactionEvent().waitForFinished(); - MPI_CHECK(MPI_Comm_free( &commFileWriter )); + MPI_CHECK_NO_EXCEPT(MPI_Comm_free( &commFileWriter )); } } diff --git a/include/picongpu/plugins/adios/ADIOSWriter.hpp b/include/picongpu/plugins/adios/ADIOSWriter.hpp index b682181684..112f6961b2 100644 --- a/include/picongpu/plugins/adios/ADIOSWriter.hpp +++ b/include/picongpu/plugins/adios/ADIOSWriter.hpp @@ -840,7 +840,7 @@ class ADIOSWriter : public IIOBackend { // avoid deadlock between not finished pmacc tasks and mpi blocking collectives __getTransactionEvent().waitForFinished(); - MPI_CHECK(MPI_Comm_free(&(mThreadParams.adiosComm))); + MPI_CHECK_NO_EXCEPT(MPI_Comm_free(&(mThreadParams.adiosComm))); } } From 33c51d10927fe9a158d9361d79ca16f749321de6 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Wed, 21 Nov 2018 13:27:40 +0100 Subject: [PATCH 03/40] Doc: Add System Links Add more information on system user guides and production directories so people can get started quickly with PIConGPU.
--- docs/source/install/profile.rst | 49 +++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst index 4a30b025b8..74c9230a66 100644 --- a/docs/source/install/profile.rst +++ b/docs/source/install/profile.rst @@ -21,6 +21,12 @@ We listed some example ``picongpu.profile`` files below which can be used to set Hemera (HZDR) ------------- +**System overview:** `link (internal) `_ + +**User guide:** *None* + +**Production directory:** ``/bigdata/hplsim/`` with ``external/``, ``scratch/``, ``development/`` and ``production/`` + For this profile to work, you need to download the :ref:`PIConGPU source code ` manually. Queue: defq (2x Intel Xeon Gold 6148, 20 Cores + 20 HyperThreads/CPU) @@ -38,6 +44,12 @@ Queue: gpu (4x NVIDIA P100 16GB) Hypnos (HZDR) ------------- +**System overview:** `link (internal) `_ + +**User guide:** `link (internal) `_ + +**Production directory:** ``/bigdata/hplsim/`` with ``external/``, ``scratch/``, ``development/`` and ``production/`` + For these profiles to work, you need to download the :ref:`PIConGPU source code ` manually. Queue: laser (AMD Opteron 6276 CPUs) @@ -61,6 +73,12 @@ Queue: k80 (Nvidia K80 GPUs) Hydra (HZDR) ------------- +**System overview:** `link (internal) `_ + +**User guide:** `link (internal) `_ + +**Production directory:** ``/bigdata/hplsim/`` with ``external/``, ``scratch/``, ``development/`` and ``production/`` + For this profile to work, you need to download the :ref:`PIConGPU source code ` manually. .. literalinclude:: profiles/hydra-hzdr/default_picongpu.profile.example @@ -69,6 +87,13 @@ For this profile to work, you need to download the :ref:`PIConGPU source code `_ + +**User guide:** `link `_ + +**Production directory:** usually ``$PROJWORK/$proj/`` (`link `_). +Note that ``$HOME`` is not mounted on compute nodes, place your ``picongpu.profile`` and auxiliary software in your production directory. 
+ For this profile to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`libSplash, libpng and PNGwriter ` manually. K20x GPUs (recommended) @@ -86,6 +111,12 @@ AMD Opteron 6274 (Interlagos) CPUs (for experiments) Piz Daint (CSCS) ---------------- +**System overview:** `link `_ + +**User guide:** `link `_ + +**Production directory:** ``$SCRATCH`` (`link `_). + For this profile to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`boost, zlib, libpng, c-blosc, PNGwriter, libSplash and ADIOS ` manually. .. note:: @@ -103,6 +134,12 @@ For this profile to work, you need to download the :ref:`PIConGPU source code `_ + +**User guide:** `link `_ + +**Production directory:** ``/scratch/$USER/`` and ``/scratch/$proj/`` + For these profiles to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`PNGwriter and libSplash ` manually. Queue: gpu1 (Nvidia K20x GPUs) @@ -128,6 +165,12 @@ For this profile, you additionally need to install your own :ref:`boost `_ + +**User guide:** `link `_ + +**Production directory:** ``/global/scratch/$USER/`` + For this profile to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`boost, PNGwriter and libSplash ` manually. Additionally, you need to make the ``rsync`` command available as written below. @@ -137,6 +180,12 @@ Additionally, you need to make the ``rsync`` command available as written below. Draco (MPCDF) ------------- +**System overview:** `link `_ + +**User guide:** `link `_ + +**Production directory:** ``/ptmp/$USER/`` + For this profile to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`libpng, PNGwriter and libSplash ` manually. .. 
literalinclude:: profiles/draco-mpcdf/picongpu.profile.example From a85f80c84e39326023252fca1075698fae78719f Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Thu, 22 Nov 2018 13:16:40 +0100 Subject: [PATCH 04/40] Taurus Profile: Project Add a `$proj` variable on Taurus similar to the one in Titan. Should we also use this for accounting in our `.tpl` script? --- etc/picongpu/taurus-tud/k20x_picongpu.profile.example | 4 ++++ etc/picongpu/taurus-tud/k80_picongpu.profile.example | 4 ++++ etc/picongpu/taurus-tud/knl_picongpu.profile.example | 4 ++++ 3 files changed, 12 insertions(+) diff --git a/etc/picongpu/taurus-tud/k20x_picongpu.profile.example b/etc/picongpu/taurus-tud/k20x_picongpu.profile.example index a9c0716827..c6a9beb56a 100644 --- a/etc/picongpu/taurus-tud/k20x_picongpu.profile.example +++ b/etc/picongpu/taurus-tud/k20x_picongpu.profile.example @@ -9,6 +9,10 @@ export MY_MAILNOTIFY="NONE" export MY_MAIL="someone@example.com" export MY_NAME="$(whoami) <$MY_MAIL>" +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $1}') + # Text Editor for Tools ###################################### (edit this line) # - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" #export EDITOR="nano" diff --git a/etc/picongpu/taurus-tud/k80_picongpu.profile.example b/etc/picongpu/taurus-tud/k80_picongpu.profile.example index 93baa03220..5e5991f1e7 100644 --- a/etc/picongpu/taurus-tud/k80_picongpu.profile.example +++ b/etc/picongpu/taurus-tud/k80_picongpu.profile.example @@ -9,6 +9,10 @@ export MY_MAILNOTIFY="NONE" export MY_MAIL="someone@example.com" export MY_NAME="$(whoami) <$MY_MAIL>" +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $1}') + # Text Editor for Tools ###################################### (edit this line) # - examples: "nano", "vim", 
"emacs -nw", "vi" or without terminal: "gedit" #export EDITOR="nano" diff --git a/etc/picongpu/taurus-tud/knl_picongpu.profile.example b/etc/picongpu/taurus-tud/knl_picongpu.profile.example index d4c61b2c5a..f7bc9556a5 100644 --- a/etc/picongpu/taurus-tud/knl_picongpu.profile.example +++ b/etc/picongpu/taurus-tud/knl_picongpu.profile.example @@ -9,6 +9,10 @@ export MY_MAILNOTIFY="NONE" export MY_MAIL="someone@example.com" export MY_NAME="$(whoami) <$MY_MAIL>" +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $1}') + # Text Editor for Tools ###################################### (edit this line) # - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" #export EDITOR="nano" From 7d683db5dbd9f705cdedff5bacd4aaeee7ab9046 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 23 Nov 2018 14:26:25 +0100 Subject: [PATCH 05/40] System: D.A.V.I.D.E Add new system documentation, profiles and tbg template for the D.A.V.I.D.E cluster at CINECA. --- docs/source/install/profile.rst | 17 +++ etc/picongpu/davide-cineca/gpu.tpl | 108 ++++++++++++++++++ .../gpu_picongpu.profile.example | 101 ++++++++++++++++ 3 files changed, 226 insertions(+) create mode 100644 etc/picongpu/davide-cineca/gpu.tpl create mode 100644 etc/picongpu/davide-cineca/gpu_picongpu.profile.example diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst index 74c9230a66..199bc9d571 100644 --- a/docs/source/install/profile.rst +++ b/docs/source/install/profile.rst @@ -190,3 +190,20 @@ For this profile to work, you need to download the :ref:`PIConGPU source code `_ + +**User guide:** `link `_ + +**Production directory:** ``$CINECA_SCRATCH/`` (`link `_) + +For this profile to work, you need to download the :ref:`PIConGPU source code ` manually. + +Queue: dvd_usr_prod (Nvidia P100 GPUs) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
literalinclude:: profiles/davide-cineca/gpu_picongpu.profile.example + :language: bash diff --git a/etc/picongpu/davide-cineca/gpu.tpl b/etc/picongpu/davide-cineca/gpu.tpl new file mode 100644 index 0000000000..4172dd8215 --- /dev/null +++ b/etc/picongpu/davide-cineca/gpu.tpl @@ -0,0 +1,108 @@ +#!/usr/bin/env bash +# Copyright 2013-2018 Axel Huebl, Richard Pausch, Rene Widera +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see . 
+# + + +# PIConGPU batch script for D.A.V.I.D.E's SLURM batch system + +#SBATCH --account=!TBG_nameProject +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --ntasks-per-node=!TBG_gpusPerNode +#SBATCH --mincpus=!TBG_mpiTasksPerNode +#SBATCH --cpus-per-task=!TBG_coresPerGPU +#SBATCH --mem=!TBG_memPerNode +#SBATCH --gres=gpu:!TBG_gpusPerNode +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="dvd_usr_prod" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"NONE"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_nameProject=${proj:-""} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# number of available/hosted GPUs per node in the system +.TBG_numHostedGPUPerNode=4 + +# required GPUs per node for the current job +.TBG_gpusPerNode=`if [ $TBG_tasks -gt $TBG_numHostedGPUPerNode ] ; then echo $TBG_numHostedGPUPerNode; else echo $TBG_tasks; fi` + +# host memory per gpu +.TBG_memPerGPU="$((252000 / $TBG_gpusPerNode))" +# host memory per node +.TBG_memPerNode="$((TBG_memPerGPU * TBG_gpusPerNode))" + +# number of cores to block per GPU +# We got two Power8 processors with each 8 cores per node, +# so 16 cores means 4 cores for each of the 4 GPUs. +.TBG_coresPerGPU=4 + +# We only start 1 MPI task per GPU +.TBG_mpiTasksPerNode="$(( TBG_gpusPerNode * 1 ))" + +# use ceil to caculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_gpusPerNode - 1 ) / TBG_gpusPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? 
-ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" + exit 1 +fi +unset MODULES_NO_OUTPUT + +#set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +mkdir simOutput 2> /dev/null +cd simOutput + +# test if cuda_memtest binary is available and we have the node exclusive +if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then + # Run CUDA memtest to check GPU's health + srun --cpu-bind=sockets !TBG_dstPath/input/bin/cuda_memtest.sh +else + echo "no binary 'cuda_memtest' available or compute node is not exclusively allocated, skip GPU memory test" >&2 +fi + +if [ $? -eq 0 ] ; then + # Run PIConGPU + srun --cpu-bind=sockets !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output +fi diff --git a/etc/picongpu/davide-cineca/gpu_picongpu.profile.example b/etc/picongpu/davide-cineca/gpu_picongpu.profile.example new file mode 100644 index 0000000000..96a0ad2702 --- /dev/null +++ b/etc/picongpu/davide-cineca/gpu_picongpu.profile.example @@ -0,0 +1,101 @@ +# Name and Path of this Script ############################### (DO NOT change!) 
+export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: NONE, BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="NONE" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $2}') + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# General modules ############################################################# +# +module purge +module load gnu/6.4.0 +module load cmake/3.11 +module load cuda/9.2.88 +module load openmpi/3.1.0--gnu--6.4.0 +module load boost/1.68.0--openmpi--3.1.0--gnu--6.4.0 + +export CMAKE_PREFIX_PATH=$CUDA_HOME:$OPENMPI_HOME:$CMAKE_PREFIX_PATH +export CMAKE_PREFIX_PATH=$BOOST_HOME:$CMAKE_PREFIX_PATH + +# Other Software ############################################################## +# +module load zlib/1.2.11--gnu--6.4.0 +module load szip/2.1.1--gnu--6.4.0 +module load blosc/1.12.1--gnu--6.4.0 + +module load hdf5/1.10.4--openmpi--3.1.0--gnu--6.4.0 +module load libsplash/1.7.0--openmpi--3.1.0--gnu--6.4.0 +module load adios/1.13.1--openmpi--3.1.0--gnu--6.4.0 + +module load libpng/1.6.35--gnu--6.4.0 +module load freetype/2.9.1--gnu--6.4.0 +module load pngwriter/0.7.0--gnu--6.4.0 + +export CMAKE_PREFIX_PATH=$ZLIB_HOME:$SZIP_HOME:$BLOSC_HOME:$CMAKE_PREFIX_PATH +export CMAKE_PREFIX_PATH=$HDF5_HOME:$LIBSPLASH_HOME:$ADIOS_HOME:$CMAKE_PREFIX_PATH +export CMAKE_PREFIX_PATH=$LIBPNG_HOME:$FREETYPE_HOME:$PNGWRITER_HOME:$CMAKE_PREFIX_PATH + +# Work-Arounds 
################################################################ +# +# fix for Nvidia NVCC bug id 2448610 +# see https://github.com/ComputationalRadiationPhysics/alpaka/issues/701 +export CXXFLAGS="-Dlinux" + +# Environment ################################################################# +# +#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BOOST_LIB + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="cuda:60" + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "gpu" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/davide-cineca/gpu.tpl" + +# allocate an interactive shell for one hour +# getNode 2 # allocates to interactive nodes (default: 1) +function getNode() { + if [ -z "$1" ] ; then + numNodes=1 + else + numNodes=$1 + fi + srun --time=0:30:00 --nodes=$numNodes --ntasks-per-socket=8 --ntasks-per-node=16 --mem=252000 --gres=gpu:4 -A $proj -p dvd_usr_prod --pty bash +} + +# allocate an interactive shell for one hour +# getDevice 2 # allocates to interactive devices (default: 1) +function getDevice() { + if [ -z "$1" ] ; then + numGPUs=1 + else + if [ "$1" -gt 4 ] ; then + echo "The maximal number of devices per node is 4." 
1>&2 + return 1 + else + numGPUs=$1 + fi + fi + srun --time=1:00:00 --ntasks-per-node=$numGPUs --cpus-per-task=$((4 * $numGPUs)) --gres=gpu:$numGPUs --mem=$((63000 * numGPUs)) -A $proj -p dvd_usr_prod --pty bash +} From 610e283aeffc5e0044d346b5f8b234856bd146a1 Mon Sep 17 00:00:00 2001 From: Igor Andriyash Date: Mon, 26 Nov 2018 21:49:37 +0200 Subject: [PATCH 06/40] Update FreeRng.def fix a typo in the example code --- include/picongpu/particles/filter/generic/FreeRng.def | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/picongpu/particles/filter/generic/FreeRng.def b/include/picongpu/particles/filter/generic/FreeRng.def index 243bdbc50b..2bed2d0882 100644 --- a/include/picongpu/particles/filter/generic/FreeRng.def +++ b/include/picongpu/particles/filter/generic/FreeRng.def @@ -53,7 +53,7 @@ namespace generic * ) * { * bool result = false; - * if( rng >= float_X( 0.5 ) ) + * if( rng() >= float_X( 0.5 ) ) * result = true; * return result; * } From e67f8d4d8c745594a08796d7f0eaae6be977c537 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Tue, 4 Dec 2018 12:19:42 +0100 Subject: [PATCH 07/40] Docs: Source Intro Details Make the intro to building from source clearer. Some users confuse it with the actual install that follows later on. --- docs/source/install/instructions/source.rst | 30 ++++++++++++++++----- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/docs/source/install/instructions/source.rst b/docs/source/install/instructions/source.rst index 64f0ac9114..f34db410a0 100644 --- a/docs/source/install/instructions/source.rst +++ b/docs/source/install/instructions/source.rst @@ -15,10 +15,10 @@ From Source Don't be afraid young physicist, self-compiling C/C++ projects is easy, fun and profitable! -Compiling a project from source essentially requires three steps: +Building a project from source essentially requires three steps: #. configure the project and find its dependencies - #. build the project + #. compile the project #.
install the project All of the above steps can be performed without administrative rights ("root" or "superuser") as long as the install is not targeted at a system directory (such as ``/usr``) but inside a user-writable directory (such as ``$HOME`` or a project directory). @@ -40,11 +40,19 @@ In order to compile projects from source, we assume you have individual director Note that on some supercomputing systems, you might need to install the final software outside of your home to make dependencies available during run-time (when the simulation runs). Use a different path for the last directory then. -Step-by-Step -^^^^^^^^^^^^ +What is Compiling? +^^^^^^^^^^^^^^^^^^ + +.. note:: + + This section is **not** yet the installation of PIConGPU from source. + It just introduces in general how one compiles projects. + + If you like to skip this introduction, :ref:`jump straight to the dependency install section `. Compling can differ in two principle ways: building *inside* the source directory ("in-source") and in a *temporary directory* ("out-of-source"). Modern projects prefer the latter and use a build system such as [CMake]_. + An example could look like this .. code-block:: bash @@ -62,7 +70,7 @@ Often, you want to pass further options to CMake with ``-DOPTION=VALUE`` or modi The second step which compiles the project can in many cases be parallelized by ``make -j``. In the final install step, you might need to prefix it with ``sudo`` in case ``CMAKE_INSTALL_PREFIX`` is pointing to a system directory. -Some older projects still build *in-source* and use a build system called *autotools*. +Some older projects often build *in-source* and use a build system called *autotools*. The syntax is still very similar: .. code-block:: bash @@ -75,8 +83,16 @@ The syntax is still very similar: make make install -That's all! -Continue with the following section to build our dependencies. 
+One can usually pass further options with ``--with-something=VALUE`` or ``--enable-thing`` to ``configure``. +See ``configure --help`` when installing an *autotools* project. + +That is all on the theory of building projects from source! + +Now Start +^^^^^^^^^ + +You now know all the basics to install from source. +Continue with the following section to :ref:`build our dependencies `. References ^^^^^^^^^^ From aa7709b48fa6bd4246484fc9fca21bd216d753e0 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Tue, 4 Dec 2018 14:11:37 +0100 Subject: [PATCH 08/40] Docs: Install Blosc Add the documentation on installing c-blosc. We use it regularly with ADIOS now. Also adds a more detailed section on libpng. --- INSTALL.rst | 79 +++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 58 insertions(+), 21 deletions(-) diff --git a/INSTALL.rst b/INSTALL.rst index 86d843744d..dd5bb54bee 100644 --- a/INSTALL.rst +++ b/INSTALL.rst @@ -171,29 +171,44 @@ CUDA If you do not install the following libraries, you will not have the full amount of PIConGPU plugins. We recommend to install at least **pngwriter** and either **libSplash** (+ **HDF5**) or **ADIOS**.
+libpng +"""""" +- 1.2.9+ (requires *zlib*) +- *Debian/Ubuntu dependencies:* ``sudo apt-get install libpng-dev`` +- *Arch Linux dependencies:* ``sudo pacman --sync libpng`` +- *Spack:* ``spack install libpng`` +- *from source:* + + - ``mkdir -p ~/src ~/lib`` + - ``cd ~/src`` + - ``curl -Lo libpng-1.6.34.tar.gz ftp://ftp-osl.osuosl.org/pub/libpng/src/libpng16/libpng-1.6.34.tar.gz`` + - ``tar -xf libpng-1.6.34.tar.gz`` + - ``cd libpng-1.6.34`` + - ``CPPFLAGS=-I$HOME/lib/zlib/include LDFLAGS=-L$HOME/lib/zlib/lib ./configure --enable-static --enable-shared --prefix=$HOME/lib/libpng`` + - ``make`` + - ``make install`` +- *environment:* (assumes install from source in ``$HOME/lib/libpng``) + + - ``export PNG_ROOT=$HOME/lib/libpng`` + - ``export CMAKE_PREFIX_PATH=$PNG_ROOT:$CMAKE_PREFIX_PATH`` + - ``export LD_LIBRARY_PATH=$PNG_ROOT/lib:$LD_LIBRARY_PATH`` + pngwriter """"""""" -- 0.7.0+ +- 0.7.0+ (requires *libpng*, *zlib*, and optional *freetype*) - *Spack:* ``spack install pngwriter`` - *from source:* - - download from `github.com/pngwriter/pngwriter `_ - - Requires `libpng `_ - - - *Debian/Ubuntu:* ``sudo apt-get install libpng-dev`` - - *Arch Linux:* ``sudo pacman --sync libpng`` - - example: - - - ``mkdir -p ~/src ~/build ~/lib`` - - ``git clone https://github.com/pngwriter/pngwriter.git ~/src/pngwriter/`` - - ``cd ~/build`` - - ``cmake -DCMAKE_INSTALL_PREFIX=$HOME/lib/pngwriter ~/src/pngwriter`` - - ``make install`` + - ``mkdir -p ~/src ~/build ~/lib`` + - ``git clone https://github.com/pngwriter/pngwriter.git ~/src/pngwriter/`` + - ``cd ~/build`` + - ``cmake -DCMAKE_INSTALL_PREFIX=$HOME/lib/pngwriter ~/src/pngwriter`` + - ``make install`` - - *environment:* (assumes install from source in ``$HOME/lib/pngwriter``) +- *environment:* (assumes install from source in ``$HOME/lib/pngwriter``) - - ``export CMAKE_PREFIX_PATH=$HOME/lib/pngwriter:$CMAKE_PREFIX_PATH`` - - ``export LD_LIBRARY_PATH=$HOME/lib/pngwriter/lib:$LD_LIBRARY_PATH`` + - ``export 
CMAKE_PREFIX_PATH=$HOME/lib/pngwriter:$CMAKE_PREFIX_PATH`` + - ``export LD_LIBRARY_PATH=$HOME/lib/pngwriter/lib:$LD_LIBRARY_PATH`` libSplash """"""""" @@ -205,7 +220,7 @@ libSplash - ``mkdir -p ~/src ~/build ~/lib`` - ``git clone https://github.com/ComputationalRadiationPhysics/libSplash.git ~/src/splash/`` - - ``cd ~/build`` + - ``cd ~/build && rm -rf ../build/*`` - ``cmake -DCMAKE_INSTALL_PREFIX=$HOME/lib/splash -DSplash_USE_MPI=ON -DSplash_USE_PARALLEL=ON ~/src/splash`` - ``make install`` @@ -223,7 +238,7 @@ HDF5 - *Spack:* ``spack install hdf5~fortran`` - *from source:* - - ``mkdir -p ~/src ~/build ~/lib`` + - ``mkdir -p ~/src ~/lib`` - ``cd ~/src`` - download hdf5 source code from `release list of the HDF5 group `_, for example: @@ -263,9 +278,31 @@ png2gas - converts png files to hdf5 files that can be used as an input for a species initial density profiles - compile and install exactly as *splash2txt* above +c-blosc +""""""" +- general purpose compressor, used in ADIOS for in situ data reduction +- *Debian/Ubuntu:* ``sudo apt-get install libblosc-dev`` +- *Arch Linux:* ``sudo pacman --sync blosc`` +- *Spack:* ``spack install c-blosc`` +- *from source:* + + - ``mkdir -p ~/src ~/build ~/lib`` + - ``cd ~/src`` + - ``curl -Lo c-blosc-1.15.0.tar.gz https://github.com/Blosc/c-blosc/archive/v1.15.0.tar.gz`` + - ``tar -xzf c-blosc-1.15.0.tar.gz`` + - ``cd ~/build && rm -rf ../build/*`` + - ``cmake -DCMAKE_INSTALL_PREFIX=$HOME/lib/c-blosc -DPREFER_EXTERNAL_ZLIB=ON ~/src/c-blosc-1.15.0/`` + - ``make`` + - ``make install`` +- *environment:* (assumes install from source in ``$HOME/lib/c-blosc``) + + - ``export BLOSC_ROOT=$HOME/lib/c-blosc`` + - ``export CMAKE_PREFIX_PATH=$BLOSC_ROOT:$CMAKE_PREFIX_PATH`` + - ``export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH`` + ADIOS """"" -- 1.13.1+ (requires *MPI* and *zlib*) +- 1.13.1+ (requires *MPI*, *zlib* and *c-blosc*) - *Debian/Ubuntu:* ``sudo apt-get install libadios-dev libadios-bin`` - *Arch Linux* using an `AUR 
helper `_: ``pacaur --sync libadios`` - *Arch Linux* using the `AUR `_ manually: - ``git clone https://aur.archlinux.org/libadios.git`` - ``cd libadios`` - ``makepkg -sri`` - *Spack:* ``spack install adios`` - *from source:* - - ``mkdir -p ~/src ~/build ~/lib`` + - ``mkdir -p ~/src ~/lib`` - ``cd ~/src`` - ``curl -Lo adios-1.13.1.tar.gz http://users.nccs.gov/~pnorbert/adios-1.13.1.tar.gz`` - ``tar -xzf adios-1.13.1.tar.gz`` - ``cd adios-1.13.1`` - - ``CFLAGS="-fPIC" ./configure --enable-static --enable-shared --prefix=$HOME/lib/adios --with-mpi=$MPI_ROOT --with-zlib=/usr`` + - ``CFLAGS="-fPIC" ./configure --enable-static --enable-shared --prefix=$HOME/lib/adios --with-mpi=$MPI_ROOT --with-zlib=$HOME/lib/zlib --with-blosc=$HOME/lib/c-blosc`` - ``make`` - ``make install`` - *environment:* (assumes install from source in ``$HOME/lib/adios``) From 88080b7c6ef44dee874068c4b0b1974e72724c75 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ren=C3=A9=20Widera?= Date: Wed, 5 Dec 2018 12:57:35 +0100 Subject: [PATCH 09/40] fix particle creation if density zero fix #2823 In the case where the density functor returns a density of zero or negative, the functor which calculates the number of macro particles (called the `startPosition` functor) should not be called. This avoids that e.g. a user who provides a free functor implementation must handle those edge cases.
changes: - do not call `startPosition` functor if density is `<= 0.0` --- include/picongpu/particles/ParticlesInit.kernel | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/include/picongpu/particles/ParticlesInit.kernel b/include/picongpu/particles/ParticlesInit.kernel index 9fe1873610..21e4dc241d 100644 --- a/include/picongpu/particles/ParticlesInit.kernel +++ b/include/picongpu/particles/ParticlesInit.kernel @@ -229,7 +229,10 @@ namespace picongpu totalCellOffset ); - float_X const realParticlesPerCell = realDensity * CELL_VOLUME; + /** @bug volatile is required for CUDA 9.2 and sm_60 else the compiler will + * optimize out `if(realParticlesPerCell > 0.0_X)` later on. + */ + volatile float_X const realParticlesPerCell = realDensity * CELL_VOLUME; // create an independent position functor for each cell in the supercell positionFunctorCtx[ idx ] = positionFunctor( @@ -238,8 +241,9 @@ namespace picongpu WorkerCfg< cellsPerSupercell >{ linearIdx } ); - numParsPerCellCtx[ idx ] = - positionFunctorCtx[ idx ].template numberOfMacroParticles< ParticleType >( realParticlesPerCell ); + if(realParticlesPerCell > 0.0_X) + numParsPerCellCtx[ idx ] = + positionFunctorCtx[ idx ].template numberOfMacroParticles< ParticleType >( realParticlesPerCell ); if( numParsPerCellCtx[ idx ] > 0 ) nvidia::atomicAllExch( From b0c0a1fa9a3f29fecd688a45b8a1276ff83846da Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Tue, 11 Dec 2018 11:41:49 +0100 Subject: [PATCH 10/40] Slurm: Link stdout live Since the `stdout` files are cached and created live by Slurm during the run, we can just symlink `simOutput/output` onto it for scripts and spare the IO latency load of many small writes.
--- etc/picongpu/draco-mpcdf/general.tpl | 3 ++- etc/picongpu/hemera-hzdr/defq.tpl | 3 ++- etc/picongpu/hemera-hzdr/gpu.tpl | 3 ++- etc/picongpu/lawrencium-lbnl/fermi.tpl | 3 ++- etc/picongpu/lawrencium-lbnl/k20.tpl | 3 ++- etc/picongpu/pizdaint-cscs/large.tpl | 1 + etc/picongpu/pizdaint-cscs/normal.tpl | 1 + etc/picongpu/taurus-tud/k20x.tpl | 3 ++- etc/picongpu/taurus-tud/k80.tpl | 3 ++- etc/picongpu/taurus-tud/knl.tpl | 3 ++- 10 files changed, 18 insertions(+), 8 deletions(-) diff --git a/etc/picongpu/draco-mpcdf/general.tpl b/etc/picongpu/draco-mpcdf/general.tpl index 22dc02dc56..e0da9b6267 100644 --- a/etc/picongpu/draco-mpcdf/general.tpl +++ b/etc/picongpu/draco-mpcdf/general.tpl @@ -79,11 +79,12 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output #wait that all nodes see ouput folder sleep 1 # Run PIConGPU -srun -K1 !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output +srun -K1 !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams diff --git a/etc/picongpu/hemera-hzdr/defq.tpl b/etc/picongpu/hemera-hzdr/defq.tpl index 9444e2c628..650d22f88e 100644 --- a/etc/picongpu/hemera-hzdr/defq.tpl +++ b/etc/picongpu/hemera-hzdr/defq.tpl @@ -89,8 +89,9 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output if [ $? 
-eq 0 ] ; then # Run PIConGPU - mpiexec --bind-to none !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output + mpiexec --bind-to none !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams fi diff --git a/etc/picongpu/hemera-hzdr/gpu.tpl b/etc/picongpu/hemera-hzdr/gpu.tpl index 8ebd75e3af..e991eafcd0 100644 --- a/etc/picongpu/hemera-hzdr/gpu.tpl +++ b/etc/picongpu/hemera-hzdr/gpu.tpl @@ -90,6 +90,7 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then @@ -101,5 +102,5 @@ fi if [ $? -eq 0 ] ; then # Run PIConGPU - mpiexec !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output + mpiexec !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams fi diff --git a/etc/picongpu/lawrencium-lbnl/fermi.tpl b/etc/picongpu/lawrencium-lbnl/fermi.tpl index 9c4a6509e5..db26a13996 100644 --- a/etc/picongpu/lawrencium-lbnl/fermi.tpl +++ b/etc/picongpu/lawrencium-lbnl/fermi.tpl @@ -90,6 +90,7 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # openmpi/1.6.5 is not GPU aware and handles pinned memory correctly incorrectly # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 @@ -105,5 +106,5 @@ fi if [ $? 
-eq 0 ] ; then # Run PIConGPU - mpirun !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output + mpirun !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams fi diff --git a/etc/picongpu/lawrencium-lbnl/k20.tpl b/etc/picongpu/lawrencium-lbnl/k20.tpl index a73d08a093..86a69c0090 100644 --- a/etc/picongpu/lawrencium-lbnl/k20.tpl +++ b/etc/picongpu/lawrencium-lbnl/k20.tpl @@ -88,6 +88,7 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # openmpi/1.6.5 is not GPU aware and handles pinned memory correctly incorrectly # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 @@ -104,5 +105,5 @@ fi if [ $? -eq 0 ] ; then # Run PIConGPU - mpirun !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output + mpirun !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams fi diff --git a/etc/picongpu/pizdaint-cscs/large.tpl b/etc/picongpu/pizdaint-cscs/large.tpl index 37a715a098..f4f355f088 100644 --- a/etc/picongpu/pizdaint-cscs/large.tpl +++ b/etc/picongpu/pizdaint-cscs/large.tpl @@ -76,6 +76,7 @@ unset MODULES_NO_OUTPUT mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then diff --git a/etc/picongpu/pizdaint-cscs/normal.tpl b/etc/picongpu/pizdaint-cscs/normal.tpl index ee6eb5463c..0155757084 100644 --- a/etc/picongpu/pizdaint-cscs/normal.tpl +++ b/etc/picongpu/pizdaint-cscs/normal.tpl @@ -76,6 +76,7 @@ unset MODULES_NO_OUTPUT mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # the next three lines were recommended by Cray to avoid warnings export PMI_MMAP_SYNC_WAIT_TIME=300 diff --git a/etc/picongpu/taurus-tud/k20x.tpl b/etc/picongpu/taurus-tud/k20x.tpl index ed8c9ccbaa..3d52df6248 100644 --- a/etc/picongpu/taurus-tud/k20x.tpl +++ b/etc/picongpu/taurus-tud/k20x.tpl @@ -80,6 +80,7 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout 
output # we are not sure if the current bullxmpi/1.2.4.3 catches pinned memory correctly # support ticket [Ticket:2014052241001186] srun: mpi mca flags @@ -96,6 +97,6 @@ fi if [ $? -eq 0 ] ; then # Run PIConGPU - srun -K1 !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output + srun -K1 !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams fi diff --git a/etc/picongpu/taurus-tud/k80.tpl b/etc/picongpu/taurus-tud/k80.tpl index e700861647..6f10f555b9 100644 --- a/etc/picongpu/taurus-tud/k80.tpl +++ b/etc/picongpu/taurus-tud/k80.tpl @@ -80,6 +80,7 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # we are not sure if the current bullxmpi/1.2.4.3 catches pinned memory correctly # support ticket [Ticket:2014052241001186] srun: mpi mca flags @@ -96,6 +97,6 @@ fi if [ $? -eq 0 ] ; then # Run PIConGPU - srun -K1 !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output + srun -K1 !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams fi diff --git a/etc/picongpu/taurus-tud/knl.tpl b/etc/picongpu/taurus-tud/knl.tpl index c2d61d9873..6a6a00543e 100644 --- a/etc/picongpu/taurus-tud/knl.tpl +++ b/etc/picongpu/taurus-tud/knl.tpl @@ -80,6 +80,7 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +ln -s ../stdout output # Run PIConGPU -NUMA_HW_THREADS_PER_PHYSICAL_CORE=!TBG_hardwareThreadsPerCore mpiexec !TBG_dstPath/input/etc/picongpu/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output +NUMA_HW_THREADS_PER_PHYSICAL_CORE=!TBG_hardwareThreadsPerCore mpiexec !TBG_dstPath/input/etc/picongpu/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams From 0d6c333606917560f5600ebe7108c5a9af0af46b Mon Sep 17 00:00:00 2001 From: steindev Date: Thu, 23 Aug 2018 14:55:29 +0200 Subject: [PATCH 11/40] fix binomial current interpolation Use all 26 neighbours for averaging the current on the grid in order to damp radiation at the Nyquist 
frequency in all 3 spatial directions. --- .../Binomial/Binomial.hpp | 162 +++++++++++++++--- 1 file changed, 137 insertions(+), 25 deletions(-) diff --git a/include/picongpu/fields/currentInterpolation/Binomial/Binomial.hpp b/include/picongpu/fields/currentInterpolation/Binomial/Binomial.hpp index f21e8749ff..3d896eb52e 100644 --- a/include/picongpu/fields/currentInterpolation/Binomial/Binomial.hpp +++ b/include/picongpu/fields/currentInterpolation/Binomial/Binomial.hpp @@ -1,4 +1,4 @@ -/* Copyright 2015-2018 Axel Huebl, Benjamin Worpitz +/* Copyright 2015-2019 Axel Huebl, Benjamin Worpitz, Klaus Steiniger * * This file is part of PIConGPU. * @@ -29,10 +29,18 @@ namespace picongpu { namespace currentInterpolation { +namespace detail +{ + + template< uint32_t T_dim > + struct Binomial; + - struct Binomial + //! Specialization for 3D + template< > + struct Binomial< DIM3 > { - static constexpr uint32_t dim = simDim; + static constexpr uint32_t dim = DIM3; using LowerMargin = typename pmacc::math::CT::make_Int< dim, @@ -51,35 +59,139 @@ namespace currentInterpolation T_DataBoxJ const fieldJ ) { - DataSpace< dim > const self; using TypeJ = typename T_DataBoxJ::ValueType; + using DS = DataSpace< dim >; + + // weighting for original value, i.e. center element of a cell + constexpr float_X M = 8.0; + // weighting for nearest neighbours, i.e. cells sharing a face with the center cell + constexpr float_X S = 4.0; + // weighting for next to nearest neighbours, i.e. cells sharing an edge with the center cell + constexpr float_X D = 2.0; + // weighting for farthest neighbours, i.e. cells sharing a corner with the center cell + constexpr float_X T = 1.0; + + TypeJ averagedJ = + // sum far neighbours, i.e. 
corner elements, weighting T + T * ( + fieldJ( DS( -1, -1, -1 ) ) + fieldJ( DS( +1, -1, -1 ) ) + fieldJ( DS( -1, +1, -1 ) ) + fieldJ( DS( +1, +1, -1 ) ) + + fieldJ( DS( -1, -1, +1 ) ) + fieldJ( DS( +1, -1, +1 ) ) + fieldJ( DS( -1, +1, +1 ) ) + fieldJ( DS( +1, +1, +1 ) ) + ) + + // sum next to nearest neighbours, i.e. edge elements, weighting D + D * ( + fieldJ( DS( -1, -1, 0 ) ) + fieldJ( DS( +1, -1, 0 ) ) + fieldJ( DS( -1, +1, 0 ) ) + fieldJ( DS( +1, +1, 0 ) ) + + fieldJ( DS( -1, 0, -1 ) ) + fieldJ( DS( +1, 0, -1 ) ) + fieldJ( DS( -1, 0, +1 ) ) + fieldJ( DS( +1, 0, +1 ) ) + + fieldJ( DS( 0, -1, -1 ) ) + fieldJ( DS( 0, +1, -1 ) ) + fieldJ( DS( 0, -1, +1 ) ) + fieldJ( DS( 0, +1, +1 ) ) + ) + + // sum next neighbours, i.e. face elements, weighting S + S * ( + fieldJ( DS( -1, 0, 0 ) ) + fieldJ( DS( +1, 0, 0 ) ) + + fieldJ( DS( 0, -1, 0 ) ) + fieldJ( DS( 0, +1, 0 ) ) + + fieldJ( DS( 0, 0, -1 ) ) + fieldJ( DS( 0, 0, +1 ) ) + ) + + // add original value, i.e. center element, weighting M + M * ( + fieldJ( DS( 0, 0, 0 ) ) + ); + + /* calc average by normalizing weighted sum In 3D there are: + * - original value with weighting M + * - 6 nearest neighbours with weighting S + * - 12 next to nearest neighbours with weighting D + * - 8 farthest neighbours with weighting T + */ + constexpr float_X inverseDivisor = 1._X / ( M + 6._X * S + 12._X * D + 8._X * T ); + averagedJ *= inverseDivisor; + + constexpr float_X deltaT = DELTA_T; + *fieldE -= averagedJ * ( 1._X / EPS0 ) * deltaT; + } + }; + - /* 1 2 1 weighting for "left"(1x) "center"(2x) "right"(1x), - * see Pascal's triangle level N=2 */ - TypeJ dirSum( TypeJ::create( 0.0 ) ); - for( uint32_t d = 0; d < dim; ++d ) - { - DataSpace< dim > dw; - dw[d] = -1; - DataSpace< dim > up; - up[d] = 1; - TypeJ const dirDw = fieldJ( dw ) + fieldJ( self ); - TypeJ const dirUp = fieldJ( up ) + fieldJ( self ); - - /* each fieldJ component is added individually */ - dirSum += dirDw + dirUp; - } - - /* component-wise division by sum of all 
weightings, - * in the second order binomial filter these are 4 values per direction - * (1D: 4 values; 2D: 8 values; 3D: 12 values) + //! Specialization for 2D + template< > + struct Binomial< DIM2 > + { + static constexpr uint32_t dim = DIM2; + + using LowerMargin = typename pmacc::math::CT::make_Int< + dim, + 1 + >::type ; + using UpperMargin = LowerMargin; + + template< + typename T_DataBoxE, + typename T_DataBoxB, + typename T_DataBoxJ + > + HDINLINE void operator()( + T_DataBoxE fieldE, + T_DataBoxB const, + T_DataBoxJ const fieldJ + ) + { + using TypeJ = typename T_DataBoxJ::ValueType; + using DS = DataSpace< dim >; + + // weighting for original value, i.e. center element of a cell + constexpr float_X M = 4.0; + // weighting for nearest neighbours, i.e. cells sharing an edge with the center cell + constexpr float_X S = 2.0; + // weighting for next to nearest neighbours, i.e. cells sharing a corner with the center cell + constexpr float_X D = 1.0; + + TypeJ averagedJ = + // sum next to nearest neighbours, i.e. corner neighbors, weighting D + D * ( + fieldJ( DS( -1, -1 ) ) + fieldJ( DS( +1, -1 ) ) + + fieldJ( DS( -1, +1 ) ) + fieldJ( DS( +1, +1 ) ) + ) + + // sum next neighbours, i.e. edge neighbors, weighting S + S * ( + fieldJ( DS( -1, 0 ) ) + fieldJ( DS( +1, 0 ) ) + + fieldJ( DS( 0, -1 ) ) + fieldJ( DS( 0, +1 ) ) + ) + + // add original value, i.e. 
center cell, weighting M + M * ( + fieldJ( DS( 0, 0 ) ) + ); + + /* calc average by normalizing weighted sum + * In 2D there are: + * - original value with weighting M + * - 4 nearest neighbours with weighting S + * - 4 next to nearest neighbours with weighting D */ - TypeJ const filteredJ = dirSum / TypeJ::create( float_X( 4.0 ) * dim ); + constexpr float_X inverseDivisor = 1._X / ( M + 4._X * S + 4._X * D ); + averagedJ *= inverseDivisor; constexpr float_X deltaT = DELTA_T; - fieldE( self ) -= filteredJ * ( float_X( 1.0 ) / EPS0 ) * deltaT; + *fieldE -= averagedJ * ( 1._X / EPS0 ) * deltaT; } + }; + +} // namespace detail + + /** Smoothing the current density before passing it to the field solver + * + * This technique mitigates numerical Cherenkov effects and short wavelength + * instabilities as it effectively implements a low pass filter which + * damps high frequency noise (near the Nyquist frequency) in the + * current distribution. + * + * A description and a two-dimensional implementation of this filter + * is given in + * CK Birdsall, AB Langdon. Plasma Physics via Computer Simulation. Appendix C. Taylor & Francis, 2004. + * It is a 2D version of the commonly used one-dimensional three points filter with binomial coefficients + * + * The three-dimensional extension of the above two-dimensional smoothing scheme + * uses all 26 neighbors of a cell. 
+ */ + struct Binomial : public detail::Binomial< simDim > + { static pmacc::traits::StringProperty getStringProperties() { pmacc::traits::StringProperty propList( From 91b4b5fa34d06fdc6ea22225fd175d17d73f2756 Mon Sep 17 00:00:00 2001 From: PrometheusPi Date: Fri, 21 Dec 2018 12:56:36 +0100 Subject: [PATCH 12/40] fix picongpu command line flags --- docs/source/usage/plugins/radiation.rst | 32 ++++++++++++------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/docs/source/usage/plugins/radiation.rst b/docs/source/usage/plugins/radiation.rst index bc192770a3..113cc52f2b 100644 --- a/docs/source/usage/plugins/radiation.rst +++ b/docs/source/usage/plugins/radiation.rst @@ -246,30 +246,30 @@ For a specific (charged) species ```` e.g. ``e``, the radiation can be ========================================= ============================================================================================================================== Command line option Description ========================================= ============================================================================================================================== -``--radiation_.period`` Gives the number of time steps between which the radiation should be calculated. +``--_radiation.period`` Gives the number of time steps between which the radiation should be calculated. Default is ``0``, which means that the radiation in never calculated and therefor off. Using `1` calculates the radiation constantly. Any value ``>=2`` is currently producing nonsense. -``--radiation_.dump`` Period, after which the calculated radiation data should be dumped to the file system. +``--_radiation.dump`` Period, after which the calculated radiation data should be dumped to the file system. Default is ``0``, therefor never. In order to store the radiation data, a value `>=1` should be used. 
-``--radiation_.lastRadiation`` If set, the radiation spectra summed between the last and the current dump-time-step are stored. +``--_radiation.lastRadiation`` If set, the radiation spectra summed between the last and the current dump-time-step are stored. Used for a better evaluation of the temporal evolution of the emitted radiation. -``--radiation_.folderLastRad`` Name of the folder, in which the summed spectra for the simulation time between the last dump and the current dump are stored. +``--_radiation.folderLastRad`` Name of the folder, in which the summed spectra for the simulation time between the last dump and the current dump are stored. Default is ``lastRad``. -``--radiation_.totalRadiation`` If set the spectra summed from simulation start till current time step are stored. -``--radiation_.folderTotalRad`` Folder name in which the total radiation spectra, integrated from the beginning of the simulation, are stored. +``--_radiation.totalRadiation`` If set the spectra summed from simulation start till current time step are stored. +``--_radiation.folderTotalRad`` Folder name in which the total radiation spectra, integrated from the beginning of the simulation, are stored. Default ``totalRad``. -``--radiation_.start`` Time step, at which PIConGPU starts calculating the radiation. +``--_radiation.start`` Time step, at which PIConGPU starts calculating the radiation. Default is ``2`` in order to get enough history of the particles. -``--radiation_.end`` Time step, at which the radiation calculation should end. +``--_radiation.end`` Time step, at which the radiation calculation should end. Default: `0`(stops at end of simulation). -``--radiation_.omegaList`` In case the frequencies for the spectrum are coming from a list stored in a file, this gives the path to this list. +``--_radiation.omegaList`` In case the frequencies for the spectrum are coming from a list stored in a file, this gives the path to this list. Default: `_noPath_` throws an error. 
*This does not switch on the frequency calculation via list.* -``--radiation_.radPerGPU`` If set, each GPU additionally stores its own spectra without summing over the entire simulation area. +``--_radiation.radPerGPU`` If set, each GPU additionally stores its own spectra without summing over the entire simulation area. This allows for a localization of specific spectral features. -``--radiation_.folderRadPerGPU`` Name of the folder, where the GPU specific spectra are stored. +``--_radiation.folderRadPerGPU`` Name of the folder, where the GPU specific spectra are stored. Default: ``radPerGPU`` -``--radiation_.compression`` If set, the hdf5 output is compressed. +``--_radiation.compression`` If set, the hdf5 output is compressed. ========================================= ============================================================================================================================== Memory Complexity @@ -293,14 +293,14 @@ Depending on the command line options used, there are different output files. ======================================== ======================================================================================================================== Command line flag Output description ======================================== ======================================================================================================================== -``--radiation_.totalRadiation`` Contains *ASCII* files that have the total spectral intensity until the timestep specified by the filename. +``--_radiation.totalRadiation`` Contains *ASCII* files that have the total spectral intensity until the timestep specified by the filename. Each row gives data for one observation direction (same order as specified in the ``observer.py``). The values for each frequency are separated by *tabs* and have the same order as specified in ``radiationConfig.param``. The spectral intensity is stored in the units **[J s]**. 
-``--radiation_.lastRadiation`` has the same format as the output of *totalRadiation*. +``--_radiation.lastRadiation`` has the same format as the output of *totalRadiation*. The spectral intensity is only summed over the last radiation `dump` period. -``--radiation_.radPerGPU`` Same output as *totalRadiation* but only summed over each GPU. - ecause each GPU specifies a spatial region, the origin of radiation signatures can be distinguished. +``--_radiation.radPerGPU`` Same output as *totalRadiation* but only summed over each GPU. + Because each GPU specifies a spatial region, the origin of radiation signatures can be distinguished. *radiationHDF5* In the folder ``radiationHDF5``, hdf5 files for each radiation dump and species are stored. These are complex amplitudes in units used by *PIConGPU*. These are for restart purposes and for more complex data analysis. From 800c61d28fac9fbe9a7158de1d17fbf3ccab9b26 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Wed, 2 Jan 2019 11:53:14 +0100 Subject: [PATCH 13/40] Update Versions Script: Containers Also update container docs on version update. 
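The container-recipe version bump relies on `sed` patterns of the form `[0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*`, matching a semantic version with an optional pre-release suffix. A minimal sanity check of that pattern, assuming GNU sed; the scratch file and sample tag below are hypothetical:

```shell
# Hypothetical scratch file exercising the container-tag rewrite done by
# newVersion.sh: the pattern matches "picongpu:<major>.<minor>.<patch>"
# plus an optional "-<suffix>" and replaces the tag with the new version.
VERSION_STR="0.4.3"
TESTFILE="$(mktemp)"
echo "FROM registry/picongpu:0.4.2-dev" > "$TESTFILE"
sed -i 's/'\
'\/picongpu:[0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*/'\
'\/picongpu:'$VERSION_STR'/g' \
    "$TESTFILE"
RESULT="$(cat "$TESTFILE")"
rm -f "$TESTFILE"
echo "$RESULT"   # FROM registry/picongpu:0.4.3
```

Note that the trailing `\(-.\+\)*` group is greedy, so anything after the dash on the same line is consumed as part of the suffix; the tag therefore has to be the last token on its line.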
--- src/tools/bin/newVersion.sh | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/src/tools/bin/newVersion.sh b/src/tools/bin/newVersion.sh index b181b9fcb5..392bdace13 100755 --- a/src/tools/bin/newVersion.sh +++ b/src/tools/bin/newVersion.sh @@ -117,7 +117,31 @@ sed -i "s/"\ "release = u'$VERSION_STR'/g" \ $REPO_DIR/docs/source/conf.py -# @todo picongpu.pc (future) +# containers +# share/picongpu/dockerfiles +sed -i 's/'\ +'\/picongpu:[0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*/'\ +'\/picongpu:'$VERSION_STR'/g' \ + $REPO_DIR/share/picongpu/dockerfiles/README.rst +sed -i 's/'\ +'--tag [0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*/'\ +'--tag '$VERSION_STR'/g' \ + $REPO_DIR/share/picongpu/dockerfiles/README.rst + +sed -i 's/'\ +'picongpu@[0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*/'\ +'picongpu@'$VERSION_STR'/g' \ + $REPO_DIR/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile + +sed -i 's/'\ +'\/picongpu:[0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*/'\ +'\/picongpu:'$VERSION_STR'/g' \ + $REPO_DIR/share/picongpu/dockerfiles/ubuntu-1604/Singularity +sed -i 's/'\ +'Version [0-9]\+\.[0-9]\+\.[0-9]\+\(-.\+\)*/'\ +'Version '$VERSION_STR'/g' \ + $REPO_DIR/share/picongpu/dockerfiles/ubuntu-1604/Singularity + # @todo `project(...)` version in CMakeLists.txt (future) From 84b8e8e1084f4f906f77d0f25066bd06b6d79d52 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 4 Jan 2019 11:47:18 +0100 Subject: [PATCH 14/40] Docker & Singularity Updates OpenMPI Vader fix: - instead of disabling vader altogether, disable the non-functioning copy-mechanism in Docker and allow OpenMPI 3+ usage OpenMPI fabrics with spack: - build all possible fabrics, do not rely on auto-detection which will never have IB (psm, psm2 and mxm do not build without actual HW drivers detected during OpenMPI configure time) OpenMPI parallel IO work-around: - crashes and data corruption in releases 2.0-4.0, fallback to ROMIO OpenMPI CUDA: - enable CUDA awareness IceT: - buggy CMake install path corrected in
package.py (spack mainline) Docker/Singularity: - fix access rights to example dirs for non-root users --- .../dockerfiles/ubuntu-1604/Dockerfile | 4 ++++ .../dockerfiles/ubuntu-1604/modules.yaml | 19 ++++++------------- .../dockerfiles/ubuntu-1604/packages.yaml | 3 ++- 3 files changed, 12 insertions(+), 14 deletions(-) diff --git a/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile b/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile index e5b013630b..e818cb4183 100644 --- a/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile +++ b/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile @@ -89,6 +89,10 @@ RUN /bin/bash -l -c ' \ pic-build -b "cuda:30;35;37;50;60;70" -c'-DCUDAMEMTEST_ENABLE=OFF' && \ rm -rf .build' +# make input directories readable and files executable for all users +RUN chmod a+x /opt/picInputs/*/bin/* && \ + chmod a+r -R /opt/picInputs/* && \ + find /opt/picInputs -type d -exec chmod a+rx {} \; COPY start_lwfa.sh /usr/bin/lwfa COPY start_lwfa_4.sh /usr/bin/lwfa4 diff --git a/share/picongpu/dockerfiles/ubuntu-1604/modules.yaml b/share/picongpu/dockerfiles/ubuntu-1604/modules.yaml index 8c937cae83..75542577a9 100644 --- a/share/picongpu/dockerfiles/ubuntu-1604/modules.yaml +++ b/share/picongpu/dockerfiles/ubuntu-1604/modules.yaml @@ -2,25 +2,18 @@ modules: enable:: - tcl tcl: - # Note on OpenMPI in Docker - # We should be able to use the latest MPI with - # `OMPI_MCA_btl_vader_single_copy_mechanism=none` - # to avoid disabling vader alltogether: - # https://github.com/open-mpi/ompi/issues/4948#issuecomment-377341406 + # vader in docker: https://github.com/open-mpi/ompi/issues/4948 + # ompio bugs: https://github.com/open-mpi/ompi/issues/6285 openmpi: environment: set: - OMPI_MCA_mpi_leave_pinned: '0' - OMPI_MCA_btl: '^vader' + OMPI_MCA_btl_vader_single_copy_mechanism: 'none' + OMPI_MCA_io: '^ompio' # This anonymous spec selects any package that # depends on openmpi. The double colon at the # end clears the set of rules that matched so far. 
^openmpi:: environment: set: - OMPI_MCA_mpi_leave_pinned: '0' - OMPI_MCA_btl: '^vader' - icet: - environment: - prepend_path: - CMAKE_PREFIX_PATH: '${PREFIX}/lib' + OMPI_MCA_btl_vader_single_copy_mechanism: 'none' + OMPI_MCA_io: '^ompio' diff --git a/share/picongpu/dockerfiles/ubuntu-1604/packages.yaml b/share/picongpu/dockerfiles/ubuntu-1604/packages.yaml index a5fb2c736d..8afa92d782 100644 --- a/share/picongpu/dockerfiles/ubuntu-1604/packages.yaml +++ b/share/picongpu/dockerfiles/ubuntu-1604/packages.yaml @@ -12,7 +12,8 @@ packages: python@2.7.12%gcc@5.4.0 arch=linux-ubuntu16-x86_64: /usr buildable: False openmpi: - version: [2.1.2] + version: [3.1.3] + variants: +cuda fabrics=verbs,ucx,libfabric all: providers: mpi: [openmpi] From b8f7e764d849ca3160b749719ab9073fb4b823ff Mon Sep 17 00:00:00 2001 From: Adam Simpson Date: Sun, 27 Jan 2019 12:07:59 -0800 Subject: [PATCH 15/40] Create ENTRYPOINT wrapper that forces login shell --- share/picongpu/dockerfiles/ubuntu-1604/Dockerfile | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile b/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile index e818cb4183..088a3f1f60 100644 --- a/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile +++ b/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile @@ -62,14 +62,20 @@ RUN /bin/echo -e "source $SPACK_ROOT/share/spack/setup-env.sh\n" \ "spack load $PIC_PACKAGE\n" \ 'if [ $(id -u) -eq 0 ]; then\n' \ ' function mpirun { $(which mpirun) --allow-run-as-root $@; }\n' \ + ' export -f mpirun\n' \ 'fi\n' \ - 'export -f mpirun\n' \ 'if [ $(id -u) -eq 0 ]; then\n' \ ' function mpiexec { $(which mpiexec) --allow-run-as-root $@; }\n' \ + ' export -f mpiexec\n' \ 'fi\n' \ - 'export -f mpiexec\n' \ > /etc/profile.d/picongpu.sh +# force the use of a login shell +RUN /bin/echo -e '#!/bin/bash -l\n' \ + 'exec "$@"\n' \ + > /etc/entrypoint.sh +RUN chmod a+x /etc/entrypoint.sh + # build example for out-of-the-box usage: LWFA RUN
/bin/bash -l -c ' \ pic-create $PICSRC/share/picongpu/examples/LaserWakefield /opt/picInputs/lwfa && \ @@ -105,4 +111,6 @@ COPY start_khi_4.sh /usr/bin/bench4 COPY start_khi_8.sh /usr/bin/bench8 COPY start_foil_4.sh /usr/bin/foil4 COPY start_foil_8.sh /usr/bin/foil8 -CMD /bin/bash -l + +ENTRYPOINT ["/etc/entrypoint.sh"] +CMD ["/bin/bash"] From ed57ee83847037ac1c0acea5be9fcbb63684a16d Mon Sep 17 00:00:00 2001 From: Alexander Debus Date: Thu, 17 Jan 2019 15:33:33 +0100 Subject: [PATCH 16/40] Add templates for V100 GPUs on Power9-nodes on Taurus. --- etc/picongpu/taurus-tud/V100.tpl | 113 +++++++++++ .../taurus-tud/V100_picongpu.profile.example | 80 ++++++++ etc/picongpu/taurus-tud/V100_restart.tpl | 183 ++++++++++++++++++ 3 files changed, 376 insertions(+) create mode 100644 etc/picongpu/taurus-tud/V100.tpl create mode 100644 etc/picongpu/taurus-tud/V100_picongpu.profile.example create mode 100644 etc/picongpu/taurus-tud/V100_restart.tpl diff --git a/etc/picongpu/taurus-tud/V100.tpl b/etc/picongpu/taurus-tud/V100.tpl new file mode 100644 index 0000000000..d7a105a761 --- /dev/null +++ b/etc/picongpu/taurus-tud/V100.tpl @@ -0,0 +1,113 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Alexander Debus +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see . 
+# + + +# PIConGPU batch script for taurus' SLURM batch system + +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --mincpus=!TBG_mpiTasksPerNode +#SBATCH --cpus-per-task=!TBG_coresPerGPU +#SBATCH --mem-per-cpu=1511 +#SBATCH --gres=gpu:!TBG_gpusPerNode +# send me mails on BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="ml" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"ALL"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# 6 gpus per node +.TBG_gpusPerNode=`if [ $TBG_tasks -gt 6 ] ; then echo 6; else echo $TBG_tasks; fi` + +# number of cores to block per GPU - we got 6 cpus per gpu +# and we will be accounted 6 CPUs per GPU anyway +.TBG_coresPerGPU=28 + +# We only start 1 MPI task per GPU +.TBG_mpiTasksPerNode="$(( TBG_gpusPerNode * 1 ))" + +# use ceil to calculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_gpusPerNode -1 ) / TBG_gpusPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" + exit 1 +fi +unset MODULES_NO_OUTPUT + +# set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +# Due to missing SLURM integration of the current MPI libraries +# we have to create a suitable machinefile. 
+rm -f machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt + +mkdir simOutput 2> /dev/null +cd simOutput + +# we are not sure if the current bullxmpi/1.2.4.3 catches pinned memory correctly +# support ticket [Ticket:2014052241001186] srun: mpi mca flags +# see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 +export OMPI_MCA_mpi_leave_pinned=0 + +# test if cuda_memtest binary is available +if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then + # Run CUDA memtest to check GPU's health + mpiexec -hostfile ../machinefile.txt !TBG_dstPath/input/bin/cuda_memtest.sh +else + echo "no binary 'cuda_memtest' available, skip GPU memory test" >&2 +fi + +if [ $? -eq 0 ] ; then + # Run PIConGPU + mpiexec -hostfile ../machinefile.txt !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output +fi + diff --git a/etc/picongpu/taurus-tud/V100_picongpu.profile.example b/etc/picongpu/taurus-tud/V100_picongpu.profile.example new file mode 100644 index 0000000000..0be1d49d12 --- /dev/null +++ b/etc/picongpu/taurus-tud/V100_picongpu.profile.example @@ -0,0 +1,80 @@ +# Name and Path of this Script ############################### (DO NOT change!) 
+export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="ALL" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# Modules ##################################################################### +# +module purge +module load modenv/ml +# similar to foss/2018a, but also includes SpectrumMPI, which basically is an OpenMPI fork by IBM +module load gsolf/2018a +module load GCC/6.4.0-2.28 +module load CMake/3.10.2-GCCcore-6.4.0 +# CUDA is not available as a module in the current environment!
+#module load CUDA/9.2.88 # gcc <= 7, intel 15-17 +# OpenMPI is already loaded +#module load OpenMPI/2.1.2-GCC-6.4.0-2.28 +module load git/2.18.0-GCCcore-6.4.0 +module load zlib/1.2.11-GCCcore-6.4.0 + +# Self-Build Software ######################################################### +# +# needs to be compiled by the user +export PIC_LIBS="/scratch/p_electron/debus/power9/lib" +export BOOST_ROOT=$PIC_LIBS/boost-1.69.0-Power9 +export PNG_ROOT=$PIC_LIBS/libpng-1.6.34-Power9 +export PNGwriter_DIR=$PIC_LIBS/pngwriter-0.7.0-Power9 +export ADIOS_ROOT=$PIC_LIBS/adios-1.13.1-Power9 +export Splash_DIR=$PIC_LIBS/splash-Power9 +export CMAKE_PREFIX_PATH=$Splash_DIR:$CMAKE_PREFIX_PATH +export HDF5_ROOT=$PIC_LIBS/hdf5-Power9 +export BLOSC_ROOT=$PIC_LIBS/blosc-1.12.1-Power9 + +export LD_LIBRARY_PATH=$BOOST_ROOT/lib:$LD_LIBRARY_PATH +export LIBRARY_PATH=$BOOST_ROOT/lib:$LIBRARY_PATH +export LD_LIBRARY_PATH=$PNG_ROOT/lib:$LD_LIBRARY_PATH +export LD_LIBRARY_PATH=$PNGwriter_DIR/lib:$LD_LIBRARY_PATH +export LD_LIBRARY_PATH=$ADIOS_ROOT/lib:$LD_LIBRARY_PATH +export LD_LIBRARY_PATH=$Splash_DIR/lib:$LD_LIBRARY_PATH +export LD_LIBRARY_PATH=$HDF5_ROOT/lib:$LD_LIBRARY_PATH +export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH + +export PATH=$PNG_ROOT/bin:$PATH +export PATH=$ADIOS_ROOT/bin:$PATH + +export CMAKE_PREFIX_PATH=$PNG_ROOT:$CMAKE_PREFIX_PATH + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="cuda:60" + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +# python not included yet +#export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# This is necessary in order to make alpaka compile. +# The workaround is from Axel Huebl according to alpaka PR #702. 
+export CXXFLAGS="-Dlinux" + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "gpu2" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/taurus-tud/V100.tpl" + +alias getNode='srun -p ml --gres=gpu:6 -n 6 --pty --mem-per-cpu=10000 -t 2:00:00 bash' + diff --git a/etc/picongpu/taurus-tud/V100_restart.tpl b/etc/picongpu/taurus-tud/V100_restart.tpl new file mode 100644 index 0000000000..316a1f80ab --- /dev/null +++ b/etc/picongpu/taurus-tud/V100_restart.tpl @@ -0,0 +1,183 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Alexander Debus +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see . +# + + +# PIConGPU batch script for taurus' SLURM batch system + +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --mincpus=!TBG_mpiTasksPerNode +#SBATCH --cpus-per-task=!TBG_coresPerGPU +# Maximum memory setting the SLURM queue "ml" accepts. 
+#SBATCH --mem-per-cpu=1511 +#SBATCH --gres=gpu:!TBG_gpusPerNode +# send me mails on BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="ml" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"ALL"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# 6 gpus per node +.TBG_gpusPerNode=`if [ $TBG_tasks -gt 6 ] ; then echo 6; else echo $TBG_tasks; fi` + +# number of cores to block per GPU - we got 6 cpus per gpu +# and we will be accounted 6 CPUs per GPU anyway +.TBG_coresPerGPU=28 + +# We only start 1 MPI task per GPU +.TBG_mpiTasksPerNode="$(( TBG_gpusPerNode * 1 ))" + +# use ceil to calculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_gpusPerNode -1 ) / TBG_gpusPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" + exit 1 +fi +unset MODULES_NO_OUTPUT + +# set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +# Due to missing SLURM integration of the current MPI libraries +# we have to create a suitable machinefile. 
+rm -f machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt + +mkdir simOutput 2> /dev/null +cd simOutput + +# we are not sure if the current bullxmpi/1.2.4.3 catches pinned memory correctly +# support ticket [Ticket:2014052241001186] srun: mpi mca flags +# see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 +export OMPI_MCA_mpi_leave_pinned=0 + +sleep 1 + +echo "----- automated restart routine -----" | tee -a output + +#check whether last checkpoint is valid +file="" +# ADIOS restart files take precedence over HDF5 files +fileEnding="h5" +hasADIOS=$(ls ./checkpoints/checkpoint_*.bp 2>/dev/null | wc -w) +if [ $hasADIOS -gt 0 ] +then + fileEnding="bp" +fi + +for file in `ls -t ./checkpoints/checkpoint_*.$fileEnding` +do + echo -n "validate checkpoint $file: " | tee -a output + $fileEnding"ls" $file &> /dev/null + if [ $? 
-eq 0 ] + then + echo "OK" | tee -a output + break + else + echo "FAILED" | tee -a output + file="" + fi +done + +#this sed call extracts the final simulation step from the cfg (assuming a standard cfg) +finalStep=`echo !TBG_programParams | sed 's/.*-s[[:blank:]]\+\([0-9]\+[^\s]\).*/\1/'` +echo "final step = " $finalStep | tee -a output +#this sed call extracts the -s and --checkpoint flags +programParams=`echo !TBG_programParams | sed 's/-s[[:blank:]]\+[0-9]\+[^\s]//g' | sed 's/--checkpoint\.period[[:blank:]]\+[0-9,:,\,]\+[^\s]//g'` +#extract restart period +restartPeriod=`echo !TBG_programParams | sed 's/.*--checkpoint\.period[[:blank:]]\+\([0-9,:,\,]\+[^\s]\).*/\1/'` +echo "restart period = " $restartPeriod | tee -a output + + +# ******************************************* # +# need some magic, if the restart period is in new notation with the ':' and ',' + +currentStep=`basename $file | sed 's/checkpoint_//g' | sed 's/.'$fileEnding'//g'` +nextStep=$(nextstep_from_period.sh $restartPeriod $finalStep $currentStep) + +if [ -z "$file" ]; then + stepSetup="-s $nextStep --checkpoint.period $restartPeriod" +else + stepSetup="-s $nextStep --checkpoint.period $restartPeriod --checkpoint.restart --checkpoint.restart.step $currentStep" +fi + +# ******************************************* # + +echo "--- end automated restart routine ---" | tee -a output + +#wait that all nodes see output folder +sleep 1 + +# test if cuda_memtest binary is available +if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then + # Run CUDA memtest to check GPU's health + mpiexec -hostfile ../machinefile.txt !TBG_dstPath/input/bin/cuda_memtest.sh +else + echo "no binary 'cuda_memtest' available, skip GPU memory test" >&2 +fi + +if [ $? 
-eq 0 ] ; then + # Run PIConGPU + mpiexec -hostfile ../machinefile.txt !TBG_dstPath/input/bin/picongpu $stepSetup !TBG_author !TBG_programParams | tee output +fi + +mpiexec -hostfile ../machinefile.txt /usr/bin/env bash -c "killall -9 picongpu 2>/dev/null || true" + +if [ $nextStep -lt $finalStep ] +then + ssh tauruslogin6 "/usr/bin/sbatch !TBG_dstPath/tbg/submit.start" + if [ $? -ne 0 ] ; then + echo "error during job submission" | tee -a output + else + echo "job submitted" | tee -a output + fi +fi + From 3e70c429f28545d76206377d97b1c23b5722fe42 Mon Sep 17 00:00:00 2001 From: Alexander Debus Date: Mon, 21 Jan 2019 13:19:31 +0100 Subject: [PATCH 17/40] Included reviewers' suggestions. --- etc/picongpu/taurus-tud/V100.tpl | 13 +++++++------ .../taurus-tud/V100_picongpu.profile.example | 2 +- etc/picongpu/taurus-tud/V100_restart.tpl | 13 +++++++------ 3 files changed, 15 insertions(+), 13 deletions(-) diff --git a/etc/picongpu/taurus-tud/V100.tpl b/etc/picongpu/taurus-tud/V100.tpl index d7a105a761..80ce196a99 100644 --- a/etc/picongpu/taurus-tud/V100.tpl +++ b/etc/picongpu/taurus-tud/V100.tpl @@ -83,12 +83,10 @@ umask 0027 # Due to missing SLURM integration of the current MPI libraries # we have to create a suitable machinefile. 
rm -f machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +for i in `seq !TBG_gpusPerNode` +do + scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +done mkdir simOutput 2> /dev/null cd simOutput @@ -97,6 +95,9 @@ cd simOutput # support ticket [Ticket:2014052241001186] srun: mpi mca flags # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 export OMPI_MCA_mpi_leave_pinned=0 +# Use ROMIO for IO +# according to ComputationalRadiationPhysics/picongpu#2857 +export OMPI_MCA_io=^ompio # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then diff --git a/etc/picongpu/taurus-tud/V100_picongpu.profile.example b/etc/picongpu/taurus-tud/V100_picongpu.profile.example index 0be1d49d12..af0ca03dd7 100644 --- a/etc/picongpu/taurus-tud/V100_picongpu.profile.example +++ b/etc/picongpu/taurus-tud/V100_picongpu.profile.example @@ -31,7 +31,7 @@ module load zlib/1.2.11-GCCcore-6.4.0 # Self-Build Software ######################################################### # # needs to be compiled by the user -export PIC_LIBS="/scratch/p_electron/debus/power9/lib" +export PIC_LIBS="$HOME/lib" export BOOST_ROOT=$PIC_LIBS/boost-1.69.0-Power9 export PNG_ROOT=$PIC_LIBS/libpng-1.6.34-Power9 export PNGwriter_DIR=$PIC_LIBS/pngwriter-0.7.0-Power9 diff --git a/etc/picongpu/taurus-tud/V100_restart.tpl b/etc/picongpu/taurus-tud/V100_restart.tpl index 316a1f80ab..0d534c5bb2 100644 --- a/etc/picongpu/taurus-tud/V100_restart.tpl +++ b/etc/picongpu/taurus-tud/V100_restart.tpl @@ -84,12 +84,10 @@ umask 0027 # Due to missing SLURM integration of the current MPI libraries # we have to create a 
suitable machinefile. rm -f machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt -scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +for i in `seq !TBG_gpusPerNode` +do + scontrol show hostnames $SLURM_JOB_NODELIST >> machinefile.txt +done mkdir simOutput 2> /dev/null cd simOutput @@ -98,6 +96,9 @@ cd simOutput # support ticket [Ticket:2014052241001186] srun: mpi mca flags # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 export OMPI_MCA_mpi_leave_pinned=0 +# Use ROMIO for IO +# according to ComputationalRadiationPhysics/picongpu#2857 +export OMPI_MCA_io=^ompio sleep 1 From 4c20e34b1dfa704d78594f92e89defdf95ca95d6 Mon Sep 17 00:00:00 2001 From: Alexander Debus Date: Mon, 21 Jan 2019 15:37:13 +0100 Subject: [PATCH 18/40] Include new ml-partition on Taurus Documents new `ml`-queue from #2856 . --- docs/source/install/profile.rst | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst index 199bc9d571..f1f0d6502f 100644 --- a/docs/source/install/profile.rst +++ b/docs/source/install/profile.rst @@ -161,6 +161,28 @@ For this profile, you additionally need to install your own :ref:`boost `, :ref:`HDF5 `, c-blosc and :ref:`ADIOS `. + +Install script for `c-blosc` +``` +cd $SOURCE_DIR +git clone -b v1.12.1 https://github.com/Blosc/c-blosc.git \ + $SOURCE_DIR/c-blosc +mkdir c-blosc-build +cd c-blosc-build +cmake -DCMAKE_INSTALL_PREFIX=$BLOSC_ROOT \ + -DPREFER_EXTERNAL_ZLIB=ON \ + $SOURCE_DIR/c-blosc +make -j4 +make install +``` + +.. 
literalinclude:: profiles/taurus-tud/V100_picongpu.profile.example
+   :language: bash
 
 Lawrencium (LBNL)
 -----------------

From 4fbab4563a93302ae4bcd0a9045f11d23111d29d Mon Sep 17 00:00:00 2001
From: Alexander Debus
Date: Mon, 21 Jan 2019 15:42:29 +0100
Subject: [PATCH 19/40] Updated code-styling

---
 docs/source/install/profile.rst | 25 +++++++++++++------------
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst
index f1f0d6502f..461aa1a10f 100644
--- a/docs/source/install/profile.rst
+++ b/docs/source/install/profile.rst
@@ -168,18 +168,19 @@ Queue: ml (NVIDIA V100 GPUs on Power9 nodes)
 For this profile, you additionally need to compile and install everything for the power9-architecture including your own :ref:`boost `, :ref:`HDF5 `, c-blosc and :ref:`ADIOS `.
 
 Install script for `c-blosc`
-```
-cd $SOURCE_DIR
-git clone -b v1.12.1 https://github.com/Blosc/c-blosc.git \
-    $SOURCE_DIR/c-blosc
-mkdir c-blosc-build
-cd c-blosc-build
-cmake -DCMAKE_INSTALL_PREFIX=$BLOSC_ROOT \
-      -DPREFER_EXTERNAL_ZLIB=ON \
-      $SOURCE_DIR/c-blosc
-make -j4
-make install
-```
+
+.. code-block:: bash
+
+   cd $SOURCE_DIR
+   git clone -b v1.12.1 https://github.com/Blosc/c-blosc.git \
+       $SOURCE_DIR/c-blosc
+   mkdir c-blosc-build
+   cd c-blosc-build
+   cmake -DCMAKE_INSTALL_PREFIX=$BLOSC_ROOT \
+         -DPREFER_EXTERNAL_ZLIB=ON \
+         $SOURCE_DIR/c-blosc
+   make -j4
+   make install
 
 .. literalinclude:: profiles/taurus-tud/V100_picongpu.profile.example
    :language: bash

From b291a016bfb84be7b20f38c5d951935cf69e872b Mon Sep 17 00:00:00 2001
From: Axel Huebl
Date: Mon, 21 Jan 2019 11:47:41 +0100
Subject: [PATCH 20/40] OpenMPI: Use ROMIO for IO

OpenMPI's default for its IO backend is, starting with 2.x, OMPIO.
Unfortunately, that backend contains severe bugs leading to sporadic
crashes and data corruption.

For all system templates that rely on OpenMPI, disable the "new" default
backend and fall back to the existing ROMIO backend for MPI-I/O.
Other MPI implementations such as MPICH, and MPICH-based flavors such as
IntelMPI, use ROMIO by default (they develop ROMIO) and are not affected.
---
 etc/picongpu/bash/mpiexec.tpl             | 5 +++++
 etc/picongpu/bash/mpirun.tpl              | 5 +++++
 etc/picongpu/davide-cineca/gpu.tpl        | 5 +++++
 etc/picongpu/davinci-rice/picongpu.tpl    | 5 +++++
 etc/picongpu/hemera-hzdr/defq.tpl         | 5 +++++
 etc/picongpu/hemera-hzdr/gpu.tpl          | 5 +++++
 etc/picongpu/hydra-hzdr/default.tpl       | 5 +++++
 etc/picongpu/hypnos-hzdr/fermi.tpl        | 5 +++++
 etc/picongpu/hypnos-hzdr/k20.tpl          | 5 +++++
 etc/picongpu/hypnos-hzdr/k20_autoWait.tpl | 5 +++++
 etc/picongpu/hypnos-hzdr/k20_restart.tpl  | 5 +++++
 etc/picongpu/hypnos-hzdr/k20_vampir.tpl   | 5 +++++
 etc/picongpu/hypnos-hzdr/k20_wait.tpl     | 5 +++++
 etc/picongpu/hypnos-hzdr/k80.tpl          | 5 +++++
 etc/picongpu/hypnos-hzdr/k80_restart.tpl  | 5 +++++
 etc/picongpu/hypnos-hzdr/laser.tpl        | 5 +++++
 etc/picongpu/lawrencium-lbnl/fermi.tpl    | 5 +++++
 etc/picongpu/lawrencium-lbnl/k20.tpl      | 4 ++++
 etc/picongpu/taurus-tud/k20x.tpl          | 5 +++++
 etc/picongpu/taurus-tud/k80.tpl           | 5 +++++
 20 files changed, 99 insertions(+)

diff --git a/etc/picongpu/bash/mpiexec.tpl b/etc/picongpu/bash/mpiexec.tpl
index bbf8182501..2df28263ae 100644
--- a/etc/picongpu/bash/mpiexec.tpl
+++ b/etc/picongpu/bash/mpiexec.tpl
@@ -44,6 +44,11 @@ umask 0027
 mkdir simOutput 2> /dev/null
 cd simOutput
 
+# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then mpiexec -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/bash/mpirun.tpl b/etc/picongpu/bash/mpirun.tpl index 1973d43f5b..fa633449b5 100644 --- a/etc/picongpu/bash/mpirun.tpl +++ b/etc/picongpu/bash/mpirun.tpl @@ -44,6 +44,11 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then mpirun -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/davide-cineca/gpu.tpl b/etc/picongpu/davide-cineca/gpu.tpl index 4172dd8215..042c8fd0b4 100644 --- a/etc/picongpu/davide-cineca/gpu.tpl +++ b/etc/picongpu/davide-cineca/gpu.tpl @@ -94,6 +94,11 @@ umask 0027 mkdir simOutput 2> /dev/null cd simOutput +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then # Run CUDA memtest to check GPU's health diff --git a/etc/picongpu/davinci-rice/picongpu.tpl b/etc/picongpu/davinci-rice/picongpu.tpl index eee4c6f4d9..229a4f2075 100644 --- a/etc/picongpu/davinci-rice/picongpu.tpl +++ b/etc/picongpu/davinci-rice/picongpu.tpl @@ -72,6 +72,11 @@ unset MODULES_NO_OUTPUT mkdir simOutput 2> /dev/null cd simOutput +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then mpirun -n TBG_tasks --display-map -am tbg/openib.conf --mca mpi_leave_pinned 0 !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hemera-hzdr/defq.tpl b/etc/picongpu/hemera-hzdr/defq.tpl index 650d22f88e..e7e918ded8 100644 --- a/etc/picongpu/hemera-hzdr/defq.tpl +++ b/etc/picongpu/hemera-hzdr/defq.tpl @@ -91,6 +91,11 @@ mkdir simOutput 2> /dev/null cd simOutput ln -s ../stdout output +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + if [ $? 
-eq 0 ] ; then # Run PIConGPU mpiexec --bind-to none !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams diff --git a/etc/picongpu/hemera-hzdr/gpu.tpl b/etc/picongpu/hemera-hzdr/gpu.tpl index e991eafcd0..9989318be6 100644 --- a/etc/picongpu/hemera-hzdr/gpu.tpl +++ b/etc/picongpu/hemera-hzdr/gpu.tpl @@ -92,6 +92,11 @@ mkdir simOutput 2> /dev/null cd simOutput ln -s ../stdout output +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then # Run CUDA memtest to check GPU's health diff --git a/etc/picongpu/hydra-hzdr/default.tpl b/etc/picongpu/hydra-hzdr/default.tpl index b4a2668248..9006a0b5b6 100644 --- a/etc/picongpu/hydra-hzdr/default.tpl +++ b/etc/picongpu/hydra-hzdr/default.tpl @@ -78,6 +78,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + if [ $? 
-eq 0 ] ; then mpiexec --prefix $MPIHOME -x LIBRARY_PATH -tag-output --bind-to none --display-map -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output fi diff --git a/etc/picongpu/hypnos-hzdr/fermi.tpl b/etc/picongpu/hypnos-hzdr/fermi.tpl index 415bd7fc0b..2fd6eb1fd6 100644 --- a/etc/picongpu/hypnos-hzdr/fermi.tpl +++ b/etc/picongpu/hypnos-hzdr/fermi.tpl @@ -77,6 +77,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k20.tpl b/etc/picongpu/hypnos-hzdr/k20.tpl index bd79980364..c53ef403b5 100644 --- a/etc/picongpu/hypnos-hzdr/k20.tpl +++ b/etc/picongpu/hypnos-hzdr/k20.tpl @@ -76,6 +76,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k20_autoWait.tpl b/etc/picongpu/hypnos-hzdr/k20_autoWait.tpl index a1f67415bb..803ad4996d 100644 --- a/etc/picongpu/hypnos-hzdr/k20_autoWait.tpl +++ b/etc/picongpu/hypnos-hzdr/k20_autoWait.tpl @@ -81,6 +81,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k20_restart.tpl b/etc/picongpu/hypnos-hzdr/k20_restart.tpl index 8a30122f2e..0728fac0d4 100644 --- a/etc/picongpu/hypnos-hzdr/k20_restart.tpl +++ b/etc/picongpu/hypnos-hzdr/k20_restart.tpl @@ -126,6 +126,11 @@ echo "--- end automated restart routine ---" | tee -a output #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k20_vampir.tpl b/etc/picongpu/hypnos-hzdr/k20_vampir.tpl index 509ba9a2b0..8abe0fa773 100644 --- a/etc/picongpu/hypnos-hzdr/k20_vampir.tpl +++ b/etc/picongpu/hypnos-hzdr/k20_vampir.tpl @@ -91,6 +91,11 @@ cd simOutput # wait for all nodes to see the output folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k20_wait.tpl b/etc/picongpu/hypnos-hzdr/k20_wait.tpl index a7621da6e7..d553498a2d 100644 --- a/etc/picongpu/hypnos-hzdr/k20_wait.tpl +++ b/etc/picongpu/hypnos-hzdr/k20_wait.tpl @@ -78,6 +78,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k80.tpl b/etc/picongpu/hypnos-hzdr/k80.tpl index da014586d3..10f12acb0e 100644 --- a/etc/picongpu/hypnos-hzdr/k80.tpl +++ b/etc/picongpu/hypnos-hzdr/k80.tpl @@ -76,6 +76,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/k80_restart.tpl b/etc/picongpu/hypnos-hzdr/k80_restart.tpl index 3c976b7ecf..5958984bcc 100644 --- a/etc/picongpu/hypnos-hzdr/k80_restart.tpl +++ b/etc/picongpu/hypnos-hzdr/k80_restart.tpl @@ -130,6 +130,11 @@ echo "--- end automated restart routine ---" | tee -a output #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available and we have the node exclusive if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedGPUPerNode -eq !TBG_gpusPerNode ] ; then mpiexec --prefix $MPIHOME -tag-output --display-map -x LIBRARY_PATH -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/input/bin/cuda_memtest.sh diff --git a/etc/picongpu/hypnos-hzdr/laser.tpl b/etc/picongpu/hypnos-hzdr/laser.tpl index d46929a774..8ccec05033 100644 --- a/etc/picongpu/hypnos-hzdr/laser.tpl +++ b/etc/picongpu/hypnos-hzdr/laser.tpl @@ -73,6 +73,11 @@ cd simOutput #wait that all nodes see ouput folder sleep 1 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + if [ $? -eq 0 ] ; then mpiexec --prefix $MPIHOME -x LIBRARY_PATH -tag-output --display-map -am !TBG_dstPath/tbg/openib.conf --mca mpi_leave_pinned 0 -npernode !TBG_gpusPerNode -n !TBG_tasks !TBG_dstPath/tbg/cpuNumaStarter.sh !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams | tee output fi diff --git a/etc/picongpu/lawrencium-lbnl/fermi.tpl b/etc/picongpu/lawrencium-lbnl/fermi.tpl index db26a13996..1a4cf2f914 100644 --- a/etc/picongpu/lawrencium-lbnl/fermi.tpl +++ b/etc/picongpu/lawrencium-lbnl/fermi.tpl @@ -96,6 +96,11 @@ ln -s ../stdout output # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 export OMPI_MCA_mpi_leave_pinned=0 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then # Run CUDA memtest to check GPU's health diff --git a/etc/picongpu/lawrencium-lbnl/k20.tpl b/etc/picongpu/lawrencium-lbnl/k20.tpl index 86a69c0090..3b32ddce49 100644 --- a/etc/picongpu/lawrencium-lbnl/k20.tpl +++ b/etc/picongpu/lawrencium-lbnl/k20.tpl @@ -94,6 +94,10 @@ ln -s ../stdout output # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 export OMPI_MCA_mpi_leave_pinned=0 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then diff --git a/etc/picongpu/taurus-tud/k20x.tpl b/etc/picongpu/taurus-tud/k20x.tpl index 3d52df6248..e309fc2e43 100644 --- a/etc/picongpu/taurus-tud/k20x.tpl +++ b/etc/picongpu/taurus-tud/k20x.tpl @@ -87,6 +87,11 @@ ln -s ../stdout output # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 export OMPI_MCA_mpi_leave_pinned=0 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. 
+# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then # Run CUDA memtest to check GPU's health diff --git a/etc/picongpu/taurus-tud/k80.tpl b/etc/picongpu/taurus-tud/k80.tpl index 6f10f555b9..77c5de3a68 100644 --- a/etc/picongpu/taurus-tud/k80.tpl +++ b/etc/picongpu/taurus-tud/k80.tpl @@ -87,6 +87,11 @@ ln -s ../stdout output # see bug https://github.com/ComputationalRadiationPhysics/picongpu/pull/438 export OMPI_MCA_mpi_leave_pinned=0 +# The OMPIO backend in OpenMPI up to 3.1.3 and 4.0.0 is broken, use the +# fallback ROMIO backend instead. +# see bug https://github.com/open-mpi/ompi/issues/6285 +export OMPI_MCA_io=^ompio + # test if cuda_memtest binary is available if [ -f !TBG_dstPath/input/bin/cuda_memtest ] ; then # Run CUDA memtest to check GPU's health From 02b7a03773d984ab76c0b2e1805e42ae02ead908 Mon Sep 17 00:00:00 2001 From: Igor Andriyash Date: Sun, 27 Jan 2019 15:34:37 +0200 Subject: [PATCH 21/40] typo-fix related to #2864 --- include/picongpu/param/ionizer.param | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/picongpu/param/ionizer.param b/include/picongpu/param/ionizer.param index bfd8ac7268..f298caa4ef 100644 --- a/include/picongpu/param/ionizer.param +++ b/include/picongpu/param/ionizer.param @@ -233,7 +233,7 @@ namespace effectiveNuclearCharge ); /* Example: aluminium */ - PMACC_CONST_VECTOR(float_X, 13, Aluminum, + PMACC_CONST_VECTOR(float_X, 13, Aluminium, /* 3p^1 */ 4.066, /* 3s^2 */ From 3c6d7d4adae116f333a420f247c849a5626d1054 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Sun, 27 Jan 2019 15:07:57 +0100 Subject: [PATCH 22/40] Fix pyflakes compares Fix pyflakes: ``` use ==/!= to compare str, bytes, and int literals ``` --- src/tools/bin/smooth.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/tools/bin/smooth.py b/src/tools/bin/smooth.py index 9cd6ab5249..bf0a0b5cbd 
100644 --- a/src/tools/bin/smooth.py +++ b/src/tools/bin/smooth.py @@ -64,10 +64,10 @@ def makeOddNumber(number, larger=True): returns next odd number """ - if number % 2 is 1: + if number % 2 == 1: # in case number is odd return number - elif number % 2 is 0: + elif number % 2 == 0: # in case number is even if larger: return number + 1 From 35b4baf0a9e8e9b923dc1d230373308c3b497eeb Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Mon, 28 Jan 2019 13:13:48 +0100 Subject: [PATCH 23/40] Docs: Fix Title Linebreak Fix a missing linebreak in a plot. --- docs/source/models/field_ionization_effective_potentials.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/models/field_ionization_effective_potentials.py b/docs/source/models/field_ionization_effective_potentials.py index 46dccff90a..a4add2ddc3 100644 --- a/docs/source/models/field_ionization_effective_potentials.py +++ b/docs/source/models/field_ionization_effective_potentials.py @@ -78,7 +78,7 @@ def V_eff(x, Z_eff, F): plt.hlines(-E_CII, xmin, xmax) # add the legend and format the plot - plt.title(r"Effective atomic potentials of Carbon-II and Hydrogen in\n" + plt.title("Effective atomic potentials of Carbon-II and Hydrogen in\n" r"homogeneous electric field $F_\mathrm{BSI}$ (C-II)") plt.legend(loc="best") plt.text(xmin+1, -E_H+.05, r"$E_\mathrm{i}$ H") From 3f50f5967ef72513ad78e82c9bf51a857036661a Mon Sep 17 00:00:00 2001 From: Sergei Bastrakov Date: Mon, 28 Jan 2019 17:55:41 +0100 Subject: [PATCH 24/40] System: JURECA MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add new system documentation, profiles and tbg template for the JURECA cluster at JSC. So far the batch (CPU) and GPU queues are supported, KNL to be added soon. 
Co-authored-by: René Widera --- docs/source/install/profile.rst | 29 +++++ etc/picongpu/jureca-jsc/batch.tpl | 93 +++++++++++++++ .../jureca-jsc/batch_picongpu.profile.example | 109 +++++++++++++++++ etc/picongpu/jureca-jsc/gpus.tpl | 102 ++++++++++++++++ .../jureca-jsc/gpus_picongpu.profile.example | 112 ++++++++++++++++++ 5 files changed, 445 insertions(+) create mode 100644 etc/picongpu/jureca-jsc/batch.tpl create mode 100644 etc/picongpu/jureca-jsc/batch_picongpu.profile.example create mode 100644 etc/picongpu/jureca-jsc/gpus.tpl create mode 100644 etc/picongpu/jureca-jsc/gpus_picongpu.profile.example diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst index 461aa1a10f..72355fb4a7 100644 --- a/docs/source/install/profile.rst +++ b/docs/source/install/profile.rst @@ -230,3 +230,32 @@ Queue: dvd_usr_prod (Nvidia P100 GPUs) .. literalinclude:: profiles/davide-cineca/gpu_picongpu.profile.example :language: bash + +JURECA (JSC) +-------------------- + +**System overview:** `link `_ + +**User guide:** `link `_ + +**Production directory:** ``$SCRATCH`` (`link `_) + +For this profile to work, you need to download the :ref:`PIConGPU source code ` manually. + +Queue: batch (2 x Intel Xeon E5-2680 v3 CPUs, 12 Cores + 12 Hyperthreads/CPU) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. literalinclude:: profiles/jureca-jsc/batch_picongpu.profile.example + :language: bash + +Queue: gpus (2 x Nvidia Tesla K80 GPUs) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. literalinclude:: profiles/jureca-jsc/gpus_picongpu.profile.example + :language: bash + + Queue: booster (Intel Xeon Phi 7250-F, 68 cores + Hyperthreads) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
literalinclude:: profiles/jureca-jsc/booster_picongpu.profile.example + :language: bash diff --git a/etc/picongpu/jureca-jsc/batch.tpl b/etc/picongpu/jureca-jsc/batch.tpl new file mode 100644 index 0000000000..269c715894 --- /dev/null +++ b/etc/picongpu/jureca-jsc/batch.tpl @@ -0,0 +1,93 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Rene Widera, Sergei Bastrakov +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see . 
+# + + +# PIConGPU batch script for JURECA's SLURM batch system + +#SBATCH --account=!TBG_nameProject +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --ntasks-per-node=!TBG_devicesPerNode +#SBATCH --mem=!TBG_memPerNode +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="batch" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"NONE"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_nameProject=${proj:-""} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# number of available/hosted devices per node in the system +.TBG_numHostedDevicesPerNode=2 + +# required devices per node for the current job +.TBG_devicesPerNode=`if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi` + +# host memory per device +.TBG_memPerCPU="$((126000 / $TBG_devicesPerNode))" +# host memory per node +.TBG_memPerNode="$((TBG_memPerCPU * TBG_devicesPerNode))" + +# We only start 1 MPI task per device +.TBG_mpiTasksPerNode="$(( TBG_devicesPerNode * 1 ))" + +# use ceil to caculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" + exit 1 +fi +unset MODULES_NO_OUTPUT + +#set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +mkdir simOutput 2> /dev/null +cd simOutput +ln -s ../stdout output + +if [ $? 
-eq 0 ] ; then + # Run PIConGPU + export OMP_NUM_THREADS=24 + srun --cpu_bind=sockets !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams +fi diff --git a/etc/picongpu/jureca-jsc/batch_picongpu.profile.example b/etc/picongpu/jureca-jsc/batch_picongpu.profile.example new file mode 100644 index 0000000000..42436ac74c --- /dev/null +++ b/etc/picongpu/jureca-jsc/batch_picongpu.profile.example @@ -0,0 +1,109 @@ +# Name and Path of this Script ############################### (DO NOT change!) +export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: NONE, BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="NONE" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $5}') + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# General modules ############################################################# +# +module purge +module load Intel/2019.0.117-GCC-7.3.0 +module load CMake/3.13.0 +module load IntelMPI/2018.4.274 +module load Python/3.6.6 +module load Boost/1.68.0-Python-3.6.6 + +# Other Software ############################################################## +# +module load zlib/.1.2.11 +module load HDF5/1.10.1 +module load libpng/.1.6.35 +export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH + +PARTITION_LIB=/p/project/$proj/lib_batch +LIBSPLASH_ROOT=$PARTITION_LIB/libSplash +PNGWRITER_ROOT=$PARTITION_LIB/pngwriter +export 
CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH + +BLOSC_ROOT=$PARTITION_LIB/c-blosc +export CMAKE_PREFIX_PATH=$BLOSC_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH + +ADIOS_ROOT=$PARTITION_LIB/adios +export PATH=$ADIOS_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$ADIOS_ROOT:$CMAKE_PREFIX_PATH + +# Environment ################################################################# +# +#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BOOST_LIB + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="omp2b:haswell" + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +export CC=$(which icc) +export CXX=$(which icpc) + +export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# Location for simulation results, purged after 90 days +PROJECT_SCRATCH=SCRATCH_$proj +export SCRATCH=${!PROJECT_SCRATCH} + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "batch" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/jureca-jsc/batch.tpl" + +# allocate an interactive shell for one hour +# getNode 2 # allocates 2 interactive nodes (default: 1) +function getNode() { + if [ -z "$1" ] ; then + numNodes=1 + else + numNodes=$1 + fi + if [ $numNodes -gt 8 ] ; then + echo "The maximal number of interactive nodes is 8." 1>&2 + return 1 + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + export OMP_NUM_THREADS=24 + salloc --time=1:00:00 --nodes=$numNodes --ntasks-per-node=2 --mem=126000 -A $proj -p devel bash +} + +# allocate an interactive shell for one hour +# getDevice 2 # allocates 2 interactive devices (default: 1) +function getDevice() { + if [ -z "$1" ] ; then + numDevices=1 + else + if [ "$1" -gt 2 ] ; then + echo "The maximal number of devices per node is 2." 
1>&2 + return 1 + else + numDevices=$1 + fi + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + export OMP_NUM_THREADS=24 + salloc --time=1:00:00 --ntasks-per-node=$(($numDevices)) --mem=126000 -A $proj -p devel bash +} diff --git a/etc/picongpu/jureca-jsc/gpus.tpl b/etc/picongpu/jureca-jsc/gpus.tpl new file mode 100644 index 0000000000..a26a5ed44c --- /dev/null +++ b/etc/picongpu/jureca-jsc/gpus.tpl @@ -0,0 +1,102 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Rene Widera, Sergei Bastrakov +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see . 
+# + + +# PIConGPU batch script for JURECA's SLURM batch system + +#SBATCH --account=!TBG_nameProject +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --ntasks-per-node=!TBG_devicesPerNode +#SBATCH --mincpus=!TBG_mpiTasksPerNode +#SBATCH --mem=!TBG_memPerNode +#SBATCH --gres=gpu:!TBG_devicesPerNode +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="gpus" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"NONE"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_nameProject=${proj:-""} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# number of available/hosted devices per node in the system +.TBG_numHostedDevicesPerNode=4 + +# required GPUs per node for the current job +.TBG_devicesPerNode=`if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi` + +# host memory per device +.TBG_memPerDevice="$((126000 / $TBG_devicesPerNode))" +# host memory per node +.TBG_memPerNode="$((TBG_memPerDevice * TBG_devicesPerNode))" + +# We only start 1 MPI task per device +.TBG_mpiTasksPerNode="$(( TBG_devicesPerNode * 1 ))" + +# use ceil to caculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" 
+ exit 1 +fi +unset MODULES_NO_OUTPUT + +#set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +mkdir simOutput 2> /dev/null +cd simOutput +ln -s ../stdout output + +# test if cuda_memtest binary is available and we have the node exclusive +if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedDevicesPerNode -eq !TBG_devicesPerNode ] ; then + # Run CUDA memtest to check GPU's health + srun --cpu_bind=sockets !TBG_dstPath/input/bin/cuda_memtest.sh +else + echo "no binary 'cuda_memtest' available or compute node is not exclusively allocated, skip GPU memory test" >&2 +fi + +if [ $? -eq 0 ] ; then + # Run PIConGPU + srun --cpu_bind=sockets !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams +fi diff --git a/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example b/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example new file mode 100644 index 0000000000..8bccbf15f8 --- /dev/null +++ b/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example @@ -0,0 +1,112 @@ +# Name and Path of this Script ############################### (DO NOT change!) 
+export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: NONE, BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="NONE" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $5}') + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# General modules ############################################################# +# +module purge +module load GCC/7.3.0 +module load CUDA/9.2.88 +module load CMake/3.13.0 +module load MVAPICH2/2.3-GDR +module load Python/3.6.6 + +# Other Software ############################################################## +# +module load zlib/.1.2.11 +module load libpng/.1.6.35 +export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH + +PARTITION_LIB=/p/project/$proj/lib_gpus +BOOST_ROOT=$PARTITION_LIB/boost +export CMAKE_PREFIX_PATH=$BOOST_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BOOST_ROOT/lib:$LD_LIBRARY_PATH + +HDF5_ROOT=$PARTITION_LIB/hdf5 +export PATH=$HDF5_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$HDF5_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$HDF5_ROOT/lib:$LD_LIBRARY_PATH + +LIBSPLASH_ROOT=$PARTITION_LIB/libSplash +PNGWRITER_ROOT=$PARTITION_LIB/pngwriter +export CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH + +BLOSC_ROOT=$PARTITION_LIB/c-blosc +export CMAKE_PREFIX_PATH=$BLOSC_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH + +ADIOS_ROOT=$PARTITION_LIB/adios 
+export PATH=$ADIOS_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$ADIOS_ROOT:$CMAKE_PREFIX_PATH + +# Environment ################################################################# +# +#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BOOST_LIB + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="cuda:37" # Nvidia K80 architecture + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# Location for simulation results, purged after 90 days +PROJECT_SCRATCH=SCRATCH_$proj +export SCRATCH=${!PROJECT_SCRATCH} + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "gpus" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/jureca-jsc/gpus.tpl" + +# allocate an interactive shell for one hour +# getNode 2 # allocates 2 interactive nodes (default: 1) +function getNode() { + if [ -z "$1" ] ; then + numNodes=1 + else + numNodes=$1 + fi + if [ $numNodes -gt 8 ] ; then + echo "The maximal number of interactive nodes is 8." 1>&2 + return 1 + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + salloc --time=1:00:00 --nodes=$numNodes --ntasks-per-node=4 --gres=gpu:4 --mem=126000 -A $proj -p develgpus bash +} + +# allocate an interactive shell for one hour +# getDevice 2 # allocates 2 interactive devices (default: 1) +function getDevice() { + if [ -z "$1" ] ; then + numDevices=1 + else + if [ "$1" -gt 4 ] ; then + echo "The maximal number of devices per node is 4." 
1>&2 + return 1 + else + numDevices=$1 + fi + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + salloc --time=1:00:00 --ntasks-per-node=$(($numDevices)) --gres=gpu:4 --mem=126000 -A $proj -p develgpus bash +} From e8462a2e12332f262a27f896cc1d9a3b355e6974 Mon Sep 17 00:00:00 2001 From: Sergei Bastrakov Date: Wed, 30 Jan 2019 11:44:29 +0100 Subject: [PATCH 25/40] Add profile and .tpl for the Booster (KNL) partition --- etc/picongpu/jureca-jsc/booster.tpl | 93 +++++++++++++++ .../booster_picongpu.profile.example | 108 ++++++++++++++++++ 2 files changed, 201 insertions(+) create mode 100644 etc/picongpu/jureca-jsc/booster.tpl create mode 100644 etc/picongpu/jureca-jsc/booster_picongpu.profile.example diff --git a/etc/picongpu/jureca-jsc/booster.tpl b/etc/picongpu/jureca-jsc/booster.tpl new file mode 100644 index 0000000000..38c6faf4d5 --- /dev/null +++ b/etc/picongpu/jureca-jsc/booster.tpl @@ -0,0 +1,93 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Rene Widera, Sergei Bastrakov +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see . 
+# + + +# PIConGPU batch script for JURECA's SLURM batch system + +#SBATCH --account=!TBG_nameProject +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --ntasks-per-node=!TBG_devicesPerNode +#SBATCH --mem=!TBG_memPerNode +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="booster" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"NONE"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_nameProject=${proj:-""} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# KNL is used in quadrant mode, treat each quadrant as a device +.TBG_numHostedDevicesPerNode=4 + +# required devices per node for the current job +.TBG_devicesPerNode=`if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi` + +# host memory per device +.TBG_memPerCPU="$((94000 / $TBG_devicesPerNode))" +# host memory per node +.TBG_memPerNode="$((TBG_memPerCPU * TBG_devicesPerNode))" + +# We only start 1 MPI task per device +.TBG_mpiTasksPerNode="$(( TBG_devicesPerNode * 1 ))" + +# use ceil to caculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" + exit 1 +fi +unset MODULES_NO_OUTPUT + +#set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +mkdir simOutput 2> /dev/null +cd simOutput +ln -s ../stdout output + +if [ $? 
-eq 0 ] ; then + # Run PIConGPU + export OMP_NUM_THREADS=34 + srun !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams +fi diff --git a/etc/picongpu/jureca-jsc/booster_picongpu.profile.example b/etc/picongpu/jureca-jsc/booster_picongpu.profile.example new file mode 100644 index 0000000000..4275c532cb --- /dev/null +++ b/etc/picongpu/jureca-jsc/booster_picongpu.profile.example @@ -0,0 +1,108 @@ +# Name and Path of this Script ############################### (DO NOT change!) +export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: NONE, BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="NONE" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $5}') + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# General modules ############################################################# +# +module purge +module load Architecture/KNL +module load Intel/2019.0.117-GCC-7.3.0 +module load CMake/3.12.3 +module load IntelMPI/2018.4.274 +module load Python/3.6.6 +module load Boost/1.68.0-Python-3.6.6 + +# Other Software ############################################################## +# +module load zlib/.1.2.11 +module load HDF5/1.10.1 +module load libpng/.1.6.35 +export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH + +PARTITION_LIB=/p/project/$proj/lib_booster +LIBSPLASH_ROOT=$PARTITION_LIB/libSplash +PNGWRITER_ROOT=$PARTITION_LIB/pngwriter +export 
CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH + +BLOSC_ROOT=$PARTITION_LIB/c-blosc +export CMAKE_PREFIX_PATH=$BLOSC_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH + +ADIOS_ROOT=$PARTITION_LIB/adios +export PATH=$ADIOS_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$ADIOS_ROOT:$CMAKE_PREFIX_PATH + +# Environment ################################################################# +# +#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BOOST_LIB + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="omp2b:MIC-AVX512" + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +export CC=$(which icc) +export CXX=$(which icpc) + +export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# Location for simulation results, purged after 90 days +PROJECT_SCRATCH=SCRATCH_$proj +export SCRATCH=${!PROJECT_SCRATCH} + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "booster" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/jureca-jsc/booster.tpl" + +# allocate an interactive shell for one hour +# getNode 2 # allocates 2 interactive nodes (default: 1) +function getNode() { + if [ -z "$1" ] ; then + numNodes=1 + else + numNodes=$1 + fi + if [ $numNodes -gt 8 ] ; then + echo "The maximal number of interactive nodes is 8." 1>&2 + return 1 + fi + export OMP_NUM_THREADS=34 + salloc --time=1:00:00 --nodes=$numNodes --ntasks-per-node=4 --mem=94000 -A $proj -p develbooster bash +} + +# allocate an interactive shell for one hour +# getDevice 2 # allocates 2 interactive devices (default: 1) +function getDevice() { + if [ -z "$1" ] ; then + numDevices=1 + else + if [ "$1" -gt 1 ] ; then + echo "The maximal number of devices per node is 4." 
1>&2 + return 1 + else + numDevices=$1 + fi + fi + export OMP_NUM_THREADS=34 + salloc --time=1:00:00 --ntasks-per-node=$(($numDevices)) --mem=94000 -A $proj -p develbooster bash +} From 96476f88b5c7328a3a0076c335b5f6171cb428a8 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Thu, 31 Jan 2019 12:47:45 +0100 Subject: [PATCH 26/40] Apply reviewer suggestions Co-Authored-By: Sergei Bastrakov --- docs/source/install/profile.rst | 12 ++++++------ etc/picongpu/jureca-jsc/batch.tpl | 2 +- .../jureca-jsc/batch_picongpu.profile.example | 9 ++++----- etc/picongpu/jureca-jsc/booster.tpl | 4 ++-- .../jureca-jsc/booster_picongpu.profile.example | 9 ++++----- etc/picongpu/jureca-jsc/gpus.tpl | 2 +- .../jureca-jsc/gpus_picongpu.profile.example | 9 ++++----- 7 files changed, 22 insertions(+), 25 deletions(-) diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst index 72355fb4a7..d6fdc5c26a 100644 --- a/docs/source/install/profile.rst +++ b/docs/source/install/profile.rst @@ -232,7 +232,7 @@ Queue: dvd_usr_prod (Nvidia P100 GPUs) :language: bash JURECA (JSC) --------------------- +------------ **System overview:** `link `_ @@ -240,22 +240,22 @@ JURECA (JSC) **Production directory:** ``$SCRATCH`` (`link `_) -For this profile to work, you need to download the :ref:`PIConGPU source code ` manually. +For these profiles to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`PNGwriter, c-blosc, adios and libSplash `, for the gpus partition also :ref:`Boost and HDF5 `, manually. Queue: batch (2 x Intel Xeon E5-2680 v3 CPUs, 12 Cores + 12 Hyperthreads/CPU) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. literalinclude:: profiles/jureca-jsc/batch_picongpu.profile.example :language: bash Queue: gpus (2 x Nvidia Tesla K80 GPUs) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. 
literalinclude:: profiles/jureca-jsc/gpus_picongpu.profile.example :language: bash - Queue: booster (Intel Xeon Phi 7250-F, 68 cores + Hyperthreads) -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Queue: booster (Intel Xeon Phi 7250-F, 68 cores + Hyperthreads) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ .. literalinclude:: profiles/jureca-jsc/booster_picongpu.profile.example :language: bash diff --git a/etc/picongpu/jureca-jsc/batch.tpl b/etc/picongpu/jureca-jsc/batch.tpl index 269c715894..24821b47ab 100644 --- a/etc/picongpu/jureca-jsc/batch.tpl +++ b/etc/picongpu/jureca-jsc/batch.tpl @@ -52,7 +52,7 @@ .TBG_numHostedDevicesPerNode=2 # required devices per node for the current job -.TBG_devicesPerNode=`if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi` +.TBG_devicesPerNode=$(if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi) # host memory per device .TBG_memPerCPU="$((126000 / $TBG_devicesPerNode))" diff --git a/etc/picongpu/jureca-jsc/batch_picongpu.profile.example b/etc/picongpu/jureca-jsc/batch_picongpu.profile.example index 42436ac74c..8e22206a95 100644 --- a/etc/picongpu/jureca-jsc/batch_picongpu.profile.example +++ b/etc/picongpu/jureca-jsc/batch_picongpu.profile.example @@ -17,6 +17,9 @@ export proj=$(groups | awk '{print $5}') # - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" #export EDITOR="nano" +# Set up environment, including $SCRATCH and $PROJECT +jutil env activate -p $proj + # General modules ############################################################# # module purge @@ -33,7 +36,7 @@ module load HDF5/1.10.1 module load libpng/.1.6.35 export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH -PARTITION_LIB=/p/project/$proj/lib_batch +PARTITION_LIB=$PROJECT/lib_batch LIBSPLASH_ROOT=$PARTITION_LIB/libSplash PNGWRITER_ROOT=$PARTITION_LIB/pngwriter export 
CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH @@ -63,10 +66,6 @@ export CXX=$(which icpc) export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH -# Location for simulation results, purged after 90 days -PROJECT_SCRATCH=SCRATCH_$proj -export SCRATCH=${!PROJECT_SCRATCH} - # "tbg" default options ####################################################### # - SLURM (sbatch) # - "batch" queue diff --git a/etc/picongpu/jureca-jsc/booster.tpl b/etc/picongpu/jureca-jsc/booster.tpl index 38c6faf4d5..1d2342f286 100644 --- a/etc/picongpu/jureca-jsc/booster.tpl +++ b/etc/picongpu/jureca-jsc/booster.tpl @@ -52,7 +52,7 @@ .TBG_numHostedDevicesPerNode=4 # required devices per node for the current job -.TBG_devicesPerNode=`if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi` +.TBG_devicesPerNode=$(if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi) # host memory per device .TBG_memPerCPU="$((94000 / $TBG_devicesPerNode))" @@ -60,7 +60,7 @@ .TBG_memPerNode="$((TBG_memPerCPU * TBG_devicesPerNode))" # We only start 1 MPI task per device -.TBG_mpiTasksPerNode="$(( TBG_devicesPerNode * 1 ))" +.TBG_mpiTasksPerNode="$((TBG_devicesPerNode))" # use ceil to caculate nodes .TBG_nodes="$((( TBG_tasks + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode))" diff --git a/etc/picongpu/jureca-jsc/booster_picongpu.profile.example b/etc/picongpu/jureca-jsc/booster_picongpu.profile.example index 4275c532cb..a8906199cd 100644 --- a/etc/picongpu/jureca-jsc/booster_picongpu.profile.example +++ b/etc/picongpu/jureca-jsc/booster_picongpu.profile.example @@ -17,6 +17,9 @@ export proj=$(groups | awk '{print $5}') # - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" #export EDITOR="nano" +# Set up environment, including $SCRATCH and $PROJECT +jutil env activate -p $proj + # General modules 
############################################################# # module purge @@ -34,7 +37,7 @@ module load HDF5/1.10.1 module load libpng/.1.6.35 export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH -PARTITION_LIB=/p/project/$proj/lib_booster +PARTITION_LIB=$PROJECT/lib_booster LIBSPLASH_ROOT=$PARTITION_LIB/libSplash PNGWRITER_ROOT=$PARTITION_LIB/pngwriter export CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH @@ -64,10 +67,6 @@ export CXX=$(which icpc) export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH -# Location for simulation results, purged after 90 days -PROJECT_SCRATCH=SCRATCH_$proj -export SCRATCH=${!PROJECT_SCRATCH} - # "tbg" default options ####################################################### # - SLURM (sbatch) # - "booster" queue diff --git a/etc/picongpu/jureca-jsc/gpus.tpl b/etc/picongpu/jureca-jsc/gpus.tpl index a26a5ed44c..5160588adf 100644 --- a/etc/picongpu/jureca-jsc/gpus.tpl +++ b/etc/picongpu/jureca-jsc/gpus.tpl @@ -54,7 +54,7 @@ .TBG_numHostedDevicesPerNode=4 # required GPUs per node for the current job -.TBG_devicesPerNode=`if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi` +.TBG_devicesPerNode=$(if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi) # host memory per device .TBG_memPerDevice="$((126000 / $TBG_devicesPerNode))" diff --git a/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example b/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example index 8bccbf15f8..8ad6ab9cc7 100644 --- a/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example +++ b/etc/picongpu/jureca-jsc/gpus_picongpu.profile.example @@ -17,6 +17,9 @@ export proj=$(groups | awk '{print $5}') # - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" #export EDITOR="nano" +# Set up environment, including $SCRATCH and $PROJECT +jutil env activate -p $proj + # General modules 
############################################################# # module purge @@ -32,7 +35,7 @@ module load zlib/.1.2.11 module load libpng/.1.6.35 export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH -PARTITION_LIB=/p/project/$proj/lib_gpus +PARTITION_LIB=$PROJECT/lib_gpus BOOST_ROOT=$PARTITION_LIB/boost export CMAKE_PREFIX_PATH=$BOOST_ROOT:$CMAKE_PREFIX_PATH export LD_LIBRARY_PATH=$BOOST_ROOT/lib:$LD_LIBRARY_PATH @@ -68,10 +71,6 @@ export PATH=$PATH:$PICSRC/src/tools/bin export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH -# Location for simulation results, purged after 90 days -PROJECT_SCRATCH=SCRATCH_$proj -export SCRATCH=${!PROJECT_SCRATCH} - # "tbg" default options ####################################################### # - SLURM (sbatch) # - "gpus" queue From cca73d4bf3e7c601788f8283d629a4ee843d04ef Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Tue, 5 Feb 2019 22:17:25 +0100 Subject: [PATCH 27/40] gridDist: devices == no. of subdomains Verify that the number of subdomains given via `--gridDist` matches the number of devices (`-d`). --- .../initialization/ParserGridDistribution.hpp | 144 +++++++++++------- .../simulationControl/MySimulation.hpp | 3 + 2 files changed, 94 insertions(+), 53 deletions(-) diff --git a/include/picongpu/initialization/ParserGridDistribution.hpp b/include/picongpu/initialization/ParserGridDistribution.hpp index 55782bc68b..e41a8cdc66 100644 --- a/include/picongpu/initialization/ParserGridDistribution.hpp +++ b/include/picongpu/initialization/ParserGridDistribution.hpp @@ -17,58 +17,71 @@ * If not, see . */ - - #pragma once #include #include // std::vector #include // std::string -#include // std::pair #include // std::distance #include #include + namespace picongpu { class ParserGridDistribution { private: - typedef std::vector > value_type; + /** 1D subdomain extents + * + * Pair of extent and count entry in our grid distribution.
+ * + * For example, a single entry of the grid distribution a,b,c{n},d{m},e,f + * is stored as entry (a,1) in SubdomainPair. Another as (b,1), another + * n equally spaced subdomains as (c,n), another m subdomains of extent d + * as (d,m), and so on. + */ + struct SubdomainPair { + // extent of the current subdomain + uint32_t extent; + // count of how often the subdomain shall be repeated + uint32_t count; + }; + using value_type = std::vector< SubdomainPair >; public: - ParserGridDistribution( const std::string s ) + ParserGridDistribution( std::string const s ) { - parseString( s ); + parsedInput = parse( s ); } uint32_t - getOffset( const int gpuPos, const uint32_t maxCells ) const + getOffset( uint32_t const devicePos, uint32_t const maxCells ) const { value_type::const_iterator iter = parsedInput.begin(); - // go to last gpu of this block b{n} - int i = iter->second - 1; - int sum = 0; + // go to last device of these n subdomains extent{n} + uint32_t i = iter->count - 1u; + uint32_t sum = 0u; - while( i < gpuPos ) + while( i < devicePos ) { - // add last block - sum += iter->first * iter->second; + // add last subdomain + sum += iter->extent * iter->count; ++iter; - // go to last gpu of this block b{n} - i += iter->second; + // go to last device of these n subdomains extent{n} + i += iter->count; } - // add part of this block that is before me - sum += iter->first * ( gpuPos + iter->second - i - 1 ); + // add part of this subdomain that is before me + sum += iter->extent * ( devicePos + iter->count - i - 1u ); // check total number of cells - uint32_t sumTotal = 0; + uint32_t sumTotal = 0u; for( iter = parsedInput.begin(); iter != parsedInput.end(); ++iter ) - sumTotal += iter->first * iter->second; + sumTotal += iter->extent * iter->count; PMACC_VERIFY( sumTotal == maxCells ); @@ -77,40 +90,57 @@ class ParserGridDistribution /** Get local Size of this dimension * - * \param[in] gpuPos as integer in the range [0, n-1] for this dimension + * \param[in] 
devicePos as unsigned integer in the range [0, n-1] for this dimension * \return uint32_t with local number of cells */ uint32_t - getLocalSize( const int gpuPos ) const + getLocalSize( uint32_t const devicePos ) const { value_type::const_iterator iter = parsedInput.begin(); - // go to last gpu of this block b{n} - int i = iter->second - 1; + // go to last device of these n subdomains extent{n} + uint32_t i = iter->count - 1u; - while( i < gpuPos ) + while( i < devicePos ) { ++iter; - // go to last gpu of this block b{n} - i += iter->second; + // go to last device of these n subdomains extent{n} + i += iter->count; } - return iter->first; + return iter->extent; + } + + /** Verify the number of subdomains matches the devices + * + * Check that the number of subdomains in a dimension, after we + * expanded all regexes, matches the number of devices for it. + * + * \param[in] numDevices number of devices for this dimension + */ + void + verifyDevices( uint32_t const numDevices ) const + { + uint32_t numSubdomains = 0u; + for( SubdomainPair const & p : parsedInput ) + numSubdomains += p.count; + + PMACC_VERIFY( numSubdomains == numDevices ); } private: value_type parsedInput; - /** Parses the input string to a vector of pairs + /** Parses the input string to a vector of SubdomainPair(s) * - * Parses the input string in the form a,b,c{n},d{m},e,f - * to a vector of pairs with base number (a,b,c,d,e,f) and multipliers - * (1,1,n,m,e,f) + * Parses the input string in the form a,b,c{n},d{m},e,f + * to a vector of SubdomainPair with extent number (a,b,c,d,e,f) and + * counts (1,1,n,m,e,f) * - * \param[in] s as const std::string in the form a,b{n} - * \return std::vector with uint32_t (base, multiplier) + * \param[in] s as string in the form a,b{n} + * \return std::vector with 2x uint32_t (extent, count) */ - void - parseString( const std::string s ) + value_type + parse( std::string const s ) const { boost::regex regFind( "[0-9]+(\\{[0-9]+\\})*", 
boost::regex_constants::perl ); @@ -119,30 +149,38 @@ regFind, 0 ); boost::sregex_token_iterator end; - parsedInput.clear(); - parsedInput.reserve( std::distance( iter, end ) ); + value_type newInput; + newInput.reserve( std::distance( iter, end ) ); for(; iter != end; ++iter ) { std::string pM = *iter; - // find multiplier n and base b of b{n} - boost::regex regMultipl( "(.*\\{)|(\\})", - boost::regex_constants::perl ); - std::string multipl = boost::regex_replace( pM, regMultipl, "" ); - boost::regex regBase( "\\{.*\\}", - boost::regex_constants::perl ); - std::string base = boost::regex_replace( pM, regBase, "" ); - - // no Multiplier {n} given - if( multipl == *iter ) - multipl = "1"; - - const std::pair<uint32_t, uint32_t> g( - boost::lexical_cast<uint32_t> ( base ), - boost::lexical_cast<uint32_t> ( multipl ) ); - parsedInput.push_back( g ); + // find count n and extent b of b{n} + boost::regex regCount( + "(.*\\{)|(\\})", + boost::regex_constants::perl + ); + std::string count = boost::regex_replace( pM, regCount, "" ); + + boost::regex regExtent( + "\\{.*\\}", + boost::regex_constants::perl + ); + std::string extent = boost::regex_replace( pM, regExtent, "" ); + + // no count {n} given (implies one) + if( count == *iter ) + count = "1"; + + const SubdomainPair g = { + boost::lexical_cast< uint32_t > ( extent ), + boost::lexical_cast< uint32_t > ( count ) + }; + newInput.emplace_back( g ); } + + return newInput; } }; diff --git a/include/picongpu/simulationControl/MySimulation.hpp b/include/picongpu/simulationControl/MySimulation.hpp index 863a9e6a2d..9e4c531969 100644 --- a/include/picongpu/simulationControl/MySimulation.hpp +++ b/include/picongpu/simulationControl/MySimulation.hpp @@ -211,6 +211,9 @@ class MySimulation : public SimulationHelper<simDim> // parse string ParserGridDistribution parserGD(gridDistribution.at(dim)); + // verify number of blocks and devices in dimension match + parserGD.verifyDevices(gpus[dim]); + // calculate local grid points & offset
gridSizeLocal[dim] = parserGD.getLocalSize(myGPUpos[dim]); gridOffset[dim] = parserGD.getOffset(myGPUpos[dim], global_grid_size[dim]); From 4f96fa510aeff50119d32235518a22f2e746678a Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 8 Feb 2019 10:17:21 +0100 Subject: [PATCH 28/40] PhaseSpace: Unit Colorbar 2D3V Fix the unit for the phase space volume in the phase space plugin for 2D3V simulations. Since we properly switched for 2D3V simulations to represent periodic, one-cell 3D3V scalings, every quantity is properly "volumetric". Before this change, calculating a density by integrating the phase space in p direction was weird and needed (for known species filter volume) an additional division by our normalized length. This fixes it. --- include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp b/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp index ebdab5226b..72356c84b5 100644 --- a/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp +++ b/include/picongpu/plugins/PhaseSpace/PhaseSpace.tpp @@ -297,7 +297,7 @@ namespace picongpu /** \todo communicate GUARD and add it to the two neighbors BORDER */ /* write to file */ - const float_64 UNIT_VOLUME = math::pow( UNIT_LENGTH, (int)simDim ); + const float_64 UNIT_VOLUME = UNIT_LENGTH * UNIT_LENGTH * UNIT_LENGTH; const float_64 unit = UNIT_CHARGE / UNIT_VOLUME; /* (momentum) p range: unit is m_species * c From dd3411e96362e2bc427c7a47adea5cd69844029b Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 8 Feb 2019 10:21:29 +0100 Subject: [PATCH 29/40] travis_wait 40: Spack install CMake Allow more time for the CMake install with Spack on Travis-CI. 
--- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index cb4703ada5..abcdeb10ae 100644 --- a/.travis.yml +++ b/.travis.yml @@ -38,7 +38,7 @@ install: echo -e "config:""\n build_jobs:"" 2" > $SPACK_ROOT/etc/spack/config.yaml; fi - spack compiler add - - travis_wait spack install + - travis_wait 40 spack install cmake@3.10.0~openssl~ncurses $COMPILERSPEC - travis_wait spack install From 5bf061286091fe628356aa4d48ba7d2f98aa9b74 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Fri, 8 Feb 2019 12:02:05 +0100 Subject: [PATCH 30/40] Spack on Travis: Use Pre-Built CMake Just use a pre-built CMake with Spack. --- .travis.yml | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/.travis.yml b/.travis.yml index abcdeb10ae..39d7b91350 100644 --- a/.travis.yml +++ b/.travis.yml @@ -6,6 +6,7 @@ cache: apt: true directories: - $HOME/.cache/spack + - $HOME/.cache/cmake-3.10.0 pip: true addons: @@ -35,19 +36,37 @@ install: - if [ $SPACK_FOUND -ne 0 ]; then mkdir -p $SPACK_ROOT && git clone --depth 50 https://github.com/spack/spack.git $SPACK_ROOT && - echo -e "config:""\n build_jobs:"" 2" > $SPACK_ROOT/etc/spack/config.yaml; + echo -e "config:""\n build_jobs:"" 2" > $SPACK_ROOT/etc/spack/config.yaml && + echo -e "packages:""\n cmake:""\n version:"" [3.10.0]""\n paths:""\n cmake@3.10.0:"" /home/travis/.cache/cmake-3.10.0""\n buildable:"" False" > $SPACK_ROOT/etc/spack/packages.yaml; fi - spack compiler add - - travis_wait 40 spack install - cmake@3.10.0~openssl~ncurses + # required dependencies - CMake 3.10.0 + - if [ "$TRAVIS_OS_NAME" == "linux" ]; then + if [ ! -f $HOME/.cache/cmake-3.10.0/bin/cmake ]; then + wget -O cmake.sh https://cmake.org/files/v3.10/cmake-3.10.0-Linux-x86_64.sh && + sh cmake.sh --skip-license --exclude-subdir --prefix=$HOME/.cache/cmake-3.10.0 && + rm cmake.sh; + fi; + elif [ "$TRAVIS_OS_NAME" == "osx" ]; then + if [ !
-d /Applications/CMake.app/Contents/ ]; then + curl -L -s -o cmake.dmg https://cmake.org/files/v3.10/cmake-3.10.0-Darwin-x86_64.dmg && + yes | hdiutil mount cmake.dmg && + sudo cp -R "/Volumes/cmake-3.10.0-Darwin-x86_64/CMake.app" /Applications && + hdiutil detach /dev/disk1s1 && + rm cmake.dmg; + fi; + fi + - travis_wait spack install + cmake $COMPILERSPEC + # required dependencies - Boost 1.62.0 - travis_wait spack install boost@1.62.0~date_time~graph~iostreams~locale~log~random~thread~timer~wave $COMPILERSPEC - spack clean -a - source /etc/profile && source $SPACK_ROOT/share/spack/setup-env.sh - - spack load cmake $COMPILERSPEC + - spack load cmake - spack load boost $COMPILERSPEC jobs: From 086266f4e490c37b9d3404706f6b45fe7e57c338 Mon Sep 17 00:00:00 2001 From: Marco Garten Date: Fri, 8 Feb 2019 10:57:18 +0100 Subject: [PATCH 31/40] Remove contributor name typo from LICENSE.md --- LICENSE.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/LICENSE.md b/LICENSE.md index 57bf9c431f..188597079f 100644 --- a/LICENSE.md +++ b/LICENSE.md @@ -7,7 +7,7 @@ Sergei Bastrakov, Florian Berninger, Heiko Burau, Michael Bussmann, Alexander Debus, Robert Dietrich, Carlchristian Eckert, Wen Fu, Marco Garten, Ilja Goethel, Alexander Grund, Sebastian Hahn, Anton Helm, Wolfgang Hoehnig, Axel Huebl, Jeffrey Kelling, Maximilian Knespel, Remi Lehe, Alexander Matthes, -Richard Pausch, Rophie Rudat, Felix Schmitt, Conrad Schumann, +Richard Pausch, Sophie Rudat, Felix Schmitt, Conrad Schumann, Benjamin Schneider, Joseph Schuchart, Sebastian Starke, Klaus Steiniger, Rene Widera, Benjamin Worpitz From d0acd1c907425e1b9f118e5b57c968b4ae991d3c Mon Sep 17 00:00:00 2001 From: Sergei Bastrakov Date: Fri, 1 Feb 2019 16:46:17 +0100 Subject: [PATCH 32/40] System JUWELS: add a profile for the CPU queue --- etc/picongpu/juwels-jsc/batch.tpl | 93 +++++++++++++++ .../juwels-jsc/batch_picongpu.profile.example | 108 ++++++++++++++++++ 2 files changed, 201 insertions(+) create mode 
100644 etc/picongpu/juwels-jsc/batch.tpl create mode 100644 etc/picongpu/juwels-jsc/batch_picongpu.profile.example diff --git a/etc/picongpu/juwels-jsc/batch.tpl b/etc/picongpu/juwels-jsc/batch.tpl new file mode 100644 index 0000000000..723f92116e --- /dev/null +++ b/etc/picongpu/juwels-jsc/batch.tpl @@ -0,0 +1,93 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Rene Widera, Sergei Bastrakov +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see <http://www.gnu.org/licenses/>.
+# + + +# PIConGPU batch script for JUWELS' SLURM batch system + +#SBATCH --account=!TBG_nameProject +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --ntasks-per-node=!TBG_devicesPerNode +#SBATCH --mem=!TBG_memPerNode +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="batch" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"NONE"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_nameProject=${proj:-""} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# number of available/hosted devices per node in the system +.TBG_numHostedDevicesPerNode=2 + +# required devices per node for the current job +.TBG_devicesPerNode=$(if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi) + +# host memory per device +.TBG_memPerCPU="$((94000 / $TBG_devicesPerNode))" +# host memory per node +.TBG_memPerNode="$((TBG_memPerCPU * TBG_devicesPerNode))" + +# We only start 1 MPI task per device +.TBG_mpiTasksPerNode="$(( TBG_devicesPerNode * 1 ))" + +# use ceil to calculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!" + exit 1 +fi +unset MODULES_NO_OUTPUT + +#set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +mkdir simOutput 2> /dev/null +cd simOutput +ln -s ../stdout output + +if [ $?
-eq 0 ] ; then + # Run PIConGPU + export OMP_NUM_THREADS=48 + srun --cpu_bind=sockets !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams +fi diff --git a/etc/picongpu/juwels-jsc/batch_picongpu.profile.example b/etc/picongpu/juwels-jsc/batch_picongpu.profile.example new file mode 100644 index 0000000000..45c98f0735 --- /dev/null +++ b/etc/picongpu/juwels-jsc/batch_picongpu.profile.example @@ -0,0 +1,108 @@ +# Name and Path of this Script ############################### (DO NOT change!) +export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: NONE, BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="NONE" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $4}') + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# Set up environment, including $SCRATCH and $PROJECT +jutil env activate -p $proj + +# General modules ############################################################# +# +module purge +module load Intel/2019.0.117-GCC-7.3.0 +module load CMake/3.13.0 +module load IntelMPI/2018.4.274 +module load Python/3.6.6 +module load Boost/1.68.0-Python-3.6.6 + +# Other Software ############################################################## +# +module load zlib/.1.2.11 +module load HDF5/1.10.1 +module load libpng/.1.6.35 +export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH + +PARTITION_LIB=$PROJECT/lib_batch 
+LIBSPLASH_ROOT=$PARTITION_LIB/libSplash +PNGWRITER_ROOT=$PARTITION_LIB/pngwriter +export CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH + +BLOSC_ROOT=$PARTITION_LIB/c-blosc +export CMAKE_PREFIX_PATH=$BLOSC_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH + +ADIOS_ROOT=$PARTITION_LIB/adios +export PATH=$ADIOS_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$ADIOS_ROOT:$CMAKE_PREFIX_PATH + +# Environment ################################################################# +# +#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BOOST_LIB + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="omp2b:skylake" + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +export CC=$(which icc) +export CXX=$(which icpc) + +export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "batch" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/juwels-jsc/batch.tpl" + +# allocate an interactive shell for one hour +# getNode 2 # allocates 2 interactive nodes (default: 1) +function getNode() { + if [ -z "$1" ] ; then + numNodes=1 + else + numNodes=$1 + fi + if [ $numNodes -gt 8 ] ; then + echo "The maximal number of interactive nodes is 8." 1>&2 + return 1 + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + export OMP_NUM_THREADS=48 + salloc --time=1:00:00 --nodes=$numNodes --ntasks-per-node=2 --mem=94000 -A $proj -p batch bash +} + +# allocate an interactive shell for one hour +# getDevice 2 # allocates 2 interactive devices (default: 1) +function getDevice() { + if [ -z "$1" ] ; then + numDevices=1 + else + if [ "$1" -gt 2 ] ; then + echo "The maximal number of devices per node is 2." 
1>&2 + return 1 + else + numDevices=$1 + fi + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + export OMP_NUM_THREADS=48 + salloc --time=1:00:00 --ntasks-per-node=$(($numDevices)) --mem=94000 -A $proj -p batch bash +} From 13efd0ceb2fb0cb1a7ae9add2b36a2c30f9e4c27 Mon Sep 17 00:00:00 2001 From: Sergei Bastrakov Date: Mon, 11 Feb 2019 17:52:30 +0100 Subject: [PATCH 33/40] System JUWELS: add a profile for the GPU queue --- etc/picongpu/juwels-jsc/gpus.tpl | 102 +++++++++++++++ .../juwels-jsc/gpus_picongpu.profile.example | 118 ++++++++++++++++++ 2 files changed, 220 insertions(+) create mode 100644 etc/picongpu/juwels-jsc/gpus.tpl create mode 100644 etc/picongpu/juwels-jsc/gpus_picongpu.profile.example diff --git a/etc/picongpu/juwels-jsc/gpus.tpl b/etc/picongpu/juwels-jsc/gpus.tpl new file mode 100644 index 0000000000..ef6fb9e467 --- /dev/null +++ b/etc/picongpu/juwels-jsc/gpus.tpl @@ -0,0 +1,102 @@ +#!/usr/bin/env bash +# Copyright 2013-2019 Axel Huebl, Richard Pausch, Rene Widera, Sergei Bastrakov +# +# This file is part of PIConGPU. +# +# PIConGPU is free software: you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation, either version 3 of the License, or +# (at your option) any later version. +# +# PIConGPU is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with PIConGPU. +# If not, see <http://www.gnu.org/licenses/>.
+# + + +# PIConGPU batch script for JUWELS' SLURM batch system + +#SBATCH --account=!TBG_nameProject +#SBATCH --partition=!TBG_queue +#SBATCH --time=!TBG_wallTime +# Sets batch job's name +#SBATCH --job-name=!TBG_jobName +#SBATCH --nodes=!TBG_nodes +#SBATCH --ntasks=!TBG_tasks +#SBATCH --ntasks-per-node=!TBG_devicesPerNode +#SBATCH --mincpus=!TBG_mpiTasksPerNode +#SBATCH --mem=!TBG_memPerNode +#SBATCH --gres=gpu:!TBG_devicesPerNode +#SBATCH --mail-type=!TBG_mailSettings +#SBATCH --mail-user=!TBG_mailAddress +#SBATCH --workdir=!TBG_dstPath + +#SBATCH -o stdout +#SBATCH -e stderr + + +## calculations will be performed by tbg ## +.TBG_queue="gpus" + +# settings that can be controlled by environment variables before submit +.TBG_mailSettings=${MY_MAILNOTIFY:-"NONE"} +.TBG_mailAddress=${MY_MAIL:-"someone@example.com"} +.TBG_author=${MY_NAME:+--author \"${MY_NAME}\"} +.TBG_nameProject=${proj:-""} +.TBG_profile=${PIC_PROFILE:-"~/picongpu.profile"} + +# number of available/hosted devices per node in the system +.TBG_numHostedDevicesPerNode=4 + +# required GPUs per node for the current job +.TBG_devicesPerNode=$(if [ $TBG_tasks -gt $TBG_numHostedDevicesPerNode ] ; then echo $TBG_numHostedDevicesPerNode; else echo $TBG_tasks; fi) + +# host memory per device +.TBG_memPerDevice="$((180000 / $TBG_devicesPerNode))" +# host memory per node +.TBG_memPerNode="$((TBG_memPerDevice * TBG_devicesPerNode))" + +# We only start 1 MPI task per device +.TBG_mpiTasksPerNode="$(( TBG_devicesPerNode * 1 ))" + +# use ceil to calculate nodes +.TBG_nodes="$((( TBG_tasks + TBG_devicesPerNode - 1 ) / TBG_devicesPerNode))" + +## end calculations ## + +echo 'Running program...' + +cd !TBG_dstPath + +export MODULES_NO_OUTPUT=1 +source !TBG_profile +if [ $? -ne 0 ] ; then + echo "Error: PIConGPU environment profile under \"!TBG_profile\" not found!"
+ exit 1 +fi +unset MODULES_NO_OUTPUT + +#set user rights to u=rwx;g=r-x;o=--- +umask 0027 + +mkdir simOutput 2> /dev/null +cd simOutput +ln -s ../stdout output + +# test if cuda_memtest binary is available and we have the node exclusive +if [ -f !TBG_dstPath/input/bin/cuda_memtest ] && [ !TBG_numHostedDevicesPerNode -eq !TBG_devicesPerNode ] ; then + # Run CUDA memtest to check GPU's health + srun --cpu_bind=sockets !TBG_dstPath/input/bin/cuda_memtest.sh +else + echo "no binary 'cuda_memtest' available or compute node is not exclusively allocated, skip GPU memory test" >&2 +fi + +if [ $? -eq 0 ] ; then + # Run PIConGPU + srun --cpu_bind=sockets !TBG_dstPath/input/bin/picongpu !TBG_author !TBG_programParams +fi diff --git a/etc/picongpu/juwels-jsc/gpus_picongpu.profile.example b/etc/picongpu/juwels-jsc/gpus_picongpu.profile.example new file mode 100644 index 0000000000..a12c0d9cfc --- /dev/null +++ b/etc/picongpu/juwels-jsc/gpus_picongpu.profile.example @@ -0,0 +1,118 @@ +# Name and Path of this Script ############################### (DO NOT change!) 
+export PIC_PROFILE=$(cd $(dirname $BASH_SOURCE) && pwd)"/"$(basename $BASH_SOURCE) + +# User Information ######################################### (edit those lines) +# - automatically add your name and contact to output file meta data +# - send me a mail on batch system jobs: NONE, BEGIN, END, FAIL, REQUEUE, ALL, +# TIME_LIMIT, TIME_LIMIT_90, TIME_LIMIT_80 and/or TIME_LIMIT_50 +export MY_MAILNOTIFY="NONE" +export MY_MAIL="someone@example.com" +export MY_NAME="$(whoami) <$MY_MAIL>" + +# Project Information ######################################## (edit this line) +# - project account for computing time +export proj=$(groups | awk '{print $4}') + +# Text Editor for Tools ###################################### (edit this line) +# - examples: "nano", "vim", "emacs -nw", "vi" or without terminal: "gedit" +#export EDITOR="nano" + +# Set up environment, including $SCRATCH and $PROJECT +jutil env activate -p $proj + +# General modules ############################################################# +# +module purge +module load GCC/7.3.0 +module load CUDA/9.2.88 +module load CMake/3.13.0 +module load MVAPICH2/2.3-GDR +module load Python/3.6.6 + +# Other Software ############################################################## +# +module load zlib/.1.2.11 +module load libpng/.1.6.35 +export CMAKE_PREFIX_PATH=$EBROOTZLIB:$EBROOTLIBPNG:$CMAKE_PREFIX_PATH + +# This is required for Boost to have correct dynamic library dependencies +module load ICU/61.1 +export LD_LIBRARY_PATH=$EBROOTICU/lib:$LD_LIBRARY_PATH + +PARTITION_LIB=$PROJECT/lib_gpus +BOOST_ROOT=$PARTITION_LIB/boost +export CMAKE_PREFIX_PATH=$BOOST_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BOOST_ROOT/lib:$LD_LIBRARY_PATH + +HDF5_ROOT=$PARTITION_LIB/hdf5 +export PATH=$HDF5_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$HDF5_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$HDF5_ROOT/lib:$LD_LIBRARY_PATH + +LIBSPLASH_ROOT=$PARTITION_LIB/libSplash +PNGWRITER_ROOT=$PARTITION_LIB/pngwriter +export 
CMAKE_PREFIX_PATH=$LIBSPLASH_ROOT:$PNGWRITER_ROOT:$CMAKE_PREFIX_PATH + +BLOSC_ROOT=$PARTITION_LIB/c-blosc +export CMAKE_PREFIX_PATH=$BLOSC_ROOT:$CMAKE_PREFIX_PATH +export LD_LIBRARY_PATH=$BLOSC_ROOT/lib:$LD_LIBRARY_PATH + +ADIOS_ROOT=$PARTITION_LIB/adios +export PATH=$ADIOS_ROOT/bin:$PATH +export CMAKE_PREFIX_PATH=$ADIOS_ROOT:$CMAKE_PREFIX_PATH + + +export LD_LIBRARY_PATH=$EBROOTICU/lib:$LD_LIBRARY_PATH + +# Environment ################################################################# +# +#export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$BOOST_LIB + +export PICSRC=$HOME/src/picongpu +export PIC_EXAMPLES=$PICSRC/share/picongpu/examples +export PIC_BACKEND="cuda:70" # Nvidia V100 architecture + +export PATH=$PATH:$PICSRC +export PATH=$PATH:$PICSRC/bin +export PATH=$PATH:$PICSRC/src/tools/bin + +export PYTHONPATH=$PICSRC/lib/python:$PYTHONPATH + +# "tbg" default options ####################################################### +# - SLURM (sbatch) +# - "gpus" queue +export TBG_SUBMIT="sbatch" +export TBG_TPLFILE="etc/picongpu/juwels-jsc/gpus.tpl" + +# allocate an interactive shell for one hour +# getNode 2 # allocates 2 interactive nodes (default: 1) +function getNode() { + if [ -z "$1" ] ; then + numNodes=1 + else + numNodes=$1 + fi + if [ $numNodes -gt 8 ] ; then + echo "The maximal number of interactive nodes is 8." 1>&2 + return 1 + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + salloc --time=1:00:00 --nodes=$numNodes --ntasks-per-node=4 --gres=gpu:4 --mem=180000 -A $proj -p gpus bash +} + +# allocate an interactive shell for one hour +# getDevice 2 # allocates 2 interactive devices (default: 1) +function getDevice() { + if [ -z "$1" ] ; then + numDevices=1 + else + if [ "$1" -gt 4 ] ; then + echo "The maximal number of devices per node is 4." 
1>&2 + return 1 + else + numDevices=$1 + fi + fi + echo "Hint: please use 'srun --cpu_bind=sockets ' for launching multiple processes in the interactive mode" + salloc --time=1:00:00 --ntasks-per-node=$(($numDevices)) --gres=gpu:4 --mem=180000 -A $proj -p gpus bash +} From 6cb27c69d915aef5e49254aca454dca6ef82e70c Mon Sep 17 00:00:00 2001 From: Sergei Bastrakov Date: Tue, 12 Feb 2019 09:25:36 +0100 Subject: [PATCH 34/40] System JUWELS: add system description to the docs --- docs/source/install/profile.rst | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/docs/source/install/profile.rst b/docs/source/install/profile.rst index d6fdc5c26a..d95e60daea 100644 --- a/docs/source/install/profile.rst +++ b/docs/source/install/profile.rst @@ -259,3 +259,26 @@ Queue: booster (Intel Xeon Phi 7250-F, 68 cores + Hyperthreads) .. literalinclude:: profiles/jureca-jsc/booster_picongpu.profile.example :language: bash + +JUWELS (JSC) +------------ + +**System overview:** `link `_ + +**User guide:** `link `_ + +**Production directory:** ``$SCRATCH`` (`link `_) + +For these profiles to work, you need to download the :ref:`PIConGPU source code ` and install :ref:`PNGwriter, c-blosc, adios and libSplash `, for the gpus partition also :ref:`Boost and HDF5 `, manually. + +Queue: batch (2 x Intel Xeon Platinum 8168 CPUs, 24 Cores + 24 Hyperthreads/CPU) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. literalinclude:: profiles/juwels-jsc/batch_picongpu.profile.example + :language: bash + +Queue: gpus (4 x Nvidia V100 GPUs) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. literalinclude:: profiles/juwels-jsc/gpus_picongpu.profile.example + :language: bash From 03fe26fd4ee1384fe644e3730ec9a63effbec640 Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Wed, 13 Feb 2019 16:16:30 +0100 Subject: [PATCH 35/40] Hypnos: CMake 3.13.4 Use a newer CMake on Hypnos. 
For recent builds of openPMD-api and Alpaka, CMake 3.11+ is recommended, saving users later trouble in post-processing. --- etc/picongpu/hypnos-hzdr/k20_picongpu.profile.example | 2 +- etc/picongpu/hypnos-hzdr/k80_picongpu.profile.example | 2 +- etc/picongpu/hypnos-hzdr/laser_picongpu.profile.example | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/etc/picongpu/hypnos-hzdr/k20_picongpu.profile.example b/etc/picongpu/hypnos-hzdr/k20_picongpu.profile.example index 39e307527a..b7a926bd09 100644 --- a/etc/picongpu/hypnos-hzdr/k20_picongpu.profile.example +++ b/etc/picongpu/hypnos-hzdr/k20_picongpu.profile.example @@ -22,7 +22,7 @@ then # Core Dependencies module load gcc/4.9.2 - module load cmake/3.10.1 + module load cmake/3.13.4 module load boost/1.62.0 module load cuda/8.0 module load openmpi/2.1.2.cuda80 diff --git a/etc/picongpu/hypnos-hzdr/k80_picongpu.profile.example b/etc/picongpu/hypnos-hzdr/k80_picongpu.profile.example index 40b69fd3c6..b79bf37ab9 100644 --- a/etc/picongpu/hypnos-hzdr/k80_picongpu.profile.example +++ b/etc/picongpu/hypnos-hzdr/k80_picongpu.profile.example @@ -22,7 +22,7 @@ then # Core Dependencies module load gcc/4.9.2 - module load cmake/3.10.1 + module load cmake/3.13.4 module load boost/1.62.0 module load cuda/8.0 module load openmpi/2.1.2.cuda80 diff --git a/etc/picongpu/hypnos-hzdr/laser_picongpu.profile.example b/etc/picongpu/hypnos-hzdr/laser_picongpu.profile.example index 96fdd5fc64..e69e0b2208 100644 --- a/etc/picongpu/hypnos-hzdr/laser_picongpu.profile.example +++ b/etc/picongpu/hypnos-hzdr/laser_picongpu.profile.example @@ -22,7 +22,7 @@ then # Core Dependencies module load gcc/5.3.0 - module load cmake/3.10.1 + module load cmake/3.13.4 module load boost/1.62.0 module load openmpi/1.8.6 module load numactl From cb1bee37108d0f54f244e532236005ccb2eeba6e Mon Sep 17 00:00:00 2001 From: Third Party Date: Thu, 14 Feb 2019 15:53:53 +0100 Subject: [PATCH 36/40] Squashed 'thirdParty/cuda_memtest/' changes from
6d505ea39..fc69def93 fc69def93 Merge pull request #19 from ComputationalRadiationPhysics/topic-cmake312rootHints 57121f8da CMake: Honor _ROOT Env Hints git-subtree-dir: thirdParty/cuda_memtest git-subtree-split: fc69def93e281dd12014061740aa169206c46399 --- CMakeLists.txt | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index e318360c78..208b5d5125 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -6,7 +6,7 @@ cmake_minimum_required(VERSION 2.8.5) ################################################################################ -# Project +# Project ################################################################################ project(CUDA_memtest) @@ -23,7 +23,19 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_CURRENT_SOURCE_DIR}/cmake/) ################################################################################ -# Find CUDA +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + +################################################################################ +# Find CUDA ################################################################################ find_package(CUDA REQUIRED) From d0385cd5339bfe919298224c1982167a1f2849ba Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Thu, 14 Feb 2019 13:31:26 +0100 Subject: [PATCH 37/40] CMake: Honor _ROOT Env Hints CMake 3.12.0+ honor `_ROOT` environment hints which are often set on HPC systems. Previously, it was only looking for `_DIR` paths in `find_package` calls. This new policy is useful since HPC systems usually set `_DIR`, `_ROOT` or expand the `CMAKE_PREFIX_PATH`. Therefore we want to use it as soon as it is available. On systems where those env vars are set, e.g. 
Hypnos, this also throws a warning if the default (OLD) policy is used with CMake 3.12.4 or newer. --- include/mpiInfo/CMakeLists.txt | 12 ++++++++++++ include/picongpu/CMakeLists.txt | 12 ++++++++++++ include/pmacc/CMakeLists.txt | 12 ++++++++++++ include/pmacc/PMaccConfig.cmake | 12 ++++++++++++ include/pmacc/test/random/CMakeLists.txt | 11 +++++++++++ share/pmacc/examples/gameOfLife2D/CMakeLists.txt | 12 ++++++++++++ src/tools/png2gas/CMakeLists.txt | 12 ++++++++++++ src/tools/splash2txt/CMakeLists.txt | 11 +++++++++++ 8 files changed, 94 insertions(+) diff --git a/include/mpiInfo/CMakeLists.txt b/include/mpiInfo/CMakeLists.txt index 1087287219..b516395042 100644 --- a/include/mpiInfo/CMakeLists.txt +++ b/include/mpiInfo/CMakeLists.txt @@ -45,6 +45,18 @@ list(APPEND CMAKE_PREFIX_PATH "$ENV{CMAKE_PREFIX_PATH}") list(APPEND "/usr/lib/x86_64-linux-gnu/") +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + ############################################################################### # Language Flags ############################################################################### diff --git a/include/picongpu/CMakeLists.txt b/include/picongpu/CMakeLists.txt index 7e146807c8..3778f1e6c4 100644 --- a/include/picongpu/CMakeLists.txt +++ b/include/picongpu/CMakeLists.txt @@ -46,6 +46,18 @@ list(APPEND CMAKE_PREFIX_PATH "$ENV{ADIOS_ROOT}") list(APPEND CMAKE_PREFIX_PATH "$ENV{CMAKE_PREFIX_PATH}") +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET 
CMP0074 NEW) +endif() + + ############################################################################### # Language Flags ############################################################################### diff --git a/include/pmacc/CMakeLists.txt b/include/pmacc/CMakeLists.txt index b1c42a8f0d..ea41b8c859 100644 --- a/include/pmacc/CMakeLists.txt +++ b/include/pmacc/CMakeLists.txt @@ -37,6 +37,18 @@ list(APPEND CMAKE_PREFIX_PATH "$ENV{VT_ROOT}") list(APPEND CMAKE_PREFIX_PATH "$ENV{CMAKE_PREFIX_PATH}") +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + ############################################################################### # Language Flags ############################################################################### diff --git a/include/pmacc/PMaccConfig.cmake b/include/pmacc/PMaccConfig.cmake index 27a551692d..8eab73817e 100644 --- a/include/pmacc/PMaccConfig.cmake +++ b/include/pmacc/PMaccConfig.cmake @@ -56,6 +56,18 @@ set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "${PMACC_BUILD_TYPE}") unset(PMACC_BUILD_TYPE) +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + ############################################################################### # Language Flags ############################################################################### diff --git a/include/pmacc/test/random/CMakeLists.txt b/include/pmacc/test/random/CMakeLists.txt index d2041ffcdc..59d4f6c293 100644 --- 
a/include/pmacc/test/random/CMakeLists.txt +++ b/include/pmacc/test/random/CMakeLists.txt @@ -24,6 +24,17 @@ project("TestRandomGenerators") set(CMAKE_PREFIX_PATH ${CMAKE_PREFIX_PATH} "${CMAKE_CURRENT_SOURCE_DIR}/../..") + +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + ################################################################################ # PMacc ################################################################################ diff --git a/share/pmacc/examples/gameOfLife2D/CMakeLists.txt b/share/pmacc/examples/gameOfLife2D/CMakeLists.txt index 5f17438604..b12134060e 100644 --- a/share/pmacc/examples/gameOfLife2D/CMakeLists.txt +++ b/share/pmacc/examples/gameOfLife2D/CMakeLists.txt @@ -44,6 +44,18 @@ list(APPEND CMAKE_PREFIX_PATH "$ENV{BOOST_ROOT}") list(APPEND CMAKE_PREFIX_PATH "$ENV{CMAKE_PREFIX_PATH}") +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + ############################################################################### # Language Flags ############################################################################### diff --git a/src/tools/png2gas/CMakeLists.txt b/src/tools/png2gas/CMakeLists.txt index 3d4ca04d89..9ee5102dea 100644 --- a/src/tools/png2gas/CMakeLists.txt +++ b/src/tools/png2gas/CMakeLists.txt @@ -52,6 +52,18 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} ${CMAKE_CURRENT_SOURCE_DIR}/../../../thirdParty/cmake-modules/) 
+################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + ############################################################################### # Language Flags ############################################################################### diff --git a/src/tools/splash2txt/CMakeLists.txt b/src/tools/splash2txt/CMakeLists.txt index ac5636b42d..65459b214b 100644 --- a/src/tools/splash2txt/CMakeLists.txt +++ b/src/tools/splash2txt/CMakeLists.txt @@ -46,6 +46,17 @@ set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} include_directories(include) +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + ############################################################################### # Language Flags From 044f2f1d03fd9ba5bf3ffd635852210091951875 Mon Sep 17 00:00:00 2001 From: Third Party Date: Thu, 14 Feb 2019 17:20:28 +0100 Subject: [PATCH 38/40] Squashed 'thirdParty/mallocMC/' changes from 4b779a34c..e2533d141 e2533d141 Merge pull request #153 from psychocoderHPC/topic-versionIncrease2.3.1 2723bc13d Merge pull request #154 from ax3l/topic-cmake312rootHints 60c467ece version increase to 2.3.1 5f57e6d1f CMake: Honor _ROOT Env Hints e0bbb5fdd Merge pull request #151 from ax3l/merge-v230master 16cd2b9a5 Merge remote-tracking branch 'mainline/master' into merge-v230master 8dbb2dd6e Merge pull request #150 from psychocoderHPC/fix-warpsPerSM cab1dd5fc fix style, fix wrong used qualifier 5a71062db fix illegal memory access git-subtree-dir: thirdParty/mallocMC 
git-subtree-split: e2533d14101c9fa7af3d11b9d02277591e06d8e4 --- CHANGELOG.md | 15 ++++++ CMakeLists.txt | 12 +++++ .../creationPolicies/Scatter_impl.hpp | 6 +-- .../distributionPolicies/XMallocSIMD_impl.hpp | 10 ++-- src/include/mallocMC/mallocMC_utils.hpp | 47 +++++++++++++++++++ src/include/mallocMC/version.hpp | 2 +- 6 files changed, 85 insertions(+), 7 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 393035ee24..928f30158f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,6 +1,21 @@ Change Log / Release Log for mallocMC ================================================================ +2.3.1crp +-------- +**Date:** 2019-02-14 + +A critical bug was fixed which can result in an illegal memory access. + +### Changes to mallocMC 2.3.0crp + +**Bug fixes** + - fix illegal memory access in `XMallocSIMD` #150 + +**Misc:** + - CMake: Honor `_ROOT` Env Hints #154 + + 2.3.0crp -------- **Date:** 2018-06-11 diff --git a/CMakeLists.txt b/CMakeLists.txt index b71c63e6a3..4376bd1321 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -6,6 +6,18 @@ set(CMAKE_PREFIX_PATH "/usr/lib/x86_64-linux-gnu/" "$ENV{CUDA_ROOT}" "$ENV{BOOST_ROOT}") +################################################################################ +# CMake policies +# +# Search in _ROOT: +# https://cmake.org/cmake/help/v3.12/policy/CMP0074.html +################################################################################ + +if(POLICY CMP0074) + cmake_policy(SET CMP0074 NEW) +endif() + + ############################################################################### # CUDA ############################################################################### diff --git a/src/include/mallocMC/creationPolicies/Scatter_impl.hpp b/src/include/mallocMC/creationPolicies/Scatter_impl.hpp index 7f420853a9..88e3a611fc 100644 --- a/src/include/mallocMC/creationPolicies/Scatter_impl.hpp +++ b/src/include/mallocMC/creationPolicies/Scatter_impl.hpp @@ -933,15 +933,15 @@ namespace ScatterKernelDetail{ */ __device__ 
unsigned getAvailableSlotsAccelerator(size_t slotSize){ int linearId; - int wId = threadIdx.x >> 5; //do not use warpid-function, since this value is not guaranteed to be stable across warp lifetime + int wId = warpid_withinblock(); //do not use warpid-function, since this value is not guaranteed to be stable across warp lifetime #if(__CUDACC_VER_MAJOR__ >= 9) uint32 activeThreads = __popc(__activemask()); #else uint32 activeThreads = __popc(__ballot(true)); #endif - __shared__ uint32 activePerWarp[32]; //32 is the maximum number of warps in a block - __shared__ unsigned warpResults[32]; + __shared__ uint32 activePerWarp[MaxThreadsPerBlock::value / WarpSize::value]; //maximum number of warps in a block + __shared__ unsigned warpResults[MaxThreadsPerBlock::value / WarpSize::value]; warpResults[wId] = 0; activePerWarp[wId] = 0; diff --git a/src/include/mallocMC/distributionPolicies/XMallocSIMD_impl.hpp b/src/include/mallocMC/distributionPolicies/XMallocSIMD_impl.hpp index 54b804d9a4..37afe7f898 100644 --- a/src/include/mallocMC/distributionPolicies/XMallocSIMD_impl.hpp +++ b/src/include/mallocMC/distributionPolicies/XMallocSIMD_impl.hpp @@ -60,6 +60,11 @@ namespace DistributionPolicies{ public: typedef T_Config Properties; + MAMC_ACCELERATOR + XMallocSIMD() : can_use_coalescing(false), warpid(warpid_withinblock()), + myoffset(0), threadcount(0), req_size(0) + {} + private: /** Allow for a hierarchical validation of parameters: * @@ -89,12 +94,11 @@ namespace DistributionPolicies{ uint32 collect(uint32 bytes){ can_use_coalescing = false; - warpid = mallocMC::warpid(); myoffset = 0; threadcount = 0; //init with initial counter - __shared__ uint32 warp_sizecounter[32]; + __shared__ uint32 warp_sizecounter[MaxThreadsPerBlock::value / WarpSize::value]; warp_sizecounter[warpid] = 16; //second half: make sure that all coalesced allocations can fit within one page @@ -121,7 +125,7 @@ namespace DistributionPolicies{ MAMC_ACCELERATOR void* distribute(void* allocatedMem){ - 
__shared__ char* warp_res[32]; + __shared__ char* warp_res[MaxThreadsPerBlock::value / WarpSize::value]; char* myalloc = (char*) allocatedMem; if (req_size && can_use_coalescing) diff --git a/src/include/mallocMC/mallocMC_utils.hpp b/src/include/mallocMC/mallocMC_utils.hpp index d6a7e5c734..2353cd373b 100644 --- a/src/include/mallocMC/mallocMC_utils.hpp +++ b/src/include/mallocMC/mallocMC_utils.hpp @@ -122,12 +122,24 @@ namespace mallocMC return mylaneid; } + /** warp index within a multiprocessor + * + * Index of the warp within the multiprocessor at the moment of the query. + * The result is volatile and can be different with each query. + * + * @return current index of the warp + */ MAMC_ACCELERATOR inline boost::uint32_t warpid() { boost::uint32_t mywarpid; asm("mov.u32 %0, %%warpid;" : "=r" (mywarpid)); return mywarpid; } + + /** maximum number of warps on a multiprocessor + * + * @return maximum number of warps on a multiprocessor + */ MAMC_ACCELERATOR inline boost::uint32_t nwarpid() { boost::uint32_t mynwarpid; @@ -186,4 +198,39 @@ namespace mallocMC template MAMC_HOST MAMC_ACCELERATOR inline T divup(T a, T b) { return (a + b - 1)/b; } + /** the maximal number of threads per block + * + * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities + */ + struct MaxThreadsPerBlock + { + // valid for sm_2.X - sm_7.5 + BOOST_STATIC_CONSTEXPR uint32_t value = 1024; + }; + + /** number of threads within a warp + * + * https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#compute-capabilities + */ + struct WarpSize + { + // valid for sm_2.X - sm_7.5 + BOOST_STATIC_CONSTEXPR uint32_t value = 32; + }; + + /** warp id within a CUDA block + * + * The id is constant over the lifetime of the thread. + * The id is not equal to warpid().
+ * + * @return warp id within the block + */ + MAMC_ACCELERATOR inline boost::uint32_t warpid_withinblock() + { + return ( + threadIdx.z * blockDim.y * blockDim.x + + threadIdx.y * blockDim.x + + threadIdx.x + ) / WarpSize::value; + } } diff --git a/src/include/mallocMC/version.hpp b/src/include/mallocMC/version.hpp index 28e8091464..f2c15e1e19 100644 --- a/src/include/mallocMC/version.hpp +++ b/src/include/mallocMC/version.hpp @@ -39,7 +39,7 @@ /** the mallocMC version: major API changes should be reflected here */ #define MALLOCMC_VERSION_MAJOR 2 #define MALLOCMC_VERSION_MINOR 3 -#define MALLOCMC_VERSION_PATCH 0 +#define MALLOCMC_VERSION_PATCH 1 /** the mallocMC flavor is used to differentiate the releases of the * Computational Radiation Physics group (crp) from other releases From 9f89a96f00c78e7c6dbdc60ee1f1cd3e0a1d4db7 Mon Sep 17 00:00:00 2001 From: tools Date: Wed, 13 Feb 2019 09:34:23 +0100 Subject: [PATCH 39/40] Bump Version: 0.4.3 Performed with script in src/tools/bin/newVersion.sh --- docs/source/conf.py | 4 ++-- include/picongpu/version.hpp | 4 ++-- share/picongpu/dockerfiles/README.rst | 10 +++++----- share/picongpu/dockerfiles/ubuntu-1604/Dockerfile | 2 +- share/picongpu/dockerfiles/ubuntu-1604/Singularity | 4 ++-- 5 files changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/source/conf.py b/docs/source/conf.py index e2c9bc977e..675f35fae5 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -105,9 +105,9 @@ # built documents. # # The short X.Y version. -version = u'0.4.2' +version = u'0.4.3' # The full version, including alpha/beta/rc tags. -release = u'0.4.2' +release = u'0.4.3' # The language for content autogenerated by Sphinx. Refer to documentation # for a list of supported languages. 
diff --git a/include/picongpu/version.hpp b/include/picongpu/version.hpp index 84255de2db..cb4e4a0e5f 100644 --- a/include/picongpu/version.hpp +++ b/include/picongpu/version.hpp @@ -1,4 +1,4 @@ -/* Copyright 2015-2018 Axel Huebl +/* Copyright 2015-2019 Axel Huebl * * This file is part of PIConGPU. * @@ -21,5 +21,5 @@ #define PICONGPU_VERSION_MAJOR 0 #define PICONGPU_VERSION_MINOR 4 -#define PICONGPU_VERSION_PATCH 2 +#define PICONGPU_VERSION_PATCH 3 #define PICONGPU_VERSION_LABEL "" diff --git a/share/picongpu/dockerfiles/README.rst b/share/picongpu/dockerfiles/README.rst index 1fdca1c639..e88ec5081d 100644 --- a/share/picongpu/dockerfiles/README.rst +++ b/share/picongpu/dockerfiles/README.rst @@ -25,7 +25,7 @@ This exposes the ISAAC port to connect via the webclient to. .. code:: bash docker pull ax3l/picongpu - docker run --runtime=nvidia -p 2459:2459 -t ax3l/picongpu:0.4.0 lwfa_live + docker run --runtime=nvidia -p 2459:2459 -t ax3l/picongpu:0.4.3 lwfa_live # open firefox and isaac client or @@ -56,12 +56,12 @@ You can also push the result to dockerhub and singularity-hub (you need an accou cd ubuntu-1604 # docker image - docker build -t ax3l/picongpu:0.4.0 . + docker build -t ax3l/picongpu:0.4.3 . 
# optional: push to dockerhub (needed for singularity bootstrap) docker login - docker push ax3l/picongpu:0.4.0 + docker push ax3l/picongpu:0.4.3 # optional: mark as latest release - docker tag ax3l/picongpu:0.4.0 ax3l/picongpu:latest + docker tag ax3l/picongpu:0.4.3 ax3l/picongpu:latest docker push ax3l/picongpu:latest # singularity image @@ -69,7 +69,7 @@ You can also push the result to dockerhub and singularity-hub (you need an accou sudo singularity bootstrap picongpu.img Singularity # optional: push to a singularity registry # setup your $HOME/.sregistry first - sregistry push picongpu.img --name ax3l/picongpu --tag 0.4.0 + sregistry push picongpu.img --name ax3l/picongpu --tag 0.4.3 Recipes ------- diff --git a/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile b/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile index 088a3f1f60..2b993ecb20 100644 --- a/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile +++ b/share/picongpu/dockerfiles/ubuntu-1604/Dockerfile @@ -6,7 +6,7 @@ ENV DEBIAN_FRONTEND=noninteractive \ FORCE_UNSAFE_CONFIGURE=1 \ SPACK_ROOT=/usr/local \ SPACK_EXTRA_REPO=/usr/local/share/spack-repo \ - PIC_PACKAGE='picongpu@0.4.0+isaac backend=cuda' + PIC_PACKAGE='picongpu@0.4.3+isaac backend=cuda' # install minimal spack dependencies # - adds gfortran for spack's openmpi package diff --git a/share/picongpu/dockerfiles/ubuntu-1604/Singularity b/share/picongpu/dockerfiles/ubuntu-1604/Singularity index 437fe8eb8a..b46ad09e9b 100644 --- a/share/picongpu/dockerfiles/ubuntu-1604/Singularity +++ b/share/picongpu/dockerfiles/ubuntu-1604/Singularity @@ -1,10 +1,10 @@ Bootstrap: docker -From: ax3l/picongpu:0.4.0 +From: ax3l/picongpu:0.4.3 %labels Maintainer Axel Huebl -Version 0.4.0 +Version 0.4.3 %runscript From e47f554d0b7002e023ff490ca65b059778f3002e Mon Sep 17 00:00:00 2001 From: Axel Huebl Date: Wed, 13 Feb 2019 09:55:38 +0100 Subject: [PATCH 40/40] Changelog: 0.4.3 --- CHANGELOG.md | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 58 
insertions(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 9137a71cfe..5b9eea0baa 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,9 +1,66 @@ Changelog ========= +0.4.3 +----- + +**Date:** 2019-02-14 + +System Updates and Bug Fixes + +This release adds updates and new HPC system templates. Important bug +fixes include I/O work-arounds for issues in OpenMPI 2.0-4.0 (mainly +with HDF5), guards for particle creation with user-defined +profiles, a fixed binomial current smoothing, checks for the number +of devices in grid distributions, and container (Docker & Singularity) +modernizations. + +Thanks to Axel Huebl, Alexander Debus, Igor Andriyash, Marco Garten, +Sergei Bastrakov, Adam Simpson, Richard Pausch, Juncheng E, +Klaus Steiniger, and René Widera for contributions to this release! + +### Changes to "0.4.2" + +**Bug Fixes:** +- fix particle creation if density `<=` zero #2831 +- fix binomial current interpolation #2838 +- Docker & Singularity updates #2847 +- OpenMPI: use ROMIO for IO #2841 #2857 +- `--gridDist`: verify devices and blocks #2876 +- Phase space plugin: unit of colorbar in 2D3V #2878 + +**Misc:** +- `ionizer.param`: fix typo in "Aluminium" #2865 +- System Template Updates: + - Add system links #2818 + - Taurus: + - add project #2819 + - add Power9 V100 nodes #2856 + - add D.A.V.I.D.E (CINECA) #2821 + - add JURECA (JSC) #2869 + - add JUWELS (JSC) #2874 + - Hypnos (HZDR): CMake update #2887 + - Slurm systems: link `stdout` to `simOutput/output` #2839 +- Docs: + - Change link to CRP group @ HZDR #2814 + - `FreeRng.def`: typo in example usage #2825 + - More details on source builds #2828 + - Dependencies: Blosc install #2829 + - Ionization plot title linebreak #2867 +- plugins: + - ADIOS & phase space `-Wterminate` #2817 + - Radiation: update documented options #2842 +- Update versions script: containers #2846 +- pyflakes: `str`/`bytes`/`int` compares #2866 +- Travis CI: Fix Spack CMake Install #2879 +- Contributor name typo in
`LICENSE.md` #2880 +- Update mallocMC to 2.3.1crp #2893 +- CMake: Honor `_ROOT` Env Hints #2891 #2892 #2893 + + 0.4.2 ----- -**Date:** 2018-11-TBA +**Date:** 2018-11-19 CPU Plugin Performance