update ml workflow for pasc revisions (ECP-WarpX#4768)

* update ml workflow for pasc revisions
* update pasc reference: arxiv link+add andrew
* update figures
* Update Bibtex
* refactor
* clean duplication in references
* add explicit reference to Zenodo archive
RevathiJambunathan · Mar 16, 2024 · d22e423
1 parent 4e568ad

Showing 7 changed files with 100 additions and 72 deletions.
diff --git a/Docs/source/acknowledge_us.rst b/Docs/source/acknowledge_us.rst
@@ -53,6 +53,11 @@ Prior WarpX references
 
 If your project uses a specific algorithm or component, please consider citing the respective publications in addition.
 
+- Sandberg R T, Lehe R, Mitchell C E, Garten M, Myers A, Qiang J, Vay J-L and Huebl A.
+  **Synthesizing Particle-in-Cell Simulations Through Learning and GPU Computing for Hybrid Particle Accelerator Beamlines**.
+  Proc. of Platform for Advanced Scientific Computing (PASC'24), *submitted*, 2024.
+  `preprint <http://arxiv.org/abs/2402.17248>`__
+
 - Sandberg R T, Lehe R, Mitchell C E, Garten M, Qiang J, Vay J-L and Huebl A.
   **Hybrid Beamline Element ML-Training for Surrogates in the ImpactX Beam-Dynamics Code**.
   14th International Particle Accelerator Conference (IPAC'23), WEPA101, 2023.

diff --git a/Docs/source/highlights.rst b/Docs/source/highlights.rst
@@ -24,6 +24,11 @@ Scientific works in laser-plasma and beam-plasma acceleration.
    Phys. Rev. Research **5**, 033112, 2023
    `DOI:10.1103/PhysRevResearch.5.033112 <https://doi.org/10.1103/PhysRevResearch.5.033112>`__
 
+#. Sandberg R T, Lehe R, Mitchell C E, Garten M, Myers A, Qiang J, Vay J-L and Huebl A.
+   **Synthesizing Particle-in-Cell Simulations Through Learning and GPU Computing for Hybrid Particle Accelerator Beamlines**.
+   Proc. of Platform for Advanced Scientific Computing (PASC'24), *submitted*, 2024.
+   `preprint <http://arxiv.org/abs/2402.17248>`__
+
 #. Sandberg R T, Lehe R, Mitchell C E, Garten M, Qiang J, Vay J-L and Huebl A.
    **Hybrid Beamline Element ML-Training for Surrogates in the ImpactX Beam-Dynamics Code**.
    14th International Particle Accelerator Conference (IPAC'23), WEPA101, 2023.
@@ -96,6 +101,11 @@ Particle Accelerator & Beam Physics
 
 Scientific works in particle and beam modeling.
 
+#. Sandberg R T, Lehe R, Mitchell C E, Garten M, Myers A, Qiang J, Vay J-L and Huebl A.
+   **Synthesizing Particle-in-Cell Simulations Through Learning and GPU Computing for Hybrid Particle Accelerator Beamlines**.
+   Proc. of Platform for Advanced Scientific Computing (PASC'24), *submitted*, 2024.
+   `preprint <http://arxiv.org/abs/2402.17248>`__
+
 #. Sandberg R T, Lehe R, Mitchell C E, Garten M, Qiang J, Vay J-L, Huebl A.
    **Hybrid Beamline Element ML-Training for Surrogates in the ImpactX Beam-Dynamics Code**.
    14th International Particle Accelerator Conference (IPAC'23), WEPA101, *in print*, 2023.
@@ -128,6 +138,7 @@ Microelectronics
    **Characterization of Transmission Lines in Microelectronic Circuits Using the ARTEMIS Solver**.
    IEEE Journal on Multiscale and Multiphysics Computational Techniques, vol. 8, pp. 31-39, 2023.
    `DOI:10.1109/JMMCT.2022.3228281 <https://doi.org/10.1109/JMMCT.2022.3228281>`__
+
 #. Kumar P, Nonaka A, Jambunathan R, Pahwa G and Salahuddin S, Yao Z.
    **FerroX: A GPU-accelerated, 3D Phase-Field Simulation Framework for Modeling Ferroelectric Devices**.
    arXiv preprint, 2022.

diff --git a/Docs/source/refs.bib b/Docs/source/refs.bib
@@ -207,13 +207,15 @@ @article{Roedel2010
 
 @misc{SandbergPASC24,
 address = {Zuerich, Switzerland},
-author = {Ryan Sandberg and Remi Lehe and Chad E Mitchell and Marco Garten and Ji Qiang and Jean-Luc Vay and Axel Huebl},
+author = {Ryan Sandberg and Remi Lehe and Chad E Mitchell and Marco Garten and Andrew Myers and Ji Qiang and Jean-Luc Vay and Axel Huebl},
 booktitle = {Proc. of PASC24},
-note = {submitted},
+note = {accepted},
 series = {PASC'24 - Platform for Advanced Scientific Computing},
 title = {{Synthesizing Particle-in-Cell Simulations Through Learning and GPU Computing for Hybrid Particle Accelerator Beamlines}},
 venue = {Zuerich, Switzerland},
-year = {2024}
+year = {2024},
+doi = {10.48550/arXiv.2402.17248},
+url = {https://arxiv.org/abs/2402.17248}
 }
 
 @inproceedings{SandbergIPAC23,

diff --git a/Docs/source/usage/workflows/ml_dataset_training.rst b/Docs/source/usage/workflows/ml_dataset_training.rst
@@ -14,42 +14,42 @@ For example, a simulation determined by the following input script
     .. literalinclude:: ml_materials/run_warpx_training.py
        :language: python
 
-In this section we walk through a workflow for data processing and model training.
+In this section we walk through a workflow for data processing and model training, using data from this input script as an example.
+The simulation output is stored in an online `Zenodo archive <https://zenodo.org/records/10368972>`__, in the ``lab_particle_diags`` directory.
+In the example scripts provided here, the data is downloaded from the Zenodo archive, properly formatted, and used to train a neural network.
 This workflow was developed and first presented in :cite:t:`ml-SandbergIPAC23,ml-SandbergPASC24`.
-
-This assumes you have an up-to-date environment with PyTorch and openPMD.
+It assumes you have an up-to-date environment with PyTorch and openPMD.
 
 Data Cleaning
 -------------
 
-It is important to inspect the data for artifacts to
+It is important to inspect the data for artifacts, to
 check that input/output data make sense.
-If we plot the final phase space for beams 1-8,
-the particle data is distributed in a single blob,
-as shown by :numref:`fig_phase_space_beam_1` for beam 1.
-This is as we expect and what is optimal for training neural networks.
+If we plot the final phase space of the particle beam,
+shown in :numref:`fig_unclean_phase_space`,
+we see outlying particles.
+Looking closer at the z-pz space, we see that some particles were not trapped in the accelerating region of the wake and have much less energy than the rest of the beam.
+
+.. _fig_unclean_phase_space:
 
-.. _fig_phase_space_beam_1:
+.. figure:: https://gist.githubusercontent.com/RTSandberg/649a81cc0e7926684f103729483eff90/raw/095ac2daccbcf197fa4e18a8f8505711b27e807a/unclean_stage_0.png
+   :alt: Plot showing the final phase space projections of a particle beam through a laser-plasma acceleration element where some beam particles were not accelerated.
 
-.. figure:: https://user-images.githubusercontent.com/10621396/290010209-c55baf1c-dd98-4d56-a675-ad3729481eee.png
-   :alt: Plot showing the final phase space projections for beam 1 of the training data, for a surrogate to stage 1.
+   The final phase space projections of a particle beam through a laser-plasma acceleration element where some beam particles were not accelerated.
 
-   The final phase space projections for beam 1 of the training data, for a surrogate to stage 1.
+To assist our neural network in learning dynamics of interest, we filter out these particles.
+It is sufficient for our purposes to select particles that are not too far back, setting
+``particle_selection={'z':[0.280025, None]}``.
+After filtering, we can see in :numref:`fig_clean_phase_space` that the beam phase space projections are much cleaner -- this is the beam we want to train on.
 
-.. _fig_phase_space_beam_0:
+.. _fig_clean_phase_space:
 
-.. figure:: https://user-images.githubusercontent.com/10621396/290010282-40560ac4-8509-4599-82ca-167bb1739cff.png
-   :alt: Plot showing the final phase space projections for beam 0 of the training data, for a surrogate to stage 0.
+.. figure:: https://gist.githubusercontent.com/RTSandberg/649a81cc0e7926684f103729483eff90/raw/095ac2daccbcf197fa4e18a8f8505711b27e807a/clean_stage_0.png
+   :alt: Plot showing the final phase space projections of a particle beam through a laser-plasma acceleration element after filtering out outlying particles.
 
-   The final phase space projections for beam 0 of the training data, for a surrogate to stage 0
+   The final phase space projections of a particle beam through a laser-plasma acceleration element after filtering out outlying particles.
 
-On the other hand, the final phase space for beam 0, shown in :numref:`fig_phase_space_beam_1`,
-has a halo of outlying particles.
-Looking closer at the z-pz space, we see that some particles got caught in a decelerating
-region of the wake, have slipped back and are much slower than the rest of the beam.
-To assist our neural network in learning dynamics of interest, we filter out these particles.
-It is sufficient for our purposes to select particles that are not too far back, setting
-``particle_selection={'z':[0.28002, None]}``. Then a particle tracker is set up to make sure
+A particle tracker is set up to make sure
 we consistently filter out these particles from both the initial and final data.
 
 .. literalinclude:: ml_materials/create_dataset.py
@@ -58,6 +58,9 @@ we consistently filter out these particles from both the initial and final data.
    :start-after: # Manual: Particle tracking START
    :end-before: # Manual: Particle tracking END
 
+This data cleaning ensures that the particle data is distributed in a single blob,
+as is optimal for training neural networks.
+
 Create Normalized Dataset
 -------------------------
 
@@ -119,7 +122,12 @@ This data are converted to an :math:`N\times 6` numpy array and then to a PyTorc
 Save Normalizations and Normalized Data
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-With the data properly normalized, it and the normalizations are saved to file for
+The data is split into training and testing subsets.
+We take most of the data (70%) for training, meaning that data is used to update
+the neural network parameters.
+The testing data is reserved to determine how well the neural network generalizes;
+that is, how well the neural network performs on data that wasn't used to update the neural network parameters.
+With the data split and properly normalized, both the data and the normalizations are saved to file for
 use in training and inference.
 
 .. literalinclude:: ml_materials/create_dataset.py
@@ -131,13 +139,22 @@ use in training and inference.
 Neural Network Structure
 ------------------------
 
-It was found in :cite:t:`ml-SandbergPASC24` that reasonable surrogate models are obtained with
-shallow feedforward neural networks consisting of fewer than 10 hidden layers and
-just under 1000 nodes per layer.
+It was found in :cite:t:`ml-SandbergPASC24` that a reasonable surrogate model is obtained with
+shallow feedforward neural networks consisting of about 5 hidden layers and 700-900 nodes per layer.
 The example shown here uses 3 hidden layers and 20 nodes per layer
 and is trained for 10 epochs.
 
+Some utility functions for creating neural networks are provided in the script below.
+These are mostly convenience wrappers and utilities for working with `PyTorch <https://pytorch.org/>`__ neural network objects.
+This script is imported in the training scripts shown later.
+
+.. dropdown:: Python neural network class definitions
+   :color: light
+   :icon: info
+   :animate: fade-in-slide-down
 
+    .. literalinclude:: ml_materials/neural_network_classes.py
+       :language: python3
 
 Train and Save Neural Network
 -----------------------------
@@ -188,8 +205,8 @@ which is later divided by the size of the dataset in the training loop.
    :start-after: # Manual: Test function START
    :end-before: # Manual: Test function END
 
-Train Loop
-^^^^^^^^^^
+Training Loop
+^^^^^^^^^^^^^
 
 The full training loop performs ``n_epochs`` number of iterations.
 At each iteration the training and testing functions are called,
@@ -228,22 +245,22 @@ When the test-loss starts to trend flat or even upward, the neural network is no
 
 .. _fig_train_test_loss:
 
-.. figure:: https://user-images.githubusercontent.com/10621396/290010428-f83725ab-a08f-494c-b075-314b0d26cb9a.png
+.. figure:: https://gist.githubusercontent.com/RTSandberg/649a81cc0e7926684f103729483eff90/raw/095ac2daccbcf197fa4e18a8f8505711b27e807a/beam_stage_0_training_testing_error.png
    :alt: Plot of training and testing loss curves versus number of training epochs.
 
    Training (in blue) and testing (in green) loss curves versus number of training epochs.
 
 .. _fig_train_evaluation:
 
-.. figure:: https://user-images.githubusercontent.com/10621396/290010486-4a3541e7-e0be-4cf1-b33b-57d5e5985196.png
+.. figure:: https://gist.githubusercontent.com/RTSandberg/649a81cc0e7926684f103729483eff90/raw/095ac2daccbcf197fa4e18a8f8505711b27e807a/beam_stage_0_model_evaluation.png
    :alt: Plot comparing model prediction with simulation output.
 
    A comparison of model prediction (yellow-red dots, colored by mean-squared error) with simulation output (black dots).
 
 A visual inspection of the model prediction can be seen in :numref:`fig_train_evaluation`.
 This plot compares the model prediction, with dots colored by mean-square error, on the testing data with the actual simulation output in black.
 The model obtained with the hyperparameters chosen here trains quickly but is not very accurate.
-A more accurate model is obtained with 5 hidden layers and 800 nodes per layer,
+A more accurate model is obtained with 5 hidden layers and 900 nodes per layer,
 as discussed in :cite:t:`ml-SandbergPASC24`.
 
 These figures can be generated with the following Python script.
@@ -261,7 +278,7 @@ Surrogate Usage in Accelerator Physics
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 A neural network such as the one we trained here can be incorporated in other BLAST codes.
-`Consider the example using neural networks in ImpactX <https://impactx.readthedocs.io/en/latest/usage/examples/pytorch_surrogate_model/README.html>`__.
+Consider this `example using neural network surrogates of WarpX simulations in ImpactX <https://impactx.readthedocs.io/en/latest/usage/examples/pytorch_surrogate_model/README.html>`__.
 
 .. bibliography::
    :keyprefix: ml-
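
The 70/30 train/test split described in the "Save Normalizations and Normalized Data" hunk above can be sketched with the standard library alone. This is an illustration only: the helper name `split_indices`, the fixed seed, and the shuffle-then-slice strategy are assumptions, and the split code actually used in `create_dataset.py` is not shown in this diff and may differ.

```python
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Return disjoint (train, test) index lists that together cover range(n_samples)."""
    indices = list(range(n_samples))
    # Shuffle so that both subsets sample the whole beam rather than one end of it.
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


# Hypothetical dataset size, chosen only for the demonstration.
train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))
```

The training indices select the data used to update the network parameters; the held-out test indices are only used to measure how well the network generalizes.
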
diff --git a/Docs/source/usage/workflows/ml_materials/create_dataset.py b/Docs/source/usage/workflows/ml_materials/create_dataset.py
@@ -7,17 +7,16 @@
 # Authors: Ryan Sandberg
 # License: BSD-3-Clause-LBNL
 #
+
 import os
-import tarfile
+import zipfile
 from urllib import request
 
 import numpy as np
-
-c = 2.998e8
-
 import torch
 from openpmd_viewer import OpenPMDTimeSeries, ParticleTracker
 
+c = 2.998e8
 ###############
 
 def sanitize_dir_strings(*dir_strings):
@@ -31,14 +30,15 @@ def sanitize_dir_strings(*dir_strings):
     return dir_strings
 
 def download_and_unzip(url, data_dir):
-  request.urlretrieve(url, data_dir)
-  with tarfile.open(data_dir) as tar_dataset:
-      tar_dataset.extractall()
+    request.urlretrieve(url, data_dir)
+    with zipfile.ZipFile(data_dir, 'r') as zip_dataset:
+        zip_dataset.extractall()
 
 def create_source_target_data(data_dir,
                               species,
                               source_index=0,
                               target_index=-1,
+                              survivor_select_index=-1,
                               particle_selection=None
                              ):
     """Create dataset from openPMD files
@@ -67,7 +67,7 @@ def create_source_target_data(data_dir,
     relevant_times = [ts.t[source_index], ts.t[target_index]]
 
     # Manual: Particle tracking START
-    iteration = ts.iterations[target_index]
+    iteration = ts.iterations[survivor_select_index]
     pt = ParticleTracker( ts,
                          species=species,
                          iteration=iteration,
@@ -116,7 +116,6 @@ def create_source_target_data(data_dir,
 
     return source_data, source_means, source_stds, target_data, target_means, target_stds, relevant_times
 
-
 def save_warpx_surrogate_data(dataset_fullpath_filename,
                               diag_dir,
                               species,
@@ -128,12 +127,14 @@ def save_warpx_surrogate_data(dataset_fullpath_filename,
                               particle_selection=None
                              ):
 
-    source_target_data = create_source_target_data(data_dir=diag_dir,
-                                                      species=species,
-                                                      source_index=source_index,
-                                                      target_index=target_index,
-                                                      particle_selection=particle_selection
-                                                     )
+    source_target_data = create_source_target_data(
+        data_dir=diag_dir,
+        species=species,
+        source_index=source_index,
+        target_index=target_index,
+        survivor_select_index=survivor_select_index,
+        particle_selection=particle_selection
+    )
     source_data, source_means, source_stds, target_data, target_means, target_stds, times = source_target_data
 
     # Manual: Save dataset START
@@ -161,11 +162,10 @@ def save_warpx_surrogate_data(dataset_fullpath_filename,
 ######## end utility functions #############
 ######## start dataset creation ############
 
-data_url = "https://zenodo.org/records/10368972/files/ml_example_training.tar.gz?download=1"
-download_and_unzip(data_url, "training_dataset.tar.gz")
+data_url = "https://zenodo.org/records/10810754/files/lab_particle_diags.zip?download=1"
+download_and_unzip(data_url, "lab_particle_diags.zip")
 data_dir = "lab_particle_diags/lab_particle_diags/"
 
-
 # create data set
 
 source_index = 0
@@ -178,7 +178,7 @@ def save_warpx_surrogate_data(dataset_fullpath_filename,
 
 # improve stage 0 dataset
 stage_i = 0
-select = {'z':[0.28002, None]}
+select = {'z':[0.280025, None]}
 species = f'beam_stage_{stage_i}'
 dataset_filename = f'dataset_{species}.pt'
 dataset_file = 'datasets/' + dataset_filename
@@ -193,7 +193,7 @@ def save_warpx_surrogate_data(dataset_fullpath_filename,
                 particle_selection=select
                )
 
-for stage_i in range(1,9):
+for stage_i in range(1,15):
     species = f'beam_stage_{stage_i}'
     dataset_filename = f'dataset_{species}.pt'
     dataset_file = 'datasets/' + dataset_filename
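
The `select = {'z':[0.280025, None]}` dictionary above sets a lower bound on `z` and leaves the upper side open. A minimal plain-Python sketch of that interval semantics follows. The helper `apply_selection` and the sample particle values are hypothetical and are not part of openPMD-viewer; how values exactly on a bound are treated is also an assumption here. The second half mimics, in simplified form, why a particle tracker is used: particles are chosen once, and the same particles are then kept in both the initial and final data.

```python
def apply_selection(particles, selection):
    """Keep particles whose quantities lie inside every [lower, upper] interval.

    A bound of None leaves that side of the interval open.
    """
    kept = []
    for particle in particles:
        inside = True
        for key, (lower, upper) in selection.items():
            value = particle[key]
            if lower is not None and value < lower:
                inside = False
            if upper is not None and value > upper:
                inside = False
        if inside:
            kept.append(particle)
    return kept


# Hypothetical final-state particles; "id" stands in for a unique particle identifier.
final_particles = [
    {"id": 1, "z": 0.280030},
    {"id": 2, "z": 0.280010},  # lagging particle, below the cut
    {"id": 3, "z": 0.280040},
]
select = {"z": [0.280025, None]}

survivors = apply_selection(final_particles, select)
survivor_ids = {p["id"] for p in survivors}

# Reuse the surviving ids so the initial data is filtered consistently.
initial_particles = [{"id": i, "z": -1.07e-4} for i in (1, 2, 3)]
initial_survivors = [p for p in initial_particles if p["id"] in survivor_ids]
```
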

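The `run_warpx_training.py` diff that follows replaces a hard-coded list of beam energies with a linear formula. The quick check below uses only numbers that appear in that diff and compares the two; the names `listed_gammas` and `max_rel_diff` are introduced here for illustration and do not appear in the script.

```python
N_stage = 15

# Values from the line that the commit comments out ("From 3D run").
listed_gammas = [1957, 15188, 28432, 41678, 54926, 68174, 81423, 94672, 107922, 121171]

# Linear formula that the commit switches to.
beam_gammas = [1960 + 13246 * i_stage for i_stage in range(N_stage)]

# Largest relative difference over the stages present in both sequences.
max_rel_diff = max(
    abs(fit - ref) / ref for fit, ref in zip(beam_gammas, listed_gammas)
)
print(f"max relative difference: {max_rel_diff:.2%}")
print(f"listed values: {len(listed_gammas)}, stages requested: {N_stage}")
```

The listed values cover only 10 stages, so indexing that list with `i_stage` up to 14 would raise an `IndexError`; the formula has no such limit.
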
diff --git a/Docs/source/usage/workflows/ml_materials/run_warpx_training.py b/Docs/source/usage/workflows/ml_materials/run_warpx_training.py
@@ -1,11 +1,4 @@
 #!/usr/bin/env python3
-#
-# Copyright 2022-2023 WarpX contributors
-# Authors: WarpX team
-# License: BSD-3-Clause-LBNL
-#
-# -*- coding: utf-8 -*-
-
 import math
 
 import numpy as np
@@ -21,7 +14,7 @@
 # Number of cells
 dim = '3'
 nx = ny = 128
-nz = 8832
+nz = 35328 #17664 #8832
 if dim == 'rz':
     nr = nx//2
 
@@ -33,9 +26,9 @@
 
 # Number of processes for static load balancing
 # Check with your submit script
-num_procs = [1, 1, 16*4]
+num_procs = [1, 1, 64*4]
 if dim == 'rz':
-    num_procs = [1, 16]
+    num_procs = [1, 64]
 
 # Number of time steps
 gamma_boost = 60.
@@ -73,7 +66,7 @@
 
 # plasma region
 plasma_rlim = 100.e-6
-N_stage = 9
+N_stage = 15
 L_plasma_bulk = 0.28
 L_ramp = 1.e-9
 L_ramp_up = L_ramp
@@ -141,12 +134,12 @@ def get_species_of_accelerator_stage(stage_idx, stage_zmin, stage_zmax,
 N_beam_particles = int(1e6)
 beam_centroid_z = -107.e-6
 beam_rms_z = 2.e-6
-#beam_gammas = [2000 + 13000 * i_stage for i_stage in range(N_stage)]
-beam_gammas = [1957, 15188, 28432, 41678, 54926, 68174, 81423,94672, 107922,121171] # From 3D run
+beam_gammas = [1960 + 13246 * i_stage for i_stage in range(N_stage)]
+#beam_gammas = [1957, 15188, 28432, 41678, 54926, 68174, 81423,94672, 107922,121171] # From 3D run
 beams = []
 for i_stage in range(N_stage):
     beam_gamma = beam_gammas[i_stage]
-    sigma_gamma = 0.10 * beam_gamma
+    sigma_gamma = 0.06 * beam_gamma
     gaussian_distribution = picmi.GaussianBunchDistribution(
         n_physical_particles= abs(beam_charge) / q_e,
         rms_bunch_size=[2.e-6, 2.e-6, beam_rms_z],