Benchmark with DaCe cpu and gpu backends (#50)

* Benchmark changes
* Added benchmark configurations
* Removed benchmark configs from configs to be tested in unit tests
* Changed dace submodule to point to Florian fix
* Dace submodule points to gcc_dies_on_dacecpu branch
* Set 'rf_fast' to true in baroclinic_c384_cpu/gpu.yaml
* fix mpi4py version (#51)
* [Feature] Better translate test (#39) (#47)

  Translate test: small improvements
  - Parametrize perturbation upon failure
  - Refactor error folder to be `pwd` based
  - Fix GPU translate unable to dump error `nc`
  - Fix mixmatch precision on translate test
  - Update README.md

  Test fix:
  - Orchestrate YPPM for translate purposes

  Misc:
  - Fix bad logger formatting on DaCeProgress
* [NASA] [Feature] Guarding against unimplemented configuration (#40) (#48)
  * [Feature] Guarding against unimplemented configuration (#40)

    Guarding against unimplemented namelists options:
    - a2b_ord4
    - d_sw
    - fv_dynamics
    - fv_subgridz
    - neg_adj3
    - divergence damping
    - xppm
    - yppm

    Misc:
    - Fix `netcdf_monitor` not mkdir the directory
    - Add `as_dict` to the dycore state to dump the dycore more easily
  * Unused assert
  * Update fv3core/pace/fv3core/stencils/yppm.py

    Co-authored-by: Oliver Elbert <[email protected]>
  * Update fv3core/pace/fv3core/stencils/xppm.py

    Co-authored-by: Oliver Elbert <[email protected]>
  * Change NotImplemented to ValueError for n_sponge<3
  * lint

  ---------

  Co-authored-by: Oliver Elbert <[email protected]>
* Re-try of updating dace submodule to track Florian fix branch
* Reverted gt4py submodule to match main checkout
* Added benchmark README file
* Read in ak, bk coefficients (#36)
  * initial changes to read in ak bk
  * read ak/bk
  * add xfail
  * remove input dir
  * further changes to unit tests
  * finish up test
  * add history
  * commit uncommitted files
  * fix test comment
  * add input to top
  * read in data
  * read in netcdf file in eta mod
  * remove txt file
  * test
  * modify test and fix generate.py
  * remove emacs backup file
  * driver tests pass
  * fix helper.py
  * fix fv3core tests
  * fix physics test
  * fix grid tests
  * nullcommconfig
  * cleanup input
  * remove driver input
  * remove top level input
  * fix circular import problems
  * modify eta_file readin for test_restart_serial
  * comment out 91 test
  * rm safety checks
  * revert diagnostics.py
  * restore driver.py
  * revert initialization.py
  * restore state.py
  * restore analytic_init.py
  * restore init_utils.py and analytic_init.py
  * restore c_sw.py
  * d2a2c_vect.py
  * restore fv3core/stencils
  * restore translate_fvdynamics
  * restore physics/stencils
  * restore stencils
  * remove circular dependency
  * use pytest parametrize
  * cleanup generation.py
  * fstrings
  * add eta_file to MetricTerm init
  * remove eta_file argument in new_from_metric_terms and geos_wrapper
  * use pytest parametrize for the xfail tests
  * use pytest parametrize for the xfail tests
  * fix geos_wrapper and grid
  * fix tests
  * fstring
  * add test comments
  * fix util/HISTORY.md
  * fix comments
  * remove __init__.py from tests/main/grid
  * add jupyter notebooks to generate eta files
  * generate ak,bk,ptop on metricterm init
  * fix tests
  * exploit np.all in eta mod
  * remove tests/main/grid/input
  * update ci
  * test
  * remove input
  * edit ci yaml
  * remove push

  ---------

  Co-authored-by: mlee03 <[email protected]>
* Move Active Physics Schemes to Config (#44)
  * initial commit, need to adapt and run tests
  * revising scheme name
  * tests pass
  * update history
  * linting
  * changing typehints for physics schemes to enum instead of str
  * driver now works with physics config enum, tests pass
  * fixed tests
  * missed one
* D-grid data input method (#42)
  * Testing changes reflected across branches
  * Undoing changes made in build_gaea_c5.sh
  * Testing vscode functionality, by adding a change to external_grid branch
  * Testing vscode functionality, by adding a change to external_grid branch
  * Addition of from_generated method and calc_flag to util/pace/util/grid/generation.py
  * Added get_grid method for external grid data to driver/pace/driver/grid.py
  * Preliminary xarray netcdf read in method added to driver/pace/driver/grid.py
  * Updating util/pace/util/grid/generation.py from_generated method
  * Addition of external grid data read in methods for initialization of grid. Current method uses xarray to interact with netcdf tile files. Values for longitude, latitude, number of points in x and y, grid edge distances read in.
  * driver/examples/configs/test_external_C12_1x1.yaml
  * Preliminary unit test for external grid data read in
  * Current state of unit tests as of 27 Nov 2023
  * External grid method and unit tests added
  * Re-excluding external grid data yamls from test_example_configs.py
  * Update driver/pace/driver/grid.py

    Co-authored-by: Florian Deconinck <[email protected]>
  * Changed name of grid initializer function to match NetCDF dependency and class descriptor
  * Update util/pace/util/grid/generation.py

    Moved position of doc string for "from_external" MetricTerms class method

    Co-authored-by: Oliver Elbert <[email protected]>
  * Fixed indentation error in generation.py from suggestion in PR 42
  * Removal of TODO comment in grid.py, changes to method of file accessing in test_analytic_init, test_external_grid_*
  * Changed grid data read-in unit tests to compare data directly from file to driver grid data generated from yaml
  * Change to reading in lon and lat, other metric terms calculated as needed
  * Removed read-in of dx, dy, and area. Changed unit tests to compare calculated area to 'ideal' surface area as given by selected constants type.
  * Update tests/mpi_54rank/test_external_grid_1x1.py

    Incorrect name of test in test_external_grid_1x1.py changed to match file name

    Co-authored-by: Oliver Elbert <[email protected]>
  * Added comparisons for read-in vs generated by driver lon, lat, dx, dy, and area data to unit tests
  * Added relative error calculations to unit tests for external grid data read-in
  * External grid data read in tests changed: relative errors printed by each rank and get_tile_number replacing get_tile_index
  * Removing commented out sections in test_external_grid_2x2.py

    Co-authored-by: Oliver Elbert <[email protected]>
  * Updated external grid data read-in to take configuration and input data locations from command line, updated test description, and added documentation on grid construction to external grid data configuration selection dataclass.
  * Updated documentation in grid.py
  * Updated external grid data read in unit test to use parametrize functionality of pytest
  * Amended files to reference changes to PR 36

  ---------

  Co-authored-by: Frank Malatino <[email protected]>
  Co-authored-by: Florian Deconinck <[email protected]>
  Co-authored-by: Oliver Elbert <[email protected]>

---------

Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: Florian Deconinck <[email protected]>
Co-authored-by: Oliver Elbert <[email protected]>
Co-authored-by: MiKyung Lee <[email protected]>
Co-authored-by: mlee03 <[email protected]>
Co-authored-by: Frank Malatino <[email protected]>
NOAA-GFDL · Jan 30, 2024 · cd1bd06
1 parent 77ff59c
commit cd1bd06
Showing 68 changed files with 1,512 additions and 968 deletions.
diff --git a/.github/workflows/main_unit_tests.yml b/.github/workflows/main_unit_tests.yml
@@ -22,6 +22,10 @@ jobs:
           run: |
             python -m pip install --upgrade pip
             pip install -r requirements_dev.txt
+        - name: Clone datafiles
+          run: |
+            mkdir -p tests/main/input && cd tests/main/input
+            git clone -b store_files https://github.com/mlee03/pace.git tmp && mv tmp/*.nc . && rm -rf tmp
         - name: Run all main tests
           run: |
             pytest -x tests/main
diff --git a/.gitmodules b/.gitmodules
@@ -1,6 +1,8 @@
 [submodule "external/gt4py"]
 	path = external/gt4py
 	url = https://github.com/gridtools/gt4py.git
-[submodule "external/dace"]
+
+[submodule "dacefix"]
 	path = external/dace
-	url = https://github.com/spcl/dace.git
+	url = https://github.com/FlorianDeconinck/dace.git
+	branch = fix/gcc_dies_on_dacecpu
diff --git a/CONTRIBUTORS.md b/CONTRIBUTORS.md
@@ -11,7 +11,9 @@ List format (alphabetical order):  Surname, Name. Employer/Affiliation
 * Fuhrer, Oliver. Allen Institute for AI.
 * George, Rhea. Allen Institute for AI.
 * Harris, Lucas. GFDL.
+* Lee, Mi Kyung. GFDL.
 * Kung, Chris. NASA.
+* Malatino, Frank. GFDL
 * McGibbon, Jeremy. Allen Institute for AI.
 * Niedermayr, Yannick. ETH.
 * Savarin, Ajda. University of Washington.

diff --git a/benchmark_README.md b/benchmark_README.md
@@ -0,0 +1,96 @@
+# Benchmarking README
+
+The tests contained in this archive are for benchmarking purposes only. Any
+distribution beyond the personnel performing the tests needs explicit approval
+from NOAA/GFDL (Seth Underwood or Rusty Benson).
+
+## Cloning benchmark repository and generating conda environment
+
+Pace requires GCC > 9.2, MPI, and Python 3.8 on your system; CUDA is also required to run with a GPU backend. You will also need the Boost headers available on your system (Boost itself does not need to be built or installed). The commands below download them and point `BOOST_ROOT` at them.
+
+```shell
+cd BOOST/ROOT
+wget https://boostorg.jfrog.io/artifactory/main/release/1.79.0/source/boost_1_79_0.tar.gz
+tar -xzf boost_1_79_0.tar.gz
+mkdir -p boost_1_79_0/include
+mv boost_1_79_0/boost boost_1_79_0/include/
+export BOOST_ROOT=BOOST/ROOT/boost_1_79_0
+```
+
+To clone the benchmark branch use the command:
+
+```shell
+git clone --recursive -b benchmark git@github.com:NOAA-GFDL/pace.git
+```
+
+or if you have already cloned the repository:
+
+```shell
+git submodule update --init --recursive
+```
+
+After cloning, change into the directory containing the clone. To generate the conda environment use the following commands:
+
+```shell
+conda create -y --name <desired_name> python=3.8
+conda activate <desired_name>
+pip3 install --upgrade pip setuptools wheel
+pip3 install -r requirements_dev.txt -c constraints.txt
+```
+
+## Benchmarking configurations
+
+There are four configurations of the PACE application contained within the branch to be used for benchmarking:
+
+```shell
+driver/examples/configs/baroclinic_c384_cpu.yaml
+driver/examples/configs/baroclinic_c384_gpu.yaml
+driver/examples/configs/baroclinic_c3072_cpu.yaml
+driver/examples/configs/baroclinic_c3072_gpu.yaml
+```
+
+## Building
+
+To build with the DaCe backends, set the following environment variables:
+
+```shell
+FV3_DACEMODE=Build
+PACE_FLOAT_PRECISION=64
+PACE_LOGLEVEL=INFO
+PYTHONOPTIMIZE=1
+OMP_NUM_THREADS=1
+```
+
+Set the duration of the configuration being built to a single timestep, so that the build runs for only one step. For example:
+
+```yaml
+dt_atmos: 450
+seconds: 450
+```
+## Running
+To run with the DaCe backends, set the following environment variables:
+
+```shell
+FV3_DACEMODE=Run
+PACE_FLOAT_PRECISION=64
+PACE_LOGLEVEL=INFO
+PYTHONOPTIMIZE=1
+OMP_NUM_THREADS=1
+```
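Review note: the Build and Run variable lists differ only in `FV3_DACEMODE`, so a launcher script can construct both from one helper. The following is a minimal sketch, not part of this commit; `dace_env` is a hypothetical name, it accepts only the two mode values used in this README, and it uses the standard OpenMP spelling `OMP_NUM_THREADS`.

```python
import os
from typing import Dict


def dace_env(mode: str) -> Dict[str, str]:
    """Return a copy of the current environment with the benchmark variables set.

    `mode` is the FV3_DACEMODE value; this sketch accepts only the two
    values used in the README ("Build" and "Run").
    """
    if mode not in ("Build", "Run"):
        raise ValueError(f"unsupported FV3_DACEMODE for this sketch: {mode!r}")
    env = dict(os.environ)  # copy, so the caller's environment is untouched
    env.update(
        {
            "FV3_DACEMODE": mode,
            "PACE_FLOAT_PRECISION": "64",
            "PACE_LOGLEVEL": "INFO",
            "PYTHONOPTIMIZE": "1",
            "OMP_NUM_THREADS": "1",  # standard OpenMP variable name
        }
    )
    return env


build_env = dace_env("Build")
run_env = dace_env("Run")
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the `mpirun` command.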
+
+Set the duration of the configuration being run to the desired length. For example:
+
+```yaml
+dt_atmos: 450
+days: 9
+```
+
+The time for the build or run can be set with units of seconds, minutes, hours, or days.
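Review note: the relationship between the duration keys and `dt_atmos` can be checked with a few lines of Python. This is a sketch under assumptions, not the driver's own logic: `timestep_count` is a hypothetical helper, and rejecting a duration that is not a whole multiple of `dt_atmos` is a choice made here for illustration, since the README does not say how the driver handles that case.

```python
from datetime import timedelta


def timestep_count(dt_atmos: int, **duration: int) -> int:
    """Return how many dt_atmos-second steps fit in the given duration.

    `duration` takes the same keys as the configs (seconds, minutes,
    hours, days), which are also valid datetime.timedelta keywords.
    """
    total_seconds = int(timedelta(**duration).total_seconds())
    if total_seconds % dt_atmos != 0:
        # Assumption of this sketch: partial timesteps are treated as an error.
        raise ValueError(
            f"duration of {total_seconds} s is not a multiple of dt_atmos={dt_atmos}"
        )
    return total_seconds // dt_atmos


print(timestep_count(450, seconds=450))  # build example: 1 timestep
print(timestep_count(450, days=9))       # run example: 1728 timesteps
```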
+
+An example command to start the build or run process with MPI using the DaCe CPU backend for the c384 configuration:
+
+```shell
+mpirun -n 1536 python3 -m pace.driver.run driver/examples/configs/baroclinic_c384_cpu.yaml
+```
+
+The build or run requires 1536 ranks: the layout is 16x16 ranks per tile and there are 6 tiles (16 x 16 x 6 = 1536).
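Review note: the rank count passed to `mpirun -n` follows directly from the `layout` entry of the config. A small sketch of that arithmetic, with `required_ranks` as a hypothetical helper name:

```python
from typing import Tuple

TILES = 6  # number of tiles stated in the README


def required_ranks(layout: Tuple[int, int], tiles: int = TILES) -> int:
    """Return the total MPI rank count for a per-tile layout of (x, y) ranks."""
    ranks_x, ranks_y = layout
    if ranks_x < 1 or ranks_y < 1:
        raise ValueError(f"layout entries must be positive, got {layout}")
    return tiles * ranks_x * ranks_y


print(required_ranks((16, 16)))  # c384 benchmark layout: 1536
print(required_ranks((2, 2)))    # baroclinic_c12_orch_cpu.yaml layout: 24
```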
diff --git a/constraints.txt b/constraints.txt
@@ -81,7 +81,7 @@ coverage==5.5
     #   pytest-cov
 cytoolz==0.12.1
     # via gt4py
-dace==0.14.4
+dace==0.15.1
     # via
     #   -r requirements_dev.txt
     #   pace-dsl
@@ -184,7 +184,7 @@ googleapis-common-protos==1.53.0
     # via google-api-core
 gprof2dot==2021.2.21
     # via pytest-profiling
-gridtools-cpp==2.3.0
+gridtools-cpp==2.3.1
     # via gt4py
 h5netcdf==0.11.0
     # via -r util/requirements.txt

diff --git a/docs/physics/state.rst b/docs/physics/state.rst
@@ -38,6 +38,6 @@ You can initialize a zero-filled PhysicsState and MicrophysicsState from other P
 
     >>> quantity_factory = QuantityFactory.from_backend(sizer=sizer, backend="numpy")
     >>> physics_state = PhysicsState.init_zeros(
-    ...  quantity_factory=quantity_factory, active_packages=["microphysics"]
+    ...  quantity_factory=quantity_factory, schemes=["GFS_microphysics"]
     ... )
     >>> microphysics_state = physics_state.microphysics
diff --git a/driver/examples/configs/baroclinic_c12.yaml b/driver/examples/configs/baroclinic_c12.yaml
@@ -94,3 +94,8 @@ physics_config:
   hydrostatic: false
   nwat: 6
   do_qa: true
+
+grid_config:
+  type:  generated
+  config:
+    eta_file: 'tests/main/input/eta79.nc'
diff --git a/driver/examples/configs/baroclinic_c12_explicit_physics.yaml b/driver/examples/configs/baroclinic_c12_explicit_physics.yaml
@@ -0,0 +1,98 @@
+stencil_config:
+  compilation_config:
+    backend: numpy
+    rebuild: false
+    validate_args: true
+    format_source: false
+    device_sync: false
+initialization:
+  type: analytic
+  config:
+    case: baroclinic
+performance_config:
+  collect_performance: true
+  experiment_name: c12_baroclinic
+nx_tile: 12
+nz: 79
+dt_atmos: 225
+minutes: 15
+layout:
+  - 1
+  - 1
+diagnostics_config:
+  path: output
+  output_format: netcdf
+  names:
+    - u
+    - v
+    - ua
+    - va
+    - pt
+    - delp
+    - qvapor
+    - qliquid
+    - qice
+    - qrain
+    - qsnow
+    - qgraupel
+  z_select:
+    - level: 65
+      names:
+        - pt
+dycore_config:
+  a_imp: 1.0
+  beta: 0.
+  consv_te: 0.
+  d2_bg: 0.
+  d2_bg_k1: 0.2
+  d2_bg_k2: 0.1
+  d4_bg: 0.15
+  d_con: 1.0
+  d_ext: 0.0
+  dddmp: 0.5
+  delt_max: 0.002
+  do_sat_adj: true
+  do_vort_damp: true
+  fill: true
+  hord_dp: 6
+  hord_mt: 6
+  hord_tm: 6
+  hord_tr: 8
+  hord_vt: 6
+  hydrostatic: false
+  k_split: 1
+  ke_bg: 0.
+  kord_mt: 9
+  kord_tm: -9
+  kord_tr: 9
+  kord_wz: 9
+  n_split: 1
+  nord: 3
+  nwat: 6
+  p_fac: 0.05
+  rf_cutoff: 3000.
+  rf_fast: true
+  tau: 10.
+  vtdm4: 0.06
+  z_tracer: true
+  do_qa: true
+  tau_i2s: 1000.
+  tau_g2v: 1200.
+  ql_gen: 0.001
+  ql_mlt: 0.002
+  qs_mlt: 0.000001
+  qi_lim: 1.0
+  dw_ocean: 0.1
+  dw_land: 0.15
+  icloud_f: 0
+  tau_l2v: 300.
+  tau_v2l: 90.
+  fv_sg_adj: 0
+  n_sponge: 48
+
+physics_config:
+  hydrostatic: false
+  nwat: 6
+  do_qa: true
+  schemes:
+    - GFS_microphysics
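Review note: the commit message says the physics scheme type hints moved from `str` to an enum, and this config now lists schemes by name. The sketch below illustrates how such names could be validated against an enum. `PhysicsScheme` and `parse_schemes` are hypothetical names chosen for illustration and may not match the identifiers in Pace. Only the `GFS_microphysics` value comes from this diff.

```python
from enum import Enum, unique
from typing import List


@unique
class PhysicsScheme(Enum):
    """Hypothetical enum of scheme names accepted under physics_config.schemes."""

    GFS_MICROPHYSICS = "GFS_microphysics"


def parse_schemes(names: List[str]) -> List[PhysicsScheme]:
    """Convert config strings to enum members, rejecting unknown names."""
    schemes = []
    for name in names:
        try:
            schemes.append(PhysicsScheme(name))  # lookup by value
        except ValueError:
            valid = ", ".join(s.value for s in PhysicsScheme)
            raise ValueError(
                f"unknown physics scheme {name!r}; expected one of: {valid}"
            ) from None
    return schemes


schemes = parse_schemes(["GFS_microphysics"])
```

Validating at config-parse time means a misspelled scheme name fails immediately rather than partway through a run.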
diff --git a/driver/examples/configs/baroclinic_c12_orch_cpu.yaml b/driver/examples/configs/baroclinic_c12_orch_cpu.yaml
@@ -14,10 +14,30 @@ performance_config:
 nx_tile: 12
 nz: 79
 dt_atmos: 225
-minutes: 5
+seconds: 675
 layout:
-  - 1
-  - 1
+  - 2
+  - 2
+diagnostics_config:
+  path: output
+  output_format: netcdf
+  names:
+    - u
+    - v
+    - ua
+    - va
+    - pt
+    - delp
+    - qvapor
+    - qliquid
+    - qice
+    - qrain
+    - qsnow
+    - qgraupel
+  z_select:
+    - level: 65
+      names:
+        - pt
 dycore_config:
   a_imp: 1.0
   beta: 0.

diff --git a/driver/examples/configs/baroclinic_c12_write_restart.yaml b/driver/examples/configs/baroclinic_c12_write_restart.yaml
@@ -92,3 +92,8 @@ physics_config:
   hydrostatic: false
   nwat: 6
   do_qa: true
+
+grid_config:
+  type:  generated
+  config:
+    eta_file: "tests/main/input/eta79.nc"
diff --git a/driver/examples/configs/baroclinic_c384_cpu.yaml b/driver/examples/configs/baroclinic_c384_cpu.yaml
@@ -15,11 +15,10 @@ performance_config:
 nx_tile: 384
 nz: 79
 dt_atmos: 450
-minutes: 7
-seconds: 30
+days: 9
 layout:
-  - 1
-  - 1
+  - 16
+  - 16
 diagnostics_config:
   path: output
   output_format: netcdf
@@ -72,7 +71,7 @@ dycore_config:
   nwat: 6
   p_fac: 0.1
   rf_cutoff: 800.
-  rf_fast: false
+  rf_fast: true
   tau: 5.
   vtdm4: 0.06
   z_tracer: true

diff --git a/driver/examples/configs/baroclinic_c384_gpu.yaml b/driver/examples/configs/baroclinic_c384_gpu.yaml
@@ -71,7 +71,7 @@ dycore_config:
   nwat: 6
   p_fac: 0.1
   rf_cutoff: 800.
-  rf_fast: false
+  rf_fast: true
   tau: 5.
   vtdm4: 0.06
   z_tracer: true