Add GEFS regression test suite from EP5r2 configuration/case #2442

Draft
wants to merge 84 commits into
base: develop

NickSzapiro-NOAA
Collaborator

@NickSzapiro-NOAA NickSzapiro-NOAA commented Sep 19, 2024

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR updates the cpld_bmark_p8 tests to a prototype GEFS test case: fully coupled s2swa with IAU and stochastic physics, configured and warm-started from restarts of EP5r2 ensemble member 1 for 2021-03-25 06Z.

The EP5r2 test case was kindly provided by @bingfu-NOAA via @junwang-noaa with aerosol input data and configurations from @lipan-NOAA.

A separate INPUTDATA_ROOT_BMIC is no longer needed and is removed.

This suite should pass some basic reproducibility/quality checks, particularly:

  • control reproduces itself
  • restart reproduces control
  • changing number of tasks reproduces control
  • Intel debug version reproduces itself
  • GNU and GNU debug versions run
    • GNU debug on Hera fails with what is likely an Open MPI error:
      140: The OSC pt2pt component does not support MPI_THREAD_MULTIPLE in this release.
      140: Workarounds are to run on a single node, or to use a system with an RDMA
      140: capable network such as Infiniband.
    • GNU and GNU debug on Hercules fail with a NetCDF HDF error:
      Error in handle_err: get_var3_r4 get_vara_real delp_inc NetCDF: HDF error
  • Runs on supported platforms
    • Hera (intel)
    • Hercules (intel)
    • Gaea (intel)
    • Derecho (intel)
      With increased WAV_tasks, the run hangs 30 minutes into the simulation
    • Larger issue of GNU support
  • No major diffs from GEFS workflow configuration
    GOCART .rc files and ExtData directory structure to be revised for consistency with global-workflow
    Configuration/case may be updated

These failures appear to require library or platform support; work will continue in follow-up issues.
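
As an illustration of what "reproduces control" means in the checks above, here is a minimal sketch that compares the output files of two run directories byte for byte. It is not the regression test framework's own comparison logic; the function names, the `*.nc` pattern, and the directory layout are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_runs(control_dir, test_dir, pattern="*.nc"):
    """Compare files matching `pattern` under two run directories.

    Hypothetical helper: returns a dict mapping each relative path found
    under control_dir to "match", "differ", or "missing".
    """
    control_dir, test_dir = Path(control_dir), Path(test_dir)
    results = {}
    for ref in sorted(control_dir.rglob(pattern)):
        if not ref.is_file():
            continue
        rel = ref.relative_to(control_dir)
        other = test_dir / rel
        if not other.is_file():
            results[rel.as_posix()] = "missing"
        elif file_digest(ref) == file_digest(other):
            results[rel.as_posix()] = "match"
        else:
            results[rel.as_posix()] = "differ"
    return results
```

A byte-level hash is a strict test: two files holding identical field values could still differ in metadata, so a "differ" result here is a prompt to look closer rather than proof that the run failed to reproduce.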

Input data is currently in user space on Hera:
/scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/input_data/gefs.v13/RT_GEFS/
The scripts will need updating once the file paths are in shared space.

Commit Message:

* UFSWM - Add GEFS regression test suite from EP5r2 configuration/case

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:


Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.

Input data Changes:

  • New input data.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

NickSzapiro-NOAA and others added 30 commits May 6, 2024 06:24
@NickSzapiro-NOAA
Collaborator Author

@WenMeng-NOAA Can we reduce the number of times lines like these get logged?

 820:  GEFS env var            0           0           0
 820:  Processing for GEFS and default setting is tmpl4_1 and tmpl4_11
 820:  After g2sec1 call we need to set listsec1(2) =            2
 820:  After g2sec1 call we need to set listsec1(13) =            1

They may be development output, and they double the size of the log file.
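
To put a number on how much of a log such messages take up, a rough sketch like the following could help. It assumes that the leading `820:` is a per-task prefix that can be stripped before counting; that reading of the prefix, and the function names, are assumptions for this example only.

```python
import re
from collections import Counter

# Assumption: each log line starts with "<task number>:" as in the excerpt above.
TASK_PREFIX = re.compile(r"^\s*\d+:\s*")


def message_counts(lines):
    """Count log messages after stripping the task prefix and extra spaces."""
    counts = Counter()
    for line in lines:
        msg = TASK_PREFIX.sub("", line.rstrip("\n"))
        msg = " ".join(msg.split())  # collapse runs of whitespace
        if msg:
            counts[msg] += 1
    return counts


def top_messages(lines, n=5):
    """Return (message, count, share of all non-empty lines) for the n most common."""
    counts = message_counts(lines)
    total = sum(counts.values())
    return [(msg, c, c / total) for msg, c in counts.most_common(n)]
```

Passing an open log file to `top_messages` (a file object iterates over its lines) would list the most repeated messages and their share of the log.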

@WenMeng-NOAA
Contributor

@NickSzapiro-NOAA Could you open an issue at https://github.com/NOAA-EMC/UPP/issues so we can work on that?

@NickSzapiro-NOAA
Collaborator Author

NOAA-EMC/UPP#1074

@NickSzapiro-NOAA
Collaborator Author

FYI, this error on Derecho goes away when WAV_tasks is increased from 80 to 120:

cxil_map: write error
...
libmpi_intel.so.1  000015539CBE3611  PMPI_File_write_a     Unknown  Unknown
libpnetcdf.so.4.0  0000155393BBA048  ncmpio_read_write     Unknown  Unknown
libpnetcdf.so.4.0  0000155393BB4DA9  Unknown               Unknown  Unknown
libpnetcdf.so.4.0  0000155393BB2319  Unknown               Unknown  Unknown
libpnetcdf.so.4.0  0000155393BAFA5B  Unknown               Unknown  Unknown
libpnetcdf.so.4.0  0000155393AFA1A2  ncmpi_wait_all        Unknown  Unknown
libpioc.so         0000155399FE9E05  flush_output_buff     Unknown  Unknown
libpioc.so         0000155399FE3746  PIOc_write_darray     Unknown  Unknown
libpioc.so         0000155399FEA085  flush_buffer          Unknown  Unknown
libpioc.so         0000155399FB3955  PIOc_sync             Unknown  Unknown
libpiof.so         0000155399900160  piolib_mod_mp_fre     Unknown  Unknown
fv3.exe            0000000005B4880F  ice_history_write        1237  ice_history_write.F90
fv3.exe            000000000589BC2A  ice_history_mp_ac        4134  ice_history.F90
fv3.exe            00000000057F47B7  ice_comp_nuopc_mp         888  ice_comp_nuopc.F90
...
libmpi_intel.so.1  000014E4701213A5  PMPI_Alltoallw        Unknown  Unknown
libpioc.so         000014E46F8CE5EF  pio_swapm             Unknown  Unknown
libpioc.so         000014E46F8D0EBA  rearrange_io2comp     Unknown  Unknown
libpioc.so         000014E46F8F20BA  PIOc_read_darray      Unknown  Unknown
fv3.exe            00000000018D148B  Unknown               Unknown  Unknown
fv3.exe            0000000001610043  Unknown               Unknown  Unknown
fv3.exe            000000000160F20D  Unknown               Unknown  Unknown
fv3.exe            0000000000CC4D85  Unknown               Unknown  Unknown
fv3.exe            0000000000CBF971  Unknown               Unknown  Unknown
fv3.exe            000000000059B5B3  Unknown               Unknown  Unknown
fv3.exe            0000000001C48B10  wav_comp_nuopc_mp         823  wav_comp_nuopc.F90

The wav_comp_nuopc.F90 line (823) calls ESMF_MeshCreate.
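
Most frames in the traceback above are unresolved ("Unknown"), so it can help to pull out only the frames that carry a source file and line. The sketch below assumes the five whitespace-separated columns seen in the excerpt (image, address, routine, line, source); the `Frame` type and function names are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# A frame's second column is a 16-digit hex address in the excerpt above.
HEX_ADDRESS = re.compile(r"^[0-9A-Fa-f]{16}$")


class Frame(NamedTuple):
    image: str
    address: str
    routine: str
    line: Optional[int]
    source: Optional[str]


def parse_frames(text):
    """Parse traceback text into Frame records, skipping non-frame lines."""
    frames = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 5 or not HEX_ADDRESS.match(parts[1]):
            continue  # e.g. "..." or "cxil_map: write error"
        image, address, routine, line, source = parts
        frames.append(Frame(
            image,
            address,
            routine,
            int(line) if line.isdigit() else None,
            None if source == "Unknown" else source,
        ))
    return frames


def resolved_frames(frames):
    """Keep only frames that name a source file."""
    return [f for f in frames if f.source is not None]
```

Applied to the excerpt, `resolved_frames` would keep the three ice component frames and the single `wav_comp_nuopc.F90` frame, which are the lines that point back into model code.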

@NickSzapiro-NOAA
Collaborator Author

Hi @jkbk2004. If there are no input data changes coming up, can we stage the input data for the new test cases? Maybe @[INPUTDATA_ROOT]/GEFS/ is a good place.

The data is on Hera at /scratch1/NCEPDEV/nems/Nick.Szapiro/tasks/input_data/gefs.v13/RT_GEFS/ and the contents of the WW3 subdirectory probably belong under INPUTDATA_ROOT_WW3 instead.

Successfully merging this pull request may close these issues.

  • Update cpld_bmark_p8 with GEFSv13 EP5 configuration
  • Add RT test for gocart_on, gccpp_on, nasa_on
7 participants