ZStd with Netcdf #2444

Draft
wants to merge 12 commits into
base: develop
from

Conversation

BrianCurtis-NOAA
Collaborator

Commit Queue Requirements:

  • Fill out all sections of this template.
  • All sub component pull requests have been reviewed by their code managers.
  • Run the full Intel+GNU RT suite (compared to current baselines) on either Hera/Derecho/Hercules
  • Commit 'test_changes.list' from previous step

Description:

This PR brings in the zstd library and, with it, netCDF support for zstd compression.

Commit Message:

* UFSWM - Add ZStd library and enable netcdf support for it.

Priority:

  • Normal

Git Tracking

UFSWM:

Sub component Pull Requests:

  • None

UFSWM Blocking Dependencies:

  • None

Changes

Regression Test Changes (Please commit test_changes.list):

  • PR Adds New Tests/Baselines.
  • PR Updates/Changes Baselines.
  • No Baseline Changes.

Input data Changes:

  • None.

Library Changes/Upgrades:

  • No Updates

Testing Log:

  • RDHPCS
    • Hera
    • Orion
    • Hercules
    • Jet
    • Gaea
    • Derecho
  • WCOSS2
    • Dogwood/Cactus
    • Acorn
  • CI
  • opnReqTest (complete task if unnecessary)

@BrianCurtis-NOAA
Collaborator Author

I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m).

@junwang-noaa I'm trying to recall: were baseline changes acceptable, or do I need to make tests/tests file changes to ensure no baselines change in the end?

@junwang-noaa
Collaborator

I don't expect those tests to change results. Would you please check which files are changed in cpld_control_p8 intel? It seems these tests have gocart, but the zstd Netcdf should not impact those tests.

@BrianCurtis-NOAA
Collaborator Author

@junwang-noaa

baseline dir = /lfs/h2/emc/nems/noscrub/emc.nems/RT/NEMSfv3gfs/develop-20240909/cpld_control_p8_intel
working dir  = /lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_255026/cpld_control_p8_intel
Checking test cpld_control_p8_intel results ....
 Comparing sfcf021.tile1.nc .....USING NCCMP......OK
 Comparing sfcf021.tile2.nc .....USING NCCMP......OK
 Comparing sfcf021.tile3.nc .....USING NCCMP......OK
 Comparing sfcf021.tile4.nc .....USING NCCMP......OK
 Comparing sfcf021.tile5.nc .....USING NCCMP......OK
 Comparing sfcf021.tile6.nc .....USING NCCMP......OK
 Comparing atmf021.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing sfcf024.tile1.nc .....USING NCCMP......OK
 Comparing sfcf024.tile2.nc .....USING NCCMP......OK
 Comparing sfcf024.tile3.nc .....USING NCCMP......OK
 Comparing sfcf024.tile4.nc .....USING NCCMP......OK
 Comparing sfcf024.tile5.nc .....USING NCCMP......OK
 Comparing sfcf024.tile6.nc .....USING NCCMP......OK
 Comparing atmf024.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing gocart.inst_aod.20210323_0600z.nc4 .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.coupler.res .....USING CMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_tracer.res.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.phy_data.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.MOM.res.nc .....USING NCCMP......OK
 Comparing RESTART/iced.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing RESTART/ufs.cpld.cpl.r.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing 20210323.060000.out_pnt.ww3 .....USING CMP......OK
 Comparing 20210323.060000.out_grd.ww3 .....USING CMP......OK
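
To answer "which files are changed" without reading the whole log by eye, the comparison output can be filtered mechanically. The following is a minimal sketch, assuming every result line has the `Comparing <file> .....USING <tool>......<status>` shape seen above; the function name `changed_files` is hypothetical and not part of the regression-test scripts.

```python
import re

# A few lines copied from the comparison log in this comment.
SAMPLE_LOG = """\
 Comparing sfcf021.tile1.nc .....USING NCCMP......OK
 Comparing atmf021.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing gocart.inst_aod.20210323_0600z.nc4 .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.coupler.res .....USING CMP......OK
"""

# Assumed line shape: "Comparing <file> .....USING <tool>......<status>"
LINE_RE = re.compile(
    r"^\s*Comparing\s+(\S+)\s+\.+USING\s+(\w+)\.+(OK|NOT IDENTICAL)\s*$"
)


def changed_files(log_text):
    """Return the file names whose comparison status is NOT IDENTICAL."""
    changed = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match and match.group(3) == "NOT IDENTICAL":
            changed.append(match.group(1))
    return changed


if __name__ == "__main__":
    for name in changed_files(SAMPLE_LOG):
        print(name)
```

Applied to the full log above, this would list only the atmf tiles, the gocart.inst_aod file, and the fv_tracer restart tiles.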

@junwang-noaa
Collaborator

junwang-noaa commented Sep 24, 2024

Thanks, Brian. @lipan-NOAA @bbakernoaa @weiyuan-jiang @tclune may I ask if you have any idea why the GOCART results are changed when the netcdf library is built with zstandard?

@mathomp4

mathomp4 commented Sep 25, 2024

@junwang-noaa I honestly have no idea why as zstandard should be lossless.

I'm currently doing a test build of GEOS with preliminary zstandard support. Once I get that working, I'll see what I can see with a run of GEOS+GOCART. (I'll go with level 5 as I think that is what you are using...)

NOTE: There is no zstandard support in MAPL yet, so whatever you are producing is either compressed offline or not compressed at all with zstandard, I suppose.
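
The "lossless" point can be illustrated with a byte-level round trip. This is only a sketch: it uses zlib (deflate) from the Python standard library as a stand-in, because zstandard bindings are not assumed to be installed, and the helper name `roundtrip_is_lossless` is hypothetical. The same property is expected of any lossless codec, zstandard included.

```python
import struct
import zlib


def roundtrip_is_lossless(values, level=5):
    """Pack values as float32, compress, decompress, and compare the bytes."""
    raw = struct.pack(f"<{len(values)}f", *values)
    compressed = zlib.compress(raw, level)   # deflate used as a stand-in codec
    restored = zlib.decompress(compressed)
    return restored == raw                   # bit-for-bit identical if lossless


if __name__ == "__main__":
    sample = [8.579448e-32, 1.445623e-31, 0.0, 6.213818e-32]
    print(roundtrip_is_lossless(sample))
```

If the compression step itself were responsible for changed answers, a check like this would fail; a lossless filter cannot alter even the last bit of a value.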

@mathomp4

Update. I was able to get zstandard support into MAPL. And I do not see any difference in the output.

I had history output a 3d GOCART collection as uncompressed (tavg3d_aer_p_uncompress), with deflate compression (tavg3d_aer_p) and with zstandard compression (tavg3d_aer_p_zstd). And as can be seen the compressed ones are smaller:

❯ lt addzstd-2024Sep25-1day-c24.tavg3d_aer_p*
Permissions Size User     Group Date Modified    Name
.rw-r--r--@  13M mathomp4 staff 2024-09-25 10:55  addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4
.rw-r--r--@ 9.3M mathomp4 staff 2024-09-25 10:55  addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4
.rw-r--r--@ 9.4M mathomp4 staff 2024-09-25 10:55  addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4

Now, comparing with nccmp we see no data differences:

❯ nccmp -dmfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4
Files "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4" and "addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4" are identical.

❯ nccmp -dmfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4
Files "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4" and "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4" are identical.

❯ nccmp -dmfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4
Files "addzstd-2024Sep25-1day-c24.tavg3d_aer_p.20000414_2230z.nc4" and "addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4" are identical.

Now, technically if you turn on comparison of global metadata there is one difference:

❯ nccmp -dmgfsB addzstd-2024Sep25-1day-c24.tavg3d_aer_p_uncompress.20000414_2230z.nc4 addzstd-2024Sep25-1day-c24.tavg3d_aer_p_zstd.20000414_2230z.nc4
DIFFER : LENGTHS OF GLOBAL ATTRIBUTE : Filename : 23 <> 17 : VALUES : tavg3d_aer_p_uncompress <> tavg3d_aer_p_zstd

but that's just the Filename metadata.

@junwang-noaa
Collaborator

@mathomp4 Thanks for the testing. In our test case, we just write out the fields without doing any compression:

ideflate= 0
quantize_mode=quantize_bitround quantize_nsd= 0
zstandard_level= 0

Also I do see the aerosol fields, e.g. dms, are different:

8122c8122
<     8.579448e-32, 1.445623e-31,
---
>     8.579447e-32, 1.445623e-31,
8127c8127
<     5.531654e-32, 0, 0, 0, 0, 0, 6.213818e-32, 1.554074e-31, 1.65499e-31,
---
>     5.531652e-32, 0, 0, 0, 0, 0, 6.213818e-32, 1.554074e-31, 1.65499e-31,
...

Any clue?
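
To put a number on "last digit" differences like these, one can measure how many representable single-precision values separate the two numbers. The sketch below assumes the fields are stored as 32-bit floats, which is not confirmed in this thread, and uses the printed values, which are already rounded to seven digits, so the counts are approximate. The helper names are hypothetical.

```python
import struct


def float32_bits(value):
    """Return the IEEE 754 single-precision bit pattern of value as an int."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def ulp_distance(a, b):
    """Count float32 steps between a and b (finite values of the same sign)."""
    return abs(float32_bits(a) - float32_bits(b))


def relative_difference(a, b):
    """Relative difference, scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))


if __name__ == "__main__":
    # Value pairs taken from the diff of the dms field above.
    pairs = [(8.579448e-32, 8.579447e-32), (5.531654e-32, 5.531652e-32)]
    for old, new in pairs:
        print(old, new, ulp_distance(old, new), relative_difference(old, new))
```

Differences of only a few steps at a relative size near 1e-7 are what a change in floating-point operation order or compiler code generation tends to produce, as opposed to a change in the input data.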

@mathomp4

I don't know your system but does this:

quantize_mode=quantize_bitround quantize_nsd= 0

mean you are not quantizing? In MAPL, I don't think we allow for a mode to be set without an nsd also set.

But beyond that, I can't see how MAPL would care about zstandard in your case since I only wrote in zstandard support today!

Are you reading any zstandard compressed files? Even then, ExtData shouldn't care since at that point we depend on netCDF to read in files correctly.

And I looked, and MAPL hasn't done anything in recent times that should change answers. The last non-zero-diff change was a bug fix for bit-shaved binary output, which was an odd case someone reported (we don't do binary output much).

@mathomp4

Well, one note is that you are running with Intel 19 it looks like. We haven't used that in years (5, 6 years?) so it's possible MAPL is interacting badly with it? We would never test it. (Heck our latest machines don't have anything newer than Intel 2022...and even that was a "we'll install for you this time")

I know @AlexanderRichert-NOAA also was having issues with Intel 19 and MAPL, but that was at unit test time and it was a test about reading in CS data. Are you ingesting cubed-sphere input data?

@junwang-noaa
Collaborator

junwang-noaa commented Sep 27, 2024

Yes, we do ingest cubed-sphere grid data. Since only the last digit changes in the GOCART fields, we won't be able to move to a newer Intel compiler with this PR, and we already have a list of tests with result changes, I think we can move on with this PR.

@BrianCurtis-NOAA
Collaborator Author

@jkbk2004 all RDHPCS spack installs have netcdf with zstd, correct?
@AlexanderRichert-NOAA does the Acorn spack install have netcdf with zstd?

@jkbk2004
Collaborator

@RatkoVasic-NOAA @ulmononian Please, make sure the netcdf with zstd is available in the current version of the spack stack on RDHPCS machines.

@BrianCurtis-NOAA
Collaborator Author

BrianCurtis-NOAA commented Sep 27, 2024

I'm getting this error:

+ mpiexec -n 256 -ppn 128 -depth 1 ./fv3.exe
 file: module_write_netcdf.F90 line:          424
 NetCDF: Filter error: undefined filter encountered

on these tests:

control_wrtGauss_netcdf_parallel_intel
control_p8_intel
control_p8.v2.sfc_intel
regional_netcdf_parallel_intel
rrfs_v1beta_intel
control_wam_intel
control_wrtGauss_netcdf_parallel_debug_intel
control_debug_p8_intel
control_wam_debug_intel

line 424 of module_write_netcdf is:
https://github.com/NOAA-EMC/fv3atm/blob/a9364591091c836984a40107729720705847c195/io/module_write_netcdf.F90#L424

@jkbk2004
Collaborator

I'm getting this error:

+ mpiexec -n 256 -ppn 128 -depth 1 ./fv3.exe
 file: module_write_netcdf.F90 line:          424
 NetCDF: Filter error: undefined filter encountered

on these tests:

control_wrtGauss_netcdf_parallel_intel
control_p8_intel
control_p8.v2.sfc_intel
regional_netcdf_parallel_intel
rrfs_v1beta_intel
control_wam_intel
control_wrtGauss_netcdf_parallel_debug_intel
control_debug_p8_intel
control_wam_debug_intel

line 424 of module_write_netcdf is: https://github.com/NOAA-EMC/fv3atm/blob/a9364591091c836984a40107729720705847c195/io/module_write_netcdf.F90#L424

@BrianCurtis-NOAA which machine?

@BrianCurtis-NOAA
Collaborator Author

WCOSS2

@junwang-noaa
Collaborator

@BrianCurtis-NOAA where is your run directory?

@BrianCurtis-NOAA
Collaborator Author

@BrianCurtis-NOAA where is your run directory?

/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224

@junwang-noaa
Collaborator

Thanks, Brian. Have you run these tests before? I saw your comments: "I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m)"

@DusanJovic-NOAA would you please take a look? Is it OK to set quantize_nsd 0? I see in control_wrtGauss_netcdf_parallel_intel, we have:

quilting:                .true.
quilting_restart:        .true.
write_groups:            1
write_tasks_per_group:   6
itasks:                  1
output_history:          .true.
history_file_on_native_grid: .false.
write_dopost:            .true.
write_nsflip:            .true.
num_files:               2
filename_base:           'atm' 'sfc'
output_grid:             gaussian_grid
output_file:             'netcdf'
zstandard_level:         5
ideflate:                0
quantize_mode:           'quantize_bitround'
quantize_nsd:            0

@RatkoVasic-NOAA
Collaborator

@RatkoVasic-NOAA @ulmononian Please, make sure the netcdf with zstd is available in the current version of the spack stack on RDHPCS machines.

I checked on 5 machines, it looks OK:

Hera:
[role.epic]# grep zstd /scratch1/NCEPDEV/nems/role.epic/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.5.1/intel/2021.5.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.5.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky8-haswell/ejp7j3k
depends_on("zstd/1.5.2")

Jet:
[role.epic]$ grep zstd /contrib/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.5.1/intel/2021.5.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.5.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky8-core2/wxvro24
depends_on("zstd/1.5.2")

Gaea:
[role.epic]# grep zstd /ncrc/proj/epic/spack-stack/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/cray-mpich/8.1.25/intel/2023.1.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2023.1.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-sles15-zen2/zo6ia6l
depends_on("zstd/1.5.2")

Hercules:
[role-epic]# grep zstd /work/noaa/epic/role-epic/spack-stack/hercules/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.9.0/intel/2021.9.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.9.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky9-icelake/tslbcfy
depends_on("zstd/1.5.2")

Orion:
[role-epic]$ grep zstd /work/noaa/epic/role-epic/spack-stack/orion/spack-stack-1.6.0/envs/fms-2024.01/install/modulefiles/intel-oneapi-mpi/2021.9.0/intel/2021.9.0/netcdf-c/4.9.2.lua
-- netcdf-c@4.9.2%intel@2021.9.0+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize~parallel-netcdf+pic+shared+szip+zstd build_system=autotools patches=0161eb8 arch=linux-rocky9-skylake_avx512/oup2wyi
depends_on("zstd/1.5.2")
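
The grep above relies on spotting `+zstd` in a long variant string by eye. A small parser makes the check explicit. This is a sketch that assumes the Spack convention of `+name` for an enabled variant and `~name` for a disabled one; it only handles the boolean-variant portion of the spec, and `parse_variants` is a hypothetical helper.

```python
import re

# Boolean-variant portion of the spec lines shown in the modulefiles above.
SPEC_VARIANTS = (
    "+blosc~byterange+dap~fsync~hdf4~jna+mpi~nczarr_zip+optimize"
    "~parallel-netcdf+pic+shared+szip+zstd"
)


def parse_variants(text):
    """Map each variant name to True (+name) or False (~name)."""
    pairs = re.findall(r"([+~])([A-Za-z0-9_-]+)", text)
    return {name: sign == "+" for sign, name in pairs}


if __name__ == "__main__":
    variants = parse_variants(SPEC_VARIANTS)
    print("zstd enabled:", variants.get("zstd", False))
```

A build where the spec showed `~zstd` instead would report `False` here, which would be one candidate explanation for a filter error at run time.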

@BrianCurtis-NOAA
Collaborator Author

@junwang-noaa The first run was to see what was going to change before adjusting IDEFLATE to ZSTANDARD_LEVEL. These next failed tests are issues arising after switching IDEFLATE to ZSTANDARD_LEVEL.

@junwang-noaa
Collaborator

I see, thanks.

@AlexanderRichert-NOAA
Collaborator

@BrianCurtis-NOAA yes, spack-stack installs netcdf with zstd support (including on acorn).

@DusanJovic-NOAA
Collaborator

Thanks, Brian. Have you run these tests before? I saw your comments: "I've generated new baselines with the test_changes.list and they generated OK and passed using those in comparison (-c and -m)"

@DusanJovic-NOAA would you please take a look? Is it OK to set quantize_nsd 0? I see in control_wrtGauss_netcdf_parallel_intel, we have:

quilting:                .true.
quilting_restart:        .true.
write_groups:            1
write_tasks_per_group:   6
itasks:                  1
output_history:          .true.
history_file_on_native_grid: .false.
write_dopost:            .true.
write_nsflip:            .true.
num_files:               2
filename_base:           'atm' 'sfc'
output_grid:             gaussian_grid
output_file:             'netcdf'
zstandard_level:         5
ideflate:                0
quantize_mode:           'quantize_bitround'
quantize_nsd:            0

Setting quantize_nsd to zero turns off quantization.
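
For readers unfamiliar with bit-rounding, the idea can be sketched in a few lines. This is an illustration only, not the netCDF implementation: it assumes the setting counts how many of the 23 explicit float32 mantissa bits to keep, rounds by adding half of the dropped range and masking, treats a value of zero as "quantization off" per the comment above, and ignores NaN and infinity. The function names are hypothetical.

```python
import struct

MANTISSA_BITS = 23  # explicit mantissa bits in IEEE 754 single precision


def to_float32(value):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def bitround(value, keep_bits):
    """Keep only keep_bits mantissa bits of value; 0 leaves it untouched."""
    if keep_bits <= 0 or keep_bits >= MANTISSA_BITS:
        return to_float32(value)              # quantization effectively off
    drop = MANTISSA_BITS - keep_bits
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    half = 1 << (drop - 1)                    # half of the dropped range
    mask = 0xFFFFFFFF ^ ((1 << drop) - 1)     # zero out the dropped bits
    rounded = (bits + half) & mask
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


if __name__ == "__main__":
    for bits_kept in (0, 2, 10):
        print(bits_kept, bitround(1.2, bits_kept))
```

With `keep_bits` of zero the value passes through unchanged, which matches the statement that a zero setting turns quantization off, so quantization would not explain changed results in these tests.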

@DusanJovic-NOAA
Collaborator

DusanJovic-NOAA commented Sep 27, 2024

I'm getting this error:

+ mpiexec -n 256 -ppn 128 -depth 1 ./fv3.exe
 file: module_write_netcdf.F90 line:          424
 NetCDF: Filter error: undefined filter encountered

on these tests:

control_wrtGauss_netcdf_parallel_intel
control_p8_intel
control_p8.v2.sfc_intel
regional_netcdf_parallel_intel
rrfs_v1beta_intel
control_wam_intel
control_wrtGauss_netcdf_parallel_debug_intel
control_debug_p8_intel
control_wam_debug_intel

line 424 of module_write_netcdf is: https://github.com/NOAA-EMC/fv3atm/blob/a9364591091c836984a40107729720705847c195/io/module_write_netcdf.F90#L424

NetCDF: Filter error: undefined filter encountered

This error means that the netcdf library does not support zstd filter.

@BrianCurtis-NOAA
Collaborator Author

@DusanJovic-NOAA does it make sense only for those tests? There are many others that use ZSTANDARD_LEVEL.

@DusanJovic-NOAA
Collaborator

@DusanJovic-NOAA does it make sense only for those tests? There are many others that use ZSTANDARD_LEVEL.

Where is your run directory?

@BrianCurtis-NOAA
Collaborator Author

@DusanJovic-NOAA does it make sense only for those tests? There are many others that use ZSTANDARD_LEVEL.

Where is your run directory?

/lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_259224

@BrianCurtis-NOAA
Collaborator Author

OK, I'll give it a whirl.

@BrianCurtis-NOAA
Collaborator Author

All tests ran to completion. But not all tests are getting zstandard_level into model_configure so I imagine there's still some work for the different model_configure files to make sure they get added where needed.
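
Finding which generated model_configure files lack the new setting can be automated. The sketch below assumes the simple `key: value` layout shown earlier in this thread and ignores any other syntax a real model_configure may contain; `parse_model_configure` and `missing_keys` are hypothetical helpers.

```python
# Excerpt in the layout of the model_configure fragment quoted earlier.
WITH_ZSTD = """\
output_file:             'netcdf'
zstandard_level:         5
ideflate:                0
quantize_mode:           'quantize_bitround'
quantize_nsd:            0
"""

WITHOUT_ZSTD = """\
output_file:             'netcdf'
ideflate:                0
quantize_nsd:            0
"""


def parse_model_configure(text):
    """Parse 'key: value' lines into a dict of stripped strings."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def missing_keys(text, required=("zstandard_level",)):
    """Return the required keys that are absent from the configuration."""
    settings = parse_model_configure(text)
    return [key for key in required if key not in settings]


if __name__ == "__main__":
    print(missing_keys(WITH_ZSTD))     # nothing missing
    print(missing_keys(WITHOUT_ZSTD))  # zstandard_level missing
```

Running such a check over each test's run directory would give the list of templates that still need the key added.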

@BrianCurtis-NOAA
Collaborator Author

After modifying the model_configures to add zstandard_level and running the full suite, all tests were able to run to completion with the following list of tests failing in comparison:

cpld_control_p8_mixedmode intel
cpld_control_p8 intel
cpld_control_p8.v2.sfc intel
cpld_restart_p8 intel
cpld_control_qr_p8 intel
cpld_restart_qr_p8 intel
cpld_2threads_p8 intel
cpld_decomp_p8 intel
cpld_mpi_p8 intel
cpld_control_ciceC_p8 intel
cpld_bmark_p8 intel
cpld_restart_bmark_p8 intel
cpld_s2sa_p8 intel
cpld_control_p8_faster intel
control_wrtGauss_netcdf_parallel intel
control_p8 intel
control_p8.v2.sfc intel
control_restart_p8 intel
regional_netcdf_parallel intel
rrfs_v1beta intel
control_wam intel
control_wrtGauss_netcdf_parallel_debug intel
control_debug_p8 intel
control_wam_debug intel
conus13km_control intel
conus13km_2threads intel
conus13km_restart_mismatch intel
hafs_regional_atm intel
hafs_regional_atm_ocn intel
hafs_regional_atm_wav intel
hafs_regional_atm_ocn_wav intel
hafs_regional_1nest_atm intel
hafs_regional_telescopic_2nests_atm intel
hafs_global_1nest_atm intel
hafs_global_multiple_4nests_atm intel
hafs_regional_specified_moving_1nest_atm intel
hafs_regional_storm_following_1nest_atm intel
hafs_regional_storm_following_1nest_atm_ocn intel
hafs_global_storm_following_1nest_atm intel
hafs_regional_storm_following_1nest_atm_ocn_debug intel
hafs_regional_storm_following_1nest_atm_ocn_wav intel
hafs_regional_storm_following_1nest_atm_ocn_wav_inline intel
hafs_regional_storm_following_1nest_atm_ocn_wav_mom6 intel
hafs_regional_docn intel
hafs_regional_docn_oisst intel
atmaero_control_p8 intel
atmaero_control_p8_rad intel
atmaero_control_p8_rad_micro intel

All the "UNABLE TO START TEST" failures were explained by their parent tests failing comparison.

On to running the full suite on Hera to get the official test_changes.list.

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA Are these in official locations already? If not, let me know and I'll re-run a full suite to confirm we still see the anticipated changes.

@Hang-Lei-NOAA

@BrianCurtis-NOAA they are not; they are still where they were before. We have not heard confirmation from the UFS team on whether the differences are acceptable.

@BrianCurtis-NOAA
Collaborator Author

@BrianCurtis-NOAA they are not; they are still where they were before. We have not heard confirmation from the UFS team on whether the differences are acceptable.

OK Thanks! My re-test for WCOSS2 is almost finished.

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA go ahead and tell NCO to install officially. Please let me know if the official location is any different than normal.

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA go ahead and tell NCO to install officially. Please let me know if the official location is any different than normal.

Hold off, please, if you can. I'm running a couple more things, and there may be an issue.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Oct 16, 2024 via email

@BrianCurtis-NOAA
Collaborator Author

BrianCurtis-NOAA commented Jan 14, 2025

baseline dir = /lfs/h2/emc/nems/noscrub/emc.nems/RT/NEMSfv3gfs/develop-20250107/cpld_control_p8_intel
working dir  = /lfs/h2/emc/ptmp/brian.curtis/FV3_RT/rt_2545763/cpld_control_p8_intel
Checking test cpld_control_p8_intel results ....
 Comparing sfcf021.tile1.nc .....USING NCCMP......OK
 Comparing sfcf021.tile2.nc .....USING NCCMP......OK
 Comparing sfcf021.tile3.nc .....USING NCCMP......OK
 Comparing sfcf021.tile4.nc .....USING NCCMP......OK
 Comparing sfcf021.tile5.nc .....USING NCCMP......OK
 Comparing sfcf021.tile6.nc .....USING NCCMP......OK
 Comparing atmf021.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf021.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing sfcf024.tile1.nc .....USING NCCMP......OK
 Comparing sfcf024.tile2.nc .....USING NCCMP......OK
 Comparing sfcf024.tile3.nc .....USING NCCMP......OK
 Comparing sfcf024.tile4.nc .....USING NCCMP......OK
 Comparing sfcf024.tile5.nc .....USING NCCMP......OK
 Comparing sfcf024.tile6.nc .....USING NCCMP......OK
 Comparing atmf024.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing atmf024.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing gocart.inst_aod.20210323_0600z.nc4 .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.coupler.res .....USING CMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_core.res.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_srf_wnd.res.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.fv_tracer.res.tile1.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile2.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile3.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile4.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile5.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.fv_tracer.res.tile6.nc .....USING NCCMP......NOT IDENTICAL
 Comparing RESTART/20210323.060000.phy_data.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.phy_data.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile1.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile2.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile3.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile4.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile5.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.sfc_data.tile6.nc .....USING NCCMP......OK
 Comparing RESTART/20210322.090000.MOM.res.nc .....USING NCCMP......OK
 Comparing RESTART/20210323.060000.MOM.res.nc .....USING NCCMP......OK
 Comparing RESTART/iced.2021-03-22-32400.nc .....USING NCCMP......OK
 Comparing RESTART/iced.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing RESTART/ufs.cpld.cpl.r.2021-03-22-32400.nc .....USING NCCMP......OK
 Comparing RESTART/ufs.cpld.cpl.r.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing ufs.cpld.ww3.r.2021-03-22-32400.nc .....USING NCCMP......OK
 Comparing ufs.cpld.ww3.r.2021-03-23-21600.nc .....USING NCCMP......OK
 Comparing 20210323.060000.out_pnt.ww3 .....USING CMP......OK
 Comparing 20210323.060000.out_grd.ww3 .....USING CMP......OK

The total amount of wall time                        = 523.164400
The maximum resident set size (KB)                   = 3221332

Compared atmf021.tile1.nc manually:

Variable Group  Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
dms      /     427649 -2.27115e-08 2.22405e-07 -3.56522e-10 5.38421e-10 8.94943e-10 -5.31078e-14 7.85593e-12
dust1    /       1500 -5.71907e-05   0.0024572 -7.62939e-06 1.90735e-05 2.67029e-05 -3.81271e-08 2.41078e-06
dust2    /        864 -0.000183374   0.0035862 -2.28882e-05 1.52588e-05  3.8147e-05 -2.12238e-07 5.57316e-06
dust3    /        850 -0.000742391   0.0039175 -1.52588e-05 1.52588e-05 3.05176e-05 -8.73401e-07 6.01777e-06
dust4    /       3475 -6.05285e-05  0.00316412 -2.28882e-05 1.52588e-05  3.8147e-05 -1.74183e-08 2.34021e-06
dust5    /        582 -0.000143422 0.000473035 -7.62939e-06 5.72205e-06 1.33514e-05 -2.46429e-07 1.33289e-06
msa      /     167655 -1.06582e-11 2.45359e-08 -1.40119e-11 1.28466e-11 2.68585e-11 -6.35723e-17 2.92134e-13
nh3      /      12341 -9.55404e-06 0.000143297 -9.53674e-07 9.53674e-07 1.90735e-06  -7.7417e-10 3.71795e-08
nh4a     /      89508  4.34819e-06 0.000106395 -2.38419e-07 1.78814e-07 4.17233e-07  4.85788e-11 5.87479e-09
no3an1   /     403515 -2.02919e-06 8.82412e-05 -5.96046e-07 2.64496e-07 8.60542e-07  -5.0288e-12 2.15396e-09
no3an2   /     424339 -2.78776e-06 0.000301565 -1.78814e-07 3.57628e-07 5.36442e-07 -6.56966e-12 3.53538e-09
no3an3   /     353711 -5.16119e-07  5.2117e-06 -7.45058e-09 5.58794e-09 1.30385e-08 -1.45915e-12 1.03943e-10
pm10     /      21083 -0.000443687   0.0120248 -3.05176e-05 3.05176e-05 6.10352e-05 -2.10448e-08 2.82745e-06
pm25     /      36885 -0.000314661  0.00666968 -3.05176e-05 1.52588e-05 4.57764e-05 -8.53086e-09 7.77924e-07
so2      /     240871 -1.14387e-08 5.82321e-07 -3.72529e-09 1.86265e-09 5.58794e-09  -4.7489e-14 2.04966e-11
so4      /     103202 -2.91142e-05  0.00287628 -3.18885e-06 2.33948e-06 5.52833e-06 -2.82109e-10 5.21124e-08

Do we expect those differences to be of concern?
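
One way to triage a table like this is to rank variables by their largest absolute difference. The sketch below assumes the whitespace-separated column layout shown above and that the Min and Max columns are extremes of the differences between the two files; `largest_differences` is a hypothetical helper and the sample is three rows copied from the table.

```python
SAMPLE_TABLE = """\
Variable Group  Count          Sum      AbsSum          Min         Max       Range         Mean      StdDev
dms      /     427649 -2.27115e-08 2.22405e-07 -3.56522e-10 5.38421e-10 8.94943e-10 -5.31078e-14 7.85593e-12
dust1    /       1500 -5.71907e-05   0.0024572 -7.62939e-06 1.90735e-05 2.67029e-05 -3.81271e-08 2.41078e-06
pm10     /      21083 -0.000443687   0.0120248 -3.05176e-05 3.05176e-05 6.10352e-05 -2.10448e-08 2.82745e-06
"""


def largest_differences(table_text):
    """Return (variable, count, max |difference|) rows, largest first."""
    lines = table_text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split()))
        worst = max(abs(float(fields["Min"])), abs(float(fields["Max"])))
        rows.append((fields["Variable"], int(fields["Count"]), worst))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for name, count, worst in largest_differences(SAMPLE_TABLE):
        print(f"{name:8s} {count:8d} {worst:.6e}")
```

Several of the extremes in the table are close to exact powers of two (for example, 3.05176e-05 is about 2^-15 and 1.52588e-05 is about 2^-16), which is the pattern single-step changes in 32-bit floats would produce, though the table alone does not prove that.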

@BrianCurtis-NOAA
Collaborator Author

@Hang-Lei-NOAA unfortunately, netcdf is still causing issues:

2025-01-14 23:18:17.365621 +0000 ERROR ../../src/nccmp_data.c:3675 NetCDF: HDF error

@BrianCurtis-NOAA
Collaborator Author

check out /lfs/h2/emc/nems/noscrub/brian.curtis/git/BrianCurtis-NOAA/ufs-weather-model/zstd_netcdf/tests/run_dir/hafs_regional_atm_intel

@BrianCurtis-NOAA
Collaborator Author

I'm going to try the solution mentioned here: #2015 (comment)

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Jan 15, 2025 via email

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Jan 15, 2025 via email

@BrianCurtis-NOAA
Collaborator Author

I checked the runs. It looks like the model still uses the wrong netcdf/4.7.4 and the wrong nccmp/1.8.9.0, which we did not require.

vi /lfs/h2/emc/nems/noscrub/brian.curtis/git/BrianCurtis-NOAA/ufs-weather-model/zstd_netcdf/tests/logs/log_wcoss2/run_hafs_regional_atm_intel.log

libfabric/1.20.1:1;craype-network-ofi:1;envvar/1.0:1;ecflow/5.6.0.13:1;intel/19.1.3.304:1;hdf5/1.10.6:1;netcdf/4.7.4:1;nccmp/1.8.9.0:1;

I am testing with your code, and will get back to you.

@Hang-Lei-NOAA nothing in my build would hint towards an obvious answer to this. Would something in the hpc-stack be causing this?

@BrianCurtis-NOAA
Collaborator Author

found netcdf/4.7.4 in run_test.sh, fixed and will run again.
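
A stale module version like this can be caught by checking the loaded-module string from the run log. The sketch below assumes the `name/version:flag;` format quoted in this thread and takes 4.9 as the minimum netcdf version, in line with the fix described here; a new enough version does not by itself guarantee the library was built with zstd. The function names are hypothetical.

```python
# Loaded-module string in the format quoted from the run log in this thread.
LOADED = (
    "libfabric/1.20.1:1;craype-network-ofi:1;envvar/1.0:1;ecflow/5.6.0.13:1;"
    "intel/19.1.3.304:1;hdf5/1.10.6:1;netcdf/4.7.4:1;nccmp/1.8.9.0:1;"
)


def parse_loaded_modules(text):
    """Map module name to version string ('' if the entry has no version)."""
    modules = {}
    for entry in text.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name_version = entry.split(":", 1)[0]   # drop the trailing ":1" flag
        name, _, version = name_version.partition("/")
        modules[name] = version
    return modules


def version_tuple(version):
    """Convert '4.7.4' to (4, 7, 4), skipping non-numeric parts."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def netcdf_is_new_enough(text, minimum=(4, 9)):
    """True if a netcdf module is loaded at or above the assumed minimum."""
    version = parse_loaded_modules(text).get("netcdf")
    return version is not None and version_tuple(version) >= minimum


if __name__ == "__main__":
    print(netcdf_is_new_enough(LOADED))
```

For the string above the check reports `False`, matching the diagnosis that netcdf/4.7.4 was still being loaded at run time.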

@BrianCurtis-NOAA
Collaborator Author

BrianCurtis-NOAA commented Jan 21, 2025

@Hang-Lei-NOAA I found netcdf 4.7.4 being loaded in run_test.sh. I've made the necessary changes to get netcdf 4.9 in that file, and all testing passes now.

Thanks.

@Hang-Lei-NOAA

Hang-Lei-NOAA commented Jan 21, 2025 via email

@BrianCurtis-NOAA
Collaborator Author

Congratulations for the results

Apologies, what I mean is that the comparisons against the baselines created using these changes all come back OK. I will run full testing to see how these changes impact the UFSWM develop branch baselines.

Development

Successfully merging this pull request may close these issues.

Get zstd compression in netcdf on wcoss2 operation
8 participants