feature test issues for rrfs_smoke_conus13km_hrrr_warm #1222
Comments
The issue was fixed in PR #1257 and will be closed.
@junwang-noaa This was NOT fixed in #1257. Please re-open this issue so I don't have to make a new one.
Sorry, I see that PR #1257 fixed the reproducibility for hrrr_control, not rrfs_smoke_conus13km_hrrr_warm.
Actually, the hrrr_control variants already worked; they just weren't enabled. The reproducibility fix in that PR was for the rap_decomp.
Can this issue be closed @junwang-noaa @SamuelTrahanNOAA ?
No. This problem is not resolved.
I can fix the debug and 2threads variants in this PR: #1437. Sadly, I have no fix yet for the restart or decomp variants. However, I suspect this bug may be breaking decomp if it is using data from halo regions: #1436. I have no way to fix that bug, nor even to confirm my suspicions, since that code goes well beyond my understanding of the boundary generation.
I decided to test rrfs_smoke_conus13km_hrrr_warm with the decomposition, restart, and mpi features (I know debug and 2threads should now be passing with the merging of #1437), and it seems everything passed. @SamuelTrahanNOAA have you had the opportunity to test again recently?
They fail for me. How did you test? You need to use the tests/tests files, not just change environment variables. The RRFS tests ignore several environment variables, and they're always warm starts.
The RRFS has hard-coded values for some variables. If you're using an automated tool that tweaks variables, it won't test anything. These values are hard-coded:
All RRFS runs are warm starts. To do a restart test, you need to set |
I just retested hera.gnu and I can confirm the situation is unchanged. I'd like to know how @zach1221 ran the tests. This is not the first time someone has configured the RRFS tests incorrectly and falsely reported that the restart and decomp variants work. Is the tool "opnReqTest"? If so, I'll add an "if" statement to rrfs_warm_run.IN to abort the test if that tool is enabled.
@SamuelTrahanNOAA I see. Well, I guess I tested incorrectly. I was just running the tests sequentially out of rt.conf in tests/. I'll try again with the steps you provided to reproduce. Thank you!
The I haven't tried that before. |
Use this:
COMPILE | 13 | intel | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_RAP_sfcdiff,FV3_HRRR,FV3_HRRR_flake,FV3_RRFS_v1beta,FV3_RRFS_v1nssl -D32BIT=ON | | fv3 |
RUN | rrfs_smoke_conus13km_hrrr_warm | | baseline |
RUN | rrfs_smoke_conus13km_hrrr_warm_2threads | | |
RUN | rrfs_conus13km_hrrr_warm | | baseline |
RUN | rrfs_smoke_conus13km_radar_tten_warm | | baseline |
RUN | rrfs_smoke_conus13km_hrrr_warm_decomp | | |
RUN | rrfs_smoke_conus13km_hrrr_warm_restart | | | rrfs_smoke_conus13km_hrrr_warm
RUN | rrfs_conus13km_hrrr_warm_restart_mismatch | | baseline | rrfs_conus13km_hrrr_warm |
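The lines above are pipe-delimited, and the last column of a RUN line appears to name the test whose output the restart variant depends on. As a rough illustration of that layout, here is a minimal Python sketch that parses the same lines. The field names (`baseline_flag`, `depends_on`) are labels assumed here from how the lines read; they are not taken from the regression test scripts, and the real parser may treat the columns differently.

```python
from typing import List, NamedTuple, Optional

# The rt.conf fragment quoted in the comment above.
RT_CONF = """\
COMPILE | 13 | intel | -DAPP=ATM -DCCPP_SUITES=FV3_RAP,FV3_RAP_sfcdiff,FV3_HRRR,FV3_HRRR_flake,FV3_RRFS_v1beta,FV3_RRFS_v1nssl -D32BIT=ON | | fv3 |
RUN | rrfs_smoke_conus13km_hrrr_warm | | baseline |
RUN | rrfs_smoke_conus13km_hrrr_warm_2threads | | |
RUN | rrfs_conus13km_hrrr_warm | | baseline |
RUN | rrfs_smoke_conus13km_radar_tten_warm | | baseline |
RUN | rrfs_smoke_conus13km_hrrr_warm_decomp | | |
RUN | rrfs_smoke_conus13km_hrrr_warm_restart | | | rrfs_smoke_conus13km_hrrr_warm
RUN | rrfs_conus13km_hrrr_warm_restart_mismatch | | baseline | rrfs_conus13km_hrrr_warm
"""


class RunEntry(NamedTuple):
    name: str                  # test name (second column)
    baseline_flag: bool        # assumed meaning: fourth column reads "baseline"
    depends_on: Optional[str]  # assumed meaning: fifth column names a prerequisite test


def parse_runs(text: str) -> List[RunEntry]:
    """Collect the RUN lines of an rt.conf-style fragment."""
    runs = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if fields[0] != "RUN":
            continue  # skip COMPILE lines and blanks
        fields += [""] * (5 - len(fields))  # pad short lines
        runs.append(RunEntry(fields[1], fields[3] == "baseline", fields[4] or None))
    return runs


runs = parse_runs(RT_CONF)
for run in runs:
    if run.depends_on:
        print(f"{run.name} depends on {run.depends_on}")
```

Reading the fragment this way makes it visible that the restart variants are tied to a specific control test by name, which is why they cannot be exercised by changing environment variables alone.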
@SamuelTrahanNOAA thanks, again. Let me try that now. |
My branch was not up-to-date with develop, so that test didn't check if the latest version works. It seems the regression test system has changed substantially. I'll have to check if it's even running those tests correctly. |
The 2threads test doesn't use 2 threads anymore, but the decomp test still changes the decomposition. |
The restart and decomp tests do not match the control, but they are executed correctly. It looks like the 2threads test is using ESMF to turn on threading without providing the mandatory OMP_NUM_THREADS variable, which sets the maximum number of threads available to ESMF. I will try correcting this and see if it still passes.
The 2threads test still passes if I set OMP_NUM_THREADS (THRD) to 2.
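To make the OMP_NUM_THREADS point concrete, here is a minimal Python sketch of setting the variable explicitly before a run is launched, rather than relying on whatever the calling shell happens to export. The helper name `threaded_env` is hypothetical and is not part of the regression test system, which sets its variables through its own scripts.

```python
import os
from typing import Dict, Optional


def threaded_env(threads: int, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with OMP_NUM_THREADS set explicitly.

    `base` defaults to the current process environment; it is never modified.
    """
    if threads < 1:
        raise ValueError("thread count must be at least 1")
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = str(threads)  # always a string in the environment
    return env


env = threaded_env(2)
print(env["OMP_NUM_THREADS"])
```

The resulting mapping could be passed as the `env=` argument of `subprocess.run` so the launched process sees the intended thread limit.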
The debug_decomp test (rrfs_smoke_conus13km_hrrr_warm_debug_decomp_intel) also fails.
Hi, @SamuelTrahanNOAA. This issue is still under investigation; we'll attempt to keep you updated regularly going forward.
Description
PR #1195 added a feature test, rrfs_smoke_conus13km_hrrr_warm, using suite file FV3_HRRR_smoke. The test owner needs to confirm that the feature test reproduces its results with different threads, decomposition, and mpi tasks, and in restart mode, and that it also runs in debug mode. Currently the test fails the decomposition and debug tests.
To Reproduce:
Check out the branch in PR #1195 and run rrfs_smoke_conus13km_hrrr_warm with different threading, decomposition, and mpi tasks, and in restart mode and debug mode.
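Checking that a variant "reproduces" the control amounts to comparing the output files of the two runs. The sketch below shows one simple way to do that in Python, by hashing files with the same name in two run directories. It is an illustration only: the function names and the file names in the example are hypothetical, the regression test framework has its own comparison logic, and a strict byte-for-byte check is assumed here to be the right criterion.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Union

PathLike = Union[str, Path]


def file_digest(path: PathLike, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large outputs fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def mismatched_files(control_dir: PathLike, variant_dir: PathLike,
                     names: Iterable[str]) -> List[str]:
    """Return the names that are missing from either run or differ in content."""
    bad = []
    for name in names:
        control = Path(control_dir) / name
        variant = Path(variant_dir) / name
        if not control.is_file() or not variant.is_file():
            bad.append(name)  # a missing file counts as a failure
        elif file_digest(control) != file_digest(variant):
            bad.append(name)
    return bad
```

An empty list from `mismatched_files("control_run", "decomp_run", ["output_a.nc", "output_b.nc"])` would mean the variant matched the control for those (hypothetical) files; any returned name identifies a file to inspect.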