-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add run time info #114
add run time info #114
Conversation
@jiandewang @DeniseWorthen Please let me know if you have any suggestions/concerns. |
Codecov Report
@@ Coverage Diff @@
## dev/emc #114 +/- ##
===========================================
+ Coverage 33.99% 38.17% +4.17%
===========================================
Files 259 269 +10
Lines 70267 76503 +6236
Branches 13020 14064 +1044
===========================================
+ Hits 23885 29202 +5317
- Misses 41880 42054 +174
- Partials 4502 5247 +745
... and 36 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@junwang-noaa have you tried this code in S2S run to make sure you get information as you expected ? |
@junwang-noaa I tried to use your MOM6 branch in UWM and run one case (cpld_bmark_p8_intel) but it died right away in ocean PE. Is there any setting I need to modify ? |
I will take a look, thanks |
@junwang-noaa UWM code and script at /scratch1/NCEPDEV/climate/Jiande.Wang/working/MOM6-add-cap-Jun/ufs-weather-model |
@jiandewang The issue is fixed and I ran the RT on hera and the test is running fine. Would you please review the code? Thanks |
@junwang-noaa do you keep the run log on HERA ? |
/scratch1/NCEPDEV/nems/Jun.Wang/nems/vlab/20230705/ufs-weather-model/tests/logs/RegressionTests_hera.log Please let me know if you see any issue, I am going to run full RT after that |
@junwang-noaa I don't see any issue grom the "out" file in your run dir, you may start full RT |
Won't this be written for all users? There doesn't seem a way to do this optionally. |
@DeniseWorthen do you mean you are worrying that NCAR side will see these information in their run log? |
Yes, this might not be something another user of the nuopc mom cap needs or wants. I had started to try to get a wrapper mod working for MOM (like we are trying to do for CICE) but it isn't working completely yet. Wrapping everything in 'if (.not.cesmcoupled) will be really messy. |
One thing I can do is provide Jun's branch to NCAR and ask them whether they feel those run time information is a hassle for them or not |
Thanks for checking, Jiande. @DeniseWorthen I am not sure if we want to isolate the code with a wrapper. I do see some print lines in the initialization/finalize in the original code. I thought these info could benefit everyone to understand/monitor computational performance, but I also understand people may have their own preference. |
@junwang-noaa let me contact NCAR and see what's their feedback |
I have two quick comments:
|
@alperaltuntas Thanks for your comments.
|
@junwang-noaa Should this information be written to a MOM specific log file vs just stdout for UFS? |
I feel either way is fine as the UFS stdout already has the MOM print info. |
@junwang-noaa, As an example of how the MOM_cpu_clock is used to measure elapsed time of a particular code segment, search for |
I am also mentioning @marshallward in case he has further comments regarding the usage of clocks. |
@alperaltuntas @marshallward I did look at the MOM_cpu_clock which uses the MPP_clock. Sorry, I did not make it clear. I am trying to get timing information for the NUOPC phases including initialization (where the MOM_cpu_clock is not available at the beginning of the initialization phase yet) and run/finalize phases. For the run phase, I need to get timing information for every time step (e.g. timing variance at different IO output time including history file, restart file etc). I can add cpu_clock_begin and cpu_clock_end in the run phase, but I can only get a total run time at the end of integration when the mpp_exit is called. Also I want to catch the waiting time of MOM6 grid component during coupling time step ( the time difference between MOM leaves a NUOPC run step and the next time it enters the run step), I am not sure if the MOM_cpu_clock can be used for this purpose. |
@junwang-noaa As long as the FMS |
I just looked at the code, I agree with @alperaltuntas that there should be no need to use I don't know how your coupler works, but it sounds like you want to log wait times outside of the cap. Is that right? In that case, I think it is a job for your coupler, not MOM, so you should use timer provided by the coupler. If the coupler happens to have access to FMS, then you could possibly use |
@marshallward Thanks for looking into this issue. One question is that I need to have the run time information at every time step, but the mpp_clock can only output the total time for each clock_id at the end of integration. How do I get that timing information at every integration time step? In UFS coupler runs on different MPI tasks, checking the waiting for other component requires additional communication, I think it is the simplest if we get the timing on each component itself just use mpi_wtime. Is it possible that we can implement this capability with a wrapper so that other applications that uses the NUOPC cap mom_cap will not be impacted? Thanks for any suggestions. |
@junwang-noaa You are right, the FMS timers are only integrated and won't provide per-time steps; if that is what you need, then you would also need an alternative timer. If I were to do this, I would consider using the I would also suggest wrapping this in some sort of debug flag, to avoid excessive output to stdout. Saving to a file might also be useful. In fact, given that every timer check happens at the beginning and end of each function, it still feels natural for the times to reside in the functions which call them, and might be better positioned to report them with respect to the other submodels. But perhaps that is just a subjective opinion. Having said all that, cap maintainers are the ones responsible for this code, and are free to make any changes that they deem necessary, so don't feel obliged to adopt any of these suggestions. Whatever you decide will be fine from our perspective. |
I agree with Marshall's suggestions, and especially the suggestion for wrapping the |
@marshallward @alperaltuntas Thank you very much for your comments. I have updated the code to
I see some differences compared to MPI_Wtime(), below - lines are from MPI_Wtime() while +lines are from system.clock.
To be consistent with other model components, I'd like to use MPI_Wtime(). Since these calls are within MOM6 NUOPC component, they are called on the MPI ranks within the communicator of the MOM6 grid component vm. So I think they won't be called outside MOM6 tasks. If MOM6 runs on global communicator, it will still work and the timing is what we want to get. Please let me know if you have any further questions/suggestions. Really appreciate you reviewed the code. |
Thanks for posting the numbers, that's an interesting comparison. Given that your other submodels use |
@alperaltuntas do you think this PR is in good shape now ? If yes can you make a test run on your system ? Thanks |
It all looks good and our tests are passing. Thanks for checking with us! |
@alperaltuntas thank you very much. |
@junwang-noaa I wonder if draft status can be converted to PR to get approvals from @jiandewang |
@junwang-noaa I think you can convert it to formal PR since NCAR side has no objection on it. |
@jiandewang Done. |
Testing on #1823 has completed successfully, please continue the merge process for this PR. Thank you. |
Add run time information in MOM6 NUOPC cap.