add run time info #114

junwang-noaa · 2023-07-05T19:57:42Z

Add run time information in MOM6 NUOPC cap.

junwang-noaa · 2023-07-05T19:58:22Z

@jiandewang @DeniseWorthen Please let me know if you have any suggestions/concerns.

codecov-commenter · 2023-07-05T20:03:20Z

Codecov Report

Merging #114 (3a0b235) into dev/emc (55e0472) will increase coverage by 4.17%.
Report is 459 commits behind head on dev/emc.
The diff coverage is 38.34%.

❗ Current head 3a0b235 differs from pull request most recent head 328766d. Consider uploading reports for the commit 328766d to get more accurate results

@@             Coverage Diff             @@
##           dev/emc     #114      +/-   ##
===========================================
+ Coverage    33.99%   38.17%   +4.17%     
===========================================
  Files          259      269      +10     
  Lines        70267    76503    +6236     
  Branches     13020    14064    +1044     
===========================================
+ Hits         23885    29202    +5317     
- Misses       41880    42054     +174     
- Partials      4502     5247     +745

Files Changed	Coverage Δ
...g_src/drivers/solo_driver/MESO_surface_forcing.F90	`0.00% <0.00%> (ø)`
...g_src/drivers/solo_driver/user_surface_forcing.F90	`0.00% <0.00%> (ø)`
...c/external/GFDL_ocean_BGC/generic_tracer_utils.F90	`0.00% <0.00%> (ø)`
...src/external/database_comms/MOM_database_comms.F90	`0.00% <0.00%> (ø)`
...ernal/database_comms/database_client_interface.F90	`0.00% <0.00%> (ø)`
config_src/infra/FMS1/MOM_coms_infra.F90	`59.47% <0.00%> (+24.86%)`	⬆️
config_src/infra/FMS1/MOM_io_infra.F90	`22.00% <0.00%> (+2.06%)`	⬆️
src/ALE/MOM_hybgen_regrid.F90	`0.00% <0.00%> (ø)`
src/ALE/MOM_hybgen_remap.F90	`0.00% <0.00%> (ø)`
src/ALE/P1M_functions.F90	`0.00% <0.00%> (ø)`
... and 185 more

... and 36 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

jiandewang · 2023-07-05T20:04:08Z

@junwang-noaa have you tried this code in S2S run to make sure you get information as you expected ?

jiandewang · 2023-07-06T03:00:58Z

@junwang-noaa I tried to use your MOM6 branch in UWM and run one case (cpld_bmark_p8_intel) but it died right away in ocean PE. Is there any setting I need to modify ?
see run directory at /scratch1/NCEPDEV/stmp2/Jiande.Wang/FV3_RT/rt_97485/cpld_bmark_p8_intel

junwang-noaa · 2023-07-06T03:04:39Z

I will take a look, thanks

jiandewang · 2023-07-06T03:11:34Z

@junwang-noaa UWM code and script at /scratch1/NCEPDEV/climate/Jiande.Wang/working/MOM6-add-cap-Jun/ufs-weather-model

…untime

junwang-noaa · 2023-08-17T20:03:03Z

@jiandewang The issue is fixed and I ran the RT on hera and the test is running fine. Would you please review the code? Thanks

jiandewang · 2023-08-17T20:08:21Z

@junwang-noaa do you keep the run log on HERA ?

junwang-noaa · 2023-08-17T20:11:29Z

/scratch1/NCEPDEV/nems/Jun.Wang/nems/vlab/20230705/ufs-weather-model/tests/logs/RegressionTests_hera.log

Please let me know if you see any issue, I am going to run full RT after that

jiandewang · 2023-08-17T20:45:57Z

/scratch1/NCEPDEV/nems/Jun.Wang/nems/vlab/20230705/ufs-weather-model/tests/logs/RegressionTests_hera.log

Please let me know if you see any issue, I am going to run full RT after that

@junwang-noaa I don't see any issue grom the "out" file in your run dir, you may start full RT

DeniseWorthen · 2023-08-17T20:51:21Z

Won't this be written for all users? There doesn't seem a way to do this optionally.

jiandewang · 2023-08-17T21:32:34Z

@DeniseWorthen do you mean you are worrying that NCAR side will see these information in their run log?

DeniseWorthen · 2023-08-17T21:46:52Z

Yes, this might not be something another user of the nuopc mom cap needs or wants. I had started to try to get a wrapper mod working for MOM (like we are trying to do for CICE) but it isn't working completely yet. Wrapping everything in 'if (.not.cesmcoupled) will be really messy.

jiandewang · 2023-08-17T22:13:43Z

One thing I can do is provide Jun's branch to NCAR and ask them whether they feel those run time information is a hassle for them or not

junwang-noaa · 2023-08-18T02:32:08Z

Thanks for checking, Jiande. @DeniseWorthen I am not sure if we want to isolate the code with a wrapper. I do see some print lines in the initialization/finalize in the original code. I thought these info could benefit everyone to understand/monitor computational performance, but I also understand people may have their own preference.

jiandewang · 2023-08-18T03:05:55Z

Thanks for checking, Jiande. @DeniseWorthen I am not sure if we want to isolate the code with a wrapper. I do see some print lines in the initialization/finalize in the original code. I thought these info could benefit everyone to understand computational performance, but I also understand people may have their own preference.

@junwang-noaa let me contact NCAR and see what's their feedback

alperaltuntas · 2023-08-18T16:53:48Z

I have two quick comments:

Rather than MPI_Wtime, the MOM_cpu_clock module routines should be used to measure and log elapsed time.
Instead of write(*,*), you should have write(stdout,*) as is done in the rest of the code. Otherwise, MOM6 logs go to the wrong log files in CESM.

junwang-noaa · 2023-08-18T18:21:11Z

@alperaltuntas Thanks for your comments.

I made changes for stdout, I removed the timing in InitializeP0 subroutine as "stdout" is not defined yet in that subroutine and it is called only once and timing is small.
Here we are looking for temporal run time information that allows us to configure the MPI tasks for MOM6 in coupled runs (run time in every step and waiting time between run steps). So far I only saw a statistics information for the whole integration from the MOM_cpu_clock, maybe I missed something. Any suggestions?

DeniseWorthen · 2023-08-21T13:57:51Z

@junwang-noaa Should this information be written to a MOM specific log file vs just stdout for UFS?

junwang-noaa · 2023-08-21T14:27:37Z

I feel either way is fine as the UFS stdout already has the MOM print info.

alperaltuntas · 2023-08-21T14:36:38Z

@junwang-noaa, As an example of how the MOM_cpu_clock is used to measure elapsed time of a particular code segment, search for id_clock_forcing in drivers/nuopc_cap/mom_surface_forcing_nuopc.F90. Note the usage of cpu_clock_id, cpu_clock_begin and cpu_clock_end function/subroutine calls. When you utilize these routines, the timing information will automatically end up in the log file and you won't have to worry about expicitly writing out timing.

alperaltuntas · 2023-08-21T14:48:43Z

I am also mentioning @marshallward in case he has further comments regarding the usage of clocks.

junwang-noaa · 2023-08-21T14:58:17Z

@alperaltuntas @marshallward I did look at the MOM_cpu_clock which uses the MPP_clock. Sorry, I did not make it clear. I am trying to get timing information for the NUOPC phases including initialization (where the MOM_cpu_clock is not available at the beginning of the initialization phase yet) and run/finalize phases. For the run phase, I need to get timing information for every time step (e.g. timing variance at different IO output time including history file, restart file etc). I can add cpu_clock_begin and cpu_clock_end in the run phase, but I can only get a total run time at the end of integration when the mpp_exit is called. Also I want to catch the waiting time of MOM6 grid component during coupling time step ( the time difference between MOM leaves a NUOPC run step and the next time it enters the run step), I am not sure if the MOM_cpu_clock can be used for this purpose.

marshallward · 2023-08-21T17:55:39Z

@junwang-noaa As long as the FMS mpp_init() has been called, the mpp_clock_(begin|end) functions should be available. The output from these clocks will also appear alongside the MOM6 clock output, which is generated by FMS. (As you mention above, cpu_clock_(begin|end) are the MOM interfaces to the FMS clocks. You would probably only use those inside of MOM6.)

marshallward · 2023-08-21T18:03:21Z

I just looked at the code, I agree with @alperaltuntas that there should be no need to use MPI_Wtime inside of the cap. Times measured inside the cap should use cpu_clock_*.

I don't know how your coupler works, but it sounds like you want to log wait times outside of the cap. Is that right? In that case, I think it is a job for your coupler, not MOM, so you should use timer provided by the coupler.

If the coupler happens to have access to FMS, then you could possibly use mpp_clock_*, but I suspect you probably want a model-wide solution, or perhaps just MPI_Wtime.

junwang-noaa · 2023-08-21T18:29:15Z

@marshallward Thanks for looking into this issue. One question is that I need to have the run time information at every time step, but the mpp_clock can only output the total time for each clock_id at the end of integration. How do I get that timing information at every integration time step?

In UFS coupler runs on different MPI tasks, checking the waiting for other component requires additional communication, I think it is the simplest if we get the timing on each component itself just use mpi_wtime. Is it possible that we can implement this capability with a wrapper so that other applications that uses the NUOPC cap mom_cap will not be impacted? Thanks for any suggestions.

marshallward · 2023-08-21T19:09:26Z

@junwang-noaa You are right, the FMS timers are only integrated and won't provide per-time steps; if that is what you need, then you would also need an alternative timer.

If I were to do this, I would consider using the system_clock() intrinsic rather than an MPI call. I am a bit concerned that MPI_Wtime()'s awareness of MPI ranks outside of the MOM communicator could cause problems. For example, if the global timer sync were enabled, then the model could hang. Since you only report the root PE time, it should be safe to use system_clock().

I would also suggest wrapping this in some sort of debug flag, to avoid excessive output to stdout. Saving to a file might also be useful.

In fact, given that every timer check happens at the beginning and end of each function, it still feels natural for the times to reside in the functions which call them, and might be better positioned to report them with respect to the other submodels. But perhaps that is just a subjective opinion.

Having said all that, cap maintainers are the ones responsible for this code, and are free to make any changes that they deem necessary, so don't feel obliged to adopt any of these suggestions. Whatever you decide will be fine from our perspective.

alperaltuntas · 2023-08-21T21:02:06Z

I agree with Marshall's suggestions, and especially the suggestion for wrapping the write calls in a switchable flag (that should be false by default.)

junwang-noaa · 2023-08-23T02:59:05Z

@marshallward @alperaltuntas Thank you very much for your comments. I have updated the code to

using a nems configuration variable: RunTimeLog to control if the run time information will be output.
In mom_cap.F90, there is an option to have the mom6 run log going to a specific log file:
https://github.com/NOAA-EMC/MOM6/blob/dev/emc/config_src/drivers/nuopc_cap/mom_cap.F90#L456
So when the RunTimeLog is turned on, the timing info will go to a user specified file.
I tried both MPI_Wtime and system.clock to see the timing difference. I used:

integer(8)           :: counti=0
integer(8)           :: countf=0
integer(8)           :: count_rate=1
...
  if(write_runtimelog) call system_clock(counti,count_rate)
...
  if(write_runtimelog) then
call system_clock(countf)
if(is_root_pe()) write(stdout,*) 'In ',trim(subname),' time ', REAL(countf-counti)/REAL(count_rate)

I see some differences compared to MPI_Wtime(), below - lines are from MPI_Wtime() while +lines are from system.clock.

- In (MOM_cap:InitializeRealize) time   0.208017795346677
+In (MOM_cap:InitializeRealize) time   0.208026000000000
- In (MOM_cap:DataInitialize) time   7.365932688117027E-004
+In (MOM_cap:DataInitialize) time   1.171000000000000E-003
- In (MOM_cap:ModelAdvance) time   6.539979949593544E-005
+In (MOM_cap:ModelAdvance) time   6.900000000000000E-005

To be consistent with other model components, I'd like to use MPI_Wtime(). Since these calls are within MOM6 NUOPC component, they are called on the MPI ranks within the communicator of the MOM6 grid component vm. So I think they won't be called outside MOM6 tasks. If MOM6 runs on global communicator, it will still work and the timing is what we want to get.

Please let me know if you have any further questions/suggestions. Really appreciate you reviewed the code.

marshallward · 2023-08-23T15:37:47Z

Thanks for posting the numbers, that's an interesting comparison. system_clock is usually a thin wrapper to clock_gettime() in Linux (GFortran does this, for example), although compilers can do whatever they choose. MPI_Wtime appears to use gettimeofday() in MPICH, which is older than clock_gettime and is considered more problematic. But again, the MPI library can use whatever they like.

Given that your other submodels use MPI_Wtime, it suppose it makes more sense for you to stay consistent. But if this affects any other potential NUOPC users, we may need to come back to this decision.

jiandewang · 2023-08-23T18:08:59Z

@alperaltuntas do you think this PR is in good shape now ? If yes can you make a test run on your system ? Thanks

alperaltuntas · 2023-08-30T17:55:39Z

It all looks good and our tests are passing. Thanks for checking with us!

jiandewang · 2023-08-30T18:01:50Z

It all looks good and our tests are passing. Thanks for checking with us!

@alperaltuntas thank you very much.

jkbk2004 · 2023-09-06T14:12:50Z

@junwang-noaa I wonder if draft status can be converted to PR to get approvals from @jiandewang

jiandewang · 2023-09-06T14:17:37Z

@junwang-noaa I think you can convert it to formal PR since NCAR side has no objection on it.

junwang-noaa · 2023-09-06T14:40:41Z

@jiandewang Done.

FernandoAndrade-NOAA · 2023-09-07T19:01:36Z

Testing on #1823 has completed successfully, please continue the merge process for this PR. Thank you.

add run time info

1841c26

junwang-noaa mentioned this pull request Jul 6, 2023

Add run time information and update fv3 to use unified missing value ufs-community/ufs-weather-model#1823

Merged

39 tasks

junwang-noaa and others added 5 commits July 7, 2023 12:04

fix output

c07ef0a

Merge remote-tracking branch 'upstream/dev/emc' into runtime

9ab1494

Merge branch 'runtime' of https://github.com/junwang-noaa/MOM6 into r…

e9d4d1a

…untime

remove prints

4217215

fix syntax

9410369

jiandewang requested review from jiandewang and DeniseWorthen August 17, 2023 20:07

output timgin in stdout

4954040

fix run time

328766d

add option to write run time

2636282

junwang-noaa marked this pull request as ready for review September 6, 2023 14:39

jiandewang approved these changes Sep 6, 2023

View reviewed changes

DeniseWorthen approved these changes Sep 6, 2023

View reviewed changes

jiandewang merged commit be40a41 into NOAA-EMC:dev/emc Sep 7, 2023
10 checks passed

junwang-noaa mentioned this pull request Sep 21, 2023

add run time info in a log file NOAA-EMC/WW3#1081

Open

add run time info #114

add run time info #114

Conversation

junwang-noaa commented Jul 5, 2023

junwang-noaa commented Jul 5, 2023

codecov-commenter commented Jul 5, 2023 • edited Loading

Codecov Report

jiandewang commented Jul 5, 2023

jiandewang commented Jul 6, 2023

junwang-noaa commented Jul 6, 2023

jiandewang commented Jul 6, 2023

junwang-noaa commented Aug 17, 2023 • edited Loading

jiandewang commented Aug 17, 2023

junwang-noaa commented Aug 17, 2023

jiandewang commented Aug 17, 2023

DeniseWorthen commented Aug 17, 2023 • edited Loading

jiandewang commented Aug 17, 2023

DeniseWorthen commented Aug 17, 2023 • edited Loading

jiandewang commented Aug 17, 2023

junwang-noaa commented Aug 18, 2023 • edited Loading

jiandewang commented Aug 18, 2023

alperaltuntas commented Aug 18, 2023

junwang-noaa commented Aug 18, 2023 • edited Loading

DeniseWorthen commented Aug 21, 2023

junwang-noaa commented Aug 21, 2023

alperaltuntas commented Aug 21, 2023

alperaltuntas commented Aug 21, 2023

junwang-noaa commented Aug 21, 2023 • edited Loading

marshallward commented Aug 21, 2023

marshallward commented Aug 21, 2023

junwang-noaa commented Aug 21, 2023 • edited Loading

marshallward commented Aug 21, 2023

alperaltuntas commented Aug 21, 2023

junwang-noaa commented Aug 23, 2023 • edited Loading

marshallward commented Aug 23, 2023 • edited Loading

jiandewang commented Aug 23, 2023

alperaltuntas commented Aug 30, 2023

jiandewang commented Aug 30, 2023

jkbk2004 commented Sep 6, 2023

jiandewang commented Sep 6, 2023

junwang-noaa commented Sep 6, 2023

FernandoAndrade-NOAA commented Sep 7, 2023

codecov-commenter commented Jul 5, 2023 •

edited

Loading

junwang-noaa commented Aug 17, 2023 •

edited

Loading

DeniseWorthen commented Aug 17, 2023 •

edited

Loading

DeniseWorthen commented Aug 17, 2023 •

edited

Loading

junwang-noaa commented Aug 18, 2023 •

edited

Loading

junwang-noaa commented Aug 18, 2023 •

edited

Loading

junwang-noaa commented Aug 21, 2023 •

edited

Loading

junwang-noaa commented Aug 21, 2023 •

edited

Loading

junwang-noaa commented Aug 23, 2023 •

edited

Loading

marshallward commented Aug 23, 2023 •

edited

Loading