Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WW3 cannot build pre-and-post processing jobs #2502

Open
JessicaMeixner-NOAA opened this issue Nov 18, 2024 · 22 comments · May be fixed by #2512
Open

WW3 cannot build pre-and-post processing jobs #2502

JessicaMeixner-NOAA opened this issue Nov 18, 2024 · 22 comments · May be fixed by #2512
Labels
bug Something isn't working

Comments

@JessicaMeixner-NOAA
Copy link
Collaborator

JessicaMeixner-NOAA commented Nov 18, 2024

Description

In PR #2445 WW3 was updated with PR NOAA-EMC/WW3#1303 which makes references to modules that are only in the cap code. This means that we cannot run the end-to-end system (for example see: NOAA-EMC/global-workflow#3106 (comment)).

We will need ot make an update to WW3 so that we can compile the pre-and post-scripts within the dev/ufs-weather-model.

To Reproduce:

The easiest way to reproduce this is to clone the g-w, update the model to the hash in develop and then build all or just the ww3 pre/post:

git clone https://github.com/noaA-EMC/global-workflow
cd global-workflow/
git submodule update --init --recursive 
cd sorc/ufs_model.fd/
git checkout develop
git submodule update --init --recursive 
cd ../
./build_ww3prepost.sh 

Additional context

@sbanihash had noticed this while reviewing a PR FYI @jswhit who ran into this in g-w.

@sbanihash and myself will be meeting with @MatthewMasarik-NOAA first thing tomorrow morning to find a solution.

Output

@JessicaMeixner-NOAA
Copy link
Collaborator Author

I'm working on some updates here: https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs which have allowed me to successfully build the pre & post scripts in the global-workflow. More testing is necessary but sharing current progress.

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented Nov 19, 2024

I'm really confused by this. The WW3 build in UFS has always referenced files which were only in the dev/ufs-weather-model branch. The previous (prior to PIO commit) src file list for example contained

.....
  w3tidemd.F90
  wav_grdout.F90
  w3iogoncdmd.F90
  wav_shr_flags.F90
  )


set(nuopc_mesh_cap_src
  wav_kind_mod.F90
  wav_shr_mod.F90
  wav_shel_inp.F90
  wav_comp_nuopc.F90
  wav_import_export.F90
  wav_wrapper_mod.F90
  )
....

Why was this working prior to PIO and now is not?

@JessicaMeixner-NOAA
Copy link
Collaborator Author

The files in nuopc_mesh_cap_src were previously not referenced in the ftn_src files. There are now files in ftn_src trying to call files that are in the nuopc_mesh_cap_src. Moreover, the standalone WW3 builds do not seem to be finding PIO in the build either.

With the additions I added in https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs we can build the pre/post WW3 scripts, however we cannot build with MPI, which is where we start to get into additional issues including not being able to find the PIO variables, etc.

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented Nov 19, 2024

These mesh-cap branch files

  wav_grdout.F90
  w3iogoncdmd.F90
  wav_shr_flags.F90

have always been in the ftn_src file list. See src_list.cmake at c7004b6.

EDIT: I suspect what you might need to do is just move some of the new files to the ftn_src list.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@DeniseWorthen - Moving things into the ftn_src list would solve some of the issues, yes. We will still need W3_MPI if defs around things that are parallel/use MPI so that when we build without MPI (for pre and post processing steps) that can still be built. I haven't quite figured out how to deal with the PIO parts - and the fact that we aren't finding PIO in WW3. I'm not sure if that's unrelated and just showing up because other errors have not yet been fixed, cmake updates are needed, or if we also need something in WW3 to say yes/no for PIO.

@DeniseWorthen
Copy link
Collaborator

I'm not the best source for CMake build know-how. But when I first started working on the PIO feature, I played around w/ adding PIO to the model/src/CMakeLists.txt, thinking I might need it. It turned out I didn't

  target_sources(ww3_lib PRIVATE ${cap_src})
  target_link_libraries(ww3_lib PUBLIC esmf
                                       PIO::PIO_Fortran)
  # Don't build executables when building WW3 ESMF library

@JessicaMeixner-NOAA
Copy link
Collaborator Author

Thank you @DeniseWorthen ! I didn't find any CMake additions in ufs-weather-model for PIO either, so maybe we just don't need it? Eventually when this goes back into develop, we'll have to figure out how to get around potentially putting more barriers aroudn PIO, but we don't have to worry about that for today.

I added a lot more files to the ftn_src and got the MPI build going, but now the standalone is breaking because those files have places we need the ifdefs for MPI. I'll want to run some additional standalone regression testing and a quick g-w run, but we should have a fix that shouldn't change any answers or anything like that tomorrow or Thursday. I keep adding my updates to: https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs if anyone is curious.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

Okay - I ended up going a slightly different direction today (https://github.com/JessicaMeixner-NOAA/WW3/tree/bug/addPIOswitch) and found a minor bug that we needed to move and end statement inside of a W3_MPI ifdef, but I think the new way, which adds a W3_PIO if defs, will help us in the future and is a lot cleaner. It's not perfect in the sense that I cannot build standalone ww3 w/PIO - yet - because of EMSF dependencies in routines, but it opens up the door for that in the future. I can build the pre-and post-processed standalone routines with and without MPI.

That being said, I think we either need to change some of the src lists or maybe add a check that the nuopc cap has to use the PIO? (which I think is desirable, and I don't see anyone using NUOPC that would object).

Still to do: more standalone ww3 runs, ufs-rt tests, and running in g-w. I'm hoping I can do all of these things by Friday and have a PR ready by no later than Monday is my goal, if not before.

@DeniseWorthen - let me know if you would like to see other tests, have objections to this approach, or would like to see other additions.

@DeniseWorthen
Copy link
Collaborator

This change will breaks the capability of turning the netcdf feature on/off at run time vs compile time. That's a big step backward, imo.

I understand the intent is simply to solve the immediate problem of compiling the codes that are needed for pre/post in G-W. I think there are better ways of solving this, but those would take time to sort out.

My concern is that it will persist (as things in WW3 have a tendency to do). That leads to all sorts of messy problems, eg broken use-cases, like trying to set init-from-binary true and not have either restartnc or PIO set. There's basically no utility to having any code inside a use_restartnc or use_historync flag if you're not using PIO.

The PIO feature in the develop branch will require creating new nml namelists so that PIO can be configured that way (CICE has this implemented). I don't suspect that is happening any time soon.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

Only having this based on a compile time flag means that even if you are not using PIO you have to have the library built correctly, yes? My concern with this is that requires the many users around the world outside of NOAA to at minimum build PIO, which is possibly non-trivial and not possible for them. WW3 is used by many small meteorology organizations where even building netcdf in the past was difficult and we need to discuss adding PIO with the llarger WW3 community which has not been done. I will happily bring this to that larger community and see if this is okay, but generally this is the way WW3 handles adding libraries that are not required for all use-cases. No it's not perfect, and I understand how you see this as a major step backwards. I just don't know another solution at the moment.

I agree it might be a while until a feature is ready to be used from PIO in the develop branch, but my hope is this is a small step towards enabling dev/ufs-weather-model to get merged into the develop branch without this feature fully ready. And solving the immediate issue of being able to run this in the end-to-end system.

We do need to get some sort of solution in sooner than later, I've mostly stopped all of my other work to address this, and will continue to so if you have a better solution - please let me know and I'll start implementing that immediately. At this point I am going to start testing this to make sure this is truly a viable solution.

@DeniseWorthen
Copy link
Collaborator

In terms of dev/ufs-weather-model going into develop in the short term, I think that is putting the cart before the horse. Unless there is a way (ie, outside of ESMF config) to use the feature, then what is the point? Get that done, then I can see the utility. And of course dev/uwm is a year behind develop---so how is that going to work?

There is also utility in UWM pointing to it's own branch. If UWM uses develop, then the merging of PRs to WW3 develop means that users 'around the world' are stuck in the UWM commit Q. WW3 cannot merge PRs to develop unless UWM also points to develop. Given how slow the UWM Q works.....

In any case, my essential point still holds. Anything w/in logical flags use_restartnc and use_historync is broken if PIO is not available, so carving out just sections around the specific calls to write_history or write_restart isn't sufficient imho. And I think hiding PIO for non-users could be done in other ways, short just introducing another set if ifdefs into the main WW3. But that solution would take a little more time to implement.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@DeniseWorthen if you can explain the other way to hide PIO please let me know and I'll start to implement it.

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented Nov 21, 2024

I don't have a solution at this instant. I'm saying I think it could be done.

I'm also getting confused because it seems you're doing some things in some branches by adding an MPI switch, and other things in other branches adding a PIO switch.

I guess I'm not doing a good job of explaining....you have inside of a use_restartnc logical flag, just a carve out under PIO ifdef for calling read_restart for example.

#ifdef W3_PIO
            call read_restart(trim(fname), va=va, mapsta=mapsta, mapst2=mapst2)
#endif

But that doesn't make sense---nothing inside of the use_restartnc is usable w/o PIO. So why not just

#ifdef PIO
if (use_restartnc) then
.......
#endif

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@DeniseWorthen Apologies for the confusion. My first effort was to see if just putting things behind W3_MPI siwtches in this branch: https://github.com/JessicaMeixner-NOAA/WW3/tree/bugfix/buildww3devufs however, I realized in the end it wasn't sufficient and I thought a cleaner way (and possibly more future proof), would be to add a new switch W3_MPI instead. So I abandoned that branch, and started a new approach here: https://github.com/JessicaMeixner-NOAA/WW3/tree/bug/addPIOswitch)

I was trying to put it in as small of place of possible while I was debugging an unrelated issue and forgot to circle back. I agree, it makes more sense to put it in the larger area - I'll make that change now.

If you think of a different solution - please let me know and I'm happy to dedicate time to implement it. I'll try to brainstorm other ways too.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@DeniseWorthen change in the if-defs is committed here: JessicaMeixner-NOAA/WW3@f528003

@DeniseWorthen
Copy link
Collaborator

In terms of thinking about 'a better way', exactly which codes needed for pre/post in G-W are in conflict w/ how things are currently implemented?

@JessicaMeixner-NOAA
Copy link
Collaborator Author

w3init and w3wave are the two codes that caused build errors.

@DeniseWorthen
Copy link
Collaborator

DeniseWorthen commented Nov 21, 2024

Understood, but which pre/post programs? I've been assuming these are 'ww3_X.F90' standalone programs which are built on top of w3 modules that are causing the issues.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

@DeniseWorthen
Copy link
Collaborator

Ah, thanks. Looks very much like the createmoddefs script, which I have used before.

@DeniseWorthen
Copy link
Collaborator

@JessicaMeixner-NOAA I've had a re-think, and I'm pretty sure now that adding a PIO switch would not break the ability to choose netcdf vs binary at run-time for the UWM model itself.

@JessicaMeixner-NOAA
Copy link
Collaborator Author

As long as the PIO switch is there, you then have the runtime option of yes/no in the UWM. To run the NUOPC cap, we have to have the PIO switch included in the build.

I know we've had a history of not changing things, but if anyone comes up with a different solution to what I've implemented now -I really am happy to implement it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants