Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFDL to main (2024-11-27) #1647

Open
wants to merge 125 commits into
base: main
Choose a base branch
from

Conversation

marshallward
Copy link
Collaborator

Celebrating American Thanksgiving with a new PR from GFDL 🦃. This patch includes new schemes to support eddy backscatter parameterization, tidal modeling, internal tide propagation, and shear mixing near shelves. It also includes many improvements and bugfixes to existing solvers.

Features and new capabilities

Updates with minor improvements or new diagnostics

Refactoring

Bugfixes (including self-consistency corrections)

Infrastructure

Contributors

kshedstrom and others added 30 commits June 2, 2024 05:55
* Spatially variable fields for MLE%Bodner

 - Allows reading in 2d fields for Cr and for MLD_decaying_Tfilt.

* Finish the job?

* Renamed one variable, fixed some units
* Added option to convert flux through static ice front into icebergs

Added variables for the accumulated iceberg mass and heat flux due to
calving from ice shelves (flux through the static ice front). These
will be passed to the coupler and SIS2/iceberg module to initialize
bergs. Also fixed the ice-shelf SMB override and reorganized
ice-shelf post data calls so that they do not strictly have to be
called at multiples of the ice velocity time step.

* Added ice-shelf scalar diagnostics

Added ice-shelf scalar diagnostics related to volume-above-floatation
and surface/basal mass balance. Had to modify ice-shelf diag mediator
to allow scalar diagnostics.

* Fixed units for volume-above-floatation and Cp_ice. Renamed volume-above-floatation variable from 'vab' to 'vaf'

* Fixed write_ice_shelf_energy call within subroutine solo_step_ice_shelf so that it is passing the correct arguments

* Fixed syntax of calving units

* Fixed units for ice shelf calving and scalar diagnostics
  Refactored the Idealized_Hurricane module to clean up strange or unscaled
variable units and eliminate dimensional scaling factors with the latest answer
dates.  This includes the introduction of 18 new runtime variables to replace
hard-coded dimensional constants and two runtime logical flags to reproduce
existing bugs.  An inconsistency in the sign convention for the distance from
the hurricane center with idealized_hurricane_wind_forcing in (the probably not
yet used) single column mode was also corrected. Also added descriptions with
units for all the variables in this module.  By default or with appropriate
parameter settings all answers are bitwise identical.
* Add rescaling paramter to KD Shear

Add a parameted (beta) to rescale the distance to the nearest solid
boundary that is used within calculation of kappa shear. While rescaling
this value is unphysical, this adjustment can be thought of as not
allowing shear instabilities to grow up to the full distance to the
nearest boundary because of other turbulent processes which would disrupt
their growth.

* Rename parameter and swtich to multiplication

Rename the parameter beta to lz_rescale to be more descriptive.
To avoid two divisions in a single line, the inverse of lz_rescale
is calculated and stored as I_lz_rescale_sqr. Where the calculation
is done logic has been added to check that lz_rescale is greater
than zero to avoid dividing by zero or making lz_rescale negative.

* Add Comment where lz_rescale is used

Based on Wei's suggestion, add a comment where lz_rescale is used
and put the multiplication by I_lz_rescale to the next line.

* Remove Trailing whitespace

---------

Co-authored-by: Theresa Morrison <[email protected]>
Co-authored-by: Theresa Morrison <[email protected]>
Co-authored-by: Theresa Morrison <[email protected]>
Co-authored-by: Theresa Morrison <[email protected]>
fix style errors

Modify axes_data to make it allocatable

fix style
  Added the new runtime parameters ROBUST_STOKES_PGF and LA_MISALIGNMENT_BUG to
allow for the selection of Stokes pressure gradient force calculations that work
properly in the limit of vanishingly thin layers and to correct a sign error in
the calculations of the misalignment between the waves and shears in the
Langmuir number calculations when LA_MISALIGNMENT is true.  By default, all
answers are bitwise identical, but as these options are not yet widely used it
might make sense to move aggressively to obsolete the previous code once more
extensive testing has take place.  There will be new entries in the
MOM_parameter_doc files for some cases, but there are not yet any such cases
in the MOM-examples regression suite.
-Account for re-entry boundary conditions and non-symmetric grids
-Optimized SSA CG scheme to eliminate unneeded loops and pass_var calls
-Added a missing pass_var call that should fix a reproducibility issue
 on different PE layouts
-Automatically adjust ice-shelf velocity data when it is read in from
 file so that it agrees with assigned BCs
-Changed some ice-shelf mass units that were affecting partially-fully
 cells at the ice front, so that they are always mass per ice-shelf
 area (not per grid area)
  This commit improves the documentation of the remaining real variables with
previously undocumented units in the framework directory, with some other
similar unit documentation improvements in other files.  Units were added to
comments describing about 79 real variables in 4 framework modules,
atmos_ocean_fluxes.F90 for 2 drivers, MOM_oda_incupd.F90, MOM_oda_driver.F90
and MOM_variables.F90.

  The variable nhours_incupd in initialize_oda_incupd was renamed to
incupd_timescale and the factor of 3600.0 that converts ODA_INCUPD_NHOURS
from hours into seconds was moved from where this variable is used into the
scale argument where it is set to help document the meaning and units of this
variable as simply and clearly as possible.

  The unused variable smb was removed from Shelf_main in ice_shelf_driver.F90.
The internal variable tmp in RGC_initialize_sponges was renamed rho to reflect
its contents, and its contents and units are now described in a comment.

  Although the units have been added to the description in MEKE_vec as though it
is being used in a dimensionally consistent way, I suspect that this might
actually be in MKS units, in which case a unit conversion factor might be needed,
and a comment has been added to note this.  However, the machine learning code
that is used to set this array comes from an external package that is not being
used yet at GFDL, so it is not clear what code should be examined to address
this question.

  A comment was added in set_up_global_tgrid noting a unit rescaling factor
that appears to be missing from a dimensional constant.

  All answers are bitwise identical, and for the most part only comments are
changed, although two internal variables were renamed.
  Corrected the sign convention used for the neutral_slope_x and neutral_slope_y
diagnostics in cases when there is not an equation of state being used to match
the usual sign convention (interfaces sloping up to the east or north is a
positive slope) and to match what is done when an equation of state is being
used to calculate neutral density slopes rather than just basing the slopes of
the interfaces alone.  The same correction was made to the slope_x and slope_y
arguments returned from calc_isoneutral_slopes, and self-consistent changes were
made to the overturning streamfunction calculations in thickness_diffuse_full.

  In addition, there were some corrections made to the documentation at the end
of MOM_thickness_diffuse.F90, and the calculate_slopes argument to
calc_slope_functions_using_just_e() that was always hard-coded to .true. was
eliminated to avoid any confusion about whether these changes might propagate
beyond the interface height diffusion calculations.  This commit addresses most
aspects of the issue discussed at github.com//issues/359, although
it does not change the sign convention for the overturning streamfunction.

  All solutions are bitwise identical, but there is a change in the sign
convention for two diagnostics and two arguments to a publicly visible interface
in pure layer mode when no equation of state is used.
  Added the new runtime parameter DETERMINE_TEMP_CONVERGENCE_BUG to avoid using
layout-dependent tests for temperature and salinity tolerances to determine when
to stop iterating in determine_temperature.  Also added logic that correctly
determines when iterations can be stopped because no further changes occur, and
rearranged to loops to avoid revisiting layers that have converged.

  Because the default tolerances for the temperature, salinity and density
changes are set to be the same and the partial derivatives of density with
temperature and salinity are less than 1 in the MKS units used here for a
realistic equation of state of seawater, this bug does not impact any of the
existing regression test cases, but this bug could lead to non-reproducibility
with non-default values.

  This commit changes the entries in the MOM_parameter_doc files that have
Z_INIT_ALE_REMAPPING = .false. and FIT_TO_TARGET_DENSITY_IC = .true., by adding
a new runtime parameter and by not logging DETERMINE_TEMP_T_TOLERANCE and
DETERMINE_TEMP_T_TOLERANCE if they are not used because
DETERMINE_TEMP_CONVERGENCE_BUG is false.  By default, all answers are bitwise
identical.
  Corrected the field being posted for the Kd_interface diagnostic inside of
diabatic_ALE() to be the minimum of Kd_heat and Kd_salt, which includes the
contributions from ePBL, but excluding any extra double-diffusive mixing, making
it analogous to the diagnostic that is currently being posted from
layered_diabatic() or diabatic_ALE_legacy().  Diabatic_ALE() is used when
USE_REGRIDDING=True and USE_LEGACY_DIABATIC_DRIVER=False, in which case a
diagnostic will change, correcting the issue noted at
/issues/398.  All solutions are bitwise identical.
  This commit adds the necessary information about the turns of the model grid
relative to the (unrotated) coupler_type data fields that are inside of the
forcing type and surface_state type and are used with passive tracers, so that
certain tracer packages can now be used with rotated grids.  The previous
version had problems where the model would not run when ROTATE_INTEX = True and
the CFCs (and probably other passive tracers) were being used, as noted at
/issues/621.  These problems have now been fixed.

  There are new calls to coupler_type_spawn() in allocate_forcing_by_ref() and
allocate_surface_state() that manage the creation of the coupler_2d_bt_types for
unrotated types based on information from their rotated counterparts.

  There is a new optional turns arguments to allocate_forcing_by_ref() and new
optional sfc_state_in and turns arguments to allocate_surface_state(), and these
are now being used in at least 6 places where unrotated temporary surface_state
or forcing types are being set up.

  There are also new optional turns argument to extract_coupler_type_data() and
set_coupler_type_data() that are used to deal with the fact that the internal
data arrays in the coupler_bc_types are never rotated, unlike the other MOM6
arrays, because they have to be passed directly into the generic tracer code.
These new turns arguments are used in 14 calls from various tracer packages,
including in 6 calls from the OCMIP2_CFC code.

  There are 4 new calls to deallocate_surface_state() or
deallocate_forcing_type() that were added to avoid (presumably minor) memory
leaks when grid rotation is enabled.  These new calls are in
initialize_ice_shelf_fluxes(), shelf_calc_flux() and in the surface flux
diagnostics section of step_MOM().

  All answers are bitwise identical in any cases that worked previously, but
some cases with grid rotation that previously were failing with cryptic error
messages are now running successfully.  There are new optional arguments to
several publicly visible routines.
The "eta has dropped below bathT" warning message has indices off by 1.
This is because G%isd_global = G%isd + HI%idg_offset and G%isd is
(most of the time) 1.

G%isd_global/G%jsd_global are now replaced by
G%HI%idg_offset/G%HI%jdg_offset.
Add an option to normalize wt_[uv] in the barotropic solver. This is
fix step 1 of 2 to address the issue that visc_rem is applied
incorrectly to the barotropic solver.

A runtime flag BAROTROPIC_VERTICAL_WEIGHT_FIX is added to control the
behavior of this commit. The current default is false (not applying the
fix).
Add an option to use `dt` instead of `dt_pred` in the calculation of
vertvisc_remnant() at the end of predictor stage. The change affects
the following `continuity`() call and `btstep`() call in corrector step.

Combined with the changes from the previous commit, the new options
should solve the issue of inconsistent barotropic velocities from
barotropic solver and baroclinic step.

A runtime flag VISC_REM_TIMESTEP_FIX is added to control the behavior of
 this commit. The current default is false (not applying the fix).
This commit is the third fix related to visc_rem in split mode. h_[uv]
from `zonal_flux_thickness`() and `meridional_flux_thickness`() is
currently multiplied by `visc_rem_[uv]`. This seems to be
double-counting as `visc_rem_[uv]` is accounted for in the barotropic
solver vertical weights. Moreover, for options other than BT_cont, the
velocity point thickness does not include visc_rem. The fix removes the
visc_rem factor from h_[uv] when BT_cont is used.

A runtime flag VISC_REM_CONT_HVEL_FIX is added to control the behavior
 of this commit. The current default is false (not applying the fix).

Other small changes:
* Write out explicitly all keyword arguments in `continuity`() calls
in MOM_dynamics_split_RK2.
* Rename a runtime parameter BAROTROPIC_VERTICAL_WEIGHT_FIX from a
previous commit to  VISC_REM_BT_WEIGHT_FIX, to be consistent with the
style of the other two flags.
* Add VISC_REM_TIMESTEP_FIX parameter to MOM_dynamics_split_RK2b
* Correct a bug using dimensional h_neglect in calculating
non-dimensional Iwt_[uv]_tot, by replacing the division with
an Adcroft_reciprocal style division.

* Add a parameter VISC_REM_BUG to control the defaults of all three
viscous remnant bug related parameters. The individual parameters should
 be removed in the future.
  Refactored PressureForce_FV_nonBouss and PressureForce_FV_Bouss to use more
3-dimensional arrays in preparation for the changes that we are expecting from
Claire Yung to reduce the pressure gradient errors in ice shelf cavities.  These
anticipated changes will involve selecting an internal interface at which to
define the shape of the interfaces with depth when there is an ice shelf, rather
than assuming that the top surface pressure varies linearly with depth between
the two top corners.  In PressureForce_FV_nonBouss, this refactoring includes
turning za, intx_za and inty_za into 3-d arrays and moving the code setting
these arrays outside of the k-loop setting the pressure gradient forces.
Changes in PressureForce_FV_Bouss are analogous but involve turning 7 arrays
(pa, dpa, intz_dpa, intx_pa, intx_dpa, inty_pa and inty_dpa) into 3-d arrays.
In both cases, the code for a reduced gravity model is consolidated into a
single block toward the end of the routines.  These changes to use more 3-d
arrays could increase the computational time for the pressure gradient
calculations, but in short regression tests any such increase in computational
time appears to be small.  No external interfaces are changed, and all answers
are bitwise identical.
  It was noted in /issues/491 that the call to
initialize_regridding() for diagnostics gives a segmentation fault when the
maximum depth of the ocean is greater than 9250 m unless DIAG_COORD_DEF_Z is
explicitly set.  This commit fixes this problem by (1) adding an extra layer to
the bottom of the hard-coded WOA09 list of diagnostic depths when the ocean is
deeper than the 9250 m maximum depth in that file, and (2) changing the default
behavior for diagnostic grids to be be uniform when the maximum ocean depth is
deeper than 9250 m, for which WOA09 is no longer likely to be a convenient
choice.  With this change, MOM6 no longer has segmentation faults for the
configuration described in issues/491.  All solutions are bitwise identical, but
some z-space diagnostic grids will be modified in cases that previously had
segmentation faults.
  Added the new runtime parameter BACKSCATTER_UNDERBOUND that can be set to
false to avoid a potential source of numerical instabilities from excessively
large biharmonic viscosities due to overly generous bounds where there are
negative Laplacian viscosities due to backscatter parameterizations.  Setting
this to true reproduces the previous solutions, while setting it to false treats
negative and zero Laplacian viscosities the same way when bounding the
biharmonic viscosity.  When the Laplacian viscosities are all non-negative, the
value of BACKSCATTER_UNDERBOUND makes no difference.  The default is true for
historical reasons, but this option probably should be set to false to avoid
certain numerical instabilities from the horizontal viscosities.  By default,
all answers are bitwise identical, but there are new entries in
MOM_parameter_doc files for cases that set both BETTER_BOUND_KH and
BETTER_BOUND_AH to true.
Add FRICTWORK_BUG parameter (default is true). If FRICTWORK_BUG is false, use the new way to compute FrictWork using the thickness flux. The previous version (realized by setting FRICTWORK_BUG as true) uses the velocity to compute FrictWork, which overestimates the total energy dissipation rate compared with the domain integral of KE_horvisc computed in MOM_diagnostics.
Bracket counter for <..> was correct but (..) was broken, it was
counting ( but not ).  I can only presume that such cases were very
rare.

This patch fixes the closed parenthesis count.
- Addressed compiler warnings about unused variables and
  argument intent.
- Note there is one dummy argument that was unused and the best way
  to address the warnings was to remove the argument, i.e. and API change.
  This was only in a debugging/utility function so does not impact the
  rest of the model.
- Address warnings about unused/uninitialized variables in MOM_remapping.F90
  - Sets values that appear to the compiler to be potentially unset, but
    always are in practice
- Added new driver, time_MOM_remapping, to config_src/timing_tests/
  that exercises the remapping_core_h() function with PCM and PLM.
- We should add other reconstruction schemes too, but the main reason for
  adding this now is to monitor the impact of refactoring the function
  remapping_core_h() which is mostly independent of the reconstruction
  schemes.
- Fixed .testing/Makefile to not fail with `make build.timing -j`
Added a simple driver for the remapping unit tests.
- Added a simple class within MOM_remapping.F90 to greatly abbreviate
  the lines comparing values of arrays.
- Translated the existing remapping unit tests to use the testing
  class.
- The first block of remap_via_sub_cells() computes the intersection between
  two columns (grids). I've split out this block into a stand alone function
  so that we can re-use it in a to-be-written analog of remap_via_sub_cells()
  that will remap integrated quantities.
- Added some in-line documentation of algorithm used by remap_via_sub_cells()
- Added new column-wise function intersect_src_tgt_grids()
- Added tests of intersect_src_tgt_grids() that check against known results
  - Had to add a new helper function to MOM_remapping to report state of
    integer arrays.
- This second phase of remap_via_sub_cells() maps the values from the source
  grid to the sub-grid using the pre-calculated arrays/indices from
  intersect_src_tgt_grids().
  This phase as-is is only used for remapping of intensive variables but
  it does implicitly calculate intermediate extensive quantities which are
  analogous to what we need when remapping momentum. Splitting this phase
  out primarily allows us to test this bit of code and it has indeed revealed
  an edge case that fails.
- Added new column-wise function remap_src_to_sub_grid()
- Added tests of remap_src_to_sub_grid() that check against known results
  - One test is commented out corresponding to an edge case that reveals
    the current code is wrong. Will enable this test after checking how
    many experiments are impacted but the associated bug fix.
Copy link
Collaborator

@kshedstrom kshedstrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve this PR.

@mathomp4 mathomp4 self-requested a review December 3, 2024 16:29
Copy link
Collaborator

@mathomp4 mathomp4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be zero-diff for GEOS. Though quite possible I have no idea how to turn any of this on...but I'll leave that to scientists. 😄

Approve.

Copy link
Collaborator

@abozec abozec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COAPS approves.

@jiandewang
Copy link
Collaborator

@marshallward: two issues being deteced:
(1) answer changed on wcoss2 for 1x1 degree ocean non-debug setting. 0.25, 0.5 and 5x5 ocean settings are not being impacted. 1x1 debug setting is fine too. So it must be FMA issue related to a particular parameter being on (or off) in 1x1 setting.
See /gpfs/f5/nggps_emc/scratch/Jiande.Wang/For-Marshall/PR1647/wcoss2 for a pair of run log

As usual, give me some time and I will narrow down the issue for you. Unfortunatly wcoss2 will be down for the upcoming week so please wait a bit.

(2) we recently add some intel llvm test cases but this PR failed to match current llvm case's answers on GAEA C5. I will have some information for you to compare in 1-2 days.

@marshallward
Copy link
Collaborator Author

@jiandewang

  1. I will reactivate my WCOSS access just in case. A week buffer would also be helpful for us, since we're working through our own tight deadline. Hopefully this is just another FMA repro issue.

  2. I am not sure how we want to address reproducibility with ifx. It may not be possible to get ifort-ifx equivalence. We are not yet able to run our own tests on Gaea due to bugs in ifx 2023. But as you probably know, ifort will dropped in 2025 so perhaps now is the time to start making some decisions.

    Intel has taken some interest in our transition, so perhaps we can able to receive some necessary support on this issue.

@sanAkel
Copy link
Collaborator

sanAkel commented Dec 9, 2024

@marshallward, @jiandewang can you please assign me as a reviewer?

@marshallward. I would like to review those PRs that have lot of science changes. 🙏

@jiandewang jiandewang requested a review from sanAkel December 9, 2024 16:21
@jiandewang
Copy link
Collaborator

@marshallward for llvm we are not expecting ifort vs ifx to have identical answers. The issue I mentioed here is current main branch vs this PR codes generated different answers with ifx.

@@ -0,0 +1,103 @@
program time_MOM_remapping
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this ( unit ) test! Thank you for adding it.

@jiandewang
Copy link
Collaborator

@marshallward : with many many testing, I have strong reason to believe that NOAA-GFDL#706 is the one that changed the answer for UFS 1x1 non-debug job on wcoss2.

While the first several dozen of commits failed to let me finish UFS runs (which prevent me from doing any debug work),
f1ba822 Move MLD Diagnostics out of MOM_diabatic_aux.F90 is the first commit that allow me to finish the run although it failed to match UFS baseline.

Then at 91eee52 Inline harmonic analysis, it matched our baseline, and this "matching baseline" continues up to ba59078 Separate scalar.

But at df2cd12 EBT Backscatter it failed to match baseline again, and results from this hash are the same from the results from the run using the last commit of this PR (2b72682 *+HOR_VISC_ANSWER_DATE logic fix).

This is a super tricky PR I ever had.

@alperaltuntas
Copy link
Collaborator

NCAR regression tests are also failing. Setting USE_MEKE to False fixes the baseline failures, so this likely confirms @jiandewang's guess that NOAA-GFDL#706 may be the culprit.

@marshallward
Copy link
Collaborator Author

marshallward commented Dec 13, 2024

I've looked over NOAA-GFDL#706 with @Wendazhang33 and we identified at least one section which could have been affected by FMAs. The PR does touch several fields in MEKE. Although we tried to be careful and ensure reproducibility, we could have missed something.

@alperaltuntas Can you confirm that you have FMAs disabled? Can you post (or send me) your MEKE related parameters?
@jiandewang can you confirm that NOAA-GFDL#706 only causes an issue for you on WCOSS? My understanding is that there are no changes on the other machines.

@jiandewang
Copy link
Collaborator

jiandewang commented Dec 13, 2024

@marshallward you ar right, NOAA-GFDL#706 only changed answer for 1x1 ocean setting for non-debug job on wcoss2. Same setting compiled with debug mode is not being impacted. So it must be a FMA issue.

Note we also have 0.25x0.25 ocean setting but this PR didn't break our regression test for it. USE_MEKE are True for both 1x1 and 0.25 cases.

@jiandewang
Copy link
Collaborator

@marshallward: about ifx failed to retain UFS baseline, I just had a talk with Dusan (who added these test cases in UFS), ifx we are using on GAEA version is 2023.02, FMS version is 2024.01. He believes there are some issues for ifx on GAEA at this moment and the ifx test cases he added in UFS should be considered at experimental level. I will try to have a talk with EMC EIB group people to see if they agree for us to bypass these tests in this MOM6 PR.

@alperaltuntas
Copy link
Collaborator

@marshallward, yes we don't enable FMAs. Below are our MEKE related parameters:

USE_MEKE = True
MEKE_CB = 100.0
MEKE_MIN_GAMMA2 = 1.0E-05
MEKE_GMCOEFF = 1.0
MEKE_GEOMETRIC = True
MEKE_GEOMETRIC_ALPHA = 0.09
MEKE_EQUILIBRIUM_ALT = True
MEKE_FRCOEFF = 0.3
MEKE_POSITIVE = True
MEKE_VISC_DRAG = False
MEKE_KHTH_FAC = 1.0
MEKE_KHTR_FAC = 1.0
MEKE_KHMEKE_FAC = 0.75
MEKE_MIN_LSCALE = True
MEKE_ALPHA_DEFORM = 1.0
MEKE_ALPHA_RHINES = 1.0
MEKE_ALPHA_EADY = 1.0
MEKE_ALPHA_FRICT = 1.0
MEKE_ALPHA_GRID = 1.0
MEKE_ADVECTION_FACTOR = 1.0
MEKE_CDRAG = 0.0025

@marshallward
Copy link
Collaborator Author

@jiandewang I was able to produce a difference in NOAA-GFDL#706 when FMAs are enabled. This patch restored my answers:

diff --git a/src/parameterizations/lateral/MOM_MEKE.F90 b/src/parameterizations/lateral/MOM_MEKE.F90
index 9ebe8ae73..88bd5ec09 100644
--- a/src/parameterizations/lateral/MOM_MEKE.F90
+++ b/src/parameterizations/lateral/MOM_MEKE.F90
@@ -440,6 +440,9 @@ subroutine step_forward_MEKE(MEKE, h, SN_u, SN_v, visc, dt, G, GV, US, CS, hu, h
         !$OMP parallel do default(shared)
         do j=js,je ; do i=is,ie
           src(i,j) = src(i,j) - CS%MEKE_GMcoeff*I_mass(i,j)*MEKE%GM_src(i,j)
+        enddo ; enddo
+
+        do j=js,je ; do i=is,ie
           src_GM(i,j) = -CS%MEKE_GMcoeff*I_mass(i,j)*MEKE%GM_src(i,j)
         enddo ; enddo
       endif

Presumably the introduction of src_GM and reuse of its RHS prevented the FMA usage in src. This patch moves src_GM to a separate loop and (again, presumably) preserves the FMA in src.

When WCOSS is back up, can you test this?

@jiandewang
Copy link
Collaborator

wcoss is still available at this moment but will be down from coming Monday. Just prepare a patch branch and I will test it

@awallcraft
Copy link
Collaborator

The following would be more efficient and presumably has the same effect unless the compiler will re-merge these loops.

    !$OMP parallel do default(shared)
    do j=js,je  
      do i=is,ie
        src(i,j) = src(i,j) - CS%MEKE_GMcoeff*I_mass(i,j)*MEKE%GM_src(i,j)
      enddo
      do i=is,ie
         src_GM(i,j) = -CS%MEKE_GMcoeff*I_mass(i,j)*MEKE%GM_src(i,j)
      enddo
    enddo

In the original the parallel do won't apply to the 2nd loop and this version may fit is cache better.

@marshallward
Copy link
Collaborator Author

@awallcraft It's possibly faster, but oscillating between writes to src and src_GM could also reduce performance. Perhaps worth testing?

@marshallward
Copy link
Collaborator Author

@alperaltuntas Is FRICTWORK_BUG set in your runs?

@alperaltuntas
Copy link
Collaborator

FRICTWORK_BUG

It's set to True.

@marshallward
Copy link
Collaborator Author

@alperaltuntas We identified a problem when MEKE_FRCOEFF is set but MEKE_BHCOEFF is unset. The default value of -1 may have been applied to part of MEKE%mom_src.

There is a patch which fixes most of these issues. If you could try it out then that would be helpful.

https://github.com/marshallward/MOM6/tree/ebt_patch

@jiandewang
Copy link
Collaborator

@jiandewang I was able to produce a difference in NOAA-GFDL#706 when FMAs are enabled. This patch restored my answers:

diff --git a/src/parameterizations/lateral/MOM_MEKE.F90 b/src/parameterizations/lateral/MOM_MEKE.F90
index 9ebe8ae73..88bd5ec09 100644
--- a/src/parameterizations/lateral/MOM_MEKE.F90
+++ b/src/parameterizations/lateral/MOM_MEKE.F90
@@ -440,6 +440,9 @@ subroutine step_forward_MEKE(MEKE, h, SN_u, SN_v, visc, dt, G, GV, US, CS, hu, h
         !$OMP parallel do default(shared)
         do j=js,je ; do i=is,ie
           src(i,j) = src(i,j) - CS%MEKE_GMcoeff*I_mass(i,j)*MEKE%GM_src(i,j)
+        enddo ; enddo
+
+        do j=js,je ; do i=is,ie
           src_GM(i,j) = -CS%MEKE_GMcoeff*I_mass(i,j)*MEKE%GM_src(i,j)
         enddo ; enddo
       endif

Presumably the introduction of src_GM and reuse of its RHS prevented the FMA usage in src. This patch moves src_GM to a separate loop and (again, presumably) preserves the FMA in src.

When WCOSS is back up, can you test this?

@marshallward using the above modification in the code make our regression passed

@jiandewang
Copy link
Collaborator

for ifx test cases: based on discussion with EMC EIB group @junwang-noaa , we will create new baseline for the ifx cases for this PR. In the meantime we hope GFDL to speed up the starting of ifx testing in MOM6

@alperaltuntas
Copy link
Collaborator

@alperaltuntas We identified a problem when MEKE_FRCOEFF is set but MEKE_BHCOEFF is unset. The default value of -1 may have been applied to part of MEKE%mom_src.

There is a patch which fixes most of these issues. If you could try it out then that would be helpful.

https://github.com/marshallward/MOM6/tree/ebt_patch

@marshallward this patch seems to fix the baseline comparison failures (at least for a subset of our tests), but I ran into another unrelated issue, as mentioned in my upcoming comment.

@@ -2053,7 +2201,7 @@ subroutine set_diffusivity_init(Time, G, GV, US, param_file, diag, CS, int_tide_
type(diag_ctrl), target, intent(inout) :: diag !< A structure used to regulate diagnostic output.
type(set_diffusivity_CS), pointer :: CS !< pointer set to point to the module control
!! structure.
type(int_tide_CS), pointer :: int_tide_CSp !< Internal tide control structure
type(int_tide_CS), intent(in), target :: int_tide_CSp !< Internal tide control structure
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change causes the following error in one of our intel debug tests:

forrtl: severe (408): fort: (7): Attempt to use pointer INT_TIDE_CSP when it is not associated with a target
          
 mom_diabatic_driv        3523  MOM_diabatic_driver.F90

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confirmation from @raphaeldussin that we can revert this back to pointer.

@marshallward
Copy link
Collaborator Author

marshallward commented Dec 16, 2024

NOAA-GFDL#706 replaced the expression for mom_src when MEKE_BACKSCAT_RO_C was enabled, which would most likely cause answer changes. This expression will need to be restored, and a similar expression for mom_src_bh will need to be restored from the original candidate PR.

@marshallward
Copy link
Collaborator Author

for ifx test cases: based on discussion with EMC EIB group @junwang-noaa , we will create new baseline for the ifx cases for this PR. In the meantime we hope GFDL to speed up the starting of ifx testing in MOM6

@jiandewang I understand the need to transition from ifort to ifx, since ifort support is ending this year, but GFDL does not have a production machine with ifx installed. And we won't have ifx until HPE can provide this for us, which could take a lot longer than any of us would like.

Each node is welcome to set its own regression rules, but GFDL can't guarantee that ifx will preserve answers until we are provided with the compilers and have enough time to understand how they work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.