Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

D-grid data input method #42

Merged
merged 38 commits into from
Jan 30, 2024
Merged
Show file tree
Hide file tree
Changes from 19 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
e70894d
Testing changes reflected across branches
Oct 10, 2023
2a8b97a
Undoing changes made in build_gaea_c5.sh
Oct 10, 2023
e2cdd5c
Testing vscode functionality, by adding a change to external_grid branch
fmalatino Oct 10, 2023
b4d9e5d
Testing vscode functionality, by adding a change to external_grid branch
fmalatino Oct 10, 2023
7038e30
After sync from PR 31 fixing GFDL constant names
fmalatino Oct 16, 2023
5b0991d
Addition of from_generated method and calc_flag to util/pace/util/gri…
fmalatino Oct 19, 2023
9001af1
Added get_grid method for external grid data to driver/pace/driver/gr…
fmalatino Oct 19, 2023
65c0e14
Preliminary xarray netcdf read in method added to driver/pace/driver/…
fmalatino Oct 19, 2023
dedd711
Updating util/pace/util/grid/generation.py from_generated method
fmalatino Oct 26, 2023
21e76c6
Merged PR 32
fmalatino Oct 26, 2023
17e281b
Addition of external grid data read in methods for initialization of …
fmalatino Oct 31, 2023
3a627aa
driver/examples/configs/test_external_C12_1x1.yaml
fmalatino Nov 7, 2023
f21774f
Merge branch 'main' into external_grid
fmalatino Nov 7, 2023
396025c
Preliminary unit test for external grid data read in
fmalatino Nov 7, 2023
a63be71
Current state of unit tests as of 27 Nov 2023
fmalatino Nov 28, 2023
d95e1fd
External grid method and unit tests added
fmalatino Dec 4, 2023
99b47ed
Merge branch 'external_grid' of github.com:fmalatino/pace into extern…
fmalatino Dec 8, 2023
a536d95
Re-excluding external grid data yamls from test_example_configs.py
fmalatino Dec 8, 2023
6b9ff5e
Update driver/pace/driver/grid.py
fmalatino Dec 9, 2023
8e0b829
Changed name of grid initializer function to match NetCDF dependency …
fmalatino Dec 11, 2023
caa2d43
Update util/pace/util/grid/generation.py
fmalatino Dec 11, 2023
6690e68
Fixed indentation error in generation.py from suggestion in PR 42
fmalatino Dec 12, 2023
ada497b
Removal of TODO comment in grid.py, changes to method of file accessi…
fmalatino Dec 12, 2023
6c1acfb
Changed grid data read-in unit tests to compare data directly from fi…
fmalatino Dec 12, 2023
d48f2ba
Change to reading in lon and lat, other metric terms calculated as ne…
fmalatino Dec 12, 2023
136db1a
Removed read-in of dx, dy, and area. Changed unit tests to compare ca…
fmalatino Dec 14, 2023
0be3439
Update tests/mpi_54rank/test_external_grid_1x1.py
fmalatino Dec 14, 2023
2d8eb41
Added comparisons for read-in vs generated by driver lon, lat, dx, dy…
fmalatino Dec 14, 2023
b7f379f
Added relative error calculations to unit tests for external grid dat…
fmalatino Dec 18, 2023
264549a
External grid data read in tests changed: relative errors printed by …
fmalatino Jan 4, 2024
75a3e1d
Removing commented out sections in test_external_grid_2x2.py
fmalatino Jan 24, 2024
f0a4a4d
Updated external grid data read-in to take configuration and input da…
fmalatino Jan 25, 2024
20ffd3f
Merging recent changes of 'main' into 'external_grid'
fmalatino Jan 25, 2024
1b7ca1a
Updated documentation in grid.py
fmalatino Jan 25, 2024
39467a9
Updated external grid data read in unit test to use parametrize funct…
fmalatino Jan 26, 2024
f6f1bb5
Merged changes from PR 36
fmalatino Jan 30, 2024
f37c4fd
Ammended files to reference changes to PR 36
fmalatino Jan 30, 2024
772e474
Merge branch 'main' into external_grid after PR 44 merge
fmalatino Jan 30, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CONTRIBUTORS.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ List format (alphabetical order): Surname, Name. Employer/Affiliation
* George, Rhea. Allen Institute for AI.
* Harris, Lucas. GFDL.
* Kung, Chris. NASA.
* Malatino, Frank. GFDL
fmalatino marked this conversation as resolved.
Show resolved Hide resolved
* McGibbon, Jeremy. Allen Institute for AI.
* Niedermayr, Yannick. ETH.
* Savarin, Ajda. University of Washington.
Expand Down
102 changes: 102 additions & 0 deletions driver/examples/configs/test_external_C12_1x1.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
stencil_config:
compilation_config:
backend: numpy
rebuild: false
validate_args: true
format_source: false
device_sync: false
grid_config:
type: external
config:
grid_type: 0
grid_file_path: "../../test_input/C12.tile"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This not part of the PR. This wasn't picked up by the tests because the 54 ranks tests are not running right now!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there way I can include them? Should we be running the 54 rank tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How big are the files? Technically I think it should be located at driver/examples/data so that the example folder is indeed self standing.

We should be running 54 ranks but I haven't look at if we can do that on the Github free runners without it taking 8 million years. Someone has to go ahead and make a github action and test it out

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can start a plan to set up a CI which uses cloud resources from the NOAA cloud resources provider. We've done this successfully for SHiELD, FMS, and other GFDL repos. The key is to make sure we have a container with the needed bits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FlorianDeconinck What do you think about including the tile data file in the PR? I know we don't love having test data in the repo but this one is small and if we want the tests run as part of CI...

initialization:
type: analytic
config:
case: baroclinic
performance_config:
collect_performance: true
experiment_name: c12_baroclinic
nx_tile: 12
nz: 79
dt_atmos: 225
minutes: 15
layout:
- 1
- 1
diagnostics_config:
path: output
output_format: netcdf
names:
- u
- v
- ua
- va
- pt
- delp
- qvapor
- qliquid
- qice
- qrain
- qsnow
- qgraupel
z_select:
- level: 65
names:
- pt
dycore_config:
a_imp: 1.0
beta: 0.
consv_te: 0.
d2_bg: 0.
d2_bg_k1: 0.2
d2_bg_k2: 0.1
d4_bg: 0.15
d_con: 1.0
d_ext: 0.0
dddmp: 0.5
delt_max: 0.002
do_sat_adj: true
do_vort_damp: true
fill: true
hord_dp: 6
hord_mt: 6
hord_tm: 6
hord_tr: 8
hord_vt: 6
hydrostatic: false
k_split: 1
ke_bg: 0.
kord_mt: 9
kord_tm: -9
kord_tr: 9
kord_wz: 9
n_split: 1
nord: 3
nwat: 6
p_fac: 0.05
rf_cutoff: 3000.
rf_fast: true
tau: 10.
vtdm4: 0.06
z_tracer: true
do_qa: true
tau_i2s: 1000.
tau_g2v: 1200.
ql_gen: 0.001
ql_mlt: 0.002
qs_mlt: 0.000001
qi_lim: 1.0
dw_ocean: 0.1
dw_land: 0.15
icloud_f: 0
tau_l2v: 300.
tau_v2l: 90.
fv_sg_adj: 0
n_sponge: 48
u_max: 355.0

physics_config:
hydrostatic: false
nwat: 6
do_qa: true
102 changes: 102 additions & 0 deletions driver/examples/configs/test_external_C12_2x2.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
stencil_config:
compilation_config:
backend: numpy
rebuild: false
validate_args: true
format_source: false
device_sync: false
grid_config:
type: external
config:
grid_type: 0
grid_file_path: "../../test_input/C12.tile"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are small files, they should be okay. Ideally we'd store them in the container (mentioned above) or generate them on the fly to test the grid creation and ingest as a group.

initialization:
type: analytic
config:
case: baroclinic
performance_config:
collect_performance: true
experiment_name: c12_baroclinic
nx_tile: 12
nz: 79
dt_atmos: 225
minutes: 15
layout:
- 2
- 2
diagnostics_config:
path: output
output_format: netcdf
names:
- u
- v
- ua
- va
- pt
- delp
- qvapor
- qliquid
- qice
- qrain
- qsnow
- qgraupel
z_select:
- level: 65
names:
- pt
dycore_config:
a_imp: 1.0
beta: 0.
consv_te: 0.
d2_bg: 0.
d2_bg_k1: 0.2
d2_bg_k2: 0.1
d4_bg: 0.15
d_con: 1.0
d_ext: 0.0
dddmp: 0.5
delt_max: 0.002
do_sat_adj: true
do_vort_damp: true
fill: true
hord_dp: 6
hord_mt: 6
hord_tm: 6
hord_tr: 8
hord_vt: 6
hydrostatic: false
k_split: 1
ke_bg: 0.
kord_mt: 9
kord_tm: -9
kord_tr: 9
kord_wz: 9
n_split: 1
nord: 3
nwat: 6
p_fac: 0.05
rf_cutoff: 3000.
rf_fast: true
tau: 10.
vtdm4: 0.06
z_tracer: true
do_qa: true
tau_i2s: 1000.
tau_g2v: 1200.
ql_gen: 0.001
ql_mlt: 0.002
qs_mlt: 0.000001
qi_lim: 1.0
dw_ocean: 0.1
dw_land: 0.15
icloud_f: 0
tau_l2v: 300.
tau_v2l: 90.
fv_sg_adj: 0
n_sponge: 48
u_max: 355.0

physics_config:
hydrostatic: false
nwat: 6
do_qa: true
110 changes: 109 additions & 1 deletion driver/pace/driver/grid.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,11 +3,13 @@
from typing import ClassVar, Optional, Tuple

import f90nml
import xarray as xr

import pace.driver
import pace.dsl
import pace.physics
import pace.stencils
import pace.util
import pace.util.grid
from pace.stencils.testing import TranslateGrid
from pace.util import Communicator, QuantityFactory
Expand Down Expand Up @@ -65,7 +67,8 @@ def get_grid(
communicator: Communicator,
) -> Tuple[DampingCoefficients, DriverGridData, GridData]:
return self.config.get_grid(
quantity_factory=quantity_factory, communicator=communicator
quantity_factory=quantity_factory,
communicator=communicator,
)

@classmethod
Expand Down Expand Up @@ -196,6 +199,111 @@ def get_grid(
return damping_coefficients, driver_grid_data, grid_data


@GridInitializerSelector.register("external")
@dataclasses.dataclass
fmalatino marked this conversation as resolved.
Show resolved Hide resolved
class ExternalGridConfig(GridInitializer):
fmalatino marked this conversation as resolved.
Show resolved Hide resolved
"""
Configuration for grid initialized from external data.
Input is from tile files generated by FRE-NCtools methods
bensonr marked this conversation as resolved.
Show resolved Hide resolved
In its get_grid method it calls the MetricTerms class method
fmalatino marked this conversation as resolved.
Show resolved Hide resolved
from_generated which generates an object of MetricTerms to be
used to generate the damping_coefficients, driver_grid_data,
and grid_data variables
"""

grid_type: Optional[int] = 0
grid_file_path: str = "/input/tilefile"

# TODO: Area read in?
fmalatino marked this conversation as resolved.
Show resolved Hide resolved

def get_grid(
self,
quantity_factory: QuantityFactory,
communicator: Communicator,
) -> Tuple[DampingCoefficients, DriverGridData, GridData]:

pace_log.info("Using external grid data")

# ToDo: refactor when grid_type is an enum
if self.grid_type <= 3:
fmalatino marked this conversation as resolved.
Show resolved Hide resolved
tile_num = (
pace.util.get_tile_index(
communicator.rank, communicator.partitioner.total_ranks
)
+ 1
)
tile_file = self.grid_file_path + str(tile_num) + ".nc"
else:
tile_file = self.grid_file_path

ds = xr.open_dataset(tile_file)
lon = ds.x.values
lat = ds.y.values
dx = ds.dx.values
dy = ds.dy.values
area = ds.area.values
nx = ds.nx.values.size
ny = ds.ny.values.size
npx = ds.nxp.values.size
npy = ds.nyp.values.size

subtile_slice_grid = communicator.partitioner.tile.subtile_slice(
rank=communicator.rank,
global_dims=[pace.util.X_INTERFACE_DIM, pace.util.Y_INTERFACE_DIM],
global_extent=(npx, npy),
overlap=True,
)

subtile_slice_dx = communicator.partitioner.tile.subtile_slice(
rank=communicator.rank,
global_dims=[pace.util.Y_INTERFACE_DIM, pace.util.X_DIM],
global_extent=(npy, nx),
overlap=True,
)

subtile_slice_dy = communicator.partitioner.tile.subtile_slice(
rank=communicator.rank,
global_dims=[pace.util.Y_DIM, pace.util.X_INTERFACE_DIM],
global_extent=(ny, npx),
overlap=True,
)

subtile_slice_area = communicator.partitioner.tile.subtile_slice(
rank=communicator.rank,
global_dims=[pace.util.Y_DIM, pace.util.X_DIM],
global_extent=(ny, nx),
overlap=True,
)

metric_terms = MetricTerms.from_external(
x=lon[subtile_slice_grid],
y=lat[subtile_slice_grid],
dx=dx[subtile_slice_dx],
dy=dy[subtile_slice_dy],
area=area[subtile_slice_area],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so we are using the read-in area too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I figured since it was there to read it in as well, unless this is ill-advised

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's fine, just double-checking 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the fortran version merely read in the lat-lon pairs and calculated all other metrics. It would be valuable to know how the values in the files compare to those calculated by the model code from the lat-lon pairs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could switch so that Pace behaves the same way and use dx, dy, and area for the tests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did discuss with Lucas whether we should carry the metrics in memory or recompute in the few instances where they are necessary. I know many of the metrics terms are for exclusively for remapping the winds amongst the various representations of A-, C-, and D-grid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just reviewed the Fortran code and here's what I got for a few of the cases we care about...

cartesian grid

  • grid_type = 4
  • internally created

orthogonal grid

  • grid_type = 5
  • read in x, y, dx, dy, area

CS grid

  • grid_type < 0
  • read in x, y
  • calculate dx, dy, area

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many of the metrics are infrequently used or in specific situations that are not always necessary. I think it would be best for some architectures (especially GPUs) to compute on-the-fly, although for traditional CPUs it may still be best to compute metrics once and save them in memory.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cent on an advanced system for this kind of "infrequent-but-vital" field.

Removing memory pressure on the GPU will quickly become a subject. Since the DSL generates from IR, in all backends, we can technically inject code. Therefore, we could have a system that registers this kind of calculation and at JIT compilation time, when we known about the size of the domain and can do memory pressure projection (which already exists in DaCe backend), we could decide to flip the read from those field by the analytics.

Of course this has a few caveats, especially that it can be prohibitive if the analytic solution calls on other fields that themselves can be inlined.

Ultimately, this kind of decision (save or recompute) should be taken of the hands of the modelers and push down the stack.

quantity_factory=quantity_factory,
communicator=communicator,
grid_type=self.grid_type,
extdgrid=True,
)

horizontal_data = HorizontalGridData.new_from_metric_terms(metric_terms)
vertical_data = VerticalGridData.new_from_metric_terms(metric_terms)
contravariant_data = ContravariantGridData.new_from_metric_terms(metric_terms)
angle_data = AngleGridData.new_from_metric_terms(metric_terms)
grid_data = GridData(
horizontal_data=horizontal_data,
vertical_data=vertical_data,
contravariant_data=contravariant_data,
angle_data=angle_data,
)

damping_coefficients = DampingCoefficients.new_from_metric_terms(metric_terms)
driver_grid_data = DriverGridData.new_from_metric_terms(metric_terms)

return damping_coefficients, driver_grid_data, grid_data


def _transform_horizontal_grid(
metric_terms: MetricTerms,
stretch_factor: float,
Expand Down
2 changes: 2 additions & 0 deletions tests/main/driver/test_example_configs.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@
"baroclinic_c12_orch_cpu.yaml",
"tropical_read_restart_fortran.yml",
"tropicalcyclone_c128.yaml",
"test_external_C12_1x1.yaml",
"test_external_C12_2x2.yaml",
]

JENKINS_CONFIGS_DIR = os.path.join(dirname, "../../../.jenkins/driver_configs/")
Expand Down
Loading