
Separate interior state and boundary forcing to only predict state #84

Closed
wants to merge 92 commits
Commits
5df1bff
add datastore_boundary to neural_lam
sadamov Nov 18, 2024
46590ef
complete integration of boundary in weatherDataset
sadamov Nov 18, 2024
b990f49
Add test to check timestep length and spacing
sadamov Nov 18, 2024
3fd1d6b
setting default mdp boundary to 0 gridcells
sadamov Nov 18, 2024
1f2499c
implement time-based slicing
sadamov Nov 18, 2024
1af1481
remove all interior_mask and boundary_mask
sadamov Nov 19, 2024
d545cb7
added gcsfs dependency for era5 weatherbench download
sadamov Nov 19, 2024
5c1a7d7
added new era5 datastore config for boundary
sadamov Nov 19, 2024
30e4f05
removed left-over boundary-mask references
sadamov Nov 19, 2024
6a8c593
make check for existing category in datastore more flexible (for boun…
sadamov Nov 19, 2024
17c920d
implement xarray based (mostly) time slicing and windowing
sadamov Nov 20, 2024
7919995
cleanup analysis based time-slicing
sadamov Nov 21, 2024
9bafcee
implement datastore_boundary in existing tests
sadamov Nov 19, 2024
ce06bbc
allow for grid shape retrieval from forcing data
sadamov Nov 21, 2024
884b5c6
rearrange time slicing, boundary first
sadamov Nov 21, 2024
5904cbe
identified issue, cleanup next
leifdenby Nov 25, 2024
efe0302
use xarray plot only
leifdenby Nov 26, 2024
a489c2e
don't reraise
leifdenby Nov 26, 2024
242d08b
remove debug plot
leifdenby Nov 26, 2024
c1f706c
remove extent calc used in diagnosing issue
leifdenby Nov 26, 2024
cf8e3e4
add type annotation
leifdenby Nov 29, 2024
85160ce
ensure tensor copy to cpu mem before data-array creation
leifdenby Nov 29, 2024
52c4528
apply time-indexing to support ar_steps_val > 1
leifdenby Nov 29, 2024
b96d8eb
renaming test datastores
sadamov Nov 30, 2024
72da25f
adding num_past/future_boundary_step args
sadamov Nov 30, 2024
244f1cc
using combined config file
sadamov Nov 30, 2024
a9cc36e
proper handling of state/forcing/boundary in dataset
sadamov Nov 30, 2024
dcc0b46
datastore_boundars=None introduced
sadamov Nov 30, 2024
a3b3bde
bug fix for file retrieval per member
sadamov Nov 30, 2024
3ffc413
rename datastore for tests
sadamov Nov 30, 2024
85aad66
aligned time with danra for easier boundary testing
sadamov Nov 30, 2024
64f057f
Fixed test for temporal embedding
sadamov Nov 30, 2024
6205dbd
pin dataclass-wizard <0.31.0 to avoid bug in dataclass-wizard
leifdenby Dec 2, 2024
551cd26
allow boundary as input to ar_model.common_step
sadamov Dec 2, 2024
fc95350
linting
sadamov Dec 2, 2024
01fa807
improved docstrings and added some assertions
sadamov Dec 2, 2024
5a749f3
update mdp dependency
sadamov Dec 2, 2024
45ba607
remove boundary datastore from tests that don't need it
sadamov Dec 2, 2024
f36f360
fix scope of _get_slice_time
sadamov Dec 2, 2024
105108e
fix scope of _get_time_step
sadamov Dec 2, 2024
d760145
Merge branch 'feat/boundary_dataloader' of https://github.com/sadamov…
sadamov Dec 2, 2024
ae0cf76
added information about optional boundary datastore
sadamov Dec 2, 2024
9af27e0
add datastore_boundary to neural_lam
sadamov Nov 18, 2024
c25fb30
complete integration of boundary in weatherDataset
sadamov Nov 18, 2024
505ceeb
Add test to check timestep length and spacing
sadamov Nov 18, 2024
e733066
setting default mdp boundary to 0 gridcells
sadamov Nov 18, 2024
d8349a4
implement time-based slicing
sadamov Nov 18, 2024
fd791bf
remove all interior_mask and boundary_mask
sadamov Nov 19, 2024
ae82cdb
added gcsfs dependency for era5 weatherbench download
sadamov Nov 19, 2024
34a6cc7
added new era5 datastore config for boundary
sadamov Nov 19, 2024
2dc67a0
removed left-over boundary-mask references
sadamov Nov 19, 2024
9f8628e
make check for existing category in datastore more flexible (for boun…
sadamov Nov 19, 2024
388c79d
implement xarray based (mostly) time slicing and windowing
sadamov Nov 20, 2024
2529969
cleanup analysis based time-slicing
sadamov Nov 21, 2024
179a035
implement datastore_boundary in existing tests
sadamov Nov 19, 2024
2daeb16
allow for grid shape retrieval from forcing data
sadamov Nov 21, 2024
cbcdcae
rearrange time slicing, boundary first
sadamov Nov 21, 2024
e6ace27
renaming test datastores
sadamov Nov 30, 2024
42818f0
adding num_past/future_boundary_step args
sadamov Nov 30, 2024
0103b6e
using combined config file
sadamov Nov 30, 2024
0896344
proper handling of state/forcing/boundary in dataset
sadamov Nov 30, 2024
355423c
datastore_boundars=None introduced
sadamov Nov 30, 2024
121d460
bug fix for file retrieval per member
sadamov Nov 30, 2024
7e82eef
rename datastore for tests
sadamov Nov 30, 2024
320d7c4
aligned time with danra for easier boundary testing
sadamov Nov 30, 2024
f18dcc2
Fixed test for temporal embedding
sadamov Nov 30, 2024
e6327d8
allow boundary as input to ar_model.common_step
sadamov Dec 2, 2024
1374a19
linting
sadamov Dec 2, 2024
779f3e9
improved docstrings and added some assertions
sadamov Dec 2, 2024
f126ec2
remove boundary datastore from tests that don't need it
sadamov Dec 2, 2024
4b656da
fix scope of _get_time_step
sadamov Dec 2, 2024
75db4b8
added information about optional boundary datastore
sadamov Dec 2, 2024
58b4af6
Merge branch 'feat/boundary_dataloader' of https://github.com/sadamov…
sadamov Dec 2, 2024
4c17545
moved gcsfs to dev group
sadamov Dec 3, 2024
a700350
linting
sadamov Dec 3, 2024
315aa0f
Propagate separation of state and boundary change through training loop
joeloskarsson Oct 28, 2024
1967221
Start building graphs with wmg
joeloskarsson Nov 4, 2024
cb74e3f
Change forward pass to concat according to enforced node ordering
joeloskarsson Nov 11, 2024
9715ed8
wip to make tests pass
joeloskarsson Nov 11, 2024
336fba9
Fix edge index manipulation to make training work again
joeloskarsson Nov 12, 2024
ce3ea6d
Work on fixing plotting functionality
joeloskarsson Nov 12, 2024
a520505
Linting
joeloskarsson Nov 13, 2024
793e6c0
Add optional separate grid embedder for boundary
joeloskarsson Nov 13, 2024
3515460
Make new graph creation script main and only one
joeloskarsson Nov 13, 2024
05d91f1
Fix some typos and forgot code
joeloskarsson Nov 13, 2024
3eba43c
Correct handling of node indices for m2g when using decode_mask
joeloskarsson Nov 27, 2024
f1b7359
Linting and bugfixes
joeloskarsson Nov 28, 2024
fa6c9e3
Make graph creation and plotting work with datastores
joeloskarsson Dec 2, 2024
4d85384
Fix graph loading and boundary mask
joeloskarsson Dec 2, 2024
9edfec3
Fix boundary masking bug for static features
joeloskarsson Dec 2, 2024
6e1c53c
Add flag making boundary forcing optional in models
joeloskarsson Dec 3, 2024
4bcaa4b
Linting
joeloskarsson Dec 3, 2024
22 changes: 13 additions & 9 deletions README.md
@@ -108,7 +108,9 @@ Once `neural-lam` is installed you will be able to train/evaluate models. For th
interface that provides the data in a data-structure that can be used within
neural-lam. A datastore is used to create a `pytorch.Dataset`-derived
class that samples the data in time to create individual samples for
training, validation and testing.
training, validation and testing. A secondary datastore can be provided
for the boundary data. Currently, the boundary datastore must be of type `mdp`
and may only contain forcing features. This can easily be expanded in the future.

2. **The graph structure** is used to define message-passing GNN layers,
that are trained to emulate fluid flow in the atmosphere over time. The
@@ -121,7 +123,7 @@ different aspects about the training and evaluation of the model.

The path you provide to the neural-lam config (`config.yaml`) also sets the
root directory relative to which all other paths are resolved, as in the parent
directory of the config becomes the root directory. Both the datastore and
directory of the config becomes the root directory. Both the datastores and
graphs you generate are then stored in subdirectories of this root directory.
Exactly how and where a specific datastore expects its source data to be stored
and where it stores its derived data is up to the implementation of the
@@ -134,6 +136,7 @@ assume you placed `config.yaml` in a folder called `data`):
data/
├── config.yaml - Configuration file for neural-lam
├── danra.datastore.yaml - Configuration file for the datastore, referred to from config.yaml
├── era5.datastore.yaml - Optional configuration file for the boundary datastore, referred to from config.yaml
└── graphs/ - Directory containing graphs for training
```

@@ -142,18 +145,20 @@ And the content of `config.yaml` could in this case look like:
datastore:
kind: mdp
config_path: danra.datastore.yaml
datastore_boundary:
kind: mdp
config_path: era5.datastore.yaml
training:
state_feature_weighting:
__config_class__: ManualStateFeatureWeighting
values:
weights:
u100m: 1.0
v100m: 1.0
```

For now the neural-lam config only defines two things: 1) the kind of data
store and the path to its config, and 2) the weighting of different features in
the loss function. If you don't define the state feature weighting it will default
to weighting all features equally.
For now the neural-lam config only defines two things:
1) the kinds of datastores and the paths to their configs
2) the weighting of different features in the loss function. If you don't define the state feature weighting it will default to weighting all features equally.

(This example is taken from the `tests/datastore_examples/mdp` directory.)
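The default-to-equal weighting described above can be illustrated with a small sketch. This is plain Python for illustration only; the feature names, loss values, and the mean reduction are assumptions, not the actual neural-lam implementation:

```python
# Hypothetical sketch of manual state-feature weighting, assuming weights
# from config.yaml are applied per feature and any feature without an
# explicit weight defaults to 1.0. All values here are illustrative.
feature_names = ["u100m", "v100m", "t2m"]
config_weights = {"u100m": 1.0, "v100m": 1.0}  # as in the config.yaml above

# Features without an explicit weight fall back to equal weighting
weights = [config_weights.get(name, 1.0) for name in feature_names]

per_feature_loss = [0.2, 0.4, 0.6]  # e.g. a per-feature MSE (made up)
weighted_loss = sum(w * l for w, l in zip(weights, per_feature_loss)) / len(
    feature_names
)
```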

@@ -525,5 +530,4 @@ Furthermore, all tests in the ```tests``` directory will be run upon pushing cha

# Contact
If you are interested in machine learning models for LAM, have questions about the implementation or ideas for extending it, feel free to get in touch.
There is an open [mllam slack channel](https://join.slack.com/t/ml-lam/shared_invite/zt-2t112zvm8-Vt6aBvhX7nYa6Kbj_LkCBQ) that anyone can join (after following the link you have to request to join, this is to avoid spam bots).
You can also open a github issue on this page, or (if more suitable) send an email to [[email protected]](mailto:[email protected]).
There is an open [mllam slack channel](https://join.slack.com/t/ml-lam/shared_invite/zt-2t112zvm8-Vt6aBvhX7nYa6Kbj_LkCBQ) that anyone can join. You can also open a github issue on this page, or (if more suitable) send an email to [[email protected]](mailto:[email protected]).
177 changes: 177 additions & 0 deletions neural_lam/build_rectangular_graph.py
@@ -0,0 +1,177 @@
# Standard library
import argparse
import os

# Third-party
import numpy as np
import weather_model_graphs as wmg

# Local
from . import utils
from .config import load_config_and_datastore

WMG_ARCHETYPES = {
    "keisler": wmg.create.archetype.create_keisler_graph,
    "graphcast": wmg.create.archetype.create_graphcast_graph,
    "hierarchical": wmg.create.archetype.create_oskarsson_hierarchical_graph,
}


def main(input_args=None):
    parser = argparse.ArgumentParser(
        description="Rectangular graph generation using weather-model-graphs",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )

    # Inputs and outputs
    parser.add_argument(
        "--config_path",
        type=str,
        help="Path to the configuration for neural-lam",
    )
    parser.add_argument(
        "--graph_name",
        type=str,
        help="Name to save graph as (default: multiscale)",
    )
    parser.add_argument(
        "--output_dir",
        type=str,
        default="graphs",
        help="Directory to save graph to",
    )

    # Graph structure
    parser.add_argument(
        "--archetype",
        type=str,
        default="keisler",
        help="Archetype to use to create graph "
        "(keisler/graphcast/hierarchical)",
    )
    parser.add_argument(
        "--mesh_node_distance",
        type=float,
        default=3.0,
        help="Distance between created mesh nodes",
    )
    parser.add_argument(
        "--level_refinement_factor",
        type=float,
        default=3,
        help="Refinement factor between grid points and bottom level of "
        "mesh hierarchy",
    )
    parser.add_argument(
        "--max_num_levels",
        type=int,
        help="Limit multi-scale mesh to given number of levels, "
        "from bottom up",
    )
    args = parser.parse_args(input_args)

    assert (
        args.config_path is not None
    ), "Specify your config with --config_path"
    assert (
        args.graph_name is not None
    ), "Specify the name to save graph as with --graph_name"

    _, datastore, _ = load_config_and_datastore(config_path=args.config_path)

    # Load grid positions
    # TODO Do not get normalised positions
    coords = utils.get_reordered_grid_pos(datastore).numpy()
    # (num_nodes_full, 2)

    # Construct mask
    num_full_grid = coords.shape[0]
    num_boundary = datastore.boundary_mask.to_numpy().sum()
    num_interior = num_full_grid - num_boundary
    decode_mask = np.concatenate(
        (
            np.ones(num_interior, dtype=bool),
            np.zeros(num_boundary, dtype=bool),
        ),
        axis=0,
    )

    # Build graph
    assert (
        args.archetype in WMG_ARCHETYPES
    ), f"Unknown archetype: {args.archetype}"
    archetype_create_func = WMG_ARCHETYPES[args.archetype]

    create_kwargs = {
        "coords": coords,
        "mesh_node_distance": args.mesh_node_distance,
        "decode_mask": decode_mask,
        "return_components": True,
    }
    if args.archetype != "keisler":
        # Add additional multi-level kwargs
        create_kwargs.update(
            {
                "level_refinement_factor": args.level_refinement_factor,
                "max_num_levels": args.max_num_levels,
            }
        )

    graph_comp = archetype_create_func(**create_kwargs)

    print("Created graph:")
    for name, subgraph in graph_comp.items():
        print(f"{name}: {subgraph}")

    # Save graph
    graph_dir_path = os.path.join(
        datastore.root_path, "graphs", args.graph_name
    )
    os.makedirs(graph_dir_path, exist_ok=True)
    for component, graph in graph_comp.items():
        # This seems like a bit of a hack, maybe better if saving in wmg
        # was made consistent with nl
        if component == "m2m":
            if args.archetype == "hierarchical":
                # Split by direction
                m2m_direction_comp = wmg.split_graph_by_edge_attribute(
                    graph, attr="direction"
                )
                for direction, graph in m2m_direction_comp.items():
                    if direction == "same":
                        # Name just m2m to be consistent with non-hierarchical
                        wmg.save.to_pyg(
                            graph=graph,
                            name="m2m",
                            list_from_attribute="level",
                            edge_features=["len", "vdiff"],
                            output_directory=graph_dir_path,
                        )
                    else:
                        # up and down directions
                        wmg.save.to_pyg(
                            graph=graph,
                            name=f"mesh_{direction}",
                            list_from_attribute="levels",
                            edge_features=["len", "vdiff"],
                            output_directory=graph_dir_path,
                        )
            else:
                wmg.save.to_pyg(
                    graph=graph,
                    name=component,
                    list_from_attribute="dummy",  # Note: Needed to output list
                    edge_features=["len", "vdiff"],
                    output_directory=graph_dir_path,
                )
        else:
            wmg.save.to_pyg(
                graph=graph,
                name=component,
                edge_features=["len", "vdiff"],
                output_directory=graph_dir_path,
            )


if __name__ == "__main__":
    main()
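The decode-mask construction in `main()` above can be sketched in isolation. It assumes the node ordering the forward pass enforces (interior nodes first, boundary nodes last); the counts below are made up for illustration:

```python
import numpy as np

# Interior-first, boundary-last node ordering is assumed. Counts are
# illustrative stand-ins for coords.shape[0] and the boundary-mask sum.
num_full_grid = 10
num_boundary = 4
num_interior = num_full_grid - num_boundary

# True where the graph should decode back to the grid (interior only),
# False for boundary nodes, which act purely as forcing inputs
decode_mask = np.concatenate(
    (
        np.ones(num_interior, dtype=bool),
        np.zeros(num_boundary, dtype=bool),
    ),
    axis=0,
)
```

Passing this as `decode_mask` to the wmg archetype functions means only interior nodes receive decoded outputs, which is what lets the model predict the interior state alone while the boundary supplies forcing.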
13 changes: 12 additions & 1 deletion neural_lam/config.py
@@ -168,4 +168,15 @@ def load_config_and_datastore(
        datastore_kind=config.datastore.kind, config_path=datastore_config_path
    )

    return config, datastore
    if config.datastore_boundary is not None:
        datastore_boundary_config_path = (
            Path(config_path).parent / config.datastore_boundary.config_path
        )
        datastore_boundary = init_datastore(
            datastore_kind=config.datastore_boundary.kind,
            config_path=datastore_boundary_config_path,
        )
    else:
        datastore_boundary = None

    return config, datastore, datastore_boundary
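The optional-boundary logic added here follows a simple pattern that can be sketched with stand-ins. The dataclasses and `init_datastore` below are simplified assumptions for illustration, not the real neural-lam API:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DatastoreSelection:
    kind: str
    config_path: str


@dataclass
class NeuralLAMConfig:
    datastore: DatastoreSelection
    datastore_boundary: Optional[DatastoreSelection] = None


def init_datastore(datastore_kind, config_path):
    # Stand-in factory: just records what would be constructed
    return {"kind": datastore_kind, "config_path": str(config_path)}


def load_datastores(config, config_root):
    """Resolve datastore paths relative to the config's root directory."""
    datastore = init_datastore(
        config.datastore.kind,
        Path(config_root) / config.datastore.config_path,
    )
    # The boundary datastore is optional; None disables boundary forcing
    if config.datastore_boundary is not None:
        datastore_boundary = init_datastore(
            config.datastore_boundary.kind,
            Path(config_root) / config.datastore_boundary.config_path,
        )
    else:
        datastore_boundary = None
    return datastore, datastore_boundary
```

Callers then always unpack a pair (or, in the real function, a triple) and simply check `datastore_boundary is not None` before wiring boundary forcing into the dataset.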