Feature: add tests for meps dataset (#38)
Implemented tests for loading a reduced-size MEPS example dataset,
creating graphs, and training the model.

- reduce the number of variables, size of domain, etc. in Joel's MEPS data
example so that the zip file is less than 500 MB, calling it
`meps_example_reduced`
- create test-data zip file and upload to EWC (credentials from
@leifdenby)
- implement a pytest test that downloads and unpacks the test data using
[pooch](https://pypi.org/project/pooch/)
- Implement testing of (sketched below):
   - instantiation of `neural_lam.weather_dataset.WeatherDataset` from
downloaded data
   - checking the shapes of the returned parts of a training item
   - creating a new graph in the tests for the reduced dataset
   - feeding a single batch through the model and checking the output shape
- add a GitHub Actions workflow to run the tests during CI/CD

closes #30
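
A minimal sketch of what such a test can look like (`WeatherDataset` and the
dataset name come from this PR; the URL, hash handling, and constructor
arguments below are illustrative assumptions, not the committed values):

```python
# Hedged sketch only: the URL, hash, and constructor arguments are assumptions.
import pooch

from neural_lam.weather_dataset import WeatherDataset


def test_meps_reduced_example():
    # Download and unpack the reduced test data; pooch caches the download.
    # The committed test uses the real EWC URL and should pin a checksum.
    pooch.retrieve(
        url="https://example-bucket.ewc/meps_example_reduced.zip",  # placeholder
        known_hash=None,
        path="data",
        processor=pooch.Unzip(),
    )

    # Instantiate the dataset from the downloaded data
    dataset = WeatherDataset(dataset_name="meps_example_reduced")

    # Check shapes of the returned parts of a training item;
    # the number and layout of the returned tensors are assumed here.
    init_states = dataset[0][0]
    assert init_states.ndim == 3  # (time, grid nodes, features) assumed
```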
SimonKamuk authored Jun 4, 2024
1 parent 743c07a commit 81d0840
Showing 11 changed files with 443 additions and 7 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/pre-commit.yml
@@ -1,4 +1,4 @@
-name: lint
+name: Linting

on:
# trigger on pushes to any branch, but not main
45 changes: 45 additions & 0 deletions .github/workflows/run_tests.yml
@@ -0,0 +1,45 @@
name: Unit Tests

on:
  # trigger on pushes to any branch, but not main
  push:
    branches-ignore:
      - main
  # and also on PRs to main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
          # quote the spec so the shell does not treat ">" as a redirect
          pip install "torch-geometric>=2.5.2"
      - name: Load cache data
        uses: actions/cache/restore@v4
        with:
          path: data
          key: ${{ runner.os }}-meps-reduced-example-data-v0.1.0
          restore-keys: |
            ${{ runner.os }}-meps-reduced-example-data-v0.1.0
      - name: Test with pytest
        run: |
          pytest -v -s
      - name: Save cache data
        uses: actions/cache/save@v4
        with:
          path: data
          key: ${{ runner.os }}-meps-reduced-example-data-v0.1.0
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html)
## [unreleased](https://github.com/joeloskarsson/neural-lam/compare/v0.1.0...HEAD)

### Added
- Added tests for loading the dataset, creating the graph, and training the model, based on a reduced MEPS dataset stored on an S3 bucket (European Weather Cloud), along with automatic running of the tests on push/PR via GitHub Actions. Added caching of the test data to speed up test runs.
[\#38](https://github.com/mllam/neural-lam/pull/38)
@SimonKamuk

- Replaced `constants.py` with `data_config.yaml` for data configuration management
[\#31](https://github.com/joeloskarsson/neural-lam/pull/31)
5 changes: 5 additions & 0 deletions README.md
@@ -1,3 +1,6 @@
![Linting](https://github.com/mllam/neural-lam/actions/workflows/pre-commit.yml/badge.svg)
![Automatic tests](https://github.com/mllam/neural-lam/actions/workflows/run_tests.yml/badge.svg)

<p align="middle">
<img src="figures/neural_lam_header.png" width="700">
</p>
@@ -279,6 +282,8 @@ pre-commit run --all-files
```
from the root directory of the repository.

Furthermore, all tests in the ```tests``` directory will be run by a GitHub Actions workflow upon pushing changes. A failure in any of the tests will block the push/PR.
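
The test workflow simply runs
```
pytest -v -s
```
from the repository root, so the same checks can be reproduced locally once the dependencies in ```requirements.txt``` (which now include pytest and pooch) are installed.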

# Contact
If you are interested in machine learning models for LAM, have questions about our implementation or ideas for extending it, feel free to get in touch.
You can open a github issue on this page, or (if more suitable) send an email to [[email protected]](mailto:[email protected]).
4 changes: 2 additions & 2 deletions create_mesh.py
@@ -153,7 +153,7 @@ def prepend_node_index(graph, new_index):
return networkx.relabel_nodes(graph, to_mapping, copy=True)


-def main():
+def main(input_args=None):
parser = ArgumentParser(description="Graph generation arguments")
parser.add_argument(
"--data_config",
@@ -186,7 +186,7 @@ def main():
default=0,
help="Generate hierarchical mesh graph (default: 0, no)",
)
-args = parser.parse_args()
+args = parser.parse_args(input_args)

# Load grid positions
config_loader = config.Config.from_file(args.data_config)
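
Accepting an argument list makes graph generation directly callable from the tests; a hedged sketch of such a call (the config path is illustrative):

```python
# Hypothetical in-process invocation from a test; the config path is illustrative.
import create_mesh

create_mesh.main(["--data_config", "data/meps_example_reduced/data_config.yaml"])
```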
239 changes: 239 additions & 0 deletions docs/notebooks/create_reduced_meps_dataset.ipynb
@@ -0,0 +1,239 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Creating meps_example_reduced\n",
"This notebook outlines how the small-size test dataset ```meps_example_reduced``` was created based on the slightly larger dataset ```meps_example```. The zipped up datasets are 263 MB and 2.6 GB, respectively. See [README.md](../../README.md) for info on how to download ```meps_example```.\n",
"\n",
"The dataset was reduced in size by reducing the number of grid points and variables.\n"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# Standard library\n",
"import os\n",
"\n",
"# Third-party\n",
"import numpy as np\n",
"import torch"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The number of grid points was reduced to 1/4 by halving the number of coordinates in both the x and y direction. This was done by removing a quarter of the grid points along each outer edge, so the center grid points would stay centered in the new set.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load existing grid\n",
"grid_xy = np.load('data/meps_example/static/nwp_xy.npy')\n",
"# Get slices in each dimension by cutting off a quarter along each edge\n",
"num_x, num_y = grid_xy.shape[1:]\n",
"x_slice = slice(num_x//4, 3*num_x//4)\n",
"y_slice = slice(num_y//4, 3*num_y//4)\n",
"# Index and save reduced grid\n",
"grid_xy_reduced = grid_xy[:, x_slice, y_slice]\n",
"np.save('data/meps_example_reduced/static/nwp_xy.npy', grid_xy_reduced)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"This cut out the border, so a new perimeter of 10 grid points was established as border (10 was also the border size in the original \"meps_example\").\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"# Outer 10 grid points are border\n",
"old_border_mask = np.load('data/meps_example/static/border_mask.npy')\n",
"assert np.all(old_border_mask[10:-10, 10:-10] == False)\n",
"assert np.all(old_border_mask[:10, :] == True)\n",
"assert np.all(old_border_mask[:, :10] == True)\n",
"assert np.all(old_border_mask[-10:,:] == True)\n",
"assert np.all(old_border_mask[:,-10:] == True)\n",
"\n",
"# Create new array with False everywhere but the outer 10 grid points\n",
"border_mask = np.zeros_like(grid_xy_reduced[0,:,:], dtype=bool)\n",
"border_mask[:10] = True\n",
"border_mask[:,:10] = True\n",
"border_mask[-10:] = True\n",
"border_mask[:,-10:] = True\n",
"np.save('data/meps_example_reduced/static/border_mask.npy', border_mask)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A few other files also needed to be copied using only the new reduced grid"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Load surface_geopotential.npy, index only values from the reduced grid, and save to new file\n",
"surface_geopotential = np.load('data/meps_example/static/surface_geopotential.npy')\n",
"surface_geopotential_reduced = surface_geopotential[x_slice, y_slice]\n",
"np.save('data/meps_example_reduced/static/surface_geopotential.npy', surface_geopotential_reduced)\n",
"\n",
"# Load pytorch file grid_features.pt\n",
"grid_features = torch.load('data/meps_example/static/grid_features.pt')\n",
"# Index only values from the reduced grid. \n",
"# First reshape from (num_grid_points_total, 4) to (num_grid_points_x, num_grid_points_y, 4), \n",
"# then index, then reshape back to new total number of grid points\n",
"print(grid_features.shape)\n",
"grid_features_new = grid_features.reshape(num_x, num_y, 4)[x_slice,y_slice,:].reshape((-1, 4))\n",
"# Save to new file\n",
"torch.save(grid_features_new, 'data/meps_example_reduced/static/grid_features.pt')\n",
"\n",
"# flux_stats.pt is just a vector of length 2, so the grid shape and variable changes does not change this file\n",
"torch.save(torch.load('data/meps_example/static/flux_stats.pt'), 'data/meps_example_reduced/static/flux_stats.pt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"The number of variables was reduced by truncating the variable list to the first 8."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_vars = 8\n",
"\n",
"# Load parameter_weights.npy, truncate to first 8 variables, and save to new file\n",
"parameter_weights = np.load('data/meps_example/static/parameter_weights.npy')\n",
"parameter_weights_reduced = parameter_weights[:num_vars]\n",
"np.save('data/meps_example_reduced/static/parameter_weights.npy', parameter_weights_reduced)\n",
"\n",
"# Do the same for following 4 pytorch files\n",
"for file in ['diff_mean', 'diff_std', 'parameter_mean', 'parameter_std']:\n",
" old_file = torch.load(f'data/meps_example/static/{file}.pt')\n",
" new_file = old_file[:num_vars]\n",
" torch.save(new_file, f'data/meps_example_reduced/static/{file}.pt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly the files in each of the directories train, test, and val have to be reduced. The folders all have the same structure with files of the following types:\n",
"```\n",
"nwp_YYYYMMDDHH_mbrXXX.npy\n",
"wtr_YYYYMMDDHH.npy\n",
"nwp_toa_downwelling_shortwave_flux_YYYYMMDDHH.npy\n",
"```\n",
"with ```YYYYMMDDHH``` being some date with hours, and ```XXX``` being some 3-digit integer.\n",
"\n",
"The first type of file has x and y in dimensions 1 and 2, and variable index in dimension 3. Dimension 0 is unchanged.\n",
"The second type has has x and y in dimensions 1 and 2. Dimension 0 is unchanged.\n",
"The last type has just x and y as the only 2 dimensions.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(65, 268, 238, 18)\n",
"(65, 268, 238)\n"
]
}
],
"source": [
"print(np.load('data/meps_example/samples/train/nwp_2022040100_mbr000.npy').shape)\n",
"print(np.load('data/meps_example/samples/train/nwp_toa_downwelling_shortwave_flux_2022040112.npy').shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following loop goes through each file in each sample folder and indexes them according to the dimensions given by the file name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for sample in ['train', 'test', 'val']:\n",
" files = os.listdir(f'data/meps_example/samples/{sample}')\n",
"\n",
" for f in files:\n",
" data = np.load(f'data/meps_example/samples/{sample}/{f}')\n",
" if 'mbr' in f:\n",
" data = data[:,x_slice,y_slice,:num_vars]\n",
" elif 'wtr' in f:\n",
" data = data[x_slice, y_slice]\n",
" else:\n",
" data = data[:,x_slice,y_slice]\n",
" np.save(f'data/meps_example_reduced/samples/{sample}/{f}', data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lastly, the file ```data_config.yaml``` is modified manually by truncating the variable units, long and short names, and setting the new grid shape. Also the unit descriptions containing ```^``` was automatically parsed using latex, and to avoid having to install latex in the GitHub CI/CD pipeline, this was changed to ```**```. \n",
"\n",
"This new config file was placed in ```data/meps_example_reduced```, and that directory was then zipped and placed in a European Weather Cloud S3 bucket."
]
}
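,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a hedged illustration only (the field names below are assumed from the repository's config schema rather than copied from the committed file; the halved grid shape follows from the slicing above), the edited parts of ```data_config.yaml``` could look like:\n",
"```yaml\n",
"dataset:\n",
"  name: meps_example_reduced\n",
"  var_units:\n",
"    - Pa\n",
"    - W/m**2  # `^` replaced by `**` so no LaTeX is needed in CI\n",
"grid_shape_state: [134, 119]\n",
"```\n"
]
}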
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.14"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
7 changes: 6 additions & 1 deletion neural_lam/utils.py
@@ -1,5 +1,6 @@
# Standard library
import os
import shutil

# Third-party
import numpy as np
@@ -250,7 +251,11 @@ def fractional_plot_bundle(fraction):
Get the tueplots bundle, but with figure width as a fraction of
the page width.
"""
-bundle = bundles.neurips2023(usetex=True, family="serif")
+# If latex is not available, some visualizations might not render correctly,
+# but will at least not raise an error.
+# Alternatively, use unicode raised numbers.
+usetex = True if shutil.which("latex") else False
+bundle = bundles.neurips2023(usetex=usetex, family="serif")
bundle.update(figsizes.neurips2023())
original_figsize = bundle["figure.figsize"]
bundle["figure.figsize"] = (
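
With this guard, CI runners without a `latex` binary fall back to non-TeX rendering while calling code stays unchanged. A minimal usage sketch (assuming matplotlib, which tueplots configures):

```python
# Minimal usage sketch: a tueplots bundle is a dict of matplotlib rcParams.
import matplotlib.pyplot as plt

from neural_lam import utils

plt.rcParams.update(utils.fractional_plot_bundle(1.0))
```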
2 changes: 2 additions & 0 deletions requirements.txt
@@ -13,3 +13,5 @@ plotly>=5.15.0

# for dev
pre-commit>=2.15.0
pytest>=8.1.1
pooch>=1.8.1
Empty file added tests/__init__.py