Feature: add tests for meps dataset (#38)
Implemented tests for loading a reduced-size MEPS example dataset, creating graphs, and training the model:

- reduce the number of variables, size of domain, etc. in Joel's MEPS data example so that the zip file is less than 500 MB, calling it `meps_example_reduced`
- create the test-data zip file and upload it to EWC (credentials from @leifdenby)
- implement a test that downloads and unpacks the test data using [pooch](https://pypi.org/project/pooch/) (a rough sketch of this approach follows below)
- implement testing of:
  - initiation of `neural_lam.weather_dataset.WeatherDataset` from the downloaded data
  - checking the shapes of the returned parts of a training item
  - creating a new graph in tests for the reduced dataset
  - feeding a single batch through the model and checking the shape of the output
- add a GitHub action to run the tests during CI/CD

Closes #30
1 parent 743c07a · commit 81d0840 · showing 11 changed files with 443 additions and 7 deletions.
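The download-and-test approach described in the commit message could be wired up roughly as follows. This is a minimal sketch, not the test code added by this commit: the EWC download URL, the skipped hash check, and the `WeatherDataset` constructor argument and item structure are all assumptions for illustration.

```python
# Minimal sketch of a pooch-based pytest fixture for fetching the reduced
# MEPS test dataset. URL, hash handling, and WeatherDataset usage below are
# hypothetical placeholders, not the repository's actual test code.
import pooch
import pytest

# Placeholder URL for the zipped dataset in the EWC S3 bucket (not the real one)
TEST_DATA_URL = "https://example-ewc-bucket.invalid/meps_example_reduced.zip"


@pytest.fixture(scope="session")
def meps_example_reduced_dir():
    # Download (or reuse a locally cached copy of) the zip and extract it
    # under ./data, matching the "data" path cached by the CI workflow below.
    pooch.retrieve(
        url=TEST_DATA_URL,
        known_hash=None,  # sketch only; a real test should pin a checksum
        fname="meps_example_reduced.zip",
        path="data",
        processor=pooch.Unzip(extract_dir=""),  # unpack next to the zip
    )
    return "data/meps_example_reduced"


def test_dataset_item_shapes(meps_example_reduced_dir):
    # The commit's tests instantiate neural_lam.weather_dataset.WeatherDataset
    # from the downloaded data and check the shapes of a training item; the
    # constructor argument and the loose check here are placeholders.
    from neural_lam.weather_dataset import WeatherDataset

    dataset = WeatherDataset("meps_example_reduced")  # assumed to take the dataset name
    item = dataset[0]
    for part in item:
        assert hasattr(part, "shape")  # real tests assert the exact expected shapes
```

Caching the `data` directory, as the CI workflow below does, lets subsequent runs reuse the downloaded archive instead of fetching it again.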
File: .github/workflows/pre-commit.yml
@@ -1,4 +1,4 @@
-name: lint
+name: Linting

 on:
   # trigger on pushes to any branch, but not main
File: .github/workflows/run_tests.yml (new workflow)
@@ -0,0 +1,45 @@
name: Unit Tests

on:
  # trigger on pushes to any branch, but not main
  push:
    branches-ignore:
      - main
  # and also on PRs to main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.9", "3.10", "3.11", "3.12"]

    steps:
      - uses: actions/checkout@v3
      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v4
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          if [ -f requirements.txt ]; then pip install -r requirements.txt; fi
          pip install torch-geometric>=2.5.2
      - name: Load cache data
        uses: actions/cache/restore@v4
        with:
          path: data
          key: ${{ runner.os }}-meps-reduced-example-data-v0.1.0
          restore-keys: |
            ${{ runner.os }}-meps-reduced-example-data-v0.1.0
      - name: Test with pytest
        run: |
          pytest -v -s
      - name: Save cache data
        uses: actions/cache/save@v4
        with:
          path: data
          key: ${{ runner.os }}-meps-reduced-example-data-v0.1.0
File: README.md
@@ -1,3 +1,6 @@
+![Linting](https://github.com/mllam/neural-lam/actions/workflows/pre-commit.yml/badge.svg)
+![Automatic tests](https://github.com/mllam/neural-lam/actions/workflows/run_tests.yml/badge.svg)
+
 <p align="middle">
 <img src="figures/neural_lam_header.png" width="700">
 </p>
@@ -279,6 +282,8 @@ pre-commit run --all-files
 ```
 from the root directory of the repository.

+Furthermore, all tests in the ```tests``` directory will be run upon pushing changes by a github action. Failure in any of the tests will also reject the push/PR.
+
 # Contact
 If you are interested in machine learning models for LAM, have questions about our implementation or ideas for extending it, feel free to get in touch.
 You can open a github issue on this page, or (if more suitable) send an email to [[email protected]](mailto:[email protected]).
File: notebook documenting how `meps_example_reduced` was created (new file)
@@ -0,0 +1,239 @@
# Creating meps_example_reduced

This notebook outlines how the small-size test dataset ```meps_example_reduced``` was created based on the slightly larger dataset ```meps_example```. The zipped-up datasets are 263 MB and 2.6 GB, respectively. See [README.md](../../README.md) for info on how to download ```meps_example```.

The dataset was reduced in size by reducing the number of grid points and variables.

    # Standard library
    import os

    # Third-party
    import numpy as np
    import torch

The number of grid points was reduced to 1/4 by halving the number of coordinates in both the x and y direction. This was done by removing a quarter of the grid points along each outer edge, so the center grid points would stay centered in the new set.

    # Load existing grid
    grid_xy = np.load('data/meps_example/static/nwp_xy.npy')
    # Get slices in each dimension by cutting off a quarter along each edge
    num_x, num_y = grid_xy.shape[1:]
    x_slice = slice(num_x//4, 3*num_x//4)
    y_slice = slice(num_y//4, 3*num_y//4)
    # Index and save reduced grid
    grid_xy_reduced = grid_xy[:, x_slice, y_slice]
    np.save('data/meps_example_reduced/static/nwp_xy.npy', grid_xy_reduced)

This cut out the border, so a new perimeter of 10 grid points was established as the border (10 was also the border size in the original "meps_example").

    # Outer 10 grid points are border
    old_border_mask = np.load('data/meps_example/static/border_mask.npy')
    assert np.all(old_border_mask[10:-10, 10:-10] == False)
    assert np.all(old_border_mask[:10, :] == True)
    assert np.all(old_border_mask[:, :10] == True)
    assert np.all(old_border_mask[-10:, :] == True)
    assert np.all(old_border_mask[:, -10:] == True)

    # Create new array with False everywhere but the outer 10 grid points
    border_mask = np.zeros_like(grid_xy_reduced[0, :, :], dtype=bool)
    border_mask[:10] = True
    border_mask[:, :10] = True
    border_mask[-10:] = True
    border_mask[:, -10:] = True
    np.save('data/meps_example_reduced/static/border_mask.npy', border_mask)

A few other files also needed to be copied using only the new reduced grid.

    # Load surface_geopotential.npy, index only values from the reduced grid, and save to new file
    surface_geopotential = np.load('data/meps_example/static/surface_geopotential.npy')
    surface_geopotential_reduced = surface_geopotential[x_slice, y_slice]
    np.save('data/meps_example_reduced/static/surface_geopotential.npy', surface_geopotential_reduced)

    # Load pytorch file grid_features.pt
    grid_features = torch.load('data/meps_example/static/grid_features.pt')
    # Index only values from the reduced grid.
    # First reshape from (num_grid_points_total, 4) to (num_grid_points_x, num_grid_points_y, 4),
    # then index, then reshape back to the new total number of grid points
    print(grid_features.shape)
    grid_features_new = grid_features.reshape(num_x, num_y, 4)[x_slice, y_slice, :].reshape((-1, 4))
    # Save to new file
    torch.save(grid_features_new, 'data/meps_example_reduced/static/grid_features.pt')

    # flux_stats.pt is just a vector of length 2, so the grid shape and variable changes do not affect this file
    torch.save(torch.load('data/meps_example/static/flux_stats.pt'), 'data/meps_example_reduced/static/flux_stats.pt')

The number of variables was reduced by truncating the variable list to the first 8.

    num_vars = 8

    # Load parameter_weights.npy, truncate to first 8 variables, and save to new file
    parameter_weights = np.load('data/meps_example/static/parameter_weights.npy')
    parameter_weights_reduced = parameter_weights[:num_vars]
    np.save('data/meps_example_reduced/static/parameter_weights.npy', parameter_weights_reduced)

    # Do the same for the following 4 pytorch files
    for file in ['diff_mean', 'diff_std', 'parameter_mean', 'parameter_std']:
        old_file = torch.load(f'data/meps_example/static/{file}.pt')
        new_file = old_file[:num_vars]
        torch.save(new_file, f'data/meps_example_reduced/static/{file}.pt')

Lastly, the files in each of the directories train, test, and val have to be reduced. The folders all have the same structure, with files of the following types:
```
nwp_YYYYMMDDHH_mbrXXX.npy
wtr_YYYYMMDDHH.npy
nwp_toa_downwelling_shortwave_flux_YYYYMMDDHH.npy
```
with ```YYYYMMDDHH``` being some date with hours, and ```XXX``` being some 3-digit integer.

The first type of file has x and y in dimensions 1 and 2, and the variable index in dimension 3; dimension 0 is unchanged. The second type has just x and y as its only two dimensions. The last type has x and y in dimensions 1 and 2; dimension 0 is unchanged.

    print(np.load('data/meps_example/samples/train/nwp_2022040100_mbr000.npy').shape)
    print(np.load('data/meps_example/samples/train/nwp_toa_downwelling_shortwave_flux_2022040112.npy').shape)

Output:

    (65, 268, 238, 18)
    (65, 268, 238)

The following loop goes through each file in each sample folder and indexes it according to the dimensions given by the file name.

    for sample in ['train', 'test', 'val']:
        files = os.listdir(f'data/meps_example/samples/{sample}')

        for f in files:
            data = np.load(f'data/meps_example/samples/{sample}/{f}')
            if 'mbr' in f:
                data = data[:, x_slice, y_slice, :num_vars]
            elif 'wtr' in f:
                data = data[x_slice, y_slice]
            else:
                data = data[:, x_slice, y_slice]
            np.save(f'data/meps_example_reduced/samples/{sample}/{f}', data)

Lastly, the file ```data_config.yaml``` is modified manually by truncating the variable units, long and short names, and setting the new grid shape. Also, the unit descriptions containing ```^``` were automatically parsed using LaTeX, and to avoid having to install LaTeX in the GitHub CI/CD pipeline, this was changed to ```**```.

This new config file was placed in ```data/meps_example_reduced```, and that directory was then zipped and placed in a European Weather Cloud S3 bucket.
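The ```^``` to ```**``` substitution described above could, for instance, be scripted rather than done by hand. The following is a hypothetical sketch: the layout of ```data_config.yaml``` is not shown in this commit, so the script simply walks the whole document and patches every string value, and the source path is assumed.

```python
# Hypothetical sketch: replace "^" with "**" in every string of a YAML config,
# so unit descriptions no longer need LaTeX to render. The data_config.yaml
# schema and the source path used here are assumptions for illustration.
import yaml


def patch_units(node):
    # Recursively walk dicts/lists and patch every string value
    if isinstance(node, str):
        return node.replace("^", "**")
    if isinstance(node, list):
        return [patch_units(item) for item in node]
    if isinstance(node, dict):
        return {key: patch_units(value) for key, value in node.items()}
    return node


with open("data/meps_example/data_config.yaml") as f:  # assumed source path
    config = yaml.safe_load(f)

with open("data/meps_example_reduced/data_config.yaml", "w") as f:
    yaml.safe_dump(patch_units(config), f, sort_keys=False)
```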
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.14" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
File: requirements.txt
@@ -13,3 +13,5 @@ plotly>=5.15.0

 # for dev
 pre-commit>=2.15.0
+pytest>=8.1.1
+pooch>=1.8.1
Empty file (one of the changed files contains no content).