diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md deleted file mode 100644 index ecc26730..00000000 --- a/ARCHITECTURE.md +++ /dev/null @@ -1,75 +0,0 @@ -# Architecture - -This document describes the high level architecture of the nwp-consumer project. - -## Birds-eye view - -```mermaid -flowchart - subgraph "Hexagonal Architecture" - - subgraph "NWP Consumer" - subgraph "Ports" - portFI(FetcherInterface) --- core - core --- portSI(StorageInterface) - - subgraph "Core" - core{{Domain Logic}} - end - end - end - - subgraph "Driving Adaptors" - i1{ICON} --implements--> portFI - i2{ECMWF} --implements--> portFI - i3{MetOffice} --implements--> portFI - end - - subgraph "Driven Adaptors" - portSI --- o1{S3} - portSI --- o2{Huggingface} - portSI --- o3{LocalFS} - end - - end -``` - -At the top level, the consumer downloads raw NWP data, processes it to zarr, and saves it to a storage backend. - -It is built following the hexagonal architecture pattern. -This pattern is used to separate the core business logic from the driving and driven adaptors. -The core business logic is the `service` module, which contains the domain logic. -This logic is agnostic to the driving and driven actors, -instead relying on abstract classes as the ports to interact with them. - - -## Entry Points - -`src/nwp_consumer/cmd/main.py` contains the main function which runs the consumer. - -`src/nwp_consumer/internal/service/consumer.py` contains the `NWPConsumer` class, -the methods of which are the business use cases of the consumer. - -`StorageInterface` and `FetcherInterface` classes define the ports used by driving and driven actors. - -`src/nwp_consumer/internal/inputs` contains the adaptors for the driving actors. - -`src/nwp_consumer/internal/outputs` contains the adaptors for the driven actors. - -## Core - -The core business logic is contained in the `service` module. -According to the hexagonal pattern, the core logic is agnostic to the driving and driven actors. -As such, there is an internal data representation of the NWP data that the core logic acts upon. -Due to the multidimensional data of the NWP data, it is hard to define a schema for this. - -Internal data is stored an xarray dataset. -This dataset effectively acts as an array of `DataArrays` for each parameter or variable. -It should have the following dimensions and coordinates: - -- `time` dimension -- `step` dimension -- `latitude` or `x` dimension -- `longitude` or `y` dimension - -Parameters should be stored as DataArrays in the dataset. \ No newline at end of file diff --git a/README.md b/README.md index abfacfe2..4c15d748 100644 --- a/README.md +++ b/README.md @@ -42,26 +42,64 @@ $ docker pull ghcr.io/openclimatefix/nwp-consumer ## Example usage -**To create an archive of GFS data:** +**To download the latest available day of GFS data:*** -TODO +```bash +$ nwp-consumer consume +``` -## Documentation +**To create an archive of a month of GFS data:** + +> [!Note] +> This will download several gigabytes of data to your home partition. +> Make sure you have plenty of free space (and time!) -TODO: link to built documentation +```bash +$ nwp-consumer archive --year 2024 --month 1 +``` -Documentation is generated via [pydoctor](https://pydoctor.readthedocs.io/en/latest/). +## Documentation + +Documentation is generated via [pdoc](https://pdoc.dev/docs/pdoc.html). 
To build the documentation, run the following command in the repository root: ```bash -$ python -m pydoctor +$ PDOC_ALLOW_EXEC=1 python -m pdoc -o docs --docformat=google src/nwp_consumer ``` +> [!Note] +> The `PDOC_ALLOW_EXEC=1` environment variable is required due to a facet +> of the `ocf_blosc2` library, which imports itself automatically and hence +> necessitates execution to be enabled. + ## FAQ ### How do I authenticate with model repositories that require accounts? +Authentication, and model repository selection, is handled via environment variables. +Choose a repository via the `MODEL_REPOSITORY` environment variable. Required environment +variables can be found in the repository's metadata function. Missing variables will be +warned about at runtime. + +### How do I use an S3 bucket for created stores? + +The `ZARRDIR` environment variable can be set to an S3 url +(ex: `s3://some-bucket-name/some-prefix`). Valid credentials for accessing the bucket +must be discoverable in the environment as per +[Botocore's documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) + +### How do I change what variables are pulled? + +With difficulty! This package pulls data specifically tailored to Open Climate Fix's needs, +and as such, the data it pulls (and the schema that data is surfaced with) +is a fixed part of the package. A large part of the value proposition of this consumer is +that the data it produces is consistent and comparable between different sources, so pull +requests to the effect of adding or changing this for a specific model are unlikely to be +approved. +However, desired changes can be made via cloning the repo and making the relevant +parameter modifications to the model's expected coordinates in it's metadata for the desired model +repository. ## Development @@ -77,7 +115,8 @@ $ python -m ruff check . ``` Be sure to do this periodically while developing to catch any errors early -and prevent headaches with the CI pipeline. +and prevent headaches with the CI pipeline. It may seem like a hassle at first, +but it prevents accidental creation of a whole suite of bugs. ### Running the test suite diff --git a/src/nwp_consumer/cmd/main.py b/src/nwp_consumer/cmd/main.py index eddb3dcf..92aab700 100644 --- a/src/nwp_consumer/cmd/main.py +++ b/src/nwp_consumer/cmd/main.py @@ -31,8 +31,11 @@ def parse_env() -> Adaptors: case "metoffice-datahub": model_repository_adaptor = \ repositories.model_repositories.MetOfficeDatahubModelRepository - case _ as model: - log.error(f"Unknown model: {model}") + case _ as mr: + log.error( + f"Unknown model repository '{mr}'. 
Expected one of " + f"['gfs', 'ceda', 'ecmwf-realtime', 'metoffice-datahub']" + ) sys.exit(1) notification_repository_adaptor: type[ports.NotificationRepository] diff --git a/src/nwp_consumer/cmd/test_main.py b/src/nwp_consumer/cmd/test_main.py deleted file mode 100644 index 73ee7683..00000000 --- a/src/nwp_consumer/cmd/test_main.py +++ /dev/null @@ -1,56 +0,0 @@ -import datetime as dt -import os -import unittest -from unittest import mock - -from nwp_consumer.internal import FetcherInterface - -from .main import _parse_from_to - - - -class TestParseFromTo(unittest.TestCase): - def test_today(self) -> None: - # Test that today is processed correctly - start, end = _parse_from_to("today", None) - self.assertEqual( - start, - dt.datetime.now(tz=dt.UTC).replace(hour=0, minute=0, second=0, microsecond=0), - ) - self.assertEqual( - end, - dt.datetime.now(tz=dt.UTC).replace(hour=0, minute=0, second=0, microsecond=0) - + dt.timedelta(days=1), - ) - - def test_from_date(self) -> None: - # Test that a date is processed correctly - start, end = _parse_from_to("2021-01-01", None) - self.assertEqual(start, dt.datetime(2021, 1, 1, tzinfo=dt.UTC)) - self.assertEqual(end, dt.datetime(2021, 1, 2, tzinfo=dt.UTC)) - - def test_from_datetime(self) -> None: - # Test that a datetime is processed correctly - start, end = _parse_from_to("2021-01-01T12:00", None) - self.assertEqual(start, dt.datetime(2021, 1, 1, 12, 0, tzinfo=dt.UTC)) - self.assertEqual(end, dt.datetime(2021, 1, 1, 12, 0, tzinfo=dt.UTC)) - - def test_from_datetime_to_date(self) -> None: - # Test that a datetime is processed correctly - start, end = _parse_from_to("2021-01-01T12:00", "2021-01-02") - self.assertEqual(start, dt.datetime(2021, 1, 1, 12, 0, tzinfo=dt.UTC)) - self.assertEqual(end, dt.datetime(2021, 1, 2, 0, tzinfo=dt.UTC)) - - def test_from_datetime_to_datetime(self) -> None: - # Test that a datetime is processed correctly - start, end = _parse_from_to("2021-01-01T12:00", "2021-01-02T12:00") - self.assertEqual(start, dt.datetime(2021, 1, 1, 12, 0, tzinfo=dt.UTC)) - self.assertEqual(end, dt.datetime(2021, 1, 2, 12, 0, tzinfo=dt.UTC)) - - def test_invalid_datetime(self) -> None: - # Test that an invalid datetime is processed correctly - with self.assertRaises(ValueError): - _parse_from_to("2021-01-01T12:00:00", None) - - with self.assertRaises(ValueError): - _parse_from_to("2021010100", None) diff --git a/src/nwp_consumer/internal/cache.py b/src/nwp_consumer/internal/cache.py deleted file mode 100644 index 4bdfd34b..00000000 --- a/src/nwp_consumer/internal/cache.py +++ /dev/null @@ -1,91 +0,0 @@ -"""Defines the cache for the application. - -Many sources of data do not give any option for accessing their files -via e.g. a BytesIO object. Were this the case, we could use a generic -local filesystem adaptor to handle all incoming data. Since it isn't, -and instead often a pre-existing file object is required to push data -into, a cache is required to store the data temporarily. - -The cache is a simple directory structure that stores files in a -hierarchical format; with the top level directory being the source of -the data, followed by a subdirectory for the type of data (raw or -zarr), then further subdirectories according to the init time -associated with the file. - -Driven actors are then responsible for mapping the cached data to the -desired storage location. 
- -Example: -|--- /tmp/nwpc -| |--- source1 -| | |--- raw -| | | |--- 2021 -| | | |--- 01 -| | | |--- 01 -| | | |--- 0000 -| | | |--- parameter1.grib -| | | |--- parameter2.grib -| | | |--- 1200 -| | | |--- parameter1.grib -| | | |--- parameter2.grib -| | |--- zarr -| | |--- 2021 -| | |--- 01 -| | |--- 01 -| | |--- 20210101T0000.zarr.zip -| | |--- 20210101T1200.zarr.zip -""" - -import datetime as dt -import pathlib - -# --- Constants --- # - -# Define the location of the consumer's cache directory -CACHE_DIR = pathlib.Path("/tmp/nwpc") # noqa: S108 -CACHE_DIR_RAW = CACHE_DIR / "raw" -CACHE_DIR_ZARR = CACHE_DIR / "zarr" - -# Define the datetime format strings for creating a folder -# structure from a datetime object for raw and zarr files -IT_FOLDER_STRUCTURE_RAW = "%Y/%m/%d/%H%M" -IT_FOLDER_GLOBSTR_RAW = "*/*/*/*" -IT_FOLDER_STRUCTURE_ZARR = "%Y/%m/%d" -IT_FOLDER_GLOBSTR_ZARR = "*/*/*" - -# Define the datetime format string for a zarr filename -IT_FILENAME_ZARR = "%Y%m%dT%H%M.zarr" -IT_FULLPATH_ZARR = f"{IT_FOLDER_STRUCTURE_ZARR}/{IT_FILENAME_ZARR}" - -# --- Functions --- # - - -def rawCachePath(it: dt.datetime, filename: str) -> pathlib.Path: - """Create a filepath to cache a raw file. - - Args: - it: The initialisation time of the file to cache. - filename: The name of the file (including extension). - - Returns: - The path to the cached file. - """ - # Build the directory structure according to the file's datetime - parent: pathlib.Path = CACHE_DIR_RAW / it.strftime(IT_FOLDER_STRUCTURE_RAW) - parent.mkdir(parents=True, exist_ok=True) - return parent / filename - - -def zarrCachePath(it: dt.datetime) -> pathlib.Path: - """Create a filepath to cache a zarr file. - - Args: - it: The initialisation time of the file to cache. - - Returns: - The path to the cache file. - """ - # Build the directory structure according to the file's datetime - parent: pathlib.Path = CACHE_DIR_ZARR / it.strftime(IT_FOLDER_STRUCTURE_ZARR) - parent.mkdir(parents=True, exist_ok=True) - return parent / it.strftime(IT_FILENAME_ZARR) diff --git a/src/nwp_consumer/internal/config/__init__.py b/src/nwp_consumer/internal/config/__init__.py deleted file mode 100644 index 84b9e414..00000000 --- a/src/nwp_consumer/internal/config/__init__.py +++ /dev/null @@ -1,31 +0,0 @@ -"""Configuration for the service.""" - -__all__ = [ - "EnvParser", - "CEDAEnv", - "ConsumerEnv", - "CMCEnv", - "ECMWFMARSEnv", - "ECMWFS3Env", - "ICONEnv", - "GFSEnv", - "HuggingFaceEnv", - "MetOfficeEnv", - "S3Env", - "LocalEnv", -] - -from .env import ( - CEDAEnv, - CMCEnv, - ConsumerEnv, - ECMWFMARSEnv, - ECMWFS3Env, - EnvParser, - GFSEnv, - HuggingFaceEnv, - ICONEnv, - LocalEnv, - MetOfficeEnv, - S3Env, -) diff --git a/src/nwp_consumer/internal/config/env.py b/src/nwp_consumer/internal/config/env.py deleted file mode 100644 index 2a7e80ad..00000000 --- a/src/nwp_consumer/internal/config/env.py +++ /dev/null @@ -1,248 +0,0 @@ -"""Config struct for application running.""" -import os -from distutils.util import strtobool -from typing import get_type_hints - -import structlog - -from nwp_consumer import internal -from nwp_consumer.internal import inputs, outputs - -log = structlog.getLogger() - - -class EnvParser: - """Mixin to parse environment variables into class fields. - - Whilst this could be done with Pydantic, it's nice to avoid the - extra dependency if possible, and pydantic would be overkill for - this small use case. - """ - - def __init__(self) -> None: - """Parse environment variables into class fields. 
- - If the class field is upper case, parse it into the indicated - type from the environment. Required fields are those set in - the child class without a default value. - - Examples: - >>> MyEnv(EnvParser): - >>> REQUIRED_ENV_VAR: str - >>> OPTIONAL_ENV_VAR: str = "default value" - >>> ignored_var: str = "ignored" - """ - for field, t in get_type_hints(self).items(): - # Skip item if not upper case - if not field.isupper(): - continue - - # Log Error if required field not supplied - default_value = getattr(self, field, None) - match (default_value, os.environ.get(field)): - case (None, None): - # No default value, and field not in env - raise OSError(f"Required field {field} not supplied") - case (_, None): - # A default value is set and field not in env - pass - case (_, _): - # Field is in env - env_value: str | bool = os.environ[field] - # Handle bools seperately as bool("False") == True - if t == bool: - env_value = bool(strtobool(os.environ[field])) - # Cast to desired type - self.__setattr__(field, t(env_value)) - - @classmethod - def print_env(cls) -> None: - """Print the required environment variables.""" - message: str = f"Environment variables for {cls.__class__.__name__}:\n" - for field, _ in get_type_hints(cls).items(): - if not field.isupper(): - continue - default_value = getattr(cls, field, None) - message += f"\t{field}{'(default: ' + default_value + ')' if default_value else ''}\n" - log.info(message) - - def configure_fetcher(self) -> internal.FetcherInterface: - """Configure the associated fetcher.""" - raise NotImplementedError( - "Fetcher not implemented for this environment. Check the available inputs.", - ) - - def configure_storer(self) -> internal.StorageInterface: - """Configure the associated storer.""" - raise NotImplementedError( - "Storer not implemented for this environment. 
Check the available outputs.", - ) - - -# --- Configuration environment variables --- # - - -class ConsumerEnv(EnvParser): - """Config for Consumer.""" - - DASK_SCHEDULER_ADDRESS: str = "" - - -# --- Inputs environment variables --- # - - -class CEDAEnv(EnvParser): - """Config for CEDA FTP server.""" - - CEDA_FTP_USER: str - CEDA_FTP_PASS: str - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.ceda.Client(ftpUsername=self.CEDA_FTP_USER, ftpPassword=self.CEDA_FTP_PASS) - - -class MetOfficeEnv(EnvParser): - """Config for Met Office API.""" - - METOFFICE_ORDER_ID: str - METOFFICE_API_KEY: str - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.metoffice.Client( - apiKey=self.METOFFICE_API_KEY, - orderID=self.METOFFICE_ORDER_ID, - ) - - -class ECMWFMARSEnv(EnvParser): - """Config for ECMWF MARS API.""" - - ECMWF_API_KEY: str - ECMWF_API_URL: str - ECMWF_API_EMAIL: str - ECMWF_AREA: str = "uk" - ECMWF_HOURS: int = 48 - ECMWF_PARAMETER_GROUP: str = "default" - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.ecmwf.MARSClient( - area=self.ECMWF_AREA, - hours=self.ECMWF_HOURS, - param_group=self.ECMWF_PARAMETER_GROUP, - ) - - -class ECMWFS3Env(EnvParser): - """Config for ECMWF S3.""" - - ECMWF_AWS_S3_BUCKET: str - ECMWF_AWS_ACCESS_KEY: str = "" - ECMWF_AWS_ACCESS_SECRET: str = "" - ECMWF_AWS_REGION: str - ECMWF_AREA: str = "uk" - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.ecmwf.S3Client( - bucket=self.ECMWF_AWS_S3_BUCKET, - area=self.ECMWF_AREA, - region=self.ECMWF_AWS_REGION, - key=self.ECMWF_AWS_ACCESS_KEY, - secret=self.ECMWF_AWS_ACCESS_SECRET, - ) - - -class ICONEnv(EnvParser): - """Config for ICON API.""" - - ICON_MODEL: str = "europe" - ICON_HOURS: int = 48 - ICON_PARAMETER_GROUP: str = "default" - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.icon.Client( - model=self.ICON_MODEL, - hours=self.ICON_HOURS, - param_group=self.ICON_PARAMETER_GROUP, - ) - - -class CMCEnv(EnvParser): - """Config for CMC API.""" - - CMC_MODEL: str = "gdps" - CMC_HOURS: int = 240 - CMC_PARAMETER_GROUP: str = "full" - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.cmc.Client( - model=self.CMC_MODEL, - hours=self.CMC_HOURS, - param_group=self.CMC_PARAMETER_GROUP, - ) - -class GFSEnv(EnvParser): - """Config for GFS API.""" - - GFS_MODEL: str = "global" - GFS_HOURS: int = 48 - GFS_PARAMETER_GROUP: str = "default" - - def configure_fetcher(self) -> internal.FetcherInterface: - """Overrides the corresponding method in the parent class.""" - return inputs.noaa.AWSClient( - model=self.GFS_MODEL, - param_group=self.GFS_PARAMETER_GROUP, - hours=self.GFS_HOURS, - ) - - -# --- Outputs environment variables --- # - - -class LocalEnv(EnvParser): - """Config for local storage.""" - - # Required for EnvParser to believe it's a valid class - dummy_field: str = "" - - def configure_storer(self) -> internal.StorageInterface: - """Overrides the corresponding method in the parent class.""" - return outputs.localfs.Client() - - -class S3Env(EnvParser): - """Config for S3.""" - - 
AWS_S3_BUCKET: str - AWS_ACCESS_KEY: str = "" - AWS_ACCESS_SECRET: str = "" - AWS_REGION: str - - def configure_storer(self) -> internal.StorageInterface: - """Overrides the corresponding method in the parent class.""" - return outputs.s3.Client( - bucket=self.AWS_S3_BUCKET, - region=self.AWS_REGION, - key=self.AWS_ACCESS_KEY, - secret=self.AWS_ACCESS_SECRET, - ) - - -class HuggingFaceEnv(EnvParser): - """Config for HuggingFace API.""" - - HUGGINGFACE_TOKEN: str - HUGGINGFACE_REPO_ID: str - - def configure_storer(self) -> internal.StorageInterface: - """Overrides the corresponding method in the parent class.""" - return outputs.huggingface.Client( - token=self.HUGGINGFACE_TOKEN, - repoID=self.HUGGINGFACE_REPO_ID, - ) diff --git a/src/nwp_consumer/internal/config/test_env.py b/src/nwp_consumer/internal/config/test_env.py deleted file mode 100644 index fc720140..00000000 --- a/src/nwp_consumer/internal/config/test_env.py +++ /dev/null @@ -1,63 +0,0 @@ -"""Tests for the config module.""" - -import unittest.mock - -from .env import EnvParser, ICONEnv - - -class TestConfig(EnvParser): - """Test config class.""" - - REQUIRED_STR: str - REQUIRED_BOOL: bool - REQUIRED_INT: int - OPTIONAL_STR: str = "default" - OPTIONAL_BOOL: bool = True - OPTIONAL_INT: int = 4 - - -class Test_EnvParser(unittest.TestCase): - """Tests for the _EnvParseMixin class.""" - - @unittest.mock.patch.dict( - "os.environ", - { - "REQUIRED_STR": "required_str", - "REQUIRED_BOOL": "false", - "REQUIRED_INT": "5", - }, - ) - def test_parsesEnvVars(self) -> None: - tc = TestConfig() - - self.assertEqual("required_str", tc.REQUIRED_STR) - self.assertFalse(tc.REQUIRED_BOOL) - self.assertEqual(5, tc.REQUIRED_INT) - self.assertEqual("default", tc.OPTIONAL_STR) - self.assertTrue(tc.OPTIONAL_BOOL) - self.assertEqual(4, tc.OPTIONAL_INT) - - @unittest.mock.patch.dict( - "os.environ", - { - "REQUIRED_STR": "required_str", - "REQUIRED_BOOL": "not a bool", - "REQUIRED_INT": "5.7", - }, - ) - def test_errorsIfCantCastType(self) -> None: - with self.assertRaises(ValueError): - TestConfig() - - def test_errorsIfRequiredFieldNotSet(self) -> None: - with self.assertRaises(OSError): - TestConfig() - - @unittest.mock.patch.dict( - "os.environ", {"ICON_HOURS": "3", "ICON_PARAMETER_GROUP": "basic"} - ) - def test_parsesIconConfig(self) -> None: - tc = ICONEnv() - - self.assertEqual(3, tc.ICON_HOURS) - self.assertEqual("basic", tc.ICON_PARAMETER_GROUP) diff --git a/src/nwp_consumer/internal/inputs/__init__.py b/src/nwp_consumer/internal/inputs/__init__.py deleted file mode 100644 index b8d4905f..00000000 --- a/src/nwp_consumer/internal/inputs/__init__.py +++ /dev/null @@ -1,22 +0,0 @@ -"""Available inputs to source data from.""" - -__all__ = [ - "ceda", - "metoffice", - "ecmwf", - "icon", - "cmc", - "meteofrance", - "noaa", -] - -from . 
import ( - ceda, - cmc, - ecmwf, - icon, - meteofrance, - metoffice, - noaa, -) - diff --git a/src/nwp_consumer/internal/inputs/ceda/README.md b/src/nwp_consumer/internal/inputs/ceda/README.md deleted file mode 100644 index 04f28d3c..00000000 --- a/src/nwp_consumer/internal/inputs/ceda/README.md +++ /dev/null @@ -1,273 +0,0 @@ -# CEDA - ---- - -## Data - -See -- https://artefacts.ceda.ac.uk/formats/grib/ -- https://dap.ceda.ac.uk/badc/ukmo-nwp/doc/NWP_UKV_Information.pdf - -Investigate files via eccodes: - -```shell -$ conda install -c conda-forge eccodes -``` - -More info on eccodes: https://confluence.ecmwf.int/display/ECC/grib_ls - -For example: - -```shell -$ grib_ls -n parameter -w stepRange=1 201901010000_u1096_ng_umqv_Wholesale1.grib -``` - -## Files - -Sourced from https://zenodo.org/record/7357056. There are two files per -`init_time` (model run time) that contain surface-level parameters of interest. - -The contents of those files differs somewhat from what is presented in the above -document - -#### Un-split File 1 `yyyymmddhhmm_u1096_ng_umqv_Wholesale1.grib` - -Full domain, 35 time steps and the following surface level parameters. - -| paramId | shortName | units | name | -|---------|-----------|----------------|-------------------------| -| 130 | t | K | Temperature | -| 3017 | dpt | K | Dew point temperature | -| 3020 | vis | m | Visibility | -| 157 | r | % | Relative humidity | -| 260074 | prmsl | Pa | Pressure reduced to MSL | -| 207 | 10si | m s**-1 | 10 metre wind speed | -| 260260 | 10wdir | Degree true | 10 metre wind direction | -| 3059 | prate | kg m**-2 s**-1 | Precipitation rate | -| | unknown | unknown | unknown | - -View via pasting the output of the following to this -[online table converter](https://tableconvert.com/json-to-markdown): - -```shell -$ grib_ls -n parameter -w stepRange=0 -j 201901010000_u1096_ng_umqv_Wholesale1.grib -``` - -When loading this file in using *cfgrib*, it loads in 5 distinct xarray datasets. - -
- Wholesale1 Datasets - - --- Dataset 1 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 ... 1 days 12:00:00 - heightAboveGround float64 1.0 - valid_time (step) datetime64[ns] 2019-01-01 ... 2019-01-02T12:00:00 - Dimensions without coordinates: values - Data variables: - t (step, values) float32 ... (1.5m temperature) - r (step, values) float32 ... (1.5m relative humidity) - dpt (step, values) float32 ... (1.5m dew point) - vis (step, values) float32 ... (1.5m visibility) - - --- Dataset 2 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 ... 1 days 12:00:00 - heightAboveGround float64 10.0 - valid_time (step) datetime64[ns] 2019-01-01 ... 2019-01-02T12:00:00 - Dimensions without coordinates: values - Data variables: - si10 (step, values) float32 ... (10m wind speed) - wdir10 (step, values) float32 ... (10m wind direction) - - --- Dataset 3 --- - Dataset 3 - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 1 days 12:00:00 - meanSea float64 0.0 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - prmsl (step, values) float32 ... (mean sea level pressure) - - --- Dataset 4 --- - Dimensions: (step: 36, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 01:00:00 02:00:00 ... 1 days 12:00:00 - surface float64 0.0 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - unknown (step, values) float32 ... (?) - - --- Dataset 5 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 1 days 12:00:00 - surface float64 0.0 - valid_time (step) datetime64[ns] 2019-01-01 ... 2019-01-02T12:00:00 - Dimensions without coordinates: values - Data variables: - unknown (step, values) float32 ... (?) - prate (step, values) float32 ... (total precipitation rate) - -
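The listing above can be reproduced in a few lines. A minimal sketch, assuming eccodes and cfgrib are installed and the example Wholesale1 file named in this README is present locally; it mirrors the `cfgrib.open_datasets` call used in `ceda/client.py`:

```python
import cfgrib

# open_datasets() returns one xarray Dataset per hypercube in the
# multi-parameter GRIB file -- five for a Wholesale1 file, as listed above.
datasets = cfgrib.open_datasets(
    "201901010000_u1096_ng_umqv_Wholesale1.grib",
    backend_kwargs={"indexpath": ""},  # skip writing a .idx sidecar file
)
for i, ds in enumerate(datasets):
    print(i, list(ds.data_vars))
```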
- -#### Un-split File 2 `yyyymmddhhmm_u1096_ng_umqv_Wholesale2.grib` - -Full domain, 35 time steps and the following surface level parameters: - -| centre | paramId | shortName | units | name | -|--------|---------|-----------|---------|------------------------------------| -| egrr | | unknown | unknown | unknown | -| egrr | 3073 | lcc | % | Low cloud cover | -| egrr | 3074 | mcc | % | Medium cloud cover | -| egrr | 3075 | hcc | % | High cloud cover | -| egrr | | unknown | unknown | unknown | -| egrr | 228046 | hcct | m | Height of convective cloud top | -| egrr | 3073 | lcc | % | Low cloud cover | -| egrr | 260107 | cdcb | m | Cloud base | -| egrr | 3066 | sde | m | Snow depth | -| egrr | 260087 | dswrf | W m**-2 | Downward short-wave radiation flux | -| egrr | 260097 | dlwrf | W m**-2 | Downward long-wave radiation flux | -| egrr | | unknown | unknown | unknown | -| egrr | 3008 | h | m | Geometrical height | - -View via pasting the ouput of the following to this -[online table converter](https://tableconvert.com/json-to-markdown): - -```shell -$ grib_ls -n parameter -w stepRange=0 -j 201901010000_u1096_ng_umqv_Wholesale2.grib -``` - -When loading this file to xarray using *cfgrib*, it comes in 6 distinct -datasets. These datasets only contain 11 of the 13 parameters specified -above, with two of the 11 being unknown variables. - -
- Wholesal21 Datasets - - --- Dataset 1 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 1 days 12:00:00 - atmosphere float64 0.0 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - unknown (step, values) float32 ... (?) - - --- Dataset 2 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 1 days 12:00:00 - cloudBase float64 0.0 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - cdcb (step, values) float32 ... (convective cloud base height) - - --- Dataset 3 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 ... 1 days 12:00:00 - heightAboveGroundLayer float64 0.0 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - lcc (step, values) float32 ... (low cloud amount) - - --- Dataset 4 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 ... 1 days 12:00:00 - heightAboveGroundLayer float64 1.524e+03 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - mcc (step, values) float32 ... (medium cloud amount) - - --- Dataset 5 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 ... 1 days 12:00:00 - heightAboveGroundLayer float64 4.572e+03 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - hcc (step, values) float32 ... (high cloud amount) - - --- Dataset 6 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 1 days 12:00:00 - surface float64 0.0 - valid_time (step) datetime64[ns] 2019-01-01 ... 2019-01-02T12:00:00 - Dimensions without coordinates: values - Data variables: - unknown (step, values) float32 ... - sde (step, values) float32 ... (snow depth water equivalent) - hcct (step, values) float32 ... (height of convective cloud top) - dswrf (step, values) float32 ... (downward short-wave radiation flux) - dlwrf (step, values) float32 ... (downward long-wave radiation flux) - - --- Dataset 7 --- - Dimensions: (step: 37, values: 385792) - Coordinates: - time datetime64[ns] 2019-01-01 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 1 days 12:00:00 - level float64 0.0 - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: values - Data variables: - h (step, values) float32 ... (geometrical height) - -
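These hypercubes can be merged back into a single dataset, which is what `ceda/client.py` does after some coordinate cleanup. A minimal sketch under the same assumptions as the Wholesale1 example above; the `compat` and `combine_attrs` arguments are taken from that client code:

```python
import cfgrib
import xarray as xr

# Load the per-hypercube datasets for a Wholesale2 file and merge them
# into one dataset indexed by step and values.
datasets = cfgrib.open_datasets(
    "201901010000_u1096_ng_umqv_Wholesale2.grib",
    backend_kwargs={"indexpath": ""},
)
merged = xr.merge(datasets, compat="override", combine_attrs="drop_conflicts")
print(list(merged.data_vars))
```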
- - -## Geography - - -The geography namespace of the files returns the following information: - -```shell -grib_ls -n geography -w shortName=t,stepRange=0 -j 201901010000_u1096_ng_umqv_Wholesale1.grib -``` - - -| Name | Value | -|------------------------------------|---------------------| -| Ni | 548 | -| Nj | 704 | -| latitudeOfReferencePointInDegrees | 4.9e-05 | -| longitudeOfReferencePointInDegrees | -2e-06 | -| m | 0 | -| XRInMetres | 400000 | -| YRInMetres | -100000 | -| iScansNegatively | 0 | -| jScansPositively | 1 | -| jPointsAreConsecutive | 0 | -| DiInMetres | 2000 | -| DjInMetres | 2000 | -| X1InGridLengths | -238000 | -| Y1InGridLengths | 1.222e+06 | -| X2InGridLengths | 856000 | -| Y2InGridLengths | -184000 | -| gridType | transverse_mercator | -| bitmapPresent | 1 | -| bitmap | 255... | - diff --git a/src/nwp_consumer/internal/inputs/ceda/__init__.py b/src/nwp_consumer/internal/inputs/ceda/__init__.py deleted file mode 100644 index 74f4c648..00000000 --- a/src/nwp_consumer/internal/inputs/ceda/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -__all__ = ['Client'] - -from .client import Client diff --git a/src/nwp_consumer/internal/inputs/ceda/_models.py b/src/nwp_consumer/internal/inputs/ceda/_models.py deleted file mode 100644 index 86d8f7df..00000000 --- a/src/nwp_consumer/internal/inputs/ceda/_models.py +++ /dev/null @@ -1,58 +0,0 @@ -import datetime as dt -from typing import ClassVar - -from marshmallow import EXCLUDE, Schema, fields -from marshmallow_dataclass import dataclass - -import nwp_consumer.internal as internal - - -@dataclass -class CEDAFileInfo(internal.FileInfoModel): - """Schema of the items section of the response from the CEDA API.""" - - class Meta: - unknown = EXCLUDE - - name: str - - Schema: ClassVar[type[Schema]] = Schema # To prevent confusing type checkers - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class. - - The init time is found the first part of the file name for CEDA files, - e.g. 
202201010000_u1096_ng_umqv_Wholesale1.grib - """ - return dt.datetime.strptime(self.name.split("_")[0], "%Y%m%d%H%M").replace( - tzinfo=dt.UTC, - ) - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self.name - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"badc/ukmo-nwp/data/ukv-grib/{self.it():%Y/%m/%d}/{self.name}" - - def variables(self) -> list[str]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() - - -@dataclass -class CEDAResponse: - """Schema of the response from the CEDA API.""" - - class Meta: - unknown = EXCLUDE - - path: str - items: list[CEDAFileInfo] = fields.List(fields.Nested(CEDAFileInfo.Schema())) - - Schema: ClassVar[type[Schema]] = Schema # To prevent confusing type checkers diff --git a/src/nwp_consumer/internal/inputs/ceda/client.py b/src/nwp_consumer/internal/inputs/ceda/client.py deleted file mode 100644 index d190953a..00000000 --- a/src/nwp_consumer/internal/inputs/ceda/client.py +++ /dev/null @@ -1,327 +0,0 @@ -"""Client adapting CEDA API to internal Fetcher port.""" - -import datetime as dt -import pathlib -import typing -import urllib.parse -import urllib.request - -import cfgrib -import numpy as np -import requests -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._models import CEDAFileInfo, CEDAResponse - -log = structlog.getLogger() - -# Defines parameters in CEDA that are not available from MetOffice -PARAMETER_IGNORE_LIST: typing.Sequence[str] = ( - "unknown", - "h", - "hcct", - "cdcb", - "dpt", - "prmsl", - "cbh", -) - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("time", "step", "x", "y") - -# Defines the mapping from CEDA parameter names to OCF parameter names - - - -class Client(internal.FetcherInterface): - """Implements a client to fetch data from CEDA.""" - - # CEDA FTP Username - __username: str - # CEDA FTP Password - __password: str - # FTP url for CEDA data - __ftpBase: str - - def __init__(self, ftpUsername: str, ftpPassword: str) -> None: - """Create a new CEDAClient. - - Exposes a client for CEDA's FTP server that conforms to the FetcherInterface. - - Args: - ftpUsername: The username to use to connect to the CEDA FTP server. - ftpPassword: The password to use to connect to the CEDA FTP server. - """ - self.__username: str = urllib.parse.quote(ftpUsername) - self.__password: str = urllib.parse.quote(ftpPassword) - self.__ftpBase: str = f"ftp://{self.__username}:{self.__password}@ftp.ceda.ac.uk" - - def datasetName(self) -> str: - """Overrides corresponding parent method.""" - return "UKV" - - def getInitHours(self) -> list[int]: - """Overrides corresponding parent method.""" - return [0, 3, 6, 9, 12, 15, 18, 21] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: - """Overrides corresponding parent method.""" - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - # Fetch info for all files available on the input date - # * CEDA has a HTTPS JSON API for this purpose - response: requests.Response = requests.request( - method="GET", - url=f"https://data.ceda.ac.uk/badc/ukmo-nwp/data/ukv-grib/{it:%Y/%m/%d}?json", - ) - - if response.status_code == 404: - # No data available for this init time. 
Fail soft - log.warn( - event="no data available for init time", - init_time=f"{it:%Y/%m/%d %H:%M}", - url=response.url, - ) - return [] - if not response.ok: - # Something else has gone wrong. Fail hard - log.warn( - event="error response from filelist endpoint", - url=response.url, - response=response.json(), - ) - return [] - - # Map the response to a CEDAResponse object to ensure it looks as expected - try: - responseObj: CEDAResponse = CEDAResponse.Schema().load(response.json()) - except Exception as e: - log.warn( - event="response from ceda does not match expected schema", - error=e, - response=response.json(), - ) - return [] - - # Filter the files for the desired init time - wantedFiles: list[CEDAFileInfo] = [ - fileInfo for fileInfo in responseObj.items if _isWantedFile(fi=fileInfo, dit=it) - ] - - return wantedFiles - - def downloadToCache( - self, *, fi: internal.FileInfoModel, - ) -> pathlib.Path: - """Overrides corresponding parent method.""" - if self.__password == "" or self.__username == "": - log.error(event="all ceda credentials not provided") - return pathlib.Path() - - log.debug(event="requesting download of file", file=fi.filename(), path=fi.filepath()) - url: str = f"{self.__ftpBase}/{fi.filepath()}" - try: - response = urllib.request.urlopen(url=url) - except Exception as e: - log.warn( - event="error calling url for file", - url=fi.filepath(), - filename=fi.filename(), - error=e, - ) - return pathlib.Path() - - # Stream the filedata into a cached file - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with cfp.open("wb") as f: - for chunk in iter(lambda: response.read(16 * 1024), b""): - f.write(chunk) - f.flush() - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: - """Overrides corresponding parent method.""" - if p.suffix != ".grib": - log.warn(event="cannot map non-grib file to dataset", filepath=p.as_posix()) - return xr.Dataset() - - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Check the file has the right name - if not any(setname in p.name.lower() for setname in [ - "wholesale1.grib", "wholesale2.grib", "wholesale1t54.grib", "wholesale2t54.grib", - ]): - log.debug( - event="skipping file as it does not match expected name", - filepath=p.as_posix(), - ) - return xr.Dataset() - - # Load the wholesale file as a list of datasets - # * cfgrib loads multiple hypercubes for a single multi-parameter grib file - # * Can also set backend_kwargs={"indexpath": ""}, to avoid the index file - try: - datasets: list[xr.Dataset] = cfgrib.open_datasets( - path=p.as_posix(), - chunks={"time": 1, "step": -1, "variable": -1, "x": "auto", "y": "auto"}, - backend_kwargs={"indexpath": ""}, - ) - except Exception as e: - log.warn(event="error converting raw file to dataset", filepath=p.as_posix(), error=e) - return xr.Dataset() - - for i, ds in enumerate(datasets): - # Ensure the temperature is defined at 1 meter above ground level - # * In the early NWPs (definitely in the 2016-03-22 NWPs): - # - `heightAboveGround` only has one entry ("1" meter above ground) - # - `heightAboveGround` isn't set as a dimension for `t`. - # * In later NWPs, 'heightAboveGround' has 2 values (0, 1) and is a dimension for `t`. 
- if "t" in ds and "heightAboveGround" in ds["t"].dims: - ds = ds.sel(heightAboveGround=1) - - # Snow depth is in `m` from CEDA, but OCF expects `kg m-2`. - # * A scaling factor of 1000 converts between the two. - # * See "Snow Depth" entry in https://gridded-data-ui.cda.api.metoffice.gov.uk/glossary - if "sde" in ds: - ds = ds.assign(sde=ds["sde"] * 1000) - - # Delete unnecessary data variables - for var_name in PARAMETER_IGNORE_LIST: - if var_name in ds: - del ds[var_name] - - # Delete unwanted coordinates - ds = ds.drop_vars( - names=[c for c in ds.coords if c not in COORDINATE_ALLOW_LIST], - errors="ignore", - ) - - # Put the modified dataset back in the list - datasets[i] = ds - - # Merge the datasets back into one - wholesaleDataset = xr.merge( - objects=datasets, - compat="override", - combine_attrs="drop_conflicts", - ) - - del datasets - - # Add in x and y coordinates - try: - wholesaleDataset = _reshapeTo2DGrid(ds=wholesaleDataset) - except Exception as e: - log.warn(event="error reshaping to 2D grid", filepath=p.as_posix(), error=e) - return xr.Dataset() - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - wholesaleDataset = ( - wholesaleDataset.rename({"time": "init_time"}) - .expand_dims("init_time") - .transpose("init_time", "step", "y", "x") - .sortby("step") - .chunk( - { - "init_time": 1, - "step": -1, - "y": len(wholesaleDataset.y) // 2, - "x": len(wholesaleDataset.x) // 2, - }, - ) - ) - return wholesaleDataset - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides corresponding parent method.""" - return { - "10wdir": internal.OCFParameter.WindDirectionFromWhichBlowingSurfaceAdjustedAGL, - "10si": internal.OCFParameter.WindSpeedSurfaceAdjustedAGL, - "prate": internal.OCFParameter.RainPrecipitationRate, - "r": internal.OCFParameter.RelativeHumidityAGL, - "t": internal.OCFParameter.TemperatureAGL, - "vis": internal.OCFParameter.VisibilityAGL, - "dswrf": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "dlwrf": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "hcc": internal.OCFParameter.HighCloudCover, - "mcc": internal.OCFParameter.MediumCloudCover, - "lcc": internal.OCFParameter.LowCloudCover, - "sde": internal.OCFParameter.SnowDepthWaterEquivalent, - } - - -def _isWantedFile(*, fi: CEDAFileInfo, dit: dt.datetime) -> bool: - """Check if the input FileInfo corresponds to a wanted GRIB file. - - :param fi: The File Info object describing the file to check - :param dit: The desired init time - """ - if fi.it().date() != dit.date() or fi.it().time() != dit.time(): - return False - # False if item doesn't correspond to Wholesale1 or Wholesale2 files up to 54 time steps - if not any(setname in fi.filename() for setname in ["Wholesale1.grib", "Wholesale2.grib", "Wholesale1T54.grib", "Wholesale2T54.grib"]): - return False - - return True - - -def _reshapeTo2DGrid(*, ds: xr.Dataset) -> xr.Dataset: - """Convert 1D into 2D array. - - In the grib files, the pixel values are in a flat 1D array (indexed by the `values` dimension). - The ordering of the pixels in the grib are left to right, bottom to top. - - This function replaces the `values` dimension with an `x` and `y` dimension, - and, for each step, reshapes the images to be 2D. 
- - :param ds: The dataset to reshape - """ - # Adapted from https://stackoverflow.com/a/62667154 and - # https://github.com/SciTools/iris-grib/issues/140#issuecomment-1398634288 - - # Define geographical domain for UKV. Taken from page 4 of https://zenodo.org/record/7357056 - dx = dy = 2000 - maxY = 1223000 - minY = -185000 - minX = -239000 - maxX = 857000 - # * Note that the UKV NWPs y is top-to-bottom, hence step is negative. - northing = np.arange(start=maxY, stop=minY, step=-dy, dtype=np.int32) - easting = np.arange(start=minX, stop=maxX, step=dx, dtype=np.int32) - - if ds.sizes["values"] != len(northing) * len(easting): - raise ValueError( - f"dataset has {ds.sizes['values']} values, " - f"but expected {len(northing) * len(easting)}", - ) - - # Create new coordinates, - # which give the `x` and `y` position for each position in the `values` dimension: - ds = ds.assign_coords( - { - "x": ("values", np.tile(easting, reps=len(northing))), - "y": ("values", np.repeat(northing, repeats=len(easting))), - }, - ) - - # Now set `values` to be a MultiIndex, indexed by `y` and `x`: - ds = ds.set_index(values=("y", "x")) - - # Now unstack. This gets rid of the `values` dimension and indexes - # the data variables using `y` and `x`. - return ds.unstack("values") diff --git a/src/nwp_consumer/internal/inputs/ceda/test_client.py b/src/nwp_consumer/internal/inputs/ceda/test_client.py deleted file mode 100644 index 503b6378..00000000 --- a/src/nwp_consumer/internal/inputs/ceda/test_client.py +++ /dev/null @@ -1,132 +0,0 @@ -import datetime as dt -import pathlib -import unittest.mock - -import numpy as np -import xarray as xr - -from ._models import CEDAFileInfo -from .client import ( - Client, - _isWantedFile, - _reshapeTo2DGrid, -) - -# --------- Test setup --------- # - -testClient = Client(ftpPassword="", ftpUsername="") - - -# --------- Client methods --------- # - -class TestClient_ListRawFilesForInitTime(unittest.TestCase): - - def test_listsFilesCorrectly(self) -> None: - pass - - -class TestClient_FetchRawFileBytes(unittest.TestCase): - - def test_fetchesFileCorrectly(self) -> None: - pass - - -class TestClient_MapCachedRaw(unittest.TestCase): - - def test_convertsWholesale1FileCorrectly(self) -> None: - wholesalePath: pathlib.Path = pathlib.Path(__file__).parent / "test_wholesale1.grib" - - out = testClient.mapCachedRaw(p=wholesalePath) - - # Ensure the dimensions have the right sizes - self.assertDictEqual( - {"init_time": 1, "step": 4, "y": 704, "x": 548}, - dict(out.sizes.items()), - ) - # Ensure the correct variables are in the variable dimension - self.assertCountEqual( - ["prate", "r", "si10", "t", "vis", "wdir10"], - list(out.data_vars.keys()), - ) - - @unittest.skip("Broken on github ci") - def test_convertsWholesale2FileCorrectly(self) -> None: - wholesalePath: pathlib.Path = pathlib.Path(__file__).parent / "test_wholesale2.grib" - - out = testClient.mapCachedRaw(p=wholesalePath) - - # Ensure the dimensions have the right sizes - self.assertDictEqual( - {"init_time": 1, "step": 4, "y": 704, "x": 548}, - dict(out.sizes.items()), - ) - # Ensure the correct variables are in the variable dimension - self.assertCountEqual( - ["dlwrf", "dswrf", "hcc", "lcc", "mcc", "sde"], - list(out.data_vars.keys()), - ) - -# --------- Static methods --------- # - -class TestIsWantedFile(unittest.TestCase): - - def test_correctlyFiltersCEDAFileInfos(self) -> None: - initTime: dt.datetime = dt.datetime( - year=2021, month=1, day=1, hour=0, minute=0, tzinfo=dt.timezone.utc, - ) - - wantedFileInfos: 
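    For example, on the 2 km UKV grid defined below, a `(step, values)` field of shape `(37, 385792)` becomes `(37, 704, 548)` after unstacking, since 704 * 548 = 385792.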
list[CEDAFileInfo] = [ - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale1.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale2.grib"), - ] - - unwantedFileInfos: list[CEDAFileInfo] = [ - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale1T54.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale2T54.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale3.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale3T54.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale4.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale5.grib"), - CEDAFileInfo(name="202101010000_u1096_ng_umqv_Wholesale5T54.grib"), - CEDAFileInfo(name="202101010300_u1096_ng_umqv_Wholesale1T120.grib"), - CEDAFileInfo(name="202101010300_u1096_ng_umqv_Wholesale1.grib"), - ] - - self.assertTrue( - all(_isWantedFile(fi=fo, dit=initTime) for fo in wantedFileInfos)) - self.assertFalse( - all(_isWantedFile(fi=fo, dit=initTime) for fo in unwantedFileInfos)) - - -class TestReshapeTo2DGrid(unittest.TestCase): - - def test_correctlyReshapesData(self) -> None: - dataset = xr.Dataset( - data_vars={ - "wdir10": (("step", "values"), np.random.rand(4, 385792)), - }, - coords={ - "step": [0, 1, 2, 3], - }, - ) - - reshapedDataset = _reshapeTo2DGrid(ds=dataset) - - self.assertEqual(548, reshapedDataset.dims["x"]) - self.assertEqual(704, reshapedDataset.dims["y"]) - - with self.assertRaises(KeyError): - _ = reshapedDataset["values"] - - def test_raisesErrorForIncorrectNumberOfValues(self) -> None: - ds1 = xr.Dataset( - data_vars={ - "wdir10": (("step", "values"), [[1, 2, 3, 4], [5, 6, 7, 8]]), - }, - coords={ - "step": [0, 1], - }, - ) - - with self.assertRaises(ValueError): - _ = _reshapeTo2DGrid(ds=ds1) diff --git a/src/nwp_consumer/internal/inputs/ceda/test_wholesale1.grib b/src/nwp_consumer/internal/inputs/ceda/test_wholesale1.grib deleted file mode 100644 index 82c48e73..00000000 Binary files a/src/nwp_consumer/internal/inputs/ceda/test_wholesale1.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/ceda/test_wholesale2.grib b/src/nwp_consumer/internal/inputs/ceda/test_wholesale2.grib deleted file mode 100644 index 1c8d302f..00000000 Binary files a/src/nwp_consumer/internal/inputs/ceda/test_wholesale2.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/cmc/CMC_glb_CAPE_SFC_0_latlon.15x.15_2023080900_P027.grib2 b/src/nwp_consumer/internal/inputs/cmc/CMC_glb_CAPE_SFC_0_latlon.15x.15_2023080900_P027.grib2 deleted file mode 100644 index 6950eae6..00000000 Binary files a/src/nwp_consumer/internal/inputs/cmc/CMC_glb_CAPE_SFC_0_latlon.15x.15_2023080900_P027.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/cmc/CMC_glb_TMP_TGL_2_latlon.15x.15_2023080900_P027.grib2 b/src/nwp_consumer/internal/inputs/cmc/CMC_glb_TMP_TGL_2_latlon.15x.15_2023080900_P027.grib2 deleted file mode 100644 index 6548a192..00000000 Binary files a/src/nwp_consumer/internal/inputs/cmc/CMC_glb_TMP_TGL_2_latlon.15x.15_2023080900_P027.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/cmc/CMC_glb_VGRD_ISBL_200_latlon.15x.15_2023080900_P027.grib2 b/src/nwp_consumer/internal/inputs/cmc/CMC_glb_VGRD_ISBL_200_latlon.15x.15_2023080900_P027.grib2 deleted file mode 100644 index b7e53b91..00000000 Binary files a/src/nwp_consumer/internal/inputs/cmc/CMC_glb_VGRD_ISBL_200_latlon.15x.15_2023080900_P027.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/cmc/__init__.py 
b/src/nwp_consumer/internal/inputs/cmc/__init__.py deleted file mode 100644 index 5d97b9a1..00000000 --- a/src/nwp_consumer/internal/inputs/cmc/__init__.py +++ /dev/null @@ -1,4 +0,0 @@ -__all__ = ["Client"] - -from .client import Client - diff --git a/src/nwp_consumer/internal/inputs/cmc/_consts.py b/src/nwp_consumer/internal/inputs/cmc/_consts.py deleted file mode 100644 index aba6595f..00000000 --- a/src/nwp_consumer/internal/inputs/cmc/_consts.py +++ /dev/null @@ -1,62 +0,0 @@ -"""Defines all parameters available from GDPS.""" - - -GDPS_VARIABLES = [ - "ALBDO", - "ABSV", - "CWAT", - "TSOIL", - "SOILVIC", - "SOILM", - "SFCWRO", - "CAPE", - "CIN", - "ACPCP", - "DLWRF", - "DSWRF", - "HGT", - "FPRATE", - "IPRATE", - "PCPNTYPE", - "LHTFL", - "NLWRS", - "NSWRS", - "PRATE", - "PRES", - "RH", - "SKINT", - "SDEN", - "SNOD", - "SPRATE", - "SPFH", - "TMP", - "TCDC", - "APCP", - "ULWRF", - "VVEL", - "GUST", - "UGRD", - "VGRD", -] - -GEPS_VARIABLES = [ - "CAPE", - "CIN", - "HGT", - "ICETK", - "PRES", - "PRMSL", - "PWAT", - "RH", - "SCWRO", - "SNOD", - "SPFH", - "TCDC", - "TMP", - "TSOIL", - "UGRD", - "VGRD", - "WEASD", - "WIND", - "VVEL" - ] diff --git a/src/nwp_consumer/internal/inputs/cmc/_models.py b/src/nwp_consumer/internal/inputs/cmc/_models.py deleted file mode 100644 index fa414c8b..00000000 --- a/src/nwp_consumer/internal/inputs/cmc/_models.py +++ /dev/null @@ -1,37 +0,0 @@ -import datetime as dt - -from nwp_consumer import internal - - -class CMCFileInfo(internal.FileInfoModel): - def __init__( - self, - it: dt.datetime, - filename: str, - currentURL: str, - step: int, - ) -> None: - self._it = it - self._filename = filename - self._url = currentURL - self.step = step - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._filename - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._url + "/" + self._filename - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class.""" - return self._it - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - return [self.step] - - def variables(self) -> list: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() diff --git a/src/nwp_consumer/internal/inputs/cmc/client.py b/src/nwp_consumer/internal/inputs/cmc/client.py deleted file mode 100644 index 4e6bf1f8..00000000 --- a/src/nwp_consumer/internal/inputs/cmc/client.py +++ /dev/null @@ -1,332 +0,0 @@ -"""Implements a client to fetch GDPS/GEPS data from CMC.""" -import datetime as dt -import pathlib -import re -import typing -import urllib.request - -import requests -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._consts import GDPS_VARIABLES, GEPS_VARIABLES -from ._models import CMCFileInfo -from ... import OCFParameter - -log = structlog.getLogger() - - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("time", "step", "latitude", "longitude") - - -class Client(internal.FetcherInterface): - """Implements a client to fetch GDPS/GEPS data from CMC.""" - - baseurl: str # The base URL for the GDPS/GEPS model - model: str # The model to fetch data for - parameters: list[str] # The parameters to fetch - - def __init__(self, model: str, hours: int = 48, param_group: str = "default") -> None: - """Create a new GDPS Client. - - Exposes a client for GDPS and GEPS data from Canada CMC that conforms to the FetcherInterface. 
- - Args: - model: The model to fetch data for. Valid models are "gdps" and "geps". - param_group: The set of parameters to fetch. - Valid groups are "default", "full", and "basic". - """ - self.baseurl = "https://dd.weather.gc.ca" - - match model: - case "gdps": - self.baseurl += "/model_gem_global/15km/grib2/lat_lon/" - case "geps": - self.baseurl += "/ensemble/geps/grib2/raw/" - case _: - raise ValueError( - f"unknown GDPS/GEPS model {model}. Valid models are 'gdps' and 'geps'", - ) - - match (param_group, model): - case ("default", _): - self.parameters = ["t", "tclc", "dswrf", "dlwrf", "snod", "rh", "u", "v"] - case ("full", "geps"): - self.parameters = GEPS_VARIABLES - case ("full", "gdps"): - self.parameters = GDPS_VARIABLES - case ("basic", "geps"): - self.parameters = GEPS_VARIABLES[:2] - case ("basic", "gdps"): - self.parameters = GDPS_VARIABLES[:2] - case (_, _): - raise ValueError( - f"unknown parameter group {param_group}." - "Valid groups are 'default', 'full', 'basic'", - ) - - self.model = model - self.hours = hours - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"CMC_{self.model}".upper() - - def getInitHours(self) -> list[int]: # noqa: D102 - return [0, 12] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - # GDPS data is only available for today's and yesterday's date. - # If data hasn't been uploaded for that inittime yet, - # then yesterday's data will still be present on the server. - if it.date() != dt.datetime.now(dt.UTC).date(): - raise ValueError("GDPS/GEPS data is only available on today's date") - - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - files: list[internal.FileInfoModel] = [] - - # Files are split per parameter, level, and step, with a webpage per parameter - # * The webpage contains a list of files for the parameter - # * Find these files for each parameter and add them to the list - for param in self.parameters: - # The list of files for the parameter - parameterFiles: list[internal.FileInfoModel] = [] - - # Fetch CMC webpage detailing the available files for the timestep - response = requests.get(f"{self.baseurl}/{it.strftime('%H')}/000/", timeout=3) - - if response.status_code != 200: - log.warn( - event="error fetching filelisting webpage for parameter", - status=response.status_code, - url=response.url, - param=param, - inittime=it.strftime("%Y-%m-%d %H:%M"), - ) - continue - - # The webpage's HTML contains a list of tags - # * Each tag has a href, most of which point to a file) - for line in response.text.splitlines(): - # Check if the line contains a href, if not, skip it - refmatch = re.search(pattern=r'href="(.+)">', string=line) - if refmatch is None: - continue - - # The href contains the name of a file - parse this into a FileInfo object - fi: CMCFileInfo | None = None - # If downloading all variables, match all files - # * Otherwise only match single level and time invariant - fi = _parseCMCFilename( - name=refmatch.groups()[0], - baseurl=self.baseurl, - match_pl=self.parameters in ["t", "tclc", "dswrf", "dlwrf", "snod", "rh", "u", "v"], - match_hl=self.parameters in ["t", "tclc", "dswrf", "dlwrf", "snod", "rh", "u", "v"], - ) - # Ignore the file if it is not for today's date or has a step > 48 (when conforming) - if fi is None or fi.it() != it or (fi.step > self.hours and self.conform): - continue - - # Add the file to the list - parameterFiles.append(fi) - - 
log.debug( - event="listed files for parameter", - param=param, - inittime=it.strftime("%Y-%m-%d %H:%M"), - url=response.url, - numfiles=len(parameterFiles), - ) - - # Add the files for the parameter to the list of all files - files.extend(parameterFiles) - - return files - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: # noqa: D102 - if p.suffix != ".grib2": - log.warn( - event="cannot map non-grib file to dataset", - filepath=p.as_posix(), - ) - return xr.Dataset() - - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Load the raw file as a dataset - try: - ds = xr.open_dataset( - p.as_posix(), - engine="cfgrib", - chunks={ - "time": 1, - "step": 1, - "latitude": "auto", - "longitude": "auto", - }, - ) - except Exception as e: - log.warn( - event="error converting raw file as dataset", - error=e, - filepath=p.as_posix(), - ) - return xr.Dataset() - # Rename variable to the value, as some have unknown as the name - if next(iter(ds.data_vars.keys())) == "unknown": - ds = ds.rename({"unknown": str(p.name).split("_")[2].lower()}) - - # Rename variables that are both pressure level and surface - if "surface" in list(ds.coords): - ds = ds.rename({"surface": "heightAboveGround"}) - - if "heightAboveGround" in list(ds.coords) and next(iter(ds.data_vars.keys())) in [ - "q", - "t", - "u", - "v", - ]: - # Rename data variable to add _surface to it so merging works later - ds = ds.rename( - {next(iter(ds.data_vars.keys())): f"{next(iter(ds.data_vars.keys()))}_surface"}, - ) - - if "isobaricInhPa" in list(ds.coords): - if "rh" in list(ds.data_vars.keys()): - ds = ds.rename({"isobaricInhPa": "isobaricInhPa_humidity"}) - if "absv" in list(ds.data_vars.keys()) or "vvel" in list(ds.data_vars.keys()): - ds = ds.rename({"isobaricInhPa": "isobaricInhPa_absv_vvel"}) - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - ds = ( - ds.rename({"time": "init_time"}) - .expand_dims("init_time") - .expand_dims("step") - .transpose("init_time", "step", ...) 
- .sortby("step") - .chunk( - { - "init_time": 1, - "step": -1, - }, - ) - ) - - return ds - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - log.debug(event="requesting download of file", file=fi.filename(), path=fi.filepath()) - try: - response = urllib.request.urlopen(fi.filepath()) - except Exception as e: - log.warn( - event="error calling url for file", - url=fi.filepath(), - filename=fi.filename(), - error=e, - ) - return pathlib.Path() - - if response.status != 200: - log.warn( - event="error downloading file", - status=response.status, - url=fi.filepath(), - filename=fi.filename(), - ) - return pathlib.Path() - - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with open(cfp, "wb") as f: - f.write(response.read()) - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def parameterConformMap(self) -> dict[str, OCFParameter]: - """Overrides the corresponding method in the parent class.""" - # See https://eccc-msc.github.io/open-data/msc-data/nwp_gdps/readme_gdps-datamart_en/ - # for a list of CMC parameters - return { - "t": internal.OCFParameter.TemperatureAGL, - "tclc": internal.OCFParameter.TotalCloudCover, - "dswrf": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "dlwrf": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "snod": internal.OCFParameter.SnowDepthWaterEquivalent, - "rh": internal.OCFParameter.RelativeHumidityAGL, - "u": internal.OCFParameter.WindUComponentAGL, - "v": internal.OCFParameter.WindVComponentAGL, - } - - - -def _parseCMCFilename( - name: str, - baseurl: str, - match_sl: bool = True, - match_hl: bool = False, - match_pl: bool = False, -) -> CMCFileInfo | None: - """Parse a string of HTML into an CMCFileInfo object, if it contains one. - - Args: - name: The name of the file to parse - baseurl: The base URL for the GDPS model - match_sl: Whether to match single-level files - match_hl: Whether to match Height Above Ground-level files - match_pl: Whether to match pressure-level files - """ - # TODO: @Jacob even fixed, these do not match a lot of the files in the store, is that on purpose? 
# noqa: E501
-    # Define the regex patterns to match the different types of file
-    # * Single Level GDPS: `CMC___SFC_0_latlon_YYYYMMDD_PLLL.grib2`
-    # * Single Level GEPS: `CMC_geps-raw_CIN_SFC_0_latlon0p5x0p5_2024011800_P000_allmbrs.grib2`
-    slRegex = r"CMC_[a-z-]{3,8}_([A-Za-z_\d]+)_SFC_0_latlon[\S]{7}_(\d{10})_P(\d{3})[\S]*.grib"
-    # * HeightAboveGround GDPS: `CMC_glb_ISBL_TGL_40_latlon.15x.15_2023080900_P027.grib2`
-    # * HeightAboveGround GEPS: `CMC_geps-raw_SPFH_TGL_2_latlon0p5x0p5_2023080900_P027_allmbrs.grib2`  # noqa: E501
-    hlRegex = r"CMC_[a-z-]{3,8}_([A-Za-z_\d]+)_TGL_(\d{1,4})_latlon[\S]{7}_(\d{10})_P(\d{3})[\S]*.grib"  # noqa: E501
-    # * Pressure Level GDPS: `CMC_glb_TMP_ISBL_500_latlon.15x.15_2023080900_P027.grib2`
-    # * Pressure Level GEPS: `CMC_geps-raw_TMP_ISBL_500_latlon0p5x0p5_2023080900_P027_allmbrs.grib2`
-    plRegex = r"CMC_[a-z-]{3,8}_([A-Za-z_\d]+)_ISBL_(\d{1,4})_latlon[\S]{7}_(\d{10})_P(\d{3})[\S]*.grib"  # noqa: E501
-
-    itstring = paramstring = ""
-    stepstring = "000"
-    # Try to match the href to one of the regex patterns
-    slmatch = re.search(pattern=slRegex, string=name)
-    hlmatch = re.search(pattern=hlRegex, string=name)
-    plmatch = re.search(pattern=plRegex, string=name)
-
-    if slmatch and match_sl:
-        paramstring, itstring, stepstring = slmatch.groups()
-    elif hlmatch and match_hl:
-        paramstring, levelstring, itstring, stepstring = hlmatch.groups()
-    elif plmatch and match_pl:
-        paramstring, levelstring, itstring, stepstring = plmatch.groups()
-    else:
-        return None
-
-    it = dt.datetime.strptime(itstring, "%Y%m%d%H").replace(tzinfo=dt.UTC)
-
-    return CMCFileInfo(
-        it=it,
-        filename=name,
-        currentURL=f"{baseurl}/{it.strftime('%H')}/{stepstring}/",
-        step=int(stepstring),
-    )
diff --git a/src/nwp_consumer/internal/inputs/cmc/test_client.py b/src/nwp_consumer/internal/inputs/cmc/test_client.py
deleted file mode 100644
index edc7c990..00000000
--- a/src/nwp_consumer/internal/inputs/cmc/test_client.py
+++ /dev/null
@@ -1,78 +0,0 @@
-import datetime as dt
-import pathlib
-import unittest
-from typing import TYPE_CHECKING
-
-if TYPE_CHECKING:
-    from ._models import CMCFileInfo
-
-from .client import Client, _parseCMCFilename
-
-testClient = Client(model="gdps")
-
-
-class TestClient(unittest.TestCase):
-    def test_mapCachedRaw(self) -> None:
-        # Test with global file
-        testFilePath: pathlib.Path = (
-            pathlib.Path(__file__).parent / "CMC_glb_VGRD_ISBL_200_latlon.15x.15_2023080900_P027.grib2"
-        )
-        out = testClient.mapCachedRaw(p=testFilePath)
-
-        # Check latitude and longitude are injected
-        self.assertTrue("latitude" in out.coords)
-        self.assertTrue("longitude" in out.coords)
-        self.assertEqual(len(out["latitude"].values), 1201)
-        self.assertEqual(len(out["longitude"].values), 2400)
-        # Check that the dimensions are correctly ordered and renamed
-        self.assertEqual(
-            out[next(iter(out.data_vars.keys()))].dims,
-            ("init_time", "step", "latitude", "longitude"),
-        )
-
-        # Test with europe file
-        testFilePath: pathlib.Path = (
-            pathlib.Path(__file__).parent / "CMC_glb_CAPE_SFC_0_latlon.15x.15_2023080900_P027.grib2"
-        )
-        out = testClient.mapCachedRaw(p=testFilePath)
-
-        # Check latitude and longitude are present
-        self.assertTrue("latitude" in out.coords)
-        self.assertTrue("longitude" in out.coords)
-        self.assertEqual(len(out["latitude"].values), 1201)
-        self.assertEqual(len(out["longitude"].values), 2400)
-        # Check that the dimensions are correctly ordered and renamed
-        self.assertEqual(
-            out[next(iter(out.data_vars.keys()))].dims,
-            ("init_time", "step", "latitude",
"longitude"), - ) - - - -class TestParseCMCFilename(unittest.TestCase): - baseurl = "https://dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon/" - - def test_parses(self) -> None: - tests = { - "gdps-sl": "CMC_glb_CIN_SFC_0_latlon.15x.15_2023080900_P027.grib2", - "geps-sl": "CMC_geps-raw_CIN_SFC_0_latlon0p5x0p5_2023080900_P027_allmbrs.grib2", - "gdps-hl": "CMC_glb_SPFH_TGL_40_latlon.15x.15_2023080900_P027.grib2", - "geps-hl": "CMC_geps-raw_SPFH_TGL_80_latlon0p5x0p5_2023080900_P000_allmbrs.grib2", - "gdps-pl": "CMC_glb_TMP_ISBL_300_latlon.15x.15_2023080900_P000.grib2", - "geps-pl": "CMC_geps-raw_TMP_ISBL_0500_latlon0p5x0p5_2023080900_P000_allmbrs.grib2", - } - - for k, v in tests.items(): - with self.subTest(msg=k): - out: CMCFileInfo | None = _parseCMCFilename( - name=v, - baseurl=self.baseurl, - match_hl="hl" in k, - match_pl="pl" in k, - ) - if out is None: - self.fail(f"Failed to parse filename {v}") - self.assertEqual(out.filename(), v) - self.assertEqual(out.it(), dt.datetime(2023, 8, 9, 0, tzinfo=dt.UTC)) - - diff --git a/src/nwp_consumer/internal/inputs/ecmwf/README.md b/src/nwp_consumer/internal/inputs/ecmwf/README.md deleted file mode 100644 index f00bcf34..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/README.md +++ /dev/null @@ -1,24 +0,0 @@ -# ECMWF API - -## Authentication - -The ECMWF API requires the setting of a few environment variables, -or an `.ecmwfapirc` file in the user's home directory. See the PyPi entry: -https://pypi.org/project/ecmwf-api-client/, or the ECMWFMARSConfig class -in `nwp_consumer/internal/config/config.py`. The variables are - -```shell -ECMWF_API_KEY= -ECMWF_API_EMAIL= -ECMWF_API_URL= -``` - -which can be accessed via visiting [https://api.ecmwf.int/v1/key/](https://api.ecmwf.int/v1/key/). - -## MARS - -View the glossary for ECMWF MARS variables available for the operational forecast: -https://codes.ecmwf.int/grib/param-db - -View the glossary for the MARS postprocessing keywords: -https://confluence.ecmwf.int/display/UDOC/Post-processing+keywords diff --git a/src/nwp_consumer/internal/inputs/ecmwf/__init__.py b/src/nwp_consumer/internal/inputs/ecmwf/__init__.py deleted file mode 100644 index 2e777948..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -__all__ = [ - "MARSClient", - "S3Client", -] - -from .mars import MARSClient -from .s3 import S3Client diff --git a/src/nwp_consumer/internal/inputs/ecmwf/_models.py b/src/nwp_consumer/internal/inputs/ecmwf/_models.py deleted file mode 100644 index 1ed2dbe0..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/_models.py +++ /dev/null @@ -1,77 +0,0 @@ -import datetime as dt -from dataclasses import dataclass - -import nwp_consumer.internal as internal - - -@dataclass -class ECMWFMarsFileInfo(internal.FileInfoModel): - inittime: dt.datetime - area: str - params: list[str] - steplist: list[int] - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - # ECMWF does not have explicit filenames when using the MARS API - # * As such, name manually based on their inittime and area covered - # e.g. 
`ecmwf_uk_20210101T0000.grib` - return f"ecmwf_{self.area}_{self.inittime.strftime('%Y%m%dT%H%M')}.grib" - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return "" - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class.""" - return self.inittime - - def variables(self) -> list[str]: - """Overrides the corresponding method in the parent class.""" - return self.params - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - return self.steplist - - -@dataclass -class ECMWFLiveFileInfo(internal.FileInfoModel): - """Dataclass for ECMWF live data files. - - Live ECMWF files are extensionless grib files named e.g. 'A1D02200000022001001'. - The files contain data for two areas. The names contain the following information - - A1D%m%d%H%M%m'%d'%H'%M'1, where the first time is the initialisation time - and the second the target time. - """ - - fname: str - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self.fname + ".grib" - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"ecmwf/{self.fname}" - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class. - - The file name doesn't have the year in it, so we've added it. - This might be a problem around the new year. - """ - return dt.datetime.strptime( - f"{self.fname[3:10]}-{dt.datetime.now().year}", "%m%d%H%M-%Y" - ).replace( - tzinfo=dt.UTC, - ) - - def variables(self) -> list[str]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() diff --git a/src/nwp_consumer/internal/inputs/ecmwf/mars.py b/src/nwp_consumer/internal/inputs/ecmwf/mars.py deleted file mode 100644 index 565a104f..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/mars.py +++ /dev/null @@ -1,395 +0,0 @@ -"""Implements a client to fetch data from ECMWF.""" - -import datetime as dt -import inspect -import os -import pathlib -import re -import tempfile -import typing - -import cfgrib -import ecmwfapi.api -import structlog -import xarray as xr -from ecmwfapi import ECMWFService - -from nwp_consumer import internal - -from ._models import ECMWFMarsFileInfo - -log = structlog.getLogger() - -# Mapping from ECMWF eccode to ECMWF short name -# * https://codes.ecmwf.int/grib/param-db/?filter=All -PARAMETER_ECMWFCODE_MAP: dict[str, str] = { - "167.128": "tas", # 2 metre temperature - "165.128": "uas", # 10 metre U-component of wind - "166.128": "vas", # 10 metre V-component of wind - "47.128": "dsrp", # Direct solar radiation - "57.128": "uvb", # Downward uv radiation at surface - "188.128": "hcc", # High cloud cover - "187.128": "mcc", # Medium cloud cover - "186.128": "lcc", # Low cloud cover - "164.128": "clt", # Total cloud cover - "169.128": "ssrd", # Surface shortwave radiation downward - "175.128": "strd", # Surface longwave radiation downward - "260048": "tprate", # Total precipitation rate - "141.128": "sd", # Snow depth, m - "246.228": "u100", # 100 metre U component of wind - "247.228": "v100", # 100 metre V component of wind - "239.228": "u200", # 200 metre U component of wind - "240.228": "v200", # 200 metre V component of wind - "20.3": "vis", # Visibility -} - -AREA_MAP: dict[str, str] = { - "uk": "62/-12/48/3", - "nw-india": "31/68/20/79", - 
"india": "35/67/6/97", - "malta": "37/13/35/15", - "eu": "E", - "global": "G", -} - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("time", "step", "latitude", "longitude") - - -def marsLogger(msg: str) -> None: - """Redirect log from ECMWF API to structlog. - - Keyword Arguments: - ----------------- - msg: The message to redirect. - """ - debugSubstrings: list[str] = ["Requesting", "Transfering", "efficiency", "Done"] - errorSubstrings: list[str] = ["ERROR", "FATAL"] - if any(map(msg.__contains__, debugSubstrings)): - log.debug(event=msg, caller="mars") - if any(map(msg.__contains__, errorSubstrings)): - log.warning(event=msg, caller="mars") - - -class MARSClient(internal.FetcherInterface): - """Implements a client to fetch data from ECMWF's MARS API.""" - - server: ecmwfapi.api.ECMWFService - area: str - desired_params: list[str] - - def __init__( - self, - area: str = "uk", - hours: int = 48, - param_group: str = "default", - ) -> None: - """Create a new ECMWF Mars Client. - - Exposes a client for ECMWF's MARS API that conforms to the FetcherInterface. - - Args: - area: The area to fetch data for. Can be one of: - ["uk", "nw-india", "malta", "eu", "global"] - hours: The number of hours to fetch data for. Must be less than 90. - param_group: The parameter group to fetch data for. Can be one of: - ["default", "basic"] - """ - self.server = ECMWFService(service="mars", log=marsLogger) - - if area not in AREA_MAP: - raise KeyError(f"area must be one of {list(AREA_MAP.keys())}") - self.area = area - - self.hours = hours - - match param_group: - case "basic": - log.debug(event="Initialising ECMWF Client with basic parameter group") - self.desired_params = ["167.128", "169.128"] # 2 Metre Temperature, Dswrf - case _: - self.desired_params = list(PARAMETER_ECMWFCODE_MAP.keys()) - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"ECMWF_{self.area.upper()}" - - def getInitHours(self) -> list[int]: # noqa: D102 - # MARS data of the operational archive is available at 00 and 12 UTC - return [0, 12] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - # MARS requests can only ask for data that is more than 24 hours old: see - # https://confluence.ecmwf.int/display/UDOC/MARS+access+restrictions - if it > dt.datetime.now(tz=dt.UTC) - dt.timedelta(hours=24): - raise ValueError( - "ECMWF MARS requests can only ask for data that is more than 24 hours old", - ) - return [] - - tf = tempfile.NamedTemporaryFile(suffix=".txt", delete=False) - - with open(tf.name, "w") as f: - req: str = self._buildMarsRequest( - list_only=True, - it=it, - target=tf.name, - params=self.desired_params, - steps=list(range(0, self.hours)), - ) - - log.debug(event="listing ECMWF MARS inittime data", request=req, inittime=it) - - try: - self.server.execute(req=req, target=tf.name) - except ecmwfapi.api.APIException as e: - log.warn("error listing ECMWF MARS inittime data", error=e) - return [] - - # Explicitly check that the MARS listing file is readable and non-empty - if (os.access(tf.name, os.R_OK) is False) or (os.stat(tf.name).st_size < 100): - log.warn( - event="ECMWF filelisting is empty, check error logs", - filepath=tf.name, - ) - return [] - - # Ensure only available parameters are requested by populating the - # `available_params` list according to the result of the list request - with open(tf.name) 
as f: - file_contents: str = f.read() - available_data = _parseListing(fileData=file_contents) - for parameter in self.desired_params: - if parameter not in available_data["params"]: - log.warn( - event=f"ECMWF MARS inittime data does not contain parameter {parameter}", - parameter=parameter, - inittime=it, - ) - - log.debug( - event="Listed raw files for ECMWF MARS inittime", - inittime=it, - available_params=available_data["params"], - ) - - # Clean up the temporary file - tf.close() - os.unlink(tf.name) - - return [ - ECMWFMarsFileInfo( - inittime=it, - area=self.area, - params=available_data["params"], - steplist=available_data["steps"], - ), - ] - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - - req: str = self._buildMarsRequest( - list_only=False, - it=fi.it(), - target=cfp.as_posix(), - params=fi.variables(), - steps=fi.steps(), - ) - - log.debug( - event="fetching ECMWF MARS data", - request=req, - inittime=fi.it(), - filename=fi.filename(), - ) - - try: - self.server.execute(req=req, target=cfp.as_posix()) - except ecmwfapi.api.APIException as e: - log.warn("error fetching ECMWF MARS data", error=e) - return pathlib.Path() - - if cfp.exists() is False: - log.warn("ECMWF data file does not exist", filepath=cfp.as_posix()) - return pathlib.Path() - - log.debug( - event="fetched all data from MARS", - filename=fi.filename(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: - """Overrides the corresponding method in the parent class.""" - if p.suffix != ".grib": - log.warn(event="cannot map non-grib file to dataset", filepath=p.as_posix()) - return xr.Dataset() - - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Load the wholesale file as a list of datasets - # * cfgrib loads multiple hypercubes for a single multi-parameter grib file - # * Can also set backend_kwargs={"indexpath": ""}, to avoid the index file - try: - datasets: list[xr.Dataset] = cfgrib.open_datasets( - path=p.as_posix(), - chunks={ - "time": 1, - "step": -1, - "longitude": "auto", - "latitude": "auto", - }, - backend_kwargs={"indexpath": ""}, - ) - except Exception as e: - log.warn(event="error converting raw file to dataset", filepath=p.as_posix(), error=e) - return xr.Dataset() - - # Merge the datasets back into one - wholesaleDataset = xr.merge( - objects=datasets, - compat="override", - combine_attrs="drop_conflicts", - ) - del datasets - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - wholesaleDataset = ( - wholesaleDataset.rename({"time": "init_time"}) - .expand_dims("init_time") - .transpose("init_time", "step", "latitude", "longitude") - .sortby("step") - .chunk( - { - "init_time": 1, - "step": -1, - "latitude": len(wholesaleDataset.latitude) // 2, - "longitude": len(wholesaleDataset.longitude) // 2, - }, - ) - ) - - return wholesaleDataset - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides the corresponding method in the parent class.""" - return { - "tas": internal.OCFParameter.TemperatureAGL, - "t2m": internal.OCFParameter.TemperatureAGL, - "uas": internal.OCFParameter.WindUComponentAGL, - "vas": internal.OCFParameter.WindVComponentAGL, - "dsrp": internal.OCFParameter.DirectSolarRadiation, - 
"uvb": internal.OCFParameter.DownwardUVRadiationAtSurface, - "hcc": internal.OCFParameter.HighCloudCover, - "mcc": internal.OCFParameter.MediumCloudCover, - "lcc": internal.OCFParameter.LowCloudCover, - "clt": internal.OCFParameter.TotalCloudCover, - "ssrd": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "strd": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "tprate": internal.OCFParameter.RainPrecipitationRate, - "sd": internal.OCFParameter.SnowDepthWaterEquivalent, - "u100": internal.OCFParameter.WindUComponent100m, - "v100": internal.OCFParameter.WindVComponent100m, - "u200": internal.OCFParameter.WindUComponent200m, - "v200": internal.OCFParameter.WindVComponent200m, - "vis": internal.OCFParameter.VisibilityAGL, - } - - def _buildMarsRequest( - self, - *, - list_only: bool, - it: dt.datetime, - target: str, - params: list[str], - steps: list[int], - ) -> str: - """Build a MARS request according to the parameters of the client. - - Args: - list_only: Whether to build a request that only lists the files that match - the request, or whether to build a request that downloads the files - that match the request. - it: The initialisation time to request data for. - target: The path to the target file to write the data to. - params: The parameters to request data for. - steps: The steps to request data for. - - Returns: - The MARS request. - """ - marsReq: str = f""" - {"list" if list_only else "retrieve"}, - class = od, - date = {it.strftime("%Y%m%d")}, - expver = 1, - levtype = sfc, - param = {'/'.join(params)}, - step = {'/'.join(map(str, steps))}, - stream = oper, - time = {it.strftime("%H")}, - type = fc, - area = {AREA_MAP[self.area]}, - grid = 0.1/0.1, - target = "{target}" - """ - - return inspect.cleandoc(marsReq) - - -def _parseListing(fileData: str) -> dict[str, list[str] | list[int]]: - """Parse the response from a MARS list request. - - When calling LIST to MARS, the response is a file containing the available - parameters, steps, times and sizes etc. This function parses the file to - extract the available parameters. - - The files contains some metadata, followed by a table as follows: - - ``` - file length missing offset param step - 0 13204588 . 149401026 20.3 0 - 0 13204588 . 502365532 47.128 0 - 0 13204588 . 568388472 57.128 0 - 0 19804268 . 911707760 141.128 0 - 0 13204588 . 1050353320 164.128 0 - - Grand Total - ``` - - This function uses positive lookahead and lookbehind regex to extract the - lines between the table header and the "Grand Total" line. The fourth - column of each line is the parameter. The fifth is the step. - - Args: - fileData: The data from the file. - - Returns: - A dict of parameters and steps available in the remote file. - """ - tablematch = re.search( - pattern=r"(? 4: - out["steps"].add(int(line.split()[5])) - out["params"].add(line.split()[4]) - out = {k: sorted(list(v)) for k, v in out.items()} - return out diff --git a/src/nwp_consumer/internal/inputs/ecmwf/s3.py b/src/nwp_consumer/internal/inputs/ecmwf/s3.py deleted file mode 100644 index aefc08bf..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/s3.py +++ /dev/null @@ -1,201 +0,0 @@ -"""Input covering an OCF-specific use case of pulling ECMWF data from an s3 bucket.""" - -import datetime as dt -import pathlib -import typing - -import cfgrib -import s3fs -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._models import ECMWFLiveFileInfo -from ... 
import OCFParameter - -log = structlog.getLogger() - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("time", "step", "latitude", "longitude") - - -class S3Client(internal.FetcherInterface): - """Implements a client to fetch ECMWF data from S3.""" - - area: str - desired_params: list[str] - bucket: pathlib.Path - - __fs: s3fs.S3FileSystem - - bucketPath: str = "ecmwf" - - def __init__( - self, - bucket: str, - region: str, - area: str = "uk", - key: str | None = "", - secret: str | None = "", - endpointURL: str = "", - ) -> None: - """Creates a new ECMWF S3 client. - - Exposes a client for fetching ECMWF data from an S3 bucket conforming to the - FetcherInterface. ECMWF S3 data is order-based, so parameters and steps cannot be - requested by this client. - - Args: - bucket: The name of the S3 bucket to fetch data from. - region: The AWS region to connect to. - key: The AWS access key to use for authentication. - secret: The AWS secret key to use for authentication. - area: The area for which to fetch data. - endpointURL: The endpoint URL to use for the S3 connection. - """ - if (key, secret) == ("", ""): - log.info( - event="attempting AWS connection using default credentials", - ) - key, secret = None, None - - self.__fs: s3fs.S3FileSystem = s3fs.S3FileSystem( - key=key, - secret=secret, - client_kwargs={ - "region_name": region, - "endpoint_url": None if endpointURL == "" else endpointURL, - }, - ) - self.area = area - self.bucket = pathlib.Path(bucket) - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"ECMWF_{self.area}".upper() - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: - """Overrides the corresponding method in the parent class.""" - allFiles: list[str] = self.__fs.ls((self.bucket / self.bucketPath).as_posix()) - # List items are of the form "bucket/folder/filename, so extract just the filename - initTimeFiles: list[internal.FileInfoModel] = [ - ECMWFLiveFileInfo(fname=pathlib.Path(file).name) - for file in allFiles - if it.strftime("A2D%m%d%H") in file - ] - return initTimeFiles - - def downloadToCache( - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - """Overrides the corresponding method in the parent class.""" - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with open(cfp, "wb") as f, self.__fs.open( - (self.bucket / fi.filepath()).as_posix(), - "rb", - ) as s: - for chunk in iter(lambda: s.read(12 * 1024), b""): - f.write(chunk) - f.flush() - - if not cfp.exists(): - log.warn(event="Failed to download file", filepath=fi.filepath()) - return pathlib.Path() - - # Check the sizes are the same - s3size = self.__fs.info((self.bucket / fi.filepath()).as_posix())["size"] - if cfp.stat().st_size != s3size: - log.warn( - event="Downloaded file size does not match expected size", - expected=s3size, - actual=cfp.stat().st_size, - ) - return pathlib.Path() - - return cfp - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: - """Overrides the corresponding method in the parent class.""" - all_dss: list[xr.Dataset] = cfgrib.open_datasets(p.as_posix()) - area_dss: list[xr.Dataset] = _filterDatasetsByArea(all_dss, self.area) - if len(area_dss) == 0: - log.warn( - event="No datasets found for area", - area=self.area, - file=p, - file_datasets=len(all_dss), - ) - return xr.Dataset() - - ds: xr.Dataset = xr.merge(area_dss, combine_attrs="drop_conflicts") - del area_dss, all_dss - - ds = ds.drop_vars( - names=[v for v in ds.coords if v not 
in COORDINATE_ALLOW_LIST], - errors="ignore", - ) - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - ds = ( - ds.rename({"time": "init_time"}) - .expand_dims("init_time") - .expand_dims("step") - .transpose("init_time", "step", "latitude", "longitude") - .sortby("step") - .chunk( - { - "init_time": 1, - "step": -1, - "latitude": len(ds.latitude) // 2, - "longitude": len(ds.longitude) // 2, - }, - ) - ) - - return ds - - def getInitHours(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - return [0, 6, 12, 18] - - def parameterConformMap(self) -> dict[str, OCFParameter]: - """Overrides the corresponding method in the parent class.""" - return { - "dsrp": internal.OCFParameter.DirectSolarRadiation, - "uvb": internal.OCFParameter.DownwardUVRadiationAtSurface, - "sd": internal.OCFParameter.SnowDepthWaterEquivalent, - "tcc": internal.OCFParameter.TotalCloudCover, - "clt": internal.OCFParameter.TotalCloudCover, - "u10": internal.OCFParameter.WindUComponentAGL, - "v10": internal.OCFParameter.WindVComponentAGL, - "t2m": internal.OCFParameter.TemperatureAGL, - "ssrd": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "strd": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "lcc": internal.OCFParameter.LowCloudCover, - "mcc": internal.OCFParameter.MediumCloudCover, - "hcc": internal.OCFParameter.HighCloudCover, - "vis": internal.OCFParameter.VisibilityAGL, - "u200": internal.OCFParameter.WindUComponent200m, - "v200": internal.OCFParameter.WindVComponent200m, - "u100": internal.OCFParameter.WindUComponent100m, - "v100": internal.OCFParameter.WindVComponent100m, - "tprate": internal.OCFParameter.RainPrecipitationRate, - } - - -def _filterDatasetsByArea(dss: list[xr.Dataset], area: str) -> list[xr.Dataset]: - """Filters a list of datasets by area.""" - match area: - case "uk": - return list(filter(lambda ds: ds.coords["latitude"].as_numpy().max() == 60, dss)) - case "nw-india": - return list(filter(lambda ds: ds.coords["latitude"].as_numpy().max() == 31, dss)) - case "india": - return list(filter(lambda ds: ds.coords["latitude"].as_numpy().max() == 35, dss)) - case _: - log.warn(event="Unknown area", area=area) - return [] diff --git a/src/nwp_consumer/internal/inputs/ecmwf/test_2params.grib b/src/nwp_consumer/internal/inputs/ecmwf/test_2params.grib deleted file mode 100644 index 7ddaab6e..00000000 Binary files a/src/nwp_consumer/internal/inputs/ecmwf/test_2params.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/ecmwf/test_list_response.txt b/src/nwp_consumer/internal/inputs/ecmwf/test_list_response.txt deleted file mode 100644 index 0fa26e8c..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/test_list_response.txt +++ /dev/null @@ -1,753 +0,0 @@ -class = od -date = 2017-09-11 -expver = 1 -file[0] = hpss:/mars/prod/od/o/oper/fc/sfc/marsodoper/0001/fc/20170911/sfc/1200/879664.20170927.205633 -id = 879664 -levtype = sfc -month = 201709 -stream = oper -time = 12:00:00 -type = fc -year = 2017 -file length missing offset param step -0 13204588 . 149401026 20.3 0 -0 13204588 . 502365532 47.128 0 -0 13204588 . 568388472 57.128 0 -0 19804268 . 911707760 141.128 0 -0 13204588 . 1050353320 164.128 0 -0 13204588 . 1063557908 165.128 0 -0 13204588 . 1076762496 166.128 0 -0 13204588 . 1089967084 167.128 0 -0 13204588 . 1116376260 169.128 0 -0 13204588 . 1155990024 175.128 0 -0 13204588 . 
1274831316 186.128 0 -0 13204588 . 1288035904 187.128 0 -0 13204588 . 1301240492 188.128 0 -0 13204588 . 1822819104 246.228 0 -0 13204588 . 1836023692 247.228 0 -0 13204588 . 2059730628 20.3 1 -0 13204588 . 2373076942 47.128 1 -0 13204588 . 2439099882 57.128 1 -0 19804268 . 2742805406 141.128 1 -0 13204588 . 2881450966 164.128 1 -0 13204588 . 2894655554 165.128 1 -0 13204588 . 2907860142 166.128 1 -0 13204588 . 2921064730 167.128 1 -0 13204588 . 2947473906 169.128 1 -0 13204588 . 2987087670 175.128 1 -0 13204588 . 3105928962 186.128 1 -0 13204588 . 3119133550 187.128 1 -0 13204588 . 3132338138 188.128 1 -0 13204588 . 3601098398 246.228 1 -0 13204588 . 3614302986 247.228 1 -0 13204588 . 3838001328 20.3 2 -0 13204588 . 4151280934 47.128 2 -0 13204588 . 4217303874 57.128 2 -0 19804268 . 4521009398 141.128 2 -0 13204588 . 4659654958 164.128 2 -0 13204588 . 4672859546 165.128 2 -0 13204588 . 4686064134 166.128 2 -0 13204588 . 4699268722 167.128 2 -0 13204588 . 4725677898 169.128 2 -0 13204588 . 4765291662 175.128 2 -0 13204588 . 4884132954 186.128 2 -0 13204588 . 4897337542 187.128 2 -0 13204588 . 4910542130 188.128 2 -0 13204588 . 5379302390 246.228 2 -0 13204588 . 5392506978 247.228 2 -0 13204588 . 5616173368 20.3 3 -0 13204588 . 5968973866 47.128 3 -0 13204588 . 6034996806 57.128 3 -0 19804268 . 6338702330 141.128 3 -0 13204588 . 6477347890 164.128 3 -0 13204588 . 6490552478 165.128 3 -0 13204588 . 6503757066 166.128 3 -0 13204588 . 6516961654 167.128 3 -0 13204588 . 6543370830 169.128 3 -0 13204588 . 6582984594 175.128 3 -0 13204588 . 6701825886 186.128 3 -0 13204588 . 6715030474 187.128 3 -0 13204588 . 6728235062 188.128 3 -0 13204588 . 7223404498 246.228 3 -0 13204588 . 7236609086 247.228 3 -0 13204588 . 7460219078 20.3 4 -0 13204588 . 7773355806 47.128 4 -0 13204588 . 7839378746 57.128 4 -0 19804268 . 8143084270 141.128 4 -0 13204588 . 8281729830 164.128 4 -0 13204588 . 8294934418 165.128 4 -0 13204588 . 8308139006 166.128 4 -0 13204588 . 8321343594 167.128 4 -0 13204588 . 8347752770 169.128 4 -0 13204588 . 8387366534 175.128 4 -0 13204588 . 8506207826 186.128 4 -0 13204588 . 8519412414 187.128 4 -0 13204588 . 8532617002 188.128 4 -0 13204588 . 9001377262 246.228 4 -0 13204588 . 9014581850 247.228 4 -0 13204588 . 9238129714 20.3 5 -0 13204588 . 9551226076 47.128 5 -0 13204588 . 9617249016 57.128 5 -0 19804268 . 9920954540 141.128 5 -0 13204588 . 10059600100 164.128 5 -0 13204588 . 10072804688 165.128 5 -0 13204588 . 10086009276 166.128 5 -0 13204588 . 10099213864 167.128 5 -0 13204588 . 10125623040 169.128 5 -0 13204588 . 10165236804 175.128 5 -0 13204588 . 10284078096 186.128 5 -0 13204588 . 10297282684 187.128 5 -0 13204588 . 10310487272 188.128 5 -0 13204588 . 10779247532 246.228 5 -0 13204588 . 10792452120 247.228 5 -0 13204588 . 11015964730 20.3 6 -0 13204588 . 11368638958 47.128 6 -0 13204588 . 11434661898 57.128 6 -0 19804268 . 11777981186 141.128 6 -0 13204588 . 11916626746 164.128 6 -0 13204588 . 11929831334 165.128 6 -0 13204588 . 11943035922 166.128 6 -0 13204588 . 11956240510 167.128 6 -0 13204588 . 11982649686 169.128 6 -0 13204588 . 12022263450 175.128 6 -0 13204588 . 12141104742 186.128 6 -0 13204588 . 12154309330 187.128 6 -0 13204588 . 12167513918 188.128 6 -0 13204588 . 12689092530 246.228 6 -0 13204588 . 12702297118 247.228 6 -0 13204588 . 12925782934 20.3 7 -0 13204588 . 13238809550 47.128 7 -0 13204588 . 13304832490 57.128 7 -0 19804268 . 13608538014 141.128 7 -0 13204588 . 13747183574 164.128 7 -0 13204588 . 13760388162 165.128 7 -0 13204588 . 
13773592750 166.128 7 -0 13204588 . 13786797338 167.128 7 -0 13204588 . 13813206514 169.128 7 -0 13204588 . 13852820278 175.128 7 -0 13204588 . 13971661570 186.128 7 -0 13204588 . 13984866158 187.128 7 -0 13204588 . 13998070746 188.128 7 -0 13204588 . 14466831006 246.228 7 -0 13204588 . 14480035594 247.228 7 -0 13204588 . 14703501378 20.3 8 -0 13204588 . 15016493576 47.128 8 -0 13204588 . 15082516516 57.128 8 -0 19804268 . 15386222040 141.128 8 -0 13204588 . 15524867600 164.128 8 -0 13204588 . 15538072188 165.128 8 -0 13204588 . 15551276776 166.128 8 -0 13204588 . 15564481364 167.128 8 -0 13204588 . 15590890540 169.128 8 -0 13204588 . 15630504304 175.128 8 -0 13204588 . 15749345596 186.128 8 -0 13204588 . 15762550184 187.128 8 -0 13204588 . 15775754772 188.128 8 -0 13204588 . 16244515032 246.228 8 -0 13204588 . 16257719620 247.228 8 -0 13204588 . 16481180904 20.3 9 -0 13204588 . 16833774492 47.128 9 -0 13204588 . 16899797432 57.128 9 -0 19804268 . 17203502956 141.128 9 -0 13204588 . 17342148516 164.128 9 -0 13204588 . 17355353104 165.128 9 -0 13204588 . 17368557692 166.128 9 -0 13204588 . 17381762280 167.128 9 -0 13204588 . 17408171456 169.128 9 -0 13204588 . 17447785220 175.128 9 -0 13204588 . 17566626512 186.128 9 -0 13204588 . 17579831100 187.128 9 -0 13204588 . 17593035688 188.128 9 -0 13204588 . 18088205124 246.228 9 -0 13204588 . 18101409712 247.228 9 -0 13204588 . 18324865646 20.3 10 -0 13204588 . 18637850684 47.128 10 -0 13204588 . 18703873624 57.128 10 -0 19804268 . 19007579148 141.128 10 -0 13204588 . 19146224708 164.128 10 -0 13204588 . 19159429296 165.128 10 -0 13204588 . 19172633884 166.128 10 -0 13204588 . 19185838472 167.128 10 -0 13204588 . 19212247648 169.128 10 -0 13204588 . 19251861412 175.128 10 -0 13204588 . 19370702704 186.128 10 -0 13204588 . 19383907292 187.128 10 -0 13204588 . 19397111880 188.128 10 -0 13204588 . 19865872140 246.228 10 -0 13204588 . 19879076728 247.228 10 -0 13204588 . 20102545664 20.3 11 -0 13204588 . 20415549110 47.128 11 -0 13204588 . 20481572050 57.128 11 -0 19804268 . 20785277574 141.128 11 -0 13204588 . 20923923134 164.128 11 -0 13204588 . 20937127722 165.128 11 -0 13204588 . 20950332310 166.128 11 -0 13204588 . 20963536898 167.128 11 -0 13204588 . 20989946074 169.128 11 -0 13204588 . 21029559838 175.128 11 -0 13204588 . 21148401130 186.128 11 -0 13204588 . 21161605718 187.128 11 -0 13204588 . 21174810306 188.128 11 -0 13204588 . 21643570566 246.228 11 -0 13204588 . 21656775154 247.228 11 -0 13204588 . 21880262776 20.3 12 -0 13204588 . 22232906434 47.128 12 -0 13204588 . 22298929374 57.128 12 -0 19804268 . 22642248662 141.128 12 -0 13204588 . 22780894222 164.128 12 -0 13204588 . 22794098810 165.128 12 -0 13204588 . 22807303398 166.128 12 -0 13204588 . 22820507986 167.128 12 -0 13204588 . 22846917162 169.128 12 -0 13204588 . 22886530926 175.128 12 -0 13204588 . 23005372218 186.128 12 -0 13204588 . 23018576806 187.128 12 -0 13204588 . 23031781394 188.128 12 -0 13204588 . 23553360006 246.228 12 -0 13204588 . 23566564594 247.228 12 -0 13204588 . 23790080002 20.3 13 -0 13204588 . 24103127738 47.128 13 -0 13204588 . 24169150678 57.128 13 -0 19804268 . 24472856202 141.128 13 -0 13204588 . 24611501762 164.128 13 -0 13204588 . 24624706350 165.128 13 -0 13204588 . 24637910938 166.128 13 -0 13204588 . 24651115526 167.128 13 -0 13204588 . 24677524702 169.128 13 -0 13204588 . 24717138466 175.128 13 -0 13204588 . 24835979758 186.128 13 -0 13204588 . 24849184346 187.128 13 -0 13204588 . 24862388934 188.128 13 -0 13204588 . 
25331149194 246.228 13 -0 13204588 . 25344353782 247.228 13 -0 13204588 . 25567894082 20.3 14 -0 13204588 . 25880963710 47.128 14 -0 13204588 . 25946986650 57.128 14 -0 19804268 . 26250692174 141.128 14 -0 13204588 . 26389337734 164.128 14 -0 13204588 . 26402542322 165.128 14 -0 13204588 . 26415746910 166.128 14 -0 13204588 . 26428951498 167.128 14 -0 13204588 . 26455360674 169.128 14 -0 13204588 . 26494974438 175.128 14 -0 13204588 . 26613815730 186.128 14 -0 13204588 . 26627020318 187.128 14 -0 13204588 . 26640224906 188.128 14 -0 13204588 . 27108985166 246.228 14 -0 13204588 . 27122189754 247.228 14 -0 13204588 . 27345740392 20.3 15 -0 13204588 . 27698449984 47.128 15 -0 13204588 . 27764472924 57.128 15 -0 19804268 . 28068178448 141.128 15 -0 13204588 . 28206824008 164.128 15 -0 13204588 . 28220028596 165.128 15 -0 13204588 . 28233233184 166.128 15 -0 13204588 . 28246437772 167.128 15 -0 13204588 . 28272846948 169.128 15 -0 13204588 . 28312460712 175.128 15 -0 13204588 . 28431302004 186.128 15 -0 13204588 . 28444506592 187.128 15 -0 13204588 . 28457711180 188.128 15 -0 13204588 . 28952880616 246.228 15 -0 13204588 . 28966085204 247.228 15 -0 13204588 . 29189649042 20.3 16 -0 13204588 . 29502773520 47.128 16 -0 13204588 . 29568796460 57.128 16 -0 19804268 . 29872501984 141.128 16 -0 13204588 . 30011147544 164.128 16 -0 13204588 . 30024352132 165.128 16 -0 13204588 . 30037556720 166.128 16 -0 13204588 . 30050761308 167.128 16 -0 13204588 . 30077170484 169.128 16 -0 13204588 . 30116784248 175.128 16 -0 13204588 . 30235625540 186.128 16 -0 13204588 . 30248830128 187.128 16 -0 13204588 . 30262034716 188.128 16 -0 13204588 . 30730794976 246.228 16 -0 13204588 . 30743999564 247.228 16 -0 13204588 . 30967569542 20.3 17 -0 13204588 . 31280699790 47.128 17 -0 13204588 . 31346722730 57.128 17 -0 19804268 . 31650428254 141.128 17 -0 13204588 . 31789073814 164.128 17 -0 13204588 . 31802278402 165.128 17 -0 13204588 . 31815482990 166.128 17 -0 13204588 . 31828687578 167.128 17 -0 13204588 . 31855096754 169.128 17 -0 13204588 . 31894710518 175.128 17 -0 13204588 . 32013551810 186.128 17 -0 13204588 . 32026756398 187.128 17 -0 13204588 . 32039960986 188.128 17 -0 13204588 . 32508721246 246.228 17 -0 13204588 . 32521925834 247.228 17 -0 13204588 . 32745476662 20.3 18 -0 13204588 . 33098200614 47.128 18 -0 13204588 . 33164223554 57.128 18 -0 19804268 . 33507542842 141.128 18 -0 13204588 . 33646188402 164.128 18 -0 13204588 . 33659392990 165.128 18 -0 13204588 . 33672597578 166.128 18 -0 13204588 . 33685802166 167.128 18 -0 13204588 . 33712211342 169.128 18 -0 13204588 . 33751825106 175.128 18 -0 13204588 . 33870666398 186.128 18 -0 13204588 . 33883870986 187.128 18 -0 13204588 . 33897075574 188.128 18 -0 13204588 . 34418654186 246.228 18 -0 13204588 . 34431858774 247.228 18 -0 13204588 . 34655395052 20.3 19 -0 13204588 . 34968483438 47.128 19 -0 13204588 . 35034506378 57.128 19 -0 19804268 . 35338211902 141.128 19 -0 13204588 . 35476857462 164.128 19 -0 13204588 . 35490062050 165.128 19 -0 13204588 . 35503266638 166.128 19 -0 13204588 . 35516471226 167.128 19 -0 13204588 . 35542880402 169.128 19 -0 13204588 . 35582494166 175.128 19 -0 13204588 . 35701335458 186.128 19 -0 13204588 . 35714540046 187.128 19 -0 13204588 . 35727744634 188.128 19 -0 13204588 . 36196504894 246.228 19 -0 13204588 . 36209709482 247.228 19 -0 13204588 . 36433237374 20.3 20 -0 13204588 . 36746346620 47.128 20 -0 13204588 . 36812369560 57.128 20 -0 19804268 . 37116075084 141.128 20 -0 13204588 . 37254720644 164.128 20 -0 13204588 . 
37267925232 165.128 20 -0 13204588 . 37281129820 166.128 20 -0 13204588 . 37294334408 167.128 20 -0 13204588 . 37320743584 169.128 20 -0 13204588 . 37360357348 175.128 20 -0 13204588 . 37479198640 186.128 20 -0 13204588 . 37492403228 187.128 20 -0 13204588 . 37505607816 188.128 20 -0 13204588 . 37974368076 246.228 20 -0 13204588 . 37987572664 247.228 20 -0 13204588 . 38211115288 20.3 21 -0 13204588 . 38563846038 47.128 21 -0 13204588 . 38629868978 57.128 21 -0 19804268 . 38933574502 141.128 21 -0 13204588 . 39072220062 164.128 21 -0 13204588 . 39085424650 165.128 21 -0 13204588 . 39098629238 166.128 21 -0 13204588 . 39111833826 167.128 21 -0 13204588 . 39138243002 169.128 21 -0 13204588 . 39177856766 175.128 21 -0 13204588 . 39296698058 186.128 21 -0 13204588 . 39309902646 187.128 21 -0 13204588 . 39323107234 188.128 21 -0 13204588 . 39818276670 246.228 21 -0 13204588 . 39831481258 247.228 21 -0 13204588 . 40055020832 20.3 22 -0 13204588 . 40368123314 47.128 22 -0 13204588 . 40434146254 57.128 22 -0 19804268 . 40737851778 141.128 22 -0 13204588 . 40876497338 164.128 22 -0 13204588 . 40889701926 165.128 22 -0 13204588 . 40902906514 166.128 22 -0 13204588 . 40916111102 167.128 22 -0 13204588 . 40942520278 169.128 22 -0 13204588 . 40982134042 175.128 22 -0 13204588 . 41100975334 186.128 22 -0 13204588 . 41114179922 187.128 22 -0 13204588 . 41127384510 188.128 22 -0 13204588 . 41596144770 246.228 22 -0 13204588 . 41609349358 247.228 22 -0 13204588 . 41832865678 20.3 23 -0 13204588 . 42145953858 47.128 23 -0 13204588 . 42211976798 57.128 23 -0 19804268 . 42515682322 141.128 23 -0 13204588 . 42654327882 164.128 23 -0 13204588 . 42667532470 165.128 23 -0 13204588 . 42680737058 166.128 23 -0 13204588 . 42693941646 167.128 23 -0 13204588 . 42720350822 169.128 23 -0 13204588 . 42759964586 175.128 23 -0 13204588 . 42878805878 186.128 23 -0 13204588 . 42892010466 187.128 23 -0 13204588 . 42905215054 188.128 23 -0 13204588 . 43373975314 246.228 23 -0 13204588 . 43387179902 247.228 23 -0 13204588 . 43610686714 20.3 24 -0 13204588 . 43963368300 47.128 24 -0 13204588 . 44029391240 57.128 24 -0 19804268 . 44372710528 141.128 24 -0 13204588 . 44511356088 164.128 24 -0 13204588 . 44524560676 165.128 24 -0 13204588 . 44537765264 166.128 24 -0 13204588 . 44550969852 167.128 24 -0 13204588 . 44577379028 169.128 24 -0 13204588 . 44616992792 175.128 24 -0 13204588 . 44735834084 186.128 24 -0 13204588 . 44749038672 187.128 24 -0 13204588 . 44762243260 188.128 24 -0 13204588 . 45283821872 246.228 24 -0 13204588 . 45297026460 247.228 24 -0 13204588 . 45520524088 20.3 25 -0 13204588 . 45833552868 47.128 25 -0 13204588 . 45899575808 57.128 25 -0 19804268 . 46203281332 141.128 25 -0 13204588 . 46341926892 164.128 25 -0 13204588 . 46355131480 165.128 25 -0 13204588 . 46368336068 166.128 25 -0 13204588 . 46381540656 167.128 25 -0 13204588 . 46407949832 169.128 25 -0 13204588 . 46447563596 175.128 25 -0 13204588 . 46566404888 186.128 25 -0 13204588 . 46579609476 187.128 25 -0 13204588 . 46592814064 188.128 25 -0 13204588 . 47061574324 246.228 25 -0 13204588 . 47074778912 247.228 25 -0 13204588 . 47298259896 20.3 26 -0 13204588 . 47611250486 47.128 26 -0 13204588 . 47677273426 57.128 26 -0 19804268 . 47980978950 141.128 26 -0 13204588 . 48119624510 164.128 26 -0 13204588 . 48132829098 165.128 26 -0 13204588 . 48146033686 166.128 26 -0 13204588 . 48159238274 167.128 26 -0 13204588 . 48185647450 169.128 26 -0 13204588 . 48225261214 175.128 26 -0 13204588 . 48344102506 186.128 26 -0 13204588 . 
48357307094 187.128 26 -0 13204588 . 48370511682 188.128 26 -0 13204588 . 48839271942 246.228 26 -0 13204588 . 48852476530 247.228 26 -0 13204588 . 49075946602 20.3 27 -0 13204588 . 49428498458 47.128 27 -0 13204588 . 49494521398 57.128 27 -0 19804268 . 49798226922 141.128 27 -0 13204588 . 49936872482 164.128 27 -0 13204588 . 49950077070 165.128 27 -0 13204588 . 49963281658 166.128 27 -0 13204588 . 49976486246 167.128 27 -0 13204588 . 50002895422 169.128 27 -0 13204588 . 50042509186 175.128 27 -0 13204588 . 50161350478 186.128 27 -0 13204588 . 50174555066 187.128 27 -0 13204588 . 50187759654 188.128 27 -0 13204588 . 50682929090 246.228 27 -0 13204588 . 50696133678 247.228 27 -0 13204588 . 50919591620 20.3 28 -0 13204588 . 51232499532 47.128 28 -0 13204588 . 51298522472 57.128 28 -0 19804268 . 51602227996 141.128 28 -0 13204588 . 51740873556 164.128 28 -0 13204588 . 51754078144 165.128 28 -0 13204588 . 51767282732 166.128 28 -0 13204588 . 51780487320 167.128 28 -0 13204588 . 51806896496 169.128 28 -0 13204588 . 51846510260 175.128 28 -0 13204588 . 51965351552 186.128 28 -0 13204588 . 51978556140 187.128 28 -0 13204588 . 51991760728 188.128 28 -0 13204588 . 52460520988 246.228 28 -0 13204588 . 52473725576 247.228 28 -0 13204588 . 52697157692 20.3 29 -0 13204588 . 53010049064 47.128 29 -0 13204588 . 53076072004 57.128 29 -0 19804268 . 53379777528 141.128 29 -0 13204588 . 53518423088 164.128 29 -0 13204588 . 53531627676 165.128 29 -0 13204588 . 53544832264 166.128 29 -0 13204588 . 53558036852 167.128 29 -0 13204588 . 53584446028 169.128 29 -0 13204588 . 53624059792 175.128 29 -0 13204588 . 53742901084 186.128 29 -0 13204588 . 53756105672 187.128 29 -0 13204588 . 53769310260 188.128 29 -0 13204588 . 54238070520 246.228 29 -0 13204588 . 54251275108 247.228 29 -0 13204588 . 54474679350 20.3 30 -0 13204588 . 54827191178 47.128 30 -0 13204588 . 54893214118 57.128 30 -0 19804268 . 55236533406 141.128 30 -0 13204588 . 55375178966 164.128 30 -0 13204588 . 55388383554 165.128 30 -0 13204588 . 55401588142 166.128 30 -0 13204588 . 55414792730 167.128 30 -0 13204588 . 55441201906 169.128 30 -0 13204588 . 55480815670 175.128 30 -0 13204588 . 55599656962 186.128 30 -0 13204588 . 55612861550 187.128 30 -0 13204588 . 55626066138 188.128 30 -0 13204588 . 56147644750 246.228 30 -0 13204588 . 56160849338 247.228 30 -0 13204588 . 56384230992 20.3 31 -0 13204588 . 56697120466 47.128 31 -0 13204588 . 56763143406 57.128 31 -0 19804268 . 57066848930 141.128 31 -0 13204588 . 57205494490 164.128 31 -0 13204588 . 57218699078 165.128 31 -0 13204588 . 57231903666 166.128 31 -0 13204588 . 57245108254 167.128 31 -0 13204588 . 57271517430 169.128 31 -0 13204588 . 57311131194 175.128 31 -0 13204588 . 57429972486 186.128 31 -0 13204588 . 57443177074 187.128 31 -0 13204588 . 57456381662 188.128 31 -0 13204588 . 57925141922 246.228 31 -0 13204588 . 57938346510 247.228 31 -0 13204588 . 58161715180 20.3 32 -0 13204588 . 58474580614 47.128 32 -0 13204588 . 58540603554 57.128 32 -0 19804268 . 58844309078 141.128 32 -0 13204588 . 58982954638 164.128 32 -0 13204588 . 58996159226 165.128 32 -0 13204588 . 59009363814 166.128 32 -0 13204588 . 59022568402 167.128 32 -0 13204588 . 59048977578 169.128 32 -0 13204588 . 59088591342 175.128 32 -0 13204588 . 59207432634 186.128 32 -0 13204588 . 59220637222 187.128 32 -0 13204588 . 59233841810 188.128 32 -0 13204588 . 59702602070 246.228 32 -0 13204588 . 59715806658 247.228 32 -0 13204588 . 59939173910 20.3 33 -0 13204588 . 60291640332 47.128 33 -0 13204588 . 60357663272 57.128 33 -0 19804268 . 
60661368796 141.128 33 -0 13204588 . 60800014356 164.128 33 -0 13204588 . 60813218944 165.128 33 -0 13204588 . 60826423532 166.128 33 -0 13204588 . 60839628120 167.128 33 -0 13204588 . 60866037296 169.128 33 -0 13204588 . 60905651060 175.128 33 -0 13204588 . 61024492352 186.128 33 -0 13204588 . 61037696940 187.128 33 -0 13204588 . 61050901528 188.128 33 -0 13204588 . 61546070964 246.228 33 -0 13204588 . 61559275552 247.228 33 -0 13204588 . 61782632774 20.3 34 -0 13204588 . 62095478808 47.128 34 -0 13204588 . 62161501748 57.128 34 -0 19804268 . 62465207272 141.128 34 -0 13204588 . 62603852832 164.128 34 -0 13204588 . 62617057420 165.128 34 -0 13204588 . 62630262008 166.128 34 -0 13204588 . 62643466596 167.128 34 -0 13204588 . 62669875772 169.128 34 -0 13204588 . 62709489536 175.128 34 -0 13204588 . 62828330828 186.128 34 -0 13204588 . 62841535416 187.128 34 -0 13204588 . 62854740004 188.128 34 -0 13204588 . 63323500264 246.228 34 -0 13204588 . 63336704852 247.228 34 -0 13204588 . 63560060114 20.3 35 -0 13204588 . 63872912524 47.128 35 -0 13204588 . 63938935464 57.128 35 -0 19804268 . 64242640988 141.128 35 -0 13204588 . 64381286548 164.128 35 -0 13204588 . 64394491136 165.128 35 -0 13204588 . 64407695724 166.128 35 -0 13204588 . 64420900312 167.128 35 -0 13204588 . 64447309488 169.128 35 -0 13204588 . 64486923252 175.128 35 -0 13204588 . 64605764544 186.128 35 -0 13204588 . 64618969132 187.128 35 -0 13204588 . 64632173720 188.128 35 -0 13204588 . 65100933980 246.228 35 -0 13204588 . 65114138568 247.228 35 -0 13204588 . 65337505120 20.3 36 -0 13204588 . 65689975490 47.128 36 -0 13204588 . 65755998430 57.128 36 -0 19804268 . 66099317718 141.128 36 -0 13204588 . 66237963278 164.128 36 -0 13204588 . 66251167866 165.128 36 -0 13204588 . 66264372454 166.128 36 -0 13204588 . 66277577042 167.128 36 -0 13204588 . 66303986218 169.128 36 -0 13204588 . 66343599982 175.128 36 -0 13204588 . 66462441274 186.128 36 -0 13204588 . 66475645862 187.128 36 -0 13204588 . 66488850450 188.128 36 -0 13204588 . 67010429062 246.228 36 -0 13204588 . 67023633650 247.228 36 -0 13204588 . 67247021976 20.3 37 -0 13204588 . 67559886024 47.128 37 -0 13204588 . 67625908964 57.128 37 -0 19804268 . 67929614488 141.128 37 -0 13204588 . 68068260048 164.128 37 -0 13204588 . 68081464636 165.128 37 -0 13204588 . 68094669224 166.128 37 -0 13204588 . 68107873812 167.128 37 -0 13204588 . 68134282988 169.128 37 -0 13204588 . 68173896752 175.128 37 -0 13204588 . 68292738044 186.128 37 -0 13204588 . 68305942632 187.128 37 -0 13204588 . 68319147220 188.128 37 -0 13204588 . 68787907480 246.228 37 -0 13204588 . 68801112068 247.228 37 -0 13204588 . 69024509532 20.3 38 -0 13204588 . 69337403932 47.128 38 -0 13204588 . 69403426872 57.128 38 -0 19804268 . 69707132396 141.128 38 -0 13204588 . 69845777956 164.128 38 -0 13204588 . 69858982544 165.128 38 -0 13204588 . 69872187132 166.128 38 -0 13204588 . 69885391720 167.128 38 -0 13204588 . 69911800896 169.128 38 -0 13204588 . 69951414660 175.128 38 -0 13204588 . 70070255952 186.128 38 -0 13204588 . 70083460540 187.128 38 -0 13204588 . 70096665128 188.128 38 -0 13204588 . 70565425388 246.228 38 -0 13204588 . 70578629976 247.228 38 -0 13204588 . 70802043166 20.3 39 -0 13204588 . 71154568210 47.128 39 -0 13204588 . 71220591150 57.128 39 -0 19804268 . 71524296674 141.128 39 -0 13204588 . 71662942234 164.128 39 -0 13204588 . 71676146822 165.128 39 -0 13204588 . 71689351410 166.128 39 -0 13204588 . 71702555998 167.128 39 -0 13204588 . 71728965174 169.128 39 -0 13204588 . 
71768578938 175.128 39 -0 13204588 . 71887420230 186.128 39 -0 13204588 . 71900624818 187.128 39 -0 13204588 . 71913829406 188.128 39 -0 13204588 . 72408998842 246.228 39 -0 13204588 . 72422203430 247.228 39 -0 13204588 . 72645638886 20.3 40 -0 13204588 . 72958547860 47.128 40 -0 13204588 . 73024570800 57.128 40 -0 19804268 . 73328276324 141.128 40 -0 13204588 . 73466921884 164.128 40 -0 13204588 . 73480126472 165.128 40 -0 13204588 . 73493331060 166.128 40 -0 13204588 . 73506535648 167.128 40 -0 13204588 . 73532944824 169.128 40 -0 13204588 . 73572558588 175.128 40 -0 13204588 . 73691399880 186.128 40 -0 13204588 . 73704604468 187.128 40 -0 13204588 . 73717809056 188.128 40 -0 13204588 . 74186569316 246.228 40 -0 13204588 . 74199773904 247.228 40 -0 13204588 . 74423238860 20.3 41 -0 13204588 . 74736120340 47.128 41 -0 13204588 . 74802143280 57.128 41 -0 19804268 . 75105848804 141.128 41 -0 13204588 . 75244494364 164.128 41 -0 13204588 . 75257698952 165.128 41 -0 13204588 . 75270903540 166.128 41 -0 13204588 . 75284108128 167.128 41 -0 13204588 . 75310517304 169.128 41 -0 13204588 . 75350131068 175.128 41 -0 13204588 . 75468972360 186.128 41 -0 13204588 . 75482176948 187.128 41 -0 13204588 . 75495381536 188.128 41 -0 13204588 . 75964141796 246.228 41 -0 13204588 . 75977346384 247.228 41 -0 13204588 . 76200835622 20.3 42 -0 13204588 . 76553294646 47.128 42 -0 13204588 . 76619317586 57.128 42 -0 19804268 . 76962636874 141.128 42 -0 13204588 . 77101282434 164.128 42 -0 13204588 . 77114487022 165.128 42 -0 13204588 . 77127691610 166.128 42 -0 13204588 . 77140896198 167.128 42 -0 13204588 . 77167305374 169.128 42 -0 13204588 . 77206919138 175.128 42 -0 13204588 . 77325760430 186.128 42 -0 13204588 . 77338965018 187.128 42 -0 13204588 . 77352169606 188.128 42 -0 13204588 . 77873748218 246.228 42 -0 13204588 . 77886952806 247.228 42 -0 13204588 . 78110453028 20.3 43 -0 13204588 . 78423282710 47.128 43 -0 13204588 . 78489305650 57.128 43 -0 19804268 . 78793011174 141.128 43 -0 13204588 . 78931656734 164.128 43 -0 13204588 . 78944861322 165.128 43 -0 13204588 . 78958065910 166.128 43 -0 13204588 . 78971270498 167.128 43 -0 13204588 . 78997679674 169.128 43 -0 13204588 . 79037293438 175.128 43 -0 13204588 . 79156134730 186.128 43 -0 13204588 . 79169339318 187.128 43 -0 13204588 . 79182543906 188.128 43 -0 13204588 . 79651304166 246.228 43 -0 13204588 . 79664508754 247.228 43 -0 13204588 . 79888021814 20.3 44 -0 13204588 . 80200843328 47.128 44 -0 13204588 . 80266866268 57.128 44 -0 19804268 . 80570571792 141.128 44 -0 13204588 . 80709217352 164.128 44 -0 13204588 . 80722421940 165.128 44 -0 13204588 . 80735626528 166.128 44 -0 13204588 . 80748831116 167.128 44 -0 13204588 . 80775240292 169.128 44 -0 13204588 . 80814854056 175.128 44 -0 13204588 . 80933695348 186.128 44 -0 13204588 . 80946899936 187.128 44 -0 13204588 . 80960104524 188.128 44 -0 13204588 . 81428864784 246.228 44 -0 13204588 . 81442069372 247.228 44 -0 13204588 . 81665595122 20.3 45 -0 13204588 . 82018034998 47.128 45 -0 13204588 . 82084057938 57.128 45 -0 19804268 . 82387763462 141.128 45 -0 13204588 . 82526409022 164.128 45 -0 13204588 . 82539613610 165.128 45 -0 13204588 . 82552818198 166.128 45 -0 13204588 . 82566022786 167.128 45 -0 13204588 . 82592431962 169.128 45 -0 13204588 . 82632045726 175.128 45 -0 13204588 . 82750887018 186.128 45 -0 13204588 . 82764091606 187.128 45 -0 13204588 . 82777296194 188.128 45 -0 13204588 . 83272465630 246.228 45 -0 13204588 . 83285670218 247.228 45 -0 13204588 . 
83509197478 20.3 46 -0 13204588 . 83822017948 47.128 46 -0 13204588 . 83888040888 57.128 46 -0 19804268 . 84191746412 141.128 46 -0 13204588 . 84330391972 164.128 46 -0 13204588 . 84343596560 165.128 46 -0 13204588 . 84356801148 166.128 46 -0 13204588 . 84370005736 167.128 46 -0 13204588 . 84396414912 169.128 46 -0 13204588 . 84436028676 175.128 46 -0 13204588 . 84554869968 186.128 46 -0 13204588 . 84568074556 187.128 46 -0 13204588 . 84581279144 188.128 46 -0 13204588 . 85050039404 246.228 46 -0 13204588 . 85063243992 247.228 46 -0 13204588 . 85286768454 20.3 47 -0 13204588 . 85599585576 47.128 47 -0 13204588 . 85665608516 57.128 47 -0 19804268 . 85969314040 141.128 47 -0 13204588 . 86107959600 164.128 47 -0 13204588 . 86121164188 165.128 47 -0 13204588 . 86134368776 166.128 47 -0 13204588 . 86147573364 167.128 47 -0 13204588 . 86173982540 169.128 47 -0 13204588 . 86213596304 175.128 47 -0 13204588 . 86332437596 186.128 47 -0 13204588 . 86345642184 187.128 47 -0 13204588 . 86358846772 188.128 47 -0 13204588 . 86827607032 246.228 47 -0 13204588 . 86840811620 247.228 47 -0 13204588 . 87064340310 20.3 48 -0 13204588 . 87416750940 47.128 48 -0 13204588 . 87482773880 57.128 48 -0 19804268 . 87826093168 141.128 48 -0 13204588 . 87964738728 164.128 48 -0 13204588 . 87977943316 165.128 48 -0 13204588 . 87991147904 166.128 48 -0 13204588 . 88004352492 167.128 48 -0 13204588 . 88030761668 169.128 48 -0 13204588 . 88070375432 175.128 48 -0 13204588 . 88189216724 186.128 48 -0 13204588 . 88202421312 187.128 48 -0 13204588 . 88215625900 188.128 48 -0 13204588 . 88737204512 246.228 48 -0 13204588 . 88750409100 247.228 48 - -Grand Total: -============ - -Entries : 735 -Total : 10,028,756,500 (9.34001 Gbytes) diff --git a/src/nwp_consumer/internal/inputs/ecmwf/test_mars.py b/src/nwp_consumer/internal/inputs/ecmwf/test_mars.py deleted file mode 100644 index bc260abb..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/test_mars.py +++ /dev/null @@ -1,157 +0,0 @@ -"""Tests for the ecmwf module.""" - -import datetime as dt -import pathlib -import unittest.mock - -from .mars import PARAMETER_ECMWFCODE_MAP, MARSClient, _parseListing - -# --------- Test setup --------- # - -testMARSClient = MARSClient( - area="uk", - hours=48, -) - -test_list_response: str = """ -class = od -date = 2017-09-11 -expver = 1 -file[0] = hpss:/mars/prod/od/o/oper/fc/sfc/marsodoper/0001/fc/20170911/sfc/1200/879664.20170927.205633 -id = 879664 -levtype = sfc -month = 201709 -stream = oper -time = 12:00:00 -type = fc -year = 2017 -file length missing offset param step -0 13204588 . 1089967084 167.128 0 -0 13204588 . 1116376260 169.128 0 -0 13204588 . 2921064730 167.128 1 -0 13204588 . 2947473906 169.128 1 -0 13204588 . 4699268722 167.128 2 -0 13204588 . 4725677898 169.128 2 -0 13204588 . 6516961654 167.128 3 -0 13204588 . 
6543370830 169.128 3 - -Grand Total: -============ - -Entries : 8 -Total : 105,636,704 (100.743 Mbytes) -""" - - -# --------- Client methods --------- # - - -class TestECMWFMARSClient(unittest.TestCase): - """Tests for the ECMWFMARSClient method.""" - - def test_init(self) -> None: - with self.assertRaises(KeyError): - _ = MARSClient(area="not a valid area", hours=48) - - def test_mapCachedRaw(self) -> None: - testFilePath: pathlib.Path = pathlib.Path(__file__).parent / "test_2params.grib" - - out = testMARSClient.mapCachedRaw(p=testFilePath) - - # Ensure the dimensions have the right sizes - self.assertDictEqual( - {"init_time": 1, "step": 49, "latitude": 241, "longitude": 301}, - dict(out.sizes.items()), - ) - # Ensure the dimensions of the variables are in the correct order - self.assertEqual( - ("init_time", "step", "latitude", "longitude"), - out[next(iter(out.data_vars.keys()))].dims, - ) - # Ensure the correct datavars are in the dataset - self.assertCountEqual(["tprate", "sd"], list(out.data_vars.keys())) - - def test_buildMarsRequest(self) -> None: - testFilePath: pathlib.Path = pathlib.Path(__file__).parent / "test_2params.grib" - - # Test that the request is build correctly for the default client - testDefaultClient = MARSClient() - out = testDefaultClient._buildMarsRequest( - list_only=True, - target=testFilePath.as_posix(), - it=dt.datetime(2020, 1, 1, tzinfo=dt.UTC), - params=testDefaultClient.desired_params, - steps=range(4), - ) - - out.replace(" ", "") - lines = out.split("\n") - self.assertEqual(lines[0], "list,") - - d: dict = {} - for line in lines[1:]: - key, value = line.split("=") - d[key.strip()] = value.strip().replace(",", "") - - self.assertEqual(d["param"], "/".join(PARAMETER_ECMWFCODE_MAP.keys())) - self.assertEqual(d["date"], "20200101") - - # Test that the request is build correctly with the basic parameters - - testBasicClient = MARSClient( - area="uk", - hours=4, - param_group="basic", - ) - - out = testBasicClient._buildMarsRequest( - list_only=False, - target=testFilePath.as_posix(), - it=dt.datetime(2020, 1, 1, tzinfo=dt.UTC), - params=testBasicClient.desired_params, - steps=range(4), - ) - - out.replace(" ", "") - lines = out.split("\n") - self.assertEqual(lines[0], "retrieve,") - - d2: dict = {} - for line in lines[1:]: - key, value = line.split("=") - d2[key.strip()] = value.strip().replace(",", "") - - self.assertEqual(d2["param"], "167.128/169.128") - self.assertEqual(d2["date"], "20200101") - - -# --------- Static methods --------- # - - -class TestParseAvailableParams(unittest.TestCase): - def test_parsesSmallFileCorrectly(self) -> None: - out = _parseListing(fileData=test_list_response) - - self.assertDictEqual( - { - "params": ["167.128", "169.128"], - "steps": [0, 1, 2, 3], - }, - out, - ) - - def test_parsesParamsCorrectly(self) -> None: - testFilePath: pathlib.Path = pathlib.Path(__file__).parent / "test_list_response.txt" - - filedata: str = testFilePath.read_text() - - out = _parseListing(fileData=filedata) - - self.maxDiff = None - self.assertDictEqual( - { - "params": ["141.128","164.128","165.128","166.128","167.128","169.128","175.128","186.128","187.128","188.128","20.3","246.228","247.228","47.128","57.128"], - "steps": list(range(0, 49)), - }, - out, - ) diff --git a/src/nwp_consumer/internal/inputs/ecmwf/test_multiarea.grib b/src/nwp_consumer/internal/inputs/ecmwf/test_multiarea.grib deleted file mode 100644 index 26837676..00000000 Binary files a/src/nwp_consumer/internal/inputs/ecmwf/test_multiarea.grib and /dev/null differ diff 
--git a/src/nwp_consumer/internal/inputs/ecmwf/test_s3.py b/src/nwp_consumer/internal/inputs/ecmwf/test_s3.py deleted file mode 100644 index 25801a77..00000000 --- a/src/nwp_consumer/internal/inputs/ecmwf/test_s3.py +++ /dev/null @@ -1,138 +0,0 @@ -"""Unit tests for the S3Client class.""" - -import datetime as dt -import unittest -from pathlib import Path - -import xarray as xr -from botocore.client import BaseClient as BotocoreClient -from botocore.session import Session -from moto.server import ThreadedMotoServer -import numpy as np - -from ._models import ECMWFLiveFileInfo -from .s3 import S3Client - -ENDPOINT_URL = "http://localhost:5000" -BUCKET = "test-bucket" -KEY = "test-key" -SECRET = "test-secret" # noqa: S105 -REGION = "us-east-1" - -RAW = Path("ecmwf") - - -class TestS3Client(unittest.TestCase): - testS3: BotocoreClient - client: S3Client - server: ThreadedMotoServer - - @classmethod - def setUpClass(cls) -> None: - # Start a local S3 server - cls.server = ThreadedMotoServer() - cls.server.start() - - session = Session() - cls.testS3 = session.create_client( - service_name="s3", - region_name=REGION, - endpoint_url=ENDPOINT_URL, - aws_access_key_id=KEY, - aws_secret_access_key=SECRET, - ) - - # Create a mock S3 bucket - cls.testS3.create_bucket( - Bucket=BUCKET, - ) - - # Create an instance of the S3Client class - cls.client = S3Client( - area="uk", - key=KEY, - secret=SECRET, - region=REGION, - bucket=BUCKET, - endpointURL=ENDPOINT_URL, - ) - - @classmethod - def tearDownClass(cls) -> None: - # Delete all objects in bucket - response = cls.testS3.list_objects_v2( - Bucket=BUCKET, - ) - if "Contents" in response: - for obj in response["Contents"]: - cls.testS3.delete_object( - Bucket=BUCKET, - Key=obj["Key"], - ) - cls.server.stop() - - def test_listFilesForInitTime(self) -> None: - files = [ - "A2D01010000010100001", - "A2D01010000010101001", - "A2D01010000010102011", - "A2D01010000010103001", - "A2D01011200010112001", # Different init time - "A2D02191200010112001", # Leap year on 2024-02-29 - ] - for file in files: - # Create files in the mock bucket - self.testS3.put_object( - Bucket=BUCKET, - Key=(RAW / file).as_posix(), - Body=b"test", - ) - - # Test the listFilesForInitTime method - initTime = dt.datetime(2021, 1, 1, 0, 0, 0, tzinfo=dt.UTC) - out = self.client.listRawFilesForInitTime(it=initTime) - self.assertEqual(len(out), 4) - - def test_downloadRawFile(self) -> None: - # Create a file in the mock bucket - self.testS3.put_object( - Bucket=BUCKET, - Key=(RAW / "A2D01010000010100001").as_posix(), - Body=b"test", - ) - - # Test the downloadRawFile method - out = self.client.downloadToCache(fi=ECMWFLiveFileInfo(fname="A2D01010000010100001")) - self.assertEqual(out.read_bytes(), b"test") - - out.unlink() - - def test_mapCached(self) -> None: - testfile: Path = Path(__file__).parent / "test_multiarea.grib" - out: xr.Dataset = self.client.mapCachedRaw(p=testfile) - - self.assertEqual( - out[next(iter(out.data_vars.keys()))].dims, - ("init_time", "step", "latitude", "longitude"), - ) - self.assertEqual(len(out.data_vars.keys()), 18) - self.assertEqual(out.coords["latitude"].to_numpy().max(), 60) - self.assertIn("t2m", list(out.data_vars.keys())) - self.assertTrue(np.all(out.data_vars["t2m"].values)) - - print(out) - - # Check that setting the area maps only the relevant data - indiaClient = S3Client( - area="nw-india", - key=KEY, - secret=SECRET, - region=REGION, - bucket=BUCKET, - endpointURL=ENDPOINT_URL, - ) - out = indiaClient.mapCachedRaw(p=testfile) - 
self.assertEqual(out.coords["latitude"].to_numpy().max(), 31) - self.assertIn("t2m", list(out.data_vars.keys())) - self.assertTrue(np.all(out.data_vars["t2m"].values)) - diff --git a/src/nwp_consumer/internal/inputs/icon/__init__.py b/src/nwp_consumer/internal/inputs/icon/__init__.py deleted file mode 100644 index 02fde8c9..00000000 --- a/src/nwp_consumer/internal/inputs/icon/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -__all__ = ["Client"] - -from .client import Client diff --git a/src/nwp_consumer/internal/inputs/icon/_consts.py b/src/nwp_consumer/internal/inputs/icon/_consts.py deleted file mode 100644 index bf746d14..00000000 --- a/src/nwp_consumer/internal/inputs/icon/_consts.py +++ /dev/null @@ -1,81 +0,0 @@ -"""Defines all parameters available from icon.""" - - -EU_SL_VARS = [ - "alb_rad", - "alhfl_s", - "ashfl_s", - "asob_s", - "asob_t", - "aswdifd_s", - "aswdifu_s", - "aswdir_s", - "athb_s", - "athb_t", - "aumfl_s", - "avmfl_s", - "cape_con", - "cape_ml", - "clch", - "clcl", - "clcm", - "clct", - "clct_mod", - "cldepth", - "h_snow", - "hbas_con", - "htop_con", - "htop_dc", - "hzerocl", - "pmsl", - "ps", - "qv_2m", - "qv_s", - "rain_con", - "rain_gsp", - "relhum_2m", - "rho_snow", - "runoff_g", - "runoff_s", - "snow_con", - "snow_gsp", - "snowlmt", - "synmsg_bt_cl_ir10.8", - "t_2m", - "t_g", - "t_snow", - "tch", - "tcm", - "td_2m", - "tmax_2m", - "tmin_2m", - "tot_prec", - "tqc", - "tqi", - "u_10m", - "v_10m", - "vmax_10m", - "w_snow", - "w_so", - "ww", - "z0", -] - -EU_ML_VARS = ["clc", "fi", "omega", "p", "qv", "relhum", "t", "tke", "u", "v", "w"] - -GLOBAL_SL_VARS = [ - *EU_SL_VARS, - "alb_rad", - "c_t_lk", - "freshsnw", - "fr_ice", - "h_ice", - "h_ml_lk", - "t_ice", - "t_s", - "tqr", - "tqs", - "tqv", -] - -GLOBAL_ML_VARS: list[str] = ["fi", "relhum", "t", "u", "v"] diff --git a/src/nwp_consumer/internal/inputs/icon/_models.py b/src/nwp_consumer/internal/inputs/icon/_models.py deleted file mode 100644 index adb165fe..00000000 --- a/src/nwp_consumer/internal/inputs/icon/_models.py +++ /dev/null @@ -1,37 +0,0 @@ -import datetime as dt - -from nwp_consumer import internal - - -class IconFileInfo(internal.FileInfoModel): - def __init__( - self, it: dt.datetime, filename: str, currentURL: str, step: int, - ) -> None: - self._it = it - # The name of the file when stored by the storer. We decompress from bz2 - # at download time, so we don't want that extension on the filename. 
- self._filename = filename.replace(".bz2", "") - self._url = currentURL - self.step = step - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._filename - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - # The filename in the fully-qualified filepath still has the .bz2 extension - # so add it back in - return self._url + "/" + self._filename + ".bz2" - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class.""" - return self._it - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - return [self.step] - - def variables(self) -> list[str]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() diff --git a/src/nwp_consumer/internal/inputs/icon/client.py b/src/nwp_consumer/internal/inputs/icon/client.py deleted file mode 100644 index e8d8009b..00000000 --- a/src/nwp_consumer/internal/inputs/icon/client.py +++ /dev/null @@ -1,439 +0,0 @@ -"""Implements a client to fetch ICON data from DWD.""" -import bz2 -import datetime as dt -import pathlib -import re -import urllib.request - -import numpy as np -import requests -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._consts import EU_ML_VARS, EU_SL_VARS, GLOBAL_ML_VARS, GLOBAL_SL_VARS -from ._models import IconFileInfo - -log = structlog.getLogger() - - -class Client(internal.FetcherInterface): - """Implements a client to fetch ICON data from DWD.""" - - baseurl: str # The base URL for the ICON model - model: str # The model to fetch data for - parameters: list[str] # The parameters to fetch - - def __init__(self, model: str, hours: int = 48, param_group: str = "default") -> None: - """Create a new Icon Client. - - Exposes a client for ICON data from DWD that conforms to the FetcherInterface. - - Args: - model: The model to fetch data for. Valid models are "europe" and "global". - hours: The number of hours to fetch data for. - param_group: The set of parameters to fetch. - Valid groups are "default", "full", and "basic". - """ - self.baseurl = "https://opendata.dwd.de/weather/nwp" - - match model: - case "europe": - self.baseurl += "/icon-eu/grib" - case "global": - self.baseurl += "/icon/grib" - case _: - raise ValueError( - f"unknown icon model {model}. Valid models are 'europe' and 'global'", - ) - - match (param_group, model): - case ("default", _): - self.parameters = [ - "t_2m", - "clch", - "clcm", - "clcl", - "asob_s", - "athb_s", - "w_snow", - "relhum_2m", - "u_10m", - "v_10m", - "clat", - "clon", - ] - case ("basic", "europe"): - self.parameters = ["t_2m", "asob_s"] - case ("basic", "global"): - self.parameters = ["t_2m", "asob_s", "clat", "clon"] - case ("single-level", "europe"): - self.parameters = EU_SL_VARS - case ("single-level", "global"): - self.parameters = [*GLOBAL_SL_VARS, "clat", "clon"] - case ("multi-level", "europe"): - self.parameters = EU_ML_VARS - case ("multi-level", "global"): - self.parameters = [*GLOBAL_ML_VARS, "clat", "clon"] - case ("full", "europe"): - self.parameters = EU_SL_VARS + EU_ML_VARS - case ("full", "global"): - self.parameters = [*GLOBAL_SL_VARS, *GLOBAL_ML_VARS, "clat", "clon"] - case (_, _): - raise ValueError( - f"unknown parameter group {param_group}." 
- "Valid groups are 'default', 'full', 'basic', 'single-level', 'multi-level'", - ) - - self.model = model - self.hours = hours - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"ICON_{self.model}".upper() - - def getInitHours(self) -> list[int]: # noqa: D102 - return [0, 6, 12, 18] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - # ICON data is only available for today's date. If data hasn't been uploaded for that init - # time yet, then yesterday's data will still be present on the server. - if dt.datetime.now(dt.UTC) - it > dt.timedelta(days=1): - log.warn( - event="requested init time is too old", - inittime=it.strftime("%Y-%m-%d %H:%M"), - ) - return [] - - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - files: list[internal.FileInfoModel] = [] - - # Files are split per parameter, level, and step, with a webpage per parameter - # * The webpage contains a list of files for the parameter - # * Find these files for each parameter and add them to the list - for param in self.parameters: - # The list of files for the parameter - parameterFiles: list[internal.FileInfoModel] = [] - - # Fetch DWD webpage detailing the available files for the parameter - response = requests.get(f"{self.baseurl}/{it.strftime('%H')}/{param}/", timeout=3) - - if response.status_code != 200: - log.warn( - event="error fetching filelisting webpage for parameter", - status=response.status_code, - url=response.url, - param=param, - inittime=it.strftime("%Y-%m-%d %H:%M"), - ) - continue - - # The webpage's HTML contains a list of tags - # * Each tag has a href, most of which point to a file) - for line in response.text.splitlines(): - # Check if the line contains a href, if not, skip it - refmatch = re.search(pattern=r'href="(.+)">', string=line) - if refmatch is None: - continue - - # The href contains the name of a file - parse this into a FileInfo object - fi: IconFileInfo | None = None - # Find the corresponding files for the parameter - fi = _parseIconFilename( - name=refmatch.groups()[0], - baseurl=self.baseurl, - match_ml=True, - match_pl=True, - ) - # Ignore the file if it is not for today's date - # or has a step > desired hours - if fi is None or fi.it() != it or (fi.step > self.hours): - continue - - # Add the file to the list - parameterFiles.append(fi) - - log.debug( - event="listed files for parameter", - param=param, - inittime=it.strftime("%Y-%m-%d %H:%M"), - url=response.url, - numfiles=len(parameterFiles), - ) - - # Add the files for the parameter to the list of all files - files.extend(parameterFiles) - - return files - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: - """Overrides the corresponding method in the parent class.""" - if p.suffix != ".grib2": - log.warn( - event="cannot map non-grib file to dataset", - filepath=p.as_posix(), - ) - return xr.Dataset() - - if "_CLAT" in p.stem or "_CLON" in p.stem: - # Ignore the latitude and longitude files - return xr.Dataset() - - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Load the raw file as a dataset - try: - ds = xr.open_dataset( - p.as_posix(), - engine="cfgrib", - chunks={ - "time": 1, - "step": 1, - "latitude": "auto", - "longitude": "auto", - }, - backend_kwargs={"indexpath": ""}, - ) - except Exception as e: - log.warn( - event="error converting raw file as dataset", - error=e, - filepath=p.as_posix(), - ) - return 
xr.Dataset() - - # Most datasets are opened as xarray datasets with "step" as a scalar (nonindexed) coordinate - # Some do not, so add it in manually - if "step" not in ds.coords: - ds = ds.assign_coords({"step": np.timedelta64(0, 'ns')}) - - # The global data is stacked as a 1D values array without lat or long data - # * Manually add it in from the CLAT and CLON files - if self.model == "global": - ds = _addLatLon(ds=ds, p=p) - - # Rename variables to match their listing online to prevent single/multi overlap - # * This assumes the name of the file locally is the same as online - pmatch = re.search(r"_\d{3}_([A-Z0-9_]+).grib", p.name) - if pmatch is not None: - var_name = pmatch.groups()[0] - ds = ds.rename({list(ds.data_vars.keys())[0]: var_name.lower()}) - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - ds = ( - ds.rename({"time": "init_time"}) - .expand_dims(["init_time", "step"]) - .drop_vars(["valid_time", "number", "surface", "heightAboveGround", "level", "isobaricLevel"], errors="ignore") - .sortby("step") - .transpose("init_time", "step", ...) - .chunk( - { - "init_time": 1, - "step": -1, - }, - ) - ) - - return ds - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - log.debug(event="requesting download of file", file=fi.filename(), path=fi.filepath()) - try: - response = urllib.request.urlopen(fi.filepath()) - except Exception as e: - log.warn( - event="error calling url for file", - url=fi.filepath(), - filename=fi.filename(), - error=e, - ) - return pathlib.Path() - - if response.status != 200: - log.warn( - event="error downloading file", - status=response.status, - url=fi.filepath(), - filename=fi.filename(), - ) - return pathlib.Path() - - # Extract the bz2 file when downloading - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with open(str(cfp), "wb") as f: - dec = bz2.BZ2Decompressor() - for chunk in iter(lambda: response.read(16 * 1024), b""): - f.write(dec.decompress(chunk)) - f.flush() - - if not cfp.exists(): - log.warn( - event="error extracting bz2 file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - ) - return pathlib.Path() - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides the corresponding method in the parent class.""" - # See https://d-nb.info/1081305452/34 for a list of ICON parameters - return { - "t_2m": internal.OCFParameter.TemperatureAGL, - "clch": internal.OCFParameter.HighCloudCover, - "clcm": internal.OCFParameter.MediumCloudCover, - "clcl": internal.OCFParameter.LowCloudCover, - "asob_s": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "athb_s": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "w_snow": internal.OCFParameter.SnowDepthWaterEquivalent, - "relhum_2m": internal.OCFParameter.RelativeHumidityAGL, - "u_10m": internal.OCFParameter.WindUComponentAGL, - "v_10m": internal.OCFParameter.WindVComponentAGL, - "clat": "lat", # Icon has a seperate dataset for latitude... - "clon": "lon", # ... and longitude (for the global model)! 
Go figure - } - - -def _parseIconFilename( - name: str, - baseurl: str, - match_sl: bool = True, - match_ti: bool = True, - match_ml: bool = False, - match_pl: bool = False, -) -> IconFileInfo | None: - """Parse a string of HTML into an IconFileInfo object, if it contains one. - - Args: - name: The name of the file to parse - baseurl: The base URL for the ICON model - match_sl: Whether to match single-level files - match_ti: Whether to match time-invariant files - match_ml: Whether to match model-level files - match_pl: Whether to match pressure-level files - """ - # Define the regex patterns to match the different types of file; X is step, L is level - # * Single Level: `MODEL_single-level_YYYYDDMMHH_XXX_SOME_PARAM.grib2.bz2` - slRegex = r"single-level_(\d{10})_(\d{3})_([A-Za-z_\d]+).grib" - # * Time Invariant: `MODEL_time-invariant_YYYYDDMMHH_SOME_PARAM.grib2.bz2` - tiRegex = r"time-invariant_(\d{10})_([A-Za-z_\d]+).grib" - # * Model Level: `MODEL_model-level_YYYYDDMMHH_XXX_LLL_SOME_PARAM.grib2.bz2` - mlRegex = r"model-level_(\d{10})_(\d{3})_(\d+)_([A-Za-z_\d]+).grib" - # * Pressure Level: `MODEL_pressure-level_YYYYDDMMHH_XXX_LLLL_SOME_PARAM.grib2.bz2` - plRegex = r"pressure-level_(\d{10})_(\d{3})_(\d+)_([A-Za-z_\d]+).grib" - - itstring = paramstring = "" - stepstring = "000" - # Try to match the href to one of the regex patterns - slmatch = re.search(pattern=slRegex, string=name) - timatch = re.search(pattern=tiRegex, string=name) - mlmatch = re.search(pattern=mlRegex, string=name) - plmatch = re.search(pattern=plRegex, string=name) - - if slmatch and match_sl: - itstring, stepstring, paramstring = slmatch.groups() - elif timatch and match_ti: - itstring, paramstring = timatch.groups() - elif mlmatch and match_ml: - itstring, stepstring, levelstring, paramstring = mlmatch.groups() - elif plmatch and match_pl: - itstring, stepstring, levelstring, paramstring = plmatch.groups() - else: - return None - - it = dt.datetime.strptime(itstring, "%Y%m%d%H").replace(tzinfo=dt.UTC) - - return IconFileInfo( - it=it, - filename=name, - currentURL=f"{baseurl}/{it.strftime('%H')}/{paramstring.lower()}/", - step=int(stepstring), - ) - - -def _addLatLon(*, ds: xr.Dataset, p: pathlib.Path) -> xr.Dataset: - """Add latitude and longitude data to the dataset. - - Global ICON files do not contain latitude and longitude data, - opting instead for a single `values` dimension. The lats and longs are then - accessible from seperate files. This function injects the lat and lon data - from these files into the dataset. 
- - :param ds: The dataset to reshape - :param p: The path to the file being reshaped - """ - # Adapted from https://stackoverflow.com/a/62667154 and - # https://github.com/SciTools/iris-grib/issues/140#issuecomment-1398634288 - - # Inject latitude and longitude into the dataset if they are missing - if "latitude" not in ds.dims: - rawlats: list[pathlib.Path] = list(p.parent.glob("*CLAT.grib2")) - if len(rawlats) == 0: - log.warn( - event="no latitude file found for init time", - filepath=p.as_posix(), - init_time=p.parent.name, - ) - return xr.Dataset() - latds = xr.open_dataset( - rawlats[0], - engine="cfgrib", - backend_kwargs={"errors": "ignore"}, - ).load() - tiledlats = latds["tlat"].data - del latds - - if "longitude" not in ds: - rawlons: list[pathlib.Path] = list(p.parent.glob("*CLON.grib2")) - if len(rawlons) == 0: - log.warn( - event="no longitude file found for init time", - filepath=p.as_posix(), - init_time=p.parent.name, - ) - return xr.Dataset() - londs = xr.open_dataset( - rawlons[0], - engine="cfgrib", - backend_kwargs={"errors": "ignore"}, - ).load() - tiledlons = londs["tlon"].data - del londs - - if ds.sizes["values"] != len(tiledlats) or ds.sizes["values"] != len(tiledlons): - raise ValueError( - f"dataset has {ds.sizes['values']} values, " - f"but expected {len(tiledlats) * len(tiledlons)}", - ) - - # Create new coordinates, - # which give the `latitude` and `longitude` position for each position in the `values` dimension: - - ds = ds.assign_coords( - { - "latitude": ("values", tiledlats), - "longitude": ("values", tiledlons), - }, - ) - - return ds diff --git a/src/nwp_consumer/internal/inputs/icon/test_client.py b/src/nwp_consumer/internal/inputs/icon/test_client.py deleted file mode 100644 index c6dd6610..00000000 --- a/src/nwp_consumer/internal/inputs/icon/test_client.py +++ /dev/null @@ -1,142 +0,0 @@ -import datetime as dt -import pathlib -import unittest -from typing import TYPE_CHECKING - -import xarray as xr - -if TYPE_CHECKING: - from ._models import IconFileInfo - -from .client import Client, _parseIconFilename - -testClientGlobal = Client(model="global") -testClientEurope = Client(model="europe") - - -class TestClient(unittest.TestCase): - def test_mapCachedRawGlobal(self) -> None: - tests = [ - { - "filename": "test_icon_global_001_CLCL.grib2", - "expected_dims": ["init_time", "step", "values"], - "expected_var": "ccl", - }, - { - "filename": "test_icon_global_001_HTOP_CON.grib2", - "expected_dims": ["init_time", "step", "values"], - "expected_var": "hcct", - }, - { - "filename": "test_icon_global_001_CLCT_MOD.grib2", - "expected_dims": ["init_time", "step", "values"], - "expected_var": "CLCT_MOD", - }, - ] - - for tst in tests: - with self.subTest(f"test file {tst['filename']}"): - out = testClientGlobal.mapCachedRaw(p=pathlib.Path(__file__).parent / tst["filename"]) - print(out) - - # Check latitude and longitude are injected - self.assertTrue("latitude" in out.coords) - self.assertTrue("longitude" in out.coords) - # Check that the dimensions are correctly ordered and renamed - self.assertEqual((list(out.dims.keys())), tst["expected_dims"]) - - def test_mapCachedRawEurope(self) -> None: - tests = [ - { - "filename": "test_icon_europe_001_CLCL.grib2", - "expected_dims": ["init_time", "step", "latitude", "longitude"], - "expected_var": "ccl", - }, - ] - - for tst in tests: - with self.subTest(f"test file {tst['filename']}"): - out = testClientEurope.mapCachedRaw(p=pathlib.Path(__file__).parent / tst["filename"]) - print(out) - - # Check latitude and 
longitude are injected - self.assertTrue("latitude" in out.coords) - self.assertTrue("longitude" in out.coords) - # Check that the dimensions are correctly ordered and renamed - for data_var in out.data_vars: - self.assertEqual(list(out[data_var].dims), tst["expected_dims"]) - - def test_mergeRaw(self) -> None: - ds1 = testClientGlobal.mapCachedRaw( - p=pathlib.Path(__file__).parent / "test_icon_global_001_CLCT_MOD.grib2" - ) - ds2 = testClientGlobal.mapCachedRaw( - p=pathlib.Path(__file__).parent / "test_icon_global_001_HTOP_CON.grib2" - ) - - # This should merge without raising an error - _ = xr.merge([ds1, ds2]) - - -class TestParseIconFilename(unittest.TestCase): - baseurl = "https://opendata.dwd.de/weather/nwp/icon/grib" - - def test_parsesSingleLevel(self) -> None: - filename: str = "icon_global_icosahedral_single-level_2020090100_000_T_HUM.grib2.bz2" - - out: IconFileInfo | None = _parseIconFilename( - name=filename, - baseurl=self.baseurl, - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename.removesuffix(".bz2")) - self.assertEqual(out.it(), dt.datetime(2020, 9, 1, 0, tzinfo=dt.UTC)) - - def test_parsesTimeInvariant(self) -> None: - filename: str = "icon_global_icosahedral_time-invariant_2020090100_CLAT.grib2.bz2" - - out: IconFileInfo | None = _parseIconFilename( - name=filename, - baseurl=self.baseurl, - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename.removesuffix(".bz2")) - self.assertEqual(out.it(), dt.datetime(2020, 9, 1, 0, tzinfo=dt.UTC)) - - def test_parsesModelLevel(self) -> None: - filename: str = "icon_global_icosahedral_model-level_2020090100_048_32_CLCL.grib2.bz2" - - out: IconFileInfo | None = _parseIconFilename( - name=filename, - baseurl=self.baseurl, - match_ml=True, - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename.removesuffix(".bz2")) - self.assertEqual(out.it(), dt.datetime(2020, 9, 1, 0, tzinfo=dt.UTC)) - - out: IconFileInfo | None = _parseIconFilename( - name=filename, - baseurl=self.baseurl, - match_ml=False, - ) - self.assertIsNone(out) - - def test_parsesPressureLevel(self) -> None: - filename: str = "icon_global_icosahedral_pressure-level_2020090100_048_1000_T.grib2.bz2" - - out: IconFileInfo | None = _parseIconFilename( - name=filename, - baseurl=self.baseurl, - match_pl=True, - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename.removesuffix(".bz2")) - self.assertEqual(out.it(), dt.datetime(2020, 9, 1, 0, tzinfo=dt.UTC)) - - out: IconFileInfo | None = _parseIconFilename( - name=filename, - baseurl=self.baseurl, - match_pl=False, - ) - self.assertIsNone(out) diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_europe_000_ASOB_S.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_europe_000_ASOB_S.grib2 deleted file mode 100644 index a2b14f0d..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_europe_000_ASOB_S.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_europe_001_CLCL.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_europe_001_CLCL.grib2 deleted file mode 100644 index d77be855..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_europe_001_CLCL.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_CLCL.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_CLCL.grib2 deleted file mode 100644 index 7fad65ea..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_CLCL.grib2 
and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_CLCT_MOD.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_CLCT_MOD.grib2 deleted file mode 100644 index 66f31e08..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_CLCT_MOD.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_HTOP_CON.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_HTOP_CON.grib2 deleted file mode 100644 index b0b229bb..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_global_001_HTOP_CON.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_global_CLAT.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_global_CLAT.grib2 deleted file mode 100644 index 5cbe15e0..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_global_CLAT.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/icon/test_icon_global_CLON.grib2 b/src/nwp_consumer/internal/inputs/icon/test_icon_global_CLON.grib2 deleted file mode 100644 index b13f1e9f..00000000 Binary files a/src/nwp_consumer/internal/inputs/icon/test_icon_global_CLON.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/meteofrance/HP1_00H24H_t.grib2 b/src/nwp_consumer/internal/inputs/meteofrance/HP1_00H24H_t.grib2 deleted file mode 100644 index c2186000..00000000 Binary files a/src/nwp_consumer/internal/inputs/meteofrance/HP1_00H24H_t.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/meteofrance/IP1_00H24H_t.grib2 b/src/nwp_consumer/internal/inputs/meteofrance/IP1_00H24H_t.grib2 deleted file mode 100644 index ec160ed6..00000000 Binary files a/src/nwp_consumer/internal/inputs/meteofrance/IP1_00H24H_t.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/meteofrance/SP1_00H24H_t.grib2 b/src/nwp_consumer/internal/inputs/meteofrance/SP1_00H24H_t.grib2 deleted file mode 100644 index 61e69393..00000000 Binary files a/src/nwp_consumer/internal/inputs/meteofrance/SP1_00H24H_t.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/meteofrance/__init__.py b/src/nwp_consumer/internal/inputs/meteofrance/__init__.py deleted file mode 100644 index 02fde8c9..00000000 --- a/src/nwp_consumer/internal/inputs/meteofrance/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -__all__ = ["Client"] - -from .client import Client diff --git a/src/nwp_consumer/internal/inputs/meteofrance/_consts.py b/src/nwp_consumer/internal/inputs/meteofrance/_consts.py deleted file mode 100644 index a512d1d5..00000000 --- a/src/nwp_consumer/internal/inputs/meteofrance/_consts.py +++ /dev/null @@ -1,4 +0,0 @@ -"""Defines all parameters available from Arpege.""" - -ARPEGE_GLOBAL_VARIABLES = ['u10','v10','si10','wdir10','t2m','r2','gust','efg10','nfg10','ssrd','tp','sprate','d2m','sh2','sshf','slhf','strd','ssr','str','ewss','nsss','t','sp','tcwv','lcc','mcc','hcc','hpbl','h','ws','u','v','pres','r','wdir','u200','v200','si200','u100','v100','si100','z','q','clwc','ciwc','cc','dpt','tke','w','pv','vo','absv','papt',] -ARPEGE_GLOBAL_PARAMETER_SETS = ['HP1','HP2','IP1','IP2','IP3','IP4','SP1','SP2'] diff --git a/src/nwp_consumer/internal/inputs/meteofrance/_models.py b/src/nwp_consumer/internal/inputs/meteofrance/_models.py deleted file mode 100644 index ee594d65..00000000 --- a/src/nwp_consumer/internal/inputs/meteofrance/_models.py +++ /dev/null @@ -1,37 +0,0 @@ -import datetime as dt - -from nwp_consumer import 
internal - - -class ArpegeFileInfo(internal.FileInfoModel): - def __init__( - self, - it: dt.datetime, - filename: str, - currentURL: str, - step: int, - ) -> None: - self._it = it - self._filename = filename - self._url = currentURL - self.step = step - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._filename - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._url + self._filename - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class.""" - return self._it - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - return [self.step] - - def variables(self) -> list[str]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() diff --git a/src/nwp_consumer/internal/inputs/meteofrance/client.py b/src/nwp_consumer/internal/inputs/meteofrance/client.py deleted file mode 100644 index 7a512331..00000000 --- a/src/nwp_consumer/internal/inputs/meteofrance/client.py +++ /dev/null @@ -1,315 +0,0 @@ -"""Implements a client to fetch Arpege data from MeteoFrance AWS.""" -import datetime as dt -import pathlib -import re -import typing - -import cfgrib -import s3fs -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._consts import ARPEGE_GLOBAL_PARAMETER_SETS, ARPEGE_GLOBAL_VARIABLES -from ._models import ArpegeFileInfo - -log = structlog.getLogger() - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("time", "step", "latitude", "longitude") - - -class Client(internal.FetcherInterface): - """Implements a client to fetch Arpege data from AWS.""" - - baseurl: str # The base URL for the Argpege model - model: str # The model to fetch data for - parameters: list[str] # The parameters to fetch - - def __init__(self, model: str, hours: int = 48, param_group: str = "default") -> None: - """Create a new Arpege Client. - - Exposes a client for Arpege data from AWS MeteoFrance that conforms to the FetcherInterface. - - Args: - model: The model to fetch data for. Valid models are "europe" and "global". - param_group: The set of parameters to fetch. - Valid groups are "default", "full", and "basic". - """ - self.baseurl = "s3://mf-nwp-models/" - self.fs = s3fs.S3FileSystem(anon=True) - - match model: - case "europe": - self.baseurl += "arpege-europe/v1/" - case "global": - self.baseurl += "arpege-world/v1/" - case _: - raise ValueError( - f"unknown arpege model {model}. Valid models are 'europe' and 'global'", - ) - - match (param_group, model): - case ("default", _): - self.parameters = ["t2m", "hcc", "mcc", "lcc", "ssrd", "d2m", "u10", "v10"] - case ("basic", "europe"): - self.parameters = ["t2m", "ssrd"] - case ("basic", "global"): - self.parameters = ["t2m", "ssrd"] - case ("full", "europe"): - self.parameters = ARPEGE_GLOBAL_VARIABLES - case ("full", "global"): - self.parameters = ARPEGE_GLOBAL_VARIABLES - case (_, _): - raise ValueError( - f"unknown parameter group {param_group}." 
- "Valid groups are 'default', 'full', 'basic'", - ) - - self.model = model - self.hours = hours - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"MeteoFrance_{self.model}".upper() - - def getInitHours(self) -> list[int]: # noqa: D102 - return [0, 6, 12, 18] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - files: list[internal.FileInfoModel] = [] - - # Files are split per set of parameters, and set of steps - # The list of files for the parameter - parameterFiles: list[internal.FileInfoModel] = [] - - # Parameter sets - for parameter_set in ARPEGE_GLOBAL_PARAMETER_SETS: - # Fetch Arpege webpage detailing the available files for the parameter - files = self.fs.ls( - f"{self.baseurl}{it.strftime('%Y-%m-%d')}/{it.strftime('%H')}/{parameter_set}/" - ) - - # The webpage's HTML contains a list of tags - # * Each tag has a href, most of which point to a file) - for f in files: - if ".inv" in f: # Ignore the .inv files - continue - # The href contains the name of a file - parse this into a FileInfo object - fi: ArpegeFileInfo | None = None - fi = _parseArpegeFilename( - name=f.split("/")[-1], - baseurl=f"{self.baseurl}{it.strftime('%Y-%m-%d')}/{it.strftime('%H')}/{parameter_set}/", - match_hl=len(self.parameters) > 6, - match_pl=len(self.parameters) > 6, - ) - # Ignore the file if it is not for today's date or has a step > desired - if fi is None or fi.it() != it or (fi.step > self.hours): - continue - - # Add the file to the list - parameterFiles.append(fi) - - log.debug( - event="listed files for parameter", - param=parameter_set, - inittime=it.strftime("%Y-%m-%d %H:%M"), - url=f, - numfiles=len(parameterFiles), - ) - - # Add the files for the parameter to the list of all files - files.extend(parameterFiles) - - return files - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: # noqa: D102 - if p.suffix != ".grib2": - log.warn( - event="cannot map non-grib file to dataset", - filepath=p.as_posix(), - ) - return xr.Dataset() - - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Load the raw file as a dataset - try: - ds = cfgrib.open_datasets( - p.as_posix(), - ) - except Exception as e: - log.warn( - event="error converting raw file as dataset", - error=e, - filepath=p.as_posix(), - ) - return xr.Dataset() - # Check if datasets is more than a single dataset or not - # * If it is, merge the datasets into a single dataset - if len(ds) > 1: - if "_IP" in str(p): # Pressure levels - for i, d in enumerate(ds): - if "isobaricInhPa" in d.coords and "isobaricInhPa" not in d.dims: - d = d.expand_dims("isobaricInhPa") - ds[i] = d - ds = xr.merge([d for d in ds if "isobaricInhPa" in d.coords], compat="override") - elif "_SP" in str(p): # Single levels - for i, d in enumerate(ds): - if "surface" in d.coords: - d = d.rename({"surface": "heightAboveGround"}) - # Make heightAboveGround a coordinate - if "heightAboveGround" in d.coords: - d = d.expand_dims("heightAboveGround") - ds[i] = d - # Merge all the datasets that have heightAboveGround - ds = xr.merge([d for d in ds if "heightAboveGround" in d.coords], compat="override") - elif "_HP" in str(p): # Height levels - for i, d in enumerate(ds): - if "heightAboveGround" in d.coords and "heightAboveGround" not in d.dims: - d = d.expand_dims("heightAboveGround") - ds[i] = d - ds = xr.merge([d for 
d in ds if "heightAboveGround" in d.coords], compat="override") - else: - ds = ds[0] - ds = ds.drop_vars("unknown", errors="ignore") - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - ds = ( - ds.rename({"time": "init_time"}) - .expand_dims("init_time") - .transpose("init_time", "step", ...) - .sortby("step") - .chunk( - { - "init_time": 1, - "step": -1, - }, - ) - ) - - return ds - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - log.debug(event="requesting download of file", file=fi.filename(), path=fi.filepath()) - # Extract the bz2 file when downloading - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - - self.fs.get(str(fi.filepath()), str(cfp)) - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides the corresponding method in the parent class.""" - # See https://mf-models-on-aws.org/en/doc/datasets/v1/ - # for a list of Arpege parameters - return { - "t2m": internal.OCFParameter.TemperatureAGL, - "hcc": internal.OCFParameter.HighCloudCover, - "mcc": internal.OCFParameter.MediumCloudCover, - "lcc": internal.OCFParameter.LowCloudCover, - "ssrd": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "d2m": internal.OCFParameter.RelativeHumidityAGL, - "u10": internal.OCFParameter.WindUComponentAGL, - "v10": internal.OCFParameter.WindVComponentAGL, - } - - -def _parseArpegeFilename( - name: str, - baseurl: str, - match_sl: bool = True, - match_hl: bool = True, - match_pl: bool = False, -) -> ArpegeFileInfo | None: - """Parse a string of HTML into an ArpegeFileInfo object, if it contains one. 
- - Args: - name: The name of the file to parse - baseurl: The base URL for the Arpege model - match_sl: Whether to match single-level files - match_hl: Whether to match height-level files - match_pl: Whether to match pressure-level files - """ - # Defined from the href of the file, its harder to split - # Define the regex patterns to match the different types of file; X is step, L is level - # * Single Level: `MODEL_single-level_YYYYDDMMHH_XXX_SOME_PARAM.grib2.bz2` - slRegex = r"s3://mf-nwp-models/arpege-([A-Za-z_\d]+)/v1/(\d{4})-(\d{2})-(\d{2})/(\d{2})/SP(\d{1})/(\d{2})H(\d{2})H.grib2" - # * Height Level: `MODEL_time-invariant_YYYYDDMMHH_SOME_PARAM.grib2.bz2` - hlRegex = r"s3://mf-nwp-models/arpege-([A-Za-z_\d]+)/v1/(\d{4})-(\d{2})-(\d{2})/(\d{2})/HP(\d{1})/(\d{2})H(\d{2})H.grib2" - # * Pressure Level: `MODEL_model-level_YYYYDDMMHH_XXX_LLL_SOME_PARAM.grib2.bz2` - plRegex = r"s3://mf-nwp-models/arpege-([A-Za-z_\d]+)/v1/(\d{4})-(\d{2})-(\d{2})/(\d{2})/IP(\d{1})/(\d{2})H(\d{2})H.grib2" - - itstring_year = itstring_month = itstring_day = itstring_hour = paramstring = "" - stepstring_start = stepstring_end = "00" - # Try to match the href to one of the regex patterns - slmatch = re.search(pattern=slRegex, string=baseurl + name) - hlmatch = re.search(pattern=hlRegex, string=baseurl + name) - plmatch = re.search(pattern=plRegex, string=baseurl + name) - - if slmatch and match_sl: - ( - _, - itstring_year, - itstring_month, - itstring_day, - itstring_hour, - paramstring, - stepstring_start, - stepstring_end, - ) = slmatch.groups() - elif hlmatch and match_hl: - ( - _, - itstring_year, - itstring_month, - itstring_day, - itstring_hour, - paramstring, - stepstring_start, - stepstring_end, - ) = hlmatch.groups() - elif plmatch and match_pl: - ( - _, - itstring_year, - itstring_month, - itstring_day, - itstring_hour, - paramstring, - stepstring_start, - stepstring_end, - ) = plmatch.groups() - else: - return None - - it = dt.datetime.strptime( - itstring_year + itstring_month + itstring_day + itstring_hour, "%Y%m%d%H" - ).replace(tzinfo=dt.UTC) - - # TODO Construct the public URL from S3 path? 
- - return ArpegeFileInfo( - it=it, - filename=name, - currentURL=f"{baseurl}", - step=int(stepstring_start), - ) diff --git a/src/nwp_consumer/internal/inputs/meteofrance/test_client.py b/src/nwp_consumer/internal/inputs/meteofrance/test_client.py deleted file mode 100644 index 13b16128..00000000 --- a/src/nwp_consumer/internal/inputs/meteofrance/test_client.py +++ /dev/null @@ -1,99 +0,0 @@ -import datetime as dt -import pathlib -import unittest -from typing import TYPE_CHECKING - -if TYPE_CHECKING: - from ._models import ArpegeFileInfo - -from .client import Client, _parseArpegeFilename - -testClient = Client(model="global") - - -class TestClient(unittest.TestCase): - def test_mapCachedRaw(self) -> None: - - tests = [ - { - "filename": "SP1_00H24H_t.grib2", - "expected_dims": ("init_time", "step", "latitude", "longitude"), - "expected_var": "t", - }, - { - "filename": "HP1_00H24H_t.grib2", - "expected_dims": ("init_time", "step", "heightAboveGround", "latitude", "longitude"), - "expected_var": "t", - }, - { - "filename": "IP1_00H24H_t.grib2", - "expected_dims": ("init_time", "step", "isobaricInhPa", "latitude", "longitude"), - "expected_var": "t", - }, - ] - - for tst in tests: - with self.subTest(f"test file {tst['filename']}"): - out = testClient.mapCachedRaw(p=pathlib.Path(__file__).parent / tst["filename"]) - - # Check latitude and longitude are injected - self.assertTrue("latitude" in out.coords) - self.assertTrue("longitude" in out.coords) - # Check that the dimensions are correctly ordered and renamed - self.assertEqual( - out[next(iter(out.data_vars.keys()))].dims, - tst["expected_dims"], - ) - - -class TestParseArpegeFilename(unittest.TestCase): - baseurl = "s3://mf-nwp-models/arpege-world/v1/2023-12-03/12/" - - def test_parsesSingleLevel(self) -> None: - filename: str = "00H24H.grib2" - - out: ArpegeFileInfo | None = _parseArpegeFilename( - name=filename, - baseurl=self.baseurl+"SP1/", - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename) - self.assertEqual(out.it(), dt.datetime(2023, 12, 3, 12, tzinfo=dt.timezone.utc)) - - def test_parsesHeightLevel(self) -> None: - filename: str = "00H24H.grib2" - - out: ArpegeFileInfo | None = _parseArpegeFilename( - name=filename, - baseurl=self.baseurl+"HP2/", - match_hl=True, - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename) - self.assertEqual(out.it(), dt.datetime(2023, 12, 3, 12, tzinfo=dt.timezone.utc)) - - out: ArpegeFileInfo | None = _parseArpegeFilename( - name=filename, - baseurl=self.baseurl, - match_hl=False, - ) - self.assertIsNone(out) - - def test_parsesPressureLevel(self) -> None: - filename: str = "00H24H.grib2" - - out: ArpegeFileInfo | None = _parseArpegeFilename( - name=filename, - baseurl=self.baseurl+"IP4/", - match_pl=True, - ) - self.assertIsNotNone(out) - self.assertEqual(out.filename(), filename) - self.assertEqual(out.it(), dt.datetime(2023, 12, 3, 12, tzinfo=dt.timezone.utc)) - - out: ArpegeFileInfo | None = _parseArpegeFilename( - name=filename, - baseurl=self.baseurl, - match_pl=False, - ) - self.assertIsNone(out) diff --git a/src/nwp_consumer/internal/inputs/metoffice/README.md b/src/nwp_consumer/internal/inputs/metoffice/README.md deleted file mode 100644 index 2707687c..00000000 --- a/src/nwp_consumer/internal/inputs/metoffice/README.md +++ /dev/null @@ -1,202 +0,0 @@ -# MetOffice API - ---- - - -## Data - -Currently being fetched from our MetOffice orders: - -### `uk-5params-35steps` - -| Name | Long Name | Level | ID | Unit | 
-|-----------------------------------|----------------------------------------------------|--------------|-----------|--------| -| Low Cloud Cover | low-cloud-cover | `atmosphere` | `lcc` | % | -| Snow Depth | snow-depth-water-equivalent | `ground` | `sd` | kg m-2 | -| Downward Shortwave Radiation Flux | downward-short-wave-radiation-flux | `ground` | `dswrf` | W m-2 | -| Temperature at 1.5m | temperature | `agl` | `t2m` | K | -| Wind Direction at 10m | wind-direction-from-which-blowing-surface-adjusted | `agl` | `unknown` | | - -### `uk-11params-12steps` - -| Name | Long Name | Level | ID | Unit | -|--------------------------------------|------------------------------------|--------------|-----------|------------| -| High Cloud Cover | high-cloud-cover | `atmosphere` | `hcc` | % | -| Medium Cloud Cover | medium-cloud-cover | `atmosphere` | `mcc` | % | -| Low Cloud Cover | low-cloud-cover | `atmosphere` | `lcc` | % | -| Visibility at 1.5m | visibility | `agl` | `vis` | m | -| Relative Humidity at 1.5m | relative-humidity | `agl` | `r2` | % | -| Rain Precipitation Rate | rain-precipitation-rate | `ground` | `rprate` | kg m-2 s-1 | -| Snow Depth - ground | snow-depth-water-equivalent | `ground` | `sd` | kg m-2 | -| Downward Longwave Radiation Flux | downward-long-wave-radiation-flux | `ground` | `dlwrf` | W m-2 | -| Downward Shortwave Radiation Flux | downward-short-wave-radiation-flux | `ground` | `dswrf` | W m-2 | -| Temperature at 1.5m | temperature | `agl` | `t2m` | K | -| Wind Speed at 10m (Surface Adjusted) | wind-speed-surface-adjusted | `agl` | `unknown` | m s-1 | - -> :warning: **NOTE:** The two wind parameters are read in from their grib files as "unknown" - -## Parameter names in datasets - -These orders may provide multiple time steps per "latest" file list. - -Each parameter is loaded as a separate grib file. - -
- Datasets - - --- relative-humidity-1.5 --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 12:00:00 - heightAboveGround float64 1.5 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - r2 (step, y, x) float32 ... - - --- temperature 1.5m --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 12:00:00 - heightAboveGround float64 1.5 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - t2m (step, y, x) float32 ... (t2m because it's called "temperature 2m", even though it's at 1.5m) - - --- visibility 1.5 --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 12:00:00 - heightAboveGround float64 1.5 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - vis (step, y, x) float32 ... - - --- wind speed surface adjusted --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 12:00:00 - heightAboveGround float64 10.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - unknown (step, y, x) float32 ... - - --- high cloud cover --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - atmosphere float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - hcc (step, y, x) float32 ... - - --- low cloud cover --- - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - atmosphere float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - lcc (step, y, x) float32 ... - - --- medium cloud cover --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - atmosphere float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - mcc (step, y, x) float32 ... - - --- downward longwave radiation flux --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - surface float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - dlwrf (step, y, x) float32 ... 
- - --- downward shortwave radiation flux --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - surface float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - dswrf (step, y, x) float32 ... - - --- snow depth --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T10:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - surface float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - sd (step, y, x) float32 ... - - --- rain precipitation rate --- - Dimensions: (step: 10, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T21:00:00 - * step (step) timedelta64[ns] 00:00:00 01:00:00 ... 08:00:00 12:00:00 - surface float64 0.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - rprate (step, y, x) float32 ... - - --- wind direction from which blowing surface adjusted --- - Dimensions: (step: 36, y: 639, x: 455) - Coordinates: - time datetime64[ns] 2023-03-08T21:00:00 - * step (step) timedelta64[ns] 00:00:00 ... 1 days 11:00:00 - heightAboveGround float64 10.0 - latitude (y, x) float64 ... - longitude (y, x) float64 ... - valid_time (step) datetime64[ns] ... - Dimensions without coordinates: y, x - Data variables: - unknown (step, y, x) float32 ... - -
diff --git a/src/nwp_consumer/internal/inputs/metoffice/__init__.py b/src/nwp_consumer/internal/inputs/metoffice/__init__.py deleted file mode 100644 index 74f4c648..00000000 --- a/src/nwp_consumer/internal/inputs/metoffice/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -__all__ = ['Client'] - -from .client import Client diff --git a/src/nwp_consumer/internal/inputs/metoffice/_models.py b/src/nwp_consumer/internal/inputs/metoffice/_models.py deleted file mode 100644 index b6313151..00000000 --- a/src/nwp_consumer/internal/inputs/metoffice/_models.py +++ /dev/null @@ -1,58 +0,0 @@ -import datetime as dt -from typing import ClassVar - -from marshmallow import EXCLUDE, Schema, fields -from marshmallow_dataclass import dataclass - -from nwp_consumer import internal - - -@dataclass -class MetOfficeFileInfo(internal.FileInfoModel): - - class Meta: - unknown = EXCLUDE - - fileId: str - runDateTime: dt.datetime - - Schema: ClassVar[type[Schema]] = Schema # To prevent confusing type checkers - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class.""" - return self.runDateTime.replace(tzinfo=None) - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self.fileId + ".grib" - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"{self.fileId}/data" - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() - - def variables(self) -> list[str]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() - - -@dataclass -class MetOfficeOrderDetails: - - class Meta: - unknown = EXCLUDE - - files: list[MetOfficeFileInfo] = fields.List(fields.Nested(MetOfficeFileInfo.Schema())) - - Schema: ClassVar[type[Schema]] = Schema # To prevent confusing type checkers - - -@dataclass -class MetOfficeResponse: - - orderDetails: MetOfficeOrderDetails - - Schema: ClassVar[type[Schema]] = Schema # To prevent confusing type checkers diff --git a/src/nwp_consumer/internal/inputs/metoffice/client.py b/src/nwp_consumer/internal/inputs/metoffice/client.py deleted file mode 100644 index b003ecef..00000000 --- a/src/nwp_consumer/internal/inputs/metoffice/client.py +++ /dev/null @@ -1,337 +0,0 @@ -"""Implements a client to fetch the data from the MetOffice API.""" - -import datetime as dt -import pathlib -import urllib.request - -import pyproj -import requests -import structlog.stdlib -import xarray as xr - -from nwp_consumer import internal - -from ._models import MetOfficeFileInfo, MetOfficeResponse - -log = structlog.getLogger() - -class Client(internal.FetcherInterface): - """Implements a client to fetch the data from the MetOffice API.""" - - # Base https URL for MetOffice's data endpoint - baseurl: str - - # Query string headers to pass to the MetOffice API - __headers: dict[str, str] - - def __init__(self, *, orderID: str, apiKey: str) -> None: - """Create a new MetOfficeClient. - - Exposes a client for the MetOffice API which conforms to the FetcherInterface. - MetOffice API credentials must be provided, as well as an orderID for the - desired dataset. - - Args: - orderID: The orderID to fetch from the MetOffice API. - apiKey: The apiKey to use to authenticate with the MetOffice API. 
- """ - if any(value in [None, "", "unset"] for value in [apiKey, orderID]): - raise KeyError("must provide apiKey and orderID for MetOffice API") - self.baseurl: str = ( - f"https://data.hub.api.metoffice.gov.uk/atmospheric-models/1.0.0/orders/{orderID}/latest" - ) - self.querystring: dict[str, str] = {"detail": "MINIMAL"} - self.__headers: dict[str, str] = { - "accept": "application/json, application/json", - "apikey": apiKey, - } - - def datasetName(self) -> str: - """Overrides the corresponding method in FetcherInterface.""" - return "UKV" - - def getInitHours(self) -> list[int]: # noqa: D102 - # NOTE: This will depend on the order you have with the MetOffice. - # Technically they can provide data for every hour of the day, - # but OpenClimateFix choose to match what is available from CEDA. - return [0, 3, 6, 9, 12, 15, 18, 21] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - if ( - self.__headers.get("apikey") is None - ): - log.error("all metoffice API credentials not provided") - return [] - - if it.date() != dt.datetime.now(tz=dt.UTC).date(): - log.warn("metoffice API only supports fetching data for the current day") - return [] - - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - # Fetch info for all files available on the input date - response: requests.Response = requests.request( - method="GET", - url=self.baseurl, - headers=self.__headers, - params=self.querystring, - ) - try: - rj: dict = response.json() - except Exception as e: - log.warn( - event="error parsing response from filelist endpoint", - error=e, - response=response.content, - ) - return [] - if not response.ok or ("httpCode" in rj and int(rj["httpCode"]) > 399): - log.warn( - event="error response from filelist endpoint", - url=response.url, - response=rj, - ) - return [] - - # Map the response to a MetOfficeResponse object - try: - responseObj: MetOfficeResponse = MetOfficeResponse.Schema().load(response.json()) - except Exception as e: - log.warn( - event="response from metoffice does not match expected schema", - error=e, - response=response.json(), - ) - return [] - - # Filter the file infos for the desired init time - wantedFileInfos: list[MetOfficeFileInfo] = [ - fo for fo in responseObj.orderDetails.files if _isWantedFile(fi=fo, dit=it) - ] - - return wantedFileInfos - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - if ( - self.__headers.get("apikey") is None - ): - log.error("all metoffice API credentials not provided") - return pathlib.Path() - - log.debug( - event="requesting download of file", - file=fi.filename(), - ) - url: str = f"{self.baseurl}/{fi.filepath()}" - try: - opener = urllib.request.build_opener() - opener.addheaders = list( - dict( - self.__headers, - **{"accept": "application/x-grib"}, - ).items(), - ) - urllib.request.install_opener(opener) - response = urllib.request.urlopen(url=url) - if response.status != 200: - log.warn( - event="error response received for download file request", - response=response.json(), - url=url, - ) - return pathlib.Path() - except Exception as e: - log.warn( - event="error calling url for file", - url=url, - filename=fi.filename(), - error=e, - ) - return pathlib.Path() - - # Stream the filedata into cache - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with cfp.open("wb") as f: - for chunk in iter(lambda: response.read(16 * 1024), b""): - 
f.write(chunk) - f.flush() - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=url, - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: # noqa: D102 - if p.suffix != ".grib": - log.warn( - event="cannot map non-grib file to dataset", - filepath=p.as_posix(), - ) - return xr.Dataset() - - log.debug( - event="mapping raw file to xarray dataset", - filepath=p.as_posix(), - ) - - # Cfgrib is built upon eccodes which needs an in-memory file to read from - # Load the GRIB file as a cube - try: - # Read the file as a dataset, also reading the values of the keys in 'read_keys' - parameterDataset: xr.Dataset = xr.open_dataset( - p.as_posix(), - engine="cfgrib", - backend_kwargs={"read_keys": ["name", "parameterNumber"], "indexpath": ""}, - chunks={ - "time": 1, - "step": -1, - "x": "auto", - "y": "auto", - }, - ) - except Exception as e: - log.warn( - event="error loading raw file as dataset", - error=e, - filepath=p.as_posix(), - ) - return xr.Dataset() - - # Make the DataArray OCF-compliant - # 1. Rename the parameter to the OCF short name - currentName = next(iter(parameterDataset.data_vars)) - parameterNumber = parameterDataset[currentName].attrs["GRIB_parameterNumber"] - - # The two wind dirs are the only parameters read in as "unknown" - # * Tell them apart via the parameterNumber attribute - # which lines up with the last number in the GRIB2 code specified below - # https://gridded-data-ui.cda.api.metoffice.gov.uk/glossary?groups=Wind&sortOrder=GRIB2_CODE - match currentName, parameterNumber: - case "unknown", 194: - parameterDataset = parameterDataset.rename( - { - currentName: internal.OCFParameter.WindDirectionFromWhichBlowingSurfaceAdjustedAGL.value, - }, - ) - case "unknown", 195: - parameterDataset = parameterDataset.rename( - {currentName: internal.OCFParameter.WindSpeedSurfaceAdjustedAGL.value}, - ) - - # There is some weird behaviour with the radiation parameters, and different setups - # this is a catch all situation (hopefully) - case "sdswrf", 7: - parameterDataset = parameterDataset.rename( - {currentName: 'dswrf'}, - ) - case "sdlwrf", 3: - parameterDataset = parameterDataset.rename( - {currentName: 'dlwrf'}, - ) - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - # * Reverse `latitude` so it's top-to-bottom via reindexing. - parameterDataset = ( - parameterDataset.drop_vars( - names=[ - "height", - "pressure", - "valid_time", - "surface", - "heightAboveGround", - "atmosphere", - "cloudBase", - "meanSea", - "heightAboveGroundLayer", - "level", - ], - errors="ignore", - ) - .rename({"time": "init_time"}) - .expand_dims(["init_time"]) - .sortby("y", ascending=False) - .transpose("init_time", "step", "y", "x") - .sortby("step") - .chunk( - { - "init_time": 1, - "step": -1, - "y": len(parameterDataset.y) // 2, - "x": len(parameterDataset.x) // 2, - }, - ) - ) - - # TODO: Remove this by moving this logic into ocf-datapipes and update PVNet1+2 to use that - # TODO: See issue #26 https://github.com/openclimatefix/nwp-consumer/issues/26 - # 5. Create osgb x and y coordinates from the lat/lon coordinates - # * The lat/lon coordinates are WGS84, i.e. 
EPSG:4326 - # * The OSGB coordinates are EPSG:27700 - # * Approximate the osgb values by taking the first row and column of the - # transformed x/y grids - latlonOsgbTransformer = pyproj.Transformer.from_crs( - crs_from=4326, - crs_to=27700, - always_xy=True, - ) - osgbX, osgbY = latlonOsgbTransformer.transform( - parameterDataset.longitude.values, - parameterDataset.latitude.values, - ) - osgbX = osgbX.astype(int) - osgbY = osgbY.astype(int) - parameterDataset = parameterDataset.assign_coords( - { - "x": osgbX[0], - "y": [osgbY[i][0] for i in range(len(osgbY))], - }, - ) - - return parameterDataset - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides the corresponding method in the parent class.""" - return { - "t2m": internal.OCFParameter.TemperatureAGL, - "si10": internal.OCFParameter.WindSpeedSurfaceAdjustedAGL, - "wdir10": internal.OCFParameter.WindDirectionFromWhichBlowingSurfaceAdjustedAGL, - "hcc": internal.OCFParameter.HighCloudCover, - "mcc": internal.OCFParameter.MediumCloudCover, - "lcc": internal.OCFParameter.LowCloudCover, - "vis": internal.OCFParameter.VisibilityAGL, - "r2": internal.OCFParameter.RelativeHumidityAGL, - "rprate": internal.OCFParameter.RainPrecipitationRate, - "tprate": internal.OCFParameter.RainPrecipitationRate, - "sd": internal.OCFParameter.SnowDepthWaterEquivalent, - "dswrf": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "dlwrf": internal.OCFParameter.DownwardLongWaveRadiationFlux, - } - - -def _isWantedFile(*, fi: MetOfficeFileInfo, dit: dt.datetime) -> bool: - """Check if the input FileInfo corresponds to a wanted GRIB file. - - :param fi: FileInfo describing the file to check - :param dit: Desired init time - """ - # False if item has an init_time not equal to desired init time - if fi.it().replace(tzinfo=None) != dit.replace(tzinfo=None): - return False - # False if item is one of the ones ending in +HH - if "+" in fi.filename(): - return False - - return True diff --git a/src/nwp_consumer/internal/inputs/metoffice/test_client.py b/src/nwp_consumer/internal/inputs/metoffice/test_client.py deleted file mode 100644 index 1af02b34..00000000 --- a/src/nwp_consumer/internal/inputs/metoffice/test_client.py +++ /dev/null @@ -1,113 +0,0 @@ -"""Tests for the metoffice module.""" - -import datetime as dt -import pathlib -import unittest.mock - -from ._models import MetOfficeFileInfo -from .client import Client, _isWantedFile - -# --------- Test setup --------- # - -testClient = Client( - orderID="tmp", - apiKey="tmp", -) - -# --------- Client methods --------- # - - -class TestClient_Init(unittest.TestCase): - """Tests for the MetOfficeClient.__init__ method.""" - - def test_errorsWhenVariablesAreNotSet(self) -> None: - with self.assertRaises(KeyError): - _ = Client(orderID="tmp", apiKey="") - - -class TestClient(unittest.TestCase): - """Tests for the MetOfficeClient.""" - - def test_mapCachedRaw(self) -> None: - - tests = [ - { - "filename": "test_knownparam.grib", - "expected_dims": ["init_time", "step", "y", "x"], - "expected_var": "dswrf", - }, - { - "filename": "test_unknownparam1.grib", - "expected_dims": ["init_time", "step", "y", "x"], - "expected_var": "wdir10", - }, - { - "filename": "test_unknownparam2.grib", - "expected_dims": ["init_time", "step", "y", "x"], - "expected_var": "si10", - }, - ] - - for tst in tests: - with self.subTest(f"test file {tst['filename']}"): - out = testClient.mapCachedRaw(p=pathlib.Path(__file__).parent / tst["filename"]) - - # Ensure the dimensions of the variables are correct - 
for data_var in out.data_vars: - self.assertEqual(list(out[data_var].dims), tst["expected_dims"], - msg=f'Dims "{list(out[data_var].dims)}" not as expected in {tst}') - # Ensure the correct variable is in the data_vars - self.assertTrue(tst["expected_var"] in list(out.data_vars.keys()), - msg=f'Variable "{list(out.data_vars.keys())}" not as expected in {tst}') - # Ensure no unknowns - self.assertNotIn("unknown", list(out.data_vars.keys())) - - -# --------- Static methods --------- # - - -class Test_IsWantedFile(unittest.TestCase): - """Tests for the _isWantedFile method.""" - - def test_correctlyFiltersMetOfficeFileInfos(self) -> None: - initTime: dt.datetime = dt.datetime( - year=2023, - month=3, - day=24, - hour=0, - minute=0, - tzinfo=dt.timezone.utc, - ) - - wantedFileInfos: list[MetOfficeFileInfo] = [ - MetOfficeFileInfo( - fileId="agl_temperature_1.5_2023032400", - runDateTime=dt.datetime( - year=2023, month=3, day=24, hour=0, minute=0, tzinfo=dt.timezone.utc, - ), - ), - MetOfficeFileInfo( - fileId="ground_downward-short-wave-radiation-flux_2023032400", - runDateTime=dt.datetime( - year=2023, month=3, day=24, hour=0, minute=0, tzinfo=dt.timezone.utc, - ), - ), - ] - - unwantedFileInfos: list[MetOfficeFileInfo] = [ - MetOfficeFileInfo( - fileId="agl_temperature_1.5+00", - runDateTime=dt.datetime( - year=2023, month=3, day=24, hour=0, minute=0, tzinfo=dt.timezone.utc, - ), - ), - MetOfficeFileInfo( - fileId="agl_temperature_1.5_2023032403", - runDateTime=dt.datetime( - year=2023, month=3, day=24, hour=3, minute=0, tzinfo=dt.timezone.utc, - ), - ), - ] - - self.assertTrue(all(_isWantedFile(fi=fo, dit=initTime) for fo in wantedFileInfos)) - self.assertFalse(all(_isWantedFile(fi=fo, dit=initTime) for fo in unwantedFileInfos)) diff --git a/src/nwp_consumer/internal/inputs/metoffice/test_knownparam.grib b/src/nwp_consumer/internal/inputs/metoffice/test_knownparam.grib deleted file mode 100644 index bdae72b1..00000000 Binary files a/src/nwp_consumer/internal/inputs/metoffice/test_knownparam.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/metoffice/test_unknownparam1.grib b/src/nwp_consumer/internal/inputs/metoffice/test_unknownparam1.grib deleted file mode 100644 index e5f86cf9..00000000 Binary files a/src/nwp_consumer/internal/inputs/metoffice/test_unknownparam1.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/metoffice/test_unknownparam2.grib b/src/nwp_consumer/internal/inputs/metoffice/test_unknownparam2.grib deleted file mode 100644 index df619082..00000000 Binary files a/src/nwp_consumer/internal/inputs/metoffice/test_unknownparam2.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/metoffice/test_wrongnameparam.grib b/src/nwp_consumer/internal/inputs/metoffice/test_wrongnameparam.grib deleted file mode 100644 index d7c94424..00000000 Binary files a/src/nwp_consumer/internal/inputs/metoffice/test_wrongnameparam.grib and /dev/null differ diff --git a/src/nwp_consumer/internal/inputs/noaa/__init__.py b/src/nwp_consumer/internal/inputs/noaa/__init__.py deleted file mode 100644 index c0ab0b44..00000000 --- a/src/nwp_consumer/internal/inputs/noaa/__init__.py +++ /dev/null @@ -1,4 +0,0 @@ -__all__ = ["AWSClient", "NCARClient"] - -from .aws import Client as AWSClient -from .ncar import Client as NCARClient \ No newline at end of file diff --git a/src/nwp_consumer/internal/inputs/noaa/_consts.py b/src/nwp_consumer/internal/inputs/noaa/_consts.py deleted file mode 100644 index e6f4413f..00000000 --- 
a/src/nwp_consumer/internal/inputs/noaa/_consts.py +++ /dev/null @@ -1,30 +0,0 @@ -"""Defines all parameters available from NOAA.""" - -GFS_VARIABLES = ['siconc_surface_instant', 'slt_surface_instant', 'cape_surface_instant', 't_surface_instant', - 'sp_surface_instant', 'lsm_surface_instant', 'sr_surface_instant', 'vis_surface_instant', - 'prate_surface_instant', 'acpcp_surface_accum', 'sde_surface_instant', 'cin_surface_instant', - 'orog_surface_instant', 'tp_surface_accum', 'lhtfl_surface_avg', 'shtfl_surface_avg', - 'crain_surface_instant', 'cfrzr_surface_instant', 'cicep_surface_instant', 'csnow_surface_instant', - 'cprat_surface_instant', 'cpofp_surface_instant', 'pevpr_surface_instant', 'sdwe_surface_instant', - 'uflx_surface_avg', 'vflx_surface_avg', 'gust_surface_instant', 'fricv_surface_instant', - 'u-gwd_surface_avg', 'v-gwd_surface_avg', 'hpbl_surface_instant', 'dswrf_surface_avg', - 'uswrf_surface_avg', 'dlwrf_surface_avg', 'ulwrf_surface_avg', 'lftx_surface_instant', - '4lftx_surface_instant', 'veg_surface_instant', 'watr_surface_accum', 'gflux_surface_avg', - 'fco2rec_surface_instant', 'hindex_surface_instant', 'wilt_surface_instant', 'fldcp_surface_instant', - 'al_surface_avg', 'SUNSD_surface_instant', 'prate_surface_avg', 'crain_surface_avg', - 'cfrzr_surface_avg', 'cicep_surface_avg', 'csnow_surface_avg', 'cprat_surface_avg', 'pres_instant', - 'q_instant', 't_instant', 'u_instant', 'v_instant', 'u10_instant', 'v10_instant', 't2m_instant', - 'd2m_instant', 'tmax_max', 'tmin_min', 'sh2_instant', 'r2_instant', 'aptmp_instant', 'u100_instant', - 'v100_instant', 'refd_instant', 't', 'u', 'v', 'q', 'w', 'gh', 'r', 'absv', 'o3mr', 'wz', 'tcc', - 'clwmr', 'icmr', 'rwmr', 'snmr', 'grle', ] - -MISSING_STEP_0_VARIABLES = ['slt_surface_instant', 'sr_surface_instant', 'acpcp_surface_accum', 'tp_surface_accum', - 'lhtfl_surface_avg', 'shtfl_surface_avg', 'cprat_surface_instant', 'pevpr_surface_instant', - 'uflx_surface_avg', 'vflx_surface_avg', 'fricv_surface_instant', 'u-gwd_surface_avg', - 'v-gwd_surface_avg', 'dswrf_surface_avg', 'uswrf_surface_avg', 'dlwrf_surface_avg', - 'ulwrf_surface_avg', 'veg_surface_instant', 'watr_surface_accum', 'gflux_surface_avg', - 'fco2rec_surface_instant', 'al_surface_avg', 'prate_surface_avg', 'crain_surface_avg', - 'cfrzr_surface_avg', 'cicep_surface_avg', 'csnow_surface_avg', 'cprat_surface_avg', - 'tmax_max', 'tmin_min', 'refd_instant', 'q', ] - -EXTRA_STEP_0_VARIABLES = ["landn_surface_instant", "5wavh"] diff --git a/src/nwp_consumer/internal/inputs/noaa/_models.py b/src/nwp_consumer/internal/inputs/noaa/_models.py deleted file mode 100644 index 15388605..00000000 --- a/src/nwp_consumer/internal/inputs/noaa/_models.py +++ /dev/null @@ -1,37 +0,0 @@ -import datetime as dt - -from nwp_consumer import internal - - -class NOAAFileInfo(internal.FileInfoModel): - def __init__( - self, - it: dt.datetime, - filename: str, - currentURL: str, - step: int, - ) -> None: - self._it = it - self._filename = filename - self._url = currentURL - self.step = step - - def filename(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._filename - - def filepath(self) -> str: - """Overrides the corresponding method in the parent class.""" - return self._url + "/" + self._filename - - def it(self) -> dt.datetime: - """Overrides the corresponding method in the parent class.""" - return self._it - - def steps(self) -> list[int]: - """Overrides the corresponding method in the parent class.""" - return [self.step] - - def variables(self) 
-> list[str]: - """Overrides the corresponding method in the parent class.""" - raise NotImplementedError() diff --git a/src/nwp_consumer/internal/inputs/noaa/aws.py b/src/nwp_consumer/internal/inputs/noaa/aws.py deleted file mode 100644 index 522c2fe4..00000000 --- a/src/nwp_consumer/internal/inputs/noaa/aws.py +++ /dev/null @@ -1,237 +0,0 @@ -"""Implements a client to fetch NOAA data from AWS.""" -import datetime as dt -import pathlib -import typing -import urllib.request - -import cfgrib -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._consts import GFS_VARIABLES -from ._models import NOAAFileInfo - -log = structlog.getLogger() - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("init_time", "step", "latitude", "longitude") - - -class Client(internal.FetcherInterface): - """Implements a client to fetch NOAA data from AWS.""" - - baseurl: str # The base URL for the NOAA model - model: str # The model to fetch data for - parameters: list[str] # The parameters to fetch - - def __init__(self, model: str, hours: int = 48, param_group: str = "default") -> None: - """Create a new NOAA Client. - - Exposes a client for NOAA data from AWS that conforms to the FetcherInterface. - - Args: - model: The model to fetch data for. Valid models is "global". - param_group: The set of parameters to fetch. - Valid groups are "default", "full", and "basic". - """ - self.baseurl = "https://noaa-gfs-bdp-pds.s3.amazonaws.com" - - match (param_group, model): - case ("default", _): - self.parameters = [ - "t2m", - "tcc", - "mcc", - "hcc", - "lcc", - "dswrf", - "dlwrf", - "prate", - "sdwe", - "r", - "vis", - "u10", - "v10", - "u100", - "v100", - ] - case ("basic", "global"): - self.parameters = ["t2m", "dswrf"] - case ("full", "global"): - raise ValueError("full parameter group is not yet implemented for GFS") - case (_, _): - raise ValueError( - f"unknown parameter group {param_group}." 
- "Valid groups are 'default', 'full', 'basic'", - ) - - self.model = model - self.hours = hours - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"NOAA_{self.model}".upper() - - def getInitHours(self) -> list[int]: # noqa: D102 - return [0, 6, 12, 18] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - files: list[internal.FileInfoModel] = [] - - # Files are split per timestep - # And the url includes the time and init time - # https://noaa-gfs-bdp-pds.s3.amazonaws.com/gfs.20201206/00/atmos/gfs.t00z.pgrb2.1p00.f000 - for step in range(0, self.hours + 1, 3): - files.append( - NOAAFileInfo( - it=it, - filename=f"gfs.t{it.hour:02}z.pgrb2.1p00.f{step:03}", - currentURL=f"{self.baseurl}/gfs.{it.strftime('%Y%m%d')}/{it.hour:02}/atmos", - step=step, - ), - ) - - log.debug( - event="listed files for init time", - inittime=it.strftime("%Y-%m-%d %H:%M"), - numfiles=len(files), - ) - - return files - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: # noqa: D102 - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Load the raw file as a dataset - try: - ds = cfgrib.open_datasets( - p.as_posix(), - backend_kwargs={ - "indexpath": "", - "errors": "ignore", - }, - ) - except Exception as e: - log.warn( - event="error converting raw file as dataset", - error=e, - filepath=p.as_posix(), - ) - return xr.Dataset() - - log.debug(event=f"Loaded the file {p.as_posix()}, and now processing it") - # Process all the parameters into a single file - ds = [ - d - for d in ds - if any(x in d.coords for x in ["surface", "heightAboveGround", "isobaricInhPa"]) - ] - - # Split into surface, heightAboveGround, and isobaricInhPa lists - surface = [d for d in ds if "surface" in d.coords] - heightAboveGround = [d for d in ds if "heightAboveGround" in d.coords] - isobaricInhPa = [d for d in ds if "isobaricInhPa" in d.coords] - - # * Drop any variables we are not intrested in keeping - for i, d in enumerate(surface): - unwanted_variables = [v for v in d.data_vars if v not in self.parameters] - surface[i] = d.drop_vars(unwanted_variables) - for i, d in enumerate(heightAboveGround): - unwanted_variables = [v for v in d.data_vars if v not in self.parameters] - heightAboveGround[i] = d.drop_vars(unwanted_variables) - for i, d in enumerate(isobaricInhPa): - unwanted_variables = [v for v in d.data_vars if v not in self.parameters] - isobaricInhPa[i] = d.drop_vars(unwanted_variables) - - surface_merged = xr.merge(surface, compat="override").drop_vars( - ["unknown_surface_instant", "valid_time"], - errors="ignore", - ) - del surface - # Drop unknown data variable - hag_merged = xr.merge(heightAboveGround).drop_vars("valid_time", errors="ignore") - del heightAboveGround - iso_merged = xr.merge(isobaricInhPa).drop_vars("valid_time", errors="ignore") - del isobaricInhPa - - log.debug(event='Merging surface, hag and iso backtogether') - - total_ds = ( - xr.merge([surface_merged, hag_merged, iso_merged]) - .rename({"time": "init_time"}) - .expand_dims("init_time") - .expand_dims("step") - .transpose("init_time", "step", ...) 
- .sortby("step") - .chunk({"init_time": 1, "step": 1}) - ) - del surface_merged, hag_merged, iso_merged - - ds = total_ds.drop_dims([c for c in list(total_ds.sizes.keys()) if c not in COORDINATE_ALLOW_LIST]) - - log.debug(event='Finished mapping raw file to xarray', filename=p.as_posix()) - - return ds - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - log.debug(event="requesting download of file", file=fi.filename(), path=fi.filepath()) - try: - response = urllib.request.urlopen(fi.filepath()) - except Exception as e: - log.warn( - event="error calling url for file", - url=fi.filepath(), - filename=fi.filename(), - error=e, - ) - return pathlib.Path() - - if response.status != 200: - log.warn( - event="error downloading file", - status=response.status, - url=fi.filepath(), - filename=fi.filename(), - ) - return pathlib.Path() - - # Extract the bz2 file when downloading - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with open(cfp, "wb") as f: - f.write(response.read()) - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides the corresponding method in the parent class.""" - # See https://www.nco.ncep.noaa.gov/pmb/products/gfs/gfs.t00z.pgrb2.0p25.f003.shtml for a list of NOAA GFS - return { - "t2m_instant": internal.OCFParameter.TemperatureAGL, - "tcc": internal.OCFParameter.HighCloudCover, - "dswrf_surface_avg": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "dlwrf_surface_avg": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "sdwe_surface_instant": internal.OCFParameter.SnowDepthWaterEquivalent, - "r": internal.OCFParameter.RelativeHumidityAGL, - "u10_instant": internal.OCFParameter.WindUComponentAGL, - "v10_instant": internal.OCFParameter.WindVComponentAGL, - "u100_instant": internal.OCFParameter.WindUComponent100m, - "v100_instant": internal.OCFParameter.WindVComponent100m, - } - diff --git a/src/nwp_consumer/internal/inputs/noaa/ncar.py b/src/nwp_consumer/internal/inputs/noaa/ncar.py deleted file mode 100644 index f3655379..00000000 --- a/src/nwp_consumer/internal/inputs/noaa/ncar.py +++ /dev/null @@ -1,222 +0,0 @@ -"""Implements a client to fetch NOAA data from NCAR.""" -import datetime as dt -import pathlib -import typing -import urllib.request - -import cfgrib -import structlog -import xarray as xr - -from nwp_consumer import internal - -from ._consts import GFS_VARIABLES -from ._models import NOAAFileInfo - -log = structlog.getLogger() - -COORDINATE_ALLOW_LIST: typing.Sequence[str] = ("time", "step", "latitude", "longitude") - - -class Client(internal.FetcherInterface): - """Implements a client to fetch NOAA data from NCAR.""" - - baseurl: str # The base URL for the NOAA model - model: str # The model to fetch data for - parameters: list[str] # The parameters to fetch - - def __init__(self, model: str, hours: int = 48, param_group: str = "default") -> None: - """Create a new NOAA Client. - - Exposes a client for NOAA data from NCAR that conforms to the FetcherInterface. - - Args: - model: The model to fetch data for. Valid models are "global". - param_group: The set of parameters to fetch. - Valid groups are "default", "full", and "basic". 
- """ - self.baseurl = "https://data.rda.ucar.edu/ds084.1" - - match (param_group, model): - case ("default", _): - self.parameters = ["t2m_instant", "tcc", "dswrf_surface_avg", "dlwrf_surface_avg", - "sdwe_surface_instant", "r", "u10_instant", "v10_instant"] - case ("basic", "global"): - self.parameters = ["t2m_instant", "dswrf_surface_avg"] - case ("full", "global"): - self.parameters = GFS_VARIABLES - case (_, _): - raise ValueError( - f"unknown parameter group {param_group}." - "Valid groups are 'default', 'full', 'basic'", - ) - - self.model = model - self.hours = hours - - def datasetName(self) -> str: - """Overrides the corresponding method in the parent class.""" - return f"NOAA_{self.model}".upper() - - def getInitHours(self) -> list[int]: # noqa: D102 - return [0, 6, 12, 18] - - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[internal.FileInfoModel]: # noqa: D102 - - # Ignore inittimes that don't correspond to valid hours - if it.hour not in self.getInitHours(): - return [] - - # The GFS dataset goes from 2015-01-15 to present - # * https://rda.ucar.edu/datasets/ds084.1/ - if it < dt.datetime(2015, 1, 15, tzinfo=dt.UTC): - return [] - - files: list[internal.FileInfoModel] = [] - - # The GFS dataset has data in hour jumps of 3 up to 240 - for step in range(0, self.hours + 1, 3): - filename = f"gfs.0p25.{it.strftime('%Y%m%d%H')}.f{step:03}.grib2" - files.append( - NOAAFileInfo( - it=it, - filename=filename, - currentURL=f"{self.baseurl}/{it.strftime('%Y')}/{it.strftime('%Y%m%d')}", - step=step, - ), - ) - - return files - - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: # noqa: D102 - if p.suffix != ".grib2": - log.warn( - event="cannot map non-grib file to dataset", - filepath=p.as_posix(), - ) - return xr.Dataset() - - log.debug(event="mapping raw file to xarray dataset", filepath=p.as_posix()) - - # Load the raw file as a list of datasets - try: - ds: list[xr.Dataset] = cfgrib.open_datasets( - p.as_posix(), - ) - except Exception as e: - log.error( - event="error converting raw file as dataset", - error=e, - filepath=p.as_posix(), - ) - return xr.Dataset() - - # Process all the parameters into a single file - ds = [ - d for d in ds - if any(x in d.coords for x in ["surface", "heightAboveGround", "isobaricInhPa"]) - ] - - # Split into surface, heightAboveGround, and isobaricInhPa lists - surface: list[xr.Dataset] = [d for d in ds if "surface" in d.coords] - heightAboveGround: list[xr.Dataset] = [d for d in ds if "heightAboveGround" in d.coords] - isobaricInhPa: list[xr.Dataset] = [d for d in ds if "isobaricInhPa" in d.coords] - del ds - - # Update name of each data variable based off the attribute GRIB_stepType - for i, d in enumerate(surface): - for variable in d.data_vars: - d = d.rename({variable: f"{variable}_surface_{d[f'{variable}'].attrs['GRIB_stepType']}"}) - surface[i] = d - for i, d in enumerate(heightAboveGround): - for variable in d.data_vars: - d = d.rename({variable: f"{variable}_{d[f'{variable}'].attrs['GRIB_stepType']}"}) - heightAboveGround[i] = d - - surface_merged: xr.Dataset = xr.merge(surface).drop_vars( - ["unknown_surface_instant", "valid_time"], errors="ignore", - ) - del surface - heightAboveGround_merged: xr.Dataset = xr.merge(heightAboveGround).drop_vars( - ["valid_time"], errors="ignore", - ) - del heightAboveGround - isobaricInhPa_merged: xr.Dataset = xr.merge(isobaricInhPa).drop_vars( - ["valid_time"], errors="ignore", - ) - del isobaricInhPa - - total_ds = xr.merge([surface_merged, heightAboveGround_merged, 
isobaricInhPa_merged]) - del surface_merged, heightAboveGround_merged, isobaricInhPa_merged - - # Map the data to the internal dataset representation - # * Transpose the Dataset so that the dimensions are correctly ordered - # * Rechunk the data to a more optimal size - total_ds = ( - total_ds.rename({"time": "init_time"}) - .expand_dims("init_time") - .expand_dims("step") - .transpose("init_time", "step", ...) - .sortby("step") - .chunk({"init_time": 1, "step": 1}) - ) - - return total_ds - - def downloadToCache( # noqa: D102 - self, - *, - fi: internal.FileInfoModel, - ) -> pathlib.Path: - log.debug(event="requesting download of file", file=fi.filename(), path=fi.filepath()) - try: - response = urllib.request.urlopen(fi.filepath()) - except Exception as e: - log.warn( - event="error calling url for file", - url=fi.filepath(), - filename=fi.filename(), - error=e, - ) - return pathlib.Path() - - if response.status != 200: - log.warn( - event="error downloading file", - status=response.status, - url=fi.filepath(), - filename=fi.filename(), - ) - return pathlib.Path() - - # Extract the bz2 file when downloading - cfp: pathlib.Path = internal.rawCachePath(it=fi.it(), filename=fi.filename()) - with open(cfp, "wb") as f: - f.write(response.read()) - - log.debug( - event="fetched all data from file", - filename=fi.filename(), - url=fi.filepath(), - filepath=cfp.as_posix(), - nbytes=cfp.stat().st_size, - ) - - return cfp - - def parameterConformMap(self) -> dict[str, internal.OCFParameter]: - """Overrides the corresponding method in the parent class.""" - # See https://www.nco.ncep.noaa.gov/pmb/products/gfs/gfs.t00z.pgrb2.0p25.f003.shtml - # for a list of NOAA parameters - return { - "t2m_instant": internal.OCFParameter.TemperatureAGL, - "tcc": internal.OCFParameter.HighCloudCover, - "dswrf_surface_avg": internal.OCFParameter.DownwardShortWaveRadiationFlux, - "dlwrf_surface_avg": internal.OCFParameter.DownwardLongWaveRadiationFlux, - "sdwe_surface_instant": internal.OCFParameter.SnowDepthWaterEquivalent, - "r": internal.OCFParameter.RelativeHumidityAGL, - "u10_instant": internal.OCFParameter.WindUComponentAGL, - "v10_instant": internal.OCFParameter.WindVComponentAGL, - "u100_instant": internal.OCFParameter.WindUComponent100m, - "v100_instant": internal.OCFParameter.WindVComponent100m, - } diff --git a/src/nwp_consumer/internal/inputs/noaa/test_aws.py b/src/nwp_consumer/internal/inputs/noaa/test_aws.py deleted file mode 100644 index 9ea7112b..00000000 --- a/src/nwp_consumer/internal/inputs/noaa/test_aws.py +++ /dev/null @@ -1,35 +0,0 @@ -import datetime as dt -import pathlib -import unittest -from typing import TYPE_CHECKING - -if TYPE_CHECKING: - from ._models import NOAAFileInfo - -from .aws import Client - -testClient = Client(model="global", param_group="basic") - - -class TestClient(unittest.TestCase): - def test_mapCachedRaw(self) -> None: - # Test with global file - testFilePath: pathlib.Path = ( - pathlib.Path(__file__).parent / "test_surface_000.grib2" - ) - out = testClient.mapCachedRaw(p=testFilePath) - # Check latitude and longitude are injected - self.assertTrue("latitude" in out.coords) - self.assertTrue("longitude" in out.coords) - print(out) - # Check that the dimensions are correctly ordered and renamed - self.assertEqual( - out[next(iter(out.data_vars.keys()))].dims, - ("init_time", "step", "latitude", "longitude"), - ) - self.assertEqual(len(out["latitude"].values), 721) - self.assertEqual(len(out["longitude"].values), 1440) - self.assertEqual(len(out["init_time"].values), 1) - 
self.assertEqual(len(out["step"].values), 1) - self.assertListEqual(list(out.data_vars.keys()), ["t2m"]) - diff --git a/src/nwp_consumer/internal/inputs/noaa/test_ncar.py b/src/nwp_consumer/internal/inputs/noaa/test_ncar.py deleted file mode 100644 index 5d0038a9..00000000 --- a/src/nwp_consumer/internal/inputs/noaa/test_ncar.py +++ /dev/null @@ -1,27 +0,0 @@ -import pathlib -import unittest - -from .ncar import Client - -testClient = Client(model="global", param_group="full") - - -class TestClient(unittest.TestCase): - def test_mapCachedRaw(self) -> None: - # Test with global file - testFilePath: pathlib.Path = ( - pathlib.Path(__file__).parent / "test_surface_000.grib2" - ) - out = testClient.mapCachedRaw(p=testFilePath) - # Check latitude and longitude are injected - self.assertTrue("latitude" in out.coords) - self.assertTrue("longitude" in out.coords) - # Check that the dimensions are correctly ordered and renamed - self.assertEqual( - out[next(iter(out.data_vars.keys()))].dims, - ("init_time", "step", "latitude", "longitude"), - ) - self.assertEqual(len(out["latitude"].values), 721) - self.assertEqual(len(out["longitude"].values), 1440) - self.assertEqual(len(out["init_time"].values), 1) - self.assertEqual(len(out["step"].values), 1) diff --git a/src/nwp_consumer/internal/inputs/noaa/test_surface_000.grib2 b/src/nwp_consumer/internal/inputs/noaa/test_surface_000.grib2 deleted file mode 100644 index a24e14a2..00000000 Binary files a/src/nwp_consumer/internal/inputs/noaa/test_surface_000.grib2 and /dev/null differ diff --git a/src/nwp_consumer/internal/models.py b/src/nwp_consumer/internal/models.py deleted file mode 100644 index 0e1d24ee..00000000 --- a/src/nwp_consumer/internal/models.py +++ /dev/null @@ -1,203 +0,0 @@ -"""Contains both ports and domain models for the nwp_consumer package.""" - -import abc -import datetime as dt -import pathlib -from enum import Enum - -import xarray as xr - - -# ------- Domain models ------- # - - -class OCFParameter(str, Enum): - """Short names for the OCF parameters.""" - - LowCloudCover = "lcc" - MediumCloudCover = "mcc" - HighCloudCover = "hcc" - TotalCloudCover = "clt" - VisibilityAGL = "vis" - RelativeHumidityAGL = "r" - RainPrecipitationRate = "prate" - SnowDepthWaterEquivalent = "sde" - DownwardShortWaveRadiationFlux = "dswrf" - DownwardLongWaveRadiationFlux = "dlwrf" - TemperatureAGL = "t" - WindSpeedSurfaceAdjustedAGL = "si10" - WindDirectionFromWhichBlowingSurfaceAdjustedAGL = "wdir10" - WindUComponentAGL = "u10" - WindVComponentAGL = "v10" - WindUComponent100m = "u100" - WindVComponent100m = "v100" - WindUComponent200m = "u200" - WindVComponent200m = "v200" - DirectSolarRadiation = "sr" - DownwardUVRadiationAtSurface = "duvrs" - - -class FileInfoModel(abc.ABC): - """Information about a raw file. - - FileInfoModel assumes the following properties exist for all - raw NWP files that may be encountered in a provider's archive: - - 1. The file has a name - 2. The file has a path - 3. The file corresponds to a single forecast init time - 4. The file corresponds to one or more time steps - 5. The file corresponds to one or more variables - - These assumptions are reflected in the abstract methods of this class. 
- """ - - @abc.abstractmethod - def filename(self) -> str: - """Return the file name including extension.""" - pass - - @abc.abstractmethod - def filepath(self) -> str: - """Return the remote file path, not including protocols and TLDs.""" - pass - - @abc.abstractmethod - def it(self) -> dt.datetime: - """Return the init time of the file.""" - pass - - @abc.abstractmethod - def steps(self) -> list[int]: - """Return the time steps of the file.""" - pass - - @abc.abstractmethod - def variables(self) -> list[str]: - """Return the variables of the file.""" - pass - - -# ------- Interfaces ------- # -# Represent ports in the hexagonal architecture pattern - -class FetcherInterface(abc.ABC): - """Generic interface for fetching and converting NWP data from an API. - - Used for dependency injection. NWP data from any source shares common properties: - - It is presented in one or many files for a given init_time - - These files can be read as raw bytes - - There is an expected number of files per init_time which correspond to an equivalent - number of variables and steps in the dataset - - The following functions define generic transforms based around these principals. - """ - - @abc.abstractmethod - def listRawFilesForInitTime(self, *, it: dt.datetime) -> list[FileInfoModel]: - """List the relative path of all files available from source for the given init_time. - - :param it: Init Time to list files for - """ - pass - - @abc.abstractmethod - def downloadToCache(self, *, fi: FileInfoModel) -> pathlib.Path: - """Fetch the bytes of a single raw file from source and save to a cache file. - - :param fi: File Info object describing the file to fetch - :return: Path to the local cache file, or pathlib.Path() if the file was not fetched - """ - pass - - @abc.abstractmethod - def mapCachedRaw(self, *, p: pathlib.Path) -> xr.Dataset: - """Create an xarray dataset from the given RAW data in a cache file. - - :param p: Path to cached file holding raw data - :return: Dataset created from the raw data - """ - pass - - @abc.abstractmethod - def getInitHours(self) -> list[int]: - """Get the forecast init hours available from the source. - - :return: List of forecast init hours - """ - pass - - @abc.abstractmethod - def parameterConformMap(self) -> dict[str, OCFParameter]: - """The mapping from the source's parameter names to the OCF short names. - - :return: Dictionary of parameter mappings - """ - pass - - @abc.abstractmethod - def datasetName(self) -> str: - """Return the name of the dataset. - - :return: Name of the dataset - """ - pass - - -class StorageInterface(abc.ABC): - """Generic interface for storing data, used for dependency injection.""" - - @abc.abstractmethod - def exists(self, *, dst: pathlib.Path) -> bool: - """Check if the given path exists. - - :param dst: Path to check - :return: True if the path exists, False otherwise - """ - pass - - @abc.abstractmethod - def store(self, *, src: pathlib.Path, dst: pathlib.Path) -> pathlib.Path: - """Move a file to the store. - - :param src: Path to file to store - :param dst: Desired path in store - :return: Location in raw store - """ - pass - - @abc.abstractmethod - def listInitTimes(self, *, prefix: pathlib.Path) -> list[dt.datetime]: - """List all initTime folders in the given prefix. - - :param prefix: Path to prefix to list initTimes for - :return: List of initTimes - """ - pass - - @abc.abstractmethod - def copyITFolderToCache(self, *, prefix: pathlib.Path, it: dt.datetime) \ - -> list[pathlib.Path]: - """Copy all files in given folder to cache. 
- - :param prefix: Path of folder in which to find initTimes - :param it: InitTime to copy files for - :return: List of paths to cached files - """ - pass - - @abc.abstractmethod - def delete(self, *, p: pathlib.Path) -> None: - """Delete the given path. - - :param p: Path to delete - """ - pass - - @abc.abstractmethod - def name(self) -> str: - """Return the name of the storage provider. - - :return: Name of the storage provider - """ - pass diff --git a/src/nwp_consumer/internal/outputs/__init__.py b/src/nwp_consumer/internal/outputs/__init__.py deleted file mode 100644 index dd8ce0fc..00000000 --- a/src/nwp_consumer/internal/outputs/__init__.py +++ /dev/null @@ -1,13 +0,0 @@ -"""Output modules the consumer can write to.""" - -from . import ( - huggingface, - localfs, - s3, -) - -__all__ = [ - "localfs", - "s3", - "huggingface", -] diff --git a/src/nwp_consumer/internal/outputs/huggingface/__init__.py b/src/nwp_consumer/internal/outputs/huggingface/__init__.py deleted file mode 100644 index f274eb57..00000000 --- a/src/nwp_consumer/internal/outputs/huggingface/__init__.py +++ /dev/null @@ -1,4 +0,0 @@ -__all__ = ['Client'] - -from .client import Client - diff --git a/src/nwp_consumer/internal/outputs/huggingface/client.py b/src/nwp_consumer/internal/outputs/huggingface/client.py deleted file mode 100644 index c2f2f725..00000000 --- a/src/nwp_consumer/internal/outputs/huggingface/client.py +++ /dev/null @@ -1,313 +0,0 @@ -"""Client for HuggingFace.""" - -import datetime as dt -import pathlib - -import huggingface_hub as hfh -import structlog -from huggingface_hub.hf_api import ( - RepoFile, - RepoFolder, - RevisionNotFoundError, -) - -from nwp_consumer import internal - -log = structlog.getLogger() - - -class Client(internal.StorageInterface): - """Client for HuggingFace.""" - - # HuggingFace API - __api: hfh.HfApi - - # DatasetURL - dsURL: str - - def __init__(self, repoID: str, token: str | None = None, endpoint: str | None = None) -> None: - """Create a new client for HuggingFace. - - Exposes a client for the HuggingFace filesystem API that conforms to the StorageInterface. - - Args: - repoID: The ID of the repo to use for the dataset. - token: The HuggingFace authentication token. - endpoint: The HuggingFace endpoint to use. - """ - self.__api = hfh.HfApi(token=token, endpoint=endpoint) - # Get the URL to the dataset, e.g. 
https://huggingface.co/datasets/username/dataset - self.dsURL = hfh.hf_hub_url( - endpoint=endpoint, - repo_id=repoID, - repo_type="dataset", - filename="", - ) - # Repo ID - self.repoID = repoID - - try: - self.__api.dataset_info( - repo_id=repoID, - ) - except Exception as e: - log.warn( - event="failed to authenticate with huggingface for given repo", - repo_id=repoID, - error=e, - ) - - def name(self) -> str: - """Overrides the corresponding method of the parent class.""" - return "huggingface" - - def exists(self, *, dst: pathlib.Path) -> bool: - """Overrides the corresponding method of the parent class.""" - try: - path_infos: list[RepoFile | RepoFolder] = self.__api.get_paths_info( - repo_id=self.repoID, - repo_type="dataset", - paths=[dst.as_posix()], - ) - if len(path_infos) == 0: - return False - except RevisionNotFoundError: - return False - return True - - def store(self, *, src: pathlib.Path, dst: pathlib.Path) -> pathlib.Path: - """Overrides the corresponding method of the parent class.""" - # Remove any leading slashes as they are not allowed in huggingface - dst = dst.relative_to("/") if dst.is_absolute() else dst - - # Get the hash of the latest commit - sha: str = self.__api.dataset_info(repo_id=self.repoID).sha - # Handle the case where we are trying to upload a folder - if src.is_dir(): - # Upload the folder using the huggingface API - future = self.__api.upload_folder( - repo_id=self.repoID, - repo_type="dataset", - folder_path=src.as_posix(), - path_in_repo=dst.as_posix(), - parent_commit=sha, - run_as_future=True, - ) - # Handle the case where we are trying to upload a file - else: - # Upload the file using the huggingface API - future = self.__api.upload_file( - repo_id=self.repoID, - repo_type="dataset", - path_or_fileobj=src.as_posix(), - path_in_repo=dst.as_posix(), - parent_commit=sha, - run_as_future=True, - ) - - # Block until the upload is complete to prevent overlapping commits - url = future.result(timeout=120) - log.info("Uploaded to huggingface", commiturl=url) - - # Perform a check on the size of the file - size = self._get_size(p=dst) - if size != src.stat().st_size and future.done(): - log.warn( - event="stored file size does not match source file size", - src=src.as_posix(), - dst=dst.as_posix(), - srcsize=src.stat().st_size, - dstsize=size, - ) - else: - log.debug( - event=f"stored file {dst.name}", - filepath=dst.as_posix(), - nbytes=size, - ) - return dst - - def listInitTimes(self, *, prefix: pathlib.Path) -> list[dt.datetime]: - """Overrides the corresponding method of the parent class.""" - # Remove any leading slashes as they are not allowed in huggingface - prefix = prefix.relative_to("/") if prefix.is_absolute() else prefix - # Get the path relative to the prefix of every folder in the repo - allDirs: list[pathlib.Path] = [ - pathlib.Path(f.path).relative_to(prefix) - for f in self.__api.list_repo_tree( - repo_id=self.repoID, - repo_type="dataset", - path_in_repo=prefix.as_posix(), - recursive=True, - ) - if isinstance(f, RepoFolder) - ] - - # Get the initTime from the folder pattern - initTimes = set() - for d in allDirs: - if d.match(internal.IT_FOLDER_GLOBSTR_RAW): - try: - # Try to parse the folder name as a datetime - ddt = dt.datetime.strptime( - d.as_posix(), - internal.IT_FOLDER_STRUCTURE_RAW, - ).replace(tzinfo=dt.UTC) - initTimes.add(ddt) - except ValueError: - log.debug( - event="ignoring invalid folder name", - name=d.as_posix(), - within=prefix.as_posix(), - ) - - sortedInitTimes = sorted(initTimes) - log.debug( - event=f"found 
{len(initTimes)} init times in raw directory", - earliest=sortedInitTimes[0], - latest=sortedInitTimes[-1], - ) - return sortedInitTimes - - def copyITFolderToCache(self, *, prefix: pathlib.Path, it: dt.datetime) -> list[pathlib.Path]: - """Overrides the corresponding method of the parent class.""" - # Remove any leading slashes as they are not allowed in huggingface - prefix = prefix.relative_to("/") if prefix.is_absolute() else prefix - - # Get the paths of all files in the folder - paths: list[pathlib.Path] = [ - pathlib.Path(p.path) - for p in self.__api.list_repo_tree( - repo_id=self.repoID, - repo_type="dataset", - path_in_repo=(prefix / it.strftime(internal.IT_FOLDER_STRUCTURE_RAW)).as_posix(), - recursive=True, - ) - if isinstance(p, RepoFile) - ] - - log.debug( - event="copying it folder to cache", - inittime=it.strftime(internal.IT_FOLDER_STRUCTURE_RAW), - numfiles=len(paths), - ) - - # Read all files into cache - cachedPaths: list[pathlib.Path] = [] - for path in paths: - # Huggingface replicates the full path from repo root on download - # to local directory. - cfp: pathlib.Path = internal.CACHE_DIR / path.as_posix() - - # Use existing cached file if it already exists in the cache - if cfp.exists() and cfp.stat().st_size > 0: - log.debug( - event="file already exists in cache, skipping", - filepath=path.as_posix(), - cachepath=cfp.as_posix(), - ) - cachedPaths.append(cfp) - continue - - # Don't copy file from the store if it is empty - if self.exists(dst=path) is False: - log.warn( - event="file does not exist in store, skipping", - filepath=path.as_posix(), - ) - continue - - # Copy the file from the store to cache - self.__api.hf_hub_download( - repo_id=self.repoID, - repo_type="dataset", - filename=path.as_posix(), - local_dir=internal.CACHE_DIR.as_posix(), - local_dir_use_symlinks=False, - ) - - # Check that the file was copied correctly - if cfp.stat().st_size != self._get_size(p=path) or cfp.stat().st_size == 0: - log.warn( - event="copied file size does not match source file size", - src=path.as_posix(), - dst=cfp.as_posix(), - srcsize=self._get_size(p=path), - dstsize=cfp.stat().st_size, - ) - else: - cachedPaths.append(cfp) - - log.debug( - event="copied it folder to cache", - nbytes=[p.stat().st_size for p in cachedPaths], - inittime=it.strftime("%Y-%m-%d %H:%M"), - ) - - return cachedPaths - - def delete(self, *, p: pathlib.Path) -> None: - """Overrides the corresponding method of the parent class.""" - # Remove any leading slashes as they are not allowed in huggingface - p = p.relative_to("/") if p.is_absolute() else p - - # Determine if the path corresponds to a file or a folder - info: RepoFile | RepoFolder = self.__api.get_paths_info( - repo_id=self.repoID, - repo_type="dataset", - paths=[p.as_posix()], - recursive=False, - )[0] - # Call the relevant delete function using the huggingface API - if isinstance(info, RepoFolder): - self.__api.delete_folder( - repo_id=self.repoID, - repo_type="dataset", - path_in_repo=p.as_posix(), - ) - else: - self.__api.delete_file( - repo_id=self.repoID, - repo_type="dataset", - path_in_repo=p.as_posix(), - ) - - def _get_size(self, *, p: pathlib.Path) -> int: - """Gets the size of a file or folder in the huggingface dataset.""" - # Remove any leading slashes as they are not allowed in huggingface - p = p.relative_to("/") if p.is_absolute() else p - - size: int = 0 - # Get the info of the path - path_info: RepoFile | RepoFolder = self.__api.get_paths_info( - repo_id=self.repoID, - repo_type="dataset", - paths=[p.as_posix()], - ) - 
- if len(path_info) == 0: - # The path in question doesn't exist - log.warn( - event="path does not exist in huggingface dataset", - path=p.as_posix(), - ) - return size - - # Calculate the size of the file or folder - if isinstance(path_info[0], RepoFolder): - size = sum( - [ - f.size - for f in self.__api.list_repo_tree( - repo_id=self.repoID, - repo_type="dataset", - path_in_repo=p.as_posix(), - recursive=True, - ) - if isinstance(f, RepoFile) - ], - ) - elif isinstance(path_info[0], RepoFile): - size = path_info[0].size - - return size diff --git a/src/nwp_consumer/internal/outputs/huggingface/test_client.py b/src/nwp_consumer/internal/outputs/huggingface/test_client.py deleted file mode 100644 index f1698faa..00000000 --- a/src/nwp_consumer/internal/outputs/huggingface/test_client.py +++ /dev/null @@ -1,43 +0,0 @@ -import datetime as dt -import pathlib -import unittest - -from nwp_consumer import internal - -from .client import Client - -USER = "openclimatefix" -RAW = pathlib.Path("raw") - - -class TestHuggingFaceClient(unittest.TestCase): - repoID: str - client: Client - - @classmethod - def setUpClass(cls) -> None: - cls.repoID = "PolyAI/minds14" - cls.client = Client(repoID=cls.repoID) - - def test_get_size(self) -> None: - """Test that the size of a file is returned correctly.""" - name_size_map: dict[str, int] = { - "README.md": 5276, - "data": 471355396, - } - for name, exp in name_size_map.items(): - with self.subTest(msg=name): - self.assertEqual(self.client._get_size(p=pathlib.Path(name)), exp) - - def test_exists(self) -> None: - """Test that the existence of a file is returned correctly.""" - name_exists_map: dict[str, bool] = { - "README.md": True, - "data": True, - "nonexistent1": False, - "nonexistent/nonexistent2": False, - } - for name, exp in name_exists_map.items(): - with self.subTest(msg=name): - self.assertEqual(self.client.exists(dst=pathlib.Path(name)), exp) - diff --git a/src/nwp_consumer/internal/outputs/localfs/__init__.py b/src/nwp_consumer/internal/outputs/localfs/__init__.py deleted file mode 100644 index 74f4c648..00000000 --- a/src/nwp_consumer/internal/outputs/localfs/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -__all__ = ['Client'] - -from .client import Client diff --git a/src/nwp_consumer/internal/outputs/localfs/client.py b/src/nwp_consumer/internal/outputs/localfs/client.py deleted file mode 100644 index c60095f6..00000000 --- a/src/nwp_consumer/internal/outputs/localfs/client.py +++ /dev/null @@ -1,141 +0,0 @@ -"""Client for local filesystem.""" - -import datetime as dt -import os -import pathlib -import shutil - -import structlog - -from nwp_consumer import internal - -log = structlog.getLogger() - - -class Client(internal.StorageInterface): - """Client for local filesystem. - - This class implements the StorageInterface for the local filesystem. 
- """ - - def name(self) -> str: - """Overrides the corresponding method in the parent class.""" - return "localfilesystem" - - def exists(self, *, dst: pathlib.Path) -> bool: - """Overrides the corresponding method in the parent class.""" - return dst.exists() - - def store(self, *, src: pathlib.Path, dst: pathlib.Path) -> pathlib.Path: - """Overrides the corresponding method in the parent class.""" - if src == dst: - return dst - - dst.parent.mkdir(parents=True, exist_ok=True) - if src.is_dir(): - shutil.copytree(src=src, dst=dst) - else: - shutil.copy(src=src, dst=dst) - - if src.stat().st_size != dst.stat().st_size: - log.warn( - event="file size mismatch", - src=src.as_posix(), - dst=dst.as_posix(), - srcbytes=src.stat().st_size, - dstbytes=dst.stat().st_size, - ) - else: - log.debug( - event="stored file locally", - src=src.as_posix(), - dst=dst.as_posix(), - nbytes=dst.stat().st_size, - ) - - # Delete the cache to avoid double storage - try: - src.unlink() - except: - log.warn( - event="could not delete source file. Will be cleaned up at end of run", - src=src.as_posix(), - ) - - return dst - - def listInitTimes(self, *, prefix: pathlib.Path) -> list[dt.datetime]: - """Overrides the corresponding method in the parent class.""" - # List all the inittime folders in the given directory - dirs = [ - f.relative_to(prefix) - for f in prefix.glob(internal.IT_FOLDER_GLOBSTR_RAW) - if f.suffix == "" - ] - - initTimes = set() - for dir in dirs: - try: - # Try to parse the dir as a datetime - ddt: dt.datetime = dt.datetime.strptime( - dir.as_posix(), - internal.IT_FOLDER_STRUCTURE_RAW, - ).replace(tzinfo=dt.UTC) - # Add the initTime to the set - initTimes.add(ddt) - except ValueError: - log.debug( - event="ignoring invalid folder name", - name=dir.as_posix(), - within=prefix.as_posix(), - ) - - if len(initTimes) == 0: - log.debug( - event="no init times found in raw directory", - within=prefix.as_posix(), - ) - return [] - - sortedInitTimes = sorted(initTimes) - log.debug( - event=f"found {len(initTimes)} init times in raw directory", - earliest=sortedInitTimes[0], - latest=sortedInitTimes[-1], - ) - - return sortedInitTimes - - def copyITFolderToCache(self, *, prefix: pathlib.Path, it: dt.datetime) -> list[pathlib.Path]: - """Overrides the corresponding method in the parent class.""" - # Check if the folder exists - if not (prefix / it.strftime(internal.IT_FOLDER_STRUCTURE_RAW)).exists(): - log.debug( - event="Init time folder not present", - path=(prefix / it.strftime(internal.IT_FOLDER_STRUCTURE_RAW)).as_posix(), - ) - return [] - filesInFolder = list((prefix / it.strftime(internal.IT_FOLDER_STRUCTURE_RAW)).iterdir()) - - cfps: list[pathlib.Path] = [] - for file in filesInFolder: - # Copy the file to the cache if it isn't already there - dst: pathlib.Path = internal.rawCachePath(it=it, filename=file.name) - if not dst.exists(): - dst.parent.mkdir(parents=True, exist_ok=True) - shutil.copy2(src=file, dst=dst) - cfps.append(dst) - - return cfps - - def delete(self, *, p: pathlib.Path) -> None: - """Overrides the corresponding method in the parent class.""" - if not p.exists(): - raise FileNotFoundError(f"file does not exist: {p}") - if p.is_file(): - p.unlink() - elif p.is_dir(): - shutil.rmtree(p.as_posix()) - else: - raise ValueError(f"path is not a file or directory: {p}") - return diff --git a/src/nwp_consumer/internal/outputs/localfs/test_client.py b/src/nwp_consumer/internal/outputs/localfs/test_client.py deleted file mode 100644 index 6c9384a4..00000000 --- 
a/src/nwp_consumer/internal/outputs/localfs/test_client.py +++ /dev/null @@ -1,187 +0,0 @@ -import datetime as dt -import shutil -import unittest -import uuid -from pathlib import Path - -import numpy as np -import xarray as xr - -from nwp_consumer import internal - -from .client import Client - -RAW = Path("test_raw_dir") -ZARR = Path("test_zarr_dir") - - -class TestLocalFSClient(unittest.TestCase): - @classmethod - def setUpClass(cls) -> None: - # Make test directories - RAW.mkdir(parents=True, exist_ok=True) - ZARR.mkdir(parents=True, exist_ok=True) - - cls.testClient = Client() - - @classmethod - def tearDownClass(cls) -> None: - # Clean up the temporary directory - shutil.rmtree(RAW.as_posix()) - shutil.rmtree(ZARR.as_posix()) - - def test_exists(self) -> None: - initTime = dt.datetime(2021, 1, 1, 0, 0, 0, tzinfo=dt.UTC) - - # Create a file in the raw directory - path = RAW / f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" / "test_file.grib" - path.parent.mkdir(parents=True, exist_ok=True) - path.touch() - - # Check if the file exists using the function - exists = self.testClient.exists(dst=path) - - # Assert that the file exists - self.assertTrue(exists) - - # Remove the init time folder - shutil.rmtree(RAW / "2021") - - # Check that the function returns false when the file does not exist - exists = self.testClient.exists( - dst=RAW / f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" / "not_exists.grib", - ) - - # Assert that the file does not exist - self.assertFalse(exists) - - # Create a zarr file in the zarr directory - testDS = xr.Dataset( - data_vars={ - "UKV": ( - ("init_time", "variable", "step", "x", "y"), - np.random.rand(1, 2, 12, 100, 100), - ), - }, - coords={ - "init_time": [np.datetime64(initTime)], - "variable": ["t", "r"], - "step": range(12), - "x": range(100), - "y": range(100), - }, - ) - - testDS.to_zarr(store=ZARR / "test_file.zarr", compute=True) - - # Check if the file exists using the function - exists = self.testClient.exists(dst=ZARR / "test_file.zarr") - - # Assert that the file exists - self.assertTrue(exists) - - def test_store(self) -> None: - initTime = dt.datetime(2021, 1, 2, 0, 0, 0, tzinfo=dt.UTC) - dst = RAW / f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" / "test_store.grib" - src = internal.CACHE_DIR / f"nwpc-{uuid.uuid4()}" - # Create a temporary file to simulate a file to be stored - src.parent.mkdir(parents=True, exist_ok=True) - src.write_bytes(bytes("test_file_contents", "utf-8")) - - # Store the file using the function - out = self.testClient.store(src=src, dst=dst) - - # Assert that the file exists - self.assertTrue(dst.exists()) - # Assert that the file has the correct size - self.assertEqual(out, dst) - # Assert that the temporary file has been deleted - self.assertFalse(src.exists()) - - def test_listInitTimes(self) -> None: - expectedTimes = [ - dt.datetime(2023, 1, 1, 3, tzinfo=dt.UTC), - dt.datetime(2023, 1, 2, 6, tzinfo=dt.UTC), - dt.datetime(2023, 1, 3, 9, tzinfo=dt.UTC), - ] - - # Create some files in the raw directory - dirs = [RAW / t.strftime(internal.IT_FOLDER_STRUCTURE_RAW) for t in expectedTimes] - - for d in dirs: - d.mkdir(parents=True, exist_ok=True) - - # Get the list of init times - initTimes = self.testClient.listInitTimes(prefix=Path(RAW)) - - # Assert that the list of init times is correct - self.assertEqual(initTimes, expectedTimes) - - # Remove the files - for d in dirs: - shutil.rmtree(d) - - def test_copyITFolderToCache(self) -> None: - # Make some files in the raw directory - initTime = dt.datetime(2023, 1, 1, 
3, tzinfo=dt.UTC) - files = [ - RAW / f"{initTime:%Y/%m/%d/%H%M}" / "test_copyITFolderToTemp1.grib", - RAW / f"{initTime:%Y/%m/%d/%H%M}" / "test_copyITFolderToTemp2.grib", - RAW / f"{initTime:%Y/%m/%d/%H%M}" / "test_copyITFolderToTemp3.grib", - ] - for f in files: - f.parent.mkdir(parents=True, exist_ok=True) - f.write_bytes(bytes("test_file_contents", "utf-8")) - - # Test the function - paths = self.testClient.copyITFolderToCache(prefix=RAW, it=initTime) - - # Assert the contents of the temp files is correct - for _i, path in enumerate(paths): - self.assertEqual(path.read_bytes(), bytes("test_file_contents", "utf-8")) - - # Remove the files - shutil.rmtree(files[0].parent) - - def test_delete(self) -> None: - # Create a file in the raw directory - initTime = dt.datetime(2023, 1, 1, 3, tzinfo=dt.UTC) - path = RAW / f"{initTime:%Y/%m/%d/%H%M}" / "test_delete.grib" - path.parent.mkdir(parents=True, exist_ok=True) - path.touch() - - # Delete the file using the function - self.testClient.delete(p=path) - - # Assert that the file no longer exists - self.assertFalse(path.exists()) - - # Create a zarr folder in the zarr directory - path = ZARR / "test_delete.zarr" - testDS = xr.Dataset( - data_vars={ - "UKV": ( - ("init_time", "variable", "step", "x", "y"), - np.random.rand(1, 2, 12, 100, 100), - ), - }, - coords={ - "init_time": [np.datetime64(initTime)], - "variable": ["t", "r"], - "step": range(12), - "x": range(100), - "y": range(100), - }, - ) - - testDS.to_zarr(store=path, compute=True) - - # Delete the folder using the function - self.testClient.delete(p=path) - - # Assert that the folder no longer exists - self.assertFalse(path.exists()) - - -if __name__ == "__main__": - unittest.main() diff --git a/src/nwp_consumer/internal/outputs/s3/__init__.py b/src/nwp_consumer/internal/outputs/s3/__init__.py deleted file mode 100644 index 74f4c648..00000000 --- a/src/nwp_consumer/internal/outputs/s3/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -__all__ = ['Client'] - -from .client import Client diff --git a/src/nwp_consumer/internal/outputs/s3/client.py b/src/nwp_consumer/internal/outputs/s3/client.py deleted file mode 100644 index 2d5e3664..00000000 --- a/src/nwp_consumer/internal/outputs/s3/client.py +++ /dev/null @@ -1,217 +0,0 @@ -"""Client for AWS S3.""" - -import datetime as dt -import pathlib - -import s3fs -import structlog - -from nwp_consumer import internal - -log = structlog.getLogger() - - -class Client(internal.StorageInterface): - """Storage Interface client for AWS S3.""" - - # S3 Bucket - __bucket: pathlib.Path - - # S3 Filesystem - __fs: s3fs.S3FileSystem - - def __init__( - self, - *, - bucket: str, - region: str, - key: str | None = "", - secret: str| None = "", - endpointURL: str = "", - ) -> None: - """Create a new S3Client. - - Exposes a client that conforms to the StorageInterface. - Provide credentials either explicitly via key and secret - or fallback to default credentials if not provided or empty. - - Args: - bucket: S3 bucket name to use for storage. - region: S3 region the bucket is in. - key: Use this access key, if specified. - secret: Use this secret, if specified. - endpointURL: Use this endpoint URL, if specified. 
- """ - if (key, secret) == ("", ""): - log.info( - event="attempting AWS connection using default credentials", - ) - key, secret = None, None - - self.__fs: s3fs.S3FileSystem = s3fs.S3FileSystem( - key=key, - secret=secret, - client_kwargs={ - "region_name": region, - "endpoint_url": None if endpointURL == "" else endpointURL, - }, - ) - - self.__bucket = pathlib.Path(bucket) - - def name(self) -> str: - """Overrides the corresponding method in the parent class.""" - return "s3" - - def exists(self, *, dst: pathlib.Path) -> bool: - """Overrides the corresponding method in the parent class.""" - return self.__fs.exists((self.__bucket / dst).as_posix()) - - def store(self, *, src: pathlib.Path, dst: pathlib.Path) -> pathlib.Path: - """Overrides the corresponding method in the parent class.""" - log.debug( - event="storing file in s3", - src=src.as_posix(), - dst=(self.__bucket / dst).as_posix(), - ) - - # If file already exists in store and is of the same size, skip the upload - if self.exists(dst=dst) and self.__fs.du((self.__bucket / dst).as_posix()) == src.stat().st_size: - log.debug( - event="file of same size already exists in s3, skipping", - src=src.as_posix(), - dst=(self.__bucket / dst).as_posix(), - ) - return dst - - # Upload the file to the store - self.__fs.put(lpath=src.as_posix(), rpath=(self.__bucket / dst).as_posix(), recursive=True) - # Don't delete cached file as user may want to do further processing locally. - remote_size_bytes: int = self.__fs.du((self.__bucket / dst).as_posix()) - local_size_bytes: int = src.stat().st_size - if src.is_dir(): - local_size_bytes: int = sum( - f.stat().st_size - for f in src.rglob("*") - if f.is_file() - ) - if remote_size_bytes != local_size_bytes: - log.warn( - event="file size mismatch", - src=src.as_posix(), - dst=(self.__bucket / dst).as_posix(), - srcsize=src.stat().st_size, - dstsize=remote_size_bytes, - ) - else: - log.debug( - event="stored file in s3", - src=src.as_posix(), - dst=(self.__bucket / dst).as_posix(), - remote_size_bytes=remote_size_bytes, - ) - return dst - - def listInitTimes(self, *, prefix: pathlib.Path) -> list[dt.datetime]: - """Overrides the corresponding method in the parent class.""" - allDirs = [ - pathlib.Path(d).relative_to(self.__bucket / prefix) - for d in self.__fs.glob(f"{self.__bucket}/{prefix}/{internal.IT_FOLDER_GLOBSTR_RAW}") - if self.__fs.isdir(d) - ] - - # Get the initTime from the folder pattern - initTimes = set() - for dir in allDirs: - if dir.match(internal.IT_FOLDER_GLOBSTR_RAW): - try: - # Try to parse the folder name as a datetime - ddt = dt.datetime.strptime(dir.as_posix(), internal.IT_FOLDER_STRUCTURE_RAW).replace( - tzinfo=dt.UTC, - ) - initTimes.add(ddt) - except ValueError: - log.debug( - event="ignoring invalid folder name", - name=dir.as_posix(), - within=prefix.as_posix(), - ) - - sortedInitTimes = sorted(initTimes) - log.debug( - event=f"found {len(initTimes)} init times in raw directory", - earliest=sortedInitTimes[0], - latest=sortedInitTimes[-1], - ) - return sortedInitTimes - - def copyITFolderToCache(self, *, prefix: pathlib.Path, it: dt.datetime) -> list[pathlib.Path]: - """Overrides the corresponding method in the parent class.""" - initTimeDirPath = self.__bucket / prefix / it.strftime(internal.IT_FOLDER_STRUCTURE_RAW) - - if not self.__fs.exists(initTimeDirPath.as_posix()) or not self.__fs.isdir(initTimeDirPath.as_posix()): - log.warn( - event="init time folder does not exist in store", - path=it.strftime(internal.IT_FOLDER_STRUCTURE_RAW), - ) - return [] - - paths = 
[ - pathlib.Path(p).relative_to(self.__bucket) - for p in self.__fs.ls(initTimeDirPath.as_posix()) - ] - - log.debug( - event="copying it folder to cache", - inittime=it.strftime(internal.IT_FOLDER_STRUCTURE_RAW), - numfiles=len(paths), - ) - - # Read all files into cache - cachedPaths: list[pathlib.Path] = [] - for path in paths: - cfp: pathlib.Path = internal.rawCachePath(it=it, filename=path.name) - - # Use existing cached file if it exists and is not empty - if cfp.exists() and cfp.stat().st_size > 0: - log.debug( - event="file already exists in cache, skipping", - filepath=path.as_posix(), - cachepath=cfp.as_posix(), - ) - cachedPaths.append(cfp) - continue - - # Don't copy file from the store if it is empty - if ( - self.exists(dst=path) is False - or self.__fs.du(path=(self.__bucket / path).as_posix()) == 0 - ): - log.warn( - event="file in store is empty", - filepath=path.as_posix(), - ) - continue - - # Copy the file from the store to cache - with self.__fs.open(path=(self.__bucket / path).as_posix(), mode="rb") as infile: - with cfp.open("wb") as tmpfile: - for chunk in iter(lambda: infile.read(16 * 1024), b""): - tmpfile.write(chunk) - tmpfile.flush() - cachedPaths.append(cfp) - - log.debug( - event="copied it folder to cache", - nbytes=[p.stat().st_size for p in cachedPaths], - inittime=it.strftime("%Y-%m-%d %H:%M"), - ) - - return cachedPaths - - def delete(self, *, p: pathlib.Path) -> None: - """Overrides the corresponding method in the parent class.""" - if self.__fs.isdir((self.__bucket / p).as_posix()): - self.__fs.rm((self.__bucket / p).as_posix(), recursive=True) - else: - self.__fs.rm((self.__bucket / p).as_posix()) diff --git a/src/nwp_consumer/internal/outputs/s3/test_client.py b/src/nwp_consumer/internal/outputs/s3/test_client.py deleted file mode 100644 index 893542ea..00000000 --- a/src/nwp_consumer/internal/outputs/s3/test_client.py +++ /dev/null @@ -1,262 +0,0 @@ -import datetime as dt -import inspect -import unittest -import uuid -from pathlib import Path - -from botocore.client import BaseClient as BotocoreClient -from botocore.session import Session -from moto.server import ThreadedMotoServer - -from nwp_consumer import internal - -from .client import Client - -ENDPOINT_URL = "http://localhost:5000" -BUCKET = "test-bucket" -KEY = "test-key" -SECRET = "test-secret" # noqa: S105 -REGION = "us-east-1" - -RAW = Path("raw") -ZARR = Path("zarr") - - -class TestS3Client(unittest.TestCase): - testS3: BotocoreClient - client: Client - server: ThreadedMotoServer - - @classmethod - def setUpClass(cls) -> None: - # Start a local S3 server - cls.server = ThreadedMotoServer() - cls.server.start() - - session = Session() - cls.testS3 = session.create_client( - service_name="s3", - region_name=REGION, - endpoint_url=ENDPOINT_URL, - aws_access_key_id=KEY, - aws_secret_access_key=SECRET, - ) - - # Create a mock S3 bucket - cls.testS3.create_bucket( - Bucket=BUCKET, - ) - - # Create an instance of the S3Client class - cls.client = Client( - key=KEY, - secret=SECRET, - region=REGION, - bucket=BUCKET, - endpointURL=ENDPOINT_URL, - ) - - @classmethod - def tearDownClass(cls) -> None: - # Delete all objects in bucket - response = cls.testS3.list_objects_v2( - Bucket=BUCKET, - ) - if "Contents" in response: - for obj in response["Contents"]: - cls.testS3.delete_object( - Bucket=BUCKET, - Key=obj["Key"], - ) - cls.server.stop() - - def test_exists(self) -> None: - # Create a mock file in the raw directory - initTime = dt.datetime(2023, 1, 1, tzinfo=dt.UTC) - fileName = 
inspect.stack()[0][3] + ".grib" - filePath = RAW / f"{initTime:%Y/%m/%d/%H%M}" / fileName - self.testS3.put_object( - Bucket=BUCKET, - Key=filePath.as_posix(), - Body=bytes(fileName, "utf-8"), - ) - - # Call the existsInRawDir method - exists = self.client.exists(dst=filePath) - - # Verify the existence of the file - self.assertTrue(exists) - - # Call the existsInRawDir method on a non-existent file - exists = self.client.exists(dst=Path("non_existent_file.grib")) - - # Verify the non-existence of the file - self.assertFalse(exists) - - # Delete the created files - self.testS3.delete_object( - Bucket=BUCKET, - Key=filePath.as_posix(), - ) - - def test_store(self) -> None: - initTime = dt.datetime(2023, 1, 2, tzinfo=dt.UTC) - fileName = inspect.stack()[0][3] + ".grib" - dst = RAW / f"{initTime:%Y/%m/%d/%H%M}" / fileName - src = internal.CACHE_DIR / f"nwpc-{uuid.uuid4()}" - src.parent.mkdir(parents=True, exist_ok=True) - - # Write the data to the temporary file - src.write_bytes(bytes(fileName, "utf-8")) - - name = self.client.store(src=src, dst=dst) - - # Verify the written file in the raw directory - response = self.testS3.get_object(Bucket=BUCKET, Key=dst.as_posix()) - self.assertEqual(response["Body"].read(), bytes(fileName, "utf-8")) - - # Verify the correct number of bytes was written - self.assertEqual(name, dst) - - # Delete the created file and the temp file - self.testS3.delete_object(Bucket=BUCKET, Key=dst.as_posix()) - src.unlink(missing_ok=True) - - ## Test the store doesn't overwrite an existing file of equivalent size - - # Create a mock file in the store - self.testS3.put_object( - Bucket=BUCKET, - Key=dst.as_posix(), - Body=bytes(fileName, "utf-8"), - ) - - # Create a temporary file with the same data - src.write_bytes(bytes(fileName, "utf-8")) - - # Get the modified date of the file in the store - response = self.testS3.head_object(Bucket=BUCKET, Key=dst.as_posix()) - lastModified = response["LastModified"] - - # Call the store method on the file - name = self.client.store(src=src, dst=dst) - - # Verify the file in the store was not overwritten - response = self.testS3.get_object(Bucket=BUCKET, Key=dst.as_posix()) - self.assertEqual(response["Body"].read(), bytes(fileName, "utf-8")) - self.assertEqual(lastModified, response["LastModified"]) - - - def test_listInitTimes(self) -> None: - # Create mock folders/files in the raw directory - self.testS3.put_object( - Bucket=BUCKET, - Key=f"{RAW}/2023/01/03/0000/test_raw_file1.grib", - Body=b"test_data", - ) - self.testS3.put_object( - Bucket=BUCKET, - Key=f"{RAW}/2023/01/04/0300/test_raw_file2.grib", - Body=b"test_data", - ) - - # Call the listInitTimesInRawDir method - init_times = self.client.listInitTimes(prefix=RAW) - - # Verify the returned list of init times - expected_init_times = [ - dt.datetime(2023, 1, 3, 0, 0, tzinfo=dt.UTC), - dt.datetime(2023, 1, 4, 3, 0, tzinfo=dt.UTC), - ] - self.assertEqual(init_times, expected_init_times) - - # Delete the created files - self.testS3.delete_object( - Bucket=BUCKET, - Key=f"{RAW}/2023/01/03/0000/test_raw_file1.grib", - ) - self.testS3.delete_object( - Bucket=BUCKET, - Key=f"{RAW}/2023/01/04/0300/test_raw_file2.grib", - ) - - def test_copyITFolderToCache(self) -> None: - # Make some files in the raw directory - initTime = dt.datetime(2023, 1, 1, 3, tzinfo=dt.UTC) - files = [ - RAW - / f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" - / "test_copyITFolderToTemp1.grib", - RAW - / f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" - / "test_copyITFolderToTemp2.grib", - RAW - / 
f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" - / "test_copyITFolderToTemp3.grib", - ] - for f in files: - self.testS3.put_object( - Bucket=BUCKET, - Key=f.as_posix(), - Body=bytes("test_file_contents", "utf-8"), - ) - - # Call the copyItFolderToCache method - paths = self.client.copyITFolderToCache(prefix=RAW, it=initTime) - - # Assert the contents of the cached files is correct - for _i, path in enumerate(paths): - self.assertEqual(path.read_bytes(), bytes("test_file_contents", "utf-8")) - - # Delete the cached files - path.unlink() - - # Delete the files in S3 - for f in files: - self.testS3.delete_object(Bucket=BUCKET, Key=f.as_posix()) - - # Make some more RAW files in the raw directory AND in the cache directory - initTime2 = dt.datetime(2023, 1, 1, 6, tzinfo=dt.UTC) - files2 = [ - RAW / f"{initTime2:%Y/%m/%d/%H%M}" / "test_copyITFolderToTemp1.grib", - RAW / f"{initTime2:%Y/%m/%d/%H%M}" / "test_copyITFolderToTemp2.grib", - RAW / f"{initTime2:%Y/%m/%d/%H%M}" / "test_copyITFolderToTemp3.grib", - ] - for f in files2: - self.testS3.put_object( - Bucket=BUCKET, - Key=f.as_posix(), - Body=bytes("test_file_contents", "utf-8"), - ) - with open(internal.CACHE_DIR / f.name, "w") as f: - f.write("test_file_contents") - - # Call the copyITFolderToCache method again - paths = self.client.copyITFolderToCache(prefix=RAW, it=initTime2) - self.assertEqual(len(paths), 3) - - # Delete the files in S3 - for f in files2: - self.testS3.delete_object(Bucket=BUCKET, Key=f.as_posix()) - - @unittest.skip("Broken on github ci") - def test_delete(self) -> None: - # Create a file in the raw directory - initTime = dt.datetime(2023, 1, 1, 3, tzinfo=dt.UTC) - path = RAW / f"{initTime:{internal.IT_FOLDER_STRUCTURE_RAW}}" / "test_delete.grib" - self.testS3.put_object( - Bucket=BUCKET, - Key=path.as_posix(), - Body=bytes("test_delete", "utf-8"), - ) - - # Delete the file using the function - self.client.delete(p=path) - - # Assert that the file no longer exists - with self.assertRaises(Exception): - self.testS3.get_object(Bucket=BUCKET, Key=path.as_posix()) - - -if __name__ == "__main__": - unittest.main() diff --git a/src/test_integration/test_inputs_integration.py b/src/test_integration/test_inputs_integration.py deleted file mode 100644 index f7c2bc7e..00000000 --- a/src/test_integration/test_inputs_integration.py +++ /dev/null @@ -1,159 +0,0 @@ -"""Integration tests for the `inputs` module. - -WARNING: Requires environment variables to be set for the MetOffice and CEDA APIs. -Just tests connections to the APIs. Tests assume that attempts to download the -source files would raise an exception in the first TIMEOUT seconds of running, -and will be considered passed if no exception is raised within that time. 
-""" - -import datetime as dt -import unittest - -from nwp_consumer.internal import config, inputs, outputs - -storageClient = outputs.localfs.Client() - - -TIMEOUT = 10 - - -class TestListRawFilesForInitTime(unittest.TestCase): - def test_getsFileInfosFromCEDA(self) -> None: - cedaInitTime: dt.datetime = dt.datetime( - year=2022, - month=1, - day=1, - hour=0, - minute=0, - tzinfo=dt.UTC, - ) - c = config.CEDAEnv() - cedaClient = inputs.ceda.Client( - ftpUsername=c.CEDA_FTP_USER, - ftpPassword=c.CEDA_FTP_PASS, - ) - fileInfos = cedaClient.listRawFilesForInitTime(it=cedaInitTime) - self.assertTrue(len(fileInfos) > 0) - - def test_getsFileInfosFromMetOffice(self) -> None: - metOfficeInitTime: dt.datetime = dt.datetime.now(tz=dt.UTC).replace( - hour=0, - minute=0, - second=0, - microsecond=0, - ) - c = config.MetOfficeEnv() - metOfficeClient = inputs.metoffice.Client( - orderID=c.METOFFICE_ORDER_ID, - apiKey=c.METOFFICE_API_KEY, - ) - fileInfos = metOfficeClient.listRawFilesForInitTime(it=metOfficeInitTime) - self.assertTrue(len(fileInfos) > 0) - - def test_getsFileInfosFromECMWFMARS(self) -> None: - ecmwfMarsInitTime: dt.datetime = dt.datetime( - year=2022, - month=1, - day=1, - hour=0, - minute=0, - tzinfo=dt.UTC, - ) - c = config.ECMWFMARSEnv() - ecmwfMarsClient = inputs.ecmwf.MARSClient( - area=c.ECMWF_AREA, - hours=4, - ) - fileInfos = ecmwfMarsClient.listRawFilesForInitTime(it=ecmwfMarsInitTime) - self.assertTrue(len(fileInfos) > 0) - - def test_getsFileInfosFromICON(self) -> None: - iconInitTime: dt.datetime = dt.datetime.now(tz=dt.UTC).replace( - hour=0, - minute=0, - second=0, - microsecond=0, - ) - iconClient = inputs.icon.Client( - model="global", - hours=4, - param_group="basic", - ) - fileInfos = iconClient.listRawFilesForInitTime(it=iconInitTime) - self.assertTrue(len(fileInfos) > 0) - - iconClient = inputs.icon.Client( - model="europe", - hours=4, - param_group="basic", - ) - euFileInfos = iconClient.listRawFilesForInitTime(it=iconInitTime) - self.assertTrue(len(euFileInfos) > 0) - self.assertNotEqual(fileInfos, euFileInfos) - - def test_getsFileInfosFromCMC(self) -> None: - cmcInitTime: dt.datetime = dt.datetime.now(tz=dt.UTC).replace( - hour=0, - minute=0, - second=0, - microsecond=0, - ) - cmcClient = inputs.cmc.Client( - model="gdps", - hours=4, - param_group="basic", - ) - fileInfos = cmcClient.listRawFilesForInitTime(it=cmcInitTime) - self.assertGreater(len(fileInfos), 0) - - cmcClient = inputs.cmc.Client( - model="geps", - hours=4, - param_group="basic", - ) - gepsFileInfos = cmcClient.listRawFilesForInitTime(it=cmcInitTime) - self.assertGreater(len(gepsFileInfos), 0) - self.assertNotEqual(fileInfos, gepsFileInfos) - - def test_getsFileInfosFromMeteoFrance(self) -> None: - arpegeInitTime: dt.datetime = dt.datetime.now(tz=dt.UTC).replace( - hour=0, - minute=0, - second=0, - microsecond=0, - ) - arpegeClient = inputs.meteofrance.Client( - model="global", - hours=4, - param_group="basic", - ) - fileInfos = arpegeClient.listRawFilesForInitTime(it=arpegeInitTime) - self.assertTrue(len(fileInfos) > 0) - - arpegeClient = inputs.meteofrance.Client( - model="europe", - hours=4, - param_group="basic", - ) - europeFileInfos = arpegeClient.listRawFilesForInitTime(it=arpegeInitTime) - self.assertTrue(len(europeFileInfos) > 0) - self.assertNotEqual(fileInfos, europeFileInfos) - - def test_getsFilesFromNOAANCAR(self) -> None: - ncarInitTime: dt.datetime = dt.datetime( - year=2023, - month=12, - day=19, - tzinfo=dt.UTC, - ) - ncarClient = inputs.noaa.NCARClient( - model="global", - 
param_group="full", - hours=4, - ) - fileInfos = ncarClient.listRawFilesForInitTime(it=ncarInitTime) - self.assertTrue(len(fileInfos) > 0) - - -if __name__ == "__main__": - unittest.main() diff --git a/src/test_integration/test_service_integration.py b/src/test_integration/test_service_integration.py deleted file mode 100644 index 103fae6b..00000000 --- a/src/test_integration/test_service_integration.py +++ /dev/null @@ -1,201 +0,0 @@ -"""Integration tests for the NWPConsumerService class. - -WARNING: Requires environment variables to be set for the MetOffice and CEDA APIs. -Will download up to a GB of data. Costs may apply for usage of the APIs. - -Runs the main function of the consumer as it would appear externally imported -""" - -import datetime as dt -import os -import shutil -import unittest -import unittest.mock - -import numpy as np -import ocf_blosc2 # noqa: F401 -import xarray as xr -from nwp_consumer.cmd.main import run - - -class TestNWPConsumerService_MetOffice(unittest.TestCase): - """Integration tests for the NWPConsumerService class.""" - - def setUp(self) -> None: - self.rawdir = "data/me_raw" - self.zarrdir = "data/me_zarr" - - def test_downloadAndConvertDataset(self) -> None: - initTime: dt.datetime = dt.datetime.now(tz=dt.UTC) - - raw_files, zarr_files = run( - [ - "consume", - "--source=metoffice", - "--rdir=" + self.rawdir, - "--zdir=" + self.zarrdir, - "--from=" + initTime.strftime("%Y-%m-%dT00:00"), - ], - ) - - self.assertGreater(len(raw_files), 0) - self.assertEqual(len(zarr_files), 1) - - for path in zarr_files: - ds = xr.open_zarr(store=f"zip::{path.as_posix()}") - - # The number of variables in the dataset depends on the order from MetOffice - numVars = len(ds.coords["variable"].values) - - # Ensure the dimensions have the right sizes - self.assertDictEqual( - {"variable": numVars, "init_time": 1, "step": 5, "y": 639, "x": 455}, - dict(ds.sizes.items()), - ) - # Ensure the dimensions of the variables are in the correct order - self.assertEqual(("variable", "init_time", "step", "y", "x"), ds["UKV"].dims) - # Ensure the init time is correct - self.assertEqual( - np.datetime64(initTime.strftime("%Y-%m-%dT00:00")), - ds.coords["init_time"].values[0], - ) - - shutil.rmtree(self.rawdir) - shutil.rmtree(self.zarrdir) - - -class TestNWPConsumerService_CEDA(unittest.TestCase): - """Integration tests for the NWPConsumerService class.""" - - def setUp(self) -> None: - self.rawdir = "data/cd_raw" - self.zarrdir = "data/cd_zarr" - - def test_downloadAndConvertDataset(self) -> None: - raw_files, zarr_files = run( - [ - "consume", - "--source=ceda", - "--rdir=" + self.rawdir, - "--zdir=" + self.zarrdir, - "--from=2022-01-01T12:00", - ], - ) - - self.assertGreater(len(raw_files), 0) - self.assertEqual(len(zarr_files), 1) - - for path in zarr_files: - ds = xr.open_zarr(store=f"zip::{path.as_posix()}").compute() - - # Enusre the data variables are correct - self.assertEqual(["UKV"], list(ds.data_vars)) - # Ensure the dimensions have the right sizes - self.assertEqual( - {"variable": 12, "init_time": 1, "step": 51, "y": 704, "x": 548}, - dict(ds.sizes.items()), - ) - # Ensure the init time is correct - self.assertEqual( - np.datetime64("2022-01-01T12:00"), - ds.coords["init_time"].values[0], - ) - - shutil.rmtree(self.rawdir) - shutil.rmtree(self.zarrdir) - - -class TestNWPConverterService_ECMWFMARS(unittest.TestCase): - def setUp(self) -> None: - self.rawdir = "data/ec_raw" - self.zarrdir = "data/ec_zarr" - - @unittest.mock.patch.dict(os.environ, {"ECMWF_PARAMETER_GROUP": 
"basic", "ECMWF_HOURS": "3"}) - def test_downloadAndConvertDataset(self) -> None: - initTime: dt.datetime = dt.datetime(year=2022, month=1, day=1, tzinfo=dt.UTC) - - raw_files, zarr_files = run( - [ - "consume", - "--source=ecmwf-mars", - "--rdir=" + self.rawdir, - "--zdir=" + self.zarrdir, - "--from=" + initTime.strftime("%Y-%m-%dT00:00"), - ], - ) - - self.assertGreater(len(raw_files), 0) - self.assertEqual(len(zarr_files), 1) - - for path in zarr_files: - ds = xr.open_zarr(store=f"zip::{path.as_posix()}").compute() - - # Ensure the data variables are correct - self.assertEqual(["ECMWF_UK"], list(ds.data_vars)) - # Ensure the dimensions have the right sizes. - # * Should be two variables due to the "basic" parameter group - # * Should be 4 steps due to the "3" hours - self.assertEqual( - { - "variable": 2, - "init_time": 1, - "step": 3, - "latitude": 141, - "longitude": 151, - }, - dict(ds.sizes.items()), - ) - # Ensure the init time is correct - self.assertEqual( - np.datetime64(initTime.strftime("%Y-%m-%dT00:00")), - ds.coords["init_time"].values[0], - ) - - shutil.rmtree(self.rawdir) - shutil.rmtree(self.zarrdir) - - -class TestNWPConsumerService_ICON(unittest.TestCase): - """Integration tests for the NWPConsumerService class.""" - - def setUp(self) -> None: - self.rawdir = "data/ic_raw" - self.zarrdir = "data/ic_zarr" - - @unittest.mock.patch.dict(os.environ, {"ICON_PARAMETER_GROUP": "basic", "ICON_HOURS": "3"}) - def test_downloadAndConvertDataset(self) -> None: - initTime: dt.datetime = dt.datetime.now(tz=dt.UTC) - - raw_files, zarr_files = run( - [ - "consume", - "--source=icon", - "--rdir=" + self.rawdir, - "--zdir=" + self.zarrdir, - "--from=" + initTime.strftime("%Y-%m-%dT00:00"), - ], - ) - - self.assertGreater(len(raw_files), 0) - self.assertEqual(len(zarr_files), 1) - - for path in zarr_files: - ds = xr.open_zarr(store=f"zip::{path.as_posix()}").compute() - - # Ensure the data variables are correct - self.assertEqual(["ICON_EUROPE"], list(ds.data_vars)) - # Ensure the dimensions have the right sizes - # * Should be two variables due to the "basic" parameter group - # * Should be 4 steps due to the "3" hours - self.assertEqual( - {"variable": 2, "init_time": 1, "step": 4, "latitude": 657, "longitude": 1377}, - ds.sizes, - ) - # Ensure the init time is correct - self.assertEqual( - np.datetime64(initTime.strftime("%Y-%m-%dT00:00")), - ds.coords["init_time"].values[0], - ) - - shutil.rmtree(self.rawdir) - shutil.rmtree(self.zarrdir) diff --git a/taskfile.yml b/taskfile.yml deleted file mode 100644 index 8ed78417..00000000 --- a/taskfile.yml +++ /dev/null @@ -1,47 +0,0 @@ -version: '3' - -# If you want to run with python from a specific environment, -# set the PYTHON_PREFIX environment variable to -# /path/to/python/dir/ - -tasks: - - install-dependencies: - aliases: ["install"] - desc: "Install application dependencies as defined in pyproject.toml" - cmds: - - ${PYTHON_PREFIX}python -m pip install -q -e . 
- - install-dev-dependencies: - aliases: ["install-dev"] - desc: "Installs development dependencies as defined in pyproject.toml" - cmds: - - ${PYTHON_PREFIX}python -m pip install --upgrade -q pip wheel setuptools - - ${PYTHON_PREFIX}python -m pip install -q -e .[dev] - - test-unit: - aliases: ["ut"] - deps: [install-dev-dependencies] - desc: "Run all application unittests" - cmds: - - ${PYTHON_PREFIX}python -m xmlrunner discover -s src/nwp_consumer -p "test_*.py" --output-file ut-report.xml - - test-integration: - aliases: ["it"] - deps: [install-dev-dependencies] - desc: "Run all application integration tests" - cmds: - - ${PYTHON_PREFIX}python -m xmlrunner discover -s src/test_integration -p "test_*.py" --output-file it-report.xml - - build-wheel: - aliases: ["wheel"] - desc: "Build python wheel" - cmds: - - ${PYTHON_PREFIX}python -m pip wheel . --no-deps --wheel-dir dist - - build-container: - aliases: ["cont"] - desc: "Build container" - cmds: - - docker build -f Containerfile . --tag nwp-consumer:local --progress=plain -
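
For reference, the credential handling described in the S3 `Client.__init__` docstring above (use an explicit `key`/`secret` pair when given, otherwise fall back to the default botocore credential chain) can be summarised in a small standalone sketch. This is illustrative only: the helper name `make_s3_filesystem` is hypothetical, and it simply mirrors the behaviour of the client shown above using the real `s3fs.S3FileSystem` constructor.

```python
import s3fs


def make_s3_filesystem(
    region: str,
    key: str | None = "",
    secret: str | None = "",
    endpoint_url: str = "",
) -> s3fs.S3FileSystem:
    """Sketch of the credential fallback used by the S3 storage client.

    Empty strings for key and secret mean "use the default credential chain",
    which s3fs selects when both are passed as None.
    """
    if (key, secret) == ("", ""):
        # No explicit credentials supplied: defer to the environment/instance profile.
        key, secret = None, None
    return s3fs.S3FileSystem(
        key=key,
        secret=secret,
        client_kwargs={
            "region_name": region,
            "endpoint_url": None if endpoint_url == "" else endpoint_url,
        },
    )
```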
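
The storage tests above also rely on the raw-store folder layout encoded by `internal.IT_FOLDER_STRUCTURE_RAW`. Judging from the tests (which format init times as `%Y/%m/%d/%H%M` and expect folders such as `2023/01/03/0000`), the layout appears to be one folder per init time; the following minimal sketch assumes that format and shows the round trip between an init time and its folder name.

```python
import datetime as dt

# Assumed from the tests above; the actual constant lives in nwp_consumer.internal.
IT_FOLDER_STRUCTURE_RAW = "%Y/%m/%d/%H%M"


def folder_for(it: dt.datetime) -> str:
    """Format an init time into its raw-store folder name, e.g. '2023/01/03/0000'."""
    return it.strftime(IT_FOLDER_STRUCTURE_RAW)


def init_time_from(folder: str) -> dt.datetime:
    """Parse a folder name back into a timezone-aware init time (inverse of folder_for)."""
    return dt.datetime.strptime(folder, IT_FOLDER_STRUCTURE_RAW).replace(tzinfo=dt.UTC)


# Round trip: the parsed init time matches the one used to build the folder name.
it = dt.datetime(2023, 1, 3, 9, tzinfo=dt.UTC)
assert init_time_from(folder_for(it)) == it
```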