📊 climate: era5 surface temperature data December 2024 #3697

Merged 4 commits on Dec 27, 2024
2 changes: 1 addition & 1 deletion dag/climate.yml
@@ -32,7 +32,7 @@ steps:
   # Copernicus Climate Change Service - Surface temperature.
   #
   data://meadow/climate/2023-12-20/surface_temperature:
-    - snapshot://climate/2024-11-05/surface_temperature.zip
+    - snapshot://climate/2024-12-05/surface_temperature.zip
     - snapshot://countries/2023-12-27/world_bank.zip
   data://garden/climate/2023-12-20/surface_temperature:
     - data://meadow/climate/2023-12-20/surface_temperature
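The snapshot URIs in this hunk appear to follow a `snapshot://<namespace>/<version>/<file>` layout. The sketch below is a reading aid only, not the ETL's own parser: `SnapshotRef` and `parse_snapshot_uri` are hypothetical names, and the field labels are assumptions inferred from the lines above. It splits the old and new URIs and reports which field differs.

```python
from typing import NamedTuple


class SnapshotRef(NamedTuple):
    """Hypothetical container for the parts of a snapshot URI."""

    namespace: str
    version: str
    filename: str


def parse_snapshot_uri(uri: str) -> SnapshotRef:
    """Split 'snapshot://namespace/version/filename' into its parts."""
    scheme, sep, path = uri.partition("://")
    if sep != "://" or scheme != "snapshot":
        raise ValueError(f"Not a snapshot URI: {uri!r}")
    parts = path.split("/")
    if len(parts) != 3:
        raise ValueError(f"Expected namespace/version/filename, got {path!r}")
    return SnapshotRef(*parts)


old = parse_snapshot_uri("snapshot://climate/2024-11-05/surface_temperature.zip")
new = parse_snapshot_uri("snapshot://climate/2024-12-05/surface_temperature.zip")

# Collect the names of the fields whose values differ between old and new.
changed_fields = [f for f in SnapshotRef._fields if getattr(old, f) != getattr(new, f)]
print(changed_fields)
```

Only the version differs, which matches the one-line change: the meadow step keeps its `2023-12-20` version and simply depends on a newer snapshot.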
25 changes: 14 additions & 11 deletions etl/steps/data/meadow/climate/2023-12-20/surface_temperature.py
@@ -1,6 +1,6 @@
 """Load a snapshot and create a meadow dataset."""

-import io
+import tempfile
 import zipfile

 import geopandas as gpd
@@ -25,16 +25,19 @@

 def _load_data_array(snap: Snapshot) -> xr.DataArray:
     log.info("load_data_array.start")
-    # Load data from snapshot.
     with zipfile.ZipFile(snap.path, "r") as zip_file:
-        # Iterate through all files in the zip archive
         for file_info in zip_file.infolist():
-            with zip_file.open(file_info) as file:
-                file_content = file.read()
-                # Create an in-memory bytes file and load the dataset
-                with io.BytesIO(file_content) as memfile:
-                    da = xr.open_dataset(memfile).load()  # .load() ensures data is eagerly loaded
+            if file_info.filename.endswith((".grb", ".grib")):  # Filter GRIB files
+                with zip_file.open(file_info) as file:
+                    file_content = file.read()

+                # Write to a temporary file
+                with tempfile.NamedTemporaryFile(delete=True, suffix=".grib") as tmp_file:
+                    tmp_file.write(file_content)
+                    tmp_file.flush()  # Ensure all data is written
+
+                    # Load the GRIB file using xarray and cfgrib
+                    da = xr.open_dataset(tmp_file.name, engine="cfgrib").load()
     # Convert temperature from Kelvin to Celsius.
     da = da["t2m"] - 273.15
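The hunk above replaces an in-memory `io.BytesIO` load with a temporary file on disk whose path is handed to xarray's `cfgrib` engine. The sketch below isolates that extract-then-load pattern using only the standard library, so it can run without xarray or cfgrib installed. `load_first_grib`, `kelvin_to_celsius` and the `loader` callback are hypothetical names, not part of the PR. Unlike the diff, this sketch returns after the first GRIB member and raises if none is found; the diff's loop keeps the last GRIB member and leaves `da` unassigned when the archive contains none.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

GRIB_SUFFIXES = (".grb", ".grib")


def load_first_grib(zip_path: Path, loader: Callable[[str], T]) -> T:
    """Write the first GRIB member of a zip to a temp file and pass its path to `loader`."""
    with zipfile.ZipFile(zip_path, "r") as zip_file:
        for file_info in zip_file.infolist():
            if not file_info.filename.endswith(GRIB_SUFFIXES):
                continue  # Skip members that are not GRIB files.
            with tempfile.NamedTemporaryFile(suffix=".grib") as tmp_file:
                tmp_file.write(zip_file.read(file_info))
                tmp_file.flush()  # Make sure the bytes are on disk before reading by name.
                # The loader must finish reading here: the file is deleted
                # as soon as this with-block exits.
                return loader(tmp_file.name)
    raise FileNotFoundError(f"No GRIB member found in {zip_path}")


def kelvin_to_celsius(kelvin: float) -> float:
    """Same offset as the `- 273.15` conversion in the diff."""
    return kelvin - 273.15
```

In the step itself the loader would presumably be the call shown in the diff, `xr.open_dataset(path, engine="cfgrib").load()`; the eager `.load()` matters because the temporary file disappears when the block exits. Reopening a `NamedTemporaryFile` by name while it is still open works on Unix but not on Windows, so this pattern assumes a Unix-like environment.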

@@ -137,11 +140,11 @@ def run(dest_dir: str) -> None:
         f"It wasn't possible to extract temperature data for {len(small_countries)} small countries as they are too small for the resolution of the Copernicus data."
     )
     # Define the start and end dates
-    da["date"] = pd.to_datetime(da["date"].astype(str), format="%Y%m%d")
+    da["valid_time"] = xr.DataArray(pd.to_datetime(da["valid_time"].values), dims=da["valid_time"].dims)

     # Now you can access the 'dt' accessor
-    start_time = da["date"].min().dt.date.astype(str).item()
-    end_time = da["date"].max().dt.date.astype(str).item()
+    start_time = da["valid_time"].min().dt.date.astype(str).item()
+    end_time = da["valid_time"].max().dt.date.astype(str).item()

     # Generate a date range from start_time to end_time with monthly frequency
     month_middles = pd.date_range(start=start_time, end=end_time, freq="MS") + pd.offsets.Day(14)
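The last line of this hunk builds one timestamp per month: `freq="MS"` yields every month start between `start_time` and `end_time`, and adding `pd.offsets.Day(14)` shifts each one to the 15th. As an illustration only, here is a standard-library sketch of the same arithmetic; the function name mirrors the variable in the diff, but the implementation is an assumption and is not code from the PR.

```python
from datetime import date, timedelta
from typing import List, Tuple


def _next_month(year: int, month: int) -> Tuple[int, int]:
    """Return the (year, month) pair following the given one."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def month_middles(start: date, end: date) -> List[date]:
    """Return the 15th of every month whose first day lies in [start, end]."""
    year, month = start.year, start.month
    if start.day > 1:
        # The first month start on or after `start` is in the following month.
        year, month = _next_month(year, month)
    middles = []
    while date(year, month, 1) <= end:
        middles.append(date(year, month, 1) + timedelta(days=14))
        year, month = _next_month(year, month)
    return middles


print(month_middles(date(2024, 10, 1), date(2024, 12, 1)))
```

For the range 1 October to 1 December 2024 this gives 15 October, 15 November and 15 December 2024, one mid-month date per monthly mean.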
1 change: 1 addition & 0 deletions pyproject.toml
@@ -68,6 +68,7 @@ dependencies = [
     "geopy>=2.4.1",
     "py7zr>=0.22.0",
     "pyreadr>=0.5.2",
+    "cfgrib>=0.9.15.0",
 ]

 [tool.uv.sources]
2 changes: 1 addition & 1 deletion snapshots/climate/2024-11-19/total_precipitation.py
@@ -1,4 +1,4 @@
-"""Script to create a snapshot of the monthly averaged surface temperature data from 1950 to present from the Copernicus Climate Change Service.
+"""Script to create a snapshot of the precipitation data from 1950 to present from the Copernicus Climate Change Service.

 The script assumes that the data is available on the CDS API.
 Instructions on how to access the API on a Mac are here: https://confluence.ecmwf.int/display/CKB/How+to+install+and+use+CDS+API+on+macOS
55 changes: 55 additions & 0 deletions snapshots/climate/2024-12-05/surface_temperature.py
@@ -0,0 +1,55 @@
+"""Script to create a snapshot of the monthly averaged surface temperature data from 1940 to present from the Copernicus Climate Change Service.
+
+The script assumes that the data is available on the CDS API.
+Instructions on how to access the API on a Mac are here: https://confluence.ecmwf.int/display/CKB/How+to+install+and+use+CDS+API+on+macOS
+
+More information on how to access the data is here: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=overview
+
+The data is downloaded as a zipped GRIB file. Tutorials for using the Copernicus API and for working with the NetCDF format are here: https://ecmwf-projects.github.io/copernicus-training-c3s/cds-tutorial.html
+"""
+
+import tempfile
+from pathlib import Path
+
+# CDS API
+import cdsapi
+import click
+
+from etl.snapshot import Snapshot
+
+# Version for current snapshot dataset.
+SNAPSHOT_VERSION = Path(__file__).parent.name
+
+
+@click.command()
+@click.option("--upload/--skip-upload", default=True, type=bool, help="Upload dataset to Snapshot")
+def main(upload: bool) -> None:
+    # Create a new snapshot.
+    snap = Snapshot(f"climate/{SNAPSHOT_VERSION}/surface_temperature.zip")
+
+    # Save data as a compressed temporary file.
+    with tempfile.TemporaryDirectory() as temp_dir:
+        output_file = Path(temp_dir) / "era5_monthly_t2m_eur.nc"
+
+        client = cdsapi.Client()
+
+        dataset = "reanalysis-era5-single-levels-monthly-means"
+        request = {
+            "product_type": ["monthly_averaged_reanalysis"],
+            "variable": ["2m_temperature"],
+            "year": [str(year) for year in range(1940, 2025)],
+            "month": ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"],
+            "time": "00:00",
+            "area": [90, -180, -90, 180],
+            "data_format": "grib",
+            "download_format": "zip",
+        }
+
+        client.retrieve(dataset, request, output_file)
+
+        # Upload snapshot.
+        snap.create_snapshot(filename=output_file, upload=upload)
+
+
+if __name__ == "__main__":
+    main()
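The request dictionary in this script hardcodes `range(1940, 2025)`, which covers 1940 through 2024 inclusive and would need a manual bump for later snapshots. One possible alternative, sketched here under assumptions, is to build the request from parameters. `build_request` is a hypothetical helper that is not in the PR; the keys and values are copied from the diff, and the sketch does not call the CDS API.

```python
from datetime import date
from typing import Any, Dict, Optional


def build_request(first_year: int = 1940, last_year: Optional[int] = None) -> Dict[str, Any]:
    """Build a request dict shaped like the one in the snapshot script."""
    if last_year is None:
        last_year = date.today().year  # Assumption: extend up to the current year.
    if last_year < first_year:
        raise ValueError("last_year must not be earlier than first_year")
    return {
        "product_type": ["monthly_averaged_reanalysis"],
        "variable": ["2m_temperature"],
        # range() excludes its stop value, hence the + 1.
        "year": [str(year) for year in range(first_year, last_year + 1)],
        "month": [f"{month:02d}" for month in range(1, 13)],
        "time": "00:00",
        "area": [90, -180, -90, 180],
        "data_format": "grib",
        "download_format": "zip",
    }


request = build_request(last_year=2024)
print(len(request["year"]), request["year"][0], request["year"][-1])
```

With `last_year=2024` this reproduces the 85 years requested in the diff. Note also that the script names its download `era5_monthly_t2m_eur.nc` although the request asks for GRIB inside a zip; the name does not affect the upload, since the snapshot is stored as `surface_temperature.zip`.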
26 changes: 26 additions & 0 deletions snapshots/climate/2024-12-05/surface_temperature.zip.dvc
@@ -0,0 +1,26 @@
+meta:
+  origin:
+    title_snapshot: ERA5 Monthly Averaged Data on Single Levels from 1940 to Present - Monthly Averages of 2m Surface Temperature
+    title: ERA5 monthly averaged data on single levels from 1940 to present
+    description: |-
+      ERA5 is the latest climate reanalysis produced by ECMWF, providing hourly data on many atmospheric, land-surface and sea-state parameters together with estimates of uncertainty.
+
+      ERA5 data are available in the Climate Data Store on regular latitude-longitude grids at 0.25° x 0.25° resolution, with atmospheric parameters on 37 pressure levels.
+
+      ERA5 is available from 1940 and continues to be extended forward in time, with daily updates being made available 5 days behind real time
+
+      Initial release data, i.e., data no more than three months behind real time, are called ERA5T.
+    producer: Contains modified Copernicus Climate Change Service information
+    version_producer: 2
+    citation_full: |-
+      Hersbach, H., Bell, B., Berrisford, P., Biavati, G., Horányi, A., Muñoz Sabater, J., Nicolas, J., Peubey, C., Radu, R., Rozum, I., Schepers, D., Simmons, A., Soci, C., Dee, D., Thépaut, J-N. (2023): ERA5 monthly averaged data on single levels from 1940 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS), DOI: 10.24381/cds.f17050d7 (Accessed on 19-Nov-2024)
+    url_main: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=overview
+    date_accessed: 2024-12-05
+    date_published: 2019-12-04
+  license:
+    name: Copernicus License
+    url: https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels-monthly-means?tab=overview
+outs:
+  - md5: 8fffb8e0ed6edc22b681587769a54b4e
+    size: 1709315816
+    path: surface_temperature.zip
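The `outs` entry records an `md5` and a `size` for `surface_temperature.zip`. Assuming `md5` is the hex MD5 digest of the file contents and `size` is its length in bytes, a downloaded copy could be checked with a sketch like the one below. `matches_dvc_out` is a hypothetical helper, not a DVC or ETL function, and the demo runs against a small temporary file rather than the real 1.7 GB archive.

```python
import hashlib
import tempfile
from pathlib import Path


def matches_dvc_out(path: Path, md5: str, size: int, chunk_size: int = 1 << 20) -> bool:
    """Check a file against the md5 and size recorded in a .dvc `outs` entry."""
    if path.stat().st_size != size:
        return False  # Cheap check first: skip hashing when the size is wrong.
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Hash in chunks so large archives are never fully loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == md5


# Demo on a throwaway file with a known digest.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.bin"
    sample.write_bytes(b"hello")
    expected_md5 = hashlib.md5(b"hello").hexdigest()
    size_and_hash_ok = matches_dvc_out(sample, expected_md5, 5)
    wrong_size = matches_dvc_out(sample, expected_md5, 6)
    wrong_hash = matches_dvc_out(sample, "0" * 32, 5)
```

For the real file the call would take the values from the entry above: `8fffb8e0ed6edc22b681587769a54b4e` and `1709315816`.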
50 changes: 50 additions & 0 deletions uv.lock
