This repository has been archived by the owner on Sep 11, 2023. It is now read-only.

Convert NWP grib files to Zarr intermediate #357

Merged 56 commits on Nov 16, 2021
Commits
10582cf
sketching out convert_NWP_grib_to_zarr.py
JackKelly Nov 9, 2021
c834c95
Experimenting with ways of loading NWPs in a notebook
JackKelly Nov 9, 2021
57d05b2
slight tweak to docstring
JackKelly Nov 9, 2021
19e6d6a
very early draft of NWP to zarr script. Auto generated from ipython …
JackKelly Nov 10, 2021
3964b40
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 10, 2021
7d1ff79
removing convert_NWP_grib_to_zarr.py
JackKelly Nov 11, 2021
b4cdbef
merge
JackKelly Nov 11, 2021
1442913
renaming convert_NWP_grib_to_zarr.py
JackKelly Nov 11, 2021
c4595f2
fix bug
JackKelly Nov 11, 2021
c96b67c
setting convert_NWP_grib_to_zarr.py to be executable
JackKelly Nov 11, 2021
54e99fe
fix bug
JackKelly Nov 11, 2021
2837bfc
Big speedup of load_grib_file by goign back to using idx files. Ramd…
JackKelly Nov 11, 2021
62344ac
Speed up reshape_1d_to_2d from 7 seconds to 0.5 seconds by pre-loading
JackKelly Nov 11, 2021
6457d72
New script. Should be a lot faster. Still single threaded.
JackKelly Nov 11, 2021
ea96e35
Chunk and compression and float16
JackKelly Nov 11, 2021
aae62ef
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2021
8f29d07
remove break
JackKelly Nov 11, 2021
6495807
merge
JackKelly Nov 11, 2021
cadca3d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2021
d91929b
fix import of numcodecs
JackKelly Nov 11, 2021
59a4b25
merge
JackKelly Nov 11, 2021
bbe516f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 11, 2021
52eeed5
multiple processes for converting NWPs to Zarr.
JackKelly Nov 12, 2021
df15b5a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
7a1938c
Restart processing from where it last left off
JackKelly Nov 12, 2021
ea3a439
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
3497245
attempt to fix problem of using too much mem
JackKelly Nov 12, 2021
3491081
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
8d7a61f
Try letting each process save to zarr, using a chain of locks to guar…
JackKelly Nov 12, 2021
c24c7d5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
6f501c2
use map not imap
JackKelly Nov 12, 2021
e151772
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
8472071
revert. Again
JackKelly Nov 12, 2021
5dab321
revert
JackKelly Nov 12, 2021
0234aca
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
9971a0e
back to single thread for now
JackKelly Nov 12, 2021
ba9540b
merge
JackKelly Nov 12, 2021
9c30ba6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 12, 2021
76a4bcc
Start tidying up script. Revert back to using multiprocessing.Pool
JackKelly Nov 13, 2021
9dfd1da
Use Semaphore to attempt to prevent reader processes from getting ahe…
JackKelly Nov 13, 2021
52c836a
ignore ecCodes warning
JackKelly Nov 13, 2021
0403a06
filter eccodes logger warning
JackKelly Nov 13, 2021
df3b63e
print timings for passing data through multiprocessing.Queue. Answer…
JackKelly Nov 13, 2021
711b611
Try using ThreadPoolExecutor. Returns objects immediately. But over…
JackKelly Nov 13, 2021
d3de7cb
Finally got idea working with chain of locks. Seems like best so far
JackKelly Nov 13, 2021
a5eec95
search across all directories again
JackKelly Nov 13, 2021
cba7741
Write detailed description of how the code works
JackKelly Nov 13, 2021
793f96a
Tidy up code a lot. Use click.
JackKelly Nov 13, 2021
6aab090
Code is now quite tidy. Need to add some docstrings.
JackKelly Nov 13, 2021
d30e86c
convert_NWP_grib_to_zarr.py is finally done, I think
JackKelly Nov 13, 2021
d528994
remove old ipython notebook
JackKelly Nov 13, 2021
80420f4
Simplify logger formatting
JackKelly Nov 13, 2021
9b2ca71
Updating documentation
JackKelly Nov 15, 2021
d1f9353
Tweak comments and logging.
JackKelly Nov 15, 2021
b84487f
Change formatting of dataset.init_time in logging
JackKelly Nov 15, 2021
a0033b9
Make note of broken reject-regexs #389
JackKelly Nov 15, 2021
filter eccodes logger warning
JackKelly committed Nov 13, 2021
commit 0403a0689f1747d33c64aec8c443c3ad01b5db81
20 changes: 14 additions & 6 deletions scripts/convert_NWP_grib_to_zarr.py
@@ -17,7 +17,7 @@
 * The new Zarr has a few more variables.

 """
-import warnings
+import logging
 import datetime
 import multiprocessing
 import re
@@ -32,6 +32,18 @@
 import xarray as xr


+# Filter the ecCodes log warning
+# "ecCodes provides no latitudes/longitudes for gridType='transverse_mercator'"
+# generated here: https://github.com/ecmwf/cfgrib/blob/master/cfgrib/dataset.py#L402
+class FilterEccodesWarning(logging.Filter):
+    def filter(self, record) -> bool:
+        """Inspect `record`. Return True to log `record`. Return False to ignore `record`."""
+        return not record.getMessage() == (
+            "ecCodes provides no latitudes/longitudes for gridType='transverse_mercator'")
+
+
+logging.getLogger('cfgrib.dataset').addFilter(FilterEccodesWarning())
+
 # Done:
 #
 # * Merge Wholesale1 and 2 (2 includes dswrf, lcc, mcc, and hcc)
@@ -181,11 +193,7 @@ def load_grib_file(full_filename: Union[Path, str], verbose: bool = False) -> xr
     # The grib files are "heterogeneous", so we use cfgrib.open_datasets
     # to return a list of contiguous xr.Datasets.
     # See https://github.com/ecmwf/cfgrib#automatic-filtering
-    with warnings.catch_warnings():
-        warnings.filterwarnings(
-            action="ignore",
-            message="ecCodes provides no latitudes/longitudes for gridType='transverse_mercator'")
-        datasets_from_grib = cfgrib.open_datasets(full_filename)
+    datasets_from_grib = cfgrib.open_datasets(full_filename)
     n_datasets = len(datasets_from_grib)

     # Get each dataset into the right shape for merging:
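
This commit swaps a `warnings.filterwarnings` block for a `logging.Filter`. The likely reason is that the ecCodes message is emitted through the `logging` module, which filters from the `warnings` module do not affect. The standalone sketch below illustrates that difference without needing cfgrib installed. The logger name `demo.dataset`, the class `DropExactMessage`, and the function `emit` are hypothetical names for this illustration only. The PR itself attaches `FilterEccodesWarning` to the `cfgrib.dataset` logger.

```python
import io
import logging
import warnings
from typing import List

# Hypothetical stand-in for the 'cfgrib.dataset' logger used in the PR.
DEMO_LOGGER_NAME = "demo.dataset"
NOISY_MESSAGE = (
    "ecCodes provides no latitudes/longitudes for gridType='transverse_mercator'"
)


class DropExactMessage(logging.Filter):
    """Drop log records whose formatted message equals `unwanted` exactly."""

    def __init__(self, unwanted: str) -> None:
        super().__init__()
        self.unwanted = unwanted

    def filter(self, record: logging.LogRecord) -> bool:
        # Return True to keep the record, False to drop it.
        return record.getMessage() != self.unwanted


def emit(use_logging_filter: bool) -> List[str]:
    """Log two warnings and return the lines that reached the handler."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

    logger = logging.getLogger(DEMO_LOGGER_NAME)
    logger.setLevel(logging.WARNING)
    logger.propagate = False  # keep the demo output out of the root logger
    logger.addHandler(handler)

    noise_filter = DropExactMessage(NOISY_MESSAGE)
    if use_logging_filter:
        logger.addFilter(noise_filter)
    try:
        # A warnings-module filter is active in both cases, to show that it
        # has no effect on records emitted through the logging module.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            logger.warning(NOISY_MESSAGE)
            logger.warning("some other warning")
    finally:
        logger.removeFilter(noise_filter)
        logger.removeHandler(handler)
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    print(emit(use_logging_filter=False))  # both messages get through
    print(emit(use_logging_filter=True))   # only the unrelated warning remains
```

One design point worth noting: a filter attached to a logger is only consulted for records logged directly on that logger, not for records propagated up from its descendants. That is consistent with the PR attaching the filter to `cfgrib.dataset`, the logger named in the linked source line, rather than to a parent such as `cfgrib`.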