
Create script to convert NWM forcing inputs into ngen basins and format #24

Open
jameshalgren opened this issue Dec 15, 2022 · 7 comments · May be fixed by #26
jameshalgren commented Dec 15, 2022

TL;DR

How do we convert the current NWM forcing data into inputs for the ngen catchments?

jameshalgren commented Dec 15, 2022

Details

We are working on a pipeline to develop forcing inputs for the community NextGen framework.
The code snippets below should take about 5 minutes to execute, if you want to try them out.

The hydrofabric files

Please see this for a geopackage-to-ngen input converter. With that script, the .gpkg files in the AWS catalog of current NWM hydrofabric artifacts (see also the hydrofabric github.io page) can be turned into the catchment_data.geojson and nexus_data.geojson input files for a NextGen execution. The only other main input file is the realization.json, which still needs to be developed (along with the forcing information, of course).

Forcings

And... those forcing files are what this issue is about!

Here's what we have so far:

Get some test data

We can download the forcing data pre-processed for the current NWM from the GCP bucket.

e.g., wget https://storage.googleapis.com/national-water-model/nwm.20220901/forcing_medium_range/nwm.t00z.medium_range.forcing.f001.conus.nc

(Note: there are cooler ways to do this, which we are exploring in other experiments, but for now it is easiest to just download a few files and work with those.)
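If you want more than a single forecast hour, a quick loop over the fNNN filenames works too. This is just a sketch; the URL pattern follows the wget example above, and the number of hours is arbitrary:

# Sketch: fetch the first few medium-range forcing hours from the GCP bucket.
# URL pattern follows the wget example above; adjust date/cycle/hours as needed.
import urllib.request

base = "https://storage.googleapis.com/national-water-model/nwm.20220901/forcing_medium_range"
for fhr in range(1, 4):  # f001..f003
    fname = f"nwm.t00z.medium_range.forcing.f{fhr:03d}.conus.nc"
    urllib.request.urlretrieve(f"{base}/{fname}", fname)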

Also get the .gpkg file (for Alabama, of course!!)
wget https://nextgen-hydrofabric.s3.amazonaws.com/v1.2/nextgen_03W.gpkg

read in the catchment boundaries where we will need to sample the forcing

import geopandas as gpd

# load the divide (catchment) polygons from the hydrofabric geopackage
f_03 = "nextgen_03W.gpkg"
gpkg_03w_divides = gpd.read_file(f_03, layer="divides")

Method #1

We tried this two ways. The first way relies on an intermediate file for the variable. It's fast, and it helps with debugging. Note that the GeoTIFF export turns the values into scaled integers, and we didn't work out how to handle that if it's important (it would be, but we were aiming for Method #2 anyway...).

read the forcing dataset and write to geotiff

import xarray as xr
import rioxarray as rxr  # registers the .rio accessor and provides the "rasterio" engine

fc_nc = "nwm.t00z.medium_range.forcing.f001.conus.nc"

# TODO LATER: Look more into the differences between these input methods
# https://docs.xarray.dev/en/stable/generated/xarray.open_rasterio.html
# https://corteva.github.io/rioxarray/stable/examples/convert_to_raster.html
# https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html

reng = "rasterio"
fc_xds = xr.open_dataset(fc_nc, engine=reng)
# NOTE: This is the same: fc_xds = rxr.open_rasterio(fc_nc)
# NOTE: This way doesn't work (the engine specification is important): fc_xds = xr.open_dataset(fc_nc)

# write the single variable out as a GeoTIFF
fc_xds["LWDOWN"].rio.to_raster("test1.tif")

read the geotiff and do zonal statistics on the catchments

import pandas as pd
import rasterio
from rasterstats import zonal_stats

with rasterio.open("test1.tif") as src:
    affine = src.transform
    array = src.read(1)
    df_zonal_stats = pd.DataFrame(zonal_stats(gpkg_03w_divides, array, affine=affine))

# adding statistics back to the original GeoDataFrame
gdf2 = pd.concat([gpkg_03w_divides, df_zonal_stats], axis=1)
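On the scaled-integer caveat above, one possible handling (a sketch, not something we tested): if the exported GeoTIFF carries per-band scale/offset metadata, rasterio exposes it, and the packed values can be un-scaled before computing the zonal statistics.

# Sketch: un-scale packed integers using the per-band scale/offset, if present
# (rasterio defaults these to 1.0 and 0.0 when the metadata is absent).
with rasterio.open("test1.tif") as src:
    scale, offset = src.scales[0], src.offsets[0]
    array = src.read(1).astype("float64") * scale + offset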

Method #2

This way, we avoid the intermediate file, which is good for streamlining but probably harder to debug.

with xr.open_dataset(fc_nc, engine=reng) as _xds:
    _src = _xds.LWDOWN
    _aff2 = _src.rio.transform()
    _arr2 = _src.values[0]  # first (only) time slice in this file
    _df_zonal_stats = pd.DataFrame(zonal_stats(gpkg_03w_divides, _arr2, affine=_aff2))

# adding statistics back to the original GeoDataFrame
gdf3 = pd.concat([gpkg_03w_divides, _df_zonal_stats], axis=1)
gdf3.to_csv("lwdown_zonal_stats.csv", index=False)  # example output filename

Success!

The output looks like this. We can use the mean value (or perhaps some other statistic... we'll need to think about that) as the value to bring over to the csv for this time-step for each catchment.

(Pdb) gdf3
               id    areasqkm      type        toid                                           geometry      min      max        mean  count
0      cat-107068   14.698357   network  nex-107069  MULTIPOLYGON (((1044735.000 841095.000, 104482...  199.125  199.333  199.233917     12
1      cat-107233    8.058603   network  nex-107234  MULTIPOLYGON (((1077765.001 1061085.003, 10778...  192.494  192.550  192.525000      5
2      cat-107683   17.203060   network  nex-107684  MULTIPOLYGON (((1038915.001 1106985.003, 10387...  193.985  194.136  194.069125     16
3      cat-108167   15.632991   network  nex-108168  MULTIPOLYGON (((1094685.002 1036725.001, 10943...  192.026  192.298  192.182143     14
4      cat-112977   17.297992   network  nex-112978  MULTIPOLYGON (((736335.000 1096815.003, 736184...  187.651  188.667  188.019111     18
...           ...         ...       ...         ...                                                ...      ...      ...         ...    ...
30867  cat-137554    0.022500   coastal  cnx-137554  MULTIPOLYGON (((648015.005 830264.997, 648074....      NaN      NaN         NaN      0
30868  cat-106885  136.690204  internal  inx-106885  MULTIPOLYGON (((986774.996 876974.997, 986835....  197.238  197.934  197.502941    135
30869  cat-137071    0.848702   coastal  cnx-137071  MULTIPOLYGON (((1152974.999 851804.998, 115300...  195.405  195.405  195.405000      1
30870  cat-137068    0.007199   coastal  cnx-137068  MULTIPOLYGON (((1153095.000 851805.003, 115300...      NaN      NaN         NaN      0
30871  cat-137748    5.238898   coastal  cnx-137748  MULTIPOLYGON (((621914.998 811425.005, 621914....  272.608  285.758  279.119333      6

[30872 rows x 9 columns]
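Pulling just the id and mean columns out of that result gives a tidy per-catchment series for one variable at one time step. A sketch; the column names follow the output above, and the filename is only an example:

# Sketch: keep the catchment id and the zonal mean for this variable/time step
lwdown_means = gdf3[["id", "mean"]].rename(columns={"mean": "LWDOWN"})
lwdown_means.to_csv("LWDOWN_f001.csv", index=False)  # example filename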

@jameshalgren

Next Steps

We need to loop through the variables and forecast times (or retrospective times) and write a giant concatenated .csv or .nc with the right information.

Or... figure out how to do streaming. But one step at a time.
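A rough sketch of that loop, reusing the zonal-stats step from Method #2 above. The variable list here is illustrative (taken from the GCP forcing variable names discussed later in this thread), and the filenames are just examples:

# Sketch: zonal means for several variables across several forecast hours,
# concatenated into one long table.
import pandas as pd
import rioxarray  # noqa: F401  (registers the .rio accessor)
import xarray as xr
from rasterstats import zonal_stats

variables = ["LWDOWN", "SWDOWN", "PSFC", "Q2D", "T2D", "U2D", "V2D", "RAINRATE"]
rows = []
for fhr in range(1, 4):  # f001..f003
    fc_nc = f"nwm.t00z.medium_range.forcing.f{fhr:03d}.conus.nc"
    with xr.open_dataset(fc_nc, engine="rasterio") as ds:
        for var in variables:
            src = ds[var]
            stats = zonal_stats(
                gpkg_03w_divides, src.values[0], affine=src.rio.transform(), stats=["mean"]
            )
            df = pd.DataFrame(stats)
            df["id"] = gpkg_03w_divides["id"].values
            df["variable"] = var
            df["forecast_hour"] = fhr
            rows.append(df)

pd.concat(rows).to_csv("forcing_zonal_means.csv", index=False)  # example filename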

@jameshalgren

Ping @ZacharyWills @hellkite500

jameshalgren commented Dec 15, 2022

This is pretty slow... so we'll want to multithread it. We can use the kerchunk tools, perhaps, to do collections of files at a time. Are there ways we can split up the calculation into clusters of inputs? (run 5 days, pause, run 5 more days, etc.?) Building the input files in clusters could make this easier to stream, grabbing the source files in manageable bunches.
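One simple way to split the work, as a sketch: file-level parallelism with the standard library (kerchunk or dask would be the fancier options). Here process_one_file is a placeholder that would wrap the Method #2 steps above and return a DataFrame for one forcing file:

# Sketch: run the per-file zonal-stats step in parallel, one worker per forcing file.
from concurrent.futures import ProcessPoolExecutor

def process_one_file(fc_nc):
    ...  # open fc_nc, compute zonal stats for each variable, return a DataFrame

files = [f"nwm.t00z.medium_range.forcing.f{fhr:03d}.conus.nc" for fhr in range(1, 6)]
with ProcessPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process_one_file, files))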

@jameshalgren

We'll need to pay close attention to the transform parameter if we take this into production somehow. This notebook has a detailed discussion:
https://github.com/perrygeo/python-rasterstats/blob/master/docs/notebooks/Integrating%20with%20GeoPandas%20and%20Numpy.ipynb
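A related sanity check worth building in (a sketch, reusing gpkg_03w_divides and the test GeoTIFF from the snippets above) is making sure the divides are in the raster's CRS before computing zonal statistics, so the affine transform actually lines up:

# Sketch: reproject the divides to the raster CRS before the zonal statistics
with rasterio.open("test1.tif") as src:
    divides_proj = gpkg_03w_divides.to_crs(src.crs)
    checked_stats = zonal_stats(divides_proj, src.read(1), affine=src.transform)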

shahab122 commented Dec 19, 2022

We compared the variable names between the forcing variables in the ngen example files (https://github.com/NOAA-OWP/ngen/tree/master/data/forcing) and the forcing variables extracted with the above script. The following table shows that the ngen example forcing includes both precip_rate and APCP_surface, while the forcing extracted by our script (from the GCP bucket) includes only RAINRATE (which seems to be the same as precip_rate).

| NGEN Forcing File | NGEN Forcing File (unit) | NWM (GCP) Forcing Data | NWM (GCP) Forcing Data (Long Name) | NWM (GCP) Forcing Data (Unit) |
|---|---|---|---|---|
| time | hour | time | | hour |
| APCP_surface | | Not Available | | |
| DLWRF_surface | | LWDOWN | Surface downward long-wave radiation flux | W m-2 |
| DSWRF_surface | | SWDOWN | Surface downward short-wave radiation flux | W m-2 |
| PRES_surface | | PSFC | Surface Pressure | Pa |
| SPFH_2maboveground | | Q2D | 2-m Specific Humidity | kg kg-1 |
| TMP_2maboveground | | T2D | 2-m Air Temperature | K |
| UGRD_10maboveground | | U2D | 10-m U-component of wind | m s-1 |
| VGRD_10maboveground | | V2D | 10-m V-component of wind | m s-1 |
| precip_rate | | RAINRATE | Surface Precipitation Rate | mm s^-1 |
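For the renaming step in the script, that correspondence could be captured as a simple dict. This is only a sketch based on the table above; APCP_surface is left out since it has no direct NWM (GCP) counterpart:

# Sketch: map NWM (GCP) forcing variable names to the ngen example forcing names
NWM_TO_NGEN = {
    "LWDOWN": "DLWRF_surface",
    "SWDOWN": "DSWRF_surface",
    "PSFC": "PRES_surface",
    "Q2D": "SPFH_2maboveground",
    "T2D": "TMP_2maboveground",
    "U2D": "UGRD_10maboveground",
    "V2D": "VGRD_10maboveground",
    "RAINRATE": "precip_rate",  # ngen also expects APCP_surface, which has no GCP equivalent
}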

jameshalgren linked a pull request Dec 19, 2022 that will close this issue
jameshalgren linked a pull request Jan 4, 2023 that will close this issue
jameshalgren changed the title from "NWM forcing inputs for ngen" to "Create script to convert NWM forcing inputs into ngen format" Jan 6, 2023
jameshalgren changed the title from "Create script to convert NWM forcing inputs into ngen format" to "Create script to convert NWM forcing inputs into ngen basins and format" Jan 6, 2023
jameshalgren commented Feb 22, 2023

Working on converting the outputs from the "new" way into something a little more structured for output to csv... see #26

jameshalgren mentioned this issue May 5, 2023