# Create script to convert NWM forcing inputs into ngen basins and format #24
We are working on a pipeline to develop forcing inputs for the community NextGen framework.

## The hydrofabric files

Please see this for a geopkg-to-ngen input converter. With that script, the gpkg files in the AWS catalog for the current NWM hydrofabric artifacts (see also the hydrofabric github.io page)… can be turned into ngen inputs.

## Forcings

And... those forcing files are what this issue is about! Here's what we have so far:

### Get some test data

We can download the forcing data pre-processed for the current NWM from the GCP bucket; a sketch of one way to do that is below. (Note: there are cooler ways to do this, which we are exploring in other experiments, but for now it is easy to just download a few files and work with those.)
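A minimal sketch of grabbing one file over HTTPS. The bucket name and date-folder layout here are assumptions about the public NWM archive on GCP; check the bucket listing for the actual paths:

```python
# Sketch only: bucket path layout and date folder are assumptions.
import requests

bucket = "https://storage.googleapis.com/national-water-model"
date = "20220901"  # hypothetical date folder
fname = "nwm.t00z.medium_range.forcing.f001.conus.nc"
url = f"{bucket}/nwm.{date}/forcing_medium_range/{fname}"

r = requests.get(url, timeout=120)
r.raise_for_status()
with open(fname, "wb") as f:
    f.write(r.content)
```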
Also get the .gpkg file (for Alabama, of course!!).

### Read in the catchment boundaries where we will need to sample the forcing

```python
import geopandas as gpd

f_03 = "nextgen_03W.gpkg"
gpkg_03w_divides = gpd.read_file(f_03, layer="divides")
```

## Method #1

We tried this two ways. The first way relies on an intermediate file for the variable. It's fast, and it helps with debugging. Note: the GeoTIFF export turns things into scaled integers, and we didn't figure out how to handle that if it's important (it would be, but we were aiming for method #2 anyway...).

### Read the forcing dataset and write to GeoTIFF

```python
import xarray as xr
fc_nc = "nwm.t00z.medium_range.forcing.f001.conus.nc"
# TODO LATER: Look more into the differences between these input methods
# https://docs.xarray.dev/en/stable/generated/xarray.open_rasterio.html
# https://corteva.github.io/rioxarray/stable/examples/convert_to_raster.html
# https://corteva.github.io/rioxarray/stable/getting_started/getting_started.html
# TODO LATER: make sure the imports are robust -- do these need to be loaded or installed?
import rioxarray as rxr  # needed: registers the .rio accessor used below
# import rasterio  # imported below, where method #1 reads the GeoTIFF back
reng = "rasterio"
fc_xds = xr.open_dataset(fc_nc, engine=reng)
# NOTE: This is the same: fc_xds = rxr.open_rasterio(fc_nc)
# NOTE: This way doesn't work (the engine specification is important): fc_xds = xr.open_dataset(fc_nc)
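# NOTE (assumption, untested): rio.to_raster() accepts a dtype argument, so
# forcing float32 output might sidestep the scaled-integer issue noted above:
#   fc_xds["LWDOWN"].rio.to_raster("test1.tif", dtype="float32")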
fc_xds["LWDOWN"].rio.to_raster("test1.tif") read the geotiff and do zonal statistics on the catchmentswith rasterio.open("test1.tif") as src:
    affine = src.transform
    array = src.read(1)

df_zonal_stats = pd.DataFrame(zonal_stats(gpkg_03w_divides, array, affine=affine))

# adding statistics back to original GeoDataFrame
gdf2 = pd.concat([gpkg_03w_divides, df_zonal_stats], axis=1)
```

## Method #2

This way, we avoid the intermediate file -- which is good for streamlining, but probably harder to debug.

```python
with xr.open_dataset(fc_nc, engine=reng) as _xds:
    _src = _xds.LWDOWN
    _aff2 = _src.rio.transform()
    _arr2 = _src.values[0]

_df_zonal_stats = pd.DataFrame(zonal_stats(gpkg_03w_divides, _arr2, affine=_aff2))

# adding statistics back to original GeoDataFrame
gdf3 = pd.concat([gpkg_03w_divides, _df_zonal_stats], axis=1)
gdf3.to_csv("zonal_stats_lwdown.csv")  # filename is illustrative; bare gdf3.to_csv was missing the call
```

## Success!

The output looks like this. We can use the mean value (or perhaps some other statistic... we'll need to think about that) as the value to bring over to the csv for this time-step for each catchment.

```
(Pdb) gdf3
id areasqkm type toid geometry min max mean count
0 cat-107068 14.698357 network nex-107069 MULTIPOLYGON (((1044735.000 841095.000, 104482... 199.125 199.333 199.233917 12
1 cat-107233 8.058603 network nex-107234 MULTIPOLYGON (((1077765.001 1061085.003, 10778... 192.494 192.550 192.525000 5
2 cat-107683 17.203060 network nex-107684 MULTIPOLYGON (((1038915.001 1106985.003, 10387... 193.985 194.136 194.069125 16
3 cat-108167 15.632991 network nex-108168 MULTIPOLYGON (((1094685.002 1036725.001, 10943... 192.026 192.298 192.182143 14
4 cat-112977 17.297992 network nex-112978 MULTIPOLYGON (((736335.000 1096815.003, 736184... 187.651 188.667 188.019111 18
... ... ... ... ... ... ... ... ... ...
30867 cat-137554 0.022500 coastal cnx-137554 MULTIPOLYGON (((648015.005 830264.997, 648074.... NaN NaN NaN 0
30868 cat-106885 136.690204 internal inx-106885 MULTIPOLYGON (((986774.996 876974.997, 986835.... 197.238 197.934 197.502941 135
30869 cat-137071 0.848702 coastal cnx-137071 MULTIPOLYGON (((1152974.999 851804.998, 115300... 195.405 195.405 195.405000 1
30870 cat-137068 0.007199 coastal cnx-137068 MULTIPOLYGON (((1153095.000 851805.003, 115300... NaN NaN NaN 0
30871 cat-137748 5.238898 coastal cnx-137748 MULTIPOLYGON (((621914.998 811425.005, 621914.... 272.608 285.758 279.119333 6
[30872 rows x 9 columns]
```
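As a minimal sketch of that last step, continuing from `gdf3` above (the output filename is illustrative), pulling the mean into a per-catchment table for one variable and one time-step might look like:

```python
# Keep just the catchment id and the sampled mean; one column per variable/time-step.
means = gdf3[["id", "mean"]].rename(columns={"mean": "LWDOWN"})
means.to_csv("forcing_lwdown_t00z_f001.csv", index=False)  # hypothetical filename
```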
## Next Steps

We need to loop through the variables and forecast times (or retrospective times) and write a giant concatenated .csv or .nc with the right information. Or... figure out how to do streaming. But one step at a time. A rough sketch of that loop follows.
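A very rough, untested version of that loop, reusing the zonal-stats step from method #2; the variable subset and file list are placeholders to adapt to the actual forecast files on hand:

```python
# Rough sketch: loop files x variables, accumulate a long-format table.
import pandas as pd
import rioxarray  # noqa: F401 -- registers the .rio accessor
import xarray as xr
from rasterstats import zonal_stats

VARS = ["LWDOWN", "SWDOWN", "RAINRATE", "T2D"]           # illustrative subset
files = ["nwm.t00z.medium_range.forcing.f001.conus.nc"]  # one per forecast hour

rows = []
for fc in files:
    with xr.open_dataset(fc, engine="rasterio") as xds:
        for var in VARS:
            src = xds[var]
            stats = zonal_stats(gpkg_03w_divides, src.values[0],
                                affine=src.rio.transform(), stats=["mean"])
            df = pd.DataFrame(stats)
            df["id"] = gpkg_03w_divides["id"].values
            df["variable"] = var
            df["source_file"] = fc
            rows.append(df)

long_df = pd.concat(rows, ignore_index=True)
long_df.to_csv("ngen_forcings_long.csv", index=False)
```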
This is pretty slow... so we'll want to multithread it. We can perhaps use the kerchunk tools to do collections of files at a time. Are there ways we can split up the calculation into clusters of inputs (run 5 days, pause, run 5 more days, etc.)? Building the input files in clusters could make this easier to stream, grabbing the source files in manageable bunches. One simple chunked/parallel starting point is sketched below.
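A sketch of that starting point, under the assumption that the per-file work is factored into a `process_file()` helper (hypothetical name; see the loop sketch above for its body). Processes rather than threads, since the zonal-stats step is CPU-bound; the chunking gives the "5 days at a time" clustering:

```python
# Sketch: process files in fixed-size chunks, parallelized within each chunk.
from concurrent.futures import ProcessPoolExecutor

import pandas as pd

def process_file(fc):
    # placeholder: open fc, run zonal_stats for each variable,
    # return a long-format DataFrame (see the loop sketch above)
    ...

def run_in_chunks(files, chunk_size=40, workers=8):
    results = []
    for i in range(0, len(files), chunk_size):
        chunk = files[i:i + chunk_size]  # e.g., ~5 days of hourly files
        with ProcessPoolExecutor(max_workers=workers) as pool:
            results.extend(pool.map(process_file, chunk))
    return pd.concat(results, ignore_index=True)
```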
We'll need to pay close attention to the …
We compared the forcing variable names between the ngen example files (https://github.com/NOAA-OWP/ngen/tree/master/data/forcing) and the forcings extracted using the above script. The comparison shows that the ngen example forcing includes both `precip_rate` and `APCP_surface`, while the forcing extracted by our script (from the GCP bucket) includes only `RAINRATE` (which seems to be the same as `precip_rate`). A quick way to reproduce the check is sketched below.
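A minimal sketch of that comparison; the ngen CSV path is hypothetical, and the NWM file is the one downloaded above. The ngen examples are CSVs with variables as column headers, while the NWM forcing is a NetCDF whose data variables are the forcing fields:

```python
# Sketch: diff the variable names between the two forcing formats.
import pandas as pd
import xarray as xr

ngen_vars = set(pd.read_csv("ngen/data/forcing/cat-27.csv", nrows=0).columns)  # hypothetical path

with xr.open_dataset("nwm.t00z.medium_range.forcing.f001.conus.nc") as xds:
    nwm_vars = set(xds.data_vars)

print("only in ngen examples:", sorted(ngen_vars - nwm_vars))
print("only in NWM forcing:  ", sorted(nwm_vars - ngen_vars))
```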
Working on converting the outputs from the "new" way into something a little more structured for output to csv... see #26
## TL;DR

How do we convert the current NWM forcing data into inputs for the ngen catchments?