Catchment Weights #34
Hi @JordanLaserGit, @program-- Please take a look at this new capability and data source and let me know if it's looking like what you need. I have to say it is pretty slick, and it actually reverts back to our initial data products from before the forcing workstream! I think the details might need a little TLC, but let's see how you guys use them and we can adjust:
# Download per the link you shared and read in ...
fg = terra::rast('/Users/mjohnson/Downloads/nwm.t00z.medium_range.forcing.f001.conus.nc')
# Define divide IDs however you like; here is a two-divide example
divs = c("cat-1866993", "cat-1866995")
# Extract precomputed weights for the forcing input and divides
w = arrow::open_dataset('s3://lynker-spatial/v20.1/forcing_weights.parquet') |>
dplyr::filter(divide_id %in% divs, grid_id == "medium_range.forcing.conus") |>
dplyr::collect()
# Extract cell level values from forcing grid
x = zonal::weight_grid_to_data(fg, w = w)
# Explore output
head(x)
#> divide_id cell coverage_fraction grid_id U2D
#> 1: cat-1866993 7632614 0.006292699 medium_range.forcing.conus -1.083
#> 2: cat-1866993 7632615 0.083473243 medium_range.forcing.conus -1.089
#> 3: cat-1866993 7637220 0.007748126 medium_range.forcing.conus -1.039
#> 4: cat-1866993 7637221 0.167723924 medium_range.forcing.conus -1.045
#> 5: cat-1866993 7637222 0.868241370 medium_range.forcing.conus -1.052
#> 6: cat-1866993 7637223 0.815731525 medium_range.forcing.conus -1.058
#> V2D LWDOWN RAINRATE T2D Q2D PSFC SWDOWN
#> 1: -1.317 213.503 2.576764e-06 259.58 0.001265 83949.5 0
#> 2: -1.266 213.116 2.609972e-06 259.48 0.001254 84060.7 0
#> 3: -1.385 214.007 3.093114e-06 259.78 0.001284 83776.8 0
#> 4: -1.347 213.779 3.045424e-06 259.65 0.001272 83803.0 0
#> 5: -1.307 213.526 3.004070e-06 259.55 0.001262 83893.8 0
#> 6: -1.267 213.249 2.953562e-06 259.46 0.001253 84004.0 0
# Or, re-scale on the fly...
(x2 = zonal::execute_zonal(fg, w = w, ID = "divide_id"))
#> divide_id grid_id U2D V2D LWDOWN
#> 1: cat-1866993 medium_range.forcing.conus -0.9880872 -1.219789 213.9363
#> 2: cat-1866995 medium_range.forcing.conus -0.9106252 -1.148223 215.3706
#> RAINRATE T2D Q2D PSFC SWDOWN
#> 1: 4.062485e-06 259.5877 0.001264434 84141.37 0
#> 2: 5.190276e-06 259.7551 0.001279137 84386.35 0
# You can find the valid grid data structures in the top-level JSON
jsonlite::fromJSON('https://lynker-spatial.s3.amazonaws.com/v20.1/forcing_grids.json',
simplifyDataFrame = TRUE)
#> grid_id schema X1 Xn Y1 Yn resX
#> 1 medium_range.forcing.conus climateR -2303999 2304001 -1920000 1920000 1000
#> resY ncols nrows
#> 1 1000 4608 3840
#> crs
#> 1 +proj=lcc +lat_0=40 +lon_0=-97 +lat_1=30 +lat_2=60 +x_0=0 +y_0=0 +R=6370000 +units=m +no_defs

Created on 2024-02-06 with reprex v2.0.2
attn: @ZacharyWills
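[Editor's note: since the weights reference grid cells by a flat index, it may help to see how a cell value maps back to a coordinate using the forcing_grids.json parameters above. Below is a minimal Python sketch assuming 1-based, row-major indexing counting from the top-left corner (terra's convention) and that X1/Yn describe the grid's outer edges; both assumptions should be verified against the climateR schema.]

# Sketch: recover a cell's center coordinates from the forcing_grids.json
# metadata. Assumes `cell` is a 1-based, row-major index counting from the
# top-left (north-west) corner, terra's convention, and that X1/Yn are the
# grid's outer edges -- verify both against the climateR schema.
ncols = 4608                      # from forcing_grids.json
x1, yn = -2303999, 1920000        # west edge, north edge
res_x, res_y = 1000, 1000         # cell size in meters

def cell_to_xy(cell):
    idx = cell - 1                # 1-based index -> 0-based
    row, col = divmod(idx, ncols)
    x = x1 + (col + 0.5) * res_x
    y = yn - (row + 0.5) * res_y  # rows count down from the north edge
    return x, y

print(cell_to_xy(7632614))        # one of the cells from the example above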
@mikejohnson51 This is exactly what I was hoping for! I'll implement it this week and get back to you. Thank you, sir!
@mikejohnson51 Thoughts on reformatting the parquet to something like this? The motivation for me is that forcingprocessor has an internal loop through catchments, so it would need to do a seek for each catchment key in the current file layout. I can prepare my own file from s3://lynker-spatial/v20.1/forcing_weights.parquet easily if it's preferable to keep the current data layout.
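[Editor's note: the layout Jordan proposed was not captured above. As a hedged illustration, one possible conversion is to reshape the tabular parquet into a per-catchment mapping so a consumer can look up each divide's cells and weights directly, without a seek per key. The dict-of-lists shape here is an assumption, and reading from S3 requires s3fs.]

# Sketch: reshape the tabular weights into a mapping keyed by divide_id.
# The exact target layout proposed above was not shown, so this
# dict-of-lists form is an assumption. Requires pandas + pyarrow + s3fs.
import json
import pandas as pd

w = pd.read_parquet(
    "s3://lynker-spatial/v20.1/forcing_weights.parquet",
    storage_options={"anon": True},   # public bucket, anonymous access
)
w = w[w["grid_id"] == "medium_range.forcing.conus"]

keyed = {
    div: {
        "cell": g["cell"].tolist(),
        "coverage_fraction": g["coverage_fraction"].tolist(),
    }
    for div, g in w.groupby("divide_id")
}

with open("forcing_weights.json", "w") as f:
    json.dump(keyed, f)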
Not opposed per se, but that looks a lot like JSON rather than parquet, based on my rudimentary knowledge. I would consider the advantages of the tabular form over the loop for a few reasons that were the basis for zonal. I can spit out a converted file for testing though!
Oh, it is JSON! I phrased that poorly. No worries, I can create my own for our CONUS daily run. The main purpose of this was to have standardized weights, so it's all good! Thanks @mikejohnson51
@JordanLaserGit Just to elaborate a bit on that top point, I think working from the table can be very fast and would, I'd wager, be the preferred approach over looping. Below is a CONUS regrid on my local laptop, with no multiprocessing, parallelization, or all the stuff you guys rock at. If it's of interest, maybe we can write a ... So I imagine that with the way you can distribute processing, you could rescale CONUS for an 18-hour forecast in a minute over a few processors. I'm now out of my depth :)
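[Editor's note: Mike's actual CONUS benchmark was not captured above. As a stand-in, here is a rough Python sketch of what a table-based rescale can look like: one vectorized groupby over the whole weights table instead of a per-catchment loop. The `band` array and its cell ordering are assumptions; `w` is the weights table from the examples above.]

# Sketch of a table-based areal mean: a single vectorized groupby rather
# than a loop over catchments. Assumes `w` has columns divide_id, cell,
# coverage_fraction, and that `band` is one forcing variable flattened to
# 1-D in the same order the `cell` index uses (an assumption to verify).
import numpy as np
import pandas as pd

def rescale(w, band, name):
    vals = band[w["cell"].to_numpy().astype(int) - 1]  # 1-based -> 0-based
    num = (vals * w["coverage_fraction"]).groupby(w["divide_id"]).sum()
    den = w["coverage_fraction"].groupby(w["divide_id"]).sum()
    return (num / den).rename(name).reset_index()      # weighted mean per divide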
Man, that's fast! forcingprocessor takes about 40 seconds to create something equivalent to ...
@JordanLaserGit You may also consider trying to work with pyarrow RecordBatches instead of loading in the entire parquet file, if you're not already, since you'll know the divide IDs beforehand (or use ...):
import pyarrow as pa
import pyarrow.compute as pc
import pyarrow.dataset
divs = ["cat-1866993", "cat-1866995"]
def print_batches():
    # Stream only the rows for the requested divides, batch by batch,
    # rather than materializing the whole parquet file in memory.
    w = pa.dataset.dataset(
        's3://lynker-spatial/v20.1/forcing_weights.parquet', format='parquet'
    ).filter(
        pc.field('divide_id').isin(divs)
    ).to_batches()
    batch: pa.RecordBatch
    for batch in w:
        tbl = batch.to_pandas()
        if tbl.empty:
            continue
        print(tbl)
#> divide_id cell coverage_fraction grid_id
#> 0 cat-1866993 7632614.0 0.006293 medium_range.forcing.conus
#> 1 cat-1866993 7632615.0 0.083473 medium_range.forcing.conus
#> 2 cat-1866993 7637220.0 0.007748 medium_range.forcing.conus
#> .. ... ... ... ...
#> 43 cat-1866995 7669480.0 0.101241 medium_range.forcing.conus
#> 44 cat-1866995 7669481.0 0.099398 medium_range.forcing.conus
if __name__ == '__main__':
    import timeit
    # Time three end-to-end reads of the filtered dataset.
    print(timeit.timeit("print_batches()", globals=locals(), number=3))
#> 10.545011819922365

With the pyarrow compute kernel, you can also perform aggregation before reading through the batches, IIRC. If I can get an example, I'll edit it in here.
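[Editor's note: program--'s aggregation example was never posted. As a rough illustration of the idea, pyarrow's Table.group_by (available in pyarrow >= 7) can do the grouping inside Arrow before anything is converted to pandas; the choice of aggregate below, summing coverage fractions, is purely illustrative.]

# Sketch: grouped aggregation inside Arrow, before any pandas conversion.
import pyarrow.compute as pc
import pyarrow.dataset as ds

divs = ["cat-1866993", "cat-1866995"]

tbl = ds.dataset(
    "s3://lynker-spatial/v20.1/forcing_weights.parquet", format="parquet"
).to_table(filter=pc.field("divide_id").isin(divs))

print(tbl.group_by("divide_id").aggregate([("coverage_fraction", "sum")]))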
@program-- That's handy! Our main application is a CONUS run, so we have to load the whole thing anyway, but this will be helpful when we are subsetting, which we are going to support.
I'd consider it for CONUS too -- you can potentially group batches by ID, and then the grouped aggregation becomes embarrassingly parallel. Since you're streaming records, the memory footprint is lower too (polars has some handy methods for doing all of this via lazy frames).
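[Editor's note: for reference, a minimal sketch of the lazy-frame route in polars. The group_by spelling assumes polars >= 0.19 (older releases call it groupby), and reading directly from S3 may need extra storage options depending on the polars version.]

# Sketch: polars plans the filter and aggregation lazily and can execute
# them in a streaming fashion, keeping the memory footprint low.
import polars as pl

divs = ["cat-1866993", "cat-1866995"]

agg = (
    pl.scan_parquet("s3://lynker-spatial/v20.1/forcing_weights.parquet")
    .filter(pl.col("divide_id").is_in(divs))
    .group_by("divide_id")
    .agg(pl.col("coverage_fraction").sum())
    .collect()
)
print(agg)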
@program-- I'll give it a try!
Catchment weights (the indices of the grid cells falling within a particular catchment's divide, relative to some grid) are needed for extracting forcing data. The method of calculating these weights can vary and yield different weights depending on the method used; see issue 28 in hfsubset. Also, these weights may change with each hydrofabric release. To help ensure catchment weights are consistent for a particular hydrofabric and grid, I suggest generating CONUS weight files for common grids (projections) that can be subsetted via a tool like hfsubset.
For example, NWM v3 uses a Lambert conformal conic projection:
https://noaa-nwm-pds.s3.amazonaws.com/nwm.20240112/forcing_medium_range/nwm.t00z.medium_range.forcing.f001.conus.nc
The projection type and hydrofabric version can be stored as fields in the weights.json. That way, forcing engines could check that the weight projection is appropriate for the NWM file being processed. Information about the grid should be stored in the JSON as well (dx, dy, nx, ny, x0, y0).
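[Editor's note: a hedged sketch of what that metadata block and check might look like. All field names here are hypothetical, not an agreed spec; the values are copied from the forcing_grids.json example above.]

# Hypothetical weights.json header carrying the proposed grid/projection
# metadata; every field name is illustrative, not an agreed spec.
weights_meta = {
    "hydrofabric": "v20.1",
    "grid_id": "medium_range.forcing.conus",
    "crs": "+proj=lcc +lat_0=40 +lon_0=-97 +lat_1=30 +lat_2=60 "
           "+x_0=0 +y_0=0 +R=6370000 +units=m +no_defs",
    "dx": 1000, "dy": 1000, "nx": 4608, "ny": 3840,
    "x0": -2303999, "y0": -1920000,
}

def check_grid(nwm_crs, nwm_shape):
    # Fail fast if the weights were built for a different grid than the
    # NWM file being processed.
    if nwm_crs != weights_meta["crs"]:
        raise ValueError("weight projection does not match the NWM file")
    if nwm_shape != (weights_meta["ny"], weights_meta["nx"]):
        raise ValueError("grid dimensions do not match the weights grid")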
This will place more responsibility on the hydrofabric, but might be worth it in the long run to avoid weight generation algorithms yielding different weights for the same hydrofabric and grid.