-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test script to process nwm forcing into ngen input #26
base: main
Are you sure you want to change the base?
Add test script to process nwm forcing into ngen input #26
Conversation
): | ||
|
||
reng = "rasterio" | ||
_xds = xr.open_dataset(folder_prefix.joinpath(file_list[0]), engine=reng) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe open the file handles up first, then just use first filehandle to extract _xds. Then everything gets closed at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe. We need the specific raster engine to get the right behavior from the [email protected]()
line below, which we don't have in the later call...
See
_xds = xr.open_dataset(folder_prefix.joinpath(file_list[0]), engine=reng)
vs.
filehandles = [xr.open_dataset("data/" + f) for f in file_list]
for f in filehandles: | ||
cropped = xr_read_window(f, winslices, mask=mask) | ||
stats.append(cropped.mean()) | ||
[f.close() for f in filehandles] # Returns None for each file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this for convenience. You could make this a for-loop for clarity
for f in filehandles:
f.close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the list comprehension is plenty readable... and more compact. Good enough?
983aa5b
to
fc67d18
Compare
1245428
to
e8caaf9
Compare
Moved todo to main comment. |
@mgdenno @arpita0911patel -- Let's discuss |
ping @karnesh |
1ab1532
to
3dc7705
Compare
eb93ee1
to
2dc42b5
Compare
Working on this with respect to #24.
This is ready for at least initial review. The main script,
process_nwm_forcing_to_ngen.py
has four functions which all produce the same thing -- a dataframe with the right information to provide forcing to the ngen CFE model in the form of precipitation inputs for each basin in a geopackage file with basin polygons from the ngen hydrofabric. In the test script, there is a primitive method included that creates a dictionary using an extremely slow technique, but one which is very clear to follow. (The test script, by the way, is not a canonical test script -- just amain
function that runs a few example uses.)Work remaining might include the following:
--enable-parallel
flag or here - possibly change to access pattern)Three levels of review are helpful:
A. Is a utility to convert NWM output to NGEN basins useful?
B. Who else should be tagged?
C. Deep review (code, methods, etc.)
How to test:
Download a set of test data files to use with
test_process_nwm_forcing_to_ngen.py
:UpdateSee below...Times are as follows for tests with 3 files and 200 features, then 30 files and 2000 features, respectively:
There is a warning returned. Perhaps the processes are closing before the files close...
Update
With the latest commits, the fastest method is the "new" way with reversed loops, especially with a large number of files/timestamps. To produce a 30 hour input for 2000 basins takes between 220 and 250 seconds:
200 basins with 240 files:
A 10-day hourly input of rainfall (240 hours == one medium_range forecast) for 2000 basins in the Southeast W hydrofabric region took 30 minutes:
All execution times reported here from tests performed on a mac M1 Max with 64 Gb RAM.