How to calculate climate metrics faster in the case of masked nc data? #1545
Comments
Hi @CGL5230, I'm wondering if you are making proper use of Dask. All the indicators and indices are optimized to make use of dask. In case you haven't come across this before: https://examples.dask.org/xarray.html
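In case a concrete example helps, here is a minimal sketch of opening the data with dask-backed chunks so that later xclim operations build a lazy task graph instead of loading the whole array; the file name and the chunk sizes are placeholders, only the `tp` variable name comes from the issue itself:

```python
import xarray as xr

# Opening with `chunks=` returns dask-backed arrays; nothing is loaded yet.
# The file name and chunk sizes below are only placeholders to adapt.
pre_ERA5 = xr.open_dataset(
    "era5_total_precipitation.nc",
    chunks={"time": 365, "lat": 100, "lon": 100},
)

# Operations on this object stay lazy until .compute() or .load() is called.
print(pre_ERA5.tp)  # shows the dask chunk layout instead of the values
```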
I second the recommendation about using dask. The computations are applied over the whole grid, NaN points included. So, simply masking your data by clipping areas of interest will not significantly accelerate the computation.

It seems, however, that after clipping your data is extremely sparse. In this case you could pull the following trick. It will only work if the "mask" is constant in time. The idea is that you could collapse your 2D grid to a 1D one by dropping the points where there's never any data. This will reduce the size of the dataset greatly and should make everything faster. A sketch:

```python
ERA5_pre_clip = pre_ERA5.rio.clip(gdf.geometry.apply(mapping), gdf.crs, drop=True, invert=False)

# Put all lon and lat points into a single dimension, drop any spatial points where everything is NaN
ERA5_pre_clip_1d = ERA5_pre_clip.stack(site=['lat', 'lon']).dropna('site', how='all')

# make xclim computation
out_1d = blablabla

# Recover full grid
out_2d = out_1d.unstack('site').reindex(lat=pre_ERA5.lat, lon=pre_ERA5.lon)
```

I'll leave the details of this implementation to you, this is just an (untested!) idea.
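For concreteness, one way the placeholder step above could be filled in, simply reusing the `percentile_doy` call from the issue description; this is a hedged, untested illustration, not part of the original suggestion:

```python
import xclim

# Example xclim computation on the stacked 1-D data; the call mirrors the
# percentile_doy line from the issue description and is only an illustration.
out_1d = xclim.core.calendar.percentile_doy(ERA5_pre_clip_1d.tp, per=95, window=5)
```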
That's great! I will try dask for the calculations. I'm also considering the approach of @aulemahal. The obvious convenient way is avoiding the NaN values, which will save a lot of resources and time.
I want to know whether there is any guidance for setting the chunk sizes. As you can see, the dataset is 5844 days (16 years). Maybe I should set the chunks relative to the number of years?
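For reference, a hedged sketch of one way to chunk along time by roughly one year, assuming daily data in the `pre_ERA5` dataset from the issue; the exact numbers are guesses and worth benchmarking:

```python
# Re-chunk the existing dataset: ~1 year per time chunk, whole spatial field per chunk.
# 365 is only a starting point; dask generally prefers chunks of a few tens of MB.
pre_ERA5 = pre_ERA5.chunk({"time": 365})

# For day-of-year percentiles and similar time-wise computations, a single chunk
# along time with smaller spatial chunks is often a better fit:
pre_ERA5 = pre_ERA5.chunk({"time": -1, "lat": 50, "lon": 50})
```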
I use those commands to parallelize operations for xclim, but I don't know whether this line of code is capable of parallel computation, because the computation still makes the kernel die.
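In case it's useful, a minimal sketch of an explicit dask client setup that caps memory use; kernel deaths are often a sign of running out of RAM. The worker counts and the memory limit here are assumptions to adapt to your machine:

```python
from dask.distributed import Client

# Cap the number of workers and the memory each one may use, so the
# computation spills to disk instead of exhausting RAM and killing the kernel.
client = Client(n_workers=4, threads_per_worker=2, memory_limit="4GB")
print(client.dashboard_link)  # watch memory use and task progress here

# Trigger the lazy computation explicitly (ERA5_p95 as defined in the issue body).
result = ERA5_p95.compute()
```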
I used the suggestion of @aulemahal, and it is faster. Dropping the NaN values speeds up the calculation of the climate index, because the clipped dataset is obviously sparse. I'm still not sure whether dask is doing the work, but the problem is solved :) Cheers!
Dear @aulemahal, I followed your suggestion and I got the SPEI of the clipped dataset. But when I clip again using a sub-area of the clipped range, I get this error.
Setup Information
My original data was huge, 22 GB, and as a result my program crashed frequently. It's a global precipitation dataset from the ERA5 data pool.
My idea is to mask this data with my shapefile and calculate the climate indicators, which might be faster.
```python
ERA5_pre_clip = pre_ERA5.rio.clip(gdf.geometry.apply(mapping), gdf.crs, drop=True, invert=False)
ERA5_p95 = xclim.core.calendar.percentile_doy(ERA5_pre_clip.tp, per=95, window=5)
```
But so far it doesn't seem to be much faster. I guess maybe xclim can't avoid the NaN values?
Now I use this command. Is it right? Ref from this.
```python
with xclim.set_options(check_missing="any"):
    ERA5_p95 = xclim.core.calendar.percentile_doy(ERA5_pre_clip.tp, per=95, window=5)
```