Investigate xarray chunking #30
Comments
Absolute whirlwind. I have recreated the issue. Added example to the description above.
Great catch, and thanks for the example! I looked at the xarray docs again, which have some detail on what the different values of the chunks parameter do. I guess the chunk size with "auto" is at the whim of dask's auto-chunking and what it deems ideal. Not sure if this is helpful, but I recreated the example you made with a smaller amount of fake data, and when setting Or if you were thinking of a different way altogether of opening/loading multiple zarr files, I would be interested to see what that could look like!
When loading multiple zarr files using xarray, I have noticed that it often changes the chunk sizes, even though the zarrs all have the same chunking on disk. Often it will double the chunk size. This is wasteful: we end up loading twice as much data off disk just to get access to a small piece of it, which slows down our sampling significantly.
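To put a number on the wasted reads, here is a minimal sketch that compares the element count of an on-disk chunk with that of the chunk dask ends up using. The function name and the chunk shapes are hypothetical and chosen only for illustration; they are not taken from our data.

```python
from math import prod


def chunk_volume_ratio(on_disk_chunks, loaded_chunks):
    """Return how many times more elements a loaded chunk holds than an on-disk chunk."""
    if len(on_disk_chunks) != len(loaded_chunks):
        raise ValueError("chunk shapes must have the same number of dimensions")
    return prod(loaded_chunks) / prod(on_disk_chunks)


# Hypothetical chunk shapes, e.g. (time, y, x). Not from the real datasets.
on_disk = (12, 100, 100)
loaded = (24, 100, 100)  # chunk size doubled along the first dimension

# Reading any single point now pulls in this many times more data.
print(chunk_volume_ratio(on_disk, loaded))  # 2.0
```

Since a sample that touches one chunk has to read the whole chunk, this ratio is roughly the extra I/O per sample.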
We should investigate this further and see if there is a better way to load multiple zarr files with xarray.
e.g. In the example below xarray makes the chunks 27 times larger!
Note that I haven't printed the time dimension here. When we open the two files individually, the time chunk size is 12. When we open them together, the chunk size becomes 36.
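As a starting point for the investigation, a small helper like the one below could flag variables whose dask chunks no longer match the chunk shape recorded on disk. This is only a sketch: `find_rechunked_variables` is a hypothetical name, and it assumes the zarr backend's `encoding["chunks"]` entry survives whatever combine step is used, which should be checked. The demo uses stand-in objects rather than real datasets so it runs without any zarr stores.

```python
from types import SimpleNamespace


def find_rechunked_variables(ds):
    """Return variables whose largest dask block differs from the on-disk chunk shape.

    `ds` is expected to behave like an xarray.Dataset: `ds.variables` maps names
    to objects with `.encoding` (dict) and `.chunks` (tuple of tuples, or None).
    """
    mismatches = {}
    for name, var in ds.variables.items():
        on_disk = var.encoding.get("chunks")  # chunk shape recorded by the backend, if any
        dask_chunks = var.chunks  # per-dimension block sizes, or None if not dask-backed
        if on_disk is None or dask_chunks is None:
            continue
        # Use the largest block along each dimension; the last block may be smaller.
        loaded = tuple(max(sizes) for sizes in dask_chunks)
        if loaded != tuple(on_disk):
            mismatches[name] = {"on_disk": tuple(on_disk), "loaded": loaded}
    return mismatches


# Stand-in objects mimicking the time chunks described above (12 on disk, 36 loaded).
# With real data, `ds` would instead come from opening the zarr files with xarray.
rechunked = SimpleNamespace(encoding={"chunks": (12,)}, chunks=((36, 36),))
unchanged = SimpleNamespace(encoding={"chunks": (12,)}, chunks=((12, 12, 12),))
fake_ds = SimpleNamespace(variables={"a": rechunked, "b": unchanged})

print(find_rechunked_variables(fake_ds))
# {'a': {'on_disk': (12,), 'loaded': (36,)}}
```

Running this on the individually opened datasets and on the combined one would show whether the change happens at open time or at combine time.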