Confusing error when use_cftime=True and chunks='auto' in xr.open_dataset() #9834
What is your issue?
Opening a dataset with use_cftime=True turns the time dimension dtype from datetime64 to object. This means that using chunks='auto' will fail in dask, since dask can't estimate the size of variables with dtype object. However, the error is a bit confusing, since it comes from the underlying dask call and doesn't tell the user what caused it.
The error is:
Suggestion for now: raise an Exception when chunks='auto' and use_cftime=True are passed at the same time. I think this should be implementable in backends.open_dataset() (rather than in any specific engine's open_dataset), since it's likely common to any opening procedure, regardless of backend? Something like:
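A minimal sketch of such a check, written as a standalone function so it runs without xarray installed. The helper name check_auto_chunks_with_cftime and the message wording are hypothetical and not part of xarray's API; where exactly the check would be called inside backends.open_dataset() is left open.

```python
def check_auto_chunks_with_cftime(chunks, use_cftime):
    """Hypothetical guard (illustrative name, not an xarray function).

    Raises early with a message that names the conflicting arguments,
    instead of letting the failure surface later from dask.
    """
    # Only the plain string form chunks="auto" is handled here; a dict
    # such as {"time": "auto"} would need an extra check.
    if use_cftime and chunks == "auto":
        raise ValueError(
            "chunks='auto' cannot be combined with use_cftime=True: "
            "decoding times to cftime objects gives the time coordinate "
            "dtype object, and dask cannot estimate the size of "
            "object-dtype arrays. Pass explicit chunk sizes instead."
        )


# These combinations pass through silently.
check_auto_chunks_with_cftime("auto", use_cftime=False)
check_auto_chunks_with_cftime({"time": 100}, use_cftime=True)
```

Raising before any backend-specific code runs would keep the behaviour the same for every engine, which is the reason for suggesting the shared entry point.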
Suggestion for later: If it's possible to estimate the size of the array with datetime objects in the time coordinate, it should be possible to estimate it with cftime objects as well (since whether the coordinate itself is stored as one or the other is unlikely to make a difference in how to chunk the other variables). Is there maybe a way to get conventions.decode_cf_variable() to also return the original datetime object to present for chunking in place of the converted cftime object? Or for chunking to just apply the same chunking to a 1D coordinate that it would to that coordinate's dimension in the non-object-dtype arrays that may be present in the same dataset? (I guess this theoretically could be unstable if the object coordinate for some reason takes up a lot more space than it would if it were numeric, etc.)

(I'm working on putting together a PR for at least the Exception. Please let me know if there's anything I should keep in mind, especially where the exception would be most appropriate to put, whether this is a bad idea, etc.)
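To make the second "later" idea concrete (reusing the chunking that a dimension already gets in the numeric variables), here is a rough sketch in plain Python. The function inherit_dim_chunks and the dictionary layout are assumptions made for illustration; they do not mirror how xarray or dask store chunk information internally.

```python
def inherit_dim_chunks(dim, chunked_vars):
    """Return the chunk sizes along `dim` from the first variable that has it.

    chunked_vars maps variable name -> {dimension name: tuple of chunk sizes},
    a stand-in for the chunking already chosen for non-object-dtype arrays.
    """
    for var_chunks in chunked_vars.values():
        if dim in var_chunks:
            return var_chunks[dim]
    return None  # no numeric variable uses this dimension


# Made-up chunking for two numeric variables sharing a "time" dimension.
numeric_chunks = {
    "tas": {"time": (100, 100, 50), "lat": (90,), "lon": (180,)},
    "pr": {"time": (100, 100, 50), "lat": (90,), "lon": (180,)},
}

# The object-dtype time coordinate would simply reuse these sizes.
time_coord_chunks = inherit_dim_chunks("time", numeric_chunks)
```

This sidesteps size estimation for the object-dtype coordinate entirely, at the cost of the instability mentioned above if the objects are much larger than their numeric equivalents.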