Error opening doi="10.5067/H93644NLXWX9" data #370
Comments
I could download the file with earthaccess, and curl gets to it using bearer tokens. I wonder if there are some weird redirects that

cc @martindurant in case you have any thoughts
The weird thing is that this works just fine:

    import earthaccess

    granule = "https://data.gesdisc.earthdata.nasa.gov/data/OCO2_DATA/OCO2_L1B_Science.11r/2022/032/oco2_L1bScGL_40349a_220201_B11006r_220505132311.h5"
    fs = earthaccess.get_fsspec_https_session()
    # Open inside a context manager so the file is closed afterwards
    with fs.open(granule) as f:
        print(f.read(10))

The code that earthaccess uses to open the files is https://github.com/nsidc/earthaccess/blob/69f9e46dfda72ae82045a81635f489dcb041c4f3/earthaccess/store.py#L46C5-L46C16. The main difference is that we are using the EarthAccessFile wrapper and we are not using a context manager to open the files. For most cases this is not a problem. I also ran this, and it worked without a context manager:

    granule = "https://data.gesdisc.earthdata.nasa.gov/data/OCO2_DATA/OCO2_L1B_Science.11r/2022/032/oco2_L1bScGL_40349a_220201_B11006r_220505132311.h5"
    fs = earthaccess.get_fsspec_https_session()
    # Same read, but without a context manager
    f = fs.open(granule)
    print(f.read(10))

My guess is that there is a bug with the way earthaccess opens the files in the
The For kerchunking HDF5 files, the "first" cacher is better, because most, but not all, metadata is near the start of the file. In that case you wouldn't need the size, so long as the server does at least support byte-range gets. However, this path is not currently implemented in code.
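To illustrate why a "first"-style cache does not need the total file size, here is a minimal sketch of the idea. It is not fsspec's implementation: the class name `FirstBytesCache`, its `read` method, and the in-memory `payload` standing in for a remote file are all hypothetical. It only assumes a fetcher that can serve byte ranges.

```python
from typing import Callable, Optional


# Hypothetical illustration; not fsspec's actual "first" cache.
class FirstBytesCache:
    """Keep the first `blocksize` bytes in memory; pass other reads through."""

    def __init__(self, blocksize: int, fetcher: Callable[[int, int], bytes]):
        self.blocksize = blocksize
        self.fetcher = fetcher  # fetcher(start, end) returns bytes [start, end)
        self.head: Optional[bytes] = None
        self.requests = 0  # number of calls made to the underlying fetcher

    def _fetch(self, start: int, end: int) -> bytes:
        self.requests += 1
        return self.fetcher(start, end)

    def read(self, start: int, end: int) -> bytes:
        if start >= self.blocksize:
            # Entirely outside the cached region: go straight to the server.
            return self._fetch(start, end)
        if self.head is None:
            # First touch of the header region: one range request, then reuse.
            self.head = self._fetch(0, self.blocksize)
        part = self.head[start:end]
        if end > self.blocksize:
            # The read spills past the cached block; fetch only the tail.
            part += self._fetch(self.blocksize, end)
        return part


# Stand-in for a remote file served with byte-range support (assumption).
payload = bytes(range(256)) * 4
cache = FirstBytesCache(blocksize=64, fetcher=lambda s, e: payload[s:e])
print(cache.read(0, 8), cache.read(10, 20), cache.requests)
```

Repeated small reads near the start of the file, which is where most HDF5 metadata tends to live, are served from memory after a single range request, and nothing in the sketch ever asks for the file's total length.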
This is probably done by streaming and then cancelling the read after enough bytes are read. The raw URL seems to return a 303, so this is why HEAD doesn't work? I tried a GET with stream manually and it did return the size, so I'll have a dig around.
OK I have it: the response headers were missing the "Content-Encoding" field. Or, actually, it was there but blank (which is not allowed: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding#syntax).
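As a rough sketch of how a client could tolerate this, the function below derives a file size from response headers and treats a blank Content-Encoding the same as an absent one. The name `size_from_headers` is hypothetical, and this is not the actual change in fsspec/filesystem_spec#1440, only an illustration of the idea under that assumption.

```python
from typing import Mapping, Optional


def size_from_headers(headers: Mapping[str, str]) -> Optional[int]:
    """Return the file size implied by response headers, or None if unknown."""
    # Header names are case-insensitive, so normalise before looking them up.
    h = {k.lower(): v.strip() for k, v in headers.items()}
    encoding = h.get("content-encoding", "").lower()
    # Content-Length is only the true file size when the body is not
    # compressed. A blank Content-Encoding is treated like a missing one.
    if "content-length" in h and encoding in ("", "identity"):
        return int(h["content-length"])
    if "content-range" in h:
        # e.g. "bytes 0-9/12345"; the total may be "*" when unknown.
        total = h["content-range"].rpartition("/")[2]
        return int(total) if total.isdigit() else None
    return None


print(size_from_headers({"Content-Length": "100", "Content-Encoding": ""}))
print(size_from_headers({"Content-Length": "100", "Content-Encoding": "gzip"}))
```

With a stricter check that only accepts a missing or "identity" encoding, the blank header would make the size come back as None, which matches the symptom reported in this issue.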
Thanks for digging into this one @martindurant! I can confirm fsspec/filesystem_spec#1440 fixes the original failing example here 👍

Closing as fsspec/filesystem_spec#1440 has been merged. Thanks again @martindurant
I was working with someone who ran into an issue opening some doi="10.5067/H93644NLXWX9" data files. Here's a similar example:

which outputs the following:

Note that the file size output in f.fs.info(f.path) is actually None. Looking at other datasets, this number is an integer (e.g. using the same code with doi="10.5067/LESQUBLWS18H" works just fine).

It's not totally clear to me if this is an issue with how earthaccess is asking for the data, something going wrong in the backend server where the data is hosted, or something else. I'm hoping others have a better sense for where we should fix things.