Fix `earthaccess.EarthAccessFile` method lookup #620
I need to play with this a little bit to better understand what's going on, but I may not have time until the next hack day.
`earthaccess.EarthAccessFile` wrapper need not subclass anything
@mfisher87, have you had a chance to do this? @itcarroll or @betolink, I suppose the larger question for me is: what's the point of this class to begin with? Why do we even need it?
Thanks for checking on this one @chuckwondo! My guess on the need for this class was something to do with deserializing into a usable object in case authentication timed out. A guess only, though, as I don't know when
@jrbourbeau can explain in detail, but the gist of it is that this class allows a serialization trick: if we open granules from our laptop but offload an xarray operation to a Dask cluster in us-west-2, it will re-open the files in place using the s3://url instead of the https://cloud-front-tea url.
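@jrbourbeau The essential question is whether removing fsspec.spec.AbstractBufferedFile from the MRO of EarthAccessFile will break the usage @betolink describes.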
I didn't notice the […] However, I'm questioning the whole idea of pickling the […] I may very well be thinking about this poorly, or simply be too inexperienced with Dask (and the like), but without seeing an example of the kinds of things you think this would help with, offhand I would ask: why not simply distribute the URLs rather than the open files? Is it so that you don't also have to distribute credentials across such clusters?
I have not yet, sorry :X
Regarding the specific changes in the PR, they look fine to me, but is anybody able to answer my question above?
LGTM
pre-commit.ci autofix
Thanks @itcarroll! Apologies for the lack of engagement here. The changes here LGTM, though I'm not sure exactly what's going on with the integration tests.
> the gist of it is that this class allows a serialization trick, if we open granules from our laptop but we are offloading an xarray operation to a Dask cluster in us-west-2, it will re-open the files in place using s3://url instead of the https://cloud-front-tea url
Yeah, that's exactly right 👍 Obviously there's no http <--> s3 swapping when granules are opened and computed on in the same environment (i.e., both happen outside us-west-2 or both happen inside us-west-2).
> The essential question is whether removing fsspec.spec.AbstractBufferedFile from the MRO of EarthAccessFile will break the usage @betolink describes
Nope, we still keep the https <--> s3 behavior but fix the attribute lookup like you mentioned.
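The https <--> s3 hand-off discussed above can be pictured with Python's `__reduce__` hook: instead of pickling an open handle, the wrapper pickles a recipe for re-opening the granule, so whichever process unpickles it (a laptop or an in-region Dask worker) chooses its own access route. This is a minimal, standard-library-only sketch, not earthaccess's implementation; `open_granule`, `running_in_region`, `GranuleFile`, and the `AWS_REGION` check are all hypothetical stand-ins.

```python
import os
import pickle


def running_in_region():
    # Hypothetical in-region check; earthaccess's real detection logic is not shown here.
    return os.environ.get("AWS_REGION") == "us-west-2"


def open_granule(https_url, s3_url):
    # Hypothetical opener: prefer direct S3 access in-region, HTTPS everywhere else.
    url = s3_url if running_in_region() else https_url
    return GranuleFile(https_url, s3_url, url)


class GranuleFile:
    """File-like stand-in that re-opens itself wherever it is unpickled."""

    def __init__(self, https_url, s3_url, url):
        self.https_url = https_url
        self.s3_url = s3_url
        self.url = url  # the URL this process actually opened

    def __reduce__(self):
        # Pickle the recipe for re-opening, not the open connection, so the
        # unpickling process (e.g. a Dask worker) re-runs open_granule itself.
        return (open_granule, (self.https_url, self.s3_url))
```

Opened on a laptop, `GranuleFile.url` is the HTTPS URL; after `pickle.dumps` on the laptop and `pickle.loads` in a process where `running_in_region()` is true, the re-opened object carries the `s3://` URL. Because `__reduce__` is an ordinary method lookup, this pattern does not depend on the wrapper having any base class.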
Some complexity will be needed to handle the s3 <--> https hand-off between laptops and in-region workers spun up by Coiled. Here it's a pickled file-like object, but additional in_region checks in
No worries @jrbourbeau, thanks for the review!
Someone with write access needs to trigger them, is all.
This is not yet possible. Once #808 lands, it will be. If you are willing to wait for #808 to be approved and merged to main, you could then merge main into your PR. That will retrigger the integration tests, which will fail again, but a maintainer will then be able to re-run the failed integration tests so that they can pass (assuming nothing was broken).
Not sure exactly what the state of the integration tests is, but FWIW I went ahead and bumped those failing CI jobs, and they're passing now.
LOL! Apologies. I'm getting my PRs mixed up. I forgot that part already landed 🤦
Fixes #610, closes #563.
This PR removes any base class from the definition of `earthaccess.EarthAccessFile` (EAF). Previously, EAF inherited from `fsspec.spec.AbstractBufferedFile` (ABF) and so could use methods defined on ABF. But EAF also held an instance inheriting from ABF at `self.f` and handed off `__getattr__` requests to that instance. Under that setup, `self.read` resolved to `super().read`, because `read` is defined on ABF, rather than to `self.f.read`. That is a bug. It was probably assumed that `__getattr__` would catch all method calls, but it only handles what `__getattribute__` can't find.
We've scraped by with this setup because `self.f` (being an instance of an ABF) either does not override the called ABF method, or the override does little more than call `super()` itself. The latter is the case for `self.f.read` when `f` is an `fsspec.implementations.http.HTTPFile`. It is not the case when `f` is an `fsspec.implementations.http.HTTPStreamFile`. (I discovered this bug while working on fsspec/filesystem_spec#1631.)
This PR also updates some type hints and relevant documentation.
The type hint for `f` was wrong: it is an ABF, not an `fsspec.AbstractFileSystem`.
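The lookup-order bug described in this PR can be reproduced without fsspec. In this minimal sketch, `BufferedFile`, `StreamFile`, `PassThroughFile`, `BuggyWrapper`, and `FixedWrapper` are hypothetical stand-ins for ABF, `HTTPStreamFile`, `HTTPFile`, the old EAF, and the new EAF respectively; they model only method resolution, not real file I/O.

```python
class BufferedFile:
    """Stand-in for fsspec.spec.AbstractBufferedFile (ABF)."""

    def read(self):
        return "base read"


class StreamFile(BufferedFile):
    """Stand-in for an ABF subclass with a real read() override (like HTTPStreamFile)."""

    def read(self):
        return "stream read"


class PassThroughFile(BufferedFile):
    """Stand-in for an override that just defers to super() (as described for HTTPFile)."""

    def read(self):
        return super().read()


class BuggyWrapper(BufferedFile):
    """Old design: subclasses ABF *and* delegates to self.f."""

    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        # Only called when normal lookup (__getattribute__) fails, so
        # read() is found on BufferedFile first and never reaches self.f.
        return getattr(self.f, name)


class FixedWrapper:
    """New design: no base class, so read() misses and is delegated to self.f."""

    def __init__(self, f):
        self.f = f

    def __getattr__(self, name):
        return getattr(self.f, name)
```

With `StreamFile`, `BuggyWrapper(StreamFile()).read()` returns `"base read"` while `FixedWrapper(StreamFile()).read()` returns `"stream read"`. With `PassThroughFile`, both wrappers return `"base read"`, which is how the old design went unnoticed.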
ToDo if integration tests look okay:
📚 Documentation preview 📚: https://earthaccess--620.org.readthedocs.build/en/620/