Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adding a very basic FSSpecSource #967

Merged
merged 27 commits into from
Oct 4, 2023
Merged

feat: adding a very basic FSSpecSource #967

merged 27 commits into from
Oct 4, 2023

Conversation

lobis
Copy link
Collaborator

@lobis lobis commented Oct 3, 2023

This PR simply adds the changes from #692 with the necessary updates to make it compatible with current uproot.

A very basic Source is implemented for fsspec allowing the user to optionally use fsspec by setting the appropiate handler in the read operation.

with uproot.open(
    "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root",
    http_handler=uproot.source.fsspec.FSSpecSource,
) as f:
    data = f["Events/MET_pt"].array(library="np")
    assert len(data) == 40

@lobis lobis added the feature New feature or request label Oct 3, 2023
@lobis lobis requested a review from jpivarski October 3, 2023 16:32
@jpivarski jpivarski removed their request for review October 3, 2023 18:07
@jpivarski
Copy link
Member

Great to see this starting! (We'll talk on Zoom in an hour.) I'm watching it, but I removed myself as a reviewer for now because you can use the "request review" button as a way of reminding me about it when it's done. I don't think I get emails about a PR going out of draft mode, but I do get emails about requests-to-review, so we use that as a last step in our process.

@jpivarski jpivarski changed the title feat: fsspec feat: adding a very basic FSSpecSource Oct 3, 2023
pyproject.toml Outdated Show resolved Hide resolved
src/uproot/source/fsspec.py Outdated Show resolved Hide resolved
@lobis
Copy link
Collaborator Author

lobis commented Oct 4, 2023

@jpivarski @nsmith- I think this is ready to go.

Currently the fsspec source works in place of the http, file, s3 and xrootd sources. At this point fsspec is only present as a test dependency and this can be merged without causing any problems as it's fully optional.

One of my next steps (for a dedicated PR) would be to refactor the reading.open options for handlers (http_handler, file_handler, etc.) in order to use a single handler option (handler) that can be set to the currenty existing sources or the new fsspec source with the goal of setting the fsspec source as the default once the integration is complete.

I think this would introduce some limitations such as the ability to open some files with one protocol and some with another in the same function call, but as we discussed this is an acceptable compromise. The ability to use the previous sources (http, s3, etc.) will remain and the corresponding source will be assigned from parsing the uri string.

The CI starting to fail when I added s3fs as a dependency (required by fsspec to read s3 files. I have no idea why, it looks like it has something to do with ssl and only fails the CI on some python versions, not all. Any idea what might be going on? (https://github.com/scikit-hep/uproot5/actions/runs/6408524082 for the logs).

I saw that minio is used to add support for s3 in the current implementation of the s3 source, currently being an optional dependency. When this project is done fsspec will become a core dependency and users trying to open s3 files will recieve an error (from fsspec) that will direct them to install s3fs. s3fs should still be present on the test dependencies to validate against s3 files though.

@lobis lobis marked this pull request as ready for review October 4, 2023 16:35
@lobis lobis requested review from jpivarski and nsmith- October 4, 2023 16:35
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! It adds a new class (which doesn't disturb any existing code yet) and it's following all of our local conventions. I think it's ready to merge.

tests/test_0692_fsspec.py Outdated Show resolved Hide resolved
Comment on lines +107 to +109
future = uproot.source.futures.TrivialFuture(item)
chunk = uproot.source.chunk.Chunk(self, start, stop, future)
uproot.source.chunk.notifier(chunk, notifications)()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a blocking call in this first implementatation, but will be made asynchronous later.

@lobis lobis mentioned this pull request Oct 4, 2023
@lobis lobis merged commit 546b0cb into scikit-hep:main Oct 4, 2023
@lobis lobis deleted the fsspec branch October 4, 2023 19:15
lobis added a commit that referenced this pull request Oct 4, 2023
…andlers-unify

* origin/open-handlers-unify:
  feat: adding a very basic FSSpecSource (#967)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants