Please implement multi-source reading algorithm from CMSSW in fsspec-xrootd #36
Comments
A description of the algorithm is found at https://github.com/cms-sw/cmssw/blob/master/Utilities/XrdAdaptor/doc/multisource_algorithm_design.txt |
From Jan Lukas Späh (https://gitlab.cern.ch/pepper/pepper/-/blob/master/pepper/datasets.py?ref_type=heads#L133-144) there is this nice solution:

    def locate(lfn, xrootddomain):
        import XRootD.client
        # Same as xrdfs <xrootddomain> locate -h <lfn>
        client = XRootD.client.FileSystem("root://" + xrootddomain)
        # The flag PrefName (to get domain names instead of IP addresses) does
        # not exist in the Python bindings. However, MAKEPATH has the same value
        status, loc = client.locate(lfn, XRootD.client.flags.OpenFlags.MAKEPATH)
        if loc is None:
            raise OSError("XRootD error: " + status.message)
        # One concrete URL per server that reports holding a copy of the file
        return [f"root://{r.address}/{lfn}" for r in loc]

which pays an upfront cost rather than opening the file and then in the background locating additional copies |
Might be useful to implement both (the one from pepper first perhaps, since it is very easy?) and see if they're good in different cases or if one is more robust in the long run? |
Thanks for the credits, @nsmith-, but this is not my code. The code is from Jonas (who left academia I think). Laurids Jeppe (@lauridsj) might also be able to help out as the maintainer of pepper. |
I'm working on implementing this in a fork, and I'm trying to figure out whether something I'm worried about is actually a problem. I'm having fsspec handle picking a concrete endpoint for the user, but I realized that when fsspec writes to a file, the user may not know which endpoint they are writing to. Is this handled somehow by XRootD? Is it safe to assume that a user shouldn't be using a redirector in the first place if they don't want to worry about this? |
I would hope that one cannot open a file for writing/appending via a redirector URL. We should check this. |
Sorry, I should have tried it first. It looks like trying to open a file with "multiple copies" in write mode via fsspec raises an OSError. I think that's at the XRootD level, not the fsspec level, so I'd expect we're protected from that issue. |
I'm not super familiar with the details of ROOT file storage - does anyone know if it's safe to assume that the same file hosted in different places is exactly the same? As in, same metaOffset, same file descriptor, etc. |
@rpsimeon34 have you been able to make any progress on this? It is beginning to become necessary. |
I have a minimally intrusive implementation of the pepper algorithm that I need to test. I'll bump that up a bit on my docket. It's "minimally intrusive" in that it just picks any working source for the file when the fsspec File object is created, and then sticks with that source indefinitely. I want to try keeping a backup list of other sources that are automatically tried when fsspec encounters a read failure, but that might take me another couple weeks to test and debug. |
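For reference, the "pick a working source, keep the rest as backups" idea could be sketched roughly as below. This is only an illustration, not the code in the fork: `FailoverReader` and `open_source` are hypothetical names, and `open_source` stands in for whatever actually opens a concrete URL (it is assumed to return a seekable binary file object and to raise `OSError` on failure). The list of sources is assumed to come from something like the `locate` function quoted earlier in this thread.

```python
class FailoverReader:
    """Read byte ranges from one source, switching to a backup on failure.

    Hypothetical sketch: `sources` is a list of concrete URLs and
    `open_source(url)` is an assumed callable returning a seekable
    binary file object, raising OSError if the source is unusable.
    """

    def __init__(self, sources, open_source):
        self._backups = list(sources)  # sources not yet tried
        self._open = open_source
        self._file = None
        self.current = None  # URL currently being read from
        self._switch_source()

    def _switch_source(self):
        """Open the next backup that works, or raise if none is left."""
        errors = []
        while self._backups:
            url = self._backups.pop(0)
            try:
                self._file = self._open(url)
            except OSError as err:
                errors.append(f"{url}: {err}")
                continue
            self.current = url
            return
        raise OSError("all sources failed: " + "; ".join(errors))

    def read_range(self, start, length):
        """Read `length` bytes at offset `start`, retrying on other sources."""
        while True:
            try:
                self._file.seek(start)
                return self._file.read(length)
            except OSError:
                # The current source failed mid-read: drop it and move on.
                # _switch_source raises once the backup list is exhausted.
                self._switch_source()
```

Retrying a read on a different source only makes sense if every copy is byte-for-byte identical, which is the assumption asked about above.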
@lobis @nsmith-
Now that uproot 5.2.0 and coffea 2023 are going to be co-released and fsspec will be the main point of entry for any root file, we should try to bring some robustness like what exists in CMSSW that was implemented years ago by Brian and is still in use today.
@nsmith- has a better understanding of how it's implemented, but we should implement the workaround from CMSSW that iteratively tries different xrootd endpoints capable of serving a file if the connection is bad or a request is slow. This will allow users to actually utilize redirectors and avoid quite a bit of the gnashing of teeth that we typically encounter when coffea users are scaling out their analyses.
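To make the request concrete, here is a much-simplified sketch of "try the next endpoint if the connection is bad or the request is slow". It is not the CMSSW implementation and does not attempt to reproduce the linked design document. `read_with_timeout` and `read_from` are hypothetical names; `read_from(url)` is assumed to be a blocking call that returns bytes or raises `OSError`.

```python
import concurrent.futures


def read_with_timeout(sources, read_from, timeout):
    """Return (url, data) from the first source answering within `timeout` s.

    Hypothetical sketch: `read_from(url)` is an assumed blocking callable
    that returns bytes or raises OSError.
    """
    # One worker per source, so an abandoned slow request never blocks
    # the attempt on the next endpoint.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=max(1, len(sources)))
    failures = []
    try:
        for url in sources:
            future = pool.submit(read_from, url)
            try:
                return url, future.result(timeout=timeout)
            except concurrent.futures.TimeoutError:
                # Too slow: give up on this endpoint and try the next one.
                failures.append(f"{url}: no answer within {timeout}s")
            except OSError as err:
                # Bad connection or read error: try the next endpoint.
                failures.append(f"{url}: {err}")
    finally:
        # Do not wait for requests that were abandoned as too slow.
        pool.shutdown(wait=False)
    raise OSError("all sources failed: " + "; ".join(failures))
```

Note that a timed-out request keeps running in its thread until it finishes on its own; a real implementation would want a way to cancel it at the XRootD level.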