Please implement multi-source reading algorithm from CMSSW in fsspec-xrootd #36

Open
lgray opened this issue Dec 2, 2023 · 10 comments

Comments

@lgray
Contributor

lgray commented Dec 2, 2023

@lobis @nsmith-

Now that uproot 5.2.0 and coffea 2023 are going to be co-released and fsspec will be the main point of entry for any ROOT file, we should try to bring in some of the robustness that exists in CMSSW, which was implemented years ago by Brian and is still in use today.

@nsmith- has a better understanding of how it's implemented, but we should implement the workaround in CMSSW that iteratively tries different xrootd endpoints capable of serving a file if the connection is bad or a request is slow. This will allow users to actually utilize redirectors and avoid quite a bit of the gnashing of teeth that we typically encounter when coffea users are scaling out their analyses.
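
As a rough illustration of the idea in this comment (and not the CMSSW implementation), a failover read could look like the sketch below. The names `read_with_failover` and `read_one` are hypothetical: `read_one` stands in for whatever XRootD or fsspec call actually performs the read, and the 10-second timeout is an arbitrary placeholder.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def read_with_failover(urls, read_one, timeout=10.0):
    """Return (url, data) from the first source that answers in time.

    urls     -- candidate endpoints for the same file, in order of preference
    read_one -- hypothetical callable: read_one(url) -> bytes, raising OSError
                when the connection is bad
    timeout  -- seconds to wait before treating a source as too slow
    """
    errors = []
    for url in urls:
        pool = ThreadPoolExecutor(max_workers=1)
        future = pool.submit(read_one, url)
        try:
            return url, future.result(timeout=timeout)
        except FutureTimeout:
            # The slow read keeps running in its worker thread; we simply
            # stop waiting for it and move on to the next source.
            errors.append(f"{url}: no answer within {timeout} s")
        except OSError as exc:
            errors.append(f"{url}: {exc}")
        finally:
            pool.shutdown(wait=False)
    raise OSError("all sources failed: " + "; ".join(errors))
```

This version is stateless and retries from the top of the list on every call. A real implementation would probably remember which sources were bad or slow.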

@nsmith- changed the title from "please implement the bockelman-workaround from CMSSW in fsspec-xrootd" to "Please implement multi-source reading algorithm from CMSSW in fsspec-xrootd" on Dec 6, 2023
@nsmith-
Member

nsmith- commented Dec 6, 2023

A description of the algorithm is found at https://github.com/cms-sw/cmssw/blob/master/Utilities/XrdAdaptor/doc/multisource_algorithm_design.txt

@nsmith-
Member

nsmith- commented Dec 6, 2023

From Jan Lukas Späh (https://gitlab.cern.ch/pepper/pepper/-/blob/master/pepper/datasets.py?ref_type=heads#L133-144) there is this nice solution:

def locate(lfn, xrootddomain):
    import XRootD.client
    # Same as xrdfs <xrootddomain> locate -h <lfn>
    client = XRootD.client.FileSystem("root://" + xrootddomain)
    # The flag PrefName (to get domain names instead of IP addresses) does
    # not exist in the Python bindings. However, MAKEPATH has the same value
    status, loc = client.locate(lfn, XRootD.client.flags.OpenFlags.MAKEPATH)
    if loc is None:
        raise OSError("XRootD error: " + status.message)
    return [f"root://{r.address}/{lfn}" for r in loc]

This pays an upfront cost to locate every copy, rather than opening the file first and then locating additional copies in the background.
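
One way to keep that upfront cost from being paid repeatedly is to cache the lookup per file. The sketch below is only an illustration: `make_cached_locator` is a hypothetical helper, and it assumes a `locate(lfn, xrootddomain)` callable with the same signature as the function above and that the replica list stays valid for the lifetime of the process.

```python
from functools import lru_cache


def make_cached_locator(locate, maxsize=1024):
    """Wrap a locate(lfn, xrootddomain) callable so each file is looked up once.

    Failed lookups (exceptions) are not cached, so they are retried on the
    next call. Cached replica lists may go stale if copies appear or vanish.
    """

    @lru_cache(maxsize=maxsize)
    def cached_locate(lfn, xrootddomain):
        # Store a tuple so callers cannot mutate the cached result.
        return tuple(locate(lfn, xrootddomain))

    return cached_locate
```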

@lgray
Contributor Author

lgray commented Dec 6, 2023

Might be useful to implement both (the one from pepper first perhaps, since it is very easy?) and see if they're good in different cases or if one is more robust in the long run?

@JaLuka98

JaLuka98 commented Dec 6, 2023

Thanks for the credits, @nsmith-, but this is not my code. The code is from Jonas (who left academia I think). Laurids Jeppe (@lauridsj) might also be able to help out as the maintainer of pepper.

@rpsimeon34
Contributor

I'm working on implementing this in a fork, and I'm trying to figure out whether something I'm worried about is a real problem. I'm having fsspec handle picking a concrete endpoint for the user, but I realized that when fsspec writes to a file, the user may not know which endpoint they are writing to.

Is this handled somehow by XRootD? Is it safe to assume that a user shouldn't be using a redirector in the first place if they don't want to worry about this?

@nsmith-
Member

nsmith- commented Jan 24, 2024

I would hope that one cannot open a file for writing/appending via a redirector URL. We should check this.

@rpsimeon34
Contributor

Sorry, I should have tried it first. It looks like trying to open a file with "multiple copies" in write mode via fsspec raises an OSError.

I think that's at the XRootD level, not the fsspec level, so I'd expect we're protected from that issue.

@rpsimeon34
Contributor

I'm not super familiar with the details of ROOT file storage - does anyone know if it's safe to assume that the same file hosted in different places is exactly the same? As in, same metaOffset, same file descriptor, etc.

@lgray
Contributor Author

lgray commented Feb 14, 2024

@rpsimeon34 have you been able to make any progress on this? It is beginning to become necessary.

@rpsimeon34
Contributor

I have a minimally intrusive implementation of the pepper algorithm that I need to test. I'll bump that up a bit on my docket.

It's "minimally intrusive" in that it just picks any working source for the file when the fsspec File object is created, and then sticks with that source indefinitely. I want to try keeping a backup list of other sources that are automatically tried when fsspec encounters a read failure, but that might take me another couple of weeks to test and debug.
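
The backup-list idea described here could be sketched roughly as follows. This is not the code from the fork: `FailoverReader`, `opener`, and `read_at` are hypothetical names, `opener(url)` stands in for whatever opens a file-like object at one endpoint, and the sketch assumes every replica is byte-for-byte identical.

```python
class FailoverReader:
    """Stick with one source; move to the next backup when a read fails."""

    def __init__(self, urls, opener):
        # urls   -- all known locations of the same file
        # opener -- hypothetical callable: opener(url) -> seekable file-like
        #           object, raising OSError if the source cannot be opened
        self._backups = list(urls)
        self._opener = opener
        self._file = None
        self.url = None
        self._switch()

    def _switch(self):
        """Open the next source that works, dropping those that fail."""
        if self._file is not None:
            try:
                self._file.close()
            except OSError:
                pass
        while self._backups:
            url = self._backups.pop(0)
            try:
                self._file = self._opener(url)
            except OSError:
                continue
            self.url = url
            return
        raise OSError("no remaining source could be opened")

    def read_at(self, offset, length):
        """Read `length` bytes at `offset`, failing over on OSError."""
        while True:
            try:
                self._file.seek(offset)
                return self._file.read(length)
            except OSError:
                # Raises OSError itself once the backup list is exhausted.
                self._switch()
```

Because failed sources are discarded rather than retried, a transient glitch permanently removes a source from the list. Whether that is the right trade-off would need testing.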
