Glob times out for LIGO origin #72

Open
duncanmmacleod opened this issue Oct 3, 2024 · 5 comments

@duncanmmacleod
Contributor

I am attempting to scope out pelicanfs as a replacement for a custom LIGO file indexing solution - to provide a backend to the LIGO data discovery client relied upon by many scientific applications.

However, when I attempt to perform a 'simple' glob() that should return a few (5-10) URL hits, I get a timeout:

# discover auth token
from scitokens import SciToken
token = SciToken.discover()
tokenstr = token._serialized_token or token.serialize().decode("utf-8")

# connect to origin
from pelicanfs.core import PelicanFileSystem
pelfs = PelicanFileSystem(
    "pelican://osg-htc.org",
    #direct_reads=True,
    headers={"Authorization": f"Bearer {tokenstr}"},
)

# simple 'ls' works quickly
print("ls")
print(pelfs.ls("/igwn/ligo"))

# glob() always times out
print("glob")
print(pelfs.glob(
    "/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-141/H-H1_HOFT_C00-141140*-4096.gwf",
    timeout=30,
))
Full traceback
Traceback (most recent call last):
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/asyncio/tasks.py", line 520, in wait_for
    return await fut
           ^^^^^^^^^
  File "/home/duncan/git/github.com/PelicanPlatform/pelicanfs/src/pelicanfs/core.py", line 446, in _glob
    allpaths = await self._find(
               ^^^^^^^^^^^^^^^^^
  File "/home/duncan/git/github.com/PelicanPlatform/pelicanfs/src/pelicanfs/core.py", line 377, in wrapper
    return await func(self, dataUrl, *args[1:], **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/git/github.com/PelicanPlatform/pelicanfs/src/pelicanfs/core.py", line 391, in _find
    results = await self.httpFileSystem._find(path, maxdepth, withdirs, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/asyn.py", line 846, in _find
    if withdirs and path != "" and await self._isdir(path):
                                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/implementations/http.py", line 516, in _isdir
    return bool(await self._ls(path))
                ^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/implementations/http.py", line 207, in _ls
    out = await self._ls_real(url, detail=detail, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/implementations/http.py", line 159, in _ls_real
    async with session.get(self.encode_url(url), **self.kwargs) as r:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/aiohttp/client.py", line 608, in _request
    await resp.start(conn)
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/aiohttp/client_reqrep.py", line 976, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/aiohttp/streams.py", line 640, in read
    await self._waiter
asyncio.exceptions.CancelledError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/asyncio/tasks.py", line 519, in wait_for
    async with timeouts.timeout(timeout):
               ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/asyncio/timeouts.py", line 115, in __aexit__
    raise TimeoutError from exc_val
TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/duncan/git/github.com/PelicanPlatform/pelicanfs/test.py", line 13, in <module>
    print(pelfs.glob(
          ^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/duncan/opt/conda/envs/pelicanfs/lib/python3.12/site-packages/fsspec/asyn.py", line 101, in sync
    raise FSTimeoutError from return_result
fsspec.exceptions.FSTimeoutError
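
For anyone reading the traceback above: the CancelledError followed by TimeoutError is the pattern `asyncio.wait_for` produces when the awaited coroutine does not finish in time, and the last frames show fsspec raising `FSTimeoutError` from it. A minimal stdlib-only sketch of that mechanism, where `stalled_listing` is a hypothetical stand-in for an HTTP directory listing that never responds (it is not pelicanfs or fsspec code):

```python
import asyncio


async def stalled_listing():
    # Hypothetical stand-in: simulates a request whose response never arrives.
    await asyncio.sleep(3600)


async def listing_with_timeout(timeout):
    """Await the stalled call, giving up after `timeout` seconds."""
    try:
        # wait_for cancels the inner coroutine once the deadline passes,
        # then raises a timeout error in the caller.
        await asyncio.wait_for(stalled_listing(), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


print(asyncio.run(listing_with_timeout(0.05)))
```

The point is only that the timeout is raised on the client side; it says nothing about why the server-side listing stalls.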
@bbockelm
Contributor

bbockelm commented Oct 3, 2024

Note -- the IGWN origin had an older version of the patch here that caused large directory listings to hang indefinitely. I strongly suspect they are being hit by that.

@duncanmmacleod - can you try a glob with the directory above? I suspect it's smaller and hence should succeed.
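
To illustrate why a slow directory listing shows up as a glob timeout: judging by the traceback, the glob goes through `_find` and `_ls`, so a single-directory glob amounts to listing the directory and filtering the names. A rough sketch of that idea, assuming entries shaped like the dicts `ls()` returns; `glob_one_directory` and the sample listing are hypothetical, and this is not the actual pelicanfs or fsspec implementation:

```python
import fnmatch
import posixpath


def glob_one_directory(entries, pattern):
    """Filter ls()-style entries against a pattern whose wildcard is in the last path component."""
    directory, name_pattern = posixpath.split(pattern)
    hits = []
    for entry in entries:
        entry_dir, entry_name = posixpath.split(entry["name"])
        # Only names directly inside the pattern's directory are candidates.
        if entry_dir == directory and fnmatch.fnmatchcase(entry_name, name_pattern):
            hits.append(entry["name"])
    return sorted(hits)


# Hypothetical listing; the directory and pattern follow the example in this issue.
base = "/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-141"
listing = [
    {"name": f"{base}/H-H1_HOFT_C00-1411403776-4096.gwf", "size": None, "type": "file"},
    {"name": f"{base}/H-H1_HOFT_C00-1411407872-4096.gwf", "size": None, "type": "file"},
    {"name": f"{base}/H-H1_HOFT_C00-1411500000-4096.gwf", "size": None, "type": "file"},
]
print(glob_one_directory(listing, f"{base}/H-H1_HOFT_C00-141140*-4096.gwf"))
```

The filtering is cheap; the cost is in obtaining the full listing, so a directory with many entries can time out even when only a handful of names match.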

@duncanmmacleod
Contributor Author

@bbockelm, attempting glob() with other directories where the known results count is 'small' (<100) works and returns very quickly.

I look forward to this patch being applied to the LIGO origin(s).

@josh-willis

All CIT-operated origins were updated to xrootd-server-5.7.1-1.3.osg23 on 2024-10-29. At least for me, Duncan's sample code above now returns:

(pelicanfs) [joshua.willis@ldas-pcdev1 pelicanfs]$ ./duncan_test.py 
ls
[{'name': '/igwn/ligo/README', 'size': None, 'type': 'file'}, {'name': '/igwn/ligo/frames', 'size': None, 'type': 'file'}, {'name': '/igwn/ligo/cachetest', 'size': None, 'type': 'file'}]
glob
['/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-141/H-H1_HOFT_C00-1411403776-4096.gwf', '/igwn/ligo/frames/O4/hoft_C00/H1/H-H1_HOFT_C00-141/H-H1_HOFT_C00-1411407872-4096.gwf']

which I believe is the expected behavior, though @duncanmmacleod should confirm when he is able.

@duncanmmacleod
Contributor Author

The glob now returns fine for me.

Sidebar: is glob() supposed to be recursive? If I try:

glob = pelfs.glob(
    "/igwn/ligo/frames/O4/hoft_C00/H1/**/*.gwf",
    timeout=300,
)

I get 0 results.

@turetske
Collaborator

> The glob now returns fine for me.
>
> Sidebar: is glob() supposed to be recursive? If I try:
>
> glob = pelfs.glob(
>     "/igwn/ligo/frames/O4/hoft_C00/H1/**/*.gwf",
>     timeout=300,
> )
>
> I get 0 results.

Hmm. Let me look into it. I had a hard time getting glob to work quite right, so I wouldn't be surprised if it's still a bit buggy. You could also check what the results are with the fsspec HTTP filesystem, which is what pelicanfs should mimic.
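
As a reference point for what a recursive `**` pattern is conventionally expected to match, here is a small sketch using only Python's standard `glob` module on a temporary local tree. The directory layout and the helper names (`build_demo_tree`, `recursive_matches`) are hypothetical, and whether pelicanfs or the fsspec HTTP filesystem gives the same answer against a real origin is exactly the open question here:

```python
import glob
import os
import tempfile


def build_demo_tree(root):
    """Create a small hypothetical tree, loosely modelled on the paths in this thread."""
    for rel in (
        "H1/H-H1_HOFT_C00-141/a-4096.gwf",
        "H1/H-H1_HOFT_C00-142/b-4096.gwf",
        "H1/README",
    ):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w"):
            pass  # empty placeholder file


def recursive_matches(root, pattern):
    """Return sorted matches for `pattern` under `root`, as paths relative to `root`."""
    hits = glob.glob(os.path.join(root, pattern), recursive=True)
    return sorted(os.path.relpath(h, root).replace(os.sep, "/") for h in hits)


with tempfile.TemporaryDirectory() as tmp:
    build_demo_tree(tmp)
    # "**" descends into subdirectories; a single "*" does not.
    print(recursive_matches(tmp, "H1/**/*.gwf"))
    print(recursive_matches(tmp, "H1/*.gwf"))
```

With standard semantics the first pattern finds both `.gwf` files in the subdirectories and the second finds none, so 0 results for the `**` form would point at a bug rather than intended behavior.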
