Issue with xrdcp and xrootd python client while using xrdcl-authz-plugin at coffea-casa #374

Open
oshadura opened this issue Apr 1, 2023 · 5 comments

@oshadura
Member

oshadura commented Apr 1, 2023

A user reported that on the CMS coffea-casa AF, copying files with xrdcp fails with an "Operation is not implemented" error:

 xrdcp -f root://xcache//store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/ffNtuple_1.root /dev/null 
[0B/0B][100%][==================================================][0B/s]  
Run: [ERROR] Operation is not implemented:  (source)

There is also a segmentation fault when using the XRootD Python API:

>>> from XRootD import client
>>> xrd = client.FileSystem("root://xcache//store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/ffNtuple_1.root")
Segmentation fault (core dumped)
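
Editorial sketch: the XRootD Python bindings document `client.FileSystem` as taking a server URL, with the file path passed to the individual calls, whereas the session above passes a full file URL to the constructor. Whether that contributed to the crash is not established in this thread. The helper below, `split_root_url`, is a hypothetical name introduced here for illustration; it only splits a `root://` URL into the two pieces, and the shortened path is an example.

```python
from urllib.parse import urlsplit


def split_root_url(url):
    """Split 'root://host[:port]//path' into (endpoint, path).

    Hypothetical helper for illustration; not part of the XRootD bindings.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("root", "roots") or not parts.netloc:
        raise ValueError(f"not a root:// URL: {url!r}")
    endpoint = f"{parts.scheme}://{parts.netloc}"
    # XRootD URLs conventionally carry a double slash before the path;
    # normalise to a single leading slash for use in per-call arguments.
    path = "/" + parts.path.lstrip("/")
    return endpoint, path


endpoint, path = split_root_url("root://xcache//store/group/lpcmetx/SIDM/")

# With the bindings installed, the pieces would be used roughly like this:
#   from XRootD import client
#   fs = client.FileSystem(endpoint)
#   status, info = fs.stat(path)
```

Keeping the endpoint and path separate also makes it easy to point the same path at a different server when comparing behaviour.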

Current repository with plugin: https://github.com/jthiltges/xrdcl-authz-plugin/tree/xcache

cc @jthiltges

@jthiltges
Contributor

Hi Oksana, can you confirm that the plugin is being built from the xcache branch? Some of the strings suggest it's coming from master, at least for hub.opensciencegrid.org/coffea-casa/cc-ubuntu:2023.03.17.

@btcardwell
Contributor

Hi @oshadura and @jthiltges, I'm "the user" in Oksana's original post, and I thought it might be helpful to give a little context. The main functionality I'm looking for is to be able to list files on LPC EOS like I would with xrdfs root://cmseos.fnal.gov ls. Of course fixing this such that xrootd works in general would be great, but if you know another good way to do this from coffea-casa, I'd happily do that instead :)
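
Editorial sketch: for reference, the Python counterpart of `xrdfs ... ls` is `FileSystem.dirlist` from the XRootD bindings. The wrapper names `list_directory` and `open_filesystem` are hypothetical, and whether `root://cmseos.fnal.gov` is reachable with valid credentials from coffea-casa is an assumption this thread does not confirm. Going through the cache would presumably run into the same server-side behaviour discussed in this issue.

```python
def list_directory(fs, path):
    """Return the sorted entry names under `path`.

    `fs` is expected to behave like XRootD.client.FileSystem: dirlist(path)
    returns a (status, listing) pair, where status has `.ok` and `.message`
    and listing iterates over entries with a `.name` attribute.
    """
    status, listing = fs.dirlist(path)
    if not status.ok:
        raise OSError(f"dirlist failed for {path}: {status.message}")
    return sorted(entry.name for entry in listing)


def open_filesystem(endpoint):
    """Create a FileSystem client; the import is lazy so that list_directory
    can be exercised without the bindings installed."""
    from XRootD import client
    return client.FileSystem(endpoint)


# Assumed usage (endpoint reachability and credentials are not verified here):
#   fs = open_filesystem("root://cmseos.fnal.gov")
#   for name in list_directory(fs, "/store/group/lpcmetx/SIDM/"):
#       print(name)
```

Raising on a failed status surfaces server errors such as code 3005 instead of silently returning an empty listing.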

@oshadura
Member Author

Now that we have deployed @jthiltges's plugin, the Segmentation fault (core dumped) is fixed, but some functionality, such as directory listing with xrdfs, is still missing:


# the following command works on lxplus
$ xrdfs root://cmseos.fnal.gov// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/

# but the equivalent command hangs on coffea-casa
$ xrdfs root://xcache// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/

# even though the same command works on coffea-casa if I specify one specific file
$ xrdfs root://xcache// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/ffNtuple_1.root

@jthiltges
Contributor

Interesting result. This appears to be partly an issue with our xcache (running in Docker).

The xcache tells the client to contact 172.23.0.2, which is a private IP of the xcache container. And as expected, the client cannot connect.

$ xrdfs red-xcache1.unl.edu:1094 locate '*'
[::172.23.0.2]:1094 Server ReadWrite 
$ xrdfs xcache:1094 locate '*'
[::172.23.0.2]:1094 Server ReadWrite
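
Editorial sketch: the check above can be automated by parsing a `locate` line and testing whether the advertised address is private. The function names are hypothetical, and the bracketed `[address]:port` line format is assumed from the two lines shown. Note that `::172.23.0.2` is an IPv4-compatible IPv6 form, so the embedded IPv4 address has to be extracted before testing it.

```python
import ipaddress
import re

# Matches lines such as "[::172.23.0.2]:1094 Server ReadWrite".
LOCATE_LINE = re.compile(r"^\[(?P<host>[^\]]+)\]:(?P<port>\d+)\b")


def embedded_ipv4(host):
    """Return the IPv4 address carried by `host`, or None if there is none."""
    addr = ipaddress.ip_address(host)
    if addr.version == 4:
        return addr
    if addr.ipv4_mapped is not None:      # ::ffff:a.b.c.d
        return addr.ipv4_mapped
    value = int(addr)
    if 1 < value < 2**32:                 # ::a.b.c.d (excludes :: and ::1)
        return ipaddress.IPv4Address(value)
    return None


def advertises_private_address(locate_line):
    """True if the address on an `xrdfs locate` line is a private one."""
    match = LOCATE_LINE.match(locate_line.strip())
    if match is None:
        raise ValueError(f"unrecognised locate line: {locate_line!r}")
    host = match.group("host")
    ipv4 = embedded_ipv4(host)
    addr = ipv4 if ipv4 is not None else ipaddress.ip_address(host)
    return addr.is_private


print(advertises_private_address("[::172.23.0.2]:1094 Server ReadWrite"))
```

A private result here means clients outside the container network cannot follow the redirect, which matches the hang described earlier in the thread.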

For now, I switched the red-xcache container over to host-mode networking (network_mode: host), and the ls now fails differently:

$ xrdfs red-xcache1.unl.edu ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000
[ERROR] Server responded with an error: [3005] Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000; too many levels of symbolic links

On the xcache server side:

230411 17:53:14 543 scitokens_Access: Grant authorization based on scopes for operation=dir, path=/store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000
[2023-04-11 17:53:18.077326 +0000][Warning][XRootD            ] [[email protected]:1094] Redirect limit has been reached for message kXR_dirlist (path: /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000), the last known error is: [ERROR] Error response: no such file or directory
230411 17:53:18 543 ofs_opendir: cms-jovy.405:[email protected] Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000; too many levels of symbolic links
230411 17:53:18 543 cms-jovy.405:[email protected] Xrootd_Response: sending err 3005: Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000; too many levels of symbolic links
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ] Redirect trace-back:
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         0. Redirected from: root://cmsxrootd.fnal.gov:1094//store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000 to: root://cms-xrd-global.cern.ch:1094/
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         1. Redirected from: root://cms-xrd-global.cern.ch:1094/ to: root://cms-xrd-transit.cern.ch:1094/
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         2. Retrying: root://cms-xrd-global.cern.ch:1094/
...
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         29. Redirected from: root://cms-xrd-global.cern.ch:1094/ to: root://cms-xrd-transit.cern.ch:1094/
[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         30. Retrying: root://cms-xrd-global.cern.ch:1094/
230411 17:53:18 543 XrdTLS: cms-jovy.405:[email protected] TLS error rc=0 ec=6 (zero_return) errno=0.
230411 17:53:18 543 XrootdXeq: cms-jovy.405:[email protected] disc 0:00:04
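
Editorial sketch: the redirect trace-back above bounces between the same two redirectors until the redirect limit is reached. A small parser can make such loops easier to spot in long logs. `repeated_redirects` is a hypothetical helper, and the regular expression assumes the "Redirected from: ... to: ..." wording seen in these lines.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the "Redirected from: <url> to: <url>" part of a trace-back line.
HOP = re.compile(r"Redirected from: (\S+) to: (\S+)")


def repeated_redirects(log_lines):
    """Count (source host, target host) hops and return those seen more than once."""
    hops = Counter()
    for line in log_lines:
        match = HOP.search(line)
        if match:
            src = urlsplit(match.group(1)).hostname
            dst = urlsplit(match.group(2)).hostname
            hops[(src, dst)] += 1
    return {hop: count for hop, count in hops.items() if count > 1}


# Two of the trace-back lines from the log above, used as sample input.
sample = [
    "[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         1. Redirected from: root://cms-xrd-global.cern.ch:1094/ to: root://cms-xrd-transit.cern.ch:1094/",
    "[2023-04-11 17:53:18.077682 +0000][Warning][XRootD            ]         29. Redirected from: root://cms-xrd-global.cern.ch:1094/ to: root://cms-xrd-transit.cern.ch:1094/",
]
print(repeated_redirects(sample))
```

Any hop reported more than once is a candidate loop. It does not by itself explain why the redirector cannot resolve the directory.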

I suspect that listing directory contents will be painfully slow if the request doesn't go directly to the target server/cluster. Otherwise, I'm guessing it will result in a search of the entire hierarchy.

@oshadura
Member Author

Update: the second query now ends with a "too many levels of symbolic links" error:

cms-jovyan@jupyter-oksana-2eshadura-40cern-2ech:~$ xrdfs root://xcache// ls /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/
[ERROR] Server responded with an error: [3005] Unable to open directory /store/group/lpcmetx/SIDM/ffNtupleV4/2018/SIDM_XXTo2ATo2Mu2E_mXX-100_mA-1p2_ctau-9p6_TuneCP5_13TeV-madgraph-pythia8/RunIIAutumn18DRPremix-102X_upgrade2018_realistic_v15-v1/210326_161703/0000/; too many levels of symbolic links
