-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More consistent platform priority #1302
base: master
Are you sure you want to change the base?
Conversation
This fixes an inconsistency observed in GRID jobs when loading a software package "A". The bug was: - The GRID runtime determines available platforms (say el8 and el7) for package A - then launches an el8 apptainer container - alienv executed inside el8 container loads el7 software, inconsistent with the choice of el8 container ... and despite the fact that el8 software A is actually available. Bug fixed by slightly adjusting the platform priority: We always search the current platform first of all before going elsewhere.
Goes back to an observation made in ticket https://its.cern.ch/jira/browse/O2-4804. |
yes , although this is just a copy of the cvmfs files, so the changes should be done there (as already done for el9 actually). |
Well. Isn't this repo the authoritative source for the script on cvmfs? I thought we merely publish it there... directly from here. (Anything else appears be error-prone.) In any case, I don't know how to edit files on cvmfs. |
agreed. There are historical reasons for this and I guess it is time to fix them once I am back. |
@@ -67,7 +67,7 @@ case $distro_name in | |||
;; | |||
8*) | |||
distro_xrelease=8.x | |||
platform=el7 | |||
platform=el8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is now on CVMFS.
# The above priority is a general search priority. However, given a platform, we should actually | ||
# start searching for software matching this precise platform and only later fall back to other | ||
# platforms. This prevents loading el9 or el7 software, when I am actually on el8 (and el8 software is available). | ||
PLATFORM_PRIORITY="${platform}-$uname_m ${PLATFORM_PRIORITY//${platform}-$uname_m/}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can this happen? We should never have O2 / O2Physics tags for different architectures with the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand your question (but I suppose that we might in fact have the same name on different arches). This line fixes this problem mentioned above: "alienv executed inside el8 container loads el7 software, inconsistent with the choice of el8 container ... and despite the fact that el8 software A is actually available."
The fix is simply to look first of all if we have software for el8 ... instead of loading the el7 one simply because it is higher up the list. Refining the search list seems a reasonable thing to do and I don't see what might speak against this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reproducer:
ssh lxplus8.cern.ch
/cvmfs/alice.cern.ch/bin/alienv enter O2sim::async-2023-pp-apass3-20240320.1-1
Hello, everyone. |
Hi @geonmo, This might be misleading since AliPhysics / AliROOT packages are only build for EL7, while O2 packages are only available for EL9. The situation where the same package would be available on two different versions is in practice not there, at least not in your case. Can you share more logs of Using either EL7 or EL9 container images from CVMFS, the same ones that are used in Grid jobs, works for me. For example:
Cheers, .costin |
@costing @ktf : Regardless if the situation applies to @geonmo, the original bug addressed here indeed still remains. The reproducer is package The commit provided in this PR fixes this problem and I think we should either deploy it or finally close it. |
Can we do this after Christmas? I'd rather not change behaviour before the holidays. @singiamtel also has on his work plan to make the version in repository as the source of truth (instead of the version in CVMFS). We decided to move after Christmas that change as well. |
@costing After checking it out, it looks like I was using alienv incorrectly. I was under the impression that it would work without a version in the EL9 native state, so I assumed it would work without a version in the EL7 apptainer state. It seems to work fine with the version specified. I was under the assumption that the OS version would select the appropriate AliPhysics version. Thank you for your help. [geonmo@lxplus961 ~]$alienv enter VO_ALICE@AliPhysics
[AliPhysics] ~ > exit
[geonmo@lxplus961 ~]$apptainer shell -e -B /cvmfs /cvmfs/singularity.opensciencegrid.org/opensciencegrid/osgvo-el7:latest
Singularity :~> source /cvmfs/alice.cern.ch/etc/login.sh
Singularity :~> alienv enter VO_ALICE@AliPhysics
Python/v3.9.16-8(55):ERROR:102: Tcl command execution failed: if { [module-info mode load] } {
setenv SSL_CERT_FILE [exec $PKG_ROOT/bin/python3 -c "import certifi; print(certifi.where())"]
}
...
env: /lib64/libc.so.6: version `GLIBC_2.32' not found (required by /cvmfs/alice.cern.ch/el9-x86_64/Packages/jemalloc/v5.1.0-2/lib/libjemalloc.so)
env: /lib64/libc.so.6: version `GLIBC_2.34' not found (required by /cvmfs/alice.cern.ch/el9-x86_64/Packages/jemalloc/v5.1.0-2/lib/libjemalloc.so)
''' |
Note that this patch allows EL7 environments to use it without entering a version name. It's a bit of a convenience. |
Normally you should be able to run older binaries (or build on older platforms) on newer platforms. On lxplus9 this works for example:
Only in case you want to run EL9 binaries on an EL7 host you would need a container, otherwise they are not strictly needed. And I suggest using our containers instead of the OSG ones as they were prepared with all the necessary dependencies for our packages. Cheers, .costin |
That is not patch-related either, if no version is specified the last package (last I know in lexicographic order, which might not be the most recent one, depending on the naming convention) is picked up. |
@costing What I wanted was to load EL7 binaries in an EL7 environment. With that alienv, EL9 will be loaded first, even if I want to use EL7 packages in an EL7 environment. For AliPhysics and other packages, the tag names of the EL7 and EL9 versions are separate, so there shouldn't be a problem, but for other packages (I haven't tested it), there may be a problem if the tag names are the same. For example, if you simply enter alienv enter python in an EL7 environment, it will load python for EL9 and cause a problem. Therefore, it seems that it is necessary to set the EL7 binary to be loaded first in an EL7 environment. |
This fixes an inconsistency observed in GRID jobs when loading a software package "A".
The bug was:
Bug fixed by slightly adjusting the platform priority: We always search the current platform first of all before going elsewhere.