Merge Upstream #57

Merged: 5 commits merged on Sep 26, 2023

rbeucher
Member

No description provided.

@rbeucher
Member Author

Hi @dsroberts

I have merged your changes and I am trying to deploy a new version.

Tests are timing out, so I might need to increase the walltime, but I also get this in the error log:

No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      gadi-cpu-clx-1614
  Framework: pml

@dsroberts

@rbeucher haven't seen this failure mode before. Can you put the squashfs somewhere I can see it so that I can have a look?

@rbeucher
Member Author

I think it's in the staging folder in the admin folder of the xp65 project.
You need to be part of xp65_w to have access. I can give you access if you send me a request.
Thanks a lot @dsroberts !

@dsroberts

@rbeucher Instead of joining xp65_w where I can potentially break things, could you copy it into some place I can read it? Send me the path on slack once the copy is complete.

@rbeucher
Member Author

Hi @dsroberts ,

Did you get a chance to look into this? I tried again today and got the same problem.

R

@dsroberts

Hi @rbeucher

Sorry for the delay, between the ACCESS workshop, kids sick at home and other projects this slipped off my radar.

I got back to it today and I've figured out the issue. UCX is being brought into your conda environments, which conflicts with the system installation of UCX used by OpenMPI. Remember that in these environments the OpenMPI installation from conda-forge is replaced by the system OpenMPI installation. mamba repoquery suggests that pyarrow version 12 is bringing UCX in. analysis3-unstable is currently using pyarrow 11, which is why we haven't seen this. We also don't have esmvalcore installed, and the import that is causing this failure is coming from somewhere inside it.

Our containerised environment does have UCX installed, but we pin the version to the latest available on Gadi (1.14.0) and add it to the replace_from_apps array in install_config.sh. That may work for you; if it doesn't, the other option is to pin libarrow<12 in environment.yml.
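A minimal sketch of the two pins described above, as they might appear in the dependencies list of environment.yml. Only `ucx` 1.14.0 and `libarrow<12` come from this thread; the surrounding layout is an assumption. The two options are alternatives (enable one), and option 1 still needs UCX added to the replace_from_apps array in install_config.sh, which is not shown here.

```yaml
# Sketch of an environment.yml fragment (layout assumed; only the pins come from the thread).
channels:
  - conda-forge
dependencies:
  # Option 1: keep UCX but pin it to the version installed on Gadi (1.14.0),
  # so the conda copy can be swapped for the system one via replace_from_apps.
  - ucx==1.14.0
  # Option 2 (alternative): stay below libarrow 12, the version the thread
  # identifies as pulling UCX in, so UCX never lands in the environment.
  # - libarrow<12
```

Option 2 is the smaller change, but it holds back libarrow/pyarrow upgrades for as long as the pin stays in place.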

Dale

@rbeucher
Member Author

Thanks Dale. I'm gonna try that

@rbeucher
Member Author

Just to clarify: you get UCX from ucx-py, right? And then you replace UCX with the system one using the replace_from_apps array?
ucx-py (and thus UCX) does not seem to be pinned.

@dsroberts

Just to clarify, you get UCX from ucx-py? right? And then replace UCX by the system one using the replace_from_apps array? ucx-py (and thus UCX) does not seem to be pinned.

Yes. However, we host our own version on the coecms conda channel. The package metadata for that has ucx pinned to 1.14.0. The ucx-py available through conda-forge is ancient; I'm not sure how you'll go with that.

UCX is pinned in that package to match the version on Gadi.
@rbeucher
Member Author

Thanks a lot @dsroberts. It works now.

rbeucher merged commit 8ed9ad1 into main on Sep 26, 2023
2 checks passed
rbeucher pushed a commit that referenced this pull request Jul 10, 2024
Move bind dirs to launch config file, construct bind string on the fl…