Merge Upstream #57
Hi @dsroberts, I have merged your changes and I am trying to deploy a new version. Tests are timing out, so I might need to increase the walltime, but I also get this in the error log:
@rbeucher I haven't seen this failure mode before. Can you put the squashfs somewhere I can see it so that I can have a look?
I think it's in the staging folder in the admin folder of the xp65 project.
@rbeucher Instead of joining xp65_w, where I can potentially break things, could you copy it to some place I can read it? Send me the path on Slack once the copy is complete.
Hi @dsroberts, did you get a chance to look into this? I tried again today and I got the same problem. R
Hi @rbeucher, sorry for the delay. Between the ACCESS workshop, kids sick at home and other projects, this slipped off my radar. I got back to it today and I've figured out the issue. UCX is being brought in to your conda environments, and it conflicts with the system installation of UCX that OpenMPI uses. Remember that in these environments the OpenMPI installation from conda-forge is replaced by the system OpenMPI installation. Our containerised environment does have
Dale
Thanks Dale. I'm going to try that.
Just to clarify, you get UCX from ucx-py, right? And then you replace UCX with the system one using the replace_from_apps array?
Yes, however, we host our own version on the
UCX is pinned in that package to match the version on Gadi.
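For anyone who wants to check whether an environment has the conflict described above, here is a minimal diagnostic sketch. It is not the fix used in this thread and not part of this repository. It only lists UCX shared libraries found under an environment's `lib` directory, which would indicate that conda pulled in its own UCX alongside the system one. The function name `find_bundled_ucx` and the choice to look only in `<prefix>/lib` are assumptions made for illustration.

```python
import fnmatch
import os

# Core UCX shared libraries (protocol, services, transport and memory layers).
UCX_PATTERNS = ("libucp.so*", "libucs.so*", "libuct.so*", "libucm.so*")


def find_bundled_ucx(env_prefix):
    """Return sorted paths of UCX shared libraries found in env_prefix/lib.

    Hypothetical helper: an empty list means no UCX libraries were found
    directly under that directory; it does not prove none exist elsewhere.
    """
    lib_dir = os.path.join(env_prefix, "lib")
    if not os.path.isdir(lib_dir):
        return []
    hits = []
    for name in os.listdir(lib_dir):
        # Match both the unversioned name and versioned names (libucp.so.0 ...).
        if any(fnmatch.fnmatch(name, pattern) for pattern in UCX_PATTERNS):
            hits.append(os.path.join(lib_dir, name))
    return sorted(hits)
```

Calling `find_bundled_ucx(os.environ["CONDA_PREFIX"])` inside an activated environment would print nothing useful by itself, so loop over the result and print each path. Any hits are candidates to compare against the system UCX version.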
Thanks a lot @dsroberts. It works now.
Move bind dirs to launch config file, construct bind string on the fl…
No description provided.