Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New strategy for defining architecture in distributed tests #3880

Open
wants to merge 39 commits into
base: main
Choose a base branch
from

Conversation

simone-silvestri
Copy link
Collaborator

@simone-silvestri simone-silvestri commented Oct 29, 2024

closes #3897

@glwagner glwagner marked this pull request as ready for review October 29, 2024 16:01
@glwagner
Copy link
Member

glwagner commented Nov 5, 2024

Adding the Manifest won't help for the tests because the test environment generates a new manifest every time, I learned

@simone-silvestri
Copy link
Collaborator Author

Ok, good to know. I am trying different approaches, but everything seems to fail; at least I can remove the manifest.

@simone-silvestri
Copy link
Collaborator Author

Finally, this works. It would need approval to get the tests back online.

Copy link
Member

@ali-ramadhan ali-ramadhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

commands:
- "srun julia -O0 --color=yes --project -e 'using Pkg; Pkg.test()'"
agents:
slurm_mem: 120G
slurm_mem: 8G
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?


[targets]
test = ["DataDeps", "Enzyme", "SafeTestsets", "Test", "TimesDates"]
test = ["DataDeps", "SafeTestsets", "Test", "Enzyme", "MPIPreferences", "TimesDates"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this the crucial part?

Copy link
Member

@glwagner glwagner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. For future generations, can you please write a little bit about what you tried and what ended up working? I can't tell if all the changes are necessary, though the end result is fairly clean. Mostly I am wondering about slurm_mem. I'm also curious why we cannot call precompile_runtime inside runtests.jl and it is necessary to call it before Pkg.test(). This has implications for the CI of other packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The MPI we use in the distributed tests is not CUDA-aware
3 participants