Problem with MDS crashing fix #2

Open
andybroth opened this issue Jun 4, 2024 · 2 comments
@andybroth
Contributor

Because MDS sometimes crashes, a fix was added that checks for an MDS crash and moves on to the next shot. When this happens, the job is relaunched, but the relaunch uses the default slurm directory option of submit_single_run() in launch_parallel_jobs_function.py. That default is currently not updated to match the slurm directory you set when running the launch_parallel_jobs.py script.

As a result, the relaunch itself crashes whenever MDS fails, because the default slurm directory doesn't exist.
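
A minimal sketch of the failure mode and one possible way around it, assuming submit_single_run() takes a slurm_dir keyword with a hard-coded default. The signature, the default path, and the relaunch_after_mds_crash() helper are hypothetical stand-ins for illustration, not the actual code in launch_parallel_jobs_function.py:

```python
import os

# Hypothetical default; the real default in the repository is not shown in this issue.
DEFAULT_SLURM_DIR = "/nonexistent/default_slurm_dir"


def submit_single_run(shot, slurm_dir=DEFAULT_SLURM_DIR):
    """Stand-in for the real function: writes a per-shot file into slurm_dir."""
    path = os.path.join(slurm_dir, f"{shot}.yaml")
    with open(path, "w") as f:  # raises FileNotFoundError if slurm_dir is missing
        f.write(f"shot: {shot}\n")
    return path


def relaunch_after_mds_crash(shot, slurm_dir=None):
    """Hypothetical relaunch path taken after an MDS crash is detected."""
    if slurm_dir is None:
        # Behaviour described in this issue: the relaunch falls back to the default.
        return submit_single_run(shot)
    # Possible fix: forward the directory chosen in launch_parallel_jobs.py.
    return submit_single_run(shot, slurm_dir=slurm_dir)
```

Forwarding the directory explicitly on the relaunch would keep the default from ever being used, so it would no longer need to be edited by hand.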

@andybroth andybroth added the bug Something isn't working label Jun 4, 2024
@andybroth
Contributor Author

Workaround for now: update the default slurm directory in launch_parallel_jobs_function.py to whatever you set in launch_parallel_jobs.py.

@andybroth
Contributor Author

andybroth commented Jun 6, 2024

shutil.copyfile(baseconfig_filename, os.path.join(slurm_dir,f'.yaml'))

Line 26 in launch_parallel_jobs_function.py

I think this line is the issue. Maybe it can just be commented out, @HiroFarre?
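
If the copy turns out to be needed, an alternative to commenting it out might be to create the directory before copying. This is only a sketch: copy_base_config() and its config_name parameter are hypothetical, and the destination file name is a placeholder because the real one is not visible in the snippet above.

```python
import os
import shutil


def copy_base_config(baseconfig_filename, slurm_dir, config_name="base_config"):
    """Copy the base config into slurm_dir, creating the directory if needed.

    config_name is a hypothetical placeholder for the destination file stem.
    """
    # exist_ok=True makes this safe whether or not the directory already exists.
    os.makedirs(slurm_dir, exist_ok=True)
    dest = os.path.join(slurm_dir, f"{config_name}.yaml")
    shutil.copyfile(baseconfig_filename, dest)
    return dest
```

This keeps the copy but removes the dependence on the directory already existing when the relaunch happens.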
