-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Urania machine and validate flag to scheduler #6184
base: develop
Are you sure you want to change the base?
Add Urania machine and validate flag to scheduler #6184
Conversation
2700b51
to
5a846f9
Compare
5a846f9
to
9ab841b
Compare
6f41978
to
455043b
Compare
support/Python/Schedule.py
Outdated
@@ -854,6 +857,11 @@ def scheduler_options(f): | |||
"You may also want to use '--clean-output'." | |||
), | |||
) | |||
@click.option( | |||
"--no-validate", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this --validate/--no-validate
, and set default=True
. Then the variable is positive, which is much easier to understand (you have to change the variable name no_validate
to validate
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the name.
{{ super() -}} | ||
#SBATCH -D ./ | ||
#SBATCH --nodes {{ num_nodes | default(1) }} | ||
#SBATCH --ntasks-per-node=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have found that 2 tasks per node work well on other clusters of this size, maybe even 3. I see you currently do 1 task but 2 comm threads. Have you tried 2 tasks with 1 comm thread each? Not sure if there's a difference @knelli2 @nilsdeppe do you know?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically each task = an MPI rank = a comm core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really experimented with this. So I don't know.
support/SubmitScripts/Urania.sh
Outdated
#SBATCH --cpus-per-task=72 | ||
#SBATCH --mem=240000 | ||
#SBATCH -t {{ time_limit | default("1-00:00:00") }} | ||
#SBATCH -p {{ queue | default("p.debug") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default in the yaml is p.urania
. Make it the same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed. Had left it when testing.
support/SubmitScripts/Urania.sh
Outdated
|
||
{% block head %} | ||
{{ super() -}} | ||
#SBATCH -D ./ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't needed I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
support/SubmitScripts/Urania.sh
Outdated
module load gsl/1.16 | ||
module load cmake/3.26 | ||
module load hdf5-serial/1.12.2 | ||
module load anaconda/3/2021.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the other clusters we have a file like support/Environments/caltech_hpc_gcc.sh
that loads the modules. That enables you to load the modules to configure a build directory. Then you don't need to load the modules in the submit script, or rather, you would load the modules before submitting a job. I also load the modules in my ~/.bashrc
sometimes (that's personal preference).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an environment file. I tried to remove the module loads from the submit script but it printed there where no modules loaded even if I had them loaded on my terminal when submitting. So I put them back in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really don't want to duplicate the list of modules in these two places. Replace it here with this:
source ${SPECTRE_HOME}/support/Environments/urania.sh
spectre_load_modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
eddd75c
to
3d4f862
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can squash
support/Environments/urania.sh
Outdated
source /u/guilara/repos/spack/share/spack/setup-env.sh | ||
spack env activate env3_spectre_impi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that you can have Spack generate a load file for you so you don't have to source setup-env.sh
(which can take a long time): https://spack.readthedocs.io/en/latest/environments.html#loading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
support/Environments/urania.sh
Outdated
source /u/guilara/repos/spack/share/spack/setup-env.sh | ||
spack env activate env3_spectre_impi | ||
# Load python environment | ||
source $SPECTRE_HOME/env/bin/activate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you put this Python environment in some fixed location so others can use it, e.g. /u/guilara/envs/spectre_python
. Otherwise everyone has to create a new Py env in every spectre checkout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created this
spectre_setup_charm_paths() { | ||
# Define Charm paths | ||
export CHARM_ROOT=/u/guilara/charm_impi_2/mpi-linux-x86_64-smp | ||
export PATH=$PATH:/u/guilara/charm_impi_2/mpi-linux-x86_64-smp/bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest you create a module file like /u/guilara/modules/charm/7.0.0-impi-2
that sets these paths. Then you can module load
it below and don't have to call this extra spectre_setup_charm_paths
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I want to tackle this right now. Installing charm was a bit tricky
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then better set these paths in spectre_load_modules
, or else you always have to call this extra function?
support/Environments/urania.sh
Outdated
-D MEMORY_ALLOCATOR=JEMALLOC \ | ||
-D BUILD_PYTHON_BINDINGS=ON \ | ||
-D MACHINE=Urania \ | ||
-D Python_EXECUTABLE=${SPECTRE_HOME}/env/bin/python \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete this line since the Py env is sourced, so cmake should find the correct python
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
-D Python_EXECUTABLE=${SPECTRE_HOME}/env/bin/python \ | ||
-D Catch2_DIR=/u/guilara/repos/Catch2/install_dir/lib64/cmake/Catch2 \ | ||
-D MPI_C_COMPILER=/mpcdf/soft/SLE_15/packages/skylake\ | ||
/impi/gcc_11-11.2.0/2021.7.1/bin/mpigcc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So cmake doesn't find these without help even though the impi/2021.7
module is loaded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tried removing them. I must have added them because it couldn't find it.
-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS)
-- Could NOT find MPI (missing: MPI_C_FOUND C)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is MPI needed? If everything works without explicitly linking MPI then just remove these lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we do need it, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we don't link to it explicitly, only charm is built with it.
support/Python/Schedule.py
Outdated
validate_input_file( | ||
input_file_path.resolve(), executable=executable, work_dir=run_dir | ||
) | ||
if validate or validate is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if validate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
support/Python/Schedule.py
Outdated
@@ -861,6 +866,11 @@ def scheduler_options(f): | |||
"You may also want to use '--clean-output'." | |||
), | |||
) | |||
@click.option( | |||
"--validate/--no-validate", | |||
default=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
support/SubmitScripts/Urania.sh
Outdated
{% endblock %} | ||
|
||
{% block charm_ppn %} | ||
# One thread for communication |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One Two threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected
support/SubmitScripts/Urania.sh
Outdated
#SBATCH --ntasks-per-node=1 | ||
#SBATCH --ntasks-per-core=1 | ||
#SBATCH --cpus-per-task=72 | ||
#SBATCH --mem=240000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--mem=240G
(is this needed?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
support/SubmitScripts/Urania.sh
Outdated
module load gsl/1.16 | ||
module load cmake/3.26 | ||
module load hdf5-serial/1.12.2 | ||
module load anaconda/3/2021.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You really don't want to duplicate the list of modules in these two places. Replace it here with this:
source ${SPECTRE_HOME}/support/Environments/urania.sh
spectre_load_modules
3d4f862
to
4121fc5
Compare
Fix validate flag Fix
2c0688e
to
1438c07
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good for now. If you want to change anything later, you can.
Please squash this all into two commits (after @nilsvu has looked at this again). 1 for the new validate option in python, and 2 for adding the Urania env, machine, and submit files.
source ${SPECTRE_HOME}/support/Environments/urania.sh | ||
spectre_load_modules | ||
spectre_setup_charm_paths |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] Might also want to module list
after this so you can see what's loaded
spectre_setup_charm_paths() { | ||
# Define Charm paths | ||
export CHARM_ROOT=/u/guilara/charm_impi_2/mpi-linux-x86_64-smp | ||
export PATH=$PATH:/u/guilara/charm_impi_2/mpi-linux-x86_64-smp/bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then better set these paths in spectre_load_modules
, or else you always have to call this extra function?
module load anaconda/3/2021.11 | ||
module load paraview/5.10 | ||
# Load Spack environment | ||
source /u/guilara/repos/spack/share/spack/setup-env.sh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can delete this line
-D Python_EXECUTABLE=${SPECTRE_HOME}/env/bin/python \ | ||
-D Catch2_DIR=/u/guilara/repos/Catch2/install_dir/lib64/cmake/Catch2 \ | ||
-D MPI_C_COMPILER=/mpcdf/soft/SLE_15/packages/skylake\ | ||
/impi/gcc_11-11.2.0/2021.7.1/bin/mpigcc \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No we don't link to it explicitly, only charm is built with it.
source /urania/u/guilara/repos/spack/var/spack/environments\ | ||
/env3_spectre_impi/loads | ||
# Load python environment | ||
source /u/guilara/envs/spectre_env |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing "bin/activate" ?
-D MEMORY_ALLOCATOR=JEMALLOC \ | ||
-D BUILD_PYTHON_BINDINGS=ON \ | ||
-D MACHINE=Urania \ | ||
-D Catch2_DIR=/u/guilara/repos/Catch2/install_dir/lib64/cmake/Catch2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Backslash at end of line missing
Proposed changes
Add submit script information about Urania. Add no-validate flag to scheduler to prevent it from attempting to run mpi on the head node ( #6183).
Upgrade instructions
Code review checklist
make doc
to generate the documentation locally intoBUILD_DIR/docs/html
.Then open
index.html
.code review guide.
bugfix
ornew feature
if appropriate.Further comments