Add Urania machine and validate flag to scheduler #6184

Open · wants to merge 5 commits into develop from PRs-Urania-SubmitScript
Conversation

guilara (Contributor) commented Jul 30, 2024

Proposed changes

Add submit script information for Urania. Add a no-validate flag to the scheduler to prevent it from attempting to run MPI on the head node (#6183).

Upgrade instructions

Code review checklist

  • The code is documented and the documentation renders correctly. Run
    make doc to generate the documentation locally into BUILD_DIR/docs/html.
    Then open index.html.
  • The code follows the stylistic and code quality guidelines listed in the
    code review guide.
  • The PR lists upgrade instructions and is labeled bugfix or
    new feature if appropriate.

Further comments

@guilara force-pushed the PRs-Urania-SubmitScript branch from 2700b51 to 5a846f9 on July 30, 2024 09:02
@guilara requested a review from nilsvu on July 30, 2024 09:09
@guilara marked this pull request as draft on July 30, 2024 09:37
@guilara force-pushed the PRs-Urania-SubmitScript branch from 5a846f9 to 9ab841b on July 30, 2024 09:44
@guilara marked this pull request as ready for review on July 30, 2024 09:45
@guilara force-pushed the PRs-Urania-SubmitScript branch 3 times, most recently from 6f41978 to 455043b, on July 30, 2024 12:13
@@ -854,6 +857,11 @@ def scheduler_options(f):
"You may also want to use '--clean-output'."
),
)
@click.option(
"--no-validate",
Member:

Make this --validate/--no-validate, and set default=True. Then the variable is positive, which is much easier to understand (you have to change the variable name no_validate to validate)

guilara (Author):

Changed the name.

{{ super() -}}
#SBATCH -D ./
#SBATCH --nodes {{ num_nodes | default(1) }}
#SBATCH --ntasks-per-node=1
Member:

We have found that 2 tasks per node work well on other clusters of this size, maybe even 3. I see you currently use 1 task but 2 comm threads. Have you tried 2 tasks with 1 comm thread each? Not sure if there's a difference. @knelli2 @nilsdeppe, do you know?

Contributor:

Typically each task = an MPI rank = a comm core

guilara (Author):

I haven't really experimented with this, so I don't know.
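The layout being asked about can be sketched as SBATCH directives; this is a hypothetical, untested variant for a 72-core Urania node, not what this PR currently configures:

```shell
# Hypothetical alternative from this discussion (untested on Urania):
# two tasks (Charm++ processes) per 72-core node, one comm thread each.
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=36
# With Charm++ SMP, each task would then run 35 worker threads plus
# 1 communication thread (e.g. +ppn 35; exact flags depend on the launcher).
```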

#SBATCH --cpus-per-task=72
#SBATCH --mem=240000
#SBATCH -t {{ time_limit | default("1-00:00:00") }}
#SBATCH -p {{ queue | default("p.debug") }}
Member:

The default in the yaml is p.urania. Make it the same here?

guilara (Author):

Changed. I had left it that way while testing.


{% block head %}
{{ super() -}}
#SBATCH -D ./
Member:

This isn't needed I think

guilara (Author):

Removed

support/SubmitScripts/Urania.sh (thread resolved)
module load gsl/1.16
module load cmake/3.26
module load hdf5-serial/1.12.2
module load anaconda/3/2021.11
Member:

For the other clusters we have a file like support/Environments/caltech_hpc_gcc.sh that loads the modules. That enables you to load the modules to configure a build directory. Then you don't need to load the modules in the submit script, or rather, you would load the modules before submitting a job. I also load the modules in my ~/.bashrc sometimes (that's personal preference).

guilara (Author):

I added an environment file. I tried to remove the module loads from the submit script, but it printed that there were no modules loaded, even though I had them loaded in my terminal when submitting. So I put them back in.

nilsvu (Member) commented Jul 31, 2024:

You really don't want to duplicate the list of modules in these two places. Replace it here with this:

source ${SPECTRE_HOME}/support/Environments/urania.sh
spectre_load_modules

guilara (Author):

Done

@guilara force-pushed the PRs-Urania-SubmitScript branch 3 times, most recently from eddd75c to 3d4f862, on July 31, 2024 15:45
@guilara requested a review from nilsvu on July 31, 2024 15:48
@guilara changed the title from "Add Urania machine and no-validate flag to scheduler" to "Add Urania machine and validate flag to scheduler" on Jul 31, 2024
nilsvu (Member) left a comment:

You can squash

Comment on lines 22 to 23
source /u/guilara/repos/spack/share/spack/setup-env.sh
spack env activate env3_spectre_impi
Member:

Note that you can have Spack generate a load file for you so you don't have to source setup-env.sh (which can take a long time): https://spack.readthedocs.io/en/latest/environments.html#loading
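A sketch of that workflow, assuming the env name and Spack prefix used elsewhere in this PR (the exact CLI may vary between Spack versions):

```shell
# Generate a plain "loads" file once, so jobs don't need to source the
# slow setup-env.sh; command details may differ between Spack versions:
spack env loads env3_spectre_impi
# Afterwards, environment/submit scripts only need to source the result:
source /u/guilara/repos/spack/var/spack/environments/env3_spectre_impi/loads
```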

guilara (Author):

Changed.

source /u/guilara/repos/spack/share/spack/setup-env.sh
spack env activate env3_spectre_impi
# Load python environment
source $SPECTRE_HOME/env/bin/activate
Member:

I suggest you put this Python environment in some fixed location so others can use it, e.g. /u/guilara/envs/spectre_python. Otherwise everyone has to create a new Py env in every spectre checkout.
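A minimal sketch of that setup, using the reviewer's example path; the requirements-file location is an assumption:

```shell
# One-time creation of a shared Python venv at a fixed location
# (path per the reviewer's suggestion):
python3 -m venv /u/guilara/envs/spectre_python
# Install SpECTRE's Python dependencies (requirements path is an assumption):
/u/guilara/envs/spectre_python/bin/pip install \
    -r ${SPECTRE_HOME}/support/Python/requirements.txt
# Every checkout can then activate the same env:
source /u/guilara/envs/spectre_python/bin/activate
```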

guilara (Author):

I created this

spectre_setup_charm_paths() {
# Define Charm paths
export CHARM_ROOT=/u/guilara/charm_impi_2/mpi-linux-x86_64-smp
export PATH=$PATH:/u/guilara/charm_impi_2/mpi-linux-x86_64-smp/bin
Member:

I suggest you create a module file like /u/guilara/modules/charm/7.0.0-impi-2 that sets these paths. Then you can module load it below and don't have to call this extra spectre_setup_charm_paths function.

guilara (Author):

I don't think I want to tackle this right now. Installing Charm++ was a bit tricky.

Member:

Then better set these paths in spectre_load_modules, or else you always have to call this extra function?
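Folding the paths into spectre_load_modules could look like this (module list and CHARM_ROOT value taken from this PR; treat it as a sketch):

```shell
spectre_load_modules() {
    module load gsl/1.16
    module load cmake/3.26
    module load hdf5-serial/1.12.2
    module load anaconda/3/2021.11
    # Charm++ paths set here so no separate spectre_setup_charm_paths
    # call is needed:
    export CHARM_ROOT=/u/guilara/charm_impi_2/mpi-linux-x86_64-smp
    export PATH=$PATH:$CHARM_ROOT/bin
}
```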

-D MEMORY_ALLOCATOR=JEMALLOC \
-D BUILD_PYTHON_BINDINGS=ON \
-D MACHINE=Urania \
-D Python_EXECUTABLE=${SPECTRE_HOME}/env/bin/python \
Member:

You can delete this line since the Py env is sourced, so cmake should find the correct python.

guilara (Author):

Removed

-D Python_EXECUTABLE=${SPECTRE_HOME}/env/bin/python \
-D Catch2_DIR=/u/guilara/repos/Catch2/install_dir/lib64/cmake/Catch2 \
-D MPI_C_COMPILER=/mpcdf/soft/SLE_15/packages/skylake\
/impi/gcc_11-11.2.0/2021.7.1/bin/mpigcc \
Member:

So cmake doesn't find these without help even though the impi/2021.7 module is loaded?

guilara (Author):

I just tried removing them. I must have added them because it couldn't find it.

-- Could NOT find MPI_C (missing: MPI_C_LIB_NAMES MPI_C_HEADER_DIR MPI_C_WORKS) 
-- Could NOT find MPI (missing: MPI_C_FOUND C)

Member:

Is MPI needed? If everything works without explicitly linking MPI then just remove these lines.

guilara (Author):

I think we do need it, no?

Member:

No, we don't link to it explicitly; only Charm++ is built with it.

validate_input_file(
input_file_path.resolve(), executable=executable, work_dir=run_dir
)
if validate or validate is None:
Member:

if validate

guilara (Author):

Changed

@@ -861,6 +866,11 @@ def scheduler_options(f):
"You may also want to use '--clean-output'."
),
)
@click.option(
"--validate/--no-validate",
default=None,
Member:

default=True

guilara (Author):

Changed

{% endblock %}

{% block charm_ppn %}
# One thread for communication
Member:

"One thread" should be "Two threads".

guilara (Author):

Corrected

#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-core=1
#SBATCH --cpus-per-task=72
#SBATCH --mem=240000
Member:

--mem=240G (is this needed?)

guilara (Author):

Removed


@guilara force-pushed the PRs-Urania-SubmitScript branch from 3d4f862 to 4121fc5 on August 5, 2024 16:18
Commits: "Fix validate flag", "Fix"
@guilara force-pushed the PRs-Urania-SubmitScript branch from 2c0688e to 1438c07 on August 5, 2024 18:11
@guilara requested a review from nilsvu on August 5, 2024 18:14
knelli2 (Contributor) left a comment:

I think this looks good for now. If you want to change anything later, you can.

Please squash this all into two commits (after @nilsvu has looked at this again): one for the new validate option in Python, and one for adding the Urania env, machine, and submit files.

Comment on lines +28 to +30
source ${SPECTRE_HOME}/support/Environments/urania.sh
spectre_load_modules
spectre_setup_charm_paths
Contributor:

[optional] You might also want to run module list after this so you can see what's loaded.


module load anaconda/3/2021.11
module load paraview/5.10
# Load Spack environment
source /u/guilara/repos/spack/share/spack/setup-env.sh
Member:

You can delete this line


source /urania/u/guilara/repos/spack/var/spack/environments\
/env3_spectre_impi/loads
# Load python environment
source /u/guilara/envs/spectre_env
Contributor:

Missing "bin/activate"?

@knelli2 added the labels cli/pybindings (Command line interface & Python bindings) and clusters (Super computer support: env, submit, machine files) on Aug 27, 2024
-D MEMORY_ALLOCATOR=JEMALLOC \
-D BUILD_PYTHON_BINDINGS=ON \
-D MACHINE=Urania \
-D Catch2_DIR=/u/guilara/repos/Catch2/install_dir/lib64/cmake/Catch2
Member:

Backslash at end of line missing

Labels: cli/pybindings, clusters
4 participants