From eb2c5ba8aeda1062e4ee72c3b1ff0b180ff00f51 Mon Sep 17 00:00:00 2001 From: tayloraubry <123703599+tayloraubry@users.noreply.github.com> Date: Wed, 18 Dec 2024 09:03:47 -0700 Subject: [PATCH 01/13] Update vasp.md Updates post system time, cpe 22 modules have now been removed. --- docs/Documentation/Applications/vasp.md | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/docs/Documentation/Applications/vasp.md b/docs/Documentation/Applications/vasp.md index cf781f120..0151b4375 100644 --- a/docs/Documentation/Applications/vasp.md +++ b/docs/Documentation/Applications/vasp.md @@ -49,23 +49,18 @@ NREL offers modules for VASP 5 and VASP 6 on CPUs as well as GPUs on certain sys #### CPU -There are several modules for CPU builds of VASP 5 and VASP 6. As of 08/09/2024 we have released new modules for VASP on Kestrel CPUs: +There are several modules for CPU builds of VASP 5 and VASP 6. ``` CPU $ module avail vasp - ------------- /nopt/nrel/apps/cpu_stack/modules/default/application ------------- - #new modules: - vasp/5.4.4+tpc vasp/6.3.2_openMP+tpc vasp/6.4.2_openMP+tpc - vasp/5.4.4_base vasp/6.3.2_openMP vasp/6.4.2_openMP - - # Legacy modules will be removed during system time in December! - vasp/5.4.4 vasp/6.3.2 vasp/6.4.2 (D) +------------- /nopt/nrel/apps/cpu_stack/modules/default/application ------------- + vasp/5.4.4+tpc vasp/6.3.2_openMP+tpc vasp/6.4.2_openMP+tpc + vasp/5.4.4 vasp/6.3.2_openMP vasp/6.4.2_openMP (D) ``` - What’s new: + Notes: - * New modules have been rebuilt with the latest Cray Programming Environment (cpe23), updated compilers, and math libraries. + * These modules have been built with the latest Cray Programming Environment (cpe23), updated compilers, and math libraries. * OpenMP capability has been added to VASP 6 builds. * Modules that include third-party codes (e.g., libXC, libBEEF, VTST tools, and VASPsol) are now denoted with +tpc. Use `module show vasp/` to see details of a specific version. From 324b8729f2cc26b11ba9dde1c3840cb92ca88514 Mon Sep 17 00:00:00 2001 From: hyandt Date: Fri, 20 Dec 2024 11:04:07 -0700 Subject: [PATCH 02/13] add partition info --- docs/Documentation/Systems/Kestrel/Running/index.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/Documentation/Systems/Kestrel/Running/index.md b/docs/Documentation/Systems/Kestrel/Running/index.md index eff3dd02d..4066f56b8 100644 --- a/docs/Documentation/Systems/Kestrel/Running/index.md +++ b/docs/Documentation/Systems/Kestrel/Running/index.md @@ -11,7 +11,7 @@ There are two general types of compute nodes on Kestrel: CPU nodes and GPU nodes ### CPU Nodes -Standard CPU-based compute nodes on Kestrel have 104 cores and 240G of usable RAM. 256 of those nodes have a 1.7TB NVMe local disk. There are also 10 bigmem nodes with 2TB of RAM and 5.6TB NVMe local disk. +Standard CPU-based compute nodes on Kestrel have 104 cores and 240G of usable RAM. 256 of those nodes have a 1.7TB NVMe local disk. There are also 10 bigmem nodes with 2TB of RAM and 5.6TB NVMe local disk. Two racks of the CPU compute nodes have dual interconnect network cards which may increase performance for certain types of multi-node jobs. ### GPU Nodes @@ -45,6 +45,7 @@ The following table summarizes the partitions on Kestrel: | ```long``` | Nodes that prefer jobs with walltimes > 2 days.
*Maximum walltime of any job is 10 days.* | 525 nodes total. <br>
262 nodes per user.| ```--time <= 10-00```
```--mem <= 246064```
```--tmp <= 1700000 (256 nodes)```| |```bigmem``` | Nodes that have 2 TB of RAM and 5.6 TB NVMe local disk. | 8 nodes total.
4 nodes per user. | ```--mem > 246064```
```--time <= 2-00```
```--tmp > 1700000 ``` | |```bigmeml``` | Bigmem nodes that prefer jobs with walltimes > 2 days.
*Maximum walltime of any job is 10 days.* | 4 nodes total.
3 nodes per user. | ```--mem > 246064```
```--time > 2-00```
```--tmp > 1700000 ``` | +|```hbw``` | CPU compute nodes with dual interconnect network cards. | 512 nodes total.
256 nodes per user. | ```-p hbw```
```--constraint=hbw```
```--time > 2-00```| | ```shared```| Nodes that can be shared by multiple users and jobs. | 64 nodes total.
Half of partition per user.
2 days max walltime. | ```-p shared```
or
```--partition=shared```| | ```sharedl```| Nodes that can be shared by multiple users and prefer jobs with walltimes > 2 days. | 16 nodes total.
8 nodes per user. | ```-p sharedl```
or
```--partition=sharedl```| | ```gpu-h100```| Shareable GPU nodes with 4 NVIDIA H100 SXM 80GB Computational Accelerators. | 130 nodes total.
65 nodes per user. | ```1 <= --gpus <= 4```
```--time <= 2-00```|
@@ -81,6 +82,12 @@ Currently, there are 64 standard compute nodes available in the shared partition
   srun ./my_progam    # Use your application's commands here 
   ```
 
+### High Bandwidth Partition
+
+To request nodes with dual interconnect cards, you can either specify the `hbw` partition or the feature constraint `--constraint=hbw`. Do not combine the constraint with a partition other than the `hbw` partition; doing so will prevent
+your jobs from receiving priority on the high bandwidth nodes and will increase your queue time.
+
 
 ### GPU Jobs
 
From 434a5126f7cc7ead97b8a4b03dac240be23df151 Mon Sep 17 00:00:00 2001
From: hyandt
Date: Fri, 20 Dec 2024 11:12:19 -0700
Subject: [PATCH 03/13] add vscode bug

---
 docs/Documentation/Development/VSCode/vscode.md | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/docs/Documentation/Development/VSCode/vscode.md b/docs/Documentation/Development/VSCode/vscode.md
index e1678529d..97e2c3d63 100644
--- a/docs/Documentation/Development/VSCode/vscode.md
+++ b/docs/Documentation/Development/VSCode/vscode.md
@@ -16,6 +16,15 @@ You may then enter your HPC username and the address of an HPC system to connect
 
 Enter your HPC password (or password and OTP code if external) and you will be connected to a login node. You may open a folder on the remote host to browse your home directory and select files to edit, and so on. 
 
+!!! bug "Windows SSH "Corrupted MAC on input" Error"
+    Some people who use Windows 10/11 computers to ssh to Kestrel via Visual Studio Code's SSH extension might receive an error message about a "Corrupted MAC on input" or "message authentication code incorrect." To work around this issue, you will need to create an ssh config file on your local computer, `~/.ssh/config`, with a host entry for Kestrel that specifies a new message authentication code:
+    ```
+    Host kestrel
+        HostName kestrel.hpc.nrel.gov
+        MACs hmac-sha2-512
+    ```
+    This [Visual Studio Blog post](https://code.visualstudio.com/blogs/2019/10/03/remote-ssh-tips-and-tricks) has further instructions on how to create the ssh configuration file for Windows and VS Code.
+
 ## Caution About VS Code Processes
 
 Please be aware that the Remote SSH extension runs processes on the remote host. This includes any extensions or helpers, including language parsers, code analyzers, AI code assistants, and so on. These extensions can take up a _considerable_ amount of CPU and RAM on any remote host that VS Code connects to. Jupyter notebooks loaded through VS Code will also be executed on the remote host and can use excessive CPU and RAM, as well. When the remote host is a shared login node on an HPC system, this can be a considerable drain on the resources of the login node, and cause system slowdowns for all users of that login node. 
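As a practical follow-up to the caution above, the sketch below shows one way a user could check for and stop their own leftover VS Code server helpers on a login node. It assumes the Remote SSH extension's default install location (`~/.vscode-server`); the process pattern may need adjusting for other versions or configurations.

```bash
# List your own VS Code server helpers running on this login node.
# The "[.]vscode-server" pattern matches the default Remote-SSH install path
# (an assumption); the brackets keep grep from matching itself.
ps -u "$USER" -o pid,etime,pcpu,pmem,args | grep '[.]vscode-server'

# When you are finished working, stop any leftover helpers to free CPU and RAM.
pkill -u "$USER" -f '\.vscode-server'
```
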
From 69b2f044b82569bb3ed792e0820412e8690ae2e0 Mon Sep 17 00:00:00 2001 From: mselensky Date: Tue, 31 Dec 2024 16:13:15 -0700 Subject: [PATCH 04/13] remove reference to parallel partition on swift --- docs/Documentation/Systems/Swift/running.md | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/Documentation/Systems/Swift/running.md b/docs/Documentation/Systems/Swift/running.md index 705a4055b..536a44608 100644 --- a/docs/Documentation/Systems/Swift/running.md +++ b/docs/Documentation/Systems/Swift/running.md @@ -31,7 +31,6 @@ The most up to date list of partitions can always be found by running the `sinfo | long | jobs up to ten days of walltime | | standard | jobs up to two days of walltime | | gpu | Nodes with four NVIDIA A100 40 GB Computational Accelerators, up to two days of walltime | -| parallel | optimized for large parallel jobs, up to two days of walltime | | debug | two nodes reserved for short tests, up to four hours of walltime | Each partition also has a matching `-standby` partition. Allocations which have consumed all awarded AUs for the year may only submit jobs to these partitions, and their default QoS will be set to `standby`. Jobs in standby partitions will be scheduled when there are otherwise idle cycles and no other non-standby jobs are available. Jobs that run in the standby queue will not be charged any AUs. From a4ed65b521a375ab68c1b121ab775c3d74911a4a Mon Sep 17 00:00:00 2001 From: xinhe2205 <45882470+xinhe2205@users.noreply.github.com> Date: Thu, 2 Jan 2025 14:00:22 -0700 Subject: [PATCH 05/13] Update starccm.md --- docs/Documentation/Applications/starccm.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/Documentation/Applications/starccm.md b/docs/Documentation/Applications/starccm.md index a5ca3b8aa..8212ef92f 100644 --- a/docs/Documentation/Applications/starccm.md +++ b/docs/Documentation/Applications/starccm.md @@ -37,8 +37,8 @@ Then you need to create a Slurm script `` as shown below to sub #!/bin/bash -l #SBATCH --time=2:00:00 # walltime limit of 2 hours #SBATCH --nodes=2 # number of nodes - #SBATCH --ntasks-per-node=104 # number of tasks per node (<=104 on Kestrel) - #SBATCH --ntasks=72 # total number of tasks + #SBATCH --ntasks-per-node=96 # number of tasks per node (<=104 on Kestrel) + #SBATCH --ntasks=192 # total number of tasks #SBATCH --job-name=your_simulation # name of job #SBATCH --account= # name of project allocation @@ -68,8 +68,8 @@ STAR-CCM+ comes with its own Intel MPI. To use the Intel MPI, the Slurm script s #!/bin/bash -l #SBATCH --time=2:00:00 # walltime limit of 2 hours #SBATCH --nodes=2 # number of nodes - #SBATCH --ntasks-per-node=104 # number of tasks per node (<=104 on Kestrel) - #SBATCH --ntasks=72 # total number of tasks + #SBATCH --ntasks-per-node=96 # number of tasks per node (<=104 on Kestrel) + #SBATCH --ntasks=192 # total number of tasks #SBATCH --job-name=your_simulation # name of job #SBATCH --account= # name of project allocation @@ -102,8 +102,8 @@ STAR-CCM+ can run with Cray MPI. 
The following Slurm script submits STAR-CCM+ jo #!/bin/bash -l #SBATCH --time=2:00:00 # walltime limit of 2 hours #SBATCH --nodes=2 # number of nodes - #SBATCH --ntasks-per-node=104 # number of tasks per node (<=104 on Kestrel) - #SBATCH --ntasks=72 # total number of tasks + #SBATCH --ntasks-per-node=96 # number of tasks per node (<=104 on Kestrel) + #SBATCH --ntasks=192 # total number of tasks #SBATCH --job-name=your_simulation # name of job #SBATCH --account= # name of project allocation From a6066c307d08b6f100b2a479760c921c9bc0e653 Mon Sep 17 00:00:00 2001 From: arswalid Date: Wed, 8 Jan 2025 10:10:33 -0700 Subject: [PATCH 06/13] removal of eagle in docs --- docs/Documentation/Viz_Analytics/paraview.md | 40 ++++++++------------ 1 file changed, 16 insertions(+), 24 deletions(-) diff --git a/docs/Documentation/Viz_Analytics/paraview.md b/docs/Documentation/Viz_Analytics/paraview.md index 9e6166ff0..a5ab2f927 100644 --- a/docs/Documentation/Viz_Analytics/paraview.md +++ b/docs/Documentation/Viz_Analytics/paraview.md @@ -2,7 +2,7 @@ *ParaView is an open-source, multi-platform data analysis and visualization application. ParaView users can quickly build visualizations to analyze their data using qualitative and quantitative techniques. The data exploration can be done interactively in 3D or programmatically using ParaView's batch processing capabilities. ParaView was developed to analyze extremely large data sets using distributed memory computing resources. It can be run on supercomputers to analyze data sets of terascale as well as on laptops for smaller data.* -The following tutorials are meant for Eagle and Kestrel supercomputers. +The following tutorials are meant for Kestrel supercomputer. ## Using ParaView in Client-Server Mode @@ -15,14 +15,14 @@ The first step is to install ParaView. It is recommended that you use the binaries provided by Kitware on your workstation matching the NREL installed version. This ensures client-server compatibility. The version number that you install must identically match the version installed at NREL. -To determine which version of ParaView is installed on the cluster, connect to Eagle or Kestrel as you normally would, load the ParaView module with `module load paraview`, then check the version with `pvserver --version`. +To determine which version of ParaView is installed on the cluster, connect to Kestrel as you normally would, load the ParaView module with `module load paraview`, then check the version with `pvserver --version`. The version number, e.g., 5.11.0, will then be displayed to your terminal. To download the correct ParaView client binary version for your desktop environment, visit the ParaView [website](https://www.paraview.org/download/). 1. Reserve Compute Nodes - The first step is to reserve the computational resources on Eagle/Kestrel that will be running the ParaView server. + The first step is to reserve the computational resources on Kestrel that will be running the ParaView server. This requires using the Slurm `salloc` directive and specifying an allocation name and time limit for the reservation. @@ -32,7 +32,7 @@ To download the correct ParaView client binary version for your desktop environm (Otherwise, for interactive jobs that just require one process on one node, the "salloc-then-srun" construct isn't necessary at all; for that type of job you may just use `srun -A -t