Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 31 changed files with 1,151 additions and 1,646 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,7 @@ | ||
{ | ||
"MD013": false, | ||
"MD033": false, | ||
"MD038": false | ||
"MD038": false, | ||
"MD046": false, | ||
"MD041": false | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
{ | ||
"checks":{ | ||
"hyperbole.misc": false, | ||
"typography.exclamation": false, | ||
"typography.symbols": false | ||
}} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,9 @@ | ||
#!/bin/bash | ||
|
||
|
||
shopt -s globstar | ||
|
||
total="$(echo docs/**/*.md | wc -w)" | ||
unver="$(./checks/get_unverified.py docs/**/*.md | wc -l)" | ||
|
||
printf "%s%%\n" "$(echo "(100-(100*$unver/$total))" | bc)" |
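The arithmetic in the script above can be sketched in Python as follows. This is only an illustration, not part of the commit: the function name `verified_percentage` and the sample counts are hypothetical. It assumes `bc` runs with its default scale of 0, so the division truncates to an integer.

```python
def verified_percentage(total_pages: int, unverified_pages: int) -> int:
    """Return the integer percentage of pages that are verified.

    Mirrors 100-(100*unver/total) as evaluated by bc with scale=0,
    where the division truncates toward zero.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # Floor division matches bc's truncation for non-negative counts.
    return 100 - (100 * unverified_pages) // total_pages


# Hypothetical counts, for illustration only.
print(verified_percentage(200, 50))  # prints 75
```

Note that the shell version counts pages with `wc -w`, which counts whitespace-separated words, so a path containing a space would be counted more than once.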
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,22 +4,12 @@ hidden: false | |
position: 5 | ||
tags: | ||
- slurm | ||
title: Finding Job Efficiency | ||
vote_count: 8 | ||
vote_sum: 8 | ||
zendesk_article_id: 360000903776 | ||
zendesk_section_id: 360000189716 | ||
--- | ||
|
||
|
||
|
||
[//]: <> (REMOVE ME IF PAGE VALIDATED) | ||
[//]: <> (vvvvvvvvvvvvvvvvvvvv) | ||
!!! warning | ||
This page has been automatically migrated and may contain formatting errors. | ||
[//]: <> (^^^^^^^^^^^^^^^^^^^^) | ||
[//]: <> (REMOVE ME IF PAGE VALIDATED) | ||
|
||
## On Job Completion | ||
|
||
It is good practice to have a look at the resources your job used on | ||
|
@@ -29,13 +19,13 @@ future. | |
Once your job has finished, check the relevant details using the tools | ||
`nn_seff` or `sacct`. For example: | ||
|
||
**nn\_seff** | ||
### Using `nn_seff` | ||
|
||
``` sl | ||
```bash | ||
nn_seff 30479534 | ||
``` | ||
|
||
``` sl | ||
```txt | ||
Job ID: 1936245 | ||
Cluster: mahuika | ||
User/Group: user/group | ||
|
@@ -48,30 +38,25 @@ CPU Efficiency: 98.55% 00:01:08 of 00:01:09 core-walltime | |
Mem Efficiency: 10.84% 111.00 MB of 1.00 GB | ||
``` | ||
|
||
Notice that the CPU efficiency was high but the memory efficiency was | ||
very low and consideration should be given to reducing memory requests | ||
for similar jobs. If in doubt, please contact <[email protected]> for | ||
guidance. | ||
|
||
|
||
Notice that the CPU efficiency was high but the memory efficiency was low and consideration should be given to reducing memory requests | ||
for similar jobs. If in doubt, please contact [[email protected]](mailto:[email protected]) for guidance. | ||
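The two efficiency figures in the `nn_seff` output can be reproduced by hand. The sketch below is a minimal illustration, not how `nn_seff` itself is implemented; the helper name `efficiency_percent` is hypothetical, and it assumes 1 GB is counted as 1024 MB, which is the convention that matches the 10.84% shown.

```python
def efficiency_percent(used: float, requested: float) -> float:
    """Return used/requested as a percentage rounded to two decimals."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return round(100 * used / requested, 2)


# CPU: 00:01:08 used of 00:01:09 core-walltime, expressed in seconds.
cpu_eff = efficiency_percent(68, 69)

# Memory: 111.00 MB used of 1.00 GB requested (assuming 1 GB = 1024 MB).
mem_eff = efficiency_percent(111.0, 1024.0)

print(f"CPU Efficiency: {cpu_eff}%")
print(f"Mem Efficiency: {mem_eff}%")
```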
|
||
**sacct** | ||
### Using `sacct` | ||
|
||
``` sl | ||
```bash | ||
sacct --format="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc,NTask,MaxRSS,State" -j <jobid> | ||
``` | ||
!!! prerequisite Tip | ||
|
||
!!! tip | ||
*If you want to make this your default* `sacct` *setting, run;* | ||
``` sl | ||
```bash | ||
echo 'export SACCT_FORMAT="JobID,JobName,Elapsed,AveCPU,MinCPU,TotalCPU,Alloc%2,NTask%2,MaxRSS,State"' >> ~/.bash_profile | ||
source ~/.bash_profile | ||
``` | ||
|
||
------------------------------------------------------------------------ | ||
|
||
Below is an output for reference: | ||
|
||
``` sl | ||
```txt | ||
JobID JobName Elapsed AveCPU MinCPU TotalCPU AllocCPUS NTasks MaxRSS State | ||
------------ ---------- ---------- ---------- ---------- ---------- ---------- -------- ---------- ---------- | ||
3007056 rfm_ANSYS+ 00:27:07 03:35:55 16 COMPLETED | ||
|
@@ -82,9 +67,7 @@ Below is an output for reference: | |
*All of the adjustments below still allow for a degree of variation. | ||
There may be factors you have not accounted for.* | ||
|
||
------------------------------------------------------------------------ | ||
|
||
### **Walltime** | ||
#### Walltime | ||
|
||
From the `Elapsed` field we may want to update our next run to have a | ||
more appropriate walltime. | ||
|
@@ -93,7 +76,7 @@ more appropriate walltime. | |
#SBATCH --time=00:40:00 | ||
``` | ||
|
||
### **Memory** | ||
#### Memory | ||
|
||
The `MaxRSS` field shows the maximum memory used by each of the job | ||
steps, so in this case 13 GB. For our next run we may want to set: | ||
|
@@ -102,7 +85,7 @@ steps, so in this case 13 GB. For our next run we may want to set: | |
#SBATCH --mem=15G | ||
``` | ||
|
||
### **CPU's** | ||
#### CPUs | ||
|
||
`TotalCPU` is the number of computation hours. In the best-case scenario, | ||
the computation hours would be equal to `Elapsed` x `AllocCPUS`. | ||
|
@@ -116,8 +99,6 @@ however bear in mind there are other factors that affect CPU efficiency. | |
#SBATCH --cpus-per-task=10 | ||
``` | ||
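As a rough check of the relationship between `TotalCPU`, `Elapsed` and `AllocCPUS`, the sketch below computes CPU efficiency from the reference row, reading it as `Elapsed` = 00:27:07, `TotalCPU` = 03:35:55 and `AllocCPUS` = 16. The helper names are hypothetical, and the parser only handles plain `HH:MM:SS` values; other time formats that `sacct` may print are not covered.

```python
def to_seconds(hms: str) -> int:
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu: str, elapsed: str, alloc_cpus: int) -> float:
    """Return TotalCPU / (Elapsed x AllocCPUS) as a fraction between 0 and 1."""
    available = to_seconds(elapsed) * alloc_cpus
    if available <= 0:
        raise ValueError("elapsed time and alloc_cpus must be positive")
    return to_seconds(total_cpu) / available


eff = cpu_efficiency("03:35:55", "00:27:07", 16)
print(f"CPU efficiency: {eff:.1%}")
print(f"Equivalent fully busy cores: {eff * 16:.1f}")
```

With these values the job kept roughly half of its allocated cores busy, which is the kind of result that suggests requesting fewer CPUs next time.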
|
||
|
||
|
||
Note: When using sacct to determine the amount of memory your job used - | ||
in order to reduce memory wastage - please keep in mind that Slurm | ||
reports the figure as RSS (Resident Set Size) when in fact the metric | ||
|
@@ -153,19 +134,20 @@ If 'nodelist' is not one of the fields in the output of your `sacct` or | |
`squeue` commands you can find the node a job is running on using the | ||
command `squeue -h -o %N -j <jobid>`. The node will look something like | ||
`wbn123` on Mahuika or `nid00123` on Māui. | ||
!!! prerequisite Note | ||
|
||
!!! note | ||
If your job is using MPI it may be running on multiple nodes | ||
|
||
### htop | ||
### Using `htop` | ||
|
||
``` sl | ||
```bash | ||
ssh -t wbn175 htop -u $USER | ||
``` | ||
|
||
If it is your first time connecting to that particular node, you may be | ||
prompted: | ||
|
||
``` sl | ||
```txt | ||
The authenticity of host can't be established | ||
Are you sure you want to continue connecting (yes/no)? | ||
``` | ||
|
@@ -185,15 +167,16 @@ Processes in green can be ignored | |
|
||
**S** - State, what the thread is currently doing. | ||
|
||
- R - Running | ||
- S - Sleeping, waiting on another thread to finish. | ||
- D - Sleeping | ||
- Any other letter - Something has gone wrong! | ||
- R - Running | ||
- S - Sleeping, waiting on another thread to finish. | ||
- D - Sleeping | ||
- Any other letter - Something has gone wrong! | ||
|
||
**CPU%** - Percentage CPU utilisation. | ||
|
||
**MEM% **Percentage Memory utilisation. | ||
!!! prerequisite Warning | ||
**MEM%** - Percentage Memory utilisation. | ||
|
||
!!! warning | ||
If the job finishes or is killed, you will be kicked off the node. If | ||
htop freezes, type `reset` to clear your terminal. | ||
|
||
|
@@ -204,21 +187,18 @@ time* the CPUs are in use. This is not enough to get a picture of | |
overall job efficiency, as required CPU time *may vary by number of | ||
CPUs*. | ||
|
||
The only way to get the full context, is to compare walltime performance | ||
between jobs at different scale. See [Job | ||
Scaling](../../Getting_Started/Next_Steps/Job_Scaling_Ascertaining_job_dimensions.md) | ||
for more details. | ||
The only way to get the full context is to compare walltime performance between jobs at different scales. See [Job Scaling](../../Getting_Started/Next_Steps/Job_Scaling_Ascertaining_job_dimensions.md) for more details. | ||
|
||
### Example | ||
|
||
![qdyn\_eff.png](../../assets/images/Finding_Job_Efficiency_0.png) | ||
|
||
From the above plot of CPU efficiency, you might decide a 5% reduction | ||
of CPU efficiency is acceptable and scale your job up to 18 CPU cores . | ||
of CPU efficiency is acceptable and scale your job up to 18 CPU cores. | ||
|
||
![qdyn\_walltime.png](../../assets/images/Finding_Job_Efficiency_1.png) | ||
|
||
However, when looking at a plot of walltime it becomes apparent that | ||
performance gains per CPU added drop significantly after 4 CPUs, and in | ||
fact absolute performance losses (negative returns) are seen after 8 | ||
CPUs. | ||
CPUs. |
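One way to make this comparison concrete is to tabulate speedup and parallel efficiency from measured walltimes. The sketch below is illustrative only: the walltimes are hypothetical numbers chosen to show the same pattern (diminishing gains after 4 CPUs, losses after 8) and are not read from the plots, and `scaling_table` is a made-up helper name.

```python
from typing import Dict, List, Tuple


def scaling_table(walltimes: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (cpus, speedup, parallel efficiency) rows, relative to the smallest run."""
    base_cpus = min(walltimes)
    base_time = walltimes[base_cpus]
    rows = []
    for cpus in sorted(walltimes):
        speedup = base_time / walltimes[cpus]
        # Efficiency is the achieved speedup divided by the ideal speedup.
        efficiency = speedup * base_cpus / cpus
        rows.append((cpus, speedup, efficiency))
    return rows


# Hypothetical walltimes in minutes, keyed by CPU count.
hypothetical_walltimes = {1: 100.0, 2: 55.0, 4: 32.0, 8: 28.0, 16: 35.0}

for cpus, speedup, efficiency in scaling_table(hypothetical_walltimes):
    print(f"{cpus:>2} CPUs  speedup {speedup:4.2f}  efficiency {efficiency:6.1%}")
```

In this made-up data the speedup at 16 CPUs is lower than at 8, which is the "negative returns" case: adding cores makes the job slower in absolute terms.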
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,23 +10,12 @@ zendesk_article_id: 7352114813455 | |
zendesk_section_id: 7348936006031 | ||
--- | ||
|
||
|
||
|
||
[//]: <> (REMOVE ME IF PAGE VALIDATED) | ||
[//]: <> (vvvvvvvvvvvvvvvvvvvv) | ||
!!! warning | ||
This page has been automatically migrated and may contain formatting errors. | ||
[//]: <> (^^^^^^^^^^^^^^^^^^^^) | ||
[//]: <> (REMOVE ME IF PAGE VALIDATED) | ||
|
||
Charges for Subscription usage are typically invoiced on a quarterly | ||
basis. | ||
|
||
If your organisation requires a Purchase Order (PO) Number be used for | ||
invoices, the PO Number must be provided to us upon signing your | ||
Subscription service agreement. | ||
|
||
|
||
|
||
If you have any questions about Subscription billing processes, don’t | ||
hesitate to [get in touch](mailto:[email protected]). | ||
hesitate to [get in touch](mailto:[email protected]). |