Aurora job allocation doc: qsub examples to select nodes within slot/chassis/cabinet etc? examples of useful pbs qstat options #527

kaushikvelusamy opened this issue Nov 7, 2024 · 1 comment


Providing my notes on some items to cover:

1. Allocating Nodes in Specific Racks or Cabinets

Selecting Nodes in Specific Cabinets or Chassis:

  • To allocate nodes in a specific cabinet, use the command:
    qsub -l select=tier0=x4407 pbs_submit_script.sh
  • To allocate nodes in a specific chassis, use the command:
    qsub -l select=tier1=x4407c2 pbs_submit_script.sh
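The same tier selection can also go in the submit script itself. A minimal sketch, with an illustrative node count and placeholder walltime, filesystem, and project:

#!/bin/bash
#PBS -l select=16:tier1=x4407c2    # 16 nodes, all within chassis x4407c2 (count is illustrative)
#PBS -l walltime=00:30:00
#PBS -l filesystems=home
#PBS -A MyProject
cd "$PBS_O_WORKDIR"
# One rank per allocated node, to confirm where the job landed
mpiexec -n "$(wc -l < "$PBS_NODEFILE")" hostname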

Requesting a Single Node per Specified Cabinet:

  • Concatenate select statements as follows:
    -l select=1:ncpus=208:tier0=x4001+1:ncpus=208:tier0=x4002+1:ncpus=208:tier0=x4003+...
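A small sketch for building that string programmatically rather than by hand (the cabinet range x4001-x4008 is illustrative):

# Build a select string requesting one 208-core node in each cabinet
sel=$(printf '1:ncpus=208:tier0=x%04d+' $(seq 4001 4008))
qsub -l select="${sel%+}" pbs_submit_script.sh   # strip the trailing '+'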

Determining the Number of Available Nodes:

  • To find the number of free nodes for each cabinet on the LustreApps queue:
    pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^c]*' | sort | uniq -c
  • To find the number of up nodes (free and in use) for each cabinet on the LustreApps queue:
    pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free") or (.state == "job-exclusive")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^c]*' | sort | uniq -c
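A variant that counts nodes per cabinet and state in one pass (a sketch; the queue name is parameterized, and the cabinet prefix is extracted the same way as above):

QUEUE=LustreApps
pbsnodes -avF JSON |
  jq -r --arg q "$QUEUE" '.nodes[]
    | select(.resources_available.at_queue == $q)
    | select(.state == "free" or .state == "job-exclusive")
    | "\(.resources_available.host) \(.state)"' |
  awk '{ sub(/c.*/, "", $1); print $1, $2 }' | sort | uniq -c   # count per (cabinet, state)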

2. Advanced Node Selection Syntax

Example of Selecting a Specific Number of Cabinets:

  • Use a combination of select and place statements to group chunks of nodes.
  • Example command:
    qsub -l select=60+60+60+60 -l place=group=tier0 pbs_submit_script.sh
  • This command requests four chunks of 60 nodes each, grouped by the tier0 resource.
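Once such a job starts, one way to verify the placement is to count nodes per cabinet in the job's nodefile (a sketch, assuming node names follow the xNNNNc... convention used above):

sed 's/c.*//' "$PBS_NODEFILE" | sort | uniq -c   # nodes per cabinet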

3. Limitations and Considerations

  • Incomplete Cabinets: There may not be any complete cabinets available, meaning a request for a full cabinet might never run.
  • Node Grouping: The group placement works only if the entire set of requested nodes fits the criteria.
  • System Interconnect Topology: When allocating jobs larger than a single rack, it is generally advisable to spread the nodes out, due to the characteristics of the interconnect topology (such as the Dragonfly topology on Aurora).
  • Verifying Node Availability: Use the commands above to check the availability of nodes in a chassis or cabinet before requesting them.

4. Additional Useful Commands

Checking Nodes in Chassis:

  • To find the number of free nodes in each chassis on the LustreApps queue:
    pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^s]*' | sort | uniq -c

  • To find the number of up nodes (free and in use) in each chassis on the LustreApps queue:
    pbsnodes -avF JSON | jq '.nodes[] | select((.resources_available.at_queue == "LustreApps")) | select((.state == "free") or (.state == "job-exclusive")) | .resources_available.host' | sed 's/^"//' | grep -o '^[^s]*' | sort | uniq -c


To list node names and states for the free nodes from the summary view (note the comparison must be ==, not =):

pbsnodes -avSj | awk '{ if ($2 == "free") print $1 "\t" $2 }'
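To simply count the free nodes (same assumption about the summary column layout):

pbsnodes -avSj | awk '$2 == "free"' | wc -l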

You can select specific nodes by host name:

-l select=host=x4703c2s3b0n0
qsub -l select=host=x1922c6s3b0n0+1:host=x1922c7s6b0n0 -q workq-route -l walltime=00:20:00 -l filesystems=gila -A Aurora_deployment -I
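To build such a select string from a longer list of nodes (a sketch; nodes.txt is a hypothetical file with one node name per line):

# Join the node names into the 1:host=...+1:host=... form
sel=$(sed 's/^/1:host=/' nodes.txt | paste -sd+ -)
qsub -l select="$sel" -q workq-route -l walltime=00:20:00 -l filesystems=gila -A Aurora_deployment -I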

To watch what is queued up and about to begin in a given queue:

watch qstat -was1 workq

To list jobs ordered with running first, then waiting:

qstat -Twas1 lustre_scaling

To sort by number of nodes instead, pipe through column and sort:

qstat -Twas1 lustre_scaling | column -t | sort -k 6 -n

$ qstat -fxw <job_id>

Check the comment field. If run_count is increasing, then the system is trying to offline nodes and bring in new nodes.
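To pull just those two attributes out of the full output (assuming the usual attribute = value layout of qstat -f):

qstat -fxw <job_id> | grep -E 'run_count|comment'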

qstat -xwpau $USER

shows a list of recently submitted jobs, where you can compare Elap Time vs Req'd Time.

Nodes can have more than one status ("down,offline" is pretty common, for instance), but PBS will only show the first one on the list in a summary view such as:

pbsnodes -avSj

You should also keep in mind that node statuses matter. Two commands help a lot with that:

pbsnodes -avSj

and

pbsnodes -l

The first shows job IDs along with nodes and their status; the second shows nodes that are considered 'down' and are in an unusable state.

So

qstat -was1 workq

will get you that info for workq. Also,

qstat -Qf workq

will show full details on the queue; its resources_assigned.nodect entry shows how many nodes currently have jobs running on them.
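To pull out just that entry (assuming the standard attribute = value layout of qstat -Qf output):

qstat -Qf workq | grep resources_assigned.nodect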

To see the full, extended status of a specific (including finished) job:

qstat -fx 8997637.amn-0001

pbs_rstat - show reservations

@kaushikvelusamy (Contributor, Author):

A reference to clush and pdsh would be helpful to new users.
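For instance, something along these lines (a sketch; assumes clush and pdsh are installed and the commands run from inside a job):

# clush: run a command on every node in the job, gathering identical output with -b
clush --hostfile "$PBS_NODEFILE" -b uptime

# pdsh: '^file' reads the host list from a file
pdsh -w "^${PBS_NODEFILE}" uptime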
