
desi_use_reservation for prods #2346

Merged: 5 commits, Aug 27, 2024

Conversation

@sbailey (Contributor) commented Aug 27, 2024

This PR adds a new script, desi_use_reservation, to assist in moving jobs from the regular queue into a batch reservation while running productions. It moves "just enough" jobs to fill the reservation, then pauses so that we don't overfill it, because:

  1. once a job is in a reservation, you can't move it back into the regular queue (you have to move it to a different reservation, or otherwise cancel and resubmit)
  2. reservations are handy for catching up on high priority reruns of specific jobs and we don't want thousands of other jobs backed up in the reservation queue.
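The "just enough" fill step can be sketched roughly as follows. This is a minimal illustration with hypothetical helper names (`nodes_available`, `pick_jobs_to_move`) and a hypothetical job-dict shape, not the script's actual code:

```python
def nodes_available(reservation_nodes, jobs_in_reservation):
    """Nodes in the reservation not yet claimed by jobs already moved into it."""
    used = sum(job["nodes"] for job in jobs_in_reservation)
    return max(reservation_nodes - used, 0)

def pick_jobs_to_move(eligible_jobs, capacity):
    """Greedily pick eligible jobs until the reservation's free nodes are
    filled, then stop so the reservation is not overfilled."""
    picked = []
    for job in eligible_jobs:
        if capacity <= 0:
            break
        if job["nodes"] <= capacity:
            picked.append(job)
            capacity -= job["nodes"]
    return picked
```

With a 30-node reservation already running jobs on 13 nodes (as in the dry-run log below), this would select jobs totaling at most 17 nodes.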

The script auto-derives the size of the reservation, whether it is a CPU or GPU partition, and which regular-queue jobs are eligible to be run. It prioritizes "bottleneck" jobs like ccdcalib, nightlyflat, and psfnight over regular jobs like arc, flat, tilenight, and ztile. It only moves jobs into the reservation if they are not waiting on a dependency, so that we don't fill the reservation backlog with jobs that can't run anyway. As a consequence, the script should be run fairly frequently so that it can pick up jobs that have become newly eligible.
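The prioritization and eligibility rules described above can be sketched like this; the field names (`jobtype`, `state`, `reason`) and the exact priority ordering are assumptions for illustration, not the script's actual internals:

```python
# Bottleneck job types (per the description above) rank ahead of regular ones.
JOB_PRIORITY = ["ccdcalib", "nightlyflat", "psfnight",
                "arc", "flat", "tilenight", "ztile"]

def sort_by_priority(jobs):
    """Order eligible jobs so bottleneck job types are moved first."""
    rank = {name: i for i, name in enumerate(JOB_PRIORITY)}
    return sorted(jobs, key=lambda job: rank.get(job["jobtype"], len(JOB_PRIORITY)))

def is_eligible(job):
    """Skip jobs still blocked on a dependency; they can't run in the
    reservation yet, so moving them would only clog the reservation queue."""
    return job["state"] == "PD" and job.get("reason") != "Dependency"
```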

I have been testing and refining this today with the Kibo run using reservations kibo26_cpu and kibo26_gpu. I'm not done with ideas for additional improvements, but I'll cut myself off from "one more thing" and get this PR out for review. Note: it is safe to run this in a different environment from the production itself, since it is just moving jobs around, not actually submitting jobs that need the right environment.

I updated desispec.workflow.queue.get_jobs_in_queue to include a RESERVATION column. Otherwise the functionality is currently contained inside bin/desi_use_reservation, though pieces are broken out into functions that could be moved into desispec.workflow.queue and desispec.scripts as needed.
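For context, the squeue format string visible in the log output below (`%i,%P,%v,%j,%u,%t,%M,%D,%R`, where `%v` is the reservation) lends itself to a simple CSV parse. A minimal sketch, with assumed column names, of how such output could be turned into per-job records:

```python
import csv
import io

# Column names matching squeue -o "%i,%P,%v,%j,%u,%t,%M,%D,%R"
SQUEUE_FIELDS = ["JOBID", "PARTITION", "RESERVATION", "NAME",
                 "USER", "STATE", "TIME", "NODES", "NODELIST(REASON)"]

def parse_squeue(output):
    """Parse comma-separated squeue output into one dict per job."""
    reader = csv.reader(io.StringIO(output))
    return [dict(zip(SQUEUE_FIELDS, row)) for row in reader]
```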

Example usage

Dry run, run once and exit

Check status of reservation and recommend what to do, but don't actually do anything:

$> desi_use_reservation -r kibo26_cpu --dry-run
INFO:desi_use_reservation:23:get_reservation_info: Getting reservation info with: scontrol show res kibo26_cpu --json
INFO:queue.py:533:get_jobs_in_queue: Querying jobs in queue with: squeue -u desi -o "%i,%P,%v,%j,%u,%t,%M,%D,%R"
INFO:desi_use_reservation:118:use_reservation: At Mon Aug 26 16:57:46 2024, kibo26_cpu (30 nodes) has 7 jobs using 13 nodes
INFO:desi_use_reservation:119:use_reservation: 4 CPU jobs using 6 nodes are eligible to be moved into the reservation
INFO:desi_use_reservation:125:use_reservation: Adding jobs to use 6 additional nodes
INFO:desi_use_reservation:132:use_reservation: Move ccdcalib-20220315-00126226-a0123456789 to kibo26_cpu
INFO:desi_use_reservation:132:use_reservation: Move ccdcalib-20220314-00126112-a0123456789 to kibo26_cpu
INFO:desi_use_reservation:132:use_reservation: Move psfnight-20220312-00125887-a0123456789 to kibo26_cpu
INFO:desi_use_reservation:132:use_reservation: Move psfnight-20220309-00125501-a0123456789 to kibo26_cpu
INFO:desi_use_reservation:144:use_reservation: Dry run mode; will print what to do but not actually run the commands
scontrol update ReservationName=kibo26_cpu JobID=29829831,29829728,29828642,29827788
INFO:desi_use_reservation:199:main: Done checking at Mon Aug 26 16:57:46 2024

Update reservation in a loop

Actually move jobs from the regular queue into the reservation, entering a loop checking every 5 minutes until 2024-08-26T17:30. Include 10 extra nodes worth of jobs so that there is a little buffer of jobs finishing and new ones starting before the next check:

$> desi_use_reservation -r kibo26_cpu --sleep 5 --until 2024-08-26T17:30 -n 10
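The loop behavior of `--sleep`/`--until` amounts to check, sleep, repeat until the deadline. A minimal sketch (the function name `run_until` is hypothetical; the real loop lives in bin/desi_use_reservation):

```python
import time
from datetime import datetime

def run_until(check_once, until, sleep_minutes):
    """Run one reservation check, then sleep and repeat until the deadline
    (an ISO timestamp like 2024-08-26T17:30) has passed."""
    deadline = datetime.fromisoformat(until)
    while True:
        check_once()
        if datetime.now() >= deadline:
            break
        time.sleep(sleep_minutes * 60)
```

Running checks on a cadence like this matters because, as noted above, jobs become newly eligible as their dependencies finish.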

@akremin (Member) left a comment

Thanks for this incredibly useful script. I have added a few comments inline: one records something we discussed in person that should be written down for the future, and two others request very minor changes to make the code more robust. If this merge is time-critical we can proceed without the corrections, since the script will be fine >99% of the time, but the additional robustness would be nice to have.

In bin/desi_use_reservation (resolved):
#- which regular queue partition is eligible for this reservation?
regular_partition = resinfo['partition']

#- Determine CPU vs. GPU reservation

As discussed in person, this should be improved in the future to "future proof" it for later systems. For now it works fine on Perlmutter and we don't yet know how things may change for NERSC 10, so this is fine to leave as-is for now.
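For reference, the kind of Perlmutter-specific shortcut being discussed could look like the sketch below. The function name, the `resinfo` field, and the substring test are all assumptions for illustration; the actual detection logic is in bin/desi_use_reservation:

```python
def reservation_uses_gpu(resinfo):
    """Guess CPU vs. GPU from the partition name. On Perlmutter the GPU
    partition names contain 'gpu'; this is exactly the sort of shortcut
    that would need future-proofing for later systems like NERSC 10."""
    return "gpu" in resinfo["partition"].lower()
```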

In bin/desi_use_reservation (outdated, resolved):
@sbailey (Contributor, Author) commented Aug 27, 2024

@akremin thanks for the comments. I addressed your two comments that needed updates; please re-review.

In the meantime I also added logic to try to prevent a major imbalance of pending tilenight/ztile vs. flat jobs. In the end I'm not actually sure that helps our situation today, where a backlog of GPU jobs is preventing us from submitting more CPU jobs, because either way we have to run those GPU jobs (flat or tilenight or ztile). Thoughts?

@akremin (Member) commented Aug 27, 2024

I see the additions for job balancing but I don't see either of the requested changes. Is it possible you forgot to push?

As we discussed earlier, the load balancing isn't a bad thing, since it allows us to complete earlier nights before moving on to processing flats + science exposures on later nights. But all GPU jobs need to run at some point for all nights to be complete, so it won't change the total time to finish the production. From a human perspective, though, it is nice to complete nights before moving on, in a depth-first strategy, so I like this change.

I might even advocate for 10x instead of 20x. We have 12 flats in a night and ~10-30 tilenights, so 20x is a major imbalance.

@sbailey (Contributor, Author) commented Aug 27, 2024

Missing changes pushed. I left the imbalance checking code, but also left it at only correcting the imbalance if it gets way out of whack with a factor of 20. The original reason of wanting to get the flats processed sooner is that they unblock the nightlyflat, which then unblocks the science tiles making them eligible to run in the regular queue even without a reservation. Running an equivalent number of tilenight/ztile jobs doesn't unblock as much.
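The factor-of-20 guard described above amounts to intervening only when pending science jobs vastly outnumber pending flats. A minimal sketch (the function name and threshold handling are hypothetical; the real check is in the script):

```python
def needs_rebalance(n_science, n_flat, factor=20):
    """Flag a major imbalance of pending tilenight/ztile jobs vs. flat jobs.
    Only intervene when science jobs outnumber flats by more than `factor`,
    since moderate imbalance is harmless and flats unblock downstream work."""
    if n_flat == 0:
        return n_science > 0
    return n_science > factor * n_flat
```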

@akremin (Member) left a comment

Thanks, the changes look good and my requested changes have been implemented.

@akremin merged commit 9af08d6 into main on Aug 27, 2024
26 checks passed
@akremin deleted the use_reservation branch on August 27, 2024 at 23:19