Modernize submit_prod #2322

Merged
merged 10 commits into from
Aug 16, 2024


akremin
Member

@akremin akremin commented Aug 15, 2024

Overview

This PR modernizes the submit_prod.py script and renames desi_run_prod to desi_submit_prod so that the script and module names are consistent. The code accepts a limited number of parameters via a yaml file. An example is provided in the Tests section of this PR.

Implementation Details

Most things should be specified via the yaml file, including the nights to process (either as a first and last night or as a list of nights), the name of the SPECPROD, the queue and reservation to submit to, surveys to submit, and what types of redshift jobs to submit. Other arguments to desi_proc_night are not currently exposed but could be in the future with small changes to the code to expect those parameters in the yaml file.

If given a first and last night, the code loads all available exposure_tables and identifies all nights between the two dates (inclusive on both ends) that have valid science observations. The "THRUNIGHT" for tile completeness checking (THRU_NIGHT in the yaml file) is defined as the last night unless specified in the yaml file. The first night can be left blank and defaults to 20201214, the first night of Fuji and Iron. Redshift jobs default to submitting cumulative jobs. The code doesn't currently allow the user to request cumulatives on every night of observations (like we do in daily); it only allows cumulatives to be run on the last night of observation (as is done in productions). This could be allowed in the future with minimal changes to submit_prod.py.
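The inclusive night selection described above can be sketched as follows. This is a minimal illustration, not the code in submit_prod.py: the function name `select_nights` and the `science_nights` argument (standing in for the nights found to have valid science observations in the exposure tables) are hypothetical.

```python
# Default taken from the description above: first night of Fuji and Iron.
DEFAULT_FIRST_NIGHT = 20201214


def select_nights(science_nights, last_night, first_night=None):
    """Return the sorted nights in [first_night, last_night], inclusive.

    science_nights is assumed to already contain only nights with valid
    science observations (hypothetical input for this sketch).
    """
    if first_night is None:
        first_night = DEFAULT_FIRST_NIGHT
    # Nights are YYYYMMDD integers, so integer comparison orders them by date.
    return sorted(
        night for night in set(science_nights)
        if first_night <= night <= last_night
    )
```

With `first_night` left unset, any night before 20201214 is dropped, and both endpoints are kept when they appear in the input.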

Each night is submitted in chronological order. If a processing table already exists, that night is assumed to be complete and is skipped. If more than queue_threshold jobs are in the user's queue, the script stops submitting nights and exits. Because nights with processing tables are skipped, the script can be rerun repeatedly until all nights are submitted, even with a queue limit. The default queue_threshold is 4800.
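A rough sketch of that submission loop is given here to make the skip and stop conditions concrete. The callables `has_proctable`, `queue_size`, and `submit` are hypothetical stand-ins for the real processing-table check, queue query, and per-night submission; only the default threshold of 4800 comes from the description.

```python
def submit_nights(nights, has_proctable, queue_size, submit, queue_threshold=4800):
    """Submit nights chronologically; return (submitted_nights, all_done).

    has_proctable(night) -> bool, queue_size() -> int, and submit(night)
    are hypothetical hooks supplied by the caller for this sketch.
    """
    submitted = []
    for night in sorted(nights):
        # A night with an existing processing table is assumed complete.
        if has_proctable(night):
            continue
        # Stop early when the user's queue is already over the threshold.
        if queue_size() > queue_threshold:
            return submitted, False
        submit(night)
        submitted.append(night)
    return submitted, True
```

Calling the function again later picks up where it left off, since every previously submitted night now has a processing table and is skipped.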

The logs for each night are redirected to $DESI_SPECTRO_REDUX/$SPECPROD/run/logs/.log.

If a "sentinel" file exists in $DESI_SPECTRO_REDUX/$SPECPROD/run/sentinel.txt, the code doesn't check for processing tables and exits, assuming the production is done. The sentinel file is written by the script only once all nights have been submitted.
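The sentinel check could look roughly like the sketch below. The helper names and the text written into the file are assumptions for illustration; only the path layout `<redux>/<specprod>/run/sentinel.txt` is taken from the description, and the file name is left as a parameter.

```python
import os


def sentinel_path(redux_dir, specprod, filename="sentinel.txt"):
    """Build <redux_dir>/<specprod>/run/<filename> (layout from the PR text)."""
    return os.path.join(redux_dir, specprod, "run", filename)


def production_is_complete(redux_dir, specprod):
    """True if the sentinel file exists, so no processing tables need checking."""
    return os.path.isfile(sentinel_path(redux_dir, specprod))


def mark_production_complete(redux_dir, specprod):
    """Write the sentinel; meant to be called only after all nights are submitted."""
    path = sentinel_path(redux_dir, specprod)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as fh:
        # File contents are an assumption; only the file's existence matters here.
        fh.write("all nights submitted\n")
```

In a real run `redux_dir` would presumably come from `os.environ["DESI_SPECTRO_REDUX"]` and `specprod` from the yaml file.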

The selection of nights was factored out into a standalone function that can take the yaml file or a path to the yaml file. This is useful for other scripts to be able to derive the list of nights from the yaml file.
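As an illustration of a function that accepts either the parsed yaml contents or a path to the file, here is a hedged sketch. The names `load_prod_config` and `nights_from_config` are hypothetical, PyYAML is assumed to be installed for the path branch, and `science_nights` again stands in for nights derived from the exposure tables.

```python
def load_prod_config(config_or_path):
    """Return the config as a dict, given either a dict or a path to a yaml file."""
    if isinstance(config_or_path, dict):
        return dict(config_or_path)
    import yaml  # PyYAML, assumed available; only needed when a path is given
    with open(config_or_path) as fh:
        return yaml.safe_load(fh)


def nights_from_config(config_or_path, science_nights):
    """Derive the list of nights from NIGHTS or FIRST_NIGHT/LAST_NIGHT."""
    conf = load_prod_config(config_or_path)
    # An explicit list of nights takes precedence in this sketch (assumption).
    if conf.get("NIGHTS") is not None:
        return sorted(int(night) for night in conf["NIGHTS"])
    first = conf.get("FIRST_NIGHT") or 20201214
    last = conf["LAST_NIGHT"]
    return sorted(n for n in set(science_nights) if first <= n <= last)
```

Another script could then call `nights_from_config("example.yaml", science_nights)` or pass an already loaded dictionary, without duplicating the selection logic.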

Tests

Two test runs were performed, one of a few specified nights in:
/global/cfs/cdirs/desi/users/kremin/PRs/submit_prod/

The yaml file looks like:

>cat example.yaml

# EXAMPLE Test Prod


SPECPROD: 'submit_prod'
NIGHTS: [20231002, 20231203]
#FIRST_NIGHT: 20201214
#LAST_NIGHT: 20240409
THRU_NIGHT: 20240409
#Z_SUBMIT_TYPES: ['cumulative', 'pernight']
#SURVEYS: ['sv3', 'main']
QUEUE: 'regular'
#RESERVATION: 'K1res'

And a rerun of all of jura (in dry_run_level=2):
/global/cfs/cdirs/desi/users/kremin/PRs/submit_jura/

The yaml file looks like:

> cat jura.yaml

# Test Reproducibility of jura


SPECPROD: 'submit_jura'
FIRST_NIGHT: 20201214
LAST_NIGHT: 20240409
THRU_NIGHT: 20240409
Z_SUBMIT_TYPES: cumulative
QUEUE: 'regular'

Note that THRU_NIGHT, Z_SUBMIT_TYPES, and QUEUE are not strictly necessary, as they are set to defaults, but they are given here to be explicit.
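One way to picture how those defaults might be filled in is sketched below. The function `apply_defaults` is hypothetical. The default values are inferred from the jura.yaml example and the description (THRU_NIGHT falling back to the last night, cumulative redshifts, the 'regular' queue) and are assumptions rather than a statement of what submit_prod.py does; falling back to the largest entry of NIGHTS and normalizing Z_SUBMIT_TYPES to a list are also assumptions.

```python
def apply_defaults(conf):
    """Return a copy of conf with assumed defaults filled in."""
    out = dict(conf)
    # Queue default inferred from the example yaml files (assumption).
    out.setdefault("QUEUE", "regular")
    # Accept a single string or a list of redshift types; store a list.
    ztypes = out.get("Z_SUBMIT_TYPES") or "cumulative"
    if isinstance(ztypes, str):
        ztypes = [ztypes]
    out["Z_SUBMIT_TYPES"] = list(ztypes)
    # THRU_NIGHT defaults to the last night of the requested range.
    if out.get("THRU_NIGHT") is None:
        if out.get("LAST_NIGHT") is not None:
            out["THRU_NIGHT"] = out["LAST_NIGHT"]
        elif out.get("NIGHTS"):
            out["THRU_NIGHT"] = max(out["NIGHTS"])
    return out
```

Under these assumptions, a file containing only SPECPROD, FIRST_NIGHT, and LAST_NIGHT would resolve to the same settings as the explicit jura.yaml above.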

Neither produced errors and the outputs look sensible. The tables in submit_prod are the same length as in jura. I will do a more in-depth check of these output tables and provide an update in this PR.

@akremin akremin requested a review from sbailey August 15, 2024 18:38
Contributor

@sbailey sbailey left a comment


Overall looks good; I am trusting your real-world testing on this. I had forgotten about the existence of desi_run_prod (now desi_submit_prod). We didn't use that for Jura but let's use it for K1 and Kibo.

Minor comments inline; none are blocking factors.

bin/desi_submit_prod (outdated)
py/desispec/workflow/queue.py
bin/desi_submit_prod (outdated)
py/desispec/scripts/submit_prod.py
py/desispec/scripts/submit_prod.py
py/desispec/scripts/submit_prod.py
@akremin
Member Author

akremin commented Aug 16, 2024

I've rerun my very simple "example.yaml" 2-night prod in dry-run-level=2 and it succeeded and wrote the new prod_submission_complete.txt. This is ready for re-review, but please don't merge until after the crossnight redshift branch as there is one change that needs to be made.

@sbailey
Contributor

sbailey commented Aug 16, 2024

I addressed the TODO items in submit_prod.py for the structural changes from PR #2321, and re-tested using a version of this branch rebased with main. The doc test failures are due to remaining structural issues in the non-rebased branch of this PR, but should be fine once merged with main. After other tests pass I will merge this and then re-retest main.

@sbailey sbailey merged commit 8f11717 into main Aug 16, 2024
25 of 26 checks passed
@sbailey sbailey deleted the submit_prod branch August 16, 2024 21:19
@sbailey sbailey mentioned this pull request Aug 16, 2024