Modernize submit_prod #2322
Overall looks good; I am trusting your real-world testing on this. I had forgotten about the existence of desi_run_prod (now desi_submit_prod). We didn't use that for Jura but let's use it for K1 and Kibo.
Minor comments inline; none are blocking factors.
I've rerun my very simple "example.yaml" 2-night prod in |
I addressed the TODO items in submit_prod.py for the structural changes from PR #2321, and re-tested using a version of this branch rebased with main. The doc test failures are due to remaining structural issues in the non-rebased branch of this PR, but should be fine once merged with main. After other tests pass I will merge this and then re-retest main.
Overview
This PR modernizes the submit_prod.py script and renames desi_run_prod to desi_submit_prod to make the names consistent. The code accepts a limited number of parameters via a yaml file. An example is provided in the Tests section of this PR.
Implementation Details
Most things should be specified via the yaml file, including the nights to process (either as a first and last night or as a list of nights), the name of the SPECPROD, the queue and reservation to submit to, the surveys to submit, and which types of redshift jobs to submit. Other arguments to desi_proc_night are not currently exposed, but could be in the future with small changes to the code to expect those parameters in the yaml file.
If given a first and last night, the code loads all available exposure_tables and identifies all nights between the two dates (inclusive on both ends) that have valid science observations. The "THRUNIGHT" for tile completeness checking is defined as the last night unless specified in the yaml file. The first night can be left blank and defaults to 20201214, the first night of Fuji and Iron. Redshift jobs default to submitting cumulative jobs. The code doesn't currently allow the user to request cumulatives on every night of observations (like we do in daily); it only allows cumulatives to be run on the last night of observation (as is done in productions). This could be allowed in the future with minimal changes to submit_prod.py.
Each night is submitted in chronological order. If a processing table already exists, that night is assumed to be complete and is skipped. If more than queue_threshold jobs are in the user's queue, the script stops submitting nights and exits. Because it skips nights that already have processing tables, the script can be run repeatedly to submit all nights even with a queue limit. The default queue_threshold is 4800.
The logs for each night are redirected to $DESI_SPECTRO_REDUX/$SPECPROD/run/logs/.log.
If a "sentinel" file exists at $DESI_SPECTRO_REDUX/$SPECPROD/run/sentinel.txt, the code doesn't check for processing tables and exits, assuming the production is done. This file is only written by the script if all nights have been submitted.
The selection of nights was factored out into a standalone function that can take the yaml file or a path to the yaml file. This is useful for other scripts to be able to derive the list of nights from the yaml file.
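To make the control flow above easier to review, here is a minimal sketch of the submission loop as described in this section. It is not the code in submit_prod.py: the function name submit_nights and the three callbacks (has_processing_table, count_queued_jobs, submit_night) are hypothetical stand-ins for the real table lookup, queue query, and desi_proc_night call. Only the sentinel path, the skip-if-processing-table rule, and the 4800 default come from the description.

```python
from pathlib import Path
from typing import Callable, Iterable, List

DEFAULT_QUEUE_THRESHOLD = 4800  # default stated in the PR description


def submit_nights(
    nights: Iterable[int],
    run_dir: Path,
    has_processing_table: Callable[[int], bool],  # hypothetical callback
    count_queued_jobs: Callable[[], int],         # hypothetical callback
    submit_night: Callable[[int], None],          # hypothetical callback
    queue_threshold: int = DEFAULT_QUEUE_THRESHOLD,
) -> List[int]:
    """Submit nights in chronological order; return the nights submitted."""
    sentinel = run_dir / "sentinel.txt"
    if sentinel.exists():
        # Production is assumed done: do not even look at processing tables.
        return []

    submitted: List[int] = []
    all_nights_handled = True
    for night in sorted(nights):
        if has_processing_table(night):
            # An existing processing table means the night is treated as complete.
            continue
        if count_queued_jobs() > queue_threshold:
            # Too many jobs queued: stop here and let a later rerun continue.
            all_nights_handled = False
            break
        submit_night(night)
        submitted.append(night)

    if all_nights_handled:
        # Only written once every night has been submitted or skipped.
        sentinel.parent.mkdir(parents=True, exist_ok=True)
        sentinel.write_text("all nights submitted\n")
    return submitted
```

Because skipped nights cost nothing, calling this repeatedly (for example from a cron job) drains the night list across several invocations without exceeding the queue limit.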
Tests
Two test runs were performed, one of a few specified nights in:
/global/cfs/cdirs/desi/users/kremin/PRs/submit_prod/
The yaml file looks like:
> cat example.yaml
And a rerun of all of Jura (with dry_run_level=2):
/global/cfs/cdirs/desi/users/kremin/PRs/submit_jura/
The yaml file looks like:
> cat jura.yaml
Note that THRU_NIGHT, Z_SUBMIT_TYPES, and QUEUE are not strictly necessary, as they are set to defaults, but they are given here to be explicit.
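For reviewers who want to see how the defaults interact, here is a rough sketch of the night selection and default handling described in the implementation details. It is an illustration under assumptions, not the standalone function added in this PR: the name resolve_nights and the keys NIGHTS, FIRST_NIGHT, and LAST_NIGHT are hypothetical, and the list of science nights stands in for what the real code derives from the exposure_tables. Only THRU_NIGHT, Z_SUBMIT_TYPES, the 20201214 default, and the cumulative default are taken from the text.

```python
from typing import Any, Dict, Iterable

DEFAULT_FIRST_NIGHT = 20201214  # first night of Fuji and Iron, per the PR text


def resolve_nights(config: Dict[str, Any], science_nights: Iterable[int]) -> Dict[str, Any]:
    """Return the nights to process plus THRU_NIGHT and Z_SUBMIT_TYPES."""
    explicit = config.get("NIGHTS")  # hypothetical key for an explicit list
    if explicit:
        nights = sorted(int(n) for n in explicit)
        # Assumption: with an explicit list, the "last night" is its maximum.
        last = nights[-1]
    else:
        # FIRST_NIGHT / LAST_NIGHT are hypothetical key names.
        first = int(config.get("FIRST_NIGHT") or DEFAULT_FIRST_NIGHT)
        last = int(config["LAST_NIGHT"])
        # Keep only nights with science observations, inclusive on both ends.
        nights = sorted(n for n in science_nights if first <= n <= last)

    return {
        "nights": nights,
        # THRU_NIGHT falls back to the last night unless given explicitly.
        "THRU_NIGHT": int(config.get("THRU_NIGHT") or last),
        # Redshift jobs default to cumulative.
        "Z_SUBMIT_TYPES": list(config.get("Z_SUBMIT_TYPES") or ["cumulative"]),
    }
```

A dictionary is used as input so the sketch does not depend on a yaml parser; the contents of a loaded yaml file would be passed in the same way.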
Neither produced errors and the outputs look sensible. The tables in submit_prod are the same length as in Jura. I will do a more in-depth check of these output tables and provide an update in this PR.