The queue failure script appears to have a subtle bug in its cross-night dependency tracking. This came up in the daily production during cleanup activities. My hypothesis is that the tile was also problematic on another night and was rerun there, giving that job a new internal ID which no longer matches the one recorded in this night's processing table. That was a known shortcoming of the cross-night system.
Todo: look into this further and either fix the bug, if one is present, or think about ways to mitigate the shortcoming described above if that turns out to be the cause.
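For concreteness, here is a minimal sketch of the hypothesized failure mode; the table contents are illustrative, not taken from the actual proctables. If the dependency was rerun on its own night under a new internal ID, the stale INTID recorded in this night's table selects zero rows in the cross-night reference table, and indexing that empty selection raises the IndexError seen in the traceback below.

```python
from astropy.table import Table

# Cross-night reference proctable after a hypothetical rerun: the original
# internal ID 241017044 has been superseded by a new (illustrative) one.
reftab = Table({'INTID': [241017050]})

idep = 241017044  # stale internal ID recorded in this night's processing table
matches = reftab[reftab['INTID'] == idep]
print(len(matches))  # 0
# matches[0] would raise:
# IndexError: index 0 out of range for table with length 0
```

Relevant log excerpt and traceback from the failing daily run: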
INFO:processing.py:1273:recursive_submit_failed: Identified row 241114049 as needing resubmission.
INFO:processing.py:1274:recursive_submit_failed: 241114049: Expid(s): [262809] Job: cumulative
INFO:processing.py:1288:recursive_submit_failed: Internal ID: 241017044 not in id_to_row_map. This is expected since it's from another day.
INFO:proctable.py:647:read_minimal_full_proctab_cols: Loading the following processing tables for full processing table cache from directory: /global/cfs/cdirs/desi/spectro/redux/daily/processing_tables, filenames: ['processing_table_daily-20241017.csv']
INFO:proctable.py:680:read_minimal_full_proctab_cols: Caching processing table rows for full cache
INFO:proctable.py:712:update_full_ptab_cache: Replacing all current entries in processing table cache for nights [20241017]
INFO:queue.py:495:update_from_queue: qtable not provided, querying Slurm using ptab's LATEST_QID set
INFO:queue.py:502:update_from_queue: Querying Slurm for 0 QIDs from table of length 41.
INFO:queue.py:507:update_from_queue: No QIDs left to query. Returning the original table.
INFO:proctable.py:712:update_full_ptab_cache: Replacing all current entries in processing table cache for nights [20241017]
INFO:processing.py:1288:recursive_submit_failed: Internal ID: 241110049 not in id_to_row_map. This is expected since it's from another day.
INFO:proctable.py:647:read_minimal_full_proctab_cols: Loading the following processing tables for full processing table cache from directory: /global/cfs/cdirs/desi/spectro/redux/daily/processing_tables, filenames: ['processing_table_daily-20241110.csv']
INFO:proctable.py:680:read_minimal_full_proctab_cols: Caching processing table rows for full cache
INFO:proctable.py:712:update_full_ptab_cache: Replacing all current entries in processing table cache for nights [20241110]
INFO:queue.py:495:update_from_queue: qtable not provided, querying Slurm using ptab's LATEST_QID set
INFO:queue.py:502:update_from_queue: Querying Slurm for 13 QIDs from table of length 94.
INFO:queue.py:331:queue_info_from_qids: Querying Slurm with the following: sacct -X --parsable2 --delimiter=, --format=jobid,jobname,partition,submit,eligible,start,end,elapsed,state,exitcode -j 33078756,33078758,33078759,33078760,33078761,33078762,33078763,33078764,33078765,33078767,33078768,33078770,33078771
INFO:queue.py:511:update_from_queue: Slurm returned information on 13 jobs out of 94 jobs in the ptab. Updating those now.
Traceback (most recent call last):
  File "/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/code/desispec/main/bin/desi_proc_night", line 141, in <module>
    proc_night(**args.__dict__)
  File "/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/code/desispec/main/py/desispec/scripts/proc_night.py", line 376, in proc_night
    ptable, nsubmits = update_and_recursively_submit(ptable,
  File "/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/code/desispec/main/py/desispec/workflow/processing.py", line 1224, in update_and_recursively_submit
    proc_table, submits = recursive_submit_failed(rown, proc_table, submits,
  File "/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/code/desispec/main/py/desispec/workflow/processing.py", line 1300, in recursive_submit_failed
    entry = reftab[reftab['INTID'] == idep][0]
  File "/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/conda/lib/python3.10/site-packages/astropy/table/table.py", line 2064, in __getitem__
    return self.Row(self, item)
  File "/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/conda/lib/python3.10/site-packages/astropy/table/row.py", line 41, in __init__
    raise IndexError(
IndexError: index 0 out of range for table with length 0
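One possible mitigation, as a hedged sketch only: guard the lookup at processing.py:1300 in recursive_submit_failed so that a dependency whose INTID no longer exists in the cross-night table is handled gracefully instead of crashing. The helper name and the return-None behavior below are assumptions for illustration, not the project's actual fix; the real mitigation might instead re-resolve the dependency to the rerun job's new internal ID.

```python
from typing import Optional
from astropy.table import Table, Row

def lookup_crossnight_dep(reftab: Table, idep: int) -> Optional[Row]:
    """Return the proctable row whose INTID matches idep, or None if the
    dependency was rerun under a new internal ID and is no longer present.
    (Hypothetical helper; the real fix may re-derive the dependency instead.)"""
    matches = reftab[reftab['INTID'] == idep]
    if len(matches) == 0:
        return None
    return matches[0]

# At the call site, the caller could then warn and skip (or re-resolve the
# dependency) rather than assume the row exists:
#     entry = lookup_crossnight_dep(reftab, idep)
#     if entry is None:
#         log.warning(f"Internal ID {idep} not found in cross-night proctable")
```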