Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to update processing table column layout #2376

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

akremin
Copy link
Member

@akremin akremin commented Sep 19, 2024

This adds a command line script, desi_reformat_processing_tables, to update the processing table column layout. As of now, there have only been two versions of the processing tables. The old tables used CAMWORD instead of PROCCAMWORD used in the modern processing tables to represent the cameras used in the processing. The old tables also lack BADAMPS, LASTSTEP, and EXPFLAG.

The script takes a list of nights, range of nights, or "all" to run on all processing tables.

For each identified table, the script will rename CAMWORD to PROCCAMWORD if found. If BADAMPS, LASTSTEP, EXPFLAG, or any other column defined in desispec.workflow.proctable.get_processing_table_column_defs() is missing, the code will add the column where the value of each row is the default for that column type. If other columns are found in the existing table that aren't in the list of current columns given in desispec.workflow.proctable.get_processing_table_column_defs(), the column is removed.

Tests

I ran this at NERSC on a copy of the daily processing tables. The processing tables and a notebook verifying that the output files have all the expected columns can be found here:
/global/cfs/cdirs/desi/users/kremin/PRs/modernize_proctables/daily/

The old files that are replaced have a *.replaced-{TIMESTAMP}.* added to the file name. The new files have the expected name for a processing table of a specprod on a given night.

@coveralls
Copy link

coveralls commented Sep 19, 2024

Coverage Status

coverage: 30.091% (-0.07%) from 30.164%
when pulling 43cd53f on modernize_proctables
into 97b1748 on main.

Copy link
Contributor

@sbailey sbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put some cosmetic requests inline.

From desihub/desisurveyops#209, I thought that night 20210328 was one of the nights that needed to be updated, but when I tried to run a test on that night it says:

[login30 daily] desi_reformat_processing_tables --dry-run -n 20210328
Nights:  [20210328]
INFO:tableio.py:323:load_table: Found table: /global/cfs/cdirs/desi/users/sjbailey/spectro/redux/daily/processing_tables/processing_table_daily-20210328.csv
/global/cfs/cdirs/desi/users/sjbailey/spectro/redux/daily/processing_tables/processing_table_daily-20210328.csv has all of the expected columns, not updating this table.
Exposure table regenerations complete

Please check and/or post example nights that do need to be updated.

For the record: normally I'd be itching to ask for unit tests, but since I expect this to be a one-off script to update some tables and then be done with it, that seems less critical.

bin/desi_reformat_processing_tables Outdated Show resolved Hide resolved
py/desispec/scripts/reformat_proctables.py Outdated Show resolved Hide resolved
py/desispec/scripts/reformat_proctables.py Outdated Show resolved Hide resolved
orig_filetype: str. The file extension (without the '.') of the exposure
tables.
out_filetype: str. The file extension for the outputted exposure tables
(without the '.').
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that you've already implemented it, this may not matter, but what's the motivation for orig_filetype and out_filetype? Do we have any processing tables that aren't *.csv?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adapted the parser from desi_reformat_exptables which is much older. In the past exposure_tables and processing_tables were less mature and the file type was specified (with a default of csv). At this point, they are fundamental enough that changing the filetype is unlikely, but I kept it since I didn't see harm in allowing extra functionality.

@akremin
Copy link
Member Author

akremin commented Sep 20, 2024

I have made all of the requested changes. Unfortunately, now there is an issue with a docstring somewhere. I will resolve that tomorrow morning.

All nights on or before 20210208 require this reformatting. I will also investigate whether 20210328 needs reformatting that this is missing.

I also failed to mention that I ran tests at nersc on a copy of the daily processing tables. The processing tables and a notebook verifying that the output files have all the expected columns can be found here:
/global/cfs/cdirs/desi/users/kremin/PRs/modernize_proctables/daily/

The old files that are replaced have a *.replaced-{TIMESTAMP}.* added to the file name. The new files have the expected name for a processing table of a specprod on a given night.

@akremin
Copy link
Member Author

akremin commented Sep 20, 2024

@sbailey I have fixed the docstring issue. I also double-checked why 20210328 did not need updating. The crash on that night is unrelated to the processing tables format. The issue for that night is in updating results from the queue. The solution for 20210328 might require a further code change, but I'd like #2351 and #2376 merged first if you're happy with both. That way we can get all of the tables in the same format and I can adapt the queue-checking code off of #2351 which also modifies that code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants