DM-48178: Add documentation for DRP schemas. #299
base: main
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,20 @@ | ||||||
Data Release Production Schemas | ||||||
=============================== | ||||||
|
||||||
The Data Release Production (DRP) table schemas describe the `Object`, `Source`, `CcdVisit`, and `Visit` tables produced by either a regularly-tested "live" pipeline or a historical pipeline used in an important production. | ||||||
In the future all data release tables (`ForcedSource`, `DIASource`, `DIAObject`, etc.) will be included as well. | ||||||
|
||||||
When new major data release productions occur (e.g. a new Data Preview or Data Release), one of the live schemas is typically copied into a new file and adjusted to account for any differences specific to that production. | ||||||
|
||||||
In particular: | ||||||
I think this should be a complete sentence, but the exact verbiage I will leave to you. :) |
||||||
|
||||||
- `hsc.yaml` maps to the live pipelines as configured for the Subaru Hyper Suprime-Cam (HSC) instrument and its Subaru Strategic Program (HSC-SSP), one of the primary precursor datasets used for LSST development. | ||||||
|
||||||
The `ci_hsc_gen3` package (run nightly, as well as optionally prior to other pipeline code merges) in Jenkins tests that the schemas in this file match the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/HSC/DRP-ci_hsc.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/HSC/DRP-ci_hsc.yaml). | ||||||
|
||||||
The other HSC pipelines in `drp_pipe` should produce files with the same schemas as well, because they share almost all configuration with the `ci_hsc` pipeline. | ||||||
|
||||||
|
||||||
- `imsim.yaml` similarly maps to the live pipelines as configured for the LSST ImSim simulator, in particular as run for the LSST Dark Energy Science Collaboration's "Data Challenge 2" project (DESC DC2). | ||||||
|
||||||
This is the same simulated dataset used for LSST's Data Preview 0.1 and 0.2, but the pipelines have evolved considerably since those productions. | ||||||
The `ci_imsim` package (run nightly, as well as optionally prior to other pipeline code merges) in Jenkins tests that the schemas in this file match the Parquet datasets produced by the pipeline definition at [`drp_pipe/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml`](https://github.com/lsst/drp_pipe/blob/main/pipelines/LSSTCam-imSim/DRP-ci_imsim.yaml). | ||||||
|
||||||
The other `LSSTCam-imSim` pipelines in `drp_pipe` should produce files with the same schemas as well, because they share almost all configuration with the `ci_imsim` pipeline. | ||||||
|
||||||
These files must be updated whenever the final pipeline output tables change, but it is expected that these changes will usually be minor, since they are not formally change-controlled. | ||||||
Can you clarify what this sentence means? I'm not sure I understand how minor changes relates to "not formally change-controlled." |
||||||
The intent is that change control bodies will instead be involved when these live schemas are copied for new productions that will be released to science users. |
We have typically just used "schema" instead of "table schema" within the project docs. (This is just a minor stylistic suggestion. Feel free to ignore if you prefer the original wording.)
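For readers unfamiliar with the kind of check the `ci_hsc_gen3` and `ci_imsim` packages are described as performing, here is a minimal sketch of comparing the columns declared in a schema file against the columns found in the pipeline's output tables. It is an illustration under stated assumptions, not the actual CI implementation: the function and class names are hypothetical, the inputs are plain dictionaries of column names (loading the YAML schema and reading the Parquet files are left out), and the column names in the usage example are invented.

```python
from dataclasses import dataclass, field


@dataclass
class TableDiff:
    """Column-name differences for a single table (hypothetical helper)."""

    # Columns declared in the schema but absent from the pipeline output.
    missing_from_output: list[str] = field(default_factory=list)
    # Columns present in the pipeline output but not declared in the schema.
    missing_from_schema: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """True when schema and output agree on the set of column names."""
        return not (self.missing_from_output or self.missing_from_schema)


def compare_columns(
    schema: dict[str, list[str]],
    outputs: dict[str, list[str]],
) -> dict[str, TableDiff]:
    """Compare declared and produced column names, table by table.

    Both arguments map a table name (e.g. "Object") to a list of column
    names. A table that is declared in the schema but missing from the
    outputs is reported as having every column missing from the output.
    """
    diffs: dict[str, TableDiff] = {}
    for table, schema_cols in schema.items():
        declared = set(schema_cols)
        produced = set(outputs.get(table, []))
        diffs[table] = TableDiff(
            missing_from_output=sorted(declared - produced),
            missing_from_schema=sorted(produced - declared),
        )
    return diffs


if __name__ == "__main__":
    # Invented column names, for illustration only.
    schema = {"Object": ["objectId", "coord_ra"], "Visit": ["visitId"]}
    outputs = {"Object": ["objectId", "coord_dec"], "Visit": ["visitId"]}
    for table, diff in compare_columns(schema, outputs).items():
        print(table, "OK" if diff.ok else diff)
```

This sketch compares column names only. A fuller check would presumably also compare data types and other column metadata.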