Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move has_calibration from workflow_execution_set to data_generation_set #2235

Merged
merged 18 commits into from
Nov 1, 2024

Conversation

brynnz22
Copy link
Contributor

@brynnz22 brynnz22 commented Oct 30, 2024

This PR addresses issue https://github.com/microbiomedata/nmdc-schema/issues/2139#issuecomment-2445511119.

It moves the has_calibration slot from any workflow_exectuion doc that has it and moves it to its corresponding data_generation_doc A lot of the has_calibrations in the workflow exectuion docs have just false or False. This migrator removes those entirely since the calibration does not exist.

It also fixes the schema so the range of has_calibration is only CalibraitonInformation (removes any_of and string range)

It also moves the CalibrationInformation class and slots to the nmdc.yaml file instead of the workflow_execution.yaml file.

tagging @mslarae13 for FYI

PR Information

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation
  • Schema change: Structure and content
    • created, updated, or deleted a class, slot, or enum
    • changed whether a slot is multivalued
    • changed the way a slot is assigned to a class
    • changed the permissible_values of an enum
    • etc.
  • Schema change: Cleanup and preparation
    • updated the description of a class, slot, or enum
    • updated the mappings of a class, slot, or enum to an ontology
    • added an enum for future use (it is not in the range of any slot)
    • etc.

Could this schema change make it so any valid data becomes invalid?

This is a question about what the schema allows. It is not a question about what happens to exists in the NMDC database right now.

Example: If, in this PR branch, you renamed a slot from foo to foo_bar, the answer to this question would be "yes," even if nothing in the NMDC database currently uses the foo slot.

More examples: slot or class name changes, changes to a slot's multivalued state, changes to a slot's range (e.g. string to integer), changes to slot assignments to classes, changes to an enum's permissible_values

  • Yes (A migrator is required)
  • No
  • I need help determining this

If you answered "Yes", does this PR branch include that migrator?

  • Yes
  • No, this PR is incomplete and I need help writing the migrator

Does this PR have any downstream implications?

Examples: any change here that requires a change to workflows, workflow automation, the Mongo-to-Postgres ingest process, Jupyter notebooks, the Runtime, etc.

  • Yes (Explain below)
  • No

Copy link

github-actions bot commented Oct 30, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://microbiomedata.github.io/nmdc-schema/pr-preview/pr-2235/
on branch gh-pages at 2024-10-31 22:25 UTC

@brynnz22
Copy link
Contributor Author

brynnz22 commented Oct 30, 2024

@eecavanna @kheal I went ahead and just set up a partial migrator. Not sure if you want to do it this way @eecavanna, but let us know. Since @kheal has two PRs with migrations lined up (I know of this one: #2203 (comment)). After hers get merged in, I can merge main into this branch and move her migrators to the partial here.

Copy link
Contributor

@kheal kheal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tiny small change.

I pulled the branch and tested the migrator, looks great.

src/schema/nmdc.yaml Outdated Show resolved Hide resolved
@eecavanna
Copy link
Collaborator

eecavanna commented Oct 31, 2024

You can run the black code formatter without installing it, by copy/pasting the Python code into the left side of this web page — https://black.vercel.app/ — and then copying the formatted result from the right side.

I recommend doing that for the migrator as I noticed different closing parentheses and function arguments were indented by different relative amounts. Example (closing parentheses):

                    data_gen_doc = self.adapter.get_document_having_value_in_field(
                        collection_name="data_generation_set", field_name="id", value=workflow_execution_doc["was_informed_by"]
                        )

                    calibration_doc = self.adapter.get_document_having_value_in_field(
                        collection_name="calibration_set", field_name="calibration_object", value=has_calibration_data_obj
                    )

@eecavanna
Copy link
Collaborator

eecavanna commented Oct 31, 2024

Doing this as a partial migrator works for me — thanks for taking the lead on making that!

For the partial migrator here in this PR, I'd recommend putting something more granular/specific in the _to_version attribute (e.g. PR2235 or 11.1.0.part_1)

     _from_version = "11.0.3"
-    _to_version = "11.1.0"
+    _to_version = "11.1.0.part_1"

@eecavanna
Copy link
Collaborator

eecavanna commented Oct 31, 2024

It also fixes the schema so the range of has_calibration is only CalibraitonInformation (removes any_of and string range)

Woohoo! This will help resolve microbiomedata/refscan#26

brynnz22 and others added 2 commits October 31, 2024 15:16
…migrator_from_11_0_3_to_11_1_0_part_1.py


Thanks!!

Co-authored-by: eecavanna <[email protected]>
@brynnz22
Copy link
Contributor Author

Thanks @eecavanna I did your suggestions. The black.vercel does not work for me. I just reformatted, but I don't use a formatter because it was making a whole bunch of changes to existing code and showing in the git diffs. So I stopped using those. But I can try to activate them when I write from scratch and then deactivate moving forward.

Copy link
Collaborator

@eecavanna eecavanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The migration-related parts of this look good to me. I particularly like the thorough doctests. Thanks for addressing all my comments. I'm comfortable with this being merged in. I will run the code through black some time in the future.

@turbomam
Copy link
Member

turbomam commented Nov 1, 2024

Thanks, @brynnz22 ! How did you make you decisions about moving schema elements from nmdc.yaml to workflow_execution_activity.yaml? I don't see anything wrong about it, but I would like to hear your thought process (or similar thoughts from others). Deciding how many modules we should have and what should go in each (from a thematic and technical perspective) is unfinished work in my perspective.

@turbomam turbomam merged commit 51ac45e into main Nov 1, 2024
3 checks passed
@turbomam turbomam deleted the calibration-change branch November 1, 2024 14:36
@brynnz22
Copy link
Contributor Author

brynnz22 commented Nov 1, 2024

Hey @turbomam this was @kheal 's idea. I think it made more sense to be there since calibration is more general. Calibration is no longer on any Workflow executions so it did not make sense to be there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants