diff --git a/lib/workload/stateless/stacks/metadata-manager/README.md b/lib/workload/stateless/stacks/metadata-manager/README.md
index 2143035ea..68f56b99b 100644
--- a/lib/workload/stateless/stacks/metadata-manager/README.md
+++ b/lib/workload/stateless/stacks/metadata-manager/README.md
@@ -37,19 +37,27 @@ to the URL: `.../library?libraryId=LIB001`
 
 ## Schema
 
-This is the current (WIP) schema that reflects the current implementation.
+This is the current (WIP) schema that reflects the current implementation. The schema is based on the
+draft [draw.io in Google Drive](https://app.diagrams.net/#G10ryWSXORMo7Qj7ghvj37LHYqmMm4hXW-#%7B%22pageId%22%3A%22vfe626awnvWGlhOGvxTV%22%7D)
+.
 
 ![schema](docs/schema.drawio.svg)
 
 To modify the diagram, open the `docs/schema.drawio.svg` with [diagrams.net](https://app.diagrams.net/?src=about).
 
-`orcabus_id` is the unique identifier for each record in the database. It is generated by the application where the
-first 3 characters are the model prefix followed by [ULID](https://pypi.org/project/ulid-py/) separated by a dot (.).
-The prefix is as follows:
+The `orcabus_id` serves as the unique identifier for each record in the database. It is generated by the application
+using the [ULID](https://pypi.org/project/ulid-py/) library. When a record is accessed via the API, the `orcabus_id`
+is presented with a prefix consisting of three characters followed by a dot (.). The specific prefix varies depending
+on the model of the record.
 
-- Library model are `lib`
-- Specimen model are `spc`
-- Subject model are `sbj`
+| Model      | Prefix |
+|------------|--------|
+| Subject    | `sbj.` |
+| Sample     | `smp.` |
+| Library    | `lib.` |
+| Individual | `idv.` |
+| Contact    | `ctc.` |
+| Project    | `prj.` |
 
 ## How things work
 
@@ -59,22 +67,22 @@ In the near future, we might introduce different ways to load data into the appl
 loading data from the Google tracking sheet and mapping it to its respective model as follows.
 
-| Sheet Header      | Table      | Field Name           |
-|-------------------|------------|----------------------|
-| SubjectID         | `Subject`  | lab_subject_id       |
-| ExternalSubjectID | `Subject`  | subject_id           |
-| SampleID          | `Specimen` | sample_id            |
-| ExternalSampleID  | `Specimen` | external_specimen_id |
-| Source            | `Specimen` | source               |
-| LibraryID         | `Library`  | library_id           |
-| Phenotype         | `Library`  | phenotype            |
-| Workflow          | `Library`  | workflow             |
-| Quality           | `Library`  | quality              |
-| Type              | `Library`  | type                 |
-| Coverage (X)      | `Library`  | coverage             |
-| Assay             | `Library`  | assay                |
-| ProjectOwner      | `Library`  | project_owner        |
-| ProjectName       | `Library`  | project_name         |
+| Sheet Header      | Table        | Field Name         |
+|-------------------|--------------|--------------------|
+| SubjectID         | `Individual` | individual_id      |
+| ExternalSubjectID | `Subject`    | subject_id         |
+| SampleID          | `Sample`     | sample_id          |
+| ExternalSampleID  | `Sample`     | external_sample_id |
+| Source            | `Sample`     | source             |
+| LibraryID         | `Library`    | library_id         |
+| Phenotype         | `Library`    | phenotype          |
+| Workflow          | `Library`    | workflow           |
+| Quality           | `Library`    | quality            |
+| Type              | `Library`    | type               |
+| Coverage (X)      | `Library`    | coverage           |
+| Assay             | `Library`    | assay              |
+| ProjectName       | `Project`    | project_id         |
+| ProjectOwner      | `Contact`    | contact_id         |
 
 Some important notes of the sync:
 
diff --git a/lib/workload/stateless/stacks/metadata-manager/docs/schema.drawio.svg b/lib/workload/stateless/stacks/metadata-manager/docs/schema.drawio.svg
index 0984839ec..5ca6d206e 100644
--- a/lib/workload/stateless/stacks/metadata-manager/docs/schema.drawio.svg
+++ b/lib/workload/stateless/stacks/metadata-manager/docs/schema.drawio.svg
@@ -1,4 +1,4 @@
-LibraryPKorcabus_idlibrary_idphenotypeworkflow qualitytypeassaycoverageSubjectPKorcabus_idsubject_idsubject_idSamplePKorcabus_idsample_idexternal_sample_idProjectOwnerPKproject_ownerProjectNamePKproject_name
\ No newline at end of file
+LibraryPKorcabus_idlibrary_idphenotypeworkflow
qualitytypeassaycoverageSubjectPKorcabus_idsubject_idSamplePKorcabus_idsample_idexternal_sample_idsourceProjectPKorcabus_idproject_idnamedescriptionContactPKorcabus_idcontact_idnamedescriptionemailindividualPKorcabus_idindividual_idsource
\ No newline at end of file
diff --git a/lib/workload/stateless/stacks/metadata-manager/proc/service/tracking_sheet_srv.py b/lib/workload/stateless/stacks/metadata-manager/proc/service/tracking_sheet_srv.py
index 8913c4e74..aad78f4f8 100644
--- a/lib/workload/stateless/stacks/metadata-manager/proc/service/tracking_sheet_srv.py
+++ b/lib/workload/stateless/stacks/metadata-manager/proc/service/tracking_sheet_srv.py
@@ -76,6 +76,7 @@ def persist_lab_metadata(df: pd.DataFrame, sheet_year: str):
     # The data frame is to be the source of truth for the particular year
     # So we need to remove db records which are not in the data frame
     # Only doing this for library records and (dangling) sample/subject may be removed on a separate process
+    # Note: We do not remove many-to-many relationships if current df has changed
 
     # For the library_id we need craft the library_id prefix to match the year
     # E.g. year 2024, library_id prefix is 'L24' as what the Lab tracking sheet convention
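
Reviewer note: the two conventions touched by this diff (the per-model `orcabus_id` prefix and the year-based `library_id` prefix used to decide which library records to remove) might be sketched as below. This is a minimal illustration under stated assumptions, not the code in `tracking_sheet_srv.py`: the helper names `present_orcabus_id`, `library_prefix_for_year`, and `stale_library_ids` are hypothetical, the sample IDs are made up, and the real service works on a pandas DataFrame and database models rather than plain lists.

```python
# Hypothetical helpers for illustration only; none of these names come from the repository.

# Prefixes as listed in the README table added by this diff.
ORCABUS_PREFIX = {
    "Subject": "sbj.",
    "Sample": "smp.",
    "Library": "lib.",
    "Individual": "idv.",
    "Contact": "ctc.",
    "Project": "prj.",
}


def present_orcabus_id(model: str, ulid: str) -> str:
    """Return the ID as shown via the API: model prefix (three characters and a dot) plus the ULID."""
    return f"{ORCABUS_PREFIX[model]}{ulid}"


def library_prefix_for_year(sheet_year: str) -> str:
    """Build the library_id prefix for a sheet year, e.g. '2024' -> 'L24'."""
    return f"L{sheet_year[-2:]}"


def stale_library_ids(db_library_ids, sheet_library_ids, sheet_year):
    """Return library IDs stored for that year that no longer appear in the sheet.

    Assumption: the sheet is the source of truth for its year, so only IDs
    carrying that year's prefix are candidates for removal.
    """
    prefix = library_prefix_for_year(sheet_year)
    in_sheet = set(sheet_library_ids)
    return sorted(
        lib_id
        for lib_id in db_library_ids
        if lib_id.startswith(prefix) and lib_id not in in_sheet
    )
```

Filtering by the year prefix first keeps records from other years untouched when a single year's sheet is synced, which matches the intent stated in the code comments.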