Readme Update

umccr · Sep 18, 2024 · 9d8a901
1 parent bcf2887
commit 9d8a901

Showing 3 changed files with 33 additions and 24 deletions.
diff --git a/lib/workload/stateless/stacks/metadata-manager/README.md b/lib/workload/stateless/stacks/metadata-manager/README.md
@@ -37,19 +37,27 @@ to the URL: `.../library?libraryId=LIB001`
 
 ## Schema
 
-This is the current (WIP) schema that reflects the current implementation.
+This is the current (WIP) schema that reflects the current implementation. The schema is based on the
+draft [draw.io in Google Drive](https://app.diagrams.net/#G10ryWSXORMo7Qj7ghvj37LHYqmMm4hXW-#%7B%22pageId%22%3A%22vfe626awnvWGlhOGvxTV%22%7D)
+.
 
 ![schema](docs/schema.drawio.svg)
 
 To modify the diagram, open the `docs/schema.drawio.svg` with [diagrams.net](https://app.diagrams.net/?src=about).
 
-`orcabus_id` is the unique identifier for each record in the database. It is generated by the application where the
-first 3 characters are the model prefix followed by [ULID](https://pypi.org/project/ulid-py/) separated by a dot (.).
-The prefix is as follows:
+The `orcabus_id` serves as the unique identifier for each record in the database. It is generated by the application
+using the [ULID](https://pypi.org/project/ulid-py/) library. When a record is accessed via the API, the `orcabus_id`
+is presented with a prefix consisting of three characters followed by a dot (.). The specific prefix varies depending
+on the model of the record.
 
-- Library model are `lib`
-- Specimen model are `spc`
-- Subject model are `sbj`
+| Model      | Prefix |
+|------------|--------|
+| Subject    | `sbj.` | 
+| Sample     | `smp.` | 
+| Library    | `lib.` | 
+| Individual | `idv.` |
+| Contact    | `ctc.` | 
+| Project    | `prj.` |
 
 ## How things work
 
@@ -59,22 +67,22 @@ In the near future, we might introduce different ways to load data into the appl
 loading data
 from the Google tracking sheet and mapping it to its respective model as follows.
 
-| Sheet Header      | Table      | Field Name           |
-|-------------------|------------|----------------------|
-| SubjectID         | `Subject`  | lab_subject_id       |
-| ExternalSubjectID | `Subject`  | subject_id  |
-| SampleID          | `Specimen` | sample_id            |
-| ExternalSampleID  | `Specimen` | external_specimen_id |
-| Source            | `Specimen` | source               |
-| LibraryID         | `Library`  | library_id           |
-| Phenotype         | `Library`  | phenotype            |
-| Workflow          | `Library`  | workflow             |
-| Quality           | `Library`  | quality              |
-| Type              | `Library`  | type                 |
-| Coverage (X)      | `Library`  | coverage             |
-| Assay             | `Library`  | assay                |
-| ProjectOwner      | `Library`  | project_owner        |
-| ProjectName       | `Library`  | project_name         |
+| Sheet Header      | Table        | Field Name         |
+|-------------------|--------------|--------------------|
+| SubjectID         | `Individual` | individual_id      |
+| ExternalSubjectID | `Subject`    | subject_id         |
+| SampleID          | `Sample`     | sample_id          |
+| ExternalSampleID  | `Sample`     | external_sample_id |
+| Source            | `Sample`     | source             |
+| LibraryID         | `Library`    | library_id         |
+| Phenotype         | `Library`    | phenotype          |
+| Workflow          | `Library`    | workflow           |
+| Quality           | `Library`    | quality            |
+| Type              | `Library`    | type               |
+| Coverage (X)      | `Library`    | coverage           |
+| Assay             | `Library`    | assay              |
+| ProjectName       | `Project`    | project_id         |
+| ProjectOwner      | `Contact`    | contact_id         |
 
 Some important notes on the sync:
 
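Reviewer sketch (not part of the commit): the prefix table added to the README can be read as a lookup from model name to a three-character prefix plus a dot, applied only when an `orcabus_id` is presented through the API. The helper names `to_api_id` and `from_api_id` and the sample ULID value are hypothetical; only the prefixes themselves come from the README.

```python
# Prefixes copied from the README table; everything else here is illustrative.
MODEL_PREFIXES = {
    "Subject": "sbj.",
    "Sample": "smp.",
    "Library": "lib.",
    "Individual": "idv.",
    "Contact": "ctc.",
    "Project": "prj.",
}


def to_api_id(model: str, orcabus_id: str) -> str:
    """Return the stored ULID with the model prefix, as shown via the API."""
    return MODEL_PREFIXES[model] + orcabus_id


def from_api_id(api_id: str) -> str:
    """Strip a known model prefix, if present, to recover the bare ULID."""
    prefix, sep, rest = api_id.partition(".")
    if sep and f"{prefix}." in MODEL_PREFIXES.values():
        return rest
    # No known prefix: assume the caller already passed a bare ULID.
    return api_id


# Made-up ULID value, for illustration only.
example_ulid = "01J5M2JFE1JPYV62RYQEG99CP5"
print(to_api_id("Library", example_ulid))
```

Because a ULID contains no dot, splitting on the first dot is enough to separate the prefix from the identifier in this sketch.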

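Reviewer sketch (not part of the commit): the updated sheet-header table can be expressed as a plain mapping from tracking sheet column to `(table, field)`. The function `split_row` and the sample row values are hypothetical and only illustrate how one sheet row fans out across the models; the real loader in `tracking_sheet_srv.py` may work quite differently.

```python
from collections import defaultdict

# Header -> (table, field), copied from the updated README table.
SHEET_TO_MODEL = {
    "SubjectID": ("Individual", "individual_id"),
    "ExternalSubjectID": ("Subject", "subject_id"),
    "SampleID": ("Sample", "sample_id"),
    "ExternalSampleID": ("Sample", "external_sample_id"),
    "Source": ("Sample", "source"),
    "LibraryID": ("Library", "library_id"),
    "Phenotype": ("Library", "phenotype"),
    "Workflow": ("Library", "workflow"),
    "Quality": ("Library", "quality"),
    "Type": ("Library", "type"),
    "Coverage (X)": ("Library", "coverage"),
    "Assay": ("Library", "assay"),
    "ProjectName": ("Project", "project_id"),
    "ProjectOwner": ("Contact", "contact_id"),
}


def split_row(row: dict) -> dict:
    """Group one sheet row's values by target table, using model field names."""
    grouped = defaultdict(dict)
    for header, value in row.items():
        if header not in SHEET_TO_MODEL:
            continue  # columns outside the README table are ignored in this sketch
        table, field = SHEET_TO_MODEL[header]
        grouped[table][field] = value
    return dict(grouped)


# Made-up row values, for illustration only.
row = {"SubjectID": "SBJ001", "LibraryID": "L2400001", "ProjectName": "ExampleProject"}
print(split_row(row))
```

Note that, per the table, `SubjectID` now feeds the `Individual` model, while `ProjectName` and `ProjectOwner` move from `Library` fields to their own `Project` and `Contact` models.
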
diff --git a/lib/workload/stateless/stacks/metadata-manager/docs/schema.drawio.svg b/lib/workload/stateless/stacks/metadata-manager/docs/schema.drawio.svg
diff --git a/lib/workload/stateless/stacks/metadata-manager/proc/service/tracking_sheet_srv.py b/lib/workload/stateless/stacks/metadata-manager/proc/service/tracking_sheet_srv.py
@@ -76,6 +76,7 @@ def persist_lab_metadata(df: pd.DataFrame, sheet_year: str):
     # The data frame is to be the source of truth for the particular year
     # So we need to remove db records which are not in the data frame
     # Only doing this for library records and (dangling) sample/subject may be removed on a separate process
+    # Note: We do not remove many-to-many relationships if current df has changed
 
     # For the library_id we need to craft the library_id prefix to match the year
     # E.g. for year 2024, the library_id prefix is 'L24', following the Lab tracking sheet convention
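
Reviewer sketch (not part of the commit): the comments above describe deriving a year-specific `library_id` prefix and removing database libraries that no longer appear in that year's sheet. A minimal, ORM-free illustration follows, assuming the prefix is `L` plus the last two digits of the year, as in the `'L24'` example. The function names and sample IDs are hypothetical; the actual `persist_lab_metadata` works on a DataFrame and database records.

```python
def library_prefix_for_year(sheet_year: str) -> str:
    """Build the library_id prefix for a sheet year, e.g. '2024' -> 'L24'."""
    return f"L{sheet_year[-2:]}"


def stale_library_ids(db_library_ids, sheet_library_ids, sheet_year: str) -> list:
    """Return DB library IDs for this year that are absent from the sheet.

    Only IDs carrying the year's prefix are considered, so records that
    belong to other years' sheets are left alone.
    """
    prefix = library_prefix_for_year(sheet_year)
    current = set(sheet_library_ids)
    return sorted(
        lib_id
        for lib_id in db_library_ids
        if lib_id.startswith(prefix) and lib_id not in current
    )


# Made-up IDs, for illustration only.
in_db = ["L2400001", "L2400002", "L2300001"]
in_sheet = ["L2400001"]
print(stale_library_ids(in_db, in_sheet, "2024"))
```

In this sketch only library IDs are compared; consistent with the comments in the diff, dangling sample or subject records and many-to-many relationships are not touched.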