Readme Update
williamputraintan committed Sep 18, 2024
1 parent bcf2887 commit 9d8a901
Showing 3 changed files with 33 additions and 24 deletions.
54 changes: 31 additions & 23 deletions lib/workload/stateless/stacks/metadata-manager/README.md
@@ -37,19 +37,27 @@ to the URL: `.../library?libraryId=LIB001`

## Schema

-This is the current (WIP) schema that reflects the current implementation.
+This is the current (WIP) schema that reflects the current implementation. The schema is based on the
+draft [draw.io in Google Drive](https://app.diagrams.net/#G10ryWSXORMo7Qj7ghvj37LHYqmMm4hXW-#%7B%22pageId%22%3A%22vfe626awnvWGlhOGvxTV%22%7D)
+.

![schema](docs/schema.drawio.svg)

To modify the diagram, open the `docs/schema.drawio.svg` with [diagrams.net](https://app.diagrams.net/?src=about).

-`orcabus_id` is the unique identifier for each record in the database. It is generated by the application where the
-first 3 characters are the model prefix followed by [ULID](https://pypi.org/project/ulid-py/) separated by a dot (.).
-The prefix is as follows:
+The `orcabus_id` serves as the unique identifier for each record in the database. It is generated by the application
+using the [ULID](https://pypi.org/project/ulid-py/) library. When a record is accessed via the API, the `orcabus_id`
+is presented with a prefix consisting of three characters followed by a dot (.). The specific prefix varies depending
+on the model of the record.

-- Library model are `lib`
-- Specimen model are `spc`
-- Subject model are `sbj`
+| Model | Prefix |
+|------------|--------|
+| Subject | `sbj.` |
+| Sample | `smp.` |
+| Library | `lib.` |
+| Individual | `idv.` |
+| Contact | `ctc.` |
+| Project | `prj.` |
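
The prefix convention in the new table can be illustrated with a short sketch. The helper names `to_api_id` and `from_api_id` and the `PREFIXES` dictionary are hypothetical and are not taken from the repository; the ULID used in the usage note is an arbitrary 26-character example value.

```python
# Hypothetical helpers illustrating the prefix convention; not the repository's code.
# Model name -> three-character prefix, as listed in the table.
PREFIXES = {
    "Subject": "sbj",
    "Sample": "smp",
    "Library": "lib",
    "Individual": "idv",
    "Contact": "ctc",
    "Project": "prj",
}


def to_api_id(model: str, ulid: str) -> str:
    """Return the ID as the API would present it: prefix, a dot, then the ULID."""
    return f"{PREFIXES[model]}.{ulid}"


def from_api_id(api_id: str):
    """Split a prefixed ID into (model name, bare ULID).

    Raises ValueError if the prefix is missing or not in PREFIXES.
    """
    prefix, sep, ulid = api_id.partition(".")
    if sep:
        for model, known_prefix in PREFIXES.items():
            if known_prefix == prefix:
                return model, ulid
    raise ValueError(f"unrecognised orcabus_id prefix: {api_id!r}")
```

Under these assumptions, `to_api_id("Library", "01J5M2J44HFJ9424G7074NKTGN")` yields `lib.01J5M2J44HFJ9424G7074NKTGN`, and `from_api_id` reverses it.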

## How things work

@@ -59,22 +67,22 @@ In the near future, we might introduce different ways to load data into the appl
loading data
from the Google tracking sheet and mapping it to its respective model as follows.

-| Sheet Header | Table | Field Name |
-|-------------------|------------|----------------------|
-| SubjectID | `Subject` | lab_subject_id |
-| ExternalSubjectID | `Subject` | subject_id |
-| SampleID | `Specimen` | sample_id |
-| ExternalSampleID | `Specimen` | external_specimen_id |
-| Source | `Specimen` | source |
-| LibraryID | `Library` | library_id |
-| Phenotype | `Library` | phenotype |
-| Workflow | `Library` | workflow |
-| Quality | `Library` | quality |
-| Type | `Library` | type |
-| Coverage (X) | `Library` | coverage |
-| Assay | `Library` | assay |
-| ProjectOwner | `Library` | project_owner |
-| ProjectName | `Library` | project_name |
+| Sheet Header | Table | Field Name |
+|-------------------|--------------|--------------------|
+| SubjectID | `Individual` | individual_id |
+| ExternalSubjectID | `Subject` | subject_id |
+| SampleID | `Sample` | sample_id |
+| ExternalSampleID | `Sample` | external_sample_id |
+| Source | `Sample` | source |
+| LibraryID | `Library` | library_id |
+| Phenotype | `Library` | phenotype |
+| Workflow | `Library` | workflow |
+| Quality | `Library` | quality |
+| Type | `Library` | type |
+| Coverage (X) | `Library` | coverage |
+| Assay | `Library` | assay |
+| ProjectName | `Project` | project_id |
+| ProjectOwner | `Contact` | contact_id |
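
As a rough illustration of the updated mapping, the sketch below groups one tracking-sheet row by target table. `SHEET_MAPPING` copies the new version of the table; the function name `split_row_by_table` and the choice to skip unmapped columns are assumptions made for this example, not the application's actual loader.

```python
# Sheet header -> (table, field name), copied from the updated mapping table.
SHEET_MAPPING = {
    "SubjectID": ("Individual", "individual_id"),
    "ExternalSubjectID": ("Subject", "subject_id"),
    "SampleID": ("Sample", "sample_id"),
    "ExternalSampleID": ("Sample", "external_sample_id"),
    "Source": ("Sample", "source"),
    "LibraryID": ("Library", "library_id"),
    "Phenotype": ("Library", "phenotype"),
    "Workflow": ("Library", "workflow"),
    "Quality": ("Library", "quality"),
    "Type": ("Library", "type"),
    "Coverage (X)": ("Library", "coverage"),
    "Assay": ("Library", "assay"),
    "ProjectName": ("Project", "project_id"),
    "ProjectOwner": ("Contact", "contact_id"),
}


def split_row_by_table(row: dict) -> dict:
    """Group one sheet row's values by target table, renaming headers to field names."""
    grouped = {}
    for header, value in row.items():
        if header not in SHEET_MAPPING:
            continue  # assumption: columns outside the mapping are ignored
        table, field = SHEET_MAPPING[header]
        grouped.setdefault(table, {})[field] = value
    return grouped
```

For example, a row with `SubjectID`, `LibraryID` and `ProjectName` columns would produce one dictionary each for `Individual`, `Library` and `Project`.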

Some important notes of the sync:

@@ -76,6 +76,7 @@ def persist_lab_metadata(df: pd.DataFrame, sheet_year: str):
# The data frame is to be the source of truth for the particular year
# So we need to remove db records which are not in the data frame
# Only doing this for library records and (dangling) sample/subject may be removed on a separate process
+# Note: We do not remove many-to-many relationships if current df has changed

# For the library_id we need craft the library_id prefix to match the year
# E.g. year 2024, library_id prefix is 'L24' as what the Lab tracking sheet convention
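
The comments in this hunk describe two steps: deriving the library ID prefix from the sheet year, and finding database records for that year that no longer appear in the sheet. A minimal sketch of that logic follows, using plain Python collections rather than the DataFrame and database records the real function works with. Both function names are hypothetical.

```python
def library_prefix_for_year(sheet_year: str) -> str:
    """Build the library_id prefix for a sheet year, e.g. '2024' -> 'L24'."""
    return f"L{sheet_year[-2:]}"


def stale_library_ids(db_library_ids, sheet_library_ids, sheet_year: str) -> set:
    """Return library IDs stored for this year that are absent from the sheet.

    The sheet is treated as the source of truth for its year, so these are
    the IDs that would be candidates for removal. IDs from other years are
    left alone because they do not match the year prefix.
    """
    prefix = library_prefix_for_year(sheet_year)
    in_db_for_year = {lib_id for lib_id in db_library_ids if lib_id.startswith(prefix)}
    return in_db_for_year - set(sheet_library_ids)
```

Filtering by prefix first is what keeps a 2024 sync from touching records that belong to other years' sheets.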