diff --git a/lib/workload/stateless/stacks/metadata-manager/README.md b/lib/workload/stateless/stacks/metadata-manager/README.md
index 77242c0e0..1d4ccf5db 100644
--- a/lib/workload/stateless/stacks/metadata-manager/README.md
+++ b/lib/workload/stateless/stacks/metadata-manager/README.md
@@ -43,8 +43,10 @@ This is the current (WIP) schema that reflects the current implementation.

 To modify the diagram, open the `docs/schema.drawio.svg` with [diagrams.net](https://app.diagrams.net/?src=about).

-`orcabus_id` is the unique identifier for each record in the database. It is generated by the application where the first 3 characters are the model prefix followed by [ULID](https://pypi.org/project/ulid-py/) separated by a dot (.).
+`orcabus_id` is the unique identifier for each record in the database. It is generated by the application where the
+first 3 characters are the model prefix followed by [ULID](https://pypi.org/project/ulid-py/) separated by a dot (.).
 The prefix is as follows:
+
 - Library model are `lib`
 - Specimen model are `spc`
 - Subject model are `sbj`
@@ -72,13 +74,13 @@ from the Google tracking sheet and mapping it to its respective model as follows
 | ProjectOwner | `Library` | project_owner |
 | ProjectName | `Library` | project_name |

-
 Some important notes of the sync:

 1. The sync will only run from the current year.
-2. The tracking sheet is the single source of truth, any deletion/update on any record (including the record that has
-   been
-   loaded) will also apply to the existing data.
+2. The tracking sheet is the single source of truth for the current year. Any deletion or update to existing records
+   will be applied based on their internal IDs (`library_id`, `specimen_id`, and `subject_id`). For the library
+   model, the deletion will only occur based on the current year's prefix. For example, syncing the 2024 tracking
+   sheet will only query libraries with `library_id` starting with `L24` to determine whether to delete it.
 3. `LibraryId` is treated as a unique value in the tracking sheet, so for any duplicated value (including from other
    tabs) it will only recognize the last appearance.
 4. In cases where multiple records share the same unique identifier (such as SampleId), only the data from the most
diff --git a/lib/workload/stateless/stacks/metadata-manager/deploy/README.md b/lib/workload/stateless/stacks/metadata-manager/deploy/README.md
index 32fefc5c8..554400d13 100644
--- a/lib/workload/stateless/stacks/metadata-manager/deploy/README.md
+++ b/lib/workload/stateless/stacks/metadata-manager/deploy/README.md
@@ -40,22 +40,21 @@ To query in a local terminal
 gsheet_sync_lambda_arn=$(aws ssm get-parameter --name '/orcabus/metadata-manager/sync-gsheet-lambda-arn' --with-decryption | jq -r .Parameter.Value)
 ```

-The lambda handler will accept an array of years from which sheet to run from the GSheet workbook. If no year is specified, it will run the current year.
+The lambda handler will accept a single year from which sheet to run from the GSheet workbook. If no year is specified, it will run the current year.

 ```json
 {
-  "year":["2024"]
+  "year": "2024"
 }
 ```

-Note that if you specify more than one year at a single invoke (e.g. `["2020", "2021"]`), there are high chances that lambda
-would timeout and the sync is not completed properly.
+Invoking lambda cmd:

 ```sh
 aws lambda invoke \
   --function-name $gsheet_sync_lambda_arn \
   --invocation-type Event \
-  --payload '{ "year": ["2024"] }' \
+  --payload '{ "year": "2024" }' \
   --cli-binary-format raw-in-base64-out \
   res.json
 ```
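
The year-scoped deletion rule and the single-year payload described in this change can be illustrated with a minimal Python sketch. It is not the metadata-manager implementation: the helper names (`year_prefix`, `deletion_candidates`, `build_payload`) and the sample library IDs are hypothetical. It also assumes that a library is a deletion candidate when its `library_id` carries the year's prefix but no longer appears in that year's sheet.

```python
import json
from datetime import date
from typing import Iterable, List, Optional


def year_prefix(year: str) -> str:
    """Map a tracking-sheet year to its library_id prefix, e.g. "2024" -> "L24"."""
    if len(year) != 4 or not year.isdigit():
        raise ValueError(f"expected a four-digit year, got {year!r}")
    return f"L{year[-2:]}"


def deletion_candidates(
    existing_ids: Iterable[str], sheet_ids: Iterable[str], year: str
) -> List[str]:
    """Return existing library IDs for `year` that are absent from the sheet.

    Assumption: only IDs starting with the year's prefix are considered,
    so libraries from other years are never touched by this sync.
    """
    prefix = year_prefix(year)
    in_sheet = set(sheet_ids)
    return sorted(
        lib_id
        for lib_id in existing_ids
        if lib_id.startswith(prefix) and lib_id not in in_sheet
    )


def build_payload(year: Optional[str] = None) -> str:
    """Build the single-year JSON payload; fall back to the current year.

    The fallback mirrors the handler's documented default, made explicit here.
    """
    return json.dumps({"year": year or str(date.today().year)})
```

With made-up IDs, `deletion_candidates(["L2400001", "L2400002", "L2300009"], ["L2400001"], "2024")` leaves the `L23` record alone and flags only `L2400002`. The string returned by `build_payload("2024")` has the same shape as the `--payload` value shown in the invoke command.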