feat: initial completeness-tracking work #304

Merged
merged 1 commit into from
Apr 9, 2024


@mikix mikix commented Mar 26, 2024

This is early support for completeness-tracking (the ability to mark which groups & resources have been loaded by the ETL and are thus ready for studies to use).

  • Adds some new (secret) CLI arguments: --export-group, --export-timestamp, and --write-completion
  • Adds a new etl__completion table which holds:
    • table
    • group
    • export_time
  • Adds a new etl__completion_encounters table which holds:
    • encounter_id
    • group
    • export_time
  • This table is automatically written to, using the CLI values
  • Currently, those arguments are optional. A future change will make them required (though hopefully they will usually be inferred automatically from export logs).
  • When using the ndjson output format, the output folder must now be empty; it can no longer contain any pre-existing files. This is to safeguard against accidents (and to make some code paths simpler).
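
As a rough illustration of the two table shapes listed above, here is a minimal sketch in Python. Only the column names come from this PR; the class names, the `build_completion_rows` helper, the string column types, and the ISO 8601 formatting of `export_time` are all assumptions made for the example, not the ETL's actual code.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CompletionRow:
    """One row of etl__completion (column types are assumed)."""
    table: str
    group: str
    export_time: str


@dataclass(frozen=True)
class CompletionEncounterRow:
    """One row of etl__completion_encounters (column types are assumed)."""
    encounter_id: str
    group: str
    export_time: str


def build_completion_rows(tables, encounter_ids, export_group, export_timestamp):
    """Hypothetical helper: turn the CLI values into rows for both tables."""
    # Normalize to UTC and store as an ISO 8601 string (an assumption).
    export_time = export_timestamp.astimezone(timezone.utc).isoformat()
    table_rows = [CompletionRow(t, export_group, export_time) for t in tables]
    encounter_rows = [
        CompletionEncounterRow(e, export_group, export_time) for e in encounter_ids
    ]
    return table_rows, encounter_rows


# Example values standing in for --export-group and --export-timestamp.
rows, enc_rows = build_completion_rows(
    tables=["condition", "encounter"],
    encounter_ids=["enc-1"],
    export_group="example-group",
    export_timestamp=datetime(2024, 3, 26, tzinfo=timezone.utc),
)
print(asdict(rows[0]))
```

Every row shares the same `group` and `export_time`, so a study could in principle check whether a given table or encounter has been loaded for a given export.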

This PR starts to address #296 (Library side here)

Future Plans
This PR is just a preparatory, disabled-by-default step 1. These bits of work still have to be done to make this feature real (and required) across Cumulus:

  • Parse export logs to autodetect arg values
  • Require the new ETL args (from user, from log, or from our own export)
  • Prevent overwriting new group data with old, to solve Epic use cases (check dates before deltalake write)
  • Generate export logs from our own bulk export implementation, allowing us to parse it back later
  • Backfill/migration support (maybe as simple as running --group-name legacy --timestamp 2020 for all current encounters)
  • Handle empty input sets - may require patching the bulk exporter to leave an empty file in those cases
  • User docs
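
To make the "check dates before deltalake write" item a bit more concrete, here is a minimal sketch of one possible guard. Nothing like this exists in the PR yet; the `should_write` name is hypothetical, and so is the choice to allow a write when the incoming export time equals the stored one (so that re-running the same export is not blocked).

```python
from datetime import datetime, timezone


def should_write(existing_export_time, incoming_export_time):
    """Hypothetical guard: allow a write only if it would not replace newer data.

    existing_export_time is None when nothing has been recorded for the group yet.
    """
    if existing_export_time is None:
        return True
    # Equal timestamps are allowed so the same export can be re-run (an assumption).
    return incoming_export_time >= existing_export_time


old_export = datetime(2020, 1, 1, tzinfo=timezone.utc)
new_export = datetime(2024, 3, 26, tzinfo=timezone.utc)

print(should_write(None, new_export))        # first load for this group
print(should_write(old_export, new_export))  # newer export replaces older data
print(should_write(new_export, old_export))  # older export must not overwrite newer data
```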

Checklist

  • Consider if documentation (like in docs/) needs to be updated
  • Consider if tests should be added

@mikix mikix force-pushed the mikix/completion1 branch from 501eb88 to e15692f Compare March 26, 2024 17:36
@mikix mikix force-pushed the mikix/completion1 branch 5 times, most recently from 6aead50 to 9d815e3 Compare April 8, 2024 13:04
@mikix mikix changed the title WIP: feat: initial completeness-tracking work feat: initial completeness-tracking work Apr 8, 2024
@mikix mikix marked this pull request as ready for review April 8, 2024 14:11
This is early support for completeness-tracking (the ability to mark
which groups & resources have been loaded by the ETL and are thus ready
for studies to use).

- Adds some new (secret for now) CLI arguments:
  --export-group
  --export-timestamp
  --write-completion
- Adds a new `etl__completion` table which holds:
  - table
  - group
  - export_time
- Adds a new `etl__completion_encounters` table which holds:
  - group
  - encounter_id
  - export_time
- This table is automatically written to, using the CLI values
- Currently, those arguments are optional. A future change will make
  them required (though hopefully they will usually be inferred
  automatically from export logs).
- The export args will be automatically provided internally, if we are
  handling the bulk export ourselves (i.e. Loaders can provide group
  name and export timestamp).
- When using the ndjson output format, the output folder must now be
  empty; it can no longer contain any pre-existing files. This is to
  safeguard against accidents (and to make some code paths simpler).
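
The ndjson output-folder restriction could be enforced with a check along these lines. This is only a sketch: `ensure_empty_output_dir` is a hypothetical name, and treating subfolders as well as files as blocking entries is an assumption, since the commit message only mentions files.

```python
import os
import tempfile


def ensure_empty_output_dir(path):
    """Hypothetical check: create the folder if needed, then refuse to reuse it
    if it already contains anything."""
    os.makedirs(path, exist_ok=True)
    with os.scandir(path) as entries:
        # Any entry at all (file or subfolder) counts as "not empty" here.
        if next(entries, None) is not None:
            raise FileExistsError(
                f"Output folder '{path}' must be empty when using the ndjson format."
            )


# Small demonstration in a throwaway folder.
with tempfile.TemporaryDirectory() as tmpdir:
    ensure_empty_output_dir(tmpdir)  # empty folder: accepted
    open(os.path.join(tmpdir, "leftover.ndjson"), "w").close()
    try:
        ensure_empty_output_dir(tmpdir)
        blocked = False
    except FileExistsError:
        blocked = True

print(blocked)
```
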
@mikix mikix force-pushed the mikix/completion1 branch from 9d815e3 to b464da3 Compare April 9, 2024 16:41
@mikix mikix merged commit 1bedc98 into main Apr 9, 2024
3 checks passed
@mikix mikix deleted the mikix/completion1 branch April 9, 2024 17:13
2 participants