Mark which resources are loaded for which patients (i.e. "completion tracking") #296
Further thinking about this. tl;dr: Let's track Encounters as our primary completeness kernel-of-truth (rather than patients, as the description above talks about). Studies can ignore any and all Encounter-linked data if it's not loaded yet. There are two use cases I can think of:
Those are closely related, but a little different. The first is about incompleteness at a broad scale leading to a diminished ability to do meaningful analysis. The second is about incompleteness at a small scale leading to inaccurate analysis.
Solving for the first (incompleteness aka "Is there an ongoing ingestion process?"):
Solving for the second (inaccuracy aka "How to stop the data from lying to me during an ingestion process?"):
Problem scenarios, tricky to get right even with the above Encounter-oriented thinking:
Some of the above is probably helped a lot by doing resource exports at the same time. And then we could probably try to use So:
This would also let us catch probable mistakes like loading old data on top of newer data by looking at the export timestamp you are providing. (important in the Cerner context, which doesn't have
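To make the export-timestamp idea concrete, here is a minimal sketch of the "old data on top of newer data" check. The function name `is_stale_export` and the idea of keeping the last loaded export time per group/resource are assumptions for illustration, not what the ETL actually does.

```python
from datetime import datetime
from typing import Optional


def is_stale_export(new_export: datetime, last_loaded: Optional[datetime]) -> bool:
    """Return True when the incoming export is older than the one already loaded.

    Hypothetical helper: `last_loaded` is None when nothing has been loaded yet.
    """
    if last_loaded is None:
        return False  # nothing to compare against, so nothing can be overwritten
    return new_export < last_loaded


# Timezone-aware timestamps, so the comparison is unambiguous.
previous = datetime.fromisoformat("2024-03-01T00:00:00+00:00")
older = datetime.fromisoformat("2024-02-01T00:00:00+00:00")
newer = datetime.fromisoformat("2024-04-01T00:00:00+00:00")

stale_check = is_stale_export(older, previous)
fresh_check = is_stale_export(newer, previous)
first_load_check = is_stale_export(older, None)
```

A caller could warn or refuse the load when the check is true, rather than silently overwriting newer rows.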
Current status
This mostly works! 🎉 ... But you have to opt in. You can manually enable this feature on the ETL side, and the Library will automatically respect the tracking:
What does completion tracking actually do again?
The Library
See code.
Remaining work
Ideally, completion tracking would be enabled by default. But before flipping that switch, this is the remaining work to be done:
Empty input set thoughts
Update on the empty input set problem: I've gone with a solution that assumes it's rare you actually want to do it - the ETL refuses to upload an empty set unless you pass an Once that's landed, I think the only remaining tasks blocking enabling this by default are:
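For reference, the empty-input-set guard described in this comment could look roughly like the sketch below. The names `check_upload`, `EmptyInputError`, and the `allow_empty` parameter are all hypothetical stand-ins; the real option name is not shown in this thread.

```python
class EmptyInputError(ValueError):
    """Raised when an upload has no rows and the caller did not opt in."""


def check_upload(rows, allow_empty=False):
    """Return the rows as a list, refusing an empty set unless explicitly allowed.

    `allow_empty` is a hypothetical stand-in for the ETL's opt-in option.
    """
    rows = list(rows)
    if not rows and not allow_empty:
        raise EmptyInputError("refusing to upload an empty input set")
    return rows


# Normal case: a non-empty set passes straight through.
accepted = check_upload([{"id": "enc-1"}])

# Rare case: the caller really does want to record an empty set.
accepted_empty = check_upload([], allow_empty=True)
```

The default-refuse behavior matches the assumption that an empty set is usually a mistake rather than an intentional upload.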
There is one other new thought: look into whether we can make the completion table non-unique for group/resource (i.e. record every time we push to the table, not just the latest time). This would give us more provenance information, but would maybe require changes to the completion code in Library & ETL - have to confirm what would be necessary there.
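A rough sketch of what a non-unique completion table might mean for readers of that table: every push is kept as its own row, and consumers pick the latest row per group/resource themselves. The table name `etl_completion_log` and its columns are assumptions for illustration, not the real schema.

```python
import sqlite3


def latest_pushes(conn):
    """Return (group_id, resource, latest export_time) for each group/resource pair."""
    return conn.execute(
        """
        SELECT group_id, resource, MAX(export_time) AS latest_export
        FROM etl_completion_log
        GROUP BY group_id, resource
        ORDER BY group_id, resource
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical append-only log: one row per push, no uniqueness constraint.
    CREATE TABLE etl_completion_log (group_id TEXT, resource TEXT, export_time TEXT);
    INSERT INTO etl_completion_log VALUES
        ('group-a', 'Condition', '2024-01-01T00:00:00Z'),
        ('group-a', 'Condition', '2024-02-01T00:00:00Z'),
        ('group-a', 'Encounter', '2024-01-15T00:00:00Z');
    """
)

latest = latest_pushes(conn)
push_count = conn.execute("SELECT COUNT(*) FROM etl_completion_log").fetchone()[0]
```

The full history stays available for provenance, while code that only cares about the current state reads the aggregated view. Timestamps here are ISO 8601 strings in one fixed format, so `MAX` on text orders them correctly.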
Now that completion tracking is enabled by default, I'll close this ticket. I spun a remaining useful item into its own ticket: #356
This comes from a study need:
core__ tables) would probably want to ignore patients that don't have the resources they care about loaded.
My initial thoughts on this are to have the ETL keep a metadata table around, marking which resources are "finished" at the Group level. And then which patients belong to which Groups. That way a study could ask if patient X has Conditions yet.
Brainstorming for that approach:
~~patients~~ encounters: ~~patients~~ encounters, ETL will also write all row IDs to a table like ~~etl_patient_groups~~ etl_encounter_groups (~~PATIENT_ID~~ ENCOUNTER_ID, GROUP_ID)
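As a minimal sketch of how a study might use these tables: join the encounter-to-group mapping against a per-group list of finished resources, and keep only encounters whose group has the resource loaded. The `etl_encounter_groups` table and its columns come from the brainstorm above; the `etl_completion` table, its columns, and the helper name are assumptions for illustration.

```python
import sqlite3


def encounters_with_resource(conn, resource):
    """Return IDs of encounters that belong to a group where `resource` is finished."""
    rows = conn.execute(
        """
        SELECT DISTINCT eg.encounter_id
        FROM etl_encounter_groups AS eg
        JOIN etl_completion AS c ON c.group_id = eg.group_id
        WHERE c.resource = ?
        ORDER BY eg.encounter_id
        """,
        (resource,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE etl_encounter_groups (encounter_id TEXT, group_id TEXT);
    -- Hypothetical table: one row per (group, resource) that has finished loading.
    CREATE TABLE etl_completion (group_id TEXT, resource TEXT);
    INSERT INTO etl_encounter_groups VALUES ('enc-1', 'group-a'), ('enc-2', 'group-b');
    INSERT INTO etl_completion VALUES ('group-a', 'Condition');
    """
)

complete_ids = encounters_with_resource(conn, "Condition")
```

Here only `enc-1` is returned for Conditions, because `group-b` has not finished loading them yet; a study would simply leave `enc-2` out of any Condition-based analysis.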