-
Notifications
You must be signed in to change notification settings - Fork 168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cohort sampling #1657
Cohort sampling #1657
Conversation
Note: this makes reading a sample much much slower.
Add endpoint for has-samples by sourceKey
Hi, thanks for the PR. I think it's worth trying to get this update in the 2.8 release since we have a few extra weeks (so we don't interrupt the symposium). If things go smoothly, let's try to get this in. The conflict was related to a recent file update on cohort definition service, which involved re-formatting it to fit the decision to use tab-indentation. Can you resolve the conflict (should be simple)? In addition, it would be helpful to submit the files with tab-indent so that we won't have to go back later and update the formatting. |
Thanks @chrisknoll. Will look into converting spaces into tabs. |
Last commit solves the BIGINT issue mentioned by @chrisknoll in the Atlas PR: |
Hi, the changes to covert to string look good, I confirmed that the payload in the JSON is comming as a string, however the values are still corrupted. But, i figured out where: In line 467 of CohortSamplingService.java, you have a Once I made both changes on the Atlas and WebAPI side, I can pull up the patient profile and the visualization appears! |
@chrisknoll Thanks again for your tests. Changes made in our local branch |
Remove default to include event counts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tested sample endpoints on MSSQL and PDW.
Hi @MaximMoinat @chrisknoll we would like to leverage the Any concerns about this? And if okay, should we add to this table create or are we able to do flyway type additions to Thanks! Jon For reference, the code: IF OBJECT_ID('@results_schema.cohort_sample_element', 'U') IS NULL |
Sounds alright to me. The existing sampling could would of course need to be updated. I’m not sure what the policy for results DDL migrations is; flyway style to update a table, including rules on how to create new id’s for existing samples would be more convenient for users that already have the samples table. |
There are no flyway migrations that manage results schema upates (there are far too many platforms to support). Only WebAPI tables are managed by flyway. So, in cases where there's updates to a results schema, we call it out in the release notes and reference the relevant DDL. Users need to manually manage these changes (including migrating existing data if necessary). @jduke99 : is there any reason why you can't use the sample_id-person_id in the sample_element table to link to your own table dedicated to tracking annotation results? I'm nervous about having cross dependencies on this table (ie: every time someone wants to attach information about a sample, it needs to add a colum n to cohort_sample_element). |
Thanks both for the feedback. @chrisknoll This could be done re linking to the sample_id-person_id. So the We really wanted to have Samples be the home for launching the validation because it just makes sense from a workflow perspective. You create a sample of patients, you launch a validation from the Sample page
it is clean and we believe will make for a very sensible workflow. it will be relatively inefficient to run through the whole annotations table looking for matches of who has been annotated, compared with just writing a flag to the sample_elements table. Essentially if it is So I just would prefer for it to run smoothly and quickly rather than potentially be slow and grind unnecessarily. Adding an Anyway, those are my thoughts. Ultimately we will go with whatever you recommend. Thanks, Jon |
Hey, @jduke99 . I understand your concerns,, but we need to balance clean design with performance. If every new feature attached to samples (and there could be many) results in adding a new column to this table, it quickly gets out of control. The cohort annotation feature is a auxiliary function of cohort samples. There could be others, and for each of these auxiliary functions (such as, mark a person 'complete' for a given sample and annotation set) then we should have a table that tracks the person, the annotation set and the sample, and there completion status. With this form, you can also track other information about their responses: is it complete or partial? What % of completion? Does the result mean they pass or fail? All these points of information can't be put into the So, I'd ask that you plan on having a dedicated table that is keyed on the sample_id, person_id + {annootation identifier}. Yes, this does result in a join (and a left join people without annotation status shoudl be returned) but the performance impact will be minimal (how large of a sample are we talking about? 10...100..1000? That will take no time to join). We definitely can't design the database schema around 'removing joins' we're going to have to join at some point. |
Sounds good @chrisknoll. Thanks for the insight and we will take your suggested approach of keying on the sample_id, person_id + {annotation identifier} in a separate table. Thanks for the quick response. |
Backend code for a the Atlas cohort sampling feature (OHDSI/Atlas#2357), allowing to store created samples per cohort and data source.
For more details see this forum post.