Cohort sampling #1657

MaximMoinat · 2020-10-14T13:41:37Z

Backend code for the Atlas cohort sampling feature (OHDSI/Atlas#2357), which allows created samples to be stored per cohort and data source.

For more details see this forum post.


chrisknoll · 2020-10-14T14:01:03Z

Hi, thanks for the PR. I think it's worth trying to get this update in the 2.8 release since we have a few extra weeks (so we don't interrupt the symposium). If things go smoothly, let's try to get this in.

The conflict was related to a recent file update on cohort definition service, which involved re-formatting it to fit the decision to use tab-indentation. Can you resolve the conflict (should be simple)? In addition, it would be helpful to submit the files with tab-indent so that we won't have to go back later and update the formatting.

MaximMoinat · 2020-10-14T14:06:43Z

Thanks @chrisknoll. Will look into converting spaces into tabs.

MaximMoinat · 2020-10-28T10:18:34Z

Last commit solves the BIGINT issue mentioned by @chrisknoll in the Atlas PR:
OHDSI/Atlas#2357 (comment)

chrisknoll · 2020-10-28T15:44:41Z

Hi, the changes to convert to string look good. I confirmed that the payload in the JSON is coming through as a string; however, the values are still corrupted. But I figured out where:

In line 467 of CohortSamplingService.java, you have a RowMapper that pulls the data out of the result set. For personId, you're using rs.getInt("person_id"), but this should be rs.getLong("person_id"). I changed this locally, and the person_ids came back with the correct values (as strings).

Once I made both changes on the Atlas and WebAPI side, I can pull up the patient profile and the visualization appears!
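For readers following the person_id fix, here is a minimal sketch of why both changes matter. It does not reproduce the WebAPI code: the class and method names are hypothetical, and the plain cast only stands in for reading a BIGINT as an int (what rs.getInt actually does with an out-of-range value depends on the JDBC driver). The round trip through double mimics a JavaScript client, which parses every JSON number as a double.

```java
// Hypothetical demo class, not part of WebAPI.
class PersonIdDemo {
    // Narrowing a 64-bit value to int keeps only the low 32 bits.
    static int narrowToInt(long personId) {
        return (int) personId;
    }

    // Simulates a JSON number being parsed by JavaScript (always a double).
    static long roundTripThroughDouble(long personId) {
        return (long) (double) personId;
    }

    // Sending the id as a string avoids any numeric conversion on the client.
    static String toJsonSafeString(long personId) {
        return Long.toString(personId);
    }

    public static void main(String[] args) {
        long wide = 3_000_000_000L;          // does not fit in 32 bits
        long beyondDouble = (1L << 53) + 1;  // not exactly representable as a double

        System.out.println(narrowToInt(wide));                    // value changes
        System.out.println(roundTripThroughDouble(beyondDouble)); // value changes
        System.out.println(toJsonSafeString(beyondDouble));       // value preserved
    }
}
```

Reading the column with getLong keeps the full 64-bit value on the server, and serializing it as a string keeps it intact in the browser.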

MaximMoinat · 2020-10-30T08:35:45Z

@chrisknoll Thanks again for your tests. The changes have been made in our local branch.


chrisknoll

Tested sample endpoints on MSSQL and PDW.

jduke99 · 2021-03-24T19:28:38Z

Hi @MaximMoinat @chrisknoll, we would like to leverage the cohort_sample_element table for the validation tool. Ideally we'd like to add a cohort_sample_element_id (i.e., a unique ID for each row in this table) and an annotation_result_id field to connect the sample patients with the annotations that are created.

Any concerns about this? And if it's okay, should we add these columns to the table's CREATE statement, or are we able to do Flyway-type additions to $results_schema tables?

Thanks!

Jon

For reference, the code:

IF OBJECT_ID('@results_schema.cohort_sample_element', 'U') IS NULL
CREATE TABLE @results_schema.cohort_sample_element (
    cohort_sample_id int NOT NULL,
    rank_value int NOT NULL,
    person_id bigint NOT NULL,
    age int,
    gender_concept_id int
);

blootsvoets · 2021-03-24T19:46:57Z

Sounds alright to me. The existing sampling code would of course need to be updated. I'm not sure what the policy for results DDL migrations is; a Flyway-style table update, including rules on how to create new IDs for existing samples, would be more convenient for users who already have the samples table.

chrisknoll · 2021-03-24T20:56:46Z

> I’m not sure what the policy for results DDL migrations

There are no Flyway migrations that manage results schema updates (there are far too many platforms to support). Only WebAPI tables are managed by Flyway. So, in cases where there are updates to a results schema, we call it out in the release notes and reference the relevant DDL. Users need to manage these changes manually (including migrating existing data if necessary).

@jduke99: is there any reason why you can't use the sample_id-person_id in the sample_element table to link to your own table dedicated to tracking annotation results? I'm nervous about having cross dependencies on this table (i.e., every time someone wants to attach information about a sample, they need to add a column to cohort_sample_element).

jduke99 · 2021-03-24T22:01:14Z

Thanks both for the feedback.

@chrisknoll This could be done re linking to the sample_id-person_id. So the cohort_sample_element_id field is kind of just for convenience and can be dropped if it complicates things. But the annotation_result_id is a bit trickier for the following reason:

We really wanted Samples to be the home for launching the validation because it makes sense from a workflow perspective. You create a sample of patients, you launch a validation from the Sample page (e.g., as an action here), and when you look at the sample list you can see which patients have been reviewed (e.g., as an extra column here). It is clean, and we believe it will make for a very sensible workflow.

It will be relatively inefficient to run through the whole annotations table looking for matches of who has been annotated, compared with just writing a flag to the sample_elements table. Essentially, if it is null they have not been annotated; if it has a value then they have. No need to join any tables.

So I just would prefer for it to run smoothly and quickly rather than potentially be slow and grind unnecessarily. Adding an annotation_result_id column would achieve this goal. However having everyone manually update is a big bummer. Too bad we can't do flyway on $results...

Anyway, those are my thoughts. Ultimately we will go with whatever you recommend.

Thanks,

Jon

chrisknoll · 2021-03-24T23:17:32Z

> it will be relatively inefficient to run through the whole annotations table looking for matches of who has been annotated, compared with just writing a flag to the sample_elements table. Essentially if it is null they have not been annotated. If it has a value then they have. No need to join any tables.

Hey, @jduke99. I understand your concerns, but we need to balance clean design with performance. If every new feature attached to samples (and there could be many) results in adding a new column to this table, it quickly gets out of control. The cohort annotation feature is an auxiliary function of cohort samples. There could be others, and for each of these auxiliary functions (such as marking a person 'complete' for a given sample and annotation set) we should have a table that tracks the person, the annotation set and the sample, and their completion status. With this form, you can also track other information about their responses: is it complete or partial? What % of completion? Does the result mean they pass or fail? All these points of information can't be put into the cohort_sample_element table; the cohort_sample_element table is just supposed to track members of a sample.

So, I'd ask that you plan on having a dedicated table that is keyed on the sample_id, person_id + {annotation identifier}. Yes, this does result in a join (and a left join, since people without annotation status should be returned), but the performance impact will be minimal (how large a sample are we talking about? 10...100...1000? That will take no time to join). We definitely can't design the database schema around 'removing joins'; we're going to have to join at some point.
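To make the suggestion concrete, here is a rough in-memory sketch of the left-join semantics described above. It is an illustration only: the class, the key format, and the status strings are all hypothetical, and the real lookup would be a SQL left join between cohort_sample_element and a dedicated annotation table rather than a Java map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of WebAPI.
class AnnotationJoinSketch {
    // Composite key of the assumed annotation-status table:
    // sample_id, person_id and an annotation identifier.
    static String key(int sampleId, long personId, int annotationSetId) {
        return sampleId + ":" + personId + ":" + annotationSetId;
    }

    // Left join: every sample member yields a row, and members without
    // an annotation record get a placeholder status instead of being dropped.
    static List<String> leftJoin(int sampleId, List<Long> personIds,
                                 int annotationSetId,
                                 Map<String, String> statusByKey) {
        List<String> rows = new ArrayList<>();
        for (long personId : personIds) {
            String status = statusByKey.getOrDefault(
                    key(sampleId, personId, annotationSetId), "NOT_ANNOTATED");
            rows.add(personId + "=" + status);
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, String> statusByKey = new HashMap<>();
        statusByKey.put(key(1, 1001L, 7), "COMPLETE");

        List<Long> sampleMembers = new ArrayList<>();
        sampleMembers.add(1001L);
        sampleMembers.add(1002L);

        // Both members are listed; only 1001 has an annotation status.
        System.out.println(leftJoin(1, sampleMembers, 7, statusByKey));
    }
}
```

Because the status lives outside cohort_sample_element, further attributes (partial completion, pass or fail) can be added to the annotation side without touching the sample tables.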

jduke99 · 2021-03-25T13:47:34Z

Sounds good @chrisknoll. Thanks for the insight and we will take your suggested approach of keying on the sample_id, person_id + {annotation identifier} in a separate table. Thanks for the quick response.

blootsvoets and others added 28 commits January 28, 2020 11:04

Initial sampling implementation

89a3974

Add name to cohort sample

3eaf3d6

Revert server port to 8080

dbe6328

Fix queryParam -> pathParam

c0cfc65

Initial update to fix SQL commands for sample generation

ea742ff

Attempt to move cohort_sample table to ohdsi

ca86aad

Separate cohort_sample from cohort_sample_element

d70cc25

Create index in separate query

c3adeba

Misc fixes

7933edf

Fix gender sampling and DTOs

d82594e

Documentation and exception string updates

1e7ef5a

Provide record counts with each sample element

d65efea

Note: this makes reading a sample much much slower.

Make sample record counts easier to turn off

dafbb21

Make record counts optional

83fd26c

Delete samples on cohort redefinition

76ef810

Run sample deletion in tasklet

ccb005b

Don't run removal if no cohort samples are present

f83d408

Fixes with authorization and SQL server

0b65c77

Check cohort generation status before generating sample

85c793b

Fix Oracle and PostgreSQL migration syntax

e192d45

add endpoint to check that samples exist for a cohort

60da4e0

calcuate age in subquery to allow usage in sample expression

a26f765

delete samples of a source upon triggering cohort generation of that …

e381ec1

…source

fix sql for gender sample criteria

dde9418

add has-samples endpoint by sourceKey and cohortDefinitionId

10fd14d

add isValid attribute to cohortsample summary

b4df22f

Merge pull request #2 from thehyve/has-samples-by-source

6a25647

Add endpoint for has-samples by sourceKey

Merge branch 'master-ohdsi' into cohort-sampling

7366005

Merge branch 'master' into cohort-sampling

8454e34

use tab-indentation

a25245b

MaximMoinat mentioned this pull request Oct 14, 2020

Enhanced Cohort Sampling and Patient Profiles OHDSI/Atlas#2357

Draft

Make person ID a String for JS compatibility

1019d91

fix corrupt bigint person_id

681619b

MaximMoinat closed this Oct 30, 2020

MaximMoinat reopened this Oct 30, 2020

chrisknoll added 3 commits November 4, 2020 16:21

Rename rank to rank_value.

329b717

Remove default to include event counts.

Added refresh function.

0dbb9f8

Made refreshCohortSample POST.

5f17870

chrisknoll mentioned this pull request Nov 9, 2020

Added simplified cohort sample UI. OHDSI/Atlas#2373

Merged

chrisknoll and others added 2 commits November 9, 2020 04:10

Added security records for refresh to migration scripts.

bdea2c1

Merge pull request #4 from chrisknoll/cohort-sampling-cknoll1-update1

b1e15fc

Cohort Sample PR Updates

chrisknoll approved these changes Nov 9, 2020


chrisknoll merged commit ae36601 into OHDSI:master Nov 9, 2020
