
Recap data pull discussion #10

Open
kcho opened this issue Apr 2, 2021 · 26 comments
Labels
documentation (Improvements or additions to documentation) · enhancement (New feature or request) · question (Further information is requested)

Comments

@kcho
Member

kcho commented Apr 2, 2021

lochness.redcap pulls all available data from REDCap into a json file

  • when lochness.redcap.sync is re-executed, lochness pulls the whole dataset again and compares it with the existing json before overwriting.

Problems

1. A daily pull of the data for all subjects may put too much load on the REDCap server

  • Do we know the limit on API data pulls? e.g. 1 GB per week?
  • How big will the json file for a subject be?

2. Extensive work is required on the logbook to select and extract the data from the json dump for visualization in DPDash

  • how many fields are there?
  • will the fields change at any point in the study?

Solutions

  • Add a function in lochness.redcap to pull only specific fields?
    • Justin & Habib's suggestion
  • Add a function that pulls the field showing the date of last edit (do we have such a field in REDCap?)
    • if this field differs from the downloaded json -> re-download all files
    • if this field matches the downloaded json -> skip
  • Include a 'redcap completed' column in metadata.csv to stop the pulling or make it less frequent.
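A field-limited pull could look roughly like the sketch below, which builds the indexed `fields[i]` parameters the REDCap export API expects. The URL is a placeholder, and `build_payload`/`pull_fields` are hypothetical helper names, not part of lochness:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint -- replace with the study's real API URL.
API_URL = 'https://redcap.example.org/api/'

def build_payload(token, record_id, fields):
    """Build the POST body for a partial export: one record, selected fields."""
    data = {
        'token': token,
        'content': 'record',
        'format': 'json',
        'records[0]': record_id,
    }
    # The REDCap export API takes field names as an indexed array
    for i, field in enumerate(fields):
        data['fields[%d]' % i] = field
    return data

def pull_fields(token, record_id, fields):
    """POST the payload and return the decoded JSON records."""
    body = urllib.parse.urlencode(build_payload(token, record_id, fields)).encode()
    with urllib.request.urlopen(API_URL, data=body) as response:
        return json.loads(response.read())
```

Pulling only the needed fields would shrink both the transfer size and the server-side export work per sync.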
@kcho kcho added the documentation, enhancement, and question labels Apr 2, 2021
@tashrifbillah

tashrifbillah commented Apr 2, 2021

Hi @kcho and @sbouix , let's continue the discussion here.

By Kevin (edited by Tashrif):

I found a “Data Entry Trigger” function in REDCap. Whenever a record is modified or updated, it sends a POST signal with a bunch of information to a dedicated server. If the major problem with pulling all the data on a daily basis is REDCap server overloading, do you think implementing the “Data Entry Trigger” and connecting it to lochness would be a solution (or overkill)?

[screenshot: REDCap Data Entry Trigger setting]

Suggested workflow:

  • REDCap record gets updated
  • “Data Entry Trigger” sends the name of the updated data field to AWS
  • On AWS, the list of updated records is stored
  • Lochness pulls this information from AWS
  • Only the updated fields are downloaded

This would solve the REDCap server problem and we would be able to keep all of the up-to-date REDCap data in lochness.

@tashrifbillah

tashrifbillah commented Apr 2, 2021

Okay, here is my modified workflow:

  • REDCap record is updated
  • Data Entry Trigger emits a signal
  • Our very own https://predict.bwh.harvard.edu/ hosted watchdog (TBD) catches the signal
  • The watchdog (TBD) determines whether the update is an essential one
  • If yes, asks lochness to pull the updated record

The last three steps could be done by a cron like bot.

@sbouix

sbouix commented Apr 2, 2021

To add to the agenda: the ability to detect tags for particular variables.

@kcho
Member Author

kcho commented Apr 2, 2021

Thanks for this @tashrifbillah

Could you set up a URL under https://predict.bwh.harvard.edu/ so it can catch the POST signal from the REDCap Data Entry Trigger, please?

Or if we have any other publicly open ports among the PNL servers, please let me know. I'll test receiving the signal.

@sbouix

sbouix commented Apr 2, 2021

The only two externally facing servers I know of are hcpep-xnat and our web server. Predict is behind the firewall.

@tashrifbillah

tashrifbillah commented Apr 2, 2021

Hi Kevin, do you know of a tutorial I can go through to learn how to upload a file to REDCap? I need to be able to upload, trigger, and listen independently to set such a thing up. Also, where did you get the screenshot? If writing is hard, an MS Teams call works for me.

@tashrifbillah

Is this the function I need?

@kcho
Member Author

kcho commented Apr 2, 2021

Hi Kevin, do you know of a tutorial that I can go through to learn to upload a file to REDCap? I need to be able to upload, trigger, and listen independently to be able to set up such a thing.

I have not uploaded a file before, but I would suggest looking at the API playground and trying the Import File API method.
The API doc is here: https://redcap.partners.org/redcap/api/help

Also, where did you get the screenshot?

Screenshot is from
REDCAP - "Project Setup" -> "Enable optional modules and customizations"

@kcho
Member Author

kcho commented Apr 2, 2021

Quickly tested to see if REDCap sends the signal to an open server.

  • Project id
  • Username
  • Record ID
  • Name of instrument modified

are sent to the server. I think it can act as a very useful logging system.

I’ll bring this up in our next meeting, so we can discuss how we can include this.

redcap_url=https%3A%2F%2Fredcap.partners.org%2Fredcap%2F&project_url=https%3A%2F%2Fredcap.partners.org%2Fredcap%2Fredcap_v10.0.30%2Findex.php%3Fpid%3D26709&project_id=26709&username=kc244&record=100111111&instrument=adverse_events_ae&adverse_events_ae_complete=0
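The POST body above is ordinary URL-encoded form data, so the listening side can decode it with the standard library alone. A minimal sketch, using a shortened version of the payload above:

```python
from urllib.parse import parse_qs

# Shortened version of the Data Entry Trigger payload shown above
payload = ('project_id=26709&username=kc244&record=100111111'
           '&instrument=adverse_events_ae&adverse_events_ae_complete=0')

# parse_qs returns lists of values; each DET key appears once, so take [0]
fields = {key: values[0] for key, values in parse_qs(payload).items()}
print(fields['record'])      # 100111111
print(fields['instrument'])  # adverse_events_ae
```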

@tashrifbillah

2. extensive work is required on the logbook to select and extract the data from the json dump to visualize in the DPDash

how many fields are there?

The HCP-EP survey I am working with has 915 fields in each of the six instruments, a.k.a. surveys.

will the fields be changed in any point of the study?

The fields are the same across the six instruments, so they should be consistent across the study.

@kcho
Member Author

kcho commented Apr 6, 2021

@sbouix @tashrifbillah
I thought about the architecture below for what we discussed yesterday about REDCap data pulling. I think there were two main problems: one is PII and the other is server overloading. Below is my suggestion; please let me know what you think. I'll start working on it soon.

Proposed REDCap pulling architecture

PII part

  1. lochness.redcap pulls all data from the REDCap server to PROTECTED/survey/raw/ABCD01.json
  2. Save a json - data free from PII
  • lochness.redcap (or predict_pii.redcap or logbook.redcap)
    • from PROTECTED/survey/raw/ABCD01.json, remove all PII fields
      • using the REDCap "PII" tags (need to review how we can pull this information)
    • and save it in GENERAL/survey/raw/ABCD01.json
  3. Save another json - data with the PII replaced with pseudo-random strings
  • lochness.redcap (or predict_pii.redcap or logbook.redcap)
    • process the PII fields in PROTECTED/survey/raw/ABCD01.json and save the result in PROTECTED/survey/processed/ABCD01.json
    • copy PROTECTED/survey/processed/ABCD01.json to GENERAL/survey/processed/ABCD01.json
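A rough sketch of the two GENERAL outputs, with a hard-coded PII field list as a stand-in (in practice the list would come from REDCap tags or a curated table); `remove_pii` corresponds to point 2 and `mask_pii` to point 3, and both names are illustrative, not lochness functions:

```python
import secrets

# Hypothetical PII field names for illustration only
PII_FIELDS = {'name', 'phone_number', 'address'}

def remove_pii(record):
    """Point 2: drop PII fields entirely (for GENERAL/survey/raw)."""
    return {k: v for k, v in record.items() if k not in PII_FIELDS}

def mask_pii(record):
    """Point 3: replace PII values with pseudo-random strings
    (for GENERAL/survey/processed)."""
    return {k: (secrets.token_hex(8) if k in PII_FIELDS else v)
            for k, v in record.items()}
```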

Redcap server overloading problem part

  1. before pulling any data from REDCap, lochness.redcap checks for files under PROTECTED/survey/raw

    • if ABCD01.json already exists
      • check the db, which is updated live by listening to the POST signal from the REDCap Data Entry Trigger
        • if ABCD01 is in the db, execute the download
        • if ABCD01 is not in the db, skip the download
  2. repeat the PII part above

  3. in the lochness-to-lochness transfer, changes to ABCD01.json should be detected by sha1 / hash / other methods so that only the updated data is pulled.
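For point 3, the change detection could be as simple as comparing content hashes between syncs; a sketch using sha1 (the helper names are illustrative):

```python
import hashlib

def file_sha1(path):
    """SHA-1 of a file, read in chunks so large json dumps fit in memory."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()

def has_changed(path, known_sha1):
    """True if the file differs from the last transferred version."""
    return file_sha1(path) != known_sha1
```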

@tashrifbillah

What is the distinction between points 2 and 3 under PII Part?

@kcho
Copy link
Member Author

kcho commented Apr 6, 2021

What is the distinction between points 2 and 3 under PII Part?

Sorry - I edited it a bit.
Point 2 is for saving a json in GENERAL - data that has no PII.
Point 3 is for saving a json in GENERAL - data with the PII fields replaced by pseudo-random strings.

@sbouix

sbouix commented Apr 6, 2021

Let's concentrate on REDCap server overloading first.

The PII masking is more complex: some variables can be deleted (e.g. name), others replaced by another variable (e.g. birthdate -> age in years). I am not sure we should have two copies of pretty much the same thing (raw vs processed). Also, because I would like to import the anonymized data into MGB REDCap, we should figure out how that will be affected by (2) vs (3). Finally, we may be better off having a table with a list of PII variables as input rather than trying to extract the tag from REDCap.

@sbouix

sbouix commented Apr 6, 2021

For the lochness-to-lochness transfer, I also think datalad might be useful. Something to discuss with Chris and Mathias on Friday.

@tashrifbillah

Hi @kcho , did you try making a workstation listen to REDCap signal yet? If you haven't, I can try that for my entertainment out of DPDash crisscross ;)

@kcho
Member Author

kcho commented Apr 7, 2021

Hi @kcho , did you try making a workstation listen to REDCap signal yet? If you haven't, I can try that for my entertainment out of DPDash crisscross ;)

I haven't tried it on the workstation yet, but I've drafted a command-line tool and a module in lochness.redcap for listening to the POST signal from the REDCap server:
https://github.com/PREDICT-DPACC/lochness/blob/devel/kcho/redcap_new_arch/scripts/listen_to_redcap.py
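For reference, a stripped-down version of such a capture server can be written with the standard library alone. This sketch (not the actual listen_to_redcap.py, and `DB_PATH` is a hypothetical location) appends each decoded Data Entry Trigger signal to a local CSV:

```python
import csv
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

DB_PATH = 'det_db.csv'  # hypothetical location of the DET database

class DETHandler(BaseHTTPRequestHandler):
    """Capture REDCap Data Entry Trigger POST signals into a CSV."""

    def do_POST(self):
        length = int(self.headers.get('Content-Length', 0))
        fields = {k: v[0] for k, v in
                  parse_qs(self.rfile.read(length).decode()).items()}
        with open(DB_PATH, 'a', newline='') as f:
            csv.writer(f).writerow([time.time(),
                                    fields.get('project_id'),
                                    fields.get('username'),
                                    fields.get('record'),
                                    fields.get('instrument')])
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(('', 8080), DETHandler).serve_forever()
```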

@kcho
Member Author

kcho commented Apr 8, 2021

Let's concentrate on REDCap server overloading first.

The model shown below has been uploaded to the devel/kcho/redcap_new_arch.
master...PREDICT-DPACC:devel/kcho/redcap_new_arch

To do

  • test in PNL workstation
  • record a demo
  • discuss the consequences of the Data Entry Trigger (DET) capture server going down

Figure

[figure: proposed Data Entry Trigger capture architecture]

Summary

1. Make a database from the POST signals from the REDCap Data Entry Trigger

  • listen_to_redcap.py: a live server that captures and saves all the POST signals received from the REDCap Data Entry Trigger
    • saves a table like the one below

| timestamp | project_id | redcap_username | record | instrument |
| --- | --- | --- | --- | --- |
| 1617823322.701979 | 26709 | kc244 | subject0002 | inclusionexclusion |
| 1617823322.711633 | 26709 | kc244 | subject0001 | inclusionexclusion |

  • The path of the DB above is entered into config.yml

2. lochness.redcap checks for any updates in the Data Entry Trigger database before executing datapull

  • lochness.redcap.get_data_entry_trigger_df: loads the DET database
  • lochness.redcap.check_if_modified: compares st_mtime of already saved jsons vs DET database for any recent updates
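The check could then be a small stdlib routine; a sketch under the assumption that the DET-DB is the CSV above (the signature is illustrative, not the actual lochness.redcap.check_if_modified API):

```python
import csv
import os

def check_if_modified(json_path, det_db_path, record):
    """Re-download only when the DET database has an update for this
    record newer than the saved json's st_mtime."""
    if not os.path.exists(json_path):
        return True  # nothing saved yet -> always pull
    saved_mtime = os.stat(json_path).st_mtime
    with open(det_db_path, newline='') as f:
        for row in csv.DictReader(f):
            if row['record'] == record and float(row['timestamp']) > saved_mtime:
                return True
    return False
```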

@tashrifbillah

In

check DET-DB
recent update

Do you plan to compare checksums like mediaflux does? Here are nipype's ways of computing a checksum:

@kcho
Member Author

kcho commented Apr 8, 2021

In

check DET-DB
recent update

Do you plan to compare checksum like mediaflux does? Here are nipype ways of computing checksum:

Since the Data Entry Trigger database (DET-DB) is a CSV file containing all the REDCap field updates and the timestamp of each POST signal, I compare the last-modified date of the already-existing json file against the last update captured in the DET-DB for each subject (if the subject exists in the DET-DB).

@tashrifbillah

Hi @kcho , is it expecting an empty csv file?

@kcho
Member Author

kcho commented Apr 8, 2021

Hi @kcho , is it expecting an empty csv file?

It's expecting the path of the DET-DB csv file. If the csv already exists, the live capture server will append new information to the existing csv file.

@tashrifbillah

Currently, how is it being programmed--listen_to_redcap.py running sync.py --source redcap sort of?

@kcho
Member Author

kcho commented Apr 17, 2021

Currently, the two python scripts have to be executed separately. I just realized it could be useful to design it following your comment.

listen_to_redcap.py running sync.py --source redcap sort of?

Any downside to doing this? Programmatically, how would you spin out a continuously running sync.py while also continuously running listen_to_redcap.py from a single execution? The multiprocessing module?
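One way is the standard library multiprocessing module: run the listener in a child process and hand updated record IDs to the sync side over a queue. In this sketch, `listener` and `sync_updated` are placeholder functions standing in for listen_to_redcap.py and sync.py; they are not real lochness code:

```python
import multiprocessing

def listener(queue):
    """Stand-in for listen_to_redcap.py: report updated record IDs.
    A real listener would loop forever, capturing DET POST signals."""
    queue.put('subject0001')

def sync_updated(queue, n_expected):
    """Stand-in for one sync.py pass: pull each record the listener reported."""
    return [queue.get(timeout=10) for _ in range(n_expected)]

def main():
    queue = multiprocessing.Queue()
    proc = multiprocessing.Process(target=listener, args=(queue,))
    proc.start()
    updated = sync_updated(queue, 1)  # blocks until the listener reports
    proc.join()
    return updated
```

In a real deployment both sides would loop indefinitely, so crash handling for the listener process (the "DET capture server going down" item above) still needs thought.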

@tashrifbillah

tashrifbillah commented Apr 17, 2021

multiprocess module?

It should be a chained process--trigger comes first and then pull. We shall discuss more during our Monday brainstorming session.

By the way, do we have access to @sbouix 's presentation on what data reside on which platforms? I am trying to understand which platforms should trigger data entry signals. I understand that for PRoNET it would be REDCap. What would it be for PRESCIENT?

@sbouix

sbouix commented Apr 17, 2021

The primary database system for PRESCIENT will be RPMS (Research Project Management System). It is custom built by the Orygen team and doesn't have the extensive documentation or API functionality of REDCap. We're working to get access to their IT infrastructure to set up a development environment and start developing the Lochness RPMS module.
