Simply fine-tuning ETL #131

chathasphere · 2022-05-04T13:37:07Z

Instead of manipulating data during preprocessing to identify cases and controls, I think it would be a lot simpler to optionally supply a list of patient IDs and labels to retrieve.

In a Jupyter notebook (say), we could build a suitable cohort of cases and controls, and then select for only these IDs during chunk iteration. I think this could speed up ETL significantly.
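
A rough sketch of what I have in mind, assuming the ETL iterates over a CSV in chunks with pandas. The function name `iter_cohort_chunks` and the column names `patient_id` and `label` are placeholders for illustration, not names from the existing code:

```python
import io

import pandas as pd


def iter_cohort_chunks(source, cohort, id_col="patient_id", chunksize=100_000):
    """Yield only the rows whose patient ID is in the cohort, with labels attached.

    `cohort` is a DataFrame with one row per patient: an ID column and a
    `label` column (hypothetical layout, e.g. 1 = case, 0 = control).
    """
    labels = cohort.set_index(id_col)["label"]
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Drop every row that does not belong to a cohort patient.
        kept = chunk[chunk[id_col].isin(labels.index)]
        if not kept.empty:
            # Attach the precomputed label instead of deriving it here.
            yield kept.assign(label=kept[id_col].map(labels))


# Toy data standing in for the cohort built in a notebook and the raw extract.
cohort = pd.DataFrame({"patient_id": [2, 5], "label": [1, 0]})
raw = io.StringIO("patient_id,code\n1,A\n2,B\n5,C\n2,D\n7,E\n")

selected = pd.concat(iter_cohort_chunks(raw, cohort, chunksize=2), ignore_index=True)
```

The point is that case/control logic lives entirely in the notebook; the ETL step only filters and joins, so each chunk shrinks to the cohort rows before any further processing.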

chathasphere self-assigned this May 18, 2022
