Improve ClinicalTrials.gov ingest #144

Merged 15 commits into main from improve-clinical-trials on Dec 13, 2023


@cthoyt cthoyt commented Oct 3, 2023

This PR improves the ClinicalTrials.gov ingest in the following ways:

  1. Automates the download. (Note: I looked into using the new API, but it doesn't appear to give MeSH IDs consistently.) The current count is 467,838 trials.
  2. Adds additional fields (the set of fields is now more easily configurable with a list):
    • StudyType (e.g., observational, interventional)
    • DesignAllocation (e.g., randomized, Non-Randomized) - transformed to a boolean indicating whether the trial is randomized
    • OverallStatus (e.g., Completed, Active, Recruiting)
    • Phase (e.g., N/A, Phase 1, Phase 2) - transformed to an integer
    • WhyStopped (e.g., NCT06048705 says it was stopped because GSK changed its research priorities)
    • StartDate (e.g., November 1, 2023, May 1984) - transformed to an integer representing the year
    • AnticipatedStartDate (e.g., Actual or Anticipated) - transformed to a boolean
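
The field transformations listed above could look roughly like the following sketch. This is not the code from this PR: the function names are hypothetical, and the handling of edge cases (for example, taking the first number in a combined phase label, or returning `None` for missing values) is an assumption made for illustration.

```python
import re
from typing import Optional


def parse_phase(value: Optional[str]) -> Optional[int]:
    """Map a label such as "Phase 2" to 2; return None for "N/A" or missing.

    Assumption: the first number found in the label is used.
    """
    if not value:
        return None
    match = re.search(r"\d+", value)
    return int(match.group()) if match else None


def parse_start_year(value: Optional[str]) -> Optional[int]:
    """Extract the four-digit year from "November 1, 2023" or "May 1984"."""
    if not value:
        return None
    match = re.search(r"\b(\d{4})\b", value)
    return int(match.group(1)) if match else None


def parse_randomized(value: Optional[str]) -> Optional[bool]:
    """True for "Randomized", False for any other non-empty allocation."""
    if not value:
        return None
    return value.strip().lower() == "randomized"


def parse_anticipated(value: Optional[str]) -> Optional[bool]:
    """True for "Anticipated", False for "Actual" or other non-empty values."""
    if not value:
        return None
    return value.strip().lower() == "anticipated"
```

Returning `None` rather than a default keeps "unknown" distinguishable from "not randomized" or "not anticipated", which matters once missing values are filtered out downstream.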

@cthoyt cthoyt requested a review from kkaris October 3, 2023 08:40

@kkaris kkaris left a comment

Looks good overall. I suggested two fields that could have property types set, and noted that we need to check for NaNs.
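
One way to handle the NaN concern is to drop missing values from each record before it is written out. The sketch below is illustrative only: `is_missing` and `clean_properties` are hypothetical helpers, not functions from this repository, and it assumes missing values arrive as `None` or as float NaN (as happens when a table is read with empty cells).

```python
import math
from typing import Any, Dict


def is_missing(value: Any) -> bool:
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_properties(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without missing values."""
    return {key: value for key, value in row.items() if not is_missing(value)}


record = {"phase": 2, "why_stopped": float("nan"), "status": "Completed"}
cleaned = clean_properties(record)
```

The `isinstance` check comes first because `math.isnan` raises a `TypeError` on strings.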

Review threads (3, all outdated and resolved) on src/indra_cogex/sources/clinicaltrials/__init__.py
@kkaris kkaris force-pushed the improve-clinical-trials branch from 5efef0a to c11916c Compare December 12, 2023 22:58
@kkaris kkaris marked this pull request as ready for review December 12, 2023 22:58
@bgyori bgyori force-pushed the improve-clinical-trials branch from caa7052 to 2252bbf Compare December 13, 2023 15:49
@bgyori bgyori merged commit 227127c into main Dec 13, 2023
4 checks passed
@cthoyt cthoyt deleted the improve-clinical-trials branch December 15, 2023 08:47
3 participants