This is a data package with information on public and private registries of clinical trials. It is intended to track sources of trial data for the OpenTrials platform and to document existing and possible methods of data extraction.
The majority of information about registries comes from the WHO ICTRP (International Clinical Trials Registry Platform) and the websites of the individual registries.
For registers that are members of WHO's ICTRP, another option for data acquisition would be to write a generic parser for the XML dump provided to ICTRP. We would also need to source the dump URL per register.
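A minimal sketch of what such a generic parser might look like is given below. It assumes each trial is wrapped in a single record element (here called `trial`) and flattens every leaf element inside it into a dictionary. The element names in the sample are hypothetical and would need to be checked against a real ICTRP dump.

```python
import xml.etree.ElementTree as ET


def parse_trials(xml_text, record_tag="trial"):
    """Yield one flat dict per record element found in an XML dump.

    record_tag is an assumption: the element that wraps a single trial.
    Values are lists because a leaf tag may repeat within one record.
    """
    root = ET.fromstring(xml_text)
    for record in root.iter(record_tag):
        row = {}
        for element in record.iter():
            text = (element.text or "").strip()
            # Keep only leaf elements that actually carry text.
            if len(element) == 0 and text:
                row.setdefault(element.tag, []).append(text)
        yield row


# Hypothetical sample; real dumps may use different element names.
SAMPLE = """
<trials>
  <trial>
    <main>
      <trial_id>ISRCTN41598423</trial_id>
      <public_title>Example trial</public_title>
    </main>
    <countries>
      <country2>Brazil</country2>
      <country2>Japan</country2>
    </countries>
  </trial>
</trials>
"""

trials = list(parse_trials(SAMPLE))
```

Flattening to leaf tags keeps the parser independent of how deeply a register nests its fields, at the cost of losing the grouping information; a per-register mapping could be layered on top if that grouping matters.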
Many options exist for scraping ClinicalTrials.gov, the largest source of structured clinical trials data. The scraper we will use has been started here:
https://github.com/roll/clinicaltrials-scraper
This is currently a very basic scraper for ISRCTN. For now it only downloads identifiers (e.g. ISRCTN41598423), but it can easily be extended.
https://github.com/annapowellsmith/isrctn
There are comparatively few options for scraping EU Clinical Trials Register data, but a potentially useful R package for retrieving both ClinicalTrials.gov and EU Clinical Trials Register data exists here:
https://github.com/rfhb/ctrdata
There is an existing scraper for the Australia and New Zealand register here:
https://classic.scraperwiki.com/scrapers/australia_new_zealand_clinical_trials/
https://github.com/tfmorris/australia_new_zealand_clinical_trials/blob/master/scraper.py
This repository contains both the code and the scraper for the Brazilian Clinical Trials Registry. It also includes a very small dump (57 trials in XML form). Trials carry WHO's UTN (Universal Trial Number). Another option for data acquisition would be to write a generic parser for registers that provide XML dumps to ICTRP; for this registry, DUMP_URL = 'http://www.ensaiosclinicos.gov.br/rg/all/xml/ictrp'.
https://github.com/bireme/opentrials/tree/master/data
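As a rough sketch, the dump URL quoted above for the Brazilian registry could be fetched with the Python standard library alone. The function name `download_dump`, the timeout, and the destination filename are illustrative assumptions, and the URL itself may no longer resolve.

```python
import shutil
import urllib.request

# URL quoted in the registry notes above; it may no longer resolve.
DUMP_URL = "http://www.ensaiosclinicos.gov.br/rg/all/xml/ictrp"


def download_dump(url, destination, timeout=60):
    """Stream the response body at `url` into the file `destination`."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(destination, "wb") as out:
            # Copy in chunks so a large dump is never held in memory at once.
            shutil.copyfileobj(response, out)
    return destination
```

Calling `download_dump(DUMP_URL, "rebec_ictrp.xml")` would then write the dump to a local file (the filename is hypothetical), ready to be passed to whatever XML parser is chosen.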
This is a scraper for the Japanese network of clinical trials, although we would most likely write an entirely new scraper:
https://github.com/nick111/MedicalDataCrawl/commit/bdefcfa27009dca60d89edc40a806b19be2dd3a9
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
No existing tooling found. Would write new scraper.
Using publicly available and downloadable data from ClinicalTrials.gov, a restructured and reformatted relational database was developed. This is referred to as the database for Aggregate Analysis of ClinicalTrials.gov (AACT).
Python classes and functions for parsing clinical trial data returned as XML from ClinicalTrials.gov.
https://github.com/pjfan/ClinicalTrialsParser
Trialverse aims to enable world-wide collaboration on extracting data and meta-data on existing clinical trials, and to eventually reduce the duplication of effort inherent in the current systematic review and evidence-based decision making processes. It is part of ADDIS 2, a drugis.org project, and is currently in early development.
https://github.com/drugis/trialverse
Not exactly a scraper, but tooling for importing clinical trials data. Most useful for providing regular expressions (and tests!) for matching different clinical trials identifiers (e.g. ISRCTN41598423, NCT02590237, etc.):
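To illustrate the idea of matching trial identifiers with regular expressions, here is a small sketch in Python. It is not taken from the tooling mentioned above. It assumes that both identifier types consist of a fixed prefix (`NCT` or `ISRCTN`) followed by exactly eight digits, which fits the two examples given; the names `IDENTIFIER_PATTERNS` and `find_identifiers` are hypothetical.

```python
import re

# Assumed formats: "NCT" or "ISRCTN" followed by exactly eight digits.
IDENTIFIER_PATTERNS = {
    "nct": re.compile(r"\bNCT\d{8}\b"),
    "isrctn": re.compile(r"\bISRCTN\d{8}\b"),
}


def find_identifiers(text):
    """Return a dict mapping each registry key to the identifiers found in text."""
    return {
        key: pattern.findall(text)
        for key, pattern in IDENTIFIER_PATTERNS.items()
    }


found = find_identifiers("Registered as ISRCTN41598423 and NCT02590237.")
```

Keeping one compiled pattern per registry makes it straightforward to add further identifier formats later and to test each pattern in isolation.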