Where we get data from
Peter Inglesby edited this page Nov 9, 2018 · 2 revisions
See sources.json for an up-to-date list.
- BNF codes from https://apps.nhsbsa.nhs.uk/infosystems/data/showDataSelector.do?reportId=126
- Prescribing data from https://apps.nhsbsa.nhs.uk/infosystems/data/showDataSelector.do?reportId=124
- dm+d from https://isd.digital.nhs.uk/trud3/user/authenticated/group/0/pack/6/subpack/24/releases
- Patient list sizes from https://digital.nhs.uk/data-and-information/publications/statistical/patients-registered-at-a-gp-practice/
- CCG details from https://files.digital.nhs.uk/assets/ods/current/eccg.csv
- Practice details from https://files.digital.nhs.uk/assets/ods/current/epraccur.csv
- Practice postcodes from https://files.digital.nhs.uk/assets/ods/current/gridall.csv
- Miscellaneous prescribing metadata from, for example, http://datagov.ic.nhs.uk/presentation/2017_08_August/T201708ADDR+BNFT.CSV and http://datagov.ic.nhs.uk/presentation/2017_08_August/T201708CHEM+SUBS.CSV
In each case, we download the data automatically using a Python script that runs as part of our data pipeline, rather than doing so manually via a web browser.
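As a rough illustration of what such a scripted download might look like, the sketch below builds the URL for one of the monthly metadata files and saves it to disk. The URL pattern is only inferred from the two example URLs in the list above, and the names `metadata_url` and `download` are hypothetical; this is not the pipeline's actual code.

```python
import shutil
import urllib.request
from pathlib import Path

BASE_URL = "http://datagov.ic.nhs.uk/presentation"

# Spelled out rather than taken from the locale, so the folder name is
# always in English regardless of the machine's settings.
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def metadata_url(year, month, kind):
    """Build the URL of a monthly metadata file.

    `kind` is the file suffix seen in the example URLs, e.g. "ADDR+BNFT"
    or "CHEM+SUBS". The pattern is an assumption based on those examples.
    """
    stamp = f"{year}{month:02d}"
    folder = f"{year}_{month:02d}_{MONTH_NAMES[month - 1]}"
    return f"{BASE_URL}/{folder}/T{stamp}{kind}.CSV"


def download(url, dest_dir):
    """Stream `url` into `dest_dir`, keeping the remote file name."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For instance, `metadata_url(2017, 8, "CHEM+SUBS")` reproduces the second example URL listed above, and `download(metadata_url(2017, 8, "CHEM+SUBS"), ".")` would save that file into the current directory.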
The BNF codes and the prescribing data are hosted on the NHSBSA Information Services Portal (ISP), which is protected by a captcha. Before these datasets can be downloaded, a human has to solve the captcha in their browser. This sets a cookie in the browser, and the cookie's value is then passed as a parameter to the Python script.
Similarly, the TRUD data (currently just the dm+d dataset) is on a website that requires a login. Again, a human must log in with a password and then pass the resulting cookie to the Python script.
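The cookie hand-off described above could be sketched roughly as follows: the script takes the cookie copied from the browser as a command-line parameter and sends it with the download request. The cookie name `JSESSIONID`, the flag names, and the function names are all assumptions made for illustration; the real sites and the real script may use different ones.

```python
import argparse
import urllib.request


def build_request(url, cookie_name, cookie_value):
    """Return a request that presents the browser's cookie to the server."""
    request = urllib.request.Request(url)
    request.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return request


def parse_args(argv=None):
    """Read the URL and the cookie copied from the browser."""
    parser = argparse.ArgumentParser(
        description="Download a file from a cookie-protected site."
    )
    parser.add_argument("url")
    # Hypothetical default; check the browser's developer tools for the
    # cookie name that the site actually sets.
    parser.add_argument("--cookie-name", default="JSESSIONID")
    parser.add_argument("--cookie-value", required=True)
    return parser.parse_args(argv)


def fetch(request):
    """Perform the download and return the response body as bytes."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

A caller would combine these as `args = parse_args()` followed by `fetch(build_request(args.url, args.cookie_name, args.cookie_value))`. Because the cookie expires, the manual browser step has to be repeated whenever the server stops accepting it.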