Where we get data from

See sources.json for an up-to-date list.

BNF codes from https://apps.nhsbsa.nhs.uk/infosystems/data/showDataSelector.do?reportId=126
Prescribing data from https://apps.nhsbsa.nhs.uk/infosystems/data/showDataSelector.do?reportId=124
dm+d from https://isd.digital.nhs.uk/trud3/user/authenticated/group/0/pack/6/subpack/24/releases
Patient list sizes from https://digital.nhs.uk/data-and-information/publications/statistical/patients-registered-at-a-gp-practice/
CCG details from https://files.digital.nhs.uk/assets/ods/current/eccg.csv
Practice details from https://files.digital.nhs.uk/assets/ods/current/epraccur.csv
Practice postcodes from https://files.digital.nhs.uk/assets/ods/current/gridall.csv
Miscellaneous prescribing metadata from, e.g., http://datagov.ic.nhs.uk/presentation/2017_08_August/T201708ADDR+BNFT.CSV and http://datagov.ic.nhs.uk/presentation/2017_08_August/T201708CHEM+SUBS.CSV

In each case, the download itself is done by a Python script that runs as part of our data pipeline, not manually via a web browser. Two of the sources need a human to authenticate first, as described below.

Data hosted on the NHSBSA Information Services Portal (ISP), namely the BNF codes and the prescribing data, is on a website that's protected by a captcha. To download these datasets, a human first has to solve the captcha in their browser. This sets a cookie in the browser, and the value of that cookie is then passed as a parameter to the Python script.

Similarly, the TRUD data (just the dm+d dataset for now) is on a website that's protected by a password. Again, a human must log in with a password, and then pass the corresponding cookie to the Python script.
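The cookie hand-off described above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the pipeline's actual script: the function names, the cookie name `JSESSIONID`, and the example URL are all assumptions made for illustration.

```python
import shutil
import urllib.request


def build_request(url, cookie_name, cookie_value):
    """Build a GET request that carries a session cookie copied from a browser.

    cookie_name and cookie_value are whatever the site set after the human
    solved the captcha or logged in (the names used here are hypothetical).
    """
    req = urllib.request.Request(url)
    req.add_header("Cookie", f"{cookie_name}={cookie_value}")
    return req


def download(url, cookie_name, cookie_value, dest_path):
    """Fetch url with the session cookie attached and save the body to dest_path."""
    req = build_request(url, cookie_name, cookie_value)
    with urllib.request.urlopen(req) as resp, open(dest_path, "wb") as out:
        # Stream the response to disk rather than holding it all in memory.
        shutil.copyfileobj(resp, out)
```

A call such as `download(url, "JSESSIONID", value_copied_from_browser, "bnf_codes.csv")` would then fetch the file as if it came from the authenticated browser session. The cookie only works until the session expires, so a human has to repeat the captcha or login step when it does.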
