Notes on scraping netFormulary formularies

Ben Goldacre edited this page Jul 19, 2018 · 3 revisions

Around 100 CCGs use netFormulary to list their formularies. The full list of CCGs is here.

We have discussed scraping these formularies so that we can better understand how GPs follow prescribing guidelines. However, the drug names in the netFormulary formularies are not consistent with names in the BNF or dm+d, and there are no full BNF codes for the drugs.

For instance, what the dm+d calls "Betamethasone 0.1% / Neomycin 0.5% ear/eye/nose drops" appears in the Airedale formulary as "Betamethasone with Neomycin" (with "Betamethasone 0.1% Neomycin Sulphate 0.5% Eye/Ear/Nose Drops" in the comments for the drug), and in the Bucks formulary as "Betamethasone Sodium Phosphate 0.1% with Neomycin Sulphate 0.5%" (with "(ear/eye/nose drops)" in the comments).

Scraping

Accessing the data in order to scrape it should be straightforward. Each CCG's formulary is under its own domain (linked to from the Customers page). The formulary for a given BNF chapter is then available at /chaptersSubDetails.asp?FormularySectionID=[chapter_id]&FC=1.
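As a minimal sketch of the URL pattern above: the helper name `chapter_url` and the domain `formulary.example.org` are placeholders invented for illustration, and it assumes `[chapter_id]` can be substituted directly as a number, which has not been checked across formularies.

```python
from urllib.parse import urlencode, urljoin


def chapter_url(base_url: str, chapter_id: int) -> str:
    """Build the chapter page URL for one formulary domain.

    base_url is the CCG's formulary domain (placeholder in the example below);
    the path and query parameter names come from the pattern in these notes.
    """
    query = urlencode({"FormularySectionID": chapter_id, "FC": 1})
    return urljoin(base_url, "/chaptersSubDetails.asp?" + query)


# Hypothetical domain, used only to show the shape of the result.
print(chapter_url("https://formulary.example.org", 12))
```

Looping this over the domains on the Customers page and over chapter IDs would give the list of pages to fetch.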

For instance, here is chapter 12 for Bucks.

The data is in HTML tables, and the markup for each drug appears to follow the same structure.
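A rough sketch of pulling table rows out of a page using only the Python standard library. The real netFormulary markup has not been inspected here, so the sample HTML is made up, the names `TableRowParser` and `extract_rows` are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def extract_rows(html: str) -> list:
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up markup, only to show the shape of the output.
sample = (
    "<table><tr><th>Drug</th><th>Comments</th></tr>"
    "<tr><td>Betamethasone with Neomycin</td><td> Eye/Ear/Nose\n Drops </td></tr></table>"
)
print(extract_rows(sample))
```

If the per-drug structure really is consistent, the row lists could then be mapped to fields such as name and comments.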

Use cases

There are various things it would be nice to be able to do.

For example:

  • How much do formularies agree or disagree on specific named antidepressants or antihypertensives?
  • What is the concordance between recommended drugs and prescribed drugs within a CCG?
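One possible way to put numbers on these two questions, as a sketch only. It assumes drug names have already been matched to a common vocabulary; the function names, the choice of the Jaccard index for agreement, and the placeholder data ("drug 1", "ccg_a") are all invented for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Share of drugs listed by both formularies out of drugs listed by either."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_agreement(formularies: dict) -> dict:
    """Jaccard index for every pair of formularies, keyed by (name, name)."""
    return {
        (x, y): jaccard(formularies[x], formularies[y])
        for x, y in combinations(sorted(formularies), 2)
    }


def concordance(recommended: set, prescribed_items: dict) -> float:
    """Fraction of prescribed items that are for drugs on the formulary."""
    total = sum(prescribed_items.values())
    if total == 0:
        return 0.0
    on_list = sum(n for drug, n in prescribed_items.items() if drug in recommended)
    return on_list / total


# Placeholder data, not real formulary or prescribing figures.
formularies = {
    "ccg_a": {"drug 1", "drug 2", "drug 3"},
    "ccg_b": {"drug 2", "drug 3", "drug 4"},
}
print(pairwise_agreement(formularies))
print(concordance(formularies["ccg_a"], {"drug 1": 30, "drug 9": 10}))
```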

Open questions around scraping include:

  • Even if we can't get a perfect dm+d match, can we at least match drug names?
  • Do the formularies have a clear data structure around indication (i.e. can we easily find "this page and list of named drugs is about treatment of hypertension")?
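On the name-matching question, a crude first attempt might compare the alphabetic words in each name and ignore strengths. The sketch below is an assumption-laden illustration, not a tested approach: `ingredient_tokens` and `best_match` are hypothetical helpers, the stop-word list is a guess, and the second candidate name is illustrative and has not been checked against dm+d.

```python
import re

STOP_WORDS = {"with", "and"}  # guessed list of joining words to ignore


def ingredient_tokens(name: str) -> set:
    """Lower-case alphabetic words in a drug name, minus joining words."""
    return set(re.findall(r"[a-z]+", name.lower())) - STOP_WORDS


def best_match(formulary_name: str, candidates: list) -> tuple:
    """Return (candidate, score), where score is the share of the formulary
    name's tokens that also appear in the candidate name."""
    wanted = ingredient_tokens(formulary_name)

    def score(candidate: str) -> float:
        if not wanted:
            return 0.0
        return len(wanted & ingredient_tokens(candidate)) / len(wanted)

    best = max(candidates, key=score)
    return best, score(best)


candidates = [
    "Betamethasone 0.1% ear/eye/nose drops",  # illustrative, unchecked
    "Betamethasone 0.1% / Neomycin 0.5% ear/eye/nose drops",  # from the notes above
]
print(best_match("Betamethasone with Neomycin", candidates))
print(best_match("Betamethasone Sodium Phosphate 0.1% with Neomycin Sulphate 0.5%", candidates))
```

Salt names such as "Sodium Phosphate" lower the score for the Bucks-style name, so a real matcher would probably need a salt list or a fuzzier comparison.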