Skip to content

PPD data model

Peter Inglesby edited this page Jul 28, 2018 · 9 revisions

MDR, dm+d, and CDR

The MDR is the legacy database on which dispenser payments is based. It most notably is mapped to BNF code only, which makes it hard to make use of the rich metadata available in the publicly-available dm+d NHS database. It also has a restriction of 40 characters on the BNF Name field.

dm+d is itself generated from CDR, which is the next-generation payments database that's been in development for around a decade.

CDR notes

BNF Code mappings

We can map from PPD data (which is supplied in MDR format) to dm+d via a mapping which is updated by NHSBSA every quarter. As dm+d is updated monthly, this means it's possible for some actively prescribed dm+d products to have no corresponding BNF Code mapped; however, given the lag of 8+ weeks between a dispensing event and the release of the monthly PPD data, this is unlikely ever to be a problem.

MDR field notes

What is quantity?

What is actual_cost?

  • Doesn't reflect broken bulk, or other costs based on prescribing which cannot be directly attributed to practices. These costs include VAT, nationally unidentified prescribing, broken bulk, schedule adjustments, net cross boundary costs and out of pocket expenses and payments for containers for national unidentified prescribing (source)

dm+d notes

dm+d is the NHS Dictionary of Medicines & Devices and is documented here.

There is a dm+d browser here but its search is pretty slow, and it'd be nice to build our own.

Importing data

Data is published weekly by NHS Digital from TRUD, but we currently download it as part of the monthly import. There is an issue about changing to a weekly download. The code for the task that fetches the data is here, and the code for the task that imports the data is here.

VMPs and AMPs and VMPPs and AMPPs, oh my!

VMPs, AMPs, VMPPs, and AMPPs are the main objects in the dm+d data model.

This is quoted directly from the documentation above:

  • The Virtual Medicinal Product (VMP) describes the generic title for a product including the form and strength, for example Atenolol 100mg tablets.

  • The Virtual Medicinal Product Pack (VMPP) describes the generic title for a generic or proprietary product pack which is known to have been available. The description includes the pack size, for example Atenolol 100mg tablets 28 tablet.

  • The Actual Medicinal Product (AMP) describes an actual product which is known to have been available linked to the name of a particular supplier, for example Tenormin 100mg tablets (AstraZeneca UK Ltd).

  • The Actual Medicinal Product Pack (AMPP) describes an actual product which is known to have been available linked to both the name of a particular supplier and information on the pack size of the product, for example Tenormin 100mg tablets (AstraZeneca UK Ltd) 28 tablet 2 x 14 tablets.

There are foreign key relationships between these four types of object. (These relationships are implicit in our current implementation, but are observed to be enforced in the source.)

The relationships are as follows:

  • A VMP has many AMPs / an AMP belongs to a VMP
  • A VMP has many VMPPs / a VMPP belongs to a VMP
  • An AMP has many AMPPs / an AMPP belongs to an AMP
  • A VMPP has many AMPPs / an AMPP belongs to a VMPP

You can think of the relationship between the four types of object as forming a diamond.

VMP<------+ AMP
 ^           ^
 |           |
 |           |
 |           |
 +           +
VMPP<-----+ AMPP

We store these four types of object in four tables, with the following (simplified) schemas:

VMP (table dmd_vmp)

field description
vpid Primary key
nm Name

VMPP (table dmd_vmpp)

field description
vppid Primary key
vpid Foreign key to VMP
nm Name

AMP (table dmd_amp)

field description
apid Primary key
vpid Foreign key to VMP
nm Name

AMPP (table dmd_ampp)

field description
appid Primary key
apid Foreign key to AMP
vppid Foreign key to VMPP
nm Name

Invariants

There are no dangling references between objects in the diamond

That is, an AMP's VMP exists, a VMPP's VMP exists, and an AMPP's VMPP and AMP both exist.

Eg: SELECT COUNT(*) FROM dmd_amp LEFT OUTER JOIN dmd_vmp ON dmd_amp.vpid = dmd_vmp.vpid WHERE dmd_vmp.vpid IS NULL.

The diamond is consistent

That is, whichever way you traverse the diamond from AMPP to VMP, you get the same result. In other words, an AMPP's AMP's VMP is the same as that AMPP's VMPP's VMP.

SELECT COUNT(dmd_ampp.*)
FROM dmd_ampp
INNER JOIN dmd_amp ON dmd_ampp.apid = dmd_amp.apid
INNER JOIN dmd_vmpp ON dmd_ampp.vppid = dmd_vmpp.vppid
WHERE dmd_amp.vpid != dmd_vmpp.vpid;

Every diamond is (almost) complete

That is, every VMP has at least one VMPP, AMP, and AMPP.

All AMPs have at least one AMPP. And All but 3 VMPPs have at least one AMPP.

# (SELECT vppid FROM dmd_vmpp) EXCEPT (SELECT DISTINCT(vppid) FROM dmd_ampp);
┌───────────────────┐
│       vppid       │
├───────────────────┤
│ 10722111000001100 │
│ 17371011000001101 │
│ 17370911000001109 │
└───────────────────┘

Not an invariant

Every pair of VMPP/AMP has a single AMPP

That is, some pairs of VMPP/AMP have multiple AMPPs.

There are 157 such pairs, involving 112 VMPs, 121 VMPPs, 152 AMPs, and 319 AMPPs.

Of these pairs, 34 only have one valid AMPP.

For the remaining 123, 24 are straight up duplicates. See Atorvastatin 20mg tablets (Dowelhurst Ltd) for an example.

The remaining 99 have differing fields as follows:

count differing fields
41 nm, subp, disccd, discdt
35 nm, subp
13 nm, subp, discdt
3 discdt
2 disccd, discdt
2 nm, subp, disccd
2 nm, legal_catcd, subp, disccd, discdt
1 legal_catcd, disccd, discdt

In the majority of these cases, the AMPPs correspond to different sub-packs (that's the subp field).

For instance:

# SELECT nm, subp FROM dmd_ampp WHERE vppid = 8080811000001103 AND apid = 8075911000001106;
┌───────────────────────────────────────────────────────────────────────────────────────────────┬────────────────┐
│                                              nm                                               │      subp      │
├───────────────────────────────────────────────────────────────────────────────────────────────┼────────────────┤
│ Visipaque 320 solution for injection 100ml bottles (Amersham Health) 10 bottle plastic bottle │ plastic bottle │
│ Visipaque 320 solution for injection 100ml bottles (Amersham Health) 10 bottle glass bottle   │ glass bottle   │
└───────────────────────────────────────────────────────────────────────────────────────────────┴────────────────┘

or

# select nm, subp from dmd_ampp where vppid = 34610211000001106 and apid = 34610311000001103;
┌──────────────────────────────────────────────────────────────────────────────────┬─────────────────┐
│                                        nm                                        │      subp       │
├──────────────────────────────────────────────────────────────────────────────────┼─────────────────┤
│ Ravicti 1.1g/ml oral liquid (Swedish Orphan Biovitrum Ltd) 25 ml 7 x 5ml syringe │ 7 x 5ml syringe │
│ Ravicti 1.1g/ml oral liquid (Swedish Orphan Biovitrum Ltd) 25 ml 7 x 3ml syringe │ 7 x 3ml syringe │
│ Ravicti 1.1g/ml oral liquid (Swedish Orphan Biovitrum Ltd) 25 ml 7 x 1ml syringe │ 7 x 1ml syringe │
│ Ravicti 1.1g/ml oral liquid (Swedish Orphan Biovitrum Ltd) 25 ml                 │ ¤               │
└──────────────────────────────────────────────────────────────────────────────────┴─────────────────┘

In the other 6 cases, there are two AMPP records for what is apparently the same thing, but with differing details about discontinuation. See for instance this and this.

CREATE MATERIALIZED VIEW dmd_vmpp_amp_dups AS
SELECT vppid,
       apid,
       COUNT(*)
FROM dmd_ampp
GROUP BY vppid,
         apid
HAVING COUNT(*) > 1
ORDER BY COUNT(*);

Thinking about the objects belonging to a VMP

I have found it helpful to think of the structure of the data by thinking just about the objects belonging to a single VMP. These objects can be arranged in a 2-dimensional table, with VMPPs as the column headings, AMPs as the row headings, and AMPPs in the table cells.

For example, here's the table for VMP Diclofenac 2.32% gel:

Diclofenac 2.32% gel 100 gram Diclofenac 2.32% gel 30 gram Diclofenac 2.32% gel 50 gram
Diclofenac 2.32% gel (DE Pharmaceuticals) Diclofenac 2.32% gel (DE Pharmaceuticals) 30 gram Diclofenac 2.32% gel (DE Pharmaceuticals) 50 gram
Diclofenac 2.32% gel (Colorama Pharmaceuticals Ltd) Diclofenac 2.32% gel (Colorama Pharmaceuticals Ltd) 30 gram Diclofenac 2.32% gel (Colorama Pharmaceuticals Ltd) 50 gram
Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) 100 gram Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) 30 gram Voltarol 12 Hour Emulgel P 2.32% gel (GlaxoSmithKline Consumer Healthcare) 50 gram

We see that there are 3 VMPPs (for 100g, 30g, and 50g pack sizes), and 3 AMPs (for 3 different manufacturers). There are 7 AMPPs: each manufacturer produces 30g and 50g packs, but only GSK produce a 100g pack.

Clone this wiki locally