fix(linking): fetch all Person records when calculating Belongingness Ratio (#97)

## Description
* Fix Belongingness Ratio calculation by fetching all Patient records
within the Person Clusters identified in Blocking
* Waiting to update tests until review of code changes, in case
reviewers want to see the different linkage results

## Related Issues
#90
alhayward authored Oct 30, 2024
1 parent 65f9563 commit 0fb8042
Showing 4 changed files with 37 additions and 28 deletions.
8 changes: 4 additions & 4 deletions docs/mpi-design.md
@@ -10,7 +10,7 @@ The `Person` model represents an individual in the MPI system. Each person may h

 ### 2. **Patient Model**

-The `Patient` model represents an external record for a person. This is a point-in-time representation of an individual sourced in a health care document. Each `Patient` is linked to a single `Person` and contain multiple `BlockingValue` records that aid in matching similar patients.
+The `Patient` model represents an external record for a person. This is a point-in-time representation of an individual sourced in a health care document. Each `Patient` is linked to a single `Person` and contains multiple `BlockingValue` records that aid in matching similar patients.

 ### 3. **BlockingValue Model**

@@ -62,12 +62,12 @@ The MPI system is designed to link records from different sources that potential
 2. **Blocking**:
 The **BlockingKey** enum defines the types of blocking values that are generated from the patient data. For example, the first 4 characters of the patient's first name or their birthdate can serve as a blocking key.

-Blocking is used to reduce the search space for potential matches by grouping patients based on these simplified values. Only records that share the same blocking values are compared in detail, making the matching process more efficient.
+Blocking is used to reduce the search space for potential matches by grouping patients based on these simplified values. Patient records that share the same blocking values _or_ patient records in the Person clusters of those that share the same blocking values are compared in detail. This makes the matching process more efficient.

-We use the **BlockingValue** records from the incoming patient to quickly find potential matches against existing patients in the MPI (matching on their blocking values).
+We use the **BlockingValue** records from the incoming patient to quickly find potential matches against existing patients in the MPI (matching on their blocking values, and including any patient records in the Person clusters of those blocked on).

 3. **Record Linkage**:
-The system compares patient records that share the same blocking values. It uses other details stored in the **Patient** model, such as name, address, MRN, etc, to calculate a **belongingness ratio**, indicating the percentage of patients within a person cluster that the new patient record matches with.
+The system compares the incoming patient record against patient records that share the same blocking values, as well as patient records in the Person clusters of those that share the same blocking values (even if these patients weren't blocked on individually). It uses other details stored in the **Patient** model, such as name, address, MRN, etc, to calculate a **belongingness ratio**, indicating the percentage of patients within a person cluster that the new patient record matches with.

 4. **Person Linking**:
 If the belongingness ratio exceeds a threshold, the patient is linked to an existing **Person** in the MPI. If it does not exceed the threshold, that means no suitable match was found, and a new **Person** record is created.
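
To make the effect of this change on the belongingness ratio concrete, here is a minimal Python sketch. It is not the recordlinker implementation: `belongingness_ratios`, the `(person_id, patient_id)` pairs, and the `matches` set are hypothetical stand-ins for the clustering in `link_record_against_mpi` and the per-record comparison. It assumes the ratio is the fraction of a Person cluster's Patient records that the incoming record matches, as the design doc describes.

```python
from collections import defaultdict


def belongingness_ratios(candidates, matches):
    """Return {person_id: fraction of that person's candidate patients that match}.

    candidates: iterable of (person_id, patient_id) pairs returned by blocking.
    matches: set of patient_ids that the incoming record is judged to match.
    Both shapes are assumptions made for this sketch.
    """
    clusters = defaultdict(list)
    for person_id, patient_id in candidates:
        clusters[person_id].append(patient_id)
    return {
        person_id: sum(pid in matches for pid in patient_ids) / len(patient_ids)
        for person_id, patient_ids in clusters.items()
    }


# Person 10 has three Patient records; only patients 1 and 2 share blocking
# values with the incoming record, and the incoming record matches both.
matches = {1, 2}

# Before the fix: only the blocked Patient records are in the cluster.
blocked_only = [(10, 1), (10, 2)]
# After the fix: the whole Person cluster is fetched, including patient 3.
full_cluster = [(10, 1), (10, 2), (10, 3)]

ratio_before = belongingness_ratios(blocked_only, matches)[10]
ratio_after = belongingness_ratios(full_cluster, matches)[10]
print(ratio_before, ratio_after)
```

With only the blocked records the ratio is 2/2; with the full cluster it is 2/3, so the same incoming record can land on the other side of a linkage threshold. That is presumably why the expected match counts in the tests change in this commit.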
2 changes: 2 additions & 0 deletions src/recordlinker/linking/link.py
@@ -170,6 +170,8 @@ def link_record_against_mpi(
     # block on the pii_record and the algorithm's blocking criteria, then
     # iterate over the patients, grouping them by person
     with TRACER.start_as_current_span("link.block"):
+        # get all candidate Patient records identified in blocking
+        # and the remaining Patient records in their Person clusters
         patients = mpi_service.get_block_data(session, record, algorithm_pass)
         for patient in patients:
             clusters[patient.person].append(patient)
13 changes: 8 additions & 5 deletions src/recordlinker/linking/mpi_service.py
@@ -17,14 +17,18 @@


 def get_block_data(
-    session: orm.Session, record: schemas.PIIRecord, algorithm_pass: models.AlgorithmPass
+    session: orm.Session,
+    record: schemas.PIIRecord,
+    algorithm_pass: models.AlgorithmPass
 ) -> typing.Sequence[models.Patient]:
     """
     Get all of the matching Patients for the given data using the provided
-    blocking keys defined in the algorithm_pass.
+    blocking keys defined in the algorithm_pass. Also, get all the
+    remaining Patient records in the Person clusters identified in
+    blocking to calculate Belongingness Ratio.
     """
     # Create the base query
-    base = expression.select(models.Patient.id).distinct()
+    base = expression.select(models.Patient.person_id).distinct()

     # Build the join criteria, we are joining the Blocking Value table
     # multiple times, once for each Blocking Key. If a Patient record
@@ -55,10 +59,9 @@ def get_block_data(
         )

     # Using the subquery of unique Patient IDs, select all the Patients
-    expr = expression.select(models.Patient).where(models.Patient.id.in_(base))
+    expr = expression.select(models.Patient).where(models.Patient.person_id.in_(base))
     return session.execute(expr).scalars().all()


 def insert_patient(
     session: orm.Session,
     record: schemas.PIIRecord,
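
The switch from `Patient.id` to `Patient.person_id` in the subquery can be illustrated with a small, self-contained sketch using the standard-library `sqlite3` module. The two-table schema, column names, and sample rows below are simplified assumptions made for illustration, not the project's actual models, and the SQL is hand-written rather than the statement SQLAlchemy would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient (id INTEGER PRIMARY KEY, person_id INTEGER NOT NULL);
    CREATE TABLE blocking_value (
        patient_id INTEGER NOT NULL,
        blockingkey TEXT NOT NULL,
        value TEXT NOT NULL
    );
    -- Person 10 owns patients 1, 2 and 3; person 20 owns patient 4.
    INSERT INTO patient (id, person_id) VALUES (1, 10), (2, 10), (3, 10), (4, 20);
    INSERT INTO blocking_value (patient_id, blockingkey, value) VALUES
        (1, 'BIRTHDATE', '1980-01-01'),
        (2, 'BIRTHDATE', '1980-01-01'),
        (3, 'BIRTHDATE', '1975-06-30'),
        (4, 'BIRTHDATE', '1985-11-12');
    """
)

# Old behaviour: the subquery yields Patient ids, so only blocked rows return.
OLD_QUERY = """
    SELECT id FROM patient
    WHERE id IN (
        SELECT DISTINCT p.id
        FROM patient AS p
        JOIN blocking_value AS bv ON bv.patient_id = p.id
        WHERE bv.blockingkey = ? AND bv.value = ?
    )
    ORDER BY id
"""

# New behaviour: the subquery yields Person ids, so every Patient in a
# blocked Person cluster returns, including ones that did not block.
NEW_QUERY = """
    SELECT id FROM patient
    WHERE person_id IN (
        SELECT DISTINCT p.person_id
        FROM patient AS p
        JOIN blocking_value AS bv ON bv.patient_id = p.id
        WHERE bv.blockingkey = ? AND bv.value = ?
    )
    ORDER BY id
"""

params = ("BIRTHDATE", "1980-01-01")
old_ids = [row[0] for row in conn.execute(OLD_QUERY, params)]
new_ids = [row[0] for row in conn.execute(NEW_QUERY, params)]
print(old_ids, new_ids)
```

In this toy data the old query returns patients 1 and 2, while the new query also returns patient 3, which belongs to the same Person but has a different birthdate. Patient 4 is excluded by both queries because its Person was never blocked on.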
42 changes: 23 additions & 19 deletions tests/unit/linking/test_mpi_service.py
@@ -179,8 +179,13 @@ def test_with_person_and_external_patient_id(self, session):
 class TestGetBlockData:
     @pytest.fixture
     def prime_index(self, session):
+
+        person_1 = models.Person()
+        session.add(person_1)
+        session.flush()
+
         data = [
-            {
+            ({
                 "name": [
                     {
                         "given": [
@@ -191,8 +196,8 @@ def prime_index(self, session):
                     }
                 ],
                 "birthdate": "01/01/1980",
-            },
-            {
+            }, person_1),
+            ({
                 "name": [
                     {
                         "given": [
@@ -202,8 +207,8 @@ def prime_index(self, session):
                     }
                 ],
                 "birthdate": "1943-2-25",
-            },
-            {
+            }, None),
+            ({
                 "name": [
                     {
                         "given": [
@@ -214,8 +219,8 @@ def prime_index(self, session):
                     {"given": ["John"], "family": "Lewis"},
                 ],
                 "birthdate": "1980-01-01",
-            },
-            {
+            }, None),
+            ({
                 "name": [
                     {
                         "given": [
@@ -225,8 +230,8 @@ def prime_index(self, session):
                     }
                 ],
                 "birthdate": "1980-01-01",
-            },
-            {
+            }, person_1),
+            ({
                 "name": [
                     {
                         "given": [
@@ -236,8 +241,8 @@ def prime_index(self, session):
                     }
                 ],
                 "birthdate": "1980-01-01",
-            },
-            {
+            }, person_1),
+            ({
                 "name": [
                     {
                         "given": [
@@ -247,8 +252,8 @@ def prime_index(self, session):
                     }
                 ],
                 "birthdate": "1985-11-12",
-            },
-            {
+            }, None),
+            ({
                 "name": [
                     {
                         "given": [
@@ -258,10 +263,10 @@ def prime_index(self, session):
                     }
                 ],
                 "birthdate": "",
-            },
+            }, None)
         ]
-        for datum in data:
-            mpi_service.insert_patient(session, schemas.PIIRecord(**datum))
+        for (datum, person) in data:
+            mpi_service.insert_patient(session, schemas.PIIRecord(**datum), person=person)

     def test_block_invalid_key(self, session):
         data = {
@@ -408,7 +413,7 @@ def test_block_on_birthdate_first_name_and_last_name(self, session, prime_index)
             "birthdate": "Jan 1 1980",
         }
         matches = mpi_service.get_block_data(session, schemas.PIIRecord(**data), algorithm_pass)
-        assert len(matches) == 2
+        assert len(matches) == 3
         data = {
             "name": [
                 {
@@ -431,7 +436,6 @@ def test_block_on_multiple_names(self, session, prime_index):
                 {"use": "maiden", "given": ["John"], "family": "Doe"},
             ]
         }
-        algorithm_pass = models.AlgorithmPass(blocking_keys=["FIRST_NAME", "LAST_NAME"])
         algorithm_pass = models.AlgorithmPass(
             id=1,
             algorithm_id=1,
@@ -442,7 +446,7 @@ def test_block_on_multiple_names(self, session, prime_index):
             kwargs={},
         )
         matches = mpi_service.get_block_data(session, schemas.PIIRecord(**data), algorithm_pass)
-        assert len(matches) == 4
+        assert len(matches) == 5

     def test_block_missing_keys(self, session, prime_index):
         data = {"birthdate": "01/01/1980"}