Develop evaluation methods for matching models #23

Open
not-the-fish opened this issue Sep 28, 2017 · 4 comments

@not-the-fish (Contributor)

We will want to compare, select, and evaluate matching models. This requires generating and storing metrics (see dssg/pgdedupe#20 for some possibilities) and perhaps comparing Type I and Type II error rates on labeled pairs held out from the training data (see #20).

This will likely entail storing metrics in a metrics table, plus a notebook/methods/workflow for conducting comparisons and evaluations.
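A minimal sketch of the held-out-pairs comparison, assuming the labeled pairs from #20 are available as a pandas DataFrame with a model match score and a ground-truth label per pair (the `holdout`, `match_score`, and `is_same_entity` names are hypothetical):

```python
# Sketch only: Type I (false match) and Type II (missed match) rates on
# labeled pairs held out from training. Column names are assumptions.
import pandas as pd

def error_rates(holdout: pd.DataFrame, threshold: float) -> dict:
    predicted = holdout["match_score"] >= threshold
    actual = holdout["is_same_entity"].astype(bool)

    false_pos = int((predicted & ~actual).sum())   # Type I: matched, but distinct people
    false_neg = int((~predicted & actual).sum())   # Type II: not matched, but same person
    negatives = int((~actual).sum())
    positives = int(actual.sum())

    return {
        "type_i_rate": false_pos / negatives if negatives else float("nan"),
        "type_ii_rate": false_neg / positives if positives else float("nan"),
    }

# toy holdout set: two true matches, two true non-matches
holdout = pd.DataFrame({
    "match_score":    [0.95, 0.40, 0.80, 0.10],
    "is_same_entity": [1,    1,    0,    0],
})
print(error_rates(holdout, threshold=0.5))  # {'type_i_rate': 0.5, 'type_ii_rate': 0.5}
```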

@not-the-fish (Contributor Author)

Many of these metrics will have cluster score thresholds (see #26), so the metrics table should be similar in shape to the triage results evaluations table.
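One possibility is a long/narrow table with one row per (model, metric, threshold), sketched below; the `results.match_evaluations` name, the columns, the example rows, and the connection string are all assumptions, not an agreed schema:

```python
# Sketch only: a long/narrow metrics table, one row per (model, metric,
# threshold), loosely mirroring the shape of triage's evaluations table.
import pandas as pd
from sqlalchemy import create_engine

rows = [  # illustrative example rows only
    {"model_id": 1, "metric": "n_clusters",        "threshold": 0.5, "value": 1043},
    {"model_id": 1, "metric": "avg_cluster_size",  "threshold": 0.5, "value": 2.7},
    {"model_id": 1, "metric": "precision@holdout", "threshold": 0.5, "value": 0.91},
]

engine = create_engine("postgresql:///dedupe")  # hypothetical connection string
pd.DataFrame(rows).to_sql(
    "match_evaluations", engine,
    schema="results", if_exists="append", index=False,
)
```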

@not-the-fish (Contributor Author) commented Oct 4, 2017

Metrics:

  • Number of clusters @ threshold
  • Number of unmatched records @ threshold
  • Number of exact matches
  • Average size of cluster @ threshold
  • Maximum size of cluster @ threshold
  • Percentage of clusters of size 2 @ threshold
  • Number of blocks
  • Average size of blocks
  • Maximum size of block
  • Minimum size of block
  • Precision and recall on holdout labels @ threshold (see Store labeled pairs in a table #20)
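A minimal sketch of computing the cluster- and block-level metrics above at a single threshold, assuming the model output at that threshold can be expressed as a record-id → cluster-id mapping (singletons = unmatched records) and blocking as a record-id → block-id mapping; the helper names are hypothetical:

```python
# Sketch only: cluster and block summary metrics at one score threshold.
from collections import Counter
from statistics import mean

def cluster_metrics(record_to_cluster: dict) -> dict:
    sizes = Counter(record_to_cluster.values())     # cluster id -> number of records
    multi = [s for s in sizes.values() if s >= 2]   # clusters containing an actual match
    return {
        "n_clusters": len(multi),
        "n_unmatched_records": sum(1 for s in sizes.values() if s == 1),
        "avg_cluster_size": mean(multi) if multi else 0.0,
        "max_cluster_size": max(multi, default=0),
        "pct_clusters_size_2": (sum(1 for s in multi if s == 2) / len(multi)
                                if multi else 0.0),
    }

def block_metrics(record_to_block: dict) -> dict:
    sizes = Counter(record_to_block.values())
    return {
        "n_blocks": len(sizes),
        "avg_block_size": mean(sizes.values()),
        "max_block_size": max(sizes.values()),
        "min_block_size": min(sizes.values()),
    }

# toy example: records a-e after applying a threshold upstream
print(cluster_metrics({"a": 1, "b": 1, "c": 2, "d": 2, "e": 3}))
# 2 clusters, 1 unmatched record, avg/max cluster size 2, 100% of clusters are pairs
```

Precision and recall on the holdout labels could be produced the same way as the error-rate sketch in the first comment, or with scikit-learn's precision_score/recall_score on the thresholded pair predictions.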

@thcrock (Contributor) commented Apr 19, 2018

@nanounanue here are ideas for metrics

@thcrock (Contributor) commented Apr 19, 2018

From Joe:

  • Recall
  • Number of unique persons identified
    This is one way to check whether the model is not matching enough people. For example, if we don't match anyone -- i.e. we assume every event is for a separate person -- we'll probably get a ridiculous number of people in the data. We might even get more people than live in the jurisdiction.
  • Measure of variation in the number of persons identified
  • Maximum number of events per person
    To understand what I mean, think of the extreme case where we say all records belong to a single person. That person would have more events than is reasonable, e.g. 1 person with 10,000 jail bookings. This can help provide a check on the quality of the matches.
  • Number of times the user says the model made a mistake
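A minimal sketch of these person-level sanity checks, assuming each source record is a single event and the matcher assigns each record a person id; reading the "variation" item as the spread in events per person is only one possible interpretation, and the user-reported-mistake count would need a separate feedback mechanism:

```python
# Sketch only: sanity checks on matched output. `record_to_person` and the
# jurisdiction population figure are hypothetical inputs.
from collections import Counter
from statistics import pstdev

def person_level_checks(record_to_person: dict, jurisdiction_population: int) -> dict:
    events_per_person = Counter(record_to_person.values())
    n_persons = len(events_per_person)
    return {
        # too many persons suggests under-matching (every event its own person)
        "n_unique_persons": n_persons,
        "exceeds_jurisdiction_population": n_persons > jurisdiction_population,
        # one person with implausibly many events suggests over-matching
        "max_events_per_person": max(events_per_person.values()),
        "stdev_events_per_person": pstdev(events_per_person.values()),
    }

print(person_level_checks(
    {"booking_1": "p1", "booking_2": "p1", "booking_3": "p2"},
    jurisdiction_population=1_000_000,
))
# 2 unique persons, max 2 events per person, within the jurisdiction population
```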
