Enormously incorrect m probability; how can I troubleshoot? #1434
Replies: 1 comment 2 replies
I'm looking at
and what it looks like to me is that the number of records that match the `LastName jaro_winkler_similarity >= 0.9` level is very small, both for true matches and non-matches. This could lead to outlier error. For example, if there is only one record among true matches where this happens, and 10 records among non-matches, then the model concludes that seeing this level is strongly indicative of a non-match. If you had more records, this ratio might not be as skewed.

Take a look at the table of comparisons that comes out of `.predict()`. Filter this so you only look at comparisons where, e.g., `gamma_last_name == 3` (except use the right column name, and it might not be …

In general, if the number of records in each level is too small, I try to adjust the specificity of each rule so that the comparisons are more equally distributed between levels. So for instance I might just remove the …
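To make the filtering step and the skew concrete, here is a minimal sketch in pandas. It uses a small made-up table as a stand-in for the real `.predict()` output (in Splink, the result can, as far as I know, be converted with `.as_pandas_dataframe()`). The column names, the level number 3, and the totals of 1,000 matches and 1,000 non-matches are all assumptions for illustration, not values from this thread.

```python
import math

import pandas as pd

# Toy stand-in for the pairwise comparison table returned by .predict().
# All column names and values here are hypothetical.
comparisons = pd.DataFrame(
    {
        "last_name_l": ["Smith", "Smyth", "Jones", "Brown", "Smith", "Jonas"],
        "last_name_r": ["Smith", "Smith", "Jones", "Braun", "Smithe", "Jones"],
        "gamma_last_name": [4, 3, 4, 1, 3, 2],
        "match_probability": [0.99, 0.02, 0.98, 0.01, 0.03, 0.10],
    }
)

# How many comparisons fall into each level? Very small counts are a warning sign.
level_counts = comparisons["gamma_last_name"].value_counts().sort_index()
print(level_counts)

# Look only at the comparisons that landed in the suspicious level.
suspicious = comparisons[comparisons["gamma_last_name"] == 3]
print(suspicious)


def match_weight(n_level_matches, n_matches, n_level_non_matches, n_non_matches):
    """Return log2(m / u) for one comparison level, computed from raw counts."""
    m = n_level_matches / n_matches
    u = n_level_non_matches / n_non_matches
    return math.log2(m / u)


# The example from the comment: 1 true match and 10 non-matches in the level.
# The totals (1,000 each) are invented purely to show the arithmetic.
skewed = match_weight(1, 1000, 10, 1000)
print(round(skewed, 2))  # negative: the level looks like evidence against a match
```

With only one true match in the level, a single extra record would double the estimated m, which is why the resulting weight is so unstable.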