I am new to this framework, and I am excited to get started. I have followed the tutorial, and I have a question. I am sorry if it has already been asked and answered before, but I was unable to find it. Kind regards,
Replies: 1 comment
This would typically be done with match weights. You could fix them for that particular field, see #2379.

Basically you want a very strong negative match weight for the 'does not match' case.

Note that when you fix the values you don't need to worry about them summing to one.

So in your case:

```python
import splink.comparison_level_library as cll
import splink.comparison_library as cl
from splink import DuckDBAPI, Linker, SettingsCreator, block_on, splink_datasets

db_api = DuckDBAPI()
df = splink_datasets.fake_1000

settings = SettingsCreator(
    link_type="dedupe_only",
    comparisons=[
        cl.ExactMatch("first_name"),
        cl.ExactMatch("surname"),
        cl.ExactMatch("dob"),
        cl.ExactMatch("city").configure(term_frequency_adjustments=True),
        cl.CustomComparison(
            comparison_levels=[
                cll.NullLevel("email"),
                # Fixing the exact-match level is optional: you could fix
                # these values or not bother.
                cll.ExactMatchLevel("email").configure(
                    m_probability=0.99,
                    u_probability=0.01,
                    fix_m_probability=True,
                    fix_u_probability=True,
                ),
                # This is the strong negative match weight, and it needs to
                # be fixed. Make m_probability even closer to 0 for an even
                # stronger negative match weight.
                cll.ElseLevel().configure(
                    m_probability=1e-7,
                    u_probability=1.00,
                    fix_m_probability=True,
                    fix_u_probability=True,
                ),
            ],
            output_column_name="email",
        ),
    ],
    blocking_rules_to_generate_predictions=[
        block_on("first_name"),
        block_on("surname"),
    ],
    max_iterations=2,
)

linker = Linker(df, settings, db_api)

linker.training.estimate_probability_two_random_records_match(
    [block_on("first_name", "surname")], recall=0.7
)
linker.training.estimate_u_using_random_sampling(max_pairs=1e6)
linker.training.estimate_parameters_using_expectation_maximisation(block_on("dob"))

# The email comparison should show the fixed weights in the chart
linker.visualisations.match_weights_chart()
```
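To get a feel for how strong those fixed values are, here is a rough sketch that turns each pair of m and u probabilities into a match weight. It assumes the usual definition of a partial match weight as log2(m / u). The helper `match_weight` is made up for illustration and is not part of the Splink API.

```python
import math


def match_weight(m_probability: float, u_probability: float) -> float:
    """Hypothetical helper: partial match weight as log2 of the Bayes factor m / u."""
    return math.log2(m_probability / u_probability)


# Values taken from the email comparison in the settings above
exact_weight = match_weight(0.99, 0.01)  # exact match level
else_weight = match_weight(1e-7, 1.00)   # 'does not match' (else) level

print(f"exact match level: {exact_weight:+.2f}")  # +6.63
print(f"else level:        {else_weight:+.2f}")   # -23.25
```

Under that assumption each level's weight depends only on its own m / u ratio, which is why the fixed values do not need to sum to one across levels. Moving the else level's `m_probability` from 1e-7 to 1e-10 would push its weight down to roughly -33.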