-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BlockingRule: Clarify name of sql property #1700
Conversation
splink/blocking.py
Outdated
@@ -40,15 +40,15 @@ def blocking_rule_to_obj(br): | |||
class BlockingRule: | |||
def __init__( | |||
self, | |||
blocking_rule: BlockingRule | dict | str, | |||
blocking_rule_sql: BlockingRule | dict | str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should allow this to take as an input any of those types - i think this should be string only
I've just pushed a commit to enforce this temporariliy to check that tests still pass
A wider thought for another PR/Splink 4 - I think It wouldn't be too hard to set the dialect for each BR in a central settings class/area when we go to unpack the settings dictionary (assuming we are planning to continue forward with that). It's generally bad practice to have high 'coupling' (dependence between functions), which this is a perfect example of - changes to the salting methodology needs to be closely mirrored in the function linked above. On top of that, it's quite hard to follow if you're new to the code base. It would be better if we simply had to call |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good!
@@ -40,15 +40,20 @@ def blocking_rule_to_obj(br): | |||
class BlockingRule: | |||
def __init__( | |||
self, | |||
blocking_rule: BlockingRule | dict | str, | |||
blocking_rule_sql: str, | |||
salting_partitions=1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well add the int
type hint here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good opint. I'll add it to the next PR, which changes this a bit anyway
My comment above is already being addressed by - #1701. Glad to see we're thinking the same thing. |
Type of PR
Summary
This PR attempts to clarify the BlockingRule class. It is a step towards enabling ExplodingBlockingRules
It will be the first in a set of three PRs which incrementally add clarifications.
Clarify name of sql property
🤔 Previously
BlockingRule
had asql
property which was unused, and a property calledblocking_rule
which returned the sql. The later was confusing: did it return aBlockingRule
or a sql string?.Solution: There's now a single property called
blocking_rule_sql
Blocking rule init argument allowed
blocking_rule
to be BlockingRule, dict or str🤔 This made the code hard to read/unpredictable code because types were unclear
Solution It's now a string only
Clarify name methos
🤔 Previously
BlockingRule
had asql
propertyand_not_preceding_rules_sql(self)
. Not very clearSolution: I have clarified method names here to
exclude_pairs_generated_by_this_rule_sql
andexclude_pairs_generated_by_all_preceding_rules_sql
. This is partly because these implementations will differnt in #1692Finally I've also added type hinting to make refactoring a bit easier