Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BlockingRule: Clarify name of sql property #1700

Merged
merged 12 commits into from
Nov 7, 2023

Conversation

RobinL
Copy link
Member

@RobinL RobinL commented Nov 6, 2023

Type of PR

  • BUG
  • FEAT
  • MAINT
  • DOC

Summary

This PR attempts to clarify the BlockingRule class. It is a step towards enabling ExplodingBlockingRules

It will be the first in a set of three PRs which incrementally add clarifications.

Clarify name of sql property

🤔 Previously BlockingRule had a sql property which was unused, and a property called blocking_rule which returned the sql. The later was confusing: did it return a BlockingRule or a sql string?.

Solution: There's now a single property called blocking_rule_sql

Blocking rule init argument allowed blocking_rule to be BlockingRule, dict or str

🤔 This made the code hard to read/unpredictable code because types were unclear

Solution It's now a string only

Clarify name methos

🤔 Previously BlockingRule had a sql property and_not_preceding_rules_sql(self). Not very clear

Solution: I have clarified method names here toexclude_pairs_generated_by_this_rule_sql and exclude_pairs_generated_by_all_preceding_rules_sql. This is partly because these implementations will differnt in #1692

Finally I've also added type hinting to make refactoring a bit easier

@RobinL RobinL changed the title rename sql property BlockingRule: Clarify name of sql property Nov 6, 2023
@RobinL RobinL requested a review from ThomasHepworth November 6, 2023 18:01
@@ -40,15 +40,15 @@ def blocking_rule_to_obj(br):
class BlockingRule:
def __init__(
self,
blocking_rule: BlockingRule | dict | str,
blocking_rule_sql: BlockingRule | dict | str,
Copy link
Member Author

@RobinL RobinL Nov 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should allow this to take as an input any of those types - i think this should be string only

I've just pushed a commit to enforce this temporariliy to check that tests still pass

@ThomasHepworth
Copy link
Contributor

A wider thought for another PR/Splink 4 - I think apply_salt should be handled from within the BlockingRule class, not outside.

It wouldn't be too hard to set the dialect for each BR in a central settings class/area when we go to unpack the settings dictionary (assuming we are planning to continue forward with that).

It's generally bad practice to have high 'coupling' (dependence between functions), which this is a perfect example of - changes to the salting methodology needs to be closely mirrored in the function linked above.

On top of that, it's quite hard to follow if you're new to the code base.

It would be better if we simply had to call br.salted_blocking_rules to get the output. Whether it came from a SparkLinker would then ideally be handled by the BlockingRule class (or a salting subclass).

Copy link
Contributor

@ThomasHepworth ThomasHepworth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

@@ -40,15 +40,20 @@ def blocking_rule_to_obj(br):
class BlockingRule:
def __init__(
self,
blocking_rule: BlockingRule | dict | str,
blocking_rule_sql: str,
salting_partitions=1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well add the int type hint here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good opint. I'll add it to the next PR, which changes this a bit anyway

@ThomasHepworth
Copy link
Contributor

My comment above is already being addressed by - #1701. Glad to see we're thinking the same thing.

@RobinL RobinL merged commit 5104945 into master Nov 7, 2023
@RobinL RobinL deleted the refactor_blockingrule_rename_sql_property branch November 7, 2023 18:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants