Migrate tests for Splink 4 (`ComparisonLevelCreator` and `ComparisonCreator` and related changes) #1714
as this is already backend-agnostic we don't really need this to live on the `test_helpers`, but this will lubricate the migration
These are things that cause test-collection to fail
tests now run at least, although mainly failing
@ADBond to add datediff and the other levels I've done, shall I just do a PR pointing at this branch?
It's okay, I have merged in
thanks!
…omparison-tf Migrate tests - comparison term frequencies
Migrate tests - ctl
This is a branch to translate our existing (Splink 3) tests to work on the new `ComparisonLevelCreator` class (see #1689, #1703). This will be updated as and when new features get implemented in Splink 4 (specifically the various missing comparison levels, and the library and template comparisons).

The changes to `splink/` itself should all be relatively minimal, and mostly pertain to allowing the tests to run - anything sufficiently substantial will instead get its own PR to `splink4_dev`. That said, it is probably good to check the changes there particularly carefully in case I accidentally do anything silly.

The rest of the PR is large, but most of the changes should be fairly straightforward - mostly renaming plus the small changes needed for the updated syntax. I will try to highlight anything particularly noteworthy so that it doesn't get lost in the noise.
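For anyone skimming, here is a minimal sketch of the general idea behind a backend-agnostic level creator, assuming the SQL dialect is only supplied at the point where SQL is generated. This is a toy and not Splink's implementation: `ToyLevenshteinLevel`, `create_sql`, the dialect names, and the function mapping are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical mapping from dialect name to its edit-distance SQL function.
# The dialect and function names are placeholders, not real backends.
EDIT_DISTANCE_FUNCTION = {
    "dialect_a": "levenshtein",
    "dialect_b": "edit_distance",
}


@dataclass(frozen=True)
class ToyLevenshteinLevel:
    """Toy stand-in for a level creator: stores only backend-agnostic settings."""

    col_name: str
    distance_threshold: int

    def create_sql(self, dialect: str) -> str:
        # The dialect is passed in only when SQL is needed, so the same
        # creator object can be reused unchanged for every backend under test.
        function_name = EDIT_DISTANCE_FUNCTION[dialect]
        return (
            f"{function_name}({self.col_name}_l, {self.col_name}_r) "
            f"<= {self.distance_threshold}"
        )


# One creator, defined once, rendered for each (hypothetical) dialect.
level = ToyLevenshteinLevel("first_name", 2)
sql_by_dialect = {d: level.create_sql(d) for d in EDIT_DISTANCE_FUNCTION}
```

With this shape a test can define the level once and parametrise only over the dialect, which is roughly why most of the migrated tests need little more than renaming.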
Progress:
LevenshteinLevel
DatediffLevel
JaroWinklerLevel
ColumnsReversedLevel
JaroLevel
JaccardLevel
DamerauLevenshteinLevel
DistanceInKMLevel
ArrayIntersectLevel
PercentageDifferenceLevel
CustomLevel
DistanceFunctionLevel
or_
and_
not_
ExactMatch
LevenshteinAtThresholds
CustomComparison
DamerauLevenshteinAtThresholds
JaccardAtThresholds
JaroAtThresholds
JaroWinklerAtThresholds
DistanceInKMAtThresholds
DatediffAtThresholds
ArrayIntersectAtSizes
DistanceFunctionAtThresholds
EmailComparison
DateComparison
NameComparison
PostcodeComparison
ForenameSurnameComparison
InputExpression (for regex_extract, lower, etc)
NullLevel regex validation

Tests now actually run [although some have been temporarily disabled], so we can start to get semi-useful feedback.
Number of failing tests [postgres, sqlite respectively in parentheses where included]:
110; 96; 93; 85; 64; 55; 52; 57 😭; 56; 48; 46; 44; 40; 39; 35; 33 (5, 3); 33 (15, 12); 33 (5, 2); 31; 32 😭; 25 (5, 0); 24 (3, 0); 8 (4, 2); 8 (4, 0); 0 (0, 0) 🥳 🎉 🎊 ✅ 😎