In my example below, partial_ratio_alignment seems to cut the match short in the second string. I was expecting it to include the additional "et."
Code:
from rapidfuzz import fuzz

query_string = "Business's say they got nothing out of last night's budget."
contains_string = "Business's say they've got nothing out of last night's budget. It's really hard out there!"
# Align the shorter query against the longer string; result is None below the cutoff
match = fuzz.partial_ratio_alignment(query_string, contains_string, score_cutoff=90)
print(match)
# Show the aligned slices of both strings
print(query_string[match.src_start:match.src_end], contains_string[match.dest_start:match.dest_end])
Output:
ScoreAlignment(score=94.91525423728814, src_start=0, src_end=59, dest_start=0, dest_end=59)
("Business's say they got nothing out of last night's budget.", "Business's say they've got nothing out of last night's budg")
partial_ratio uses a sliding-window approach to find the optimal alignment of the shorter string within the longer string. It will therefore not find an alignment in which the subsequence of the longer string is longer than the shorter string. The subsequence is either exactly as long as the shorter string or, if it starts or ends at the start or end of the longer string, shorter. In your example the query is 59 characters long, so the matched window in the second string is capped at 59 characters, which is why it stops at "budg".
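To illustrate the window-length limit described above, here is a minimal sketch of a sliding-window alignment. It is not rapidfuzz's actual implementation: the function name `sliding_window_alignment` is made up for this example, and `difflib.SequenceMatcher` from the standard library stands in for rapidfuzz's scoring, so the scores will differ from `partial_ratio`.

```python
from difflib import SequenceMatcher


def sliding_window_alignment(shorter, longer):
    """Return (score, start, end) of the best window of `longer`.

    Hypothetical helper for illustration only. Every window is at most
    len(shorter) characters long; windows that overlap the start or end
    of `longer` are clipped and therefore shorter.
    """
    n = len(shorter)
    best_score, best_start, best_end = 0.0, 0, 0
    for s in range(-n + 1, len(longer)):
        # Clip the window to the bounds of the longer string
        start, end = max(s, 0), min(s + n, len(longer))
        window = longer[start:end]
        # Stand-in similarity score in the range 0..100
        score = 100 * SequenceMatcher(None, shorter, window, autojunk=False).ratio()
        if score > best_score:
            best_score, best_start, best_end = score, start, end
    return best_score, best_start, best_end


query_string = "Business's say they got nothing out of last night's budget."
contains_string = "Business's say they've got nothing out of last night's budget. It's really hard out there!"
score, dest_start, dest_end = sliding_window_alignment(query_string, contains_string)
print(score, repr(contains_string[dest_start:dest_end]))
```

Whatever window wins, `dest_end - dest_start` can never exceed `len(query_string)`, so a match that needs extra characters in the longer string (here the inserted "'ve") cannot be covered in full.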
The metric you are searching for is Smith-Waterman, which is not implemented in rapidfuzz yet: #175
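For reference, a minimal pure-Python sketch of Smith-Waterman local alignment follows. It is only an illustration, not what rapidfuzz or any other library ships: the function name `smith_waterman` and the scoring values (match +2, mismatch -1, linear gap -1) are assumptions chosen for this example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, a_start, a_end, b_start, b_end) of the best local alignment.

    Illustrative only: linear gap penalty, O(len(a) * len(b)) time and memory.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: a cell never drops below zero
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until the score reaches zero
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, i, best_i, j, best_j


query_string = "Business's say they got nothing out of last night's budget."
contains_string = "Business's say they've got nothing out of last night's budget. It's really hard out there!"
score, q_start, q_end, c_start, c_end = smith_waterman(query_string, contains_string)
print(query_string[q_start:q_end])
print(contains_string[c_start:c_end])
```

Because gaps are allowed inside the alignment, the matched span in the longer string can be longer than the query, which is the behaviour the sliding window cannot provide.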
FWIW, I had pretty good results with parasail. Here's an example:
import parasail

query_string = "Business's say they got nothing out of last night's budget."
contains_string = "Business's say they've got nothing out of last night's budget. It's really hard out there!"
# Striped Smith-Waterman: gap open penalty 10, gap extend penalty 1
result = parasail.ssw(query_string, contains_string, 10, 1, parasail.blosum50)
# End indices are inclusive, hence the +1 when slicing
print(query_string[result.read_begin1:result.read_end1+1])
print(contains_string[result.ref_begin1:result.ref_end1+1])
Output:
Business's say they got nothing out of last night's budget.
Business's say they've got nothing out of last night's budget.
I later used rapidfuzz again for distance / score calculations.