calling similarity on large array #419

onbio · 2024-11-21T04:29:00Z


I want to calculate the normalized similarity with rapidfuzz.distance.Indel.normalized_similarity(val1, val2).
I have a word to match against the second column of a large tab-delimited file. For every line whose second column matches the word with a similarity over 85%, I want the whole line along with its normalized similarity score.

For example, the word to match is "evtatinyn".

A few lines from the input file:
mui001 lewthtin 0.000007
xui008 levthatin 0.0010004
ui1 [vtatinyn 0.0000807
ul5 levthatin 0.000003
ppu5 gevtiktin 0.000000002
pip9 lewttin 0.00008
muix1 mewttingiants 0.0000002
ftk69 wttinoo[ys 0.00001

I want to get the lines whose second column has a similarity score over 85%, with the score prepended (something like 0.8888888888888888 ui1 [vtatinyn 0.0000807).

At present I am loading the data with pandas: indata = pd.read_csv('input.tab', sep='\t', lineterminator='\n')

Any assistance will be helpful.
Thanks
