The smart-match module contains functions for calculating the similarity of strings and sets.
- similarity: A value in the range [0, 1] that represents how similar two strings are. The larger the value, the more similar the strings.
- dissimilarity: A value in the range [0, 1] that represents how dissimilar two strings are. The larger the value, the more dissimilar the strings. For any pair of strings, similarity = 1 - dissimilarity.
- distance: How far apart two strings are. Note that not every method supports distance.
- score: The larger the score, the more similar the two strings are. Note that not every method supports score.
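To make these quantities concrete, here is a minimal, self-contained sketch for the Levenshtein method. It assumes that similarity is normalized by the length of the longer string, i.e. `1 - distance / max(len(a), len(b))`; that normalization is an assumption for illustration and may not match what smart-match computes internally. The helper names are hypothetical and are not part of the smart-match API.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca from a
                            curr[j - 1] + 1,      # insert cb into a
                            prev[j - 1] + cost))  # substitute (or keep a match)
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalization by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(a, b) / longest


def levenshtein_dissimilarity(a: str, b: str) -> float:
    """Dissimilarity is the complement of similarity."""
    return 1.0 - levenshtein_similarity(a, b)


print(levenshtein_distance("hello", "hero"))       # 2
print(levenshtein_similarity("hello", "hero"))     # 0.6
print(levenshtein_dissimilarity("hello", "hero"))  # 0.4
```

Under this assumption the distance is an unbounded edit count, while similarity and dissimilarity are both rescaled into [0, 1] and always sum to 1.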
We support three levels of string matching:
- char: Similarity computation based on characters in the strings.
- term: Similarity computation based on terms in the strings.
- gram: Similarity computation based on q-grams in the strings.
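One way to picture the three levels is as three ways of splitting a string into units before a measure is applied. The sketch below assumes whitespace splitting for terms and unpadded q-grams with q = 2; these choices, and the helper names, are hypothetical and may differ from smart-match's own tokenizers.

```python
def char_tokens(s: str) -> list[str]:
    """Char level: every character is a unit."""
    return list(s)


def term_tokens(s: str) -> list[str]:
    """Term level: whitespace-separated words (assumed tokenizer)."""
    return s.split()


def qgram_tokens(s: str, q: int = 2) -> list[str]:
    """Gram level: overlapping substrings of length q, no padding (assumed).

    Strings shorter than q yield no grams.
    """
    return [s[i:i + q] for i in range(len(s) - q + 1)]


print(char_tokens("hello"))        # ['h', 'e', 'l', 'l', 'o']
print(term_tokens("hello world"))  # ['hello', 'world']
print(qgram_tokens("hello"))       # ['he', 'el', 'll', 'lo']
```

The same measure (for example Jaccard) then gives different results depending on which of these token sequences it is fed.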
We support the following methods.
| Method | similarity | dissimilarity | distance | score |
| --- | :---: | :---: | :---: | :---: |
| Levenshtein (default) | ✅ | ✅ | ✅ | ❌ |
| Euclidean | ✅ | ✅ | ✅ | ❌ |
| Damerau-Levenshtein | ✅ | ✅ | ✅ | ❌ |
| Block Distance | ✅ | ✅ | ✅ | ❌ |
| Cosine | ✅ | ✅ | ❌ | ❌ |
| Tanimoto Coefficient | ✅ | ✅ | ❌ | ❌ |
| Dice | ✅ | ✅ | ❌ | ❌ |
| Simon White | ✅ | ✅ | ❌ | ❌ |
| Longest Common Substring | ✅ | ✅ | ✅ | ✅ |
| Longest Common Subsequence | ✅ | ✅ | ✅ | ✅ |
| Overlap Coefficient | ✅ | ✅ | ❌ | ❌ |
| Generalized Overlap Coefficient | ✅ | ✅ | ❌ | ❌ |
| Jaccard | ✅ | ✅ | ❌ | ❌ |
| Generalized Jaccard | ✅ | ✅ | ❌ | ❌ |
| Hamming | ✅ | ✅ | ✅ | ❌ |
| Jaro | ✅ | ✅ | ❌ | ❌ |
| Jaro-Winkler | ✅ | ✅ | ❌ | ❌ |
| Needleman-Wunsch | ✅ | ✅ | ❌ | ✅ |
| Smith-Waterman | ✅ | ✅ | ❌ | ✅ |
| Smith-Waterman-Gotoh | ✅ | ✅ | ❌ | ✅ |
| Monge-Elkan | ✅ | ✅ | ❌ | ❌ |
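Several of the set-based methods in the table (Jaccard, Dice, Overlap Coefficient) have simple closed forms. The sketch below computes them over sets of character bigrams. The choice of bigrams and the conventions for empty inputs are assumptions for illustration and may differ from how smart-match tokenizes strings or handles edge cases; the function names are hypothetical.

```python
def bigrams(s: str) -> set[str]:
    """Set of overlapping 2-character substrings (no padding)."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets are identical
    return len(a & b) / len(a | b)


def dice(a: set, b: set) -> float:
    """2 |A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # assumed convention
    return 2 * len(a & b) / (len(a) + len(b))


def overlap(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|)."""
    if not a or not b:
        return 1.0 if a == b else 0.0  # assumed convention for empty inputs
    return len(a & b) / min(len(a), len(b))


x, y = bigrams("hello"), bigrams("hero")  # {he, el, ll, lo} and {he, er, ro}
print(jaccard(x, y))  # 0.1666... (= 1/6)
print(dice(x, y))     # 0.2857... (= 2/7)
print(overlap(x, y))  # 0.3333... (= 1/3)
```

Each of these is a similarity in [0, 1], so the corresponding dissimilarity is simply `1 - similarity`, consistent with the definitions above; none of them defines a distance, which matches the ❌ entries in the distance column.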
```shell
pip install smart-match
```
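```python
import smart_match

# Compare two strings using the default method (Levenshtein)
print(smart_match.similarity('hello', 'hero'))     # in [0, 1]; higher means more similar
print(smart_match.dissimilarity('hello', 'hero'))  # equals 1 - similarity
print(smart_match.distance('hello', 'hero'))       # Levenshtein supports distance
```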
See the Wiki for more details.
smart-match is free software. See the LICENSE file for the full text.