SymSpell.from(words, {maxDistance: 3, verbosity: 2}) taking long time to load large word dictionary #210
Comments
Hello @kabyanil, would decreasing Also, and this might not be intuitive, you really want to benchmark against a linear search over the dataset because in some scenarios (very long strings typically), it might be faster.
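For reference, a linear-search baseline to benchmark against could look roughly like the sketch below. It is only an illustration: `boundedLevenshtein` and `linearSearch` are hypothetical helper names, not part of the library, and it uses plain Levenshtein distance (no transpositions), which may differ from the distance the index itself uses.

```javascript
// Two-row Levenshtein distance with an early exit.
// Returns Infinity as soon as the distance is known to exceed `max`.
function boundedLevenshtein(a, b, max) {
  // The length difference is a lower bound on the distance.
  if (Math.abs(a.length - b.length) > max) return Infinity;

  let prev = Array.from({length: b.length + 1}, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    let rowMin = i;

    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
      if (curr[j] < rowMin) rowMin = curr[j];
    }

    // Row minima never decrease, so the final distance cannot be <= max.
    if (rowMin > max) return Infinity;
    prev = curr;
  }

  return prev[b.length] <= max ? prev[b.length] : Infinity;
}

// Scan every word and keep those within `maxDistance` of the query.
function linearSearch(words, query, maxDistance) {
  const matches = [];

  for (const word of words) {
    const distance = boundedLevenshtein(query, word, maxDistance);
    if (distance !== Infinity) matches.push({term: word, distance});
  }

  return matches.sort((x, y) => x.distance - y.distance);
}
```

Wrapping a call such as `linearSearch(words, query, 3)` in `console.time` / `console.timeEnd` and comparing it with the index's own search on the same queries gives a first estimate. A linear scan has no build step, so it avoids the load time entirely at the price of slower individual queries.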
It is not a good idea to serialize deletion combinations. It will not make the index faster to instantiate, as computing deletion combinations is not costly per se. It would be the same as iterating over already generated ones.
I don't think so. If I remember correctly, the costly thing here is basically that you need to index a whole lot of strings in a hashmap, which takes time. All Levenshtein-based indices basically need to insert some subspace of possible combinations into a map, and this effectively trades time complexity for space complexity. But once again, a maxDistance of 3 is very ambitious for most of those indices, which mainly target a maxDistance of 1-2, because you are fighting against combinatorics.
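To get a feel for the combinatorics, the sketch below counts the distinct strings obtained by deleting up to `maxDistance` characters from a single word. `deletionVariants` is a hypothetical helper written for illustration; it is not the library's internal code, and the real index may store its keys differently. For a word of length n, the count is bounded by the sum of C(n, k) for k from 0 to maxDistance, so a 10-letter word yields at most 176 keys at distance 3, compared with at most 11 at distance 1.

```javascript
// Collect every distinct string obtainable by deleting up to
// `maxDistance` characters from `word` (the word itself included).
function deletionVariants(word, maxDistance) {
  const seen = new Set([word]);
  let frontier = [word];

  for (let d = 0; d < maxDistance; d++) {
    const next = [];

    for (const s of frontier) {
      for (let i = 0; i < s.length; i++) {
        // Remove the character at position i.
        const variant = s.slice(0, i) + s.slice(i + 1);

        if (!seen.has(variant)) {
          seen.add(variant);
          next.push(variant);
        }
      }
    }

    frontier = next;
  }

  return seen;
}

// Print how the number of keys for one word grows with maxDistance.
for (const maxDistance of [1, 2, 3]) {
  console.log(maxDistance, deletionVariants('dictionary', maxDistance).size);
}
```

Multiplying a per-word figure of this order by a dictionary of 163,196 words suggests why the number of hashmap insertions, rather than the cost of computing each deletion, would dominate the load time.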
No idea. But I feel this is probably not the case. Have you checked how hunspell and other spell checkers deal with this problem? They usually rely on clever heuristics that remove the need for an exact Levenshtein distance computation.
I am loading a dictionary of 163,196 words into SymSpell. It is taking more than a minute to load. If I load more than that, the Tauri app crashes. I am wondering, is there any way to load the words once, generate the precalculated delete combinations, and save the generated dictionary for later use? It seems to me that the precalculation step is redundant, running every time the app is launched.
Please share your thoughts on this. Thanks.