forked from codespell-project/codespell
Replace `data: str` with `candidates: Sequence[str]`
When the spelling dictionaries are loaded, the correction line was previously stored in memory as plain text. Throughout the code, callers then had to deal with the `data` attribute and correctly `split()` + `strip()` it. With this change, the dictionary parsing code encapsulates this problem.

The auto-correction works from the assumption that there is only one candidate. This assumption is invariant and seems to be properly maintained in the code. Therefore, we can just pick the first candidate word when doing a correction.

In the code, the following name changes are performed:

* `Misspelling.data` -> `Misspelling.candidates`
* `fixword` -> `candidates` when used for multiple candidates (`fixword` remains for when it is a correction)

On performance:

Performance-wise, this change moves computation from "checking" time to "startup" time. The performance cost does not appear to be noticeable in my baseline (codespell-project#3419). However, keep in mind the corpus weakness regarding the ratio of cased vs. non-cased corrections with multiple candidates.

The all-lowercase typo is now slightly more expensive: in the original code it was passed through `fix_case` and fed directly into the `print`, whereas in the new code it will always need a `join`. Lowercase-only corrections still predominate in general, so the unconditional `.join` alone is not sufficient to affect the performance noticeably.
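The shift from "checking" time to "startup" time described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: `parse_candidates` and `pick_correction` are hypothetical helper names, and the sample strings are made up.

```python
from typing import Sequence, Tuple


def parse_candidates(data: str) -> Tuple[str, ...]:
    """Split a comma-separated correction string once, at load time.

    Hypothetical helper; it mirrors the split-and-strip step that
    callers previously had to repeat themselves.
    """
    return tuple(c.strip() for c in data.split(","))


def pick_correction(candidates: Sequence[str]) -> str:
    """Return the word to use for an automatic correction.

    Assumes the invariant from the commit message: an entry that is
    auto-corrected has exactly one candidate, so the first one is it.
    """
    return candidates[0]


# Old style: every caller splits and strips the raw text on each use.
raw = "candidate, candidates"
old_style = [w.strip() for w in raw.split(",")]

# New style: the split happens once, when the dictionary is parsed.
new_style = parse_candidates(raw)

print(old_style)
print(new_style)
print(pick_correction(parse_candidates("receive")))
```

Storing a tuple rather than a list keeps the parsed candidates immutable, which matches the `tuple(...)` call in the diff.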
Showing 3 changed files with 30 additions and 23 deletions.
@@ -15,16 +15,16 @@
 Copyright (C) 2010-2011 Lucas De Marchi <[email protected]>
 Copyright (C) 2011 ProFUSION embedded systems
 """
-from typing import Dict, Set
+from typing import Dict, Set, Sequence

 # Pass all misspellings through this translation table to generate
 # alternative misspellings and fixes.
 alt_chars = (("'", "’"),)  # noqa: RUF001


 class Misspelling:
-    def __init__(self, data: str, fix: bool, reason: str) -> None:
-        self.data = data
+    def __init__(self, candidates: Sequence[str], fix: bool, reason: str) -> None:
+        self.candidates = candidates
         self.fix = fix
         self.reason = reason


@@ -44,7 +44,11 @@ def add_misspelling(
         fix = True
         reason = ""

-    misspellings[key] = Misspelling(data, fix, reason)
+    misspellings[key] = Misspelling(
+        tuple(c.strip() for c in data.split(",")),
+        fix,
+        reason,
+    )


 def build_dict(
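With `candidates` stored as a sequence, a caller that reports a misspelling no longer parses text; it only joins the candidates for display. The sketch below illustrates that consumer side under stated assumptions: the `Misspelling` class copies the new constructor from the hunk above, while `describe` and its output format are hypothetical and not taken from the project.

```python
from typing import Sequence


class Misspelling:
    # Same shape as the new constructor in the diff above.
    def __init__(self, candidates: Sequence[str], fix: bool, reason: str) -> None:
        self.candidates = candidates
        self.fix = fix
        self.reason = reason


def describe(word: str, misspelling: Misspelling) -> str:
    """Format a report line for one misspelling (hypothetical helper).

    The join is unconditional, even for a single candidate; this is
    the small extra cost the commit message mentions.
    """
    text = f"{word} ==> {', '.join(misspelling.candidates)}"
    if misspelling.reason:
        text += f" ({misspelling.reason})"
    return text


single = Misspelling(("the",), True, "")
multiple = Misspelling(("there", "their"), False, "ambiguous")

print(describe("teh", single))
print(describe("thier", multiple))
```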