This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

"Space word jump" heuristic works incorrectly when abbreviation has a space #9

Open
dko-slapdash opened this issue Jul 24, 2020 · 4 comments

Comments

@dko-slapdash

I'm trying to set SCORE_CHARACTER_JUMP=0 to disable matching mid-word characters.
The goal is that, e.g., the string “Filter by: Assigned To” should not match the abbreviation "test", which makes sense.
In effect, a "word-prefix-only search".

But it has a side effect on how the library processes spaces.

Imagine, for simplicity, that we have string="se f" and abbreviation="s f".

In this case, we expect the abbreviation to match the string perfectly, but that doesn't happen: the score returned is 0.

I think the solution is the following:

[image: proposed code change]

I.e. if the abbreviation character currently being checked is a space, and there is a space later in the string, we should continue no matter how far away that space is (the current string character does not itself have to be a space).
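
To make the rule above concrete, here is a minimal sketch in JavaScript. It is not the library's code: `resumePointsAfterSpace` is a hypothetical helper name, and it only illustrates the idea that a space in the abbreviation may be satisfied by any later space in the string, with matching resuming right after that space.

```javascript
// Hypothetical helper, NOT part of command-score.
// Given the string being searched and the index where matching currently
// stands, return every index at which matching could resume after a space
// in the abbreviation has been consumed (i.e. the position right after
// each later space in the string).
function resumePointsAfterSpace(string, fromIndex) {
  const points = [];
  for (
    let i = string.indexOf(" ", fromIndex);
    i !== -1;
    i = string.indexOf(" ", i + 1)
  ) {
    points.push(i + 1);
  }
  return points;
}
```

With string="se f" and the "s" already matched at index 0, the only resume point is the index of "f", so "s f" could go on to match even though the character right after "s" is "e" rather than a space.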

@ConradIrwin
Contributor

@dko-slapdash Thanks for the report!

I think that setting SCORE_CHARACTER_JUMP = 0 is not quite what you want to do, because in the example you give you're jumping a character (e).

In general I'd like command-score to allow aggressive abbreviations, so act should definitely be allowed to match account or ace throw. However, as you point out, it's not clear that it's useful for act to match anchor point, because the jump is across a word boundary to not-the-start-of-the-next-word.

To fix that we'd need to teach the library to notice this kind of leap explicitly, which may be a little more complicated than the change you propose, and I'd be open to trying it out and seeing if the experience is better or worse.

FWIW: At superhuman we use this library in a UX that selects one item, and suggests three more out of a list that's a few hundred items long. In that scenario, it's OK for words that don't match anything very well to suggest something because it's better than just showing an error.

@dko-slapdash
Author

Thanks for the quick reply! We're actually experimenting with a SCORE_CHARACTER_JUMP=0, SCORE_TRANSPOSITION=0 mode (because without SCORE_TRANSPOSITION=0, having just SCORE_CHARACTER_JUMP=0 is not enough). We think it's unlikely that people would search by mid-word characters, especially since our strings are 3-4 word sentences, not CamelCaseIdentifiers or file_names.ts.

We also don't limit the number of matched strings, i.e. we use superhumanCommandScore as both a filtering AND a ranking tool.

@ConradIrwin
Contributor

If you only want to support matches that are prefix matches of words, then you can likely save a lot of computation time by reducing the number of matches considered.

Right now we do index = lowerString.indexOf(abbreviationChar, index + 1); to iterate over every potential character match. You can probably change this to jump to the next match of /\s{abbreviationChar}/ if you don't want in-word jumps at all, or something a bit more complicated if you want in-word jumps in the current word, but not jumps across spaces.
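
As a rough illustration of the first option (no in-word jumps at all), the sketch below replaces the plain indexOf step with a search for the next occurrence that sits at the start of a word. The function name `nextWordStartIndex` is hypothetical and not part of command-score; it also treats index 0 as a word start, which a literal /\s{abbreviationChar}/ pattern would miss.

```javascript
// Hypothetical helper, NOT part of command-score.
// Returns the index of the next occurrence of abbreviationChar at or after
// fromIndex that starts a word (index 0 or preceded by whitespace),
// or -1 if there is none.
function nextWordStartIndex(lowerString, abbreviationChar, fromIndex) {
  let index = lowerString.indexOf(abbreviationChar, fromIndex);
  while (index !== -1) {
    if (index === 0 || /\s/.test(lowerString[index - 1])) {
      return index;
    }
    // Mid-word occurrence: skip it and keep looking.
    index = lowerString.indexOf(abbreviationChar, index + 1);
  }
  return -1;
}
```

For "filter by: assigned to", the mid-word "t" in "filter" is skipped and the first candidate for "t" is the start of "to", so far fewer candidate positions are explored than with a plain indexOf loop.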

@dko-slapdash
Author

dko-slapdash commented Jul 24, 2020

I think the \s approach won’t work, because I want “sefi” (or “se fi” after the change I posted above) to match “search files”. I.e. here “e” matches a mid-word character, but that’s okay, because it comes right after “s”, which matches a word-start character; the same goes for “i” after “f”.
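
The matching rule described here can be sketched as a standalone yes/no filter. This is only an illustration under assumptions, not command-score's scoring code: `matchesWordPrefixes` is a hypothetical name, word boundaries are taken to be single spaces, and the function returns a boolean rather than a score. Each abbreviation character must either directly follow the previously matched character or sit at the start of a later word, and a space in the abbreviation may skip ahead to any later space.

```javascript
// Hypothetical filter, NOT part of command-score.
// Returns true if every abbreviation character either continues the
// current run of matched characters or starts a later word.
function matchesWordPrefixes(string, abbreviation) {
  const s = string.toLowerCase();
  const a = abbreviation.toLowerCase();

  // si: index in s right after the previous match; ai: index in a.
  function match(si, ai) {
    if (ai === a.length) return true;
    const ch = a[ai];

    if (ch === " ") {
      // A space may be satisfied by any later space in the string.
      for (let i = s.indexOf(" ", si); i !== -1; i = s.indexOf(" ", i + 1)) {
        if (match(i + 1, ai + 1)) return true;
      }
      return false;
    }

    // Option 1: continue directly after the previous match.
    if (si < s.length && s[si] === ch && match(si + 1, ai + 1)) return true;

    // Option 2: jump ahead, but only to the start of a word.
    for (let i = si + 1; i < s.length; i++) {
      if (s[i] === ch && s[i - 1] === " " && match(i + 1, ai + 1)) return true;
    }
    return false;
  }

  return match(0, 0);
}

console.log(matchesWordPrefixes("search files", "sefi"));            // true
console.log(matchesWordPrefixes("search files", "se fi"));           // true
console.log(matchesWordPrefixes("Filter by: Assigned To", "test"));  // false
```

The recursion backtracks so that an early greedy choice does not hide a valid later match; that is fine for short command strings but can get expensive on long inputs, where memoizing on (si, ai) would help.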
