Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: fuzzy matching #8

Open
infokiller opened this issue Jan 3, 2017 · 18 comments
Open

Feature request: fuzzy matching #8

infokiller opened this issue Jan 3, 2017 · 18 comments

Comments

@infokiller
Copy link

Do you think it would be hard to support it?

I think that can make typing much easier on average.

Thanks!

@mike-fabian
Copy link
Owner

There is some degree of fuzzy matching already.

If you have the hunspell dictionaries installed and
ibus-typing-booster can find them and python3-enchant is available,
you will get spell checking matches. I.e. if you slightly mistype a
word, you will still get a match.

On top of that, matching is accent insensitive, i.e. if you type in a
language using many accents (Czech, French, German, ...) you can type
words without the accents and select the correct words with the
accents in most cases. Or even type the accents completely wrong and
still get matches of the correct words. That helps a lot if you are
learning the language and are not sure about the correct accents.

Is that what you meant?

@infokiller
Copy link
Author

What I mean is that if I type "hm" I will also get a match for "home", similar to how, for example, https://github.com/junegunn/fzf works.

@mike-fabian
Copy link
Owner

For something as short as "hm" that will easily give far too many matches. If you have too many matches to choose from, speed will suffer.
fzf seems focused on command lines, not human languages.

@infokiller
Copy link
Author

Can you please elaborate on the speed issues you expect with lots of matches?
Also, how is the scoring/ranking done?

@mike-fabian
Copy link
Owner

I mean the speed issues of the user having to select from too many
matches. For example, “hm” could match “him”, “ham”, “hum”, “home”,
“hame”, “ohm”, ... and probably many others. If the candidate list
shows many matches you have to look at all of them and decide which is
the one you wanted. That takes time and just typing the complete word
may be much faster then, at least if you are a fast typist. The length
of the first page of the candidate list is configurable, it can be
between 1 and 9. 9 is already quite a lot. If the desired match is not
in the first page of the candidate list and you have to scroll down to
the next page or pages, this is almost always slower than just
continue typing the complete word.

Lots of matches can be useful only in rare cases, like when trying to
find a particular emoji for example.

Swiftkey, a similar program to speed up typing which runs on Android
never shows more than 3 candidates. Showing lots of candidates isn’t
really useful, more important is to show high quality candidates,
i.e. candidates you are really likely to type.

The scoring is done by remembering what the user types often.

For example, if you often type “I want beer” and less often type “I
want bread”, the next time you type “I want b” you will see “beer” as
the first candidate. I.e. if you type one or more letters (like “b”),
ibus-typing-booster calculates which words starting with these letters
you typed most often considering the words you typed immediately
before that (like “I want”). In Gnome programs, ibus-typing-booster
can get that context by asking the program you are typing in for the
context left of the cursor, i.e. even if you move the cursor somewhere
else using the mouse or the arrow keys, ibus-typing-booster can get
the correct context. In non-Gnome programs this doesn’t work and
ibus-typing-booster falls back to just remembering the last words you
typed, in that case the context may be wrong if you move the cursor
and continue typing in a different position.

Try this out by typing a few sentences and then type similar
sentences again, you will see that you get words scored high
in the candidates which you typed recently in that context.

If you type a word completely but spelled wrong, for example if you
type “somwhere” you will get a spell checking suggestion like
“somewhere” high up in the candidate list. If you select that
candidate “somewhere” and do the same spelling mistake again later,
ibus-typing-booster remembers that and after typing “somw” you will
already see “somewhere” in the candidate list (Because it remembers
that you recently type something starting with “somw” and then
selected “somewhere”).

@infokiller
Copy link
Author

Thanks @mike-fabian for the thorough explanation!

The way I see it, the problem you describe with having to scan a long list of candidates is more about scoring/ranking than matching. If you think about it, (boolean) matching can be viewed as some form of scoring as well (where matches get the score 1 and non-matches get 0, assuming you show the user all words with score > 0).
So if the scoring works well, even if there are lots of matches, the right matches will be on top. For example, given a user query of "ho", it probably makes sense to rank "home" higher than "through", even though both match fuzzily.
Now, getting the scoring perfect is an unsolved problem. But it may be feasible to make the scoring good enough so that fuzzy matching will be a positive change.

@wolfv
Copy link

wolfv commented Oct 28, 2019

@mike-fabian @infokiller I have extracted the Android auto-suggestion and correction library from the last open source release of the Android (now Google) keyboard. It is basically a C++ library.

I am working on nice Python bindings. You can check it out here:

https://github.com/wolfv/suggestr

My initial version from a couple years ago also had a DBus Server, which we could easily revamp to make it work. I don't know the exact model how ibus-typing booster retrieves suggestions but I think using the Android keyboard sources is ideal. It is very fast, supports the usage of context, comes with a couple of good dictionaries (that have probabilities).

Maybe someone has a cycle or two to try to get it to work together? I think this would be a game-changer, not only for ibus-typing-booster, but also for touchscreen keyboards in Gnome (and upcoming phones).

@infokiller
Copy link
Author

@wolfv sounds great, I agree that the Android keyboard is a good direction!

@mike-fabian @infokiller I have extracted the Android auto-suggestion and correction library from the last open source release of the Android (now Google) keyboard. It is basically a C++ library.

I am working on nice Python bindings. You can check it out here:

https://github.com/wolfv/suggestr

My initial version from a couple years ago also had a DBus Server, which we could easily revamp to make it work.

What did you use a dbus server for?

I don't know the exact model how ibus-typing booster retrieves suggestions but I think using the Android keyboard sources is ideal. It is very fast, supports the usage of context, comes with a couple of good dictionaries (that have probabilities).

Maybe someone has a cycle or two to try to get it to work together? I think this would be a game-changer, not only for ibus-typing-booster, but also for touchscreen keyboards in Gnome (and upcoming phones).

By "get it to work together", do you mean combining suggestr with ibus-typing-booster?

@mike-fabian
Copy link
Owner

@mike-fabian @infokiller I have extracted the Android auto-suggestion and correction library from the last open source release of the Android (now Google) keyboard. It is basically a C++ library.

That means the current version is not open source anymore?

I am working on nice Python bindings. You can check it out here:

https://github.com/wolfv/suggestr

My initial version from a couple years ago also had a DBus Server, which we could easily revamp to make it work. I don't know the exact model how ibus-typing booster retrieves suggestions but I think using the Android keyboard sources is ideal. It is very fast, supports the usage of context, comes with a couple of good dictionaries (that have probabilities).

Maybe someone has a cycle or two to try to get it to work together? I think this would be a game-changer, not only for ibus-typing-booster, but also for touchscreen keyboards in Gnome (and upcoming phones).

I’ll have a look.

But I was never really impressed by the prediction quality of the Android keyboard, SwiftKey seems to work much better.

@wolfv
Copy link

wolfv commented Oct 29, 2019

yes, at some point in time google decided to create a close-sourced fork of the keyboard, and rebrand it as Google Keyboard. Some features (such as swipe-to-type) have never been available in the open source version, although some code fragments for this feature can be found in the source code.
However, the sources for the original Android keyboard can still be found and are also still maintained.

Well ... if you find the sources for SwiftKey somewhere, sure, that might be better. But it's not a trivial piece of software so I think we should work with what we got :) Everything is going to be quite a step up from the current approach of using Hunspell (which doesn't factor in the keyboard layout, for example, and doesn't offer any real predictive capabilities, as well as not taking into account the context).

I'd be excited if you give it a shot, even though my stuff is also not really "finished".

Regarding the DBus server: this was just an experiment to create some API to the code. I used it to get suggestions into UberWriter (as a prototype). The Python interface will be much more powerful and we'll be able to create DBus or other bindings going from there.

@mike-fabian
Copy link
Owner

Factoring in the keyboard layout is more difficult on the desktop because you don't know what the keyboard layout actually is, usually.

@mike-fabian
Copy link
Owner

mike-fabian commented Oct 29, 2019

The current approach is not only hunspell, that is a sort of last ditch fallback if no learned data is available yet. And of course hunspell gives spellchecking suggestions. Preditions in ibus-typing-booster at the moment rely mostly on what the user has typed in the past already and it does take context into account. Only if nothing can be found in the user database for that context, hunspell predictions end up at the top of the prediction list. Usually I see predictions from my user database at the top of the prediction list.

@wolfv
Copy link

wolfv commented Oct 29, 2019

Ok, then I probably haven't tried it for long enough, sorry about that.

Locally I have example code for the suggestr package that allows one to build up user dictionaries (with probabilities etc) as well, there is some pretty advanced code in the keyboard sources to allow that. The bigger problem is that this is all largely undocumented, so it's a bit like reverse-engineering.

I think most keyboard layouts are fairly standard, and the layouts are also configurable. Exact matching layouts might not be required, but the knowledge which keys are next to each other can help a good bit for getting fitting results IMO (that's likely also how SwiftKey and anybody else does it).

@mike-fabian
Copy link
Owner

Some keyboards like the French one for example are very different.

@wolfv
Copy link

wolfv commented Oct 30, 2019

Sure - you can dynamically switch Layouts with the android keyboard sources. here is (the only) currently included definition: https://github.com/wolfv/suggestr/blob/master/prediction/key_set.h

Will need to figure out how to properly modify this to support other layouts then QWERTY but it can certainly be done :)

Also, of course, touch and regular keyboards are somewhat different, so maybe it would be good to tune some other geometric values -- but on the other hand I think out of the box this will give nice predictions.

@wolfv
Copy link

wolfv commented Nov 5, 2019

Here is more work (apparently with dynamic keyboard layouts) in the Chromium sources: https://chromium.googlesource.com/chromiumos/platform/suggest/+/refs/heads/master/src/demo.cc

@wolfv
Copy link

wolfv commented Nov 22, 2019

@mike-fabian I have just added support for reading in arbitrary layouts (in Python), you can find an example Notebook using the German keyboard layout here: https://github.com/wolfv/suggestr/blob/master/notebooks/Example%20Usage.ipynb

And the layouts are stored here: https://github.com/wolfv/suggestr/blob/master/layouts/de.yaml

Note that these layouts are taken from the squeekboard repo (the Librem 5 soft keyboard).
The code to read in the layout is pretty straight forward and we could adjust it to any sort of keyboard layout definition. And we could also make a translator from the original YAML layout definition to a simpler, pre-computed file format that can be easily parsed from within C++.

Let me know what you think!

@wolfv
Copy link

wolfv commented Jan 12, 2020

@mike-fabian did you ever have a chance to look at suggestr?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants