Option for 'smart' fuzzy search #5509

pintassilgo · 2024-05-07T00:16:45Z

One thing I like in VSCode is its semi-fuzzy search, both in the Command Palette and in autocomplete suggestions.

That is, it's fuzzy only for the first letter of each word and for capital letters, and only in left-to-right order.

Examples:

folfil in the Command Palette suggests "open FOLder containing the current FILe".
document.qs in the editor suggests document.querySelector in the autocomplete popup.

Cuda currently has ui_listbox_fuzzy, which is even enabled by default, but I disable it because it suggests items that are largely unrelated to what was typed.

Edit: correcting myself, VSCode uses this semi-fuzzy search only in the Command Palette. Editor autocomplete uses plain, unrestricted fuzzy search.


Alexey-T · 2024-05-07T03:20:22Z

The current fuzzy algorithm is described here:
https://wiki.freepascal.org/CudaText#Command_Palette
What change do you suggest?

pintassilgo · 2024-05-07T03:44:02Z

Current fuzzy:
Searches char by char, left to right, regardless of how many chars lie in between.

Match example: ararft finds caret char left.

Current non fuzzy:
Searches char by char left to right, but you need to type spaces to match non-consecutive chars.

Match example: car ha ft finds caret char left.
No match example: carhaft doesn't find caret char left.
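To make the two current modes concrete, here is a minimal Python sketch of each, as described in this comment. It is an illustration under that reading, not CudaText's Pascal implementation; the function names and the case-insensitive comparison are assumptions.

```python
def full_fuzzy(query: str, item: str) -> bool:
    """Current fuzzy mode as described: the query chars must occur in the item
    in the same order, with any number of chars in between (case-insensitive)."""
    remaining = iter(item.lower())
    # `ch in remaining` consumes the iterator up to and including the hit,
    # so each search resumes right after the previous match.
    return all(ch in remaining for ch in query.lower())


def non_fuzzy(query: str, item: str) -> bool:
    """Current non-fuzzy mode as described: each space-separated chunk of the
    query must occur contiguously, with chunks in left-to-right order."""
    text = item.lower()
    pos = 0
    for chunk in query.lower().split():
        found = text.find(chunk, pos)
        if found < 0:
            return False
        pos = found + len(chunk)  # the next chunk must start after this one
    return True
```

With these definitions, full_fuzzy("ararft", "caret char left") and non_fuzzy("car ha ft", "caret char left") return True, while non_fuzzy("carhaft", "caret char left") returns False, matching the examples above.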

Suggestion (VSCode behavior):
Searches char by char left to right, but non-consecutive chars should only match when:

it's the first char of a word, OR
it's an uppercase letter in the item.

Match example: opfol finds "open current folder". First it finds "op". The next char in the item is "e", not the requested "f", but a word further right starts with "f", so it matches. "fol" then matches "folder", so the final result is: opfol matches "open current folder".

Another match example: qs finds querySelector. First it finds "q". The next char in the item is not the requested "s", but there's an uppercase "S" further right, so it matches.

It's like searching people's names by their initials: AT should match both "Alexey Torgashin" and "AlexeyTorgashin".

Also relevant: _ should be treated as a word separator too, so "Alexey_torgashin" should still match.
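The proposed rule depends on what counts as the start of a word. Below is a minimal Python sketch of one possible definition covering both cases mentioned: separators (including _) and a lowercase-to-uppercase transition as in querySelector. The names word_starts and may_follow are hypothetical, and treating a digit-to-letter change as not a word boundary is an assumption.

```python
def word_starts(text: str) -> set[int]:
    """Indices where a word begins (assumed rule): an alphanumeric char at the
    start of the text, after any non-alphanumeric separator (' ', ':', '_', ...),
    or an uppercase letter right after a lowercase one (camelCase)."""
    starts = set()
    for i, ch in enumerate(text):
        if not ch.isalnum():
            continue  # separators themselves never start a word
        if i == 0 or not text[i - 1].isalnum() or (ch.isupper() and text[i - 1].islower()):
            starts.add(i)
    return starts


def may_follow(prev: int, cur: int, starts: set[int]) -> bool:
    """A later query char may match at `cur` if it directly follows the
    previous match at `prev`, or if `cur` is the start of a word."""
    return cur == prev + 1 or cur in starts


print(sorted(word_starts("querySelector")))        # [0, 5]
print(sorted(word_starts("Alexey_torgashin")))     # [0, 7]
print(sorted(word_starts("open current folder")))  # [0, 5, 13]
```

For qs on querySelector, q matches at index 0 and S at index 5; may_follow(0, 5, word_starts("querySelector")) is True because index 5 starts a word.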

Alexey-T · 2024-05-07T04:59:10Z

OK, todo.

Alexey-T · 2024-05-07T05:43:57Z

Beta updated.

One thing: bug or not? 'basc' cannot find 'Bash script'. Should it? 'bas' matches 3 chars in "Bash", and the last 'c' doesn't match (it's not at a word start).

pintassilgo · 2024-05-07T05:56:27Z

Thank you. Could you please make it work in autocomplete too? My main use for this feature is JavaScript, for things such as qs for querySelector.

pintassilgo · 2024-05-07T05:58:01Z

> one thing. bug or not? 'basc' cannot find 'Bash script'. but should it?

Good question. Yes, I think it's a bug; it should find it.

pintassilgo · 2024-05-07T05:59:56Z

VSCode doesn't have this issue, see: [screenshot not preserved]

Current Cuda wouldn't be able to find this entry: it would stop at "A Brea", so the "d" would be a miss.

Alexey-T · 2024-05-07T06:18:04Z

> Thank you. Could you please make it work in autocomplete? My main use for this feature is to use in JavaScript things such as qs for querySelector.

Autocomplete is not the menu-like dialog: it uses the plugin's (Python) fuzzy filter, which is LSP's code. Maybe veksha can show me where this code is in LSP?

Alexey-T · 2024-05-07T08:04:43Z

Solved, beta updated. I'm not sure it's fully OK.

Alexey-T · 2024-05-07T08:43:38Z

> Current Cuda wouldn't be able to find this entry, it will stop at A Brea, so the d would be a miss.

My latest code can't do it either; too many words, I guess. And I already run a loop 'from 0 to 255' per string, which is already slow.

Alexey-T · 2024-05-07T08:50:47Z

'Add data breakpoint at address'
-- my code cannot find it with 'abread', but it can with 'abreaad'. The word 'at' is confusing my code.
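Both failures ('basc' on 'Bash script', and 'abread' here) are what you'd expect from a matcher that makes one left-to-right pass and commits to the first acceptable position for each char. The hypothetical Python sketch below reproduces them. It is one plausible reading of such a pass, not CudaText's Pascal code, and word_starts encodes an assumed word rule (non-alphanumeric separators including _, plus lowercase-to-uppercase transitions).

```python
def word_starts(text: str) -> set[int]:
    """Assumed word rule: an alphanumeric char at the start, after a separator
    (any non-alphanumeric char, including '_'), or an uppercase letter right
    after a lowercase one."""
    return {
        i for i, ch in enumerate(text)
        if ch.isalnum() and (i == 0 or not text[i - 1].isalnum()
                             or (ch.isupper() and text[i - 1].islower()))
    }


def greedy_match(query: str, item: str) -> bool:
    """Hypothetical single-pass matcher: for each query char it commits to the
    first acceptable position and never revisits that choice."""
    starts = sorted(word_starts(item))
    pos = -1
    for k, ch in enumerate(query.lower()):
        if k == 0:
            # First char: first occurrence anywhere in the item.
            pos = next((i for i, c in enumerate(item) if c.lower() == ch), -1)
        elif pos + 1 < len(item) and item[pos + 1].lower() == ch:
            pos += 1  # prefer the consecutive char
        else:
            # Otherwise jump to the first later word start with this char.
            pos = next((s for s in starts if s > pos and item[s].lower() == ch), -1)
        if pos < 0:
            return False
    return True


print(greedy_match("basc", "Bash script"))                       # False
print(greedy_match("abread", "Add data breakpoint at address"))  # False
```

In 'Bash script' the pass takes "bas" from "Bash" and then finds no later word starting with c, although "ba" plus "sc" from "script" would satisfy the rule. In the breakpoint entry it takes the a of "brea" and then finds no word starting with d, although "bre" plus "ad" from "address" would work. Handling both requires keeping more than one candidate position per char.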

pintassilgo · 2024-05-07T08:53:25Z

From what I'm seeing, the start of the string needs to be the start of a word for the fuzzy search to become active.

Example: pluman works to find plugin: Addon Manager: Install.... But ginman doesn't work.

Surprisingly, VSCode behaves the same way, so if you think it's fine... But I feel it shouldn't be this way. The start of the typed string should be able to match any part of a word; only when it continues into a different word should it need to match that word's first char.

Alexey-T · 2024-05-07T09:01:23Z

> But ginman doesn't work.

It works after my new fix.

Alexey-T · 2024-05-07T10:58:45Z

The last beta (73) works worse ('basc' doesn't find 'bash script'); ignore it.

Alexey-T · 2024-05-07T18:30:19Z

https://wiki.freepascal.org/CudaText#Command_Palette
Could you please improve the wiki text? Post it here and I will submit it to the wiki.

pintassilgo · 2024-05-07T19:11:46Z

Suggestion for algorithm:

First, internally insert a space before each capital letter in the items array, so "lexer: JavaScript" becomes "lexer: Java Script" (this change is not shown to the user; it's only used to process the search).

Next, run a full-fuzzy search. Suppose the find input is abcde, so the regex is a.*b.*c.*d.*e. If there's no match, stop: there's no result.

If there are matches, you collect them and test each pair of chars:

First it checks for ab|a.*\bb. If it fails, there's no match. If it matches, go to next step.
Check for bc|b.*\bc. If it fails, there's no match. If it matches, go to next step.
Check for cd|c.*\bd. If it fails, there's no match. If it matches, go to next step.
Check for de|d.*\be. If it fails, there's no match. If it matches, include the item in results list.

So Cuda will run n regex operations, where n is the number of chars in the find string's words that are longer than 1 char.

Spaces in the find input are treated as \s, and single-char words can simply be copied into the regex.

The word boundary \b should be extended to include the underscore, so each .*\b in the examples above would actually be .*(\b|_).

This should be efficient: the number of regex runs grows only linearly with the find string, so it shouldn't be slow for any length.

pintassilgo · 2024-05-07T19:18:39Z

Wait... this algorithm is not complete.

abc would match "ab sbc", but it shouldn't: the pairs ab and bc are each found on their own, in different words. I need to think more.
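A minimal Python sketch of the pairwise-regex check proposed above shows the problem: each pair is tested against the whole item independently, so nothing ties the c in bc to the b that was used for ab. The names split_camel and pairwise_match are hypothetical, the query is assumed to contain no spaces, and Python's re stands in for the JavaScript regex described.

```python
import re


def split_camel(s: str) -> str:
    # Internal only: put a space before every (ASCII) capital letter,
    # so "JavaScript" becomes " Java Script" for matching purposes.
    return re.sub(r"([A-Z])", r" \1", s)


def pairwise_match(query: str, item: str) -> bool:
    text = split_camel(item)
    flags = re.IGNORECASE
    # Step 1: full-fuzzy prefilter, e.g. a.*b.*c for "abc".
    if not re.search(".*".join(map(re.escape, query)), text, flags):
        return False
    # Step 2: every adjacent pair xy must appear as "xy", or as x followed
    # later by y at a word start (\b extended with '_').
    for x, y in zip(query, query[1:]):
        ex, ey = re.escape(x), re.escape(y)
        if not re.search(rf"{ex}{ey}|{ex}.*(?:\b|_){ey}", text, flags):
            return False
    return True


print(pairwise_match("qs", "querySelector"))         # True
print(pairwise_match("carhaft", "caret char left"))  # False
print(pairwise_match("abc", "ab sbc"))               # True: the false positive
```

The pairs do most of the filtering (carhaft is rejected because h never starts a word), but since each regex run starts from scratch, the check cannot tell that "ab" and "bc" were found in different words.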

Alexey-T · 2024-05-07T19:38:25Z

> a.*b.*c.*d.*e. If there's no match, stop, there's no result. If there are matches, you collect them and test each pair of chars:

Can I collect them all? The regex engine finds only the first match for a given regex, not all possible matches.

pintassilgo · 2024-05-07T19:47:02Z

The solution could be to capture groups from the regex result and carry them into the next regex. I'll describe it in terms of JavaScript regex; I don't know whether Pascal's differs.

Instead of ab|a.*\bb, it would be something like (ab)|a.*(\bb), carrying the resulting groups into the next regex.

Suppose the item is "about amount boat". Running the regex above would return two groups: ab (from "about") and \bb (from "boat").

So in the next step Cuda would reuse these values in the regex, joined with the OR operator |: ab|\bb, and append the char from the next pair, c, giving abc|\bbc.

So instead of the bc|b.*\bc from my previous comment, the regex would be (abc|\bbc)|(?:ab|\bb).*(\bc).

By doing this, abc wouldn't find "ab sbc", as expected.

But this uses a different regex for each match, so it's no longer n regexes but n multiplied by the number of matches from the full-fuzzy search. It still shouldn't be slow, I guess: short find strings will have many matches but a small n, and longer strings will have few matches for a big n, so the total should never grow large enough to be an issue.

pintassilgo · 2024-05-07T19:54:09Z

> Can I collect them all? regex engine allows to find 1st match for given regex, not all possible matches.

The full-fuzzy step is only there to discard early the items that don't match. There's no need to find multiple matches for the same item; this step just avoids useless work in the next steps on items you already know won't match in the end.

pintassilgo · 2024-05-07T20:52:25Z

> Could you , pls, improve the wiki text?

I can try, but I'm not good with English, so the result might be poor.
Also, I'm not sure how the current algorithm works. I know how it's supposed to work, based on what I suggested from the start, but as you know it's not reliable.

For instance, selecaret should find "selection: cancel carets, but keep first caret/selection", but at least with the beta I have, it doesn't. It matches items such as "selection: remove first caret", but I don't know why it doesn't find the other item.

Alexey-T · 2024-05-08T05:34:44Z

> should find selection: cancel carets, but keep first caret/selection

Yes, I see this issue too. The word 'cancel' breaks finding 'carets'. 'ca' too.

Alexey-T · 2024-05-08T07:45:30Z

What we need here is an algorithm (in Pascal) that can find ALL combinations of fuzzy positions in a given text.

Input param Text can be, e.g., 'select ca11 ca22 33ca33 44ca c55a55'.
Input param Find can be 'seleca'.
The function must fill the list with 5 matches: 'sele' from the 1st word, and 'ca' from 5 possible words.

If we write this function, the later steps are trivial, i.e. checking the combinations for validity. Only 2 combinations here are valid.

For now I've reverted the fuzzy code to the old one (1.214.0) until we write this algorithm.

Alexey-T · 2024-05-08T07:55:20Z

Note: in the example above, the function must find not 5 items but many more, because 'c' can come from, e.g., 'ca22' and 'a' from a different later word, e.g. '44ca'.
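A minimal Python sketch of such a function, plus the validity check for the later step, may help pin down the spec. It is written in Python rather than Pascal for brevity; all_positions, is_valid and word_starts are hypothetical names, and the word rule (separators including _, lowercase-to-uppercase transitions, a digit-to-letter change not being a boundary) is an assumption.

```python
from typing import Iterator


def word_starts(text: str) -> set[int]:
    """Assumed word rule: an alphanumeric char at the start, after a
    non-alphanumeric separator, or an uppercase letter after a lowercase one."""
    return {
        i for i, ch in enumerate(text)
        if ch.isalnum() and (i == 0 or not text[i - 1].isalnum()
                             or (ch.isupper() and text[i - 1].islower()))
    }


def all_positions(find: str, text: str, start: int = 0) -> Iterator[tuple[int, ...]]:
    """Yield every tuple of indices where the chars of `find` occur in `text`
    in left-to-right order, i.e. every fuzzy combination."""
    if not find:
        yield ()
        return
    for i in range(start, len(text)):
        if text[i].lower() == find[0].lower():
            for rest in all_positions(find[1:], text, i + 1):
                yield (i, *rest)


def is_valid(combo: tuple[int, ...], starts: set[int]) -> bool:
    """Each char after the first must directly follow the previous one
    or begin a word."""
    return all(cur == prev + 1 or cur in starts for prev, cur in zip(combo, combo[1:]))


text = "select ca11 ca22 33ca33 44ca c55a55"
combos = list(all_positions("seleca", text))
starts = word_starts(text)
valid = [c for c in combos if is_valid(c, starts)]
print(len(combos), valid)  # 20 [(0, 1, 2, 3, 7, 8), (0, 1, 2, 3, 12, 13)]
```

For this input it yields 20 combinations, only 2 of which are valid (taking ca from ca11 or from ca22). Since the number of combinations grows combinatorially with longer items and queries, a practical version would check validity while recursing and drop invalid prefixes early instead of enumerating everything first.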

pintassilgo · 2024-05-12T10:52:55Z

An algorithm that I believe would work:

Example of input string: "folfil".
Example item: "open folder containing the current file".

Run a full fuzzy search to discard items that don't match. In JS regex, that means putting .* between each pair of chars, so f.*o.*l.*f.*i.*l. Store only the matching strings.
For the resulting list of matching strings, split each string into individual words (so you now have a list of lists), with an index flag on each word (initial value 0). Capital letters must also be treated as the start of a new word, so "JavaScript" would be split into "Java" and "Script". Example (comma as delimiter): open[0],folder[0],containing[0],the[0],current[0],file[0].
Take the first char of the input string and search for it at the index position of each word. For every matched word, increase the index. Discard all words to the left of the first match. Example: the first char of the input is f, so folder[1],containing[0],the[0],current[0],file[1].
Take the next char of the input string and search for it at the index position of each word. For every matched word, increase the index. If a non-first word with index>0 no longer matches, reset its index to 0 and try again. Discard all words to the left of the first match. Example: the next char is o, so folder[2],containing[0],the[0],current[0],file[0].
Repeat the previous step until the list of words is empty (which means no match) or all chars of the input string have been consumed (which means a match).

Continuing the given example to the end:

l → folder[3],containing[0],the[0],current[0],file[0]
f → file[1]
i → file[2]
l → file[3]
finished input string → item matches.

Another example:
Input: seleca
Item: select ca11 ca22 33ca33 44ca c55a55

Steps:

s → select[1],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
e → select[2],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
l → select[3],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
e → select[4],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
c → select[5],ca11[1],ca22[1],33ca33[0],44ca[0],c55a55[1]
a → ca11[2],ca22[2],33ca33[0],44ca[0],c55a55[0]
finished input string → item matches.

Slightly different example:
Input: selecca
Item: select ca11 ca22 33ca33 44ca c55a55

Steps:

s → select[1],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
e → select[2],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
l → select[3],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
e → select[4],ca11[0],ca22[0],33ca33[0],44ca[0],c55a55[0]
c → select[5],ca11[1],ca22[1],33ca33[0],44ca[0],c55a55[1]
c → ca11[1],ca22[1],33ca33[0],44ca[0],c55a55[1]
a → ca11[2],ca22[2],33ca33[0],44ca[0],c55a55[0]
finished input string → item matches.

Other example:
Input: selecaret
Item: selection: cancel carets, but keep first caret/selection

Steps:

s → selection[1]:,cancel[0],carets\,[0],but[0],keep[0],first[0],caret/[0],selection[1]
e → selection[2]:,cancel[0],carets\,[0],but[0],keep[0],first[0],caret/[0],selection[2]
l → selection[3]:,cancel[0],carets\,[0],but[0],keep[0],first[0],caret/[0],selection[3]
e → selection[4]:,cancel[0],carets\,[0],but[0],keep[0],first[0],caret/[0],selection[4]
c → selection[5]:,cancel[1],carets\,[1],but[0],keep[0],first[0],caret/[1],selection[5]
a → cancel[2],carets\,[2],but[0],keep[0],first[0],caret/[2],selection[0]
r → carets\,[3],but[0],keep[0],first[0],caret/[3],selection[0]
e → carets\,[4],but[0],keep[0],first[0],caret/[4],selection[0]
t → carets\,[5],but[0],keep[0],first[0],caret/[5],selection[0]
finished input string → item matches.

Non-match example:
Input: abc
Item: ab sbc

Steps:

a → ab[1],sbc[0]
b → ab[2],sbc[0]
c → list of words is empty, so it doesn't match.
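As a cross-check of the examples above, here is a minimal Python sketch that enforces the same rule with a different technique: instead of per-word index flags, it keeps the set of item positions where a partial match can end, so it never commits to one word too early. It is a substitute, not a transcription of the steps above. Assumptions: the word rule (separators including _, lowercase-to-uppercase transitions), and letting the first char match anywhere, as discussed for ginman; the function names are hypothetical.

```python
def word_starts(text: str) -> set[int]:
    """Assumed word rule: an alphanumeric char at the start, after a
    non-alphanumeric separator (' ', ':', '_', ...), or an uppercase letter
    right after a lowercase one (camelCase)."""
    return {
        i for i, ch in enumerate(text)
        if ch.isalnum() and (i == 0 or not text[i - 1].isalnum()
                             or (ch.isupper() and text[i - 1].islower()))
    }


def smart_fuzzy_match(query: str, item: str) -> bool:
    """True if the query chars can be matched left to right so that every char
    after the first either directly follows the previous match or sits at a
    word start further right. Tracks all possible end positions at once."""
    chars = [c.lower() for c in query]
    if not chars:
        return True
    starts = sorted(word_starts(item))
    # Positions where a match of the first char can end (anywhere in the item).
    ends = {i for i, c in enumerate(item) if c.lower() == chars[0]}
    for ch in chars[1:]:
        nxt = set()
        for e in ends:
            # Option 1: the consecutive char.
            if e + 1 < len(item) and item[e + 1].lower() == ch:
                nxt.add(e + 1)
            # Option 2: any later word start with this char.
            nxt.update(s for s in starts if s > e and item[s].lower() == ch)
        if not nxt:
            return False  # no way to continue: the item doesn't match
        ends = nxt
    return bool(ends)


examples = [
    ("folfil", "open folder containing the current file"),
    ("seleca", "select ca11 ca22 33ca33 44ca c55a55"),
    ("selecca", "select ca11 ca22 33ca33 44ca c55a55"),
    ("selecaret", "selection: cancel carets, but keep first caret/selection"),
    ("abc", "ab sbc"),
]
for query, item in examples:
    print(query, smart_fuzzy_match(query, item))
# folfil True, seleca True, selecca True, selecaret True, abc False
```

It also finds basc in "Bash script" and abread in "Add data breakpoint at address". The work per query char is bounded by (end positions × word starts), so the cost grows with the lengths of the query and item rather than with the number of fuzzy combinations.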

Alexey-T · 2024-05-18T14:49:17Z

Maybe I will read your idea in detail later, in mid-summer; no promise. I must convert the idea to Pascal first. Only @dinkumoil knows Pascal here, so he may help if he wants.

Alexey-T added the enhancement label May 7, 2024

Alexey-T added a commit that referenced this issue May 7, 2024: rewrite fuzzu search, #5509 (b640518)

Alexey-T added a commit that referenced this issue May 7, 2024: rewrite fuzzy search, #5509 (37c14bc)

Alexey-T added a commit that referenced this issue May 7, 2024: rewrite fuzzy search, #5509 (b72eb8a)


Alexey-T added a commit that referenced this issue May 7, 2024: fixing #5509 (comment) (b2325aa)

pintassilgo mentioned this issue May 8, 2024 in Complete_From_Text issues #5513 (closed)

Alexey-T added the hard todo/delayed label May 8, 2024

Alexey-T changed the title from "Option to semi fuzzy search" to "Option for 'smart' fuzzy search" May 10, 2024

Alexey-T removed the enhancement label May 11, 2024

Repository owner locked and limited conversation to collaborators Jul 7, 2024

Alexey-T converted this issue into discussion #5602 Jul 7, 2024


This issue was moved to a discussion.
