Put search result with exact title in first position. #766

Open
mgautierfr opened this issue Mar 14, 2023 · 18 comments

Comments

@mgautierfr
Collaborator

mgautierfr commented Mar 14, 2023

When a user searches for a term, we should put the article with exactly the same title first in the results.
(Same for suggestions.)

See comments in #653

@danielzgtg

See kiwix/kiwix-android#2033 and kiwix/kiwix-android#2035 which I created 3 years ago.

@kelson42
Contributor

kelson42 commented Mar 14, 2023

@mgautierfr We need concrete examples and explanations about why this is not the case today. We should talk about "suggestions" as this is what this is about.

If the example is "apple", then the only way I see is to add a layer on top of Xapian, and I'm against it because it has been done at least twice (even by @mgautierf) and it brought more problems than benefits. I'm not in favour of repeating the same errors over time.

@kelson42 kelson42 self-assigned this Mar 14, 2023
@Jaifroid

By "accident", Kiwix JS does something like this. The accident is that we only very recently got full-text searching (thanks to the libzim WASM), so I grafted ft search on top of existing title search. Because ft search is considerably slower than title search, we get search results coming in a two-stage process: exact prefix matching first (pseudo-case-insensitive), and then a few seconds later, the ft results (which are pruned to remove any duplicates before displaying them).

NB We can't currently provide any "snippets", because that part of the API isn't yet bound to JavaScript. (It might be too slow, anyway.)
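
To make the two-stage behaviour described above concrete, here is a minimal sketch in Python (not the actual Kiwix JS code). The function name and the sample titles are hypothetical; the only behaviour taken from the comment is that title matches are shown first and full-text results are pruned of duplicates before being appended.

```python
from typing import Iterable, List


def merge_two_stage(title_hits: Iterable[str], fulltext_hits: Iterable[str]) -> List[str]:
    """Return title-prefix hits first, then full-text hits that are not already listed."""
    merged: List[str] = []
    seen = set()
    for title in list(title_hits) + list(fulltext_hits):
        if title not in seen:
            seen.add(title)
            merged.append(title)
    return merged


# Hypothetical data, for illustration only.
title_hits = ["Apple", "Apple (company)", "Apple TV"]
fulltext_hits = ["Apple TV", "Fiona Apple", "Apple"]
print(merge_two_stage(title_hits, fulltext_hits))
```

Because the title hits are consumed first, an exact title match found by the prefix search stays ahead of anything the slower full-text stage returns.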

@kelson42
Contributor

@Jaifroid I hardly believe doing what you describe implements this feature because ft search does not implement this feature either.

@danielzgtg

It might be too slow, anyway

kiwix-android search feels slower than kiwix-js anyway. It spins for noticeably longer than on desktop.

ft search does not implement this feature either.

Perhaps something else in kiwix-js is implementing this. On kiwix-js I can actually find what I'm looking for in the first result, while on kiwix-android I have to scroll and look through a bunch of random results.

@rgaudin
Member

rgaudin commented Mar 15, 2023

@Jaifroid I hardly believe doing what you describe implements this feature because ft search does not implement this feature either.

You've read it wrong I believe. @Jaifroid said there is a Title-prefix search displayed (sort of like suggestions) while the FT search is requested in the background and once FT results are ready, those are added to the page (removing the entries that were already there from the prefix search).

@rgaudin
Member

rgaudin commented Mar 15, 2023

Regardless of how practical it is to implement, I support the feature request as this is, IMO, a very common scenario: you type a request, you get the suggestions, but they don't give you exactly what you wanted. So you run the search hoping for better results, expecting those entries to be present anyway.

@Jaifroid

Jaifroid commented Mar 15, 2023

I guess we do need a proper specification of the problem. Kiwix Desktop (and Kiwix Serve) seem to do a version of prefix matching if you enter more than a single word, but we get a slightly unintuitive list of results for single words.

I compared searching for "caribbean basin" in Kiwix JS and Kiwix Desktop (see top screenshot, full English Wikipedia) -- almost exactly the same results for the title search (outlined in red). But with "apple" we get a very different search result order, with the first result matching the fruit being the one outlined in red in each case (bottom screenshot).

To be clear, Kiwix JS title search is not intelligent or weighted in any way: it merely does a binary search on as many upper-case and lower-case variants of the entered prefix as it can, and gathers anything that matches the prefix. It then fills up the rest of the space (up to the max search results requested, default 30, but user-selectable) with full-text search results (from which duplicates are removed).

[Screenshots: Search_comparison (top), apple_search (bottom)]

@kelson42
Contributor

@rgaudin Honestly, I have no real clue what this ticket is about, as there is no concrete example of input/output... If one is not provided, I will close the ticket, as I cannot follow what all this is about.

@mgautierfr
Collaborator Author

My initial idea was about searching for a term. If you search for "Apple" on wikipedia_en_all, you get this list (https://library.kiwix.org/viewer#search?content=wikipedia_en_all_maxi_2023-02&pattern=apple):

  • List of songs recorded by Fiona Apple
  • Apple (disambiguation)
  • String interpolation
  • Apple TV
  • Timeline of the Apple II family
  • ...
  • Apple [18th position]
  • ....

The idea is to "move" the "Apple" result (the article whose title is equal, case-insensitively, to the search term) to the top of the list, as it is probably a highly relevant result.

How the "move" is implemented is still open to discussion. It could be a specific criterion in Xapian that gives the highest score to the "Apple" article, or it could be the libzim iterator starting with "Apple" and then continuing with the classic Xapian results (skipping the "Apple" article in them), or libkiwix itself inserting the result in the HTML page (maybe in a specific section), or ...
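
As one possible reading of the "move", here is a minimal Python sketch of the post-processing variant: take the ranked list as delivered and promote any title equal to the query (case-insensitively), leaving the rest in their original order. This is only an illustration, not libzim or libkiwix code; `promote_exact_title` is a hypothetical name, and the sample list is an abbreviated version of the one above.

```python
from typing import List


def promote_exact_title(results: List[str], query: str) -> List[str]:
    """Move titles equal to the query (ignoring case) to the front; keep the rest in order."""
    wanted = query.casefold()
    exact = [title for title in results if title.casefold() == wanted]
    others = [title for title in results if title.casefold() != wanted]
    return exact + others


# Abbreviated version of the list above; "Apple" originally sits far down the ranking.
ranked = [
    "List of songs recorded by Fiona Apple",
    "Apple (disambiguation)",
    "String interpolation",
    "Apple TV",
    "Timeline of the Apple II family",
    "Apple",
]
print(promote_exact_title(ranked, "apple")[0])  # Apple
```

When no title matches exactly, the list comes back unchanged, so the relevance ranking is only touched when there is an exact match.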

But as @Jaifroid suggests in their last comment, we could also do the same for suggestions.

This could be compared with kiwix/libkiwix#748. We were redirecting directly to the exact title article in case of search. Now we are not redirecting, but at least we could put the exact title article first.

@kelson42
Contributor

kelson42 commented Mar 17, 2023

@mgautierfr To me, if the ticket seems obvious for suggestions, it sounds far less obvious for ft search. If I ft search "Verdun", I would kind of expect "Battle of Verdun" as the first result, but if I look for a suggestion, I kind of expect "Verdun" as the first result.

In both cases, this is the job of Xapian to deliver things properly... I see no fundamental reason it could not.

@Jaifroid

In both cases, this is the job of Xapian to deliver things properly...

What happens for ZIMs that don't have a Xapian index? Presumably fallback to binary search of Directory Entry titles.

@rgaudin
Member

rgaudin commented Mar 17, 2023

@mgautierfr To me, if the ticket seems obvious for suggestions, it sounds far less obvious for ft search. If I ft search "Verdun", would be kind of expecting "Battle of Verdun" as first result, but if I search a suggestion, kind of expect "Verdun" as first result.

In both cases, this is the job of Xapian to deliver things properly... see no fundamental reason it could not.

I think there are two distinct discussions here: what we'd want to get and how to implement it. It's usually more efficient to define the former first and then try to reconcile with the second.

Away from all technical considerations, I believe if there is an entry matching the exact search query, it should be highlighted. It can be the first result or a different card or anything that tells the user “you've requested this, we have it”.
Keep in mind that from a user's perspective the differences between suggestions and search are:

  • Suggestions only display entries with the query in the Title (not explicit, but easily figured out after a few tries)
  • Search results have an excerpt with context and the length of the Entry.

So it's reasonable to assume that a suggested Entry can be considered, but the user would like more details before discarding it.

In terms of UX, if that matching Entry is a redirect, I think I'd even want to have something like “Le great XXX (redirection from XXX)”.

I'd be careful with examples (in this ticket! Not in others related to improving search) as you seem to incorporate cultural background into them. We can design various scoring mechanisms to influence the sorting of search results.

In your example, on WPEN that battle article is not the first result. Verdun, the city, is. WPFR is similar but it could be different. That's a discussion about sorting and it's not what this ticket is about.

This ticket is about a UX improvement of asserting that the exact search query has a matching result and this could be highlighted.

I agree the ticket title is a bit incorrect as it suggests a technical solution.

@kelson42
Contributor

In both cases, this is the job of Xapian to deliver things properly...

What happens for ZIMs that don't have a Xapian index? Presumably fallback to binary search of Directory Entry titles.

This topic is not a priority, considering we don't produce this kind of ZIM file. That said, considering the logic of dichotomic (binary) search, this should already be the case IMHO.

@danielzgtg

I think it's important to have a concrete example. It's impossible to objectively measure whether the bug is fixed or not without a test case.

it should be highlighted. It can be [...] or a different card

No, it shouldn't be a different card. On desktop, I want to just press the enter key without looking. On mobile, I want to tap the first search result row with my eyes closed.

Simple Wikipedia

I will be using https://library.kiwix.org/viewer#wikipedia_en_simple_all_mini_2023-03/A/Main_Page .

Example 1: apple

Expected behavior

Apple
Apple & Onion
Apple (company)
Apple (disambiguation)
Apple (tree)
Apple A10
Apple A10X
Apple A11
Apple A4
Apple A5

Actual Behavior

Apple
Adam's apple
Apple & Onion
Apple (company)
Apple (disambiguation)
Apple (tree)
Apple A10
Apple A10X
Apple A11
Apple A4

Example 2: mountain

Expected behavior

Mountain
Mountain (band)
Mountain Ash
Mountain Ash, Rhondda Cynon Taf
Mountain Avens
Mountain Brook, Alabama
Mountain Daylight Time
Mountain Dew
Mountain Gorilla
Mountain Grove, Missouri

Actual Behavior

Mountain
Baekdu Mountain
Bare Mountain
Bear Mountain
Brokeback Mountain
Daniel (mountain)
Death Mountain
Deomali (mountain)
Fold mountain
Folded mountain

Example 3: library

Expected behavior

Library
Library Network of Western Switzerland
Library Tower
Library and Archives Canada
Library classification
Library of Alexandria
Library of Birmingham
Library of Celsus
Library of Congress
Library of Congress Control Number

Actual Behavior

Library
1949 (library)
Bodleian Library
Bodleian library
British Library
Carnegie library
Library Tower
Library classification
National library
National library

Wiktionary

This bug is worse with Wiktionary, which is what I mainly use Kiwix for, though it has fewer users than Wikipedia. In Wiktionary, the exact result doesn't even appear first. I will use wiktionary_en_all_maxi_2023-02.zim, but it doesn't work in unpatched library.kiwix.org. It works on staging kiwix-js and normal kiwix-android.

Example 4: des

Expected behavior

des
des Abends
des Morgens
des Pudels Kern
des Weiteren
des avonds
des de
des doods
des families
des fois que

Actual Behavior

-des
-deş
DES
DEs
Des
dEs
des
des-
deś
deš

Example 5: que

Expected behavior

que
que Dios te bendiga
que aproveche
[redacted]
que chuta
que colsaconste
[redacted]
que demande le peuple
que descanse en paz
[redacted]

Actual Behavior

'que
-que
QUE
Que
Que.
Que(^')
que
què
qué
quê

The more intelligent suggestion behavior from https://simple.wikipedia.org/wiki/Main_Page that uses statistics is also good.

@mgautierfr
Collaborator Author

I agree the ticket title is a bit incorrect as it suggests a technical solution.

This ticket is a response to #653 (comment), which stated that we need other implementation ideas in order to discuss the need for a feature.

To me, if the ticket seems obvious for suggestions, it sounds far less obvious for ft search. If I ft search "Verdun", would be kind of expecting "Battle of Verdun" as first result, but if I search a suggestion, kind of expect "Verdun" as first result.

I don't see why we should have "Battle of Verdun" as the first result.
If I search for Hiroshima or Nagasaki, I want information about the city, not about an (important) event that happened years ago.

Interestingly, a search on en.wikipedia.org for "Verdun", "Hiroshima" or "Nagasaki" gives the exact article title first.
But a search for "Bir Hakeim" gives the "Battle of Bir-Hakeim" first and "Bir-Hakeim" second.

This leads me to think that Wikipedia's "natural" (relevance) sorting gives a lot of weight to exact title matches, but that this is not the only criterion for selecting the first result.


@danielzgtg Your examples seem to be based on suggestions. Is that right?
And your expected behavior needs to be clarified a bit. How do you choose the order of the articles?

Your example with Wiktionary is interesting. As we stem the words, all the titles ('que, Que, Qué, ...) are reduced to the same stem and, as the title is only one word, Xapian has no clue how to sort the results.
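
To illustrate the tie described above, here is a small Python sketch. The `fold` function is a toy normalisation (strip diacritics and punctuation, lowercase) standing in for whatever reduction the index applies; it is an assumption for illustration and not Xapian's stemmer. `order_ties` is likewise a hypothetical tie-breaker showing how an exact match could be preferred among results that are otherwise indistinguishable.

```python
import unicodedata
from typing import List


def fold(title: str) -> str:
    """Toy normalisation: drop diacritics and punctuation, then lowercase."""
    decomposed = unicodedata.normalize("NFD", title)
    kept = [ch for ch in decomposed if ch.isalnum()]  # combining marks are not alphanumeric
    return "".join(kept).casefold()


def closeness(title: str, query: str) -> int:
    """0 for an exact match, 1 for a case-insensitive match, 2 for anything else."""
    if title == query:
        return 0
    if title.casefold() == query.casefold():
        return 1
    return 2


def order_ties(titles: List[str], query: str) -> List[str]:
    """Stable sort: closer matches first, original order kept within each group."""
    return sorted(titles, key=lambda title: closeness(title, query))


# Titles taken from the "que" example earlier in the thread.
titles = ["'que", "-que", "QUE", "Que", "Que.", "Que(^')", "que", "què", "qué", "quê"]
print({fold(t) for t in titles})      # every title collapses to the same key
print(order_ties(titles, "que")[:3])
```

With every title folding to the same key, the index alone cannot rank them, which is why some explicit preference for the exact title is needed.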

@Jaifroid

Jaifroid commented Mar 20, 2023

The expected order listed above is, in each case, the order given by a binary search of the title-ordered list of directory entries, augmented by testing for several common case variations. So, when entering library, a search is also done for Library (and LIBRARY). This is the algorithm used in the browser-extension version of Kiwix JS (augmented by full-text search a few seconds later, if it is available and if we haven't already got 30 results from binary search). Kiwix JS has no concept of "suggestions".

This algorithm is highly effective for Wikipedia/Wiktionary, but _almost useless_ for any ZIM where the alphabetical title order is meaningless (in a Stack Exchange ZIM, the title of many articles/questions will begin with "What...", and the key word will be buried somewhere in the title).

The reason it is highly effective for Wikipedia/Wiktionary is that editors of articles add lots of redirects from common search terms (often including common misspellings and common case variants) to the underlying article. So, we effectively have a "pre-weighted" and augmented alphabetical search index. It makes sense to leverage this, if possible.
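
For readers who want to see the shape of this title search, here is a minimal Python sketch under stated assumptions. It is not the Kiwix JS implementation: the function names are hypothetical, the title list is a made-up miniature, and the list is assumed to be sorted by code point so that `bisect` can be used. Only the overall idea (binary search on a few case variants of the prefix, capped at 30 results by default) comes from the description above.

```python
from bisect import bisect_left
from typing import List


def case_variants(query: str) -> List[str]:
    """The query as typed plus a few common case variants, without duplicates."""
    candidates = [query, query.lower(), query.capitalize(), query.upper()]
    return list(dict.fromkeys(candidates))


def prefix_matches(sorted_titles: List[str], prefix: str, limit: int) -> List[str]:
    """Binary-search to the first title >= prefix, then collect titles that start with it."""
    hits: List[str] = []
    start = bisect_left(sorted_titles, prefix)
    for title in sorted_titles[start:]:
        if len(hits) >= limit or not title.startswith(prefix):
            break
        hits.append(title)
    return hits


def title_search(sorted_titles: List[str], query: str, limit: int = 30) -> List[str]:
    """Gather prefix matches for each case variant, up to `limit` results."""
    results: List[str] = []
    for variant in case_variants(query):
        for title in prefix_matches(sorted_titles, variant, limit - len(results)):
            if title not in results:
                results.append(title)
    return results


# Hypothetical miniature title list, sorted by code point as the bisect requires.
titles = sorted(["Bodleian Library", "LIBRARY", "Library", "Library Tower",
                 "Library of Congress", "National library", "library science"])
print(title_search(titles, "library"))
```

Note that "Bodleian Library" and "National library" are never returned: a pure prefix search cannot see a key word buried inside a title, which is the Stack Exchange limitation mentioned above.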

@danielzgtg

"Battle of Bir-Hakeim" first and "Bir-Hakeim" second

I'm fine with Wikipedia doing that because pressing Enter will go to the exact search result if found. However, someone declined my suggestion for adding this at kiwix/kiwix-android#2033 (comment), so I need the exact search result at the top.

How do you choose the order of the article ?

The expected order listed above is

Exactly as Jaifroid described for kiwix-js.

Stack Exchange ZIM

I never thought of that. But that should be done together with some kind of intelligent ranking feature. The ranking should pay less attention to stopwords and more attention to highly ranked questions/answers. Anyway, that would be more complicated to implement than the change described in this GitHub issue.

reduced to the same stem and, as the title is only one word, xapian have no clue about how to sort the results.

This behaviour from kiwix-android makes the app hard to use. Therefore, we should implement the original request in this GitHub issue.
