Trimmer and stop word filter are missing from search pipelines #151

Open
dhdaines opened this issue Jul 5, 2024 · 2 comments · May be fixed by #154

Comments

dhdaines commented Jul 5, 2024

The trimmer and stop word filter are not added to the search pipeline, which will definitely hurt recall whenever users include punctuation in their queries.
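As a rough illustration of why recall suffers, here is a toy model of the mismatch. It is not lunr's implementation: the `trim` helper, the tiny `STOP_WORDS` set, and both pipeline functions are hypothetical stand-ins, and stemming is left out entirely. It only shows that a token trimmed at indexing time cannot match an untrimmed query token.

```python
import re

# Tiny illustrative subset of stop words (assumption, not lunr's real list).
STOP_WORDS = {"is", "the", "what", "that"}


def trim(token):
    # Strip leading and trailing non-word characters, in the spirit of a trimmer.
    return re.sub(r"^\W+|\W+$", "", token)


def index_pipeline(text):
    # Indexing side: lowercase, trim punctuation, drop stop words.
    tokens = (trim(t) for t in text.lower().split())
    return {t for t in tokens if t and t not in STOP_WORDS}


def search_pipeline_without_trimmer(query):
    # Search side as described in this issue: no trimming, no stop word filter.
    return set(query.lower().split())


indexed = index_pipeline("That is the question!")
query = "What is the question?"

# "question?" never equals the indexed "question", so nothing matches.
print(indexed & search_pipeline_without_trimmer(query))
# Running the query through the same steps as the index finds the term.
print(indexed & index_pipeline(query))
```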

Unfortunately, this is a case of bug-compatibility with lunr.js: https://github.com/olivernn/lunr.js/blob/aa5a878f62a6bba1e8e5b95714899e17e8150b38/lunr.js#L49

But the behaviour should be documented, along with a way to work around it. The workaround is pretty easy:

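from lunr import get_default_builder, stemmer, stop_word_filter, trimmer

builder = get_default_builder()
# Insert both functions ahead of the stemmer so the search side matches indexing.
builder.search_pipeline.before(stemmer.stemmer, trimmer.trimmer)
builder.search_pipeline.before(stemmer.stemmer, stop_word_filter.stop_word_filter)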

dhdaines commented Jul 5, 2024

The lunr.js documentation is itself incorrect on this point. For instance, https://lunrjs.com/docs/lunr.Pipeline.html says:

An instance of lunr.Index created with the lunr shortcut will contain a pipeline with a stop word filter and an English language stemmer

dhdaines commented Jul 5, 2024

Here is a minimal example to show the problem, which I think you'll agree is pretty serious:

from lunr import lunr
index = lunr(
    ref="id",
    fields=["title", "body"],
    documents=[
        {"id": "1", "title": "To be or not to be?", "body": "That is the question!"}
    ],
)
print(index.search("What is the question?"))  # Should print something, but doesn't!
