Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MB-58901: Introduce support for BM25 scoring #2113

Merged
merged 27 commits into from
Jan 15, 2025
Merged

Conversation

Thejas-bhat
Copy link
Member

@Thejas-bhat Thejas-bhat commented Dec 6, 2024

Introducing support for BM25 scoring

Key stats necessary for the scoring

  • fieldLength - the number of terms in a field within a doc.
  • avgDocLength - the average of terms in a field across all the docs in the index.
  • totalDocs - total number of docs in an index.

Introduces a mechanism to maintain consistent scoring in a situation where the index is partitioned as a bleve.IndexAlias. This is achieved using the existing preSearch mechanism where the first phase of the entire search involves fetching the above mentioned stats, aggregating them and redistributing back to the bleve indexes which would use them while calculating the score for a hit. In order to enable this global scoring mechanism, the user needs to set the context argument of the SearchInContext with:
ctx = context.WithValue(ctx, search.SearchTypeKey, search.GlobalScoring)

Implementation wise, the user needs to explicitly mention BM25 as the scoring mechanism at indexMapping.ScoringModel level to actually use this scoring mechanism. This parameter is a global setting, i.e. when performing a search on multiple fields, all the fields are scored with the same scoring model.
The storage layer exposes an API which returns the number of terms in a field's term dictionary which is used to compute the avgDocLength. At the indexing layer, we check if the queried field supports BM25 scoring and if consistent scoring is availed. This is followed by fetching the stats either from the local bleve index or from a context (in the case where we're availing the consistent scoring) to compute the actual score.

Note: The scoring is highly dependent on the size of an individual bleve index's termDictionary (specific to a field) so there can be some discrepancies especially given that each index is further composed of multiple 'segments'. However in large scale use cases these discrepancies can be quite small and don't affect the order of the doc hits - in which case the user may choose to avoid this altogether.

@Thejas-bhat Thejas-bhat force-pushed the bm25-refactor branch 5 times, most recently from 4b626d0 to 45efde1 Compare December 12, 2024 10:39
Base automatically changed from presearchRefactor to master December 17, 2024 08:52
@Thejas-bhat Thejas-bhat changed the title WIP: BM25 scoring MB-58901: Introduce support for BM25 scoring Jan 6, 2025
@Thejas-bhat Thejas-bhat marked this pull request as ready for review January 6, 2025 11:44
@abhinavdangeti abhinavdangeti added this to the v2.5.0 milestone Jan 6, 2025
@abhinavdangeti
Copy link
Member

@Thejas-bhat it seems you'll need to push up the Cardinality() api to all zap versions.

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Thejas-bhat Make sure to pull the latest from origin/bm25-refactor before you push more commits here :)

mapping/index.go Outdated Show resolved Hide resolved
search/searcher/search_term.go Outdated Show resolved Hide resolved
search/util.go Show resolved Hide resolved
search/util.go Outdated Show resolved Hide resolved
search/util.go Outdated Show resolved Hide resolved
@abhinavdangeti
Copy link
Member

@Thejas-bhat a thought regarding the "global scoring" code path for bm25 - what is the default behavior in elastic?

Would you add a couple of GO benchmark test to differentiate between bm25 with and without global scoring and record these numbers within the commit message^ - trying to decide whether to enable "global scoring" by default.

search/scorer/scorer_term.go Outdated Show resolved Hide resolved
search/scorer/scorer_term.go Show resolved Hide resolved
search/searcher/search_term.go Outdated Show resolved Hide resolved
search/searcher/search_term.go Outdated Show resolved Hide resolved
@Thejas-bhat
Copy link
Member Author

@Thejas-bhat a thought regarding the "global scoring" code path for bm25 - what is the default behavior in elastic?

Would you add a couple of GO benchmark test to differentiate between bm25 with and without global scoring and record these numbers within the commit message^ - trying to decide whether to enable "global scoring" by default.

By default, elastic disables the feature. It'll be a bit difficult to benchmark this at golang unit level over here, because the latency is mainly visible when the index alias has multiple shards and each of which is spread across multiple nodes.

Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor refactor suggestion, looks good to me otherwise.

index_impl.go Outdated Show resolved Hide resolved
index_impl.go Outdated Show resolved Hide resolved
index_alias_impl.go Show resolved Hide resolved
index_impl.go Show resolved Hide resolved
Copy link
Member

@abhinavdangeti abhinavdangeti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also add a unit test comparing bm25 to tf-idf scoring over the same dataset?

index_test.go Outdated Show resolved Hide resolved
@Thejas-bhat
Copy link
Member Author

Thejas-bhat commented Jan 14, 2025

Can we also add a unit test comparing bm25 to tf-idf scoring over the same dataset?

I've added a unit test on a sample 10 doc dataset, however i don't think there something concrete we can compare over there apart from tfidf scores being higher than bm25 (due to algo difference and its more like a placeholder honestly). a much better test would be to have a large enough dataset, and two indexes - one with bm25 and the other with tfidf and then compare the relevancy of the hits from each of them (https://www.elastic.co/blog/found-bm-vs-lucene-default-similarity)

@Thejas-bhat Thejas-bhat merged commit cbafdca into master Jan 15, 2025
9 checks passed
@abhinavdangeti abhinavdangeti deleted the bm25-refactor branch January 15, 2025 15:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants