This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Merge #563
563: Improve the `estimatedNbHits` when a `distinctAttribute` is specified r=irevoire a=Kerollmops

This PR is related to meilisearch/meilisearch#2532 but it doesn't fix it entirely. It improves the situation by computing the excluded documents (the ones with an already-seen distinct value) before stopping the loop. I think the previous order was a mistake and it should always have been this way.
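To illustrate the ordering change, here is a minimal, simplified sketch of a bucketed search loop. It is not the milli implementation: `bucketed_search`, the `(id, distinct value)` tuples, the `fixed` flag and the use of `HashSet` in place of bitmaps are all assumptions made for the example, and it assumes `take` leaves the running excluded set empty, as `std::mem::take` does.

```rust
use std::collections::HashSet;
use std::mem::take;

/// Hypothetical model of the search loop. Each bucket is a ranked list of
/// `(document id, distinct value)` pairs. Returns the hits and the excluded
/// set that would be used to correct the hit estimate.
pub fn bucketed_search(
    buckets: &[Vec<(u32, u32)>],
    limit: usize,
    fixed: bool,
) -> (Vec<u32>, HashSet<u32>) {
    let mut seen_values: HashSet<u32> = HashSet::new();
    let mut documents_ids: Vec<u32> = Vec::new();
    let mut excluded_candidates: HashSet<u32> = HashSet::new();

    for bucket in buckets {
        // The running set is moved out, so `excluded_candidates` is now empty.
        let mut excluded = take(&mut excluded_candidates);

        for &(id, value) in bucket {
            if documents_ids.len() == limit {
                break;
            }
            if seen_values.insert(value) {
                documents_ids.push(id); // first document with this value
            } else {
                excluded.insert(id); // duplicate distinct value
            }
        }

        if fixed {
            // New order: merge the exclusions back first, then maybe stop.
            excluded_candidates.extend(excluded);
            if documents_ids.len() == limit {
                break;
            }
        } else {
            // Old order: stopping here drops everything held in `excluded`.
            if documents_ids.len() == limit {
                break;
            }
            excluded_candidates = excluded;
        }
    }

    (documents_ids, excluded_candidates)
}
```

In this model, when the limit is reached the old order returns an empty excluded set, while the new order keeps every duplicate found up to that point.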

The reason it doesn't fix the issue is that Meilisearch is lazy: it avoids computing too many things so that it doesn't take too long to answer. When we deduplicate the documents by their distinct value we must do it on the fly: every time we see a new document, we check that its distinct value doesn't collide with that of an already returned document.
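A rough sketch of why on-the-fly deduplication makes the estimate depend on how far the search went. The function name `estimate_nb_hits` and the `(id, distinct value)` tuple layout are hypothetical, chosen only for this illustration:

```rust
use std::collections::HashSet;

/// Hypothetical estimate: number of candidates minus the duplicates that were
/// actually observed before the search stopped at `limit` returned documents.
pub fn estimate_nb_hits(ranked: &[(u32, u32)], limit: usize) -> usize {
    let mut seen_values = HashSet::new();
    let mut returned = 0;
    let mut excluded = 0;

    for &(_id, value) in ranked {
        if returned == limit {
            break; // lazy: the remaining candidates are never inspected
        }
        if seen_values.insert(value) {
            returned += 1;
        } else {
            excluded += 1;
        }
    }

    ranked.len() - excluded
}
```

With six candidates sharing three distinct values in pairs, a small limit stops before most duplicates are seen and the estimate stays too high; only a limit large enough to walk the whole list yields three.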

The reason we can see the correct result when enough documents are fetched is that we were lucky enough to see all of the different distinct values in the dataset, so all of the deduplication was done and no more documents could be returned.

If we wanted a correct `estimatedNbHits` every time, we would have to do a pass over the whole set of possible values for the distinct attribute and do a big intersection, which could cost a lot of CPU cycles.
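For comparison, the exhaustive approach could look roughly like the sketch below. The `docs_by_distinct_value` map is a hypothetical stand-in for an index from each distinct value to the documents holding it, and the sketch assumes every candidate has exactly one distinct value:

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical exact count: one hit per distinct value that has at least one
/// document among the candidates. Every distinct value is visited, which is
/// the cost the lazy approach avoids.
pub fn exact_nb_hits(
    candidates: &HashSet<u32>,
    docs_by_distinct_value: &BTreeMap<u32, HashSet<u32>>,
) -> usize {
    docs_by_distinct_value
        .values()
        .filter(|docs| !docs.is_disjoint(candidates))
        .count()
}
```

The work grows with the number of distinct values in the whole dataset rather than with the number of documents requested, so it is paid even when only a handful of hits are returned.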

Co-authored-by: Kerollmops <[email protected]>
bors[bot] and Kerollmops authored Jun 22, 2022
2 parents 38a8d3c + d2f84a9 commit d546f6f
Showing 1 changed file with 3 additions and 2 deletions.
5 changes: 3 additions & 2 deletions milli/src/search/mod.rs
@@ -223,7 +223,6 @@ impl<'a> Search<'a> {
     debug!("Number of candidates found {}", candidates.len());
 
     let excluded = take(&mut excluded_candidates);
-
     let mut candidates = distinct.distinct(candidates, excluded);
 
     initial_candidates |= bucket_candidates;
@@ -236,10 +235,12 @@ impl<'a> Search<'a> {
     for candidate in candidates.by_ref().take(self.limit - documents_ids.len()) {
         documents_ids.push(candidate?);
     }
+
+    excluded_candidates |= candidates.into_excluded();
+
     if documents_ids.len() == self.limit {
         break;
     }
-    excluded_candidates = candidates.into_excluded();
 }
 
 Ok(SearchResult {