-
-
Notifications
You must be signed in to change notification settings - Fork 767
Reduce sorting in TopDocs #2646
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
1da9e51
99c35a2
6df8dc2
267b41b
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -114,9 +114,7 @@ where T: PartialOrd + Clone | |
} | ||
|
||
Ok(top_collector | ||
.into_sorted_vec() | ||
.into_iter() | ||
.skip(self.offset) | ||
.into_sorted_after(self.offset) | ||
.map(|cdoc| (cdoc.feature, cdoc.doc)) | ||
.collect()) | ||
} | ||
|
@@ -169,7 +167,7 @@ impl<T: PartialOrd + Clone> TopSegmentCollector<T> { | |
pub fn harvest(self) -> Vec<(T, DocAddress)> { | ||
let segment_ord = self.segment_ord; | ||
self.topn_computer | ||
.into_sorted_vec() | ||
.into_vec() | ||
.into_iter() | ||
.map(|comparable_doc| { | ||
( | ||
|
@@ -206,10 +204,11 @@ mod tests { | |
top_collector.collect(5, 0.3); | ||
assert_eq!( | ||
top_collector.harvest(), | ||
// Note: Individual segments are not sorted. | ||
vec![ | ||
(0.8, DocAddress::new(0, 1)), | ||
(0.2, DocAddress::new(0, 3)), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i am not fond of the assert now. The order is now very specific to the implementation. Can you assert on the set. For instance, by sorting the results (in the unit test.) |
||
(0.3, DocAddress::new(0, 5)), | ||
(0.2, DocAddress::new(0, 3)) | ||
] | ||
); | ||
} | ||
|
@@ -224,11 +223,12 @@ mod tests { | |
top_collector.collect(9, -0.2); | ||
assert_eq!( | ||
top_collector.harvest(), | ||
// Note: Individual segments are not sorted. | ||
vec![ | ||
(0.9, DocAddress::new(0, 7)), | ||
(0.8, DocAddress::new(0, 1)), | ||
(0.2, DocAddress::new(0, 3)), | ||
(0.3, DocAddress::new(0, 5)), | ||
(0.2, DocAddress::new(0, 3)) | ||
(0.9, DocAddress::new(0, 7)), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same as above |
||
] | ||
); | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -850,6 +850,24 @@ where | |
self.buffer | ||
} | ||
|
||
/// Returns the elements between `offset` and `top_n` in sorted order. | ||
pub fn into_sorted_after( | ||
mut self, | ||
offset: usize, | ||
) -> impl Iterator<Item = ComparableDoc<Score, D, R>> { | ||
if self.buffer.len() > self.top_n { | ||
self.truncate_top_n(); | ||
} | ||
|
||
if offset >= self.buffer.len() { | ||
return vec![].into_iter().skip(0); | ||
} | ||
|
||
let (_, _, remainder) = self.buffer.select_nth_unstable(offset); | ||
remainder.sort_unstable(); | ||
self.buffer.into_iter().skip(offset) | ||
Comment on lines
+866
to
+868
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not convinced this is actually beneficial, do you have any benchmarks demonstrating that doing these two effective sorts is more efficient than the one sort? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
But no: I have not run benchmarks! Does CI run a benchmark suite? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, I would create a set of micro benchmarks between the two changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For small values of offset, it is probably harmful. ... Can we have unit tests? There are benchmarks but we do not run them in CI. |
||
} | ||
|
||
/// Returns the top n elements in stored order. | ||
/// Useful if you do not need the elements in sorted order, | ||
/// for example when merging the results of multiple segments. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you put that in a different file? collector_bench for instance?