Reduce sorting in TopDocs #2646

stuhood · 2025-06-07T22:54:16Z

Reduce sorting in TopDocs by:

not sorting individual segments, since they will be merged and re-top-n'd anyway;
not sorting the portion skipped by the offset, since we can pivot around it instead.
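A minimal sketch of the first idea, assuming each segment's candidates are plain `(score, doc)` pairs where a higher score ranks first. The names `Hit`, `segment_top_n_unsorted` and `merge_top_n` are hypothetical and are not tantivy's collector API. Each segment only partitions out its best `n` hits without ordering them, and a single sort happens after the merge.

```rust
use std::cmp::Ordering;

/// Hypothetical hit type for this sketch: (score, doc id). Higher scores rank first.
pub type Hit = (f32, u32);

/// Ranking used throughout: descending score, ties broken by ascending doc id.
fn rank(a: &Hit, b: &Hit) -> Ordering {
    b.0.total_cmp(&a.0).then(a.1.cmp(&b.1))
}

/// Keep the best `n` hits of one segment, in unspecified order (no sort here).
pub fn segment_top_n_unsorted(mut hits: Vec<Hit>, n: usize) -> Vec<Hit> {
    if n == 0 {
        return Vec::new();
    }
    if hits.len() > n {
        // Linear-time partition: the best `n` hits end up in hits[..n], unordered.
        hits.select_nth_unstable_by(n - 1, rank);
        hits.truncate(n);
    }
    hits
}

/// Merge the unsorted per-segment results, re-top-n them, and sort exactly once.
pub fn merge_top_n(segments: Vec<Vec<Hit>>, n: usize) -> Vec<Hit> {
    let all: Vec<Hit> = segments.into_iter().flatten().collect();
    let mut top = segment_top_n_unsorted(all, n);
    // The only full sort in the pipeline, over at most `n` elements.
    top.sort_unstable_by(rank);
    top
}
```

Because the merge step re-selects the top `n` anyway, any per-segment ordering would be discarded; this is the work the first change avoids.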

ChillFish8 · 2025-06-07T23:21:28Z

src/collector/top_score_collector.rs

+        let (_, _, remainder) = self.buffer.select_nth_unstable(offset);
+        remainder.sort_unstable();
+        self.buffer.into_iter().skip(offset)


I am not convinced this is actually beneficial. Do you have any benchmarks demonstrating that doing these two effective sorts is more efficient than a single sort?

select_nth is not a sort: it only pivots elements around the nth position (in linear time).
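To illustrate the difference, here is a sketch of the offset step on a plain ascending `Vec<u32>` standing in for the collector's buffer; `page_after_offset` is a hypothetical name. `select_nth_unstable` panics when the index is out of bounds, so this sketch guards `offset >= len` explicitly. Whether the PR's code needs such a guard depends on how the buffer is sized, which the diff hunk above does not show.

```rust
/// Return the elements at sorted positions `offset..`, in sorted order,
/// without sorting the `offset` elements that are skipped.
pub fn page_after_offset(mut buffer: Vec<u32>, offset: usize) -> Vec<u32> {
    if offset >= buffer.len() {
        // Nothing survives the skip; this also avoids the panic that
        // `select_nth_unstable` raises for `index >= len`.
        return Vec::new();
    }
    // Linear-time pivot: buffer[..offset] <= buffer[offset] <= buffer[offset + 1..],
    // with neither side sorted.
    let (_, _, remainder) = buffer.select_nth_unstable(offset);
    // Only the tail that is actually returned gets sorted.
    remainder.sort_unstable();
    // buffer[offset] is already in its final position, followed by the sorted tail.
    buffer.split_off(offset)
}
```

For any offset, the result matches sorting the whole buffer and skipping the first `offset` elements.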

But no: I have not run benchmarks! Does CI run a benchmark suite?

No; I would create a set of microbenchmarks comparing the two approaches.

For small values of offset, it is probably harmful.
For high values of offset, it is probably helpful.
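A rough comparison-count model behind this intuition, assuming the buffer holds $n = k + \ell$ candidates (offset $k$, limit $\ell$), that a full sort costs about $n \log_2 n$ comparisons, and that selection costs about $c\,n$ for some small constant $c$ (these constants are assumptions, not measurements):

$$
T_{\text{sort}} \approx (k+\ell)\log_2(k+\ell),
\qquad
T_{\text{select}} \approx c\,(k+\ell) + \ell\log_2 \ell .
$$

When $k \ll \ell$, avoiding the sort of the skipped prefix saves only roughly $k\log_2 \ell$ comparisons, which may not pay for the extra $c\,(k+\ell)$ selection pass. When $k \gg \ell$, $T_{\text{select}}$ is close to linear in $k$, while $T_{\text{sort}}$ grows like $k\log_2 k$.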

...

Can we have unit tests?

There are benchmarks but we do not run them in CI.


stuhood · 2025-06-15T00:34:35Z

I've added some microbenchmarks, but they're pretty clearly in the wrong place: should I break out a separate benchmark target? Note, though, that having the benchmarks test only a single segment was probably not realistic for the agg benchmarks either.

With the two changes split into separate commits, I see:

For "Reduce sorting in TopDocs by removing per-segment sorting":

full
top_docs_small_shallow    Memory: 97.3 KB (-0.58%)     Avg: 4.5728ms (-2.23%)      Median: 4.5707ms (-2.05%)      [4.5540ms .. 4.6014ms]
top_docs_small_deep       Memory: 6.2 MB (-0.01%)      Avg: 14.2786ms (-33.59%)    Median: 14.2741ms (-33.70%)    [14.1030ms .. 14.5956ms]
top_docs_large_shallow    Memory: 698.7 KB (-0.05%)    Avg: 6.8426ms (-8.48%)      Median: 6.8434ms (-8.26%)      [6.7612ms .. 6.9138ms]
top_docs_large_deep       Memory: 6.8 MB (-0.01%)      Avg: 14.2691ms (-37.26%)    Median: 14.2846ms (-37.28%)    [14.0335ms .. 14.3867ms]
dense
top_docs_small_shallow    Memory: 92.9 KB (-0.35%)    Avg: 4.6792ms (-2.13%)      Median: 4.6785ms (-2.01%)      [4.6559ms .. 4.7226ms]
top_docs_small_deep       Memory: 5.9 MB              Avg: 15.0375ms (-29.50%)    Median: 15.0420ms (-29.46%)    [14.8413ms .. 15.1579ms]
top_docs_large_shallow    Memory: 662.9 KB            Avg: 7.0038ms (-6.82%)      Median: 6.9954ms (-6.79%)      [6.9232ms .. 7.0746ms]
top_docs_large_deep       Memory: 6.4 MB              Avg: 15.2383ms (-32.39%)    Median: 15.2466ms (-32.32%)    [15.1046ms .. 15.3693ms]
sparse
top_docs_small_shallow    Memory: 40.7 KB (+0.59%)    Avg: 17.1796ms (+1.15%)    Median: 17.1670ms (+1.01%)    [17.1109ms .. 17.3223ms]
top_docs_small_deep       Memory: 2.9 MB              Avg: 22.8191ms (-6.98%)    Median: 22.8086ms (-6.93%)    [22.6527ms .. 23.3086ms]
top_docs_large_shallow    Memory: 324.8 KB            Avg: 18.6777ms (-0.57%)    Median: 18.6723ms (-0.64%)    [18.5654ms .. 18.8087ms]
top_docs_large_deep       Memory: 3.2 MB              Avg: 22.7570ms (-7.25%)    Median: 22.6764ms (-7.45%)    [22.5315ms .. 23.3794ms]
multivalue
top_docs_small_shallow    Memory: 93.7 KB (-4.49%)     Avg: 5.0297ms (-1.87%)      Median: 5.0248ms (-1.96%)      [4.9936ms .. 5.1738ms]
top_docs_small_deep       Memory: 5.9 MB (-5.25%)      Avg: 15.2124ms (-30.98%)    Median: 15.2013ms (-31.04%)    [15.0788ms .. 15.3836ms]
top_docs_large_shallow    Memory: 662.9 KB (-5.13%)    Avg: 7.2877ms (-8.40%)      Median: 7.2846ms (-8.60%)      [7.1833ms .. 7.3893ms]
top_docs_large_deep       Memory: 6.4 MB (-5.25%)      Avg: 15.3104ms (-34.34%)    Median: 15.2782ms (-34.49%)    [15.0508ms .. 15.6572ms]

For "Reduce sorting in TopDocs by not sorting the portion skipped by the `offset`":

full
top_docs_small_shallow    Memory: 92.9 KB (-4.48%)     Avg: 4.5523ms (-0.45%)     Median: 4.5486ms (-0.49%)     [4.5117ms .. 4.6390ms]
top_docs_small_deep       Memory: 5.9 MB (-5.24%)      Avg: 14.0569ms (-1.55%)    Median: 14.0743ms (-1.40%)    [13.9429ms .. 14.1655ms]
top_docs_large_shallow    Memory: 662.9 KB (-5.13%)    Avg: 6.7969ms (-0.67%)     Median: 6.8028ms (-0.59%)     [6.6842ms .. 6.8763ms]
top_docs_large_deep       Memory: 6.4 MB (-5.25%)      Avg: 14.5138ms (+1.71%)    Median: 14.5239ms (+1.68%)    [14.3561ms .. 14.6191ms]
dense
top_docs_small_shallow    Memory: 93.8 KB (+0.96%)    Avg: 4.7285ms (+1.05%)     Median: 4.7258ms (+1.01%)     [4.6904ms .. 4.8164ms]
top_docs_small_deep       Memory: 5.9 MB              Avg: 14.6045ms (-2.88%)    Median: 14.5767ms (-3.09%)    [14.4669ms .. 15.2436ms]
top_docs_large_shallow    Memory: 662.9 KB            Avg: 6.9839ms (-0.28%)     Median: 6.9797ms (-0.22%)     [6.8799ms .. 7.1466ms]
top_docs_large_deep       Memory: 6.4 MB              Avg: 14.8650ms (-2.45%)    Median: 14.8287ms (-2.74%)    [14.7127ms .. 15.9067ms]
sparse
top_docs_small_shallow    Memory: 40.4 KB (-0.61%)    Avg: 17.4549ms (+1.60%)    Median: 17.4584ms (+1.70%)    [17.3747ms .. 17.5759ms]
top_docs_small_deep       Memory: 2.9 MB              Avg: 22.6667ms (-0.67%)    Median: 22.6236ms (-0.81%)    [22.4235ms .. 22.9492ms]
top_docs_large_shallow    Memory: 324.8 KB            Avg: 18.9714ms (+1.57%)    Median: 18.9832ms (+1.67%)    [18.8322ms .. 19.1031ms]
top_docs_large_deep       Memory: 3.2 MB              Avg: 22.5214ms (-1.04%)    Median: 22.4954ms (-0.80%)    [22.2794ms .. 23.0993ms]
multivalue
top_docs_small_shallow    Memory: 97.6 KB (+4.17%)     Avg: 5.0778ms (+0.96%)     Median: 5.0783ms (+1.07%)     [5.0458ms .. 5.1178ms]
top_docs_small_deep       Memory: 6.2 MB (+5.54%)      Avg: 14.8880ms (-2.13%)    Median: 14.8954ms (-2.01%)    [14.7995ms .. 14.9522ms]
top_docs_large_shallow    Memory: 698.7 KB (+5.41%)    Avg: 7.4105ms (+1.68%)     Median: 7.4109ms (+1.73%)     [7.3391ms .. 7.4817ms]
top_docs_large_deep       Memory: 6.8 MB (+5.54%)      Avg: 15.0866ms (-1.46%)    Median: 15.0917ms (-1.22%)    [14.9904ms .. 15.1436ms]

So unless the offset is huge (e.g. when paging through a huge result set), the `select_nth_unstable` change might not be worth it. Let me know what you think reasonable offset values for the microbenchmark would be.

fulmicoton · 2025-06-20T08:28:42Z

benches/agg_bench.rs

@@ -73,6 +75,12 @@ fn bench_agg(mut group: InputGroup<Index>) {
    register!(group, histogram_with_avg_sub_agg);
    register!(group, avg_and_range_with_avg_sub_agg);

+    register!(group, top_docs_small_shallow);


Can you put that in a different file? `collector_bench`, for instance?

fulmicoton · 2025-06-20T08:30:39Z

src/collector/top_collector.rs

            vec![
                (0.8, DocAddress::new(0, 1)),
+                (0.2, DocAddress::new(0, 3)),


I am not fond of this assert now: the order is very specific to the implementation. Can you assert on the set instead, for instance by sorting the results in the unit test?
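A minimal sketch of the suggested test style: put both the collected hits and the expectation into a canonical order before comparing, so the assertion checks the set of hits rather than an implementation-specific order. `DocAddress` below is a simplified stand-in defined only for this example (it mirrors the `DocAddress::new(segment_ord, doc_id)` calls above but is not tantivy's type), and `sorted_hits` is a hypothetical helper.

```rust
/// Simplified stand-in for tantivy's `DocAddress`, defined for this example only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DocAddress {
    pub segment_ord: u32,
    pub doc_id: u32,
}

impl DocAddress {
    pub fn new(segment_ord: u32, doc_id: u32) -> Self {
        DocAddress { segment_ord, doc_id }
    }
}

/// Sort hits by address (then score) so that two lists containing the same
/// hits compare equal regardless of the order the collector produced.
pub fn sorted_hits(mut hits: Vec<(f32, DocAddress)>) -> Vec<(f32, DocAddress)> {
    hits.sort_by(|a, b| a.1.cmp(&b.1).then(a.0.total_cmp(&b.0)));
    hits
}
```

In the unit test, the assertion would then compare `sorted_hits(results)` against `sorted_hits(vec![...])` rather than the raw vectors.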

fulmicoton · 2025-06-20T08:30:51Z

src/collector/top_collector.rs

                (0.3, DocAddress::new(0, 5)),
-                (0.2, DocAddress::new(0, 3))
+                (0.9, DocAddress::new(0, 7)),


Same as above.

ChillFish8 reviewed Jun 7, 2025


stuhood added 4 commits June 14, 2025 18:13

Add a microbenchmark for TopDocs.

1da9e51

Increase the agg bench segment count.

99c35a2

Reduce sorting in TopDocs by removing per-segment sorting.

6df8dc2

Reduce sorting in TopDocs by not sorting the portion skipped by the `offset`.

267b41b

stuhood force-pushed the stuhood.reduce-top-n-sorting branch from 4526fc0 to 267b41b June 15, 2025 00:26

fulmicoton reviewed Jun 20, 2025

