More efficient counter for query hits #156

alexklibisz · 2020-09-16T23:03:41Z

The custom MatchHashesAndScoreQuery currently uses a counter that allocates an array of shorts, with one entry for every document in the segment, to track the number of matches for each doc, and then iterates over that array twice to get the top k docs. This is actually substantially faster than any sort of hashmap I've found, including primitive hash maps. I've tried hppc, hppcrt, and fastlib, and all of them are at least 2x as slow (e.g., a segment with 1.1M docs gets 40 q/s with arrays, 20 q/s with hashmaps). I figure this kind of array setup won't scale forever, but I don't want to change it until there's some comparably fast alternative.
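For readers unfamiliar with the approach, here is a minimal sketch of an array-backed hit counter with a two-pass top-k selection. It is not the actual MatchHashesAndScoreQuery code: the class name `ArrayHitCounter`, its methods, and the histogram-based choice of threshold are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one short per document in the segment, indexed by doc ID.
final class ArrayHitCounter {
    private final short[] counts;
    private int maxCount = 0;

    ArrayHitCounter(int numDocs) {
        this.counts = new short[numDocs];
    }

    // Record one matching hash for the given doc ID.
    // Assumes a doc never collects more than Short.MAX_VALUE hits.
    void increment(int docId) {
        int c = ++counts[docId];
        if (c > maxCount) maxCount = c;
    }

    // Pass 1: find the smallest count that still belongs to the top k.
    int kthGreatestCount(int k) {
        int[] histogram = new int[maxCount + 1];
        for (short c : counts) histogram[c]++;
        int seen = 0;
        for (int c = maxCount; c >= 1; c--) {
            seen += histogram[c];
            if (seen >= k) return c;
        }
        return 1; // fewer than k docs matched: accept every doc with at least one hit
    }

    // Pass 2: collect the doc IDs whose count meets the threshold.
    // Ties at the threshold can yield more than k docs.
    List<Integer> topK(int k) {
        int threshold = kthGreatestCount(k);
        List<Integer> docs = new ArrayList<>();
        for (int docId = 0; docId < counts.length; docId++) {
            if (counts[docId] >= threshold) docs.add(docId);
        }
        return docs;
    }
}
```

The cost is visible in the sketch: both passes touch every slot in the array regardless of how many docs actually matched, which is why the memory and scan time grow with segment size rather than with the number of hits.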


alexklibisz · 2020-09-24T14:08:48Z

Another library maybe worth trying: https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/collections/Int2IntCounterMap.java

JanecekPetr · 2021-03-14T16:21:10Z

By far the fastest primitive hashmap implementation I found is https://github.com/leventov/Koloboke. It only contains maps and sets, and the original website is down, but the code quality is still there.

alexklibisz · 2022-07-17T19:22:12Z

Closing as it falls under #160

alexklibisz added performance priority-low labels Sep 20, 2020

alexklibisz mentioned this issue Sep 21, 2020

Optimize top-k counting for approximate queries #160

Closed

alexklibisz closed this as completed Jul 17, 2022
