
More efficient counter for query hits #156

Closed
alexklibisz opened this issue Sep 16, 2020 · 3 comments

Comments

@alexklibisz
Owner

The custom MatchHashesAndScoreQuery currently counts hits with an array of shorts that has one entry for every document in the segment, tracking the number of matches for each doc, and then iterates over that array twice to get the top k docs. This is substantially faster than any hash map I've tried, including primitive hash maps. I've tried hppc, hppcrt, and fastlib, and all of them are at least 2x slower (e.g. a segment with 1.1M docs gets 40 q/s with arrays, 20 q/s with hashmaps). I figure this kind of array setup won't scale forever, but I don't want to change it until there's a comparably fast alternative.
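For readers who want to picture the array approach, here is a minimal sketch in Java. It is not the actual MatchHashesAndScoreQuery code: the class name `ArrayHitCounter`, its methods, and the way the two passes are split (a histogram of counts to find the k-th largest count, then a scan to collect doc IDs) are all assumptions made for illustration. It also assumes per-doc counts fit in a `short`.

```java
/** Hypothetical illustration of an array-backed hit counter; not the project's real class. */
final class ArrayHitCounter {
    private final short[] counts; // one slot per doc ID in the segment
    private int maxCount = 0;     // largest count seen so far

    ArrayHitCounter(int numDocsInSegment) {
        this.counts = new short[numDocsInSegment];
    }

    /** Record one hash match for docId. Assumes the count never exceeds Short.MAX_VALUE. */
    void increment(int docId) {
        int c = ++counts[docId];
        if (c > maxCount) maxCount = c;
    }

    /** Returns up to k doc IDs with the highest counts, in doc ID order (not sorted by count). */
    int[] topK(int k) {
        if (k <= 0 || maxCount == 0) return new int[0];

        // Pass 1: histogram of counts, used to find the k-th largest count.
        int[] histogram = new int[maxCount + 1];
        for (short c : counts) {
            if (c > 0) histogram[c]++;
        }
        int threshold = maxCount;
        int above = 0; // number of docs with count strictly greater than threshold
        while (threshold > 1 && above + histogram[threshold] < k) {
            above += histogram[threshold];
            threshold--;
        }

        // Pass 2: collect every doc above the threshold, plus enough ties to fill k.
        int size = Math.min(k, above + histogram[threshold]);
        int tiesLeft = size - above;
        int[] result = new int[size];
        int n = 0;
        for (int doc = 0; doc < counts.length && n < size; doc++) {
            int c = counts[doc];
            if (c > threshold) {
                result[n++] = doc;
            } else if (c == threshold && tiesLeft > 0) {
                result[n++] = doc;
                tiesLeft--;
            }
        }
        return result;
    }
}
```

The trade-off in this sketch is the one described above: memory and scan time grow with the number of docs in the segment rather than with the number of hits, but each increment is a single array write with no hashing or probing.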

@alexklibisz
Owner Author

@JanecekPetr

By far the fastest primitive hashmap implementation I found is https://github.com/leventov/Koloboke. It only contains maps and sets, and the original website is down, but the code quality is still there.

@alexklibisz
Owner Author

Closing as it falls under #160
