feat: make `BlockSegmentPostings::open` and `TermInfoStore` public #2520
I don't understand this paragraph. Can you explain in greater length what your problem was and why this helps?
See comments.
(probably the best would be to explain the motivation better - in the PR comments)
Thanks for your review.
@b41sh I still don't understand, I'm sorry. This won't get merged unless there is a proper justification. Maybe you could add a link to the code?
@fulmicoton I'm sorry I didn't explain it clearly. Maybe you can look at our implementation.
@b41sh This is already the case with stock tantivy, isn't it? tantivy might create You could just focus on the `Directory` abstraction.
@fulmicoton Thank you for your advice, but there is still something I don't understand. Could you give me some more advice?
I'll close this PR for now. I am still not familiar with
You could implement your own `Directory` that emits its own implementation of a file handle. Your You then only need to implement This one will always be called on a range that is as tight as possible.
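The suggestion above can be illustrated with a self-contained sketch. The trait `RangeReader` and the type `CountingRemoteFile` are hypothetical stand-ins, not tantivy's actual `Directory` or file-handle API; they only show the idea that if the file handle serves exactly the byte range it is asked for, a query fetches only the bytes it touches rather than the whole file.

```rust
use std::io;
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical file-handle abstraction: the caller asks for an exact byte
/// range, and the implementation fetches only that range.
trait RangeReader: Send + Sync {
    fn len(&self) -> usize;
    fn read_bytes(&self, range: Range<usize>) -> io::Result<Vec<u8>>;
}

/// Simulates a remote object (e.g. an index file in object storage) and
/// records how many bytes were actually fetched from it.
struct CountingRemoteFile {
    data: Vec<u8>,
    bytes_fetched: AtomicUsize,
}

impl CountingRemoteFile {
    fn new(data: Vec<u8>) -> Self {
        Self { data, bytes_fetched: AtomicUsize::new(0) }
    }

    fn bytes_fetched(&self) -> usize {
        self.bytes_fetched.load(Ordering::Relaxed)
    }
}

impl RangeReader for CountingRemoteFile {
    fn len(&self) -> usize {
        self.data.len()
    }

    fn read_bytes(&self, range: Range<usize>) -> io::Result<Vec<u8>> {
        // Reject malformed or out-of-bounds ranges instead of panicking.
        if range.start > range.end || range.end > self.data.len() {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "range out of bounds"));
        }
        // Against real storage this would be a ranged read, not a slice copy.
        self.bytes_fetched.fetch_add(range.len(), Ordering::Relaxed);
        Ok(self.data[range].to_vec())
    }
}

fn main() {
    // A 1 MiB "index file"; the query only needs a 64-byte slice of it.
    let file = CountingRemoteFile::new(vec![7u8; 1 << 20]);
    let slice = file.read_bytes(1000..1064).expect("valid range");
    assert_eq!(slice.len(), 64);
    assert_eq!(file.bytes_fetched(), 64);
    println!("file size = {}, fetched = {}", file.len(), file.bytes_fetched());
}
```

Against real storage, `read_bytes` would issue a ranged read (for example an HTTP range request or `pread`) instead of copying from an in-memory buffer; the counting wrapper just makes the "tight range" property observable.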
Thanks, I will try.
Databend uses tantivy to implement its inverted index. The tantivy `Searcher` needs to read all of the index file data when it starts up, and this data is very large, which results in poor query performance. To improve performance, we implemented our own `Searcher` to query the matched docs: it reads only the `FST` data, the `TermInfo` data, and the data related to the matched terms, which reduces the amount of data to be read. To do this, we need to use `BlockSegmentPostings` and `TermInfoStore`, as well as some fields from the `Query`.
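The idea of reading only the term dictionary and the postings of the matched terms can be shown with a toy model. `TermMeta`, `build_index`, and `lookup` are hypothetical names, not tantivy types: the `BTreeMap` stands in for the FST plus `TermInfo` data, and the postings are plain little-endian `u32` doc ids rather than tantivy's block-compressed format read through `BlockSegmentPostings`.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

/// Hypothetical, simplified per-term metadata: how many docs contain the
/// term and where its postings live in the postings "file".
#[derive(Clone, Debug)]
struct TermMeta {
    doc_freq: u32,
    postings_range: Range<usize>,
}

/// Builds a toy index: a term dictionary plus one postings buffer holding
/// each term's doc ids as little-endian u32 values.
fn build_index(docs: &[(&str, Vec<u32>)]) -> (BTreeMap<String, TermMeta>, Vec<u8>) {
    let mut dict = BTreeMap::new();
    let mut postings = Vec::new();
    for (term, doc_ids) in docs {
        let start = postings.len();
        for id in doc_ids {
            postings.extend_from_slice(&id.to_le_bytes());
        }
        let meta = TermMeta { doc_freq: doc_ids.len() as u32, postings_range: start..postings.len() };
        dict.insert(term.to_string(), meta);
    }
    (dict, postings)
}

/// Reads only the postings bytes of `term`, adding the number of bytes
/// touched to `bytes_read`. Returns None if the term is not in the dictionary.
fn lookup(
    dict: &BTreeMap<String, TermMeta>,
    postings: &[u8],
    term: &str,
    bytes_read: &mut usize,
) -> Option<Vec<u32>> {
    let meta = dict.get(term)?;
    let bytes = &postings[meta.postings_range.clone()];
    *bytes_read += bytes.len();
    let ids: Vec<u32> = bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect();
    debug_assert_eq!(ids.len() as u32, meta.doc_freq);
    Some(ids)
}

fn main() {
    let (dict, postings) = build_index(&[
        ("apple", vec![1, 4, 9]),
        ("banana", vec![2, 3]),
        ("cherry", vec![5, 6, 7, 8]),
    ]);
    let mut bytes_read = 0;
    let hits = lookup(&dict, &postings, "banana", &mut bytes_read);
    assert_eq!(hits, Some(vec![2, 3]));
    // Only banana's two doc ids (8 bytes) are read out of 36 postings bytes.
    assert_eq!(bytes_read, 8);
    assert_eq!(postings.len(), 36);
    println!("read {} of {} postings bytes", bytes_read, postings.len());
}
```

In the real index, each of these pieces is a separate on-disk structure, so this access pattern is what lets a custom searcher avoid loading the full postings data up front.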