
Index too large #4

Open
fulmicoton opened this issue Dec 23, 2019 · 5 comments

@fulmicoton

The search benchmark consists of indexing all of the documents in English Wikipedia.
To level the playing field, we merge all segments down to a single segment.

I was happy to see that rucene also implemented force_merge with a blocking option.

Unfortunately, after the merge finishes, I end up with an index of 24 GB.
(Tantivy and Lucene both end up with an index of 3 GB.)
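For reference, this is roughly what the merge step looks like against upstream Lucene (Java), which rucene ports. It is only an illustrative sketch of the expected blocking semantics, not the benchmark's actual driver code; the index path and analyzer choice are placeholders.

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

public class ForceMergeToOneSegment {
    public static void main(String[] args) throws Exception {
        // "wikipedia-en-index" is a placeholder path, not the benchmark's actual layout.
        try (FSDirectory dir = FSDirectory.open(Paths.get("wikipedia-en-index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            // Merge everything down to a single segment; doWait=true blocks until the
            // merge has completed, mirroring the "blocking" force_merge option mentioned above.
            writer.forceMerge(1, true);
            writer.commit();
        }
    }
}
```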

@fulmicoton
Author

Apologies: I found one of the problems: I was indexing with term vectors!
I'll reindex and report back here whether it solves the problem or not.
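For context on why this matters for index size: term vectors store an extra per-document copy of each field's terms. A hedged Lucene (Java) sketch of the field option involved; the field name and setup here are illustrative, not the benchmark's actual code.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

public class TermVectorFieldExample {
    static Document makeDoc(String body) {
        // Start from a plain indexed-but-not-stored text field.
        FieldType withVectors = new FieldType(TextField.TYPE_NOT_STORED);
        // Storing term vectors keeps a per-document copy of the terms, which can
        // easily multiply the on-disk index size compared to postings alone.
        withVectors.setStoreTermVectors(true);
        withVectors.freeze();

        Document doc = new Document();
        doc.add(new Field("body", body, withVectors));  // "body" is an illustrative field name
        return doc;
    }
}
```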

@fulmicoton
Author

Correction: 6.6 GB.

This is a bit more than twice the size I would have expected. I think the files that existed before the merge are simply not being deleted.
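If that is what is happening, the expected behaviour (in upstream Lucene terms, which rucene presumably mirrors) is that the pre-merge segment files become unreferenced once the merge commits and are then deleted. Continuing the force_merge sketch from the first comment, with the same writer (illustrative only):

```java
// After a blocking forceMerge(1, true) has committed, the files of the pre-merge
// segments are no longer referenced by the latest commit point. Lucene's IndexWriter
// deletes them automatically; deleteUnusedFiles() just forces that cleanup explicitly.
writer.forceMerge(1, true);
writer.commit();
writer.deleteUnusedFiles();  // reclaim files left over from the pre-merge segments
```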

@sunxiaoguang
Contributor

Hi Paul, thanks for reporting these issues. We will try to fix them and let you know when we are done.

@fulmicoton
Author

I am mostly blocked on issue #3

@sunxiaoguang
Contributor

@tongjianlin Can you double-check whether we return from a blocking force_merge before the old segments get reclaimed?
