LuceneOptimizedFieldInfosFormat may write a value that is too large #3005

Open
ScottDugas opened this issue Dec 16, 2024 · 0 comments
@ScottDugas
Collaborator

If an index has too many fields in Lucene, serializing the FieldInfos may fail because the resulting value is too large.

This is most problematic if you have a map-like object whose keys are used as field names in Lucene, so that different records can create new sets of fields. It also increases the chance that, after the index has been enabled for a while, a new record cannot be saved because it would add new fields.

This can easily be mitigated by one of two code changes:

  1. Start having multiple FieldInfos files, either per-segment, or grouped to share as many fields as possible. This could be most valuable if there are a variety of shapes, such that an individual segment wouldn't have too many fields, but across the whole index there could be a lot. IIUC Lucene generally tries to read all of the field infos, so this doesn't save much in terms of what needs to be read, and it will probably add some disk usage (although presumably minimal in comparison to the index content).
  2. Allow the FieldInfos protobuf to be split across multiple key-value pairs when it is too large, similar to what we do for Records.
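
To make option 2 concrete, here is a minimal sketch of splitting an already-serialized FieldInfos protobuf into chunks that each fit under a value size limit, and joining them back on read. `FieldInfosChunker`, `split`, `join`, and the `maxChunkSize` parameter are hypothetical names for illustration only; this is not the existing Record Layer splitting code, and it leaves out how the chunks would be keyed and written transactionally.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the Record Layer API.
final class FieldInfosChunker {
    private FieldInfosChunker() {
    }

    // Split the serialized bytes into consecutive chunks of at most maxChunkSize bytes.
    // The position in the returned list is the chunk index, which a caller could
    // append to the key so that a range read returns the chunks in order.
    static List<byte[]> split(byte[] serialized, int maxChunkSize) {
        if (maxChunkSize <= 0) {
            throw new IllegalArgumentException("maxChunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < serialized.length; offset += maxChunkSize) {
            int end = Math.min(serialized.length, offset + maxChunkSize);
            chunks.add(Arrays.copyOfRange(serialized, offset, end));
        }
        return chunks;
    }

    // Concatenate chunks, in index order, back into the original serialized bytes.
    static byte[] join(List<byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

A writer would call `split` on the serialized bytes and store one key-value pair per chunk; a reader would range-read the chunks, call `join`, and then parse the protobuf. All chunks would have to be written and cleared together so that a reader never sees a partial set.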

This could also be mitigated operationally by using partitioning, and using smaller partitions.
