LuceneOptimizedFieldInfosFormat may write a value that is too large #3005

ScottDugas · 2024-12-16T14:41:21Z

If an index has too many fields in Lucene, serializing the FieldInfos may fail because the resulting value is too large.

This is most problematic if you have a map-like object whose keys are used as fields in Lucene, so that different records can each create new sets of fields. It also increases the chance that, after the index has been enabled for a while, saving a new record will fail because that record would add new fields.

This can easily be mitigated by one of two code changes:

1. Start having multiple FieldInfos files, either per-segment or grouped to share as many fields as possible. This could be most valuable if there are a variety of shapes, such that an individual segment wouldn't have too many fields, but the index as a whole could have a lot. IIUC Lucene generally tries to read all of the field infos, so this doesn't save much in terms of what needs to be read, and it will probably add some disk usage (although presumably minimal in comparison to the index content).
2. Make the FieldInfos able to spread the protobuf across multiple key-value pairs when it is too large, similar to what we do for Records.
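
To make the second option concrete, here is a rough sketch of splitting an already-serialized byte array into size-bounded chunks keyed by chunk index, and joining them back on read. Everything in it is an assumption for illustration: the class name `ChunkedValueSketch`, the `split`/`join` helpers, and the `MAX_CHUNK_BYTES` limit are hypothetical, and this is not how Records are actually split today.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of the existing code base.
class ChunkedValueSketch {
    // Assumed per-value size limit; the real limit depends on the backing store.
    static final int MAX_CHUNK_BYTES = 100_000;

    // Splits serialized bytes into chunks of at most chunkSize bytes.
    // The map key is the chunk index, which would become a key suffix.
    static TreeMap<Integer, byte[]> split(byte[] serialized, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        TreeMap<Integer, byte[]> chunks = new TreeMap<>();
        int index = 0;
        for (int offset = 0; offset < serialized.length; offset += chunkSize) {
            int end = Math.min(offset + chunkSize, serialized.length);
            chunks.put(index++, Arrays.copyOfRange(serialized, offset, end));
        }
        return chunks;
    }

    // Concatenates the chunks in index order to recover the original bytes.
    static byte[] join(SortedMap<Integer, byte[]> chunks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : chunks.values()) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

A writer would call `split(protoBytes, MAX_CHUNK_BYTES)` and store each entry under its own key; a reader would range-read those keys in order and pass them to `join` before parsing the protobuf. All chunks would need to be written and cleared together so a reader never sees a partial set.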

This could also be mitigated operationally by using partitioning, and using smaller partitions.
