Describe the bug
Many years ago (2016, IIRC) the code to fetch individual source fields (according to the _source argument in a search request) was changed to always use Lucene's built-in automaton matching logic to pick which source fields to return. This possibly makes sense if there are dotted paths to object subfields or if there are wildcard patterns.
I don't think it makes sense when there's just a list of field names that someone wants to retrieve. In that case, we should probably just stick them all in a HashSet and evaluate a contains() predicate to decide which fields to include in a response.
In particular, if there are a large number of fields (and those fields have long names), we end up generating a big union between linear automata. The resulting graph can have many states and many transitions, so Lucene ends up throwing a TooComplexToDeterminizeException.
Related component
Search:Performance
To Reproduce
Create an index with a lot of fields (a few thousand), with long field names.
Run a search request that fetches a lot of those fields (a few thousand) in the _source parameter.
Get a TooComplexToDeterminizeException
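For anyone who wants to try the steps above, here is a rough sketch of building the search body. The class name, the field-name prefix, and the count of 3000 are all made up for illustration; I haven't pinned down the exact number or name length needed to trip the exception, so treat those as knobs to turn. It assumes the index was already created with matching field names.

```java
// Hypothetical reproduction helper; names and sizes are assumptions, not from the issue.
final class ManyFieldsRequest {
    private ManyFieldsRequest() {}

    // Builds one long, plain field name (no '*' and no '.').
    static String fieldName(int i) {
        return "a_rather_long_field_name_used_only_for_this_reproduction_" + i;
    }

    // Builds a search body whose _source parameter lists fieldCount plain field names.
    static String searchBody(int fieldCount) {
        StringBuilder sb = new StringBuilder("{\"query\":{\"match_all\":{}},\"_source\":[");
        for (int i = 0; i < fieldCount; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fieldName(i)).append('"');
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        // Print the body so it can be sent to the _search endpoint with any HTTP client.
        System.out.println(searchBody(3000));
    }
}
```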
Expected behavior
We shouldn't get an exception in the simple case.
(I think I'm okay with getting an exception when there are a lot of object subfields being requested or a bunch of wildcard patterns.)
Essentially, if the includes/excludes have no * or . characters, we just stick them in HashSets and return the fields that are in the includes but not the excludes.
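To make that concrete, here is a minimal sketch of the fast path over top-level fields. This is not the actual change: SimpleSourceFilter and its methods are hypothetical names, and it assumes that an empty includes list means "include everything". Anything containing * or . is rejected here and would need to go through the existing automaton-based matching instead.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not the actual OpenSearch code.
final class SimpleSourceFilter {
    private SimpleSourceFilter() {}

    // True when no entry contains '*' or '.', so exact-name lookups are enough.
    static boolean isSimple(String[] includes, String[] excludes) {
        return allPlain(includes) && allPlain(excludes);
    }

    private static boolean allPlain(String[] names) {
        for (String name : names) {
            if (name.indexOf('*') >= 0 || name.indexOf('.') >= 0) {
                return false;
            }
        }
        return true;
    }

    // Keeps fields that are in the includes but not in the excludes.
    // Assumption: an empty includes list means "include everything".
    static Map<String, Object> filter(Map<String, Object> source, String[] includes, String[] excludes) {
        if (!isSimple(includes, excludes)) {
            throw new IllegalArgumentException("wildcards or dotted paths need the automaton path");
        }
        Set<String> includeSet = new HashSet<>(Arrays.asList(includes));
        Set<String> excludeSet = new HashSet<>(Arrays.asList(excludes));
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            String field = entry.getKey();
            boolean included = includeSet.isEmpty() || includeSet.contains(field);
            if (included && !excludeSet.contains(field)) {
                filtered.put(field, entry.getValue());
            }
        }
        return filtered;
    }
}
```

Each field costs two hash lookups no matter how many names are requested, so no automaton is built or determinized for the simple case.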