[BUG] Fetching source uses automata even for simple matching #17114

Open
msfroh opened this issue Jan 24, 2025 · 2 comments · May be fixed by #17160
Labels: bug, good first issue

Comments

@msfroh
Collaborator

msfroh commented Jan 24, 2025

Describe the bug

Many years ago (2016, IIRC) the code to fetch individual source fields (according to the _source argument in a search request) was changed to always use Lucene's built-in automaton matching logic to pick which source fields to return. That arguably makes sense when the request contains dotted paths to object subfields or wildcard patterns.

I don't think it makes sense when there's just a list of field names that someone wants to retrieve. In that case, we should probably just stick them all in a HashSet and evaluate a contains() predicate to decide which fields to include in a response.

In particular, if there are a large number of fields (and those fields have long names), we end up generating a big union of linear automata, one per field name. The resulting graph can have many states and many transitions, so Lucene ends up throwing a TooComplexToDeterminizeException when it tries to determinize it.

Related component

Search:Performance

To Reproduce

  1. Create an index with a lot of fields (a few thousand), with long field names.
  2. Run a search request that fetches a lot of those fields (a few thousand) in the _source parameter.
  3. The request fails with a TooComplexToDeterminizeException.
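
For anyone trying to reproduce this, here is a rough sketch of how the field names and the search body for step 2 could be generated. The class name, the field-name prefix, and the match_all query are all made up for illustration; the count is whatever "a few thousand" turns out to need in practice, and I have not measured the exact threshold at which the exception appears.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for building a reproduction request; not part of the codebase.
final class ManyFieldsRequest {
    private ManyFieldsRequest() {}

    // Generates `count` distinct, deliberately long field names (the prefix is arbitrary).
    static List<String> fieldNames(int count) {
        List<String> names = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            names.add("a_deliberately_long_field_name_used_only_for_reproduction_" + i);
        }
        return names;
    }

    // Builds a search body that lists every field in _source.
    // The names are assumed to need no JSON escaping (true for fieldNames above).
    static String searchBody(List<String> fields) {
        String quoted = fields.stream()
            .map(f -> "\"" + f + "\"")
            .collect(Collectors.joining(","));
        return "{\"_source\":[" + quoted + "],\"query\":{\"match_all\":{}}}";
    }
}
```

Calling `ManyFieldsRequest.searchBody(ManyFieldsRequest.fieldNames(3000))` gives a body that can be sent to the _search endpoint of an index whose mapping contains the same field names.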

Expected behavior

We shouldn't get an exception in the simple case.

(I think I'm okay with getting an exception when there are a lot of object subfields being requested or a bunch of wildcard patterns.)


@msfroh
Collaborator Author

msfroh commented Jan 24, 2025

To clarify, I think we should add a special case to this method:

public static Function<Map<String, ?>, Map<String, Object>> filter(String[] includes, String[] excludes) {

Essentially, if the includes/excludes have no * or . characters, we just stick them in HashSets and return the fields that are in the includes but not the excludes.
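
A minimal sketch of what that special case might look like, written as a standalone class rather than as a patch to the real method. `SimpleSourceFilter` and `isSimple` are hypothetical names. Two assumptions are baked in: an empty includes array means "include everything", and the non-simple case is represented here by an exception where the real code would fall through to the existing automaton path. It also only looks at top-level keys, so a source document whose keys themselves contain dots would need more thought.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical standalone sketch of the HashSet fast path; not the actual implementation.
final class SimpleSourceFilter {
    private SimpleSourceFilter() {}

    // True when no entry contains '*' or '.', i.e. every entry is a plain field name.
    static boolean isSimple(String[] patterns) {
        for (String p : patterns) {
            if (p.indexOf('*') >= 0 || p.indexOf('.') >= 0) {
                return false;
            }
        }
        return true;
    }

    static Function<Map<String, ?>, Map<String, Object>> filter(String[] includes, String[] excludes) {
        if (!isSimple(includes) || !isSimple(excludes)) {
            // Placeholder: the real method would use the automaton-based matching here.
            throw new UnsupportedOperationException("wildcards or dotted paths need the automaton path");
        }
        Set<String> includeSet = new HashSet<>(Arrays.asList(includes));
        Set<String> excludeSet = new HashSet<>(Arrays.asList(excludes));
        return source -> {
            Map<String, Object> result = new LinkedHashMap<>();
            for (Map.Entry<String, ?> entry : source.entrySet()) {
                String key = entry.getKey();
                // Assumption: an empty include list means every field is included.
                boolean included = includeSet.isEmpty() || includeSet.contains(key);
                if (included && !excludeSet.contains(key)) {
                    // The whole value is kept, including any nested object under this key.
                    result.put(key, entry.getValue());
                }
            }
            return result;
        };
    }
}
```

Each lookup is a constant-time `contains()` call, so the cost grows linearly with the number of fields in the document and no automaton is ever built or determinized.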

@hye-on
Contributor

hye-on commented Jan 25, 2025

Hi @msfroh Can I take this issue? :)
