[BUG] Derived Fields feature causes search query performance regression on indices with many field mappings #16603
Comments
Relevant PR: #13720
@rishabhmaurya -- maybe we could define an implicit index setting that uses the What do you think?
@rishabh6788 Do we have something in OSB to capture/monitor this?
At this point I don't think there is any workload that supports this use-case.
related to #16564
@rishabhmaurya -- what do you think of changing this line: OpenSearch/server/src/main/java/org/opensearch/index/mapper/DerivedFieldResolverFactory.java Lines 48 to 52 in 9da6170
to:
    if (derivedFieldAllowed && derivedFieldsPresent) {
        return new DefaultDerivedFieldResolver(queryShardContext, derivedFieldsObject, derivedFields);
    } else {
        return new NoOpDerivedFieldResolver();
    }
Describe the bug
When upgrading to 2.15.0, we experienced a significant search query performance regression in our test suite. Comparing profiling output against the same test on 2.14.0 showed that the regression comes from the new 'derived fields' feature. Specifically, it appears that for every field referenced in a query, the resolver iterates over every field defined in the index mappings. Until a few days ago, each iteration also performed an expensive string comparison.
OpenSearch/server/src/main/java/org/opensearch/index/mapper/DefaultDerivedFieldResolver.java
Line 69 in 61dbcd0
Our use case is a little unusual in that we have a large number (~10,000) of fields defined in our mappings and we issue large queries (~150 clauses). In our testing, this issue added over a second of overhead to each query (~7 ms per clause). The slowdown was present even though we do not use the derived field type feature at all: each field referenced in the query causes an iteration through the entire set of mapped fields.
This issue was partially addressed on the main branch a few days ago, as mentioned above, by short-circuiting the expensive string comparison if the field mapping is not a derived field type. However, the iteration itself is still present. It should be possible to precompute the set of derived fields outside the context of a particular query (e.g. add a method MapperService.derivedFieldTypes()) and iterate over only those rather than every mapped field in the index.
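To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not OpenSearch code: `FieldType`, `MappingRegistry`, and `DerivedFieldLookupSketch` are hypothetical stand-ins, and `derivedFieldTypes()` only borrows the method name proposed above. The assumption is that the derived subset can be computed once when the mappings are built, so a per-field lookup no longer touches every mapped field.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a mapped field type (not the OpenSearch class).
final class FieldType {
    final String name;
    final boolean derived;

    FieldType(String name, boolean derived) {
        this.name = name;
        this.derived = derived;
    }
}

// Hypothetical mapping registry that precomputes the derived subset once,
// at construction time, instead of once per field referenced in a query.
final class MappingRegistry {
    private final Map<String, FieldType> allFields = new LinkedHashMap<>();
    private final Map<String, FieldType> derivedFields = new LinkedHashMap<>();

    MappingRegistry(List<FieldType> fields) {
        for (FieldType f : fields) {
            allFields.put(f.name, f);
            if (f.derived) {
                derivedFields.put(f.name, f);
            }
        }
    }

    // The costly pattern: every lookup scans all mapped fields.
    FieldType resolveDerivedByScan(String fieldName) {
        for (FieldType f : allFields.values()) {
            if (f.derived && f.name.equals(fieldName)) {
                return f;
            }
        }
        return null;
    }

    // The proposed pattern: consult only the precomputed derived fields.
    FieldType resolveDerivedPrecomputed(String fieldName) {
        return derivedFields.get(fieldName);
    }

    // Read-only view of the derived fields, computed once per mapping.
    Collection<FieldType> derivedFieldTypes() {
        return Collections.unmodifiableCollection(derivedFields.values());
    }
}

final class DerivedFieldLookupSketch {
    // Builds a registry with plain fields "field_i" and derived fields "derived_i".
    static MappingRegistry buildRegistry(int plainCount, int derivedCount) {
        List<FieldType> fields = new ArrayList<>();
        for (int i = 0; i < plainCount; i++) {
            fields.add(new FieldType("field_" + i, false));
        }
        for (int i = 0; i < derivedCount; i++) {
            fields.add(new FieldType("derived_" + i, true));
        }
        return new MappingRegistry(fields);
    }

    public static void main(String[] args) {
        MappingRegistry registry = buildRegistry(10_000, 0);
        // With no derived fields, the precomputed view is empty, so a query
        // with many clauses does no per-clause work over the 10,000 mappings.
        System.out.println("derived fields: " + registry.derivedFieldTypes().size());
        System.out.println("lookup result: " + registry.resolveDerivedPrecomputed("field_42"));
    }
}
```

With this shape, an index that defines no derived fields pays a constant cost per referenced field regardless of how many fields are mapped. Whether the real mapping code can maintain such a view cheaply across mapping updates is something the maintainers would need to confirm.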
As a workaround, we have disabled the 'derived fields' feature for now, which replaces the DefaultDerivedFieldResolver with the NoOpDerivedFieldResolver.
Related component
Search:Performance
To Reproduce
Expected behavior
I would expect search query performance in the scenario described above (a large number of defined field mappings and a large number of clauses in the search query) to be comparable in newer releases of OpenSearch to what it was in 2.14.0.
Additional Details
Plugins
Host/Environment (please complete the following information):