[Workload Improvement] Adding single term aggregation task in workloads #165

Open
sandeshkr419 opened this issue Jan 29, 2024 · 2 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@sandeshkr419

sandeshkr419 commented Jan 29, 2024

Is your feature request related to a problem?

While working on aggregation performance, I found a gap in the workloads.
Currently, none of the workloads include a single terms aggregation request in their runs. This is a common use case that we should be benchmarking.

Example of a search request that is missing:

GET /my_index/_search
{
  "size": 0,
  "aggs": {
    "response_codes": {
      "terms": {
        "field" : "response_code"
      }
    }
  }
}

Task from a custom workload that I used temporarily:

{
  "name": "country_term_aggregation",
  "operation-type": "search",
  "body": {
    "size": 0,
    "aggs": {
      "country_population": {
        "terms": {
          "field": "country_code.raw"
        }
      }
    }
  }
}

What solution would you like?

  1. Identify the workloads where it makes sense to include terms aggregations. Two obvious candidates are geonames and http_logs.
  2. Rename the existing workload tasks that have term in their name to term_query, to distinguish term queries from terms aggregations.
  3. Add single terms aggregations to the identified workloads.
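
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration, not the project's tooling: the helper names `rename_term_tasks` and `terms_aggregation_task`, the sample operation list, the task name, and the http_logs `status` field are all assumptions that would need to be checked against the actual workload files.

```python
import json


def rename_term_tasks(operations):
    """Return a copy of the operations with the bare name "term" renamed to "term_query".

    Assumption: only an operation named exactly "term" is a term query;
    every other name is left untouched.
    """
    renamed = []
    for op in operations:
        op = dict(op)  # shallow copy so the caller's list is not mutated
        if op.get("name") == "term":
            op["name"] = "term_query"
        renamed.append(op)
    return renamed


def terms_aggregation_task(name, agg_name, field):
    """Build a search task that runs a single terms aggregation on `field`."""
    return {
        "name": name,
        "operation-type": "search",
        "body": {"size": 0, "aggs": {agg_name: {"terms": {"field": field}}}},
    }


# Hypothetical operation list; real workloads define many more operations.
operations = [
    {"name": "default", "operation-type": "search"},
    {"name": "term", "operation-type": "search"},
]
operations = rename_term_tasks(operations)

# "status" is assumed to be the response-code field in http_logs.
operations.append(
    terms_aggregation_task("status_terms_aggregation", "response_codes", "status")
)
print(json.dumps(operations, indent=2))
```

The rename is kept to an exact match on purpose, since the final naming scheme is still open for discussion.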

What alternatives have you considered?

None.

Do you have any additional context?

Terms aggregations on fields with and without indexed fielddata should be accounted for separately. For example, in the geonames workload, running the above query with "field": "country_code.raw" triggers the low cardinality workflow, whereas running it with "field": "country_code" triggers the regular dense cardinality workflow.
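
One way to cover both cases is to generate one task per field variant, as in the sketch below. The two field names come from the example above; the task names and the `build_terms_tasks` helper are hypothetical.

```python
import json

# Field names are taken from the geonames example above; task names are hypothetical.
GEONAMES_TERMS_FIELDS = {
    "country_code_raw_terms_aggregation": "country_code.raw",
    "country_code_terms_aggregation": "country_code",
}


def build_terms_tasks(fields_by_task_name):
    """Build one single-terms-aggregation search task per (task name, field) pair."""
    return [
        {
            "name": task_name,
            "operation-type": "search",
            "body": {
                "size": 0,
                "aggs": {"country_population": {"terms": {"field": field}}},
            },
        }
        for task_name, field in fields_by_task_name.items()
    ]


tasks = build_terms_tasks(GEONAMES_TERMS_FIELDS)
print(json.dumps(tasks, indent=2))
```

Keeping the two variants as separate named tasks means their latencies are reported separately in the benchmark results.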

@sandeshkr419 sandeshkr419 added enhancement New feature or request untriaged labels Jan 29, 2024
@gkamat gkamat removed the untriaged label Jan 30, 2024
@gkamat
Collaborator

gkamat commented Jan 30, 2024

This will certainly improve and flesh out the functionality of the current workloads. However, a discussion is warranted on how the term query should be renamed.

@rishabhmaurya

Let's also add a cardinality aggregation operation to the BIG5 workload on a low-cardinality field, if it makes sense.
Related to opensearch-project/OpenSearch#11959
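
A minimal sketch of what such a task could look like is shown below. The helper name, the task name, and the choice of `cloud.region` as a low-cardinality BIG5 field are assumptions, not something confirmed in this thread.

```python
import json


def cardinality_aggregation_task(name, agg_name, field):
    """Build a search task that runs a single cardinality aggregation on `field`."""
    return {
        "name": name,
        "operation-type": "search",
        "body": {"size": 0, "aggs": {agg_name: {"cardinality": {"field": field}}}},
    }


# "cloud.region" is an assumed low-cardinality field; verify it against the
# BIG5 index mapping before using it.
task = cardinality_aggregation_task(
    "cardinality_agg_low", "distinct_regions", "cloud.region"
)
print(json.dumps(task, indent=2))
```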

@IanHoang IanHoang added the good first issue Good for newcomers label Nov 12, 2024