Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Refactor STATS substitution optimizer rules #110345

Open
2 tasks
alex-spies opened this issue Jul 1, 2024 · 2 comments
Open
2 tasks

ESQL: Refactor STATS substitution optimizer rules #110345

alex-spies opened this issue Jul 1, 2024 · 2 comments
Labels
:Analytics/ES|QL AKA ESQL >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)

Comments

@alex-spies
Copy link
Contributor

In the substitutions batch of our LogicalPlanOptimizer, there's 4 rules that take an expression like | STATS foo = avg(x*x) + 2 and turn this into a simple aggregation with enclosing EVALs; in this example, this becomes (essentially)

| EVAL $$x = x*x
| STATS $$foo_sum = sum($$x), $$foo_count = count($$x)
| EVAL $$foo = $$foo_sum/$$foo_count, foo = $$foo + 2
| KEEP foo

This is becoming complicated and more difficult to argue about due to the substitutions happening in 4 rules; let's see if we can do with just 2 rules.

More specifically,

  1. ReplaceStatsNestedExpressionWithEval turns STATS avg(x*x) + 2 into EVAL $$x = x*x | STATS foo = avg($$x) + 2.
  2. ReplaceStatsAggExpressionWithEval then turns | STATS foo = avg($$x) + 2 into | STATS $$foo = avg($$x) | EVAL foo = $$foo + 2
  3. SubstituteSurrogates replaces | STATS $$foo = avg($$x) by | STATS $$foo_sum = sum($$x), $$foo_count = count($$x) | EVAL $$foo = $$foo_sum/$$foo_count
  4. Then we run ReplaceStatsNestedExpressionWithEval again to account for stuff that happened in TranslateMetricsAggregate

It makes sense that there's 1 rule that creates EVALs after the aggregation (ReplaceStatsNestedExpressionWithEval) and one that pulls nested expressions out of agg functions into an EVAL before the aggregation (ReplaceStatsAggExpressionWithEval).

  • However, SubstituteSurrogates should only substitute and let ReplaceStatsNestedExpressionWithEval handle creating the EVAL after the STATS.
  • We should check if we can somehow do without a second run of ReplaceStatsNestedExpressionWithEval after TranslateMetricsAggregate
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 1, 2024
@astefan
Copy link
Contributor

astefan commented Jul 25, 2024

SubstituteSurrogates does something ok now, but considering #100634 this rule should be executed multiple times instead. PropagateEvalFoldables (when enabled for aggregates as well) should cover cases where the foldable expression is not inside the aggregate, for example eval x = [5,6,7] | stats max(x). And, when PropagateEvalFoldables is executed, the SubstituteSurrogates rule is no longer executed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >refactoring Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
Projects
None yet
Development

No branches or pull requests

3 participants