ESQL: Documents STATS on multivalue groups
This documents running `STATS` on a multivalued column. It also removes
a long out of date warning about a limitation of grouping.
nik9000 committed Jul 10, 2024
1 parent 1c80c1f commit 8a3adbe
Showing 2 changed files with 55 additions and 5 deletions.
32 changes: 27 additions & 5 deletions docs/reference/esql/processing-commands/stats.asciidoc
@@ -6,7 +6,7 @@

[source,esql]
----
STATS [column1 =] expression1[, ..., [columnN =] expressionN]
[BY grouping_expression1[, ..., grouping_expressionN]]
----

@@ -39,8 +39,8 @@ NOTE: `STATS` without any groups is much much faster than adding a group.

NOTE: Grouping on a single expression is currently much more optimized than grouping
on many expressions. In some tests we have seen grouping on a single `keyword`
column to be five times faster than grouping on two `keyword` columns. Do
not try to work around this by combining the two columns together with
something like <<esql-concat>> and then grouping - that is not going to be
faster.

@@ -80,14 +80,36 @@ include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
|===

It's also possible to group by multiple values (only supported for long and
keyword family fields):
[[esql-stats-mv-group]]
If the grouping key is multivalued then the input row is in all groups:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=mv-group]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=mv-group-result]
|===

It's also possible to group by multiple values:

[source,esql]
----
include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
----

If all the grouping keys are multivalued then the input row is in all groups:

[source.merge.styled,esql]
----
include::{esql-specs}/stats.csv-spec[tag=multi-mv-group]
----
[%header.monospaced.styled,format=dsv,separator=|]
|===
include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
|===

Both the aggregating functions and the grouping expressions accept other
functions. This is useful for using `STATS...BY` on multivalue columns.
For example, to calculate the average salary change, you can use `MV_AVG` to
@@ -1819,3 +1819,31 @@ warning:Line 3:17: java.lang.ArithmeticException: / by zero
w_avg:double
null
;

docsStatsMvGroup
// tag::mv-group[]
ROW i=1, a=["a", "b"] | STATS MIN(i) BY a | SORT a ASC
// end::mv-group[]
;

// tag::mv-group-result[]
MIN(i):integer | a:keyword
1 | a
1 | b
// end::mv-group-result[]
;

docsStatsMultiMvGroup
// tag::multi-mv-group[]
ROW i=1, a=["a", "b"], b=[2, 3] | STATS MIN(i) BY a, b | SORT a ASC, b ASC
// end::multi-mv-group[]
;

// tag::multi-mv-group-result[]
MIN(i):integer | a:keyword | b:integer
1 | a | 2
1 | a | 3
1 | b | 2
1 | b | 3
// end::multi-mv-group-result[]
;
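
The grouping behavior exercised by the two new specs can be modeled outside of ES|QL. The following is a minimal Python sketch, not Elasticsearch's implementation: the helper names `as_values` and `stats_min_by` are hypothetical, rows are assumed to be plain dicts, and null grouping keys are not modeled. It places each input row into one group per combination of its grouping key values, which matches the `mv-group-result` and `multi-mv-group-result` tables above.

```python
from itertools import product


def as_values(value):
    """Treat a single value as a one-element list, like a single-valued column."""
    return value if isinstance(value, list) else [value]


def stats_min_by(rows, agg_col, group_cols):
    """Model `STATS MIN(agg_col) BY group_cols` followed by an ascending sort on the keys.

    Hypothetical helper for illustration only. A row with multivalued grouping
    keys lands in every combination of its key values.
    """
    groups = {}
    for row in rows:
        key_values = [as_values(row[col]) for col in group_cols]
        # One group per combination of key values (Cartesian product).
        for key in product(*key_values):
            if key in groups:
                groups[key] = min(groups[key], row[agg_col])
            else:
                groups[key] = row[agg_col]
    # Emit (min, key1, key2, ...) tuples ordered by the grouping keys.
    return [(groups[key], *key) for key in sorted(groups)]


if __name__ == "__main__":
    # Mirrors: ROW i=1, a=["a", "b"], b=[2, 3] | STATS MIN(i) BY a, b | SORT a ASC, b ASC
    row = {"i": 1, "a": ["a", "b"], "b": [2, 3]}
    for line in stats_min_by([row], "i", ["a", "b"]):
        print(line)
```

With one multivalued key the row appears in two groups; with two multivalued keys of two values each it appears in four, one per pair.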
