ESQL: Documents STATS on multivalue groups

This documents running `STATS` with a multivalued grouping column. It also removes a long-outdated warning about a limitation of grouping.
nik9000 · Jul 10, 2024 · 8a3adbe
1 parent 1c80c1f

Showing 2 changed files with 55 additions and 5 deletions.
diff --git a/docs/reference/esql/processing-commands/stats.asciidoc b/docs/reference/esql/processing-commands/stats.asciidoc
@@ -6,7 +6,7 @@
 
 [source,esql]
 ----
-STATS [column1 =] expression1[, ..., [columnN =] expressionN] 
+STATS [column1 =] expression1[, ..., [columnN =] expressionN]
 [BY grouping_expression1[, ..., grouping_expressionN]]
 ----
 
@@ -39,8 +39,8 @@ NOTE: `STATS` without any groups is much much faster than adding a group.
 
 NOTE: Grouping on a single expression is currently much more optimized than grouping
       on many expressions. In some tests we have seen grouping on a single `keyword`
-      column to be five times faster than grouping on two `keyword` columns. Do 
-      not try to work around this by combining the two columns together with 
+      column to be five times faster than grouping on two `keyword` columns. Do
+      not try to work around this by combining the two columns together with
       something like <<esql-concat>> and then grouping - that is not going to be
       faster.
 
@@ -80,14 +80,36 @@ include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues]
 include::{esql-specs}/stats.csv-spec[tag=statsCalcMultipleValues-result]
 |===
 
-It's also possible to group by multiple values (only supported for long and
-keyword family fields):
+[[esql-stats-mv-group]]
+If the grouping key is multivalued then the input row is in all groups:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=mv-group]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=mv-group-result]
+|===
+
+It's also possible to group by multiple values:
 
 [source,esql]
 ----
 include::{esql-specs}/stats.csv-spec[tag=statsGroupByMultipleValues]
 ----
 
+If all the grouping keys are multivalued then the input row is in all groups:
+
+[source.merge.styled,esql]
+----
+include::{esql-specs}/stats.csv-spec[tag=multi-mv-group]
+----
+[%header.monospaced.styled,format=dsv,separator=|]
+|===
+include::{esql-specs}/stats.csv-spec[tag=multi-mv-group-result]
+|===
+
 Both the aggregating functions and the grouping expressions accept other
 functions. This is useful for using `STATS...BY` on multivalue columns.
 For example, to calculate the average salary change, you can use `MV_AVG` to

diff --git a/x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec b/x-pack/plugin/esql/qa/testFixtures/src/main/resources/stats.csv-spec
@@ -1819,3 +1819,31 @@ warning:Line 3:17: java.lang.ArithmeticException: / by zero
 w_avg:double
 null
 ;
+
+docsStatsMvGroup
+// tag::mv-group[]
+ROW i=1, a=["a", "b"] | STATS MIN(i) BY a | SORT a ASC
+// end::mv-group[]
+;
+
+// tag::mv-group-result[]
+MIN(i):integer | a:keyword
+             1 | a
+             1 | b
+// end::mv-group-result[]
+;
+
+docsStatsMultiMvGroup
+// tag::multi-mv-group[]
+ROW i=1, a=["a", "b"], b=[2, 3] | STATS MIN(i) BY a, b | SORT a ASC, b ASC
+// end::multi-mv-group[]
+;
+
+// tag::multi-mv-group-result[]
+MIN(i):integer | a:keyword | b:integer
+             1 | a         | 2
+             1 | a         | 3
+             1 | b         | 2
+             1 | b         | 3
+// end::multi-mv-group-result[]
+;
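
The grouping behavior these two specs document can be modeled outside ES|QL. The following is a minimal Python sketch, not how ES|QL implements grouping: the function name `stats_min_by` and the dict-based row format are hypothetical, and null keys are not modeled. It assumes a row with multivalued keys joins one group per combination of its key values, which is what the expected results above show.

```python
from itertools import product


def stats_min_by(rows, value_col, group_cols):
    """Toy model of `STATS MIN(value_col) BY group_cols` (hypothetical helper)."""
    groups = {}
    for row in rows:
        # Treat a single-valued key as a one-element list so both cases share one path.
        key_values = [
            row[col] if isinstance(row[col], list) else [row[col]]
            for col in group_cols
        ]
        # The row joins every combination of its key values.
        for key in product(*key_values):
            value = row[value_col]
            groups[key] = value if key not in groups else min(groups[key], value)
    # Sort by the grouping keys, mirroring `SORT a ASC, b ASC` in the specs.
    return [(minimum, *key) for key, minimum in sorted(groups.items())]


single = stats_min_by([{"i": 1, "a": ["a", "b"]}], "i", ["a"])
multi = stats_min_by([{"i": 1, "a": ["a", "b"], "b": [2, 3]}], "i", ["a", "b"])
```

Here `single` corresponds to the two rows of `mv-group-result` and `multi` to the four rows of `multi-mv-group-result`, each as `(MIN(i), key...)` tuples.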