Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.
Function | Description | Example |
---|---|---|
COUNT | Get the count of rows in a group | COUNT(*)
|
MIN | Get the minimum value in a group | MIN(playerScore)
|
MAX | Get the maximum value in a group | MAX(playerScore)
|
SUM | Get the sum of values in a group | SUM(playerScore)
|
AVG | Get the average of the values in a group | AVG(playerScore)
|
MODE | Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behaviour can be overridden to get the maximum or the average mode. |
|
MINMAXRANGE | Returns the max - min value in a group |
MINMAXRANGE(playerScore)
|
PERCENTILE(column, N) | Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive | PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)
|
PERCENTILEEST(column, N) | Returns the Nth percentile of the group using Quantile Digest algorithm | PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)
|
PercentileTDigest(column, N) | Returns the Nth percentile of the group using T-digest algorithm | PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)
|
DISTINCT | Returns the distinct row values in a group | DISTINCT(playerName)
|
DISTINCTCOUNT | Returns the count of distinct row values in a group | DISTINCTCOUNT(playerName)
|
DISTINCTCOUNTBITMAP | Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision. | DISTINCTCOUNTBITMAP(playerName)
|
DISTINCTCOUNTHLL | Returns an approximate distinct count using HyperLogLog in a group | DISTINCTCOUNTHLL(playerName)
|
DISTINCTCOUNTRAWHLL | Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. |
<code></code> |
FASTHLL (Deprecated) | WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format | FASTHLL(playerName)
|
DistinctCountThetaSketch | See Cardinality Estimation | |
DistinctCountRawThetaSketch | See Cardinality Estimation |
The following aggregation functions can be used for multi-value columns
Function | Description | Example |
---|---|---|
COUNTMV | Get the count of rows in a group | COUNTMV(playerName)
|
MINMV | Get the minimum value in a group | MINMV(playerScores)
|
MAXMV | Get the maximum value in a group | MAXMV(playerScores)
|
SUMMV | Get the sum of values in a group | SUMMV(playerScores)
|
AVGMV | Get the avg of values in a group | AVGMV(playerScores)
|
MINMAXRANGEMV | Returns the max - min value in a group |
MINMAXRANGEMV(playerScores)
|
PERCENTILEMV(column, N) | Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive |
|
PERCENTILEESTMV(column, N) | Returns the Nth percentile of the group using Quantile Digest algorithm |
|
PercentileTDigestMV(column, N) | Returns the Nth percentile of the group using T-digest algorithm |
|
DISTINCTCOUNTMV | Returns the count of distinct row values in a group | DISTINCTCOUNTMV(playerNames)
|
DISTINCTCOUNTBITMAPMV | Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision. | DISTINCTCOUNTBITMAPMV(playerNames)
|
DISTINCTCOUNTHLLMV | Returns an approximate distinct count using HyperLogLog in a group | DISTINCTCOUNTHLLMV(playerNames)
|
DISTINCTCOUNTRAWHLLMV | Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. | DISTINCTCOUNTRAWHLLMV(playerNames)
|
FASTHLLMV (Deprecated) | stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format | FASTHLLMV(playerNames)
|