Skip to content

Latest commit

 

History

History
308 lines (300 loc) · 11.3 KB

supported-aggregations.md

File metadata and controls

308 lines (300 loc) · 11.3 KB

Supported Aggregations

Pinot provides support for aggregations using GROUP BY. You can use the following functions to get the aggregated value.

Function Description Example
COUNT Get the count of rows in a group COUNT(*)
MIN Get the minimum value in a group MIN(playerScore)
MAX Get the maximum value in a group MAX(playerScore)
SUM Get the sum of values in a group SUM(playerScore)
AVG Get the average of the values in a group AVG(playerScore)
MODE Get the most frequent value in a group. When multiple modes are present it gives the minimum of all the modes. This behaviour can be overridden to get the maximum or the average mode.

MODE(playerScore)

MODE(playerScore, 'MIN')

MODE(playerScore, 'MAX')

MODE(playerScore, 'AVG')

MINMAXRANGE Returns the max - min value in a group MINMAXRANGE(playerScore)
PERCENTILE(column, N) Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive PERCENTILE(playerScore, 50), PERCENTILE(playerScore, 99.9)
PERCENTILEEST(column, N) Returns the Nth percentile of the group using Quantile Digest algorithm PERCENTILEEST(playerScore, 50), PERCENTILEEST(playerScore, 99.9)
PercentileTDigest(column, N) Returns the Nth percentile of the group using T-digest algorithm PERCENTILETDIGEST(playerScore, 50), PERCENTILETDIGEST(playerScore, 99.9)
DISTINCT Returns the distinct row values in a group DISTINCT(playerName)
DISTINCTCOUNT Returns the count of distinct row values in a group DISTINCTCOUNT(playerName)
DISTINCTCOUNTBITMAP Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision. DISTINCTCOUNTBITMAP(playerName)
DISTINCTCOUNTHLL Returns an approximate distinct count using HyperLogLog in a group DISTINCTCOUNTHLL(playerName)
DISTINCTCOUNTRAWHLL Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching.

DISTINCTCOUNTRAWHLL(playerName)

<code></code>

FASTHLL (Deprecated) WARN: will be deprecated soon. FASTHLL stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format FASTHLL(playerName)
DistinctCountThetaSketch See Cardinality Estimation
DistinctCountRawThetaSketch See Cardinality Estimation

Multi-value column functions

The following aggregation functions can be used for multi-value columns

Function Description Example
COUNTMV Get the count of rows in a group COUNTMV(playerName)
MINMV Get the minimum value in a group MINMV(playerScores)
MAXMV Get the maximum value in a group MAXMV(playerScores)
SUMMV Get the sum of values in a group SUMMV(playerScores)
AVGMV Get the avg of values in a group AVGMV(playerScores)
MINMAXRANGEMV Returns the max - min value in a group MINMAXRANGEMV(playerScores)
PERCENTILEMV(column, N) Returns the Nth percentile of the group where N is a decimal number between 0 and 100 inclusive

PERCENTILEMV(playerScores, 50),

PERCENTILEMV(playerScores, 99.9)

PERCENTILEESTMV(column, N) Returns the Nth percentile of the group using Quantile Digest algorithm

PERCENTILEESTMV(playerScores, 50),

PERCENTILEESTMV(playerScores, 99.9)

PercentileTDigestMV(column, N) Returns the Nth percentile of the group using T-digest algorithm

PERCENTILETDIGESTMV(playerScores, 50),

PERCENTILETDIGESTMV(playerScores, 99.9),

DISTINCTCOUNTMV Returns the count of distinct row values in a group DISTINCTCOUNTMV(playerNames)
DISTINCTCOUNTBITMAPMV Returns the count of distinct row values in a group. This function is accurate for INT or dictionary encoded column, but approximate for other cases where hash codes are used in distinct counting and there may be hash collision. DISTINCTCOUNTBITMAPMV(playerNames)
DISTINCTCOUNTHLLMV Returns an approximate distinct count using HyperLogLog in a group DISTINCTCOUNTHLLMV(playerNames)
DISTINCTCOUNTRAWHLLMV Returns HLL response serialized as string. The serialized HLL can be converted back into an HLL and then aggregated with other HLLs. A common use case may be to merge HLL responses from different Pinot tables, or to allow aggregation after client-side batching. DISTINCTCOUNTRAWHLLMV(playerNames)
FASTHLLMV (Deprecated) stores serialized HyperLogLog in String format, which performs worse than DISTINCTCOUNTHLL, which supports serialized HyperLogLog in BYTES (byte array) format FASTHLLMV(playerNames)