The HLL data structure. Casts between bytea and hll are supported, should you choose to generate the contents of the hll outside of the normal means. See STORAGE.markdown.
SELECT hll_cardinality(E'\\xDEADBEEF');
OR
SELECT hll_cardinality(E'\\xDEADBEEF'::hll);
Represents a hashed data value. Backed by a 64-bit integer (int8in). Typically only output by the hll_hash_* functions. bigint and integer can both be cast to it with the usual cast syntax, e.g. 123::hll_hashval, if you want to skip hashing those values. Note that an integer that is cast is first widened, with sign extension, to a 64-bit integer.
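As a small illustration of the casts described above (the literal values are arbitrary), an integer or bigint can be turned into an hll_hashval directly and passed on without hashing:

```sql
-- Cast integer and bigint literals straight to hll_hashval (no hashing applied).
SELECT 123::hll_hashval;
SELECT 123::bigint::hll_hashval;

-- Use the unhashed value as input to hll_add.
SELECT hll_add(hll_empty(), 123::hll_hashval);
```

Skipping the hash is only reasonable when the values are already well distributed over the 64-bit range; sequential identifiers generally are not.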
All defaults for the hll_empty and hll_add_agg functions are in the C file, not in the SQL control file. The defaults can be changed (per connection) with:
SELECT hll_set_defaults(log2m, regwidth, expthresh, sparseon);
This returns a 4-tuple with the values of the prior defaults in the same order as the arguments.
hll_cardinality(hll) - returns NULL if the hll's type is UNDEFINED. Returns a double precision floating point value otherwise. The prefix operator # may be used as shorthand.
hll_union(hll, hll) - returns the union (as an hll) of two hlls. The infix operator || may be used as shorthand.
hll_add(hll, hll_hashval) - adds the hll_hashval to the hll and returns the new representation of the hll. The infix operator || may be used as shorthand, like hll || hll_hashval or hll_hashval || hll.
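The function forms and their operator shorthands can be compared side by side. The following is a minimal sketch using only functions named in this document; the hashed values 1, 2 and 3 are arbitrary, and the parentheses after # are added defensively rather than because they are known to be required.

```sql
-- Build a small hll from three hashed integers using the || shorthand for hll_add.
SELECT hll_empty() || hll_hash_integer(1) || hll_hash_integer(2) || hll_hash_integer(3);

-- Estimate cardinality with the function form and with the # prefix operator.
SELECT hll_cardinality(hll_add(hll_add(hll_empty(), hll_hash_integer(1)), hll_hash_integer(2)));
SELECT #(hll_empty() || hll_hash_integer(1) || hll_hash_integer(2));

-- Union two hlls with the function form and with the || shorthand.
SELECT hll_union(hll_add(hll_empty(), hll_hash_integer(1)),
                 hll_add(hll_empty(), hll_hash_integer(2)));
SELECT (hll_empty() || hll_hash_integer(1)) || (hll_empty() || hll_hash_integer(2));
```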
hll_empty([log2m[, regwidth[, expthresh[, sparseon]]]]) - returns an empty hll with the specified parameters. Any number of trailing parameters may be omitted, in which case the default values are used. See hll_set_defaults.
hll_eq(hll, hll) - returns a boolean indicating whether the two hlls match when their binary representations are compared. The infix operator = may be used as shorthand.
hll_ne(hll, hll) - returns a boolean indicating whether the two hlls do not match when their binary representations are compared. The infix operator <> may be used as shorthand.
hll_union_agg(hll) - aggregate function for hlls that unions the hlls in the input set and returns the hll representing their union.
hll_add_agg(hll_hashval, [log2m[, regwidth[, expthresh[, sparseon]]]]) - aggregate function for hll_hashvals that inserts each element in the input set into an hll whose parameters are specified by the four optional arguments. If any of the four optional arguments are not specified, the defaults set with hll_set_defaults() will be used. Returns the hll representing the input set.
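The two aggregates are typically used together: hll_add_agg to build one hll per group from raw values, and hll_union_agg to combine those hlls later. The sketch below assumes two hypothetical tables, page_views and daily_uniques, which are not part of the extension.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE page_views (day date, user_id integer);
CREATE TABLE daily_uniques (day date, users hll);

-- Build one hll per day from the hashed user ids, using the default parameters.
INSERT INTO daily_uniques (day, users)
SELECT day, hll_add_agg(hll_hash_integer(user_id))
FROM page_views
GROUP BY day;

-- Estimate the number of distinct users across all days by unioning the daily hlls.
SELECT hll_cardinality(hll_union_agg(users)) FROM daily_uniques;
```

Storing one hll per day keeps the rollup small while still allowing distinct counts over arbitrary date ranges.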
hll_print(hll) - pretty-prints the hll; the output format depends on the hll's type.
hll_schema_version(hll) - returns the schema version value (integer) of the hll.
hll_type(hll) - returns the schema version-specific type value (integer) of the hll. See the storage specification (v1.0.0) for more details.
hll_regwidth(hll) - returns the register bit-width (integer) of the hll.
hll_log2m(hll) - returns the log-base-2 of the number of registers of the hll. If the hll is not of type FULL or SPARSE it returns the log2m value which would be used if the hll were promoted.
hll_expthresh(hll) - returns a 2-tuple of the specified and effective EXPLICIT promotion cutoffs for the hll. The specified cutoff and the effective cutoff will be the same unless expthresh has been set to 'auto' (-1). In that case the specified value will be -1 and the effective value will be the implementation-dependent number of explicit values that will be stored before an EXPLICIT hll is promoted.
hll_sparseon(hll) - returns 1 if the SPARSE representation is enabled for the hll, and 0 otherwise.
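The inspection functions above can be called together to see how a given hll is configured. This is a minimal sketch against a freshly created empty hll; the values returned depend on the defaults in effect for the connection, so none are shown here.

```sql
-- Inspect the parameters and representation of an empty hll built with the current defaults.
SELECT hll_schema_version(h) AS schema_version,
       hll_type(h)           AS type,
       hll_regwidth(h)       AS regwidth,
       hll_log2m(h)          AS log2m,
       hll_expthresh(h)      AS expthresh,
       hll_sparseon(h)       AS sparseon
FROM (SELECT hll_empty() AS h) AS s;

-- Human-readable dump of the same hll.
SELECT hll_print(hll_empty());
```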
SELECT hll_set_output_version(int) - sets the output schema version to the specified value and returns the previous value. The value set only applies within your connection.
SELECT hll_set_max_sparse(int) - sets the maximum number of materialized registers in a SPARSE hll before it is promoted to a FULL hll, for all hlls that have sparseon enabled. If -1 is provided, the cutoff will be determined based on storage efficiency and is implementation-dependent. If 0 is provided, the SPARSE representation will be skipped and FULL will be used instead. If any value greater than zero and less than 2^log2m is provided, promotion will occur after that number of materialized registers. If any value greater than or equal to 2^log2m is used, promotion to FULL will never occur.
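The cases described above map onto calls like the following. The value 256 is an arbitrary illustration and is only in the "promote after N registers" range if it is smaller than 2^log2m for the hlls in question.

```sql
-- Let the implementation pick the SPARSE-to-FULL cutoff based on storage efficiency.
SELECT hll_set_max_sparse(-1);

-- Skip the SPARSE representation entirely and use FULL instead.
SELECT hll_set_max_sparse(0);

-- Promote to FULL after 256 materialized registers (assumes 256 < 2^log2m).
SELECT hll_set_max_sparse(256);
```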
All values inserted into an hll should be hashed, and as a result hll_add and hll_add_agg only accept hll_hashvals. We do not recommend hashing floating point values raw as their bit-representation is not well-suited to hashing. Consider converting them to a reproducible, comparable binary representation (such as the IEEE 754-2008 interchange format) before hashing.
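One possible way to follow this advice inside PostgreSQL is to serialize the value to bytea first and hash the bytes. The sketch below uses float8send, PostgreSQL's binary send function for double precision; choosing it here is an assumption of this example, not something this document prescribes.

```sql
-- Serialize the double precision value to its binary wire form, then hash the bytes.
SELECT hll_hash_bytea(float8send(3.14::double precision));

-- The result can be added to an hll like any other hll_hashval.
SELECT hll_add(hll_empty(), hll_hash_bytea(float8send(3.14::double precision)));
```

Values that compare equal but have different bit patterns, such as 0.0 and -0.0, will still hash differently under this approach, so normalize them beforehand if that matters.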
All the hll_hash_* functions below accept a seed value, which defaults to 0. We discourage negative seeds in order to maintain hashed-value compatibility with the Google Guava implementation of the 128-bit version of Murmur3. Negative hash seeds will produce a warning when used.
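As a brief illustration, and assuming the seed is supplied as an optional second integer argument (the signatures listed in this document show only the value argument), a seeded call might look like this. The seed 42 is arbitrary.

```sql
-- Default seed (0), written implicitly and explicitly.
SELECT hll_hash_text('hello'::text);
SELECT hll_hash_text('hello'::text, 0);

-- Non-negative custom seed; use the same seed for every value added to a given hll.
SELECT hll_hash_text('hello'::text, 42);
```

Hashes produced with different seeds are not comparable, so hlls built with different seeds should not be unioned.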
hll_hash_boolean(boolean) - hashes the boolean value into a hll_hashval.
hll_hash_smallint(smallint) - hashes the smallint value into a hll_hashval.
hll_hash_integer(integer) - hashes the integer value into a hll_hashval.
hll_hash_bigint(bigint) - hashes the bigint value into a hll_hashval.
hll_hash_bytea(bytea) - hashes the bytea value into a hll_hashval.
hll_hash_text(text) - hashes the text value into a hll_hashval.
hll_hash_any(scalar) - hashes any PostgreSQL data type by resolving the type dynamically and dispatching to the correct function for that type. This is significantly slower than the type-specific hash functions, and should only be used when the input type is not known beforehand.
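A short sketch contrasting the type-specific functions with hll_hash_any follows. The explicit casts on the hll_hash_any arguments are a precaution so that the input type is unambiguous; the literal values are arbitrary.

```sql
-- Preferred: type-specific hash functions when the input type is known.
SELECT hll_hash_integer(42);
SELECT hll_hash_text('some text'::text);

-- Fallback: dynamic dispatch when the type is not known in advance.
SELECT hll_hash_any(42::integer);
SELECT hll_hash_any('some text'::text);

-- Either kind of result can be fed to hll_add.
SELECT hll_add(hll_empty(), hll_hash_any('some text'::text));
```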