Counting Functions
The package provides functions to count the occurrences of distinct values.
Counting over an Integer Range
StatsBase.counts - Function
counts(x, [wv::AbstractWeights])
counts(x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
counts(x, k::Integer, [wv::AbstractWeights])
Count the number of times each value in x occurs. If levels is provided, only values falling in that range will be considered (the others will be ignored without raising an error or a warning). If an integer k is provided, only values in the range 1:k will be considered.
If a vector of weights wv is provided, the sum of the weights is used rather than the raw counts.
The output is a vector of length length(levels).
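As a minimal sketch of the three call forms described above (the input vector and weights are made-up illustration data, not from the package documentation):

```julia
using StatsBase

x = [1, 2, 2, 3, 3, 3, 7]

# Count over an explicit range; the 7 lies outside 1:4 and is silently ignored.
c = counts(x, 1:4)          # [1, 2, 3, 0]

# Passing an integer k is shorthand for the range 1:k.
ck = counts(x, 4)           # same result as counts(x, 1:4)

# With weights, each occurrence contributes its weight instead of 1.
w = weights([0.5, 1.0, 1.0, 2.0, 2.0, 2.0, 10.0])
cw = counts(x, 1:4, w)      # [0.5, 2.0, 6.0, 0.0]
```

The result always has one entry per level, so levels that never occur appear as zeros.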
StatsBase.proportions - Function
proportions(x, levels=span(x), [wv::AbstractWeights])
Return the proportion of values in the range levels that occur in x. Equivalent to counts(x, levels) / length(x).
If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
proportions(x, k::Integer, [wv::AbstractWeights])
Return the proportion of integers in 1 to k that occur in x.
If a vector of weights wv is provided, the proportion of weights is computed rather than the proportion of raw counts.
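The relationship to counts can be checked with a small sketch; the data below is illustrative only:

```julia
using StatsBase

x = [1, 1, 2, 4]

# Unweighted: each count divided by length(x).
p = proportions(x, 1:4)             # [0.5, 0.25, 0.0, 0.25]

# The integer form covers the range 1:k.
pk = proportions(x, 4)

# Weighted: the weight accumulated per level divided by the total weight (8.0 here).
w = weights([1.0, 1.0, 2.0, 4.0])
pw = proportions(x, 1:4, w)         # [0.25, 0.25, 0.0, 0.5]
```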
StatsBase.addcounts! - Method
addcounts!(r, x, levels::UnitRange{<:Integer}, [wv::AbstractWeights])
Add the number of occurrences in x of each value in levels to an existing array r. For each xi ∈ x, if xi == levels[j], then we increment r[j].
If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
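A short sketch of accumulating counts across several batches, assuming r is preallocated with one slot per level (the batches are made-up data):

```julia
using StatsBase

# r must have the same length as levels.
r = zeros(Int, 3)

addcounts!(r, [1, 2, 2], 1:3)       # r is now [1, 2, 0]
addcounts!(r, [3, 3, 5], 1:3)       # 5 is outside 1:3 and ignored; r is now [1, 2, 2]

# Levels need not start at 1: the value levels[j] is tallied in r[j].
r2 = zeros(Int, 3)
addcounts!(r2, [10, 12, 12], 10:12) # r2 is now [1, 0, 2]
```

Reusing r this way avoids allocating a new vector for every batch.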
Counting over Arbitrary Distinct Values
StatsBase.countmap - Function
countmap(x; alg = :auto)
countmap(x::AbstractVector, wv::AbstractVector{<:Real})
Return a dictionary mapping each unique value in x to its number of occurrences.
If a weighting vector wv is specified, the sum of weights is used rather than the raw counts.
alg is only allowed for unweighted counting and can be one of:
:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true, then use :radixsort; otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true, then use the radix sort algorithm to sort the input vector, which will generally lead to shorter running time for large x with many duplicates. However, the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.
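The following sketch shows unweighted and weighted counting, plus an explicit alg choice. The inputs are illustrative, and the weights are wrapped with weights(...) on the assumption that this form is accepted across package versions:

```julia
using StatsBase

x = ["a", "b", "a", "c", "a"]

# Unweighted: each value maps to its number of occurrences.
cm = countmap(x)                    # "a" => 3, "b" => 1, "c" => 1

# Weighted: each value maps to the sum of the weights of its occurrences.
cmw = countmap(x, weights([1.0, 2.0, 1.0, 4.0, 0.5]))  # "a" => 2.5, "b" => 2.0, "c" => 4.0

# For an integer vector, both algorithms should produce the same dictionary.
y = [3, 1, 3, 3]
cm_dict  = countmap(y; alg = :dict)
cm_radix = countmap(y; alg = :radixsort)
```

The choice of alg affects only running time and memory use, not the resulting counts.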
StatsBase.proportionmap - Function
proportionmap(x)
proportionmap(x::AbstractVector, w::AbstractVector{<:Real})
Return a dictionary mapping each unique value in x to its proportion in x.
If a vector of weights w is provided, the proportion of weights is computed rather than the proportion of raw counts.
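A minimal sketch with made-up data; as an assumption for portability, the weights are wrapped with weights(...):

```julia
using StatsBase

# Unweighted: count of each value divided by the number of elements.
pm = proportionmap(["a", "b", "a", "a"])               # "a" => 0.75, "b" => 0.25

# Weighted: weight of each value divided by the total weight (4.0 here).
pmw = proportionmap(["a", "b"], weights([1.0, 3.0]))   # "a" => 0.25, "b" => 0.75
```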
StatsBase.addcounts! - Method
addcounts!(dict, x; alg = :auto)
addcounts!(dict, x, wv)
Add counts based on x to a count map. New entries will be added if new values come up.
If a weighting vector wv is specified, the sum of the weights is used rather than the raw counts.
alg is only allowed for unweighted counting and can be one of:
:auto (default): if StatsBase.radixsort_safe(eltype(x)) == true, then use :radixsort; otherwise use :dict.
:radixsort: if radixsort_safe(eltype(x)) == true, then use the radix sort algorithm to sort the input vector, which will generally lead to shorter running time for large x with many duplicates. However, the radix sort algorithm creates a copy of the input vector and hence uses more RAM. Choose :dict if the amount of available RAM is a limitation.
:dict: use a Dict-based method, which is generally slower but uses less RAM, is safe for any data type, is faster for small arrays, and is faster when there are not many duplicates.
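Updating a count map incrementally might look like the sketch below. The dictionaries and batches are illustrative, and the dictionary key type is chosen to match the element type of x:

```julia
using StatsBase

# Start from an empty count map and feed it two batches.
cm = Dict{String,Int}()
addcounts!(cm, ["a", "b", "a"])     # "a" => 2, "b" => 1
addcounts!(cm, ["b", "c"])          # "c" is new, so an entry is added for it

# Weighted variant: the value type must be able to hold sums of weights.
cmw = Dict{String,Float64}()
addcounts!(cmw, ["a", "b", "a"], weights([0.5, 2.0, 1.5]))
```

After both unweighted calls, cm holds "a" => 2, "b" => 2, "c" => 1, and cmw holds "a" => 2.0, "b" => 2.0.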