digestible

A modern C++ implementation of a merging t-digest data structure.

t-digests are data structures that provide highly accurate statistics for a large number of inputs in a small memory footprint. Statistics such as minimum, maximum, average, quantile, and continuous distribution are readily calculated. Think of a t-digest as an improved histogram: a compact way of accurately summarizing data where the bucket sizes need not be set in advance.

Notable features include:

No runtime memory allocation.
Strict, user-defined upper bound on structure size.
Written in modern C++ (C++17).
Templated mean and weight data types - save memory for smaller input ranges with a smaller data type.
Speed (average insertion time of 61 ns per element on a c. 2019 Intel i7).

Requirements:

A C++17 compatible compiler (developed with clang 7 using --std=c++17).
CMake (for building the utility and running the tests).

Sample Code

#include <cstdio>

#include "digestible/digestible.h"

using namespace digestible;

int main()
{
    // Structure will have at most 20 elements.
    // Values default to type float and weights to type unsigned.
    tdigest digest(20);

    // Add some data into the structure.
    digest.insert(5);
    digest.insert(10);
    digest.insert(4);

    // digestible buffers input data; explicitly merge it before querying statistics.
    digest.merge();

    std::printf("50th quantile is: %f\n", digest.quantile(50));
    return 0;
}

Additional examples can be found in the tests directory.
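
When checking digest output, for example in a unit test, it can help to compare against an exact value computed from the raw samples. The following is a minimal sketch of such a baseline using the nearest-rank definition of a percentile. `exact_percentile` is a hypothetical helper and not part of digestible, and a digest may interpolate differently, so small differences on tiny inputs are to be expected.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of digestible): exact nearest-rank percentile.
// p is a percentage in [0, 100]; samples must be non-empty.
// The vector is taken by value because nth_element reorders it.
double exact_percentile(std::vector<double> samples, double p)
{
    const std::size_t n = samples.size();
    // Nearest-rank: the smallest 1-based rank r with r / n >= p / 100.
    std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * static_cast<double>(n)));
    rank = std::min(std::max(rank, std::size_t{1}), n);
    auto nth = samples.begin() + static_cast<std::ptrdiff_t>(rank - 1);
    // Partially sort so that the element at 'nth' is the one a full sort would place there.
    std::nth_element(samples.begin(), nth, samples.end());
    return *nth;
}
```

For the three samples inserted in the sample code (5, 10, 4), this baseline gives 5 for the 50th percentile.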

`digestible` Utility

To assist in integrating digestible into your implementation, see the utility directory.

Compression Factor

As the lone parameter to the digestible constructor, the compression factor requires some discussion. Internally, t-digests use the compression factor, combined with an internal distribution function, to decide how input values are grouped across the input range: groups near the head and tail hold fewer points, and groups in the middle hold significantly more. Higher compression factors yield better accuracy at the expense of increased memory usage. The digestible utility can be used to help strike this balance.
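
To make the effect of the compression factor more concrete, here is a minimal sketch that assumes the k1 scale function from the t-digest paper, k(q) = delta / (2 * pi) * asin(2q - 1), with the rule that one group may span at most one unit of k. This README does not say which distribution function digestible uses internally, so treat the function and the names below (`scale_k1`, `scale_k1_inverse`, `max_group_width`) as illustrative assumptions rather than the library's implementation.

```cpp
#include <cmath>

// Assumed scale function (k1 from the t-digest paper); digestible's own
// internal function may differ. q is a quantile in [0, 1], delta is the
// compression factor.
double scale_k1(double q, double delta)
{
    const double pi = std::acos(-1.0);
    return delta / (2.0 * pi) * std::asin(2.0 * q - 1.0);
}

// Inverse of scale_k1: maps a k value back to a quantile in [0, 1].
double scale_k1_inverse(double k, double delta)
{
    const double pi = std::acos(-1.0);
    return (std::sin(k * 2.0 * pi / delta) + 1.0) / 2.0;
}

// Largest fraction of the total weight that a single group starting at
// quantile q may cover, using the rule k(q + width) - k(q) <= 1.
double max_group_width(double q, double delta)
{
    const double k_next = scale_k1(q, delta) + 1.0;
    const double k_max = delta / 4.0;  // value of scale_k1 at q = 1
    if (k_next >= k_max) {
        return 1.0 - q;  // the group may extend to the end of the range
    }
    return scale_k1_inverse(k_next, delta) - q;
}
```

Under these assumptions, with a compression factor of 20 a group starting at q = 0 may cover roughly 2.4% of the data, while one starting at q = 0.5 may cover roughly 15%. Raising the compression factor shrinks both limits, which means more groups and better accuracy at the cost of memory.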

Data Structure Size

For a given compression factor f, value type size v, and weight type size w, the approximate t-digest size in bytes is:

f * max(v, w) * 8 + 64
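
As a worked example, the formula can be evaluated at compile time. This is a small sketch only; `approx_digest_bytes` is a hypothetical helper name and not part of digestible, and the result is the same approximation the formula gives, not a measured size.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of digestible): evaluates the approximate
// size formula f * max(v, w) * 8 + 64.
// f: compression factor, v: value type size in bytes, w: weight type size in bytes.
constexpr std::size_t approx_digest_bytes(std::size_t f, std::size_t v, std::size_t w)
{
    return f * std::max(v, w) * 8 + 64;
}

// Compression factor 20 with 4-byte values and 4-byte weights.
static_assert(approx_digest_bytes(20, 4, 4) == 704, "20 * 4 * 8 + 64");
```

With the same compression factor, moving to 8-byte values while keeping 4-byte weights roughly doubles the estimate to 20 * 8 * 8 + 64 = 1344 bytes, which is why smaller value and weight types save memory.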

Acknowledgments

This work would not be possible without the original algorithm and reference implementation by tdunning. A local copy of the original t-digest paper can be found here.
