Skip to content

dbiton/MeanTail

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

69 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Mean Tail

Frequency estimation and Top-K using a counter based method. Other methods include RAP, Frequent, SpaceSaving.

##Abstract In the field of Top-K flow identification and flow frequency estimation, counter-based methods like Frequent and Space Saving offer a more favorable ratio of space to approximation error compared to sketch-based approaches, particularly when the flow identifiers are not very large. Among counter-based methods, RAP is regarded as state-of-the-art. Traditional counter-based techniques are not sensitive to the distribution of the stream. However, in many real-world scenarios, the distribution tends to be long-tailed, with most elements concentrated in the tail and sharing similar frequencies. This work aims to enhance the space-to-approximation error ratio for such distributions. Specifically, we propose dedicating a memory section to track tail frequencies. Rather than maintaining an individual counter for each tail element, we use a single aggregate counter to represent the entire tail, allowing us to estimate an average count for all tail elements. This approach is most effective when the tail distribution is close to uniform. By reallocating memory previously used for individual tail counters, we can track twice the number of keys in the tail section, improving overall accuracy for these low-frequency elements. To this end, we introduce Mean Tail (MT), a novel counter-based data structure that supports updates and queries, with a specific focus on the tail of the tracked keys, offering better accuracy than RAP when the dataset is sufficiently large. MT also outperforms RAP in Top-K identification recall. All our code is open-sourced.

Running

python src/evaluation/paper.py

Paper

"Mean Tail: Top-K and Frequency Estimation with Fewer Counters and More Keys" was accepted to AINA-2025

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages