Space Saving, HyperLogLog and Hierarchical Heavy Hitters algorithms #1559

Open
wants to merge 13 commits into main

laraabastoss

Added code and the respective documentation for the Space Saving, HyperLogLog and Hierarchical Heavy Hitters algorithms within the sketch section.

@smastelini
Member

Hi @laraabastoss, thanks for your contribution! Some errors in the automated tests were fixed recently, so I am re-running them for this PR. Let's see how that goes and whether you need to change something in your code. You may need to pull the latest changes from the main branch.

Aside from that, I wanted to discuss a scope question. River already has a Heavy Hitters algorithm that is meant to provide the same functionality as Space Saving. I noticed that the current implementation in River supports a fading factor. I am not sure about the pros and cons of Space Saving versus Lossy Count with a forgetting factor (the core of River's version), but I think some renaming would let us keep both versions.
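To make the comparison concrete, here is a minimal, standalone sketch of the Space Saving update rule. It is not the code from this PR and not River's API. The class name `SpaceSaving` and its methods are hypothetical. The sketch monitors at most `k` items. When a new item arrives and all slots are full, the item with the smallest counter is evicted, and the newcomer inherits that counter plus one. Unlike the existing fading version, every item weighs the same regardless of age.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable


class SpaceSaving:
    """Minimal Space Saving sketch (hypothetical name) monitoring at most `k` items."""

    def __init__(self, k: int) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.counts: dict[Hashable, int] = {}  # estimated count per monitored item
        self.errors: dict[Hashable, int] = {}  # maximum overestimation per item

    def update(self, x: Hashable) -> None:
        if x in self.counts:
            self.counts[x] += 1
        elif len(self.counts) < self.k:
            self.counts[x] = 1
            self.errors[x] = 0
        else:
            # Evict the item with the smallest counter. The newcomer inherits
            # that counter plus one; the inherited part is its error bound.
            # A linear scan keeps this short; faster structures exist.
            victim = min(self.counts, key=self.counts.__getitem__)
            min_count = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[x] = min_count + 1
            self.errors[x] = min_count

    def __getitem__(self, x: Hashable) -> int:
        # Unmonitored items report 0, like collections.Counter.
        return self.counts.get(x, 0)

    def most_common(self, n: int | None = None) -> list[tuple[Hashable, int]]:
        # Same return shape as collections.Counter.most_common.
        return Counter(self.counts).most_common(n)


sk = SpaceSaving(k=3)
for x in "aabacbda":
    sk.update(x)
print(sk.most_common(1))  # [('a', 4)]
```

Each monitored count overestimates the true frequency by at most `errors[x]`, and the counters always sum to the number of items seen. In the run above, `d` reports 2 even though it appeared once, because it inherited the evicted counter of `c`.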

The idea is to follow the convention we have used so far in river.sketch:

  • We use names that reflect functionality, rather than the actual algorithm name. For example, Counter, Set, and so on. The algorithm name and related info go in the documentation. So, in your case:
    • Space Saving -> Heavy Hitters (in this case, we will need to find a new name for the current implementation in sketch.HeavyHitters, like FadingHeavyHitters or something else -- suggestions are welcome)
    • Hierarchical Heavy Hitters already conforms to the implicit convention
    • HyperLogLog -> Cardinality? Suggestions are welcome!
  • As much as possible, we draw inspiration from Python's collections module for the API. This gives users familiarity and brings name choices tested by time :D. You can check the existing methods in the sketch module for inspiration.
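To illustrate the naming convention for the HyperLogLog case, here is a minimal sketch of a distinct-count estimator exposed under the suggested functional name. The class name `Cardinality` is only the tentative suggestion from this thread. The methods `update` and `count` and the `repr`-based hashing are assumptions for illustration; none of this is the PR's code. HyperLogLog splits a 64-bit hash into a register index (the top `p` bits) and keeps, per register, the maximum position of the leftmost 1-bit in the remaining bits. The harmonic mean of the registers then gives the estimate, with a relative standard error of roughly 1.04/sqrt(2**p).

```python
from __future__ import annotations

import hashlib
import math


class Cardinality:
    """Minimal HyperLogLog estimator (hypothetical name) of distinct items seen."""

    def __init__(self, p: int = 12) -> None:
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p  # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m from the HyperLogLog paper.
        self.alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(
            self.m, 0.7213 / (1 + 1.079 / self.m)
        )

    @staticmethod
    def _hash64(x: object) -> int:
        # Assumption: hash repr(x), so x needs a stable repr (ints, str, tuples).
        digest = hashlib.blake2b(repr(x).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def update(self, x: object) -> None:
        h = self._hash64(x)
        width = 64 - self.p
        idx = h >> width  # top p bits choose a register
        rest = h & ((1 << width) - 1)  # remaining bits
        rank = width - rest.bit_length() + 1  # 1-based position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hll = Cardinality(p=12)
for i in range(10_000):
    hll.update(i)
    hll.update(i)  # duplicates leave the registers unchanged
print(hll.count())  # an estimate of the 10_000 distinct values
```

Memory is fixed at `2**p` small registers regardless of stream length. Duplicates never change the estimate, which is what makes a functional name like `Cardinality` (rather than the algorithm name) describe the behavior users care about.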
