Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aggregations could reap unused key states #5840

Open
cpwright opened this issue Jul 24, 2024 · 1 comment
Open

Aggregations could reap unused key states #5840

cpwright opened this issue Jul 24, 2024 · 1 comment
Assignees
Labels
Milestone

Comments

@cpwright
Copy link
Contributor

As a user, I want to build aggregations of recent data without consuming memory for states that have since been removed.

When a state loses it's last row and is removed from the result, we would move the output position to a free list. We need to take care not to remove/add a new state on the same cycle.

Instead of leaving empty output positions, we should use a scheme like our incremental rehash credits to shift the output rows down to the unused slots. This would keep our incremental operations from needing to perform shifts linear in output size on a given cycle; but would necessitate additional data movement.

Making this change would be breaking, because we would no longer preserve initial encounter order for reincarnated states.

@cpwright cpwright added feature request New feature or request triage labels Jul 24, 2024
@rcaudy rcaudy added query engine core Core development tasks breaking and removed triage labels Jul 25, 2024
@rcaudy rcaudy added this to the 4. Unscheduled milestone Jul 25, 2024
@rcaudy
Copy link
Member

rcaudy commented Jul 25, 2024

This is likely a performance loss with no gain for some cases (we need to track previous state for key columns, etc, and the code will be more complex). For long-running aggregations where buckets can go away, this may be a significant win in memory usage.

We have some engine tools that rely on states never moving, including AggregationRowLookup (used for tree lookup and data index lookup).

This will need to be configurable, at a minimum, which likely means adding a builder interface to aggBy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants