Feat: Operator dash - Add duplicate row counts card #369

Open
bmtcril opened this issue Sep 11, 2023 · 2 comments
Labels
enhancement Relates to new features or improvements to existing features good first issue A good task for a newcomer to start with

Comments

@bmtcril
Contributor

bmtcril commented Sep 11, 2023

We should have a card that checks for duplicate rows, alerting operators that there are duplicates in the tables that may need to be dealt with.

@bmtcril bmtcril added the enhancement Relates to new features or improvements to existing features label Sep 11, 2023
@Ian2012
Contributor

Ian2012 commented Jan 26, 2024

@bmtcril do you think this query would work for this one?

SELECT event_id, count(event_id) as counts
FROM xapi_events_all_parsed
GROUP BY event_id
HAVING counts > 1

Or would an MV be better for this one?

@bmtcril
Contributor Author

bmtcril commented Jan 29, 2024

This one is going to be tricky. I don't think we can do any kind of direct unbounded aggregation on xapi_events_all_parsed / xapi_events_all because of their size. That query takes ~250 seconds on the 1.1B-row test db; I tried a few others and they were all around that or worse. You might want to try a pre-aggregated projection on the table that just selects the event id and count. If you come up with one that works, I can try it on the big dataset and see how it performs.
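
For reference, a pre-aggregated projection along those lines might look roughly like the sketch below. It is untested against the Aspects schema and rests on several assumptions: that xapi_events_all_parsed is a MergeTree-family table (ClickHouse projections are only supported there), that the projection name event_id_counts is free to use (it is hypothetical), and that materializing the projection over the existing parts is affordable on a table of this size. Whether it actually beats the ~250 second baseline would still need to be measured on the large dataset.

```sql
-- Hypothetical projection name; assumes a MergeTree-family table.
-- Stores per-part aggregates of (event_id, count) alongside the data.
ALTER TABLE xapi_events_all_parsed
    ADD PROJECTION event_id_counts
    (
        SELECT event_id, count()
        GROUP BY event_id
    );

-- Build the projection for parts that already exist.
-- New inserts maintain it automatically once it is defined.
ALTER TABLE xapi_events_all_parsed
    MATERIALIZE PROJECTION event_id_counts;

-- Same duplicate check as above; the optimizer may answer it
-- from the projection instead of scanning the full table.
SELECT event_id, count() AS counts
FROM xapi_events_all_parsed
GROUP BY event_id
HAVING counts > 1;
```

Running the final query with EXPLAIN should show whether the projection is being picked up. For a dashboard card, wrapping it in an outer SELECT count() would reduce the result to a single number.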

@saraburns1 saraburns1 added the good first issue A good task for a newcomer to start with label Nov 12, 2024
Projects
Status: Ready for Work
Development

No branches or pull requests

3 participants