Rethink how we persist historical data of scheduled benchmarking runs #1524

Open
hendrikmakait opened this issue Aug 5, 2024 · 2 comments

@hendrikmakait
Member

Right now, we store all the data in two tables, test_run and tpch_run. These tables are highly denormalized, i.e., flattened out so that they don't need to be merged with anything. This has caused our database to blow up significantly. To reduce the size of our historical database, we should think about normalizing some of the data, e.g., run or cluster data. Moreover, we don't use all the columns we store for TPC-H data, which further increases the database size. We also can't truncate historical data because we don't store the start or end time of runs.
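
To make the normalization idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the project's actual schema: the cluster and ci_run tables, every column name, and the helper functions are hypothetical and chosen only for illustration. Cluster data is stored once and referenced by ID, and each run records a start time so that old history can be truncated.

```python
import sqlite3

# Hypothetical normalized schema. Only the table name test_run comes from
# the issue; all other tables and columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE cluster (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    n_workers INTEGER NOT NULL,
    UNIQUE (name, n_workers)
);
CREATE TABLE ci_run (
    id INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES cluster(id),
    python_version TEXT NOT NULL,
    start_time TEXT NOT NULL,  -- ISO 8601 string, enables truncation
    end_time TEXT
);
CREATE TABLE test_run (
    id INTEGER PRIMARY KEY,
    ci_run_id INTEGER NOT NULL REFERENCES ci_run(id) ON DELETE CASCADE,
    name TEXT NOT NULL,
    duration REAL
);
"""


def create_db(path=":memory:"):
    """Open a database and create the hypothetical schema."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (and cascades) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def get_or_create_cluster(conn, name, n_workers):
    """Return the ID of a cluster row, inserting it only if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO cluster (name, n_workers) VALUES (?, ?)",
        (name, n_workers),
    )
    row = conn.execute(
        "SELECT id FROM cluster WHERE name = ? AND n_workers = ?",
        (name, n_workers),
    ).fetchone()
    return row[0]


def record_run(conn, cluster_id, python_version, start_time, end_time, results):
    """Store one scheduled run and its per-test results.

    `results` is an iterable of (test_name, duration) pairs.
    """
    cur = conn.execute(
        "INSERT INTO ci_run (cluster_id, python_version, start_time, end_time) "
        "VALUES (?, ?, ?, ?)",
        (cluster_id, python_version, start_time, end_time),
    )
    run_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO test_run (ci_run_id, name, duration) VALUES (?, ?, ?)",
        [(run_id, name, duration) for name, duration in results],
    )
    conn.commit()
    return run_id


def truncate_before(conn, cutoff):
    """Delete runs that started before `cutoff`; test rows cascade.

    Returns the number of deleted ci_run rows. Assumes all timestamps use
    the same ISO 8601 format, so string comparison matches time order.
    """
    cur = conn.execute("DELETE FROM ci_run WHERE start_time < ?", (cutoff,))
    conn.commit()
    return cur.rowcount
```

With a layout along these lines, repeated run and cluster attributes are written once instead of on every test row, and a call such as `truncate_before(conn, "2024-06-01T00:00:00")` could drop older history in a single statement.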

We should rethink this to avoid future problems caused by too much data, such as the recent two-month gap in persisted history caused by CI workers running out of memory (OOM).

@shughes-uk
Contributor

shughes-uk commented Oct 30, 2024

Would be nice if y'all plotted it as monthly/quarterly rollups or something as a one-off. I'm sad I can't go find the historical stuff!

@shughes-uk
Contributor

Never mind, it looks like y'all have erased them permanently 😢
