Rethink how we persist historical data of scheduled benchmarking runs #1524

hendrikmakait · 2024-08-05T14:05:32Z

Right now, we store all the data in two tables, `test_run` and `tpch_run`. These tables are highly denormalized, i.e., flattened out so that they never need to be joined with anything else. This has caused our database to grow significantly. To reduce the size of our historical database, we should consider normalizing some of the data, e.g., run or cluster data. Moreover, we don't use all the columns for TPC-H data, which further increases the database size. And because we don't store the start or end time of runs, we also can't truncate historical data.
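A minimal sketch of what a normalized layout could look like, using Python's built-in `sqlite3` module purely for illustration. The table and column names (`clusters`, `runs`, `test_results`, `start_time`, `duration`, etc.) and the helper `truncate_history` are hypothetical and do not reflect the project's actual schema. Run- and cluster-level metadata is stored once and referenced by ID from each result row, and because every run records its start and end time, old history can be dropped with a single date-based delete.

```python
import sqlite3

# Hypothetical normalized schema: cluster and run metadata are stored once
# instead of being repeated on every result row.
SCHEMA = """
CREATE TABLE clusters (
    cluster_id   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    worker_count INTEGER NOT NULL
);
CREATE TABLE runs (
    run_id     INTEGER PRIMARY KEY,
    cluster_id INTEGER NOT NULL REFERENCES clusters(cluster_id),
    start_time TEXT NOT NULL,  -- ISO-8601 strings sort chronologically as text
    end_time   TEXT NOT NULL
);
CREATE TABLE test_results (
    result_id INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES runs(run_id) ON DELETE CASCADE,
    test_name TEXT NOT NULL,
    duration  REAL
);
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (including ON DELETE CASCADE) when enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def truncate_history(conn, cutoff):
    """Delete runs that started before `cutoff`; their results cascade away."""
    with conn:
        cur = conn.execute("DELETE FROM runs WHERE start_time < ?", (cutoff,))
    # rowcount counts only the directly deleted runs, not cascaded result rows.
    return cur.rowcount


# Usage: two runs on one cluster, then drop everything before June 2024.
conn = connect()
with conn:
    conn.execute("INSERT INTO clusters VALUES (1, 'bench', 10)")
    conn.execute("INSERT INTO runs VALUES (1, 1, '2024-01-01T00:00:00', '2024-01-01T01:00:00')")
    conn.execute("INSERT INTO runs VALUES (2, 1, '2024-07-01T00:00:00', '2024-07-01T01:00:00')")
    conn.executemany(
        "INSERT INTO test_results (run_id, test_name, duration) VALUES (?, ?, ?)",
        [(1, "test_a", 1.5), (1, "test_b", 2.0), (2, "test_a", 1.4)],
    )
deleted = truncate_history(conn, "2024-06-01")
```

Here the January run and both of its results are removed, while the July run is kept. The same idea carries over to whatever database backs the real tables, provided runs carry a timestamp to cut on.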

We should rethink this to avoid future problems caused by too much data, such as the recent two-month gap in persisted history caused by CI workers running out of memory (OOM).


shughes-uk · 2024-10-30T06:37:37Z

It would be nice if y'all plotted it as monthly/quarterly rollups or something, as a one-off. I'm sad I can't go find the historical stuff!
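A rough sketch of such a rollup, in plain Python for illustration: raw per-run results are reduced to one summary row per test and month, which is much smaller than the raw history and could be kept after the raw rows are truncated. The input shape `(test_name, start_time, duration)`, the helper name `monthly_rollup`, and the choice of count/mean/min/max as summary statistics are all assumptions, not the project's actual format.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_rollup(rows):
    """Aggregate hypothetical (test_name, start_time, duration) rows by month.

    `start_time` is a datetime; returns {(test_name, "YYYY-MM"): summary}.
    """
    buckets = defaultdict(list)
    for test_name, start_time, duration in rows:
        month = start_time.strftime("%Y-%m")
        buckets[(test_name, month)].append(duration)
    return {
        key: {
            "runs": len(durations),
            "mean": mean(durations),
            "min": min(durations),
            "max": max(durations),
        }
        for key, durations in sorted(buckets.items())
    }


# Usage: three raw results collapse into two monthly summary rows.
rows = [
    ("test_a", datetime(2024, 7, 1), 10.0),
    ("test_a", datetime(2024, 7, 15), 12.0),
    ("test_a", datetime(2024, 8, 2), 11.0),
]
summary = monthly_rollup(rows)
```

A quarterly rollup only needs a different bucket key, e.g. `f"{start_time.year}-Q{(start_time.month - 1) // 3 + 1}"` instead of the `"%Y-%m"` month string.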

shughes-uk · 2024-10-30T06:40:58Z

Never mind, it looks like y'all have erased them permanently 😢
