We run a somewhat unconventional dbt setup and have run into an interesting problem.
We have a multi-tenant database that is sharded by schema to isolate tenants; in total the cluster contains low 10s of TBs and low 100s of millions of records. Every tenant runs the same set of dbt models: we have ~150 tenants, each with ~80 incremental models. Each tenant's incremental transformations run once every 3 minutes, mimicking near-real-time updates.
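For context, each of these is an ordinary dbt incremental model; a minimal sketch of the shape (model, source, and column names here are hypothetical):

```sql
-- models/fct_events.sql -- illustrative only; real models differ
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

select
    event_id,
    payload,
    updated_at
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already materialized
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the postgres adapter, each such run stages its delta in a temporary table (suffixed `__dbt_tmp` by default) before merging, which is where the catalog churn described below comes from.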
To deploy new models we use a slim CI full refresh of sub-graphs of the DAG, and, due to the nature of our data, we must also fully refresh the entire DAG on a periodic basis to account for late-arriving facts.
The combination of these two things running at the same time has caused quite a few issues on our PG cluster. Every time a dbt incremental model runs it creates a temporary table, which adds entries to `pg_catalog.pg_attribute`. Full refreshes of individual models can take a couple of hours in some cases and hold long-running transactions, which prevent autovacuum from cleaning up dead tuples in the catalogs. Over the course of a couple of hours this results in millions of dead tuples in `pg_attribute`, which causes the `postgres_get_relations` macro to run very slowly and consume a large amount of CPU time.
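To put rough numbers on it: ~150 tenants × ~80 models is on the order of 12,000 temporary tables per 3-minute cycle, and `pg_attribute` gains one row per column of every one of them, so dead tuples accumulate quickly once autovacuum is held back. Two standard queries that make the buildup visible (plain PostgreSQL system views, shown here purely for illustration):

```sql
-- Dead-tuple bloat on the catalog table that backs column metadata.
select n_dead_tup, last_autovacuum
from pg_stat_sys_tables
where schemaname = 'pg_catalog'
  and relname = 'pg_attribute';

-- Long-running transactions (e.g. a multi-hour full refresh) that hold
-- back the xmin horizon and prevent autovacuum from removing dead tuples.
select pid, xact_start, now() - xact_start as xact_age, state,
       left(query, 80) as query
from pg_stat_activity
where xact_start is not null
order by xact_start
limit 10;
```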
Fundamentally this seems to be a conflict between PostgreSQL's catalog bookkeeping and the artifacts dbt generates. Assuming the pace of incremental runs can't be slowed down, is there a way to avoid creating so many temporary tables? Failing that, is there a recommended way to prevent the catalog bloat, or to structure the project differently?
Preemptive Answers
We can't stop running the incrementals when the resets are running. This could cause multi-hour delays in fresh data being surfaced to tenants.
We have considered sharding into multiple Postgres clusters, but that is a large amount of work we're hoping to avoid, and it would introduce other operational overhead.