We run a somewhat unconventional dbt setup and have run into an interesting problem.
We have a multi-tenant database that is sharded by schema to isolate tenants; in total the cluster contains low 10s of TBs and low 100s of millions of records. Every tenant runs the same set of dbt models: we have ~150 tenants, each with ~80 incremental models. Each tenant's incremental transformations run once every 3 minutes, mimicking near-real-time updates.
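For context, each of these is an ordinary dbt incremental model; a minimal sketch of the shape (model, source, and column names here are hypothetical):

```sql
-- models/fct_events.sql -- illustrative only; real models differ
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

select
    event_id,
    payload,
    updated_at
from {{ source('app', 'events') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already materialized
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the postgres adapter, each such run stages its delta in a temporary table (suffixed `__dbt_tmp` by default) before merging, which is where the catalog churn described below comes from.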
To deploy new models we use a slim CI full refresh of sub-graphs of the DAG, and, due to the nature of our data, we must also fully refresh the entire DAG on a periodic basis to account for late-arriving facts.
The combination of these two things running at the same time has caused quite a few issues on our PG cluster. Every time a dbt incremental model runs it creates a temporary table, which adds entries to `pg_catalog.pg_attribute`. Full refreshes of individual models can take a couple of hours in some cases and hold long-running transactions, which prevent autovacuum from cleaning up dead tuples in the catalogs. Over the course of a couple of hours this results in millions of dead tuples in `pg_attribute`, which causes the `postgres_get_relations` macro to run very slowly and consume a large amount of CPU time.
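To put rough numbers on it: ~150 tenants × ~80 models is on the order of 12,000 temporary tables per 3-minute cycle, and `pg_attribute` gains one row per column of every one of them, so dead tuples accumulate quickly once autovacuum is held back. Two standard queries that make the buildup visible (plain PostgreSQL system views, shown here purely for illustration):

```sql
-- Dead-tuple bloat on the catalog table that backs column metadata.
select n_dead_tup, last_autovacuum
from pg_stat_sys_tables
where schemaname = 'pg_catalog'
  and relname = 'pg_attribute';

-- Long-running transactions (e.g. a multi-hour full refresh) that hold
-- back the xmin horizon and prevent autovacuum from removing dead tuples.
select pid, xact_start, now() - xact_start as xact_age, state,
       left(query, 80) as query
from pg_stat_activity
where xact_start is not null
order by xact_start
limit 10;
```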
Fundamentally this seems to be a conflict between PostgreSQL's catalog bookkeeping and the artifacts dbt generates. Assuming the pace of incremental runs can't be slowed down, is there a way to avoid creating so many temporary tables? Failing that, is there a recommended way to prevent the catalog bloat, or to structure the project differently?
Preemptive Answers
We can't stop running the incrementals when the resets are running. This could cause multi-hour delays in fresh data being surfaced to tenants.
We have considered sharding into multiple Postgres clusters, but that is a large amount of work we're hoping to avoid, and it would introduce other operational overhead.