[Regression] Increased execution times with version 1.7 compared to version 1.3 #741
Comments
Hi @lucapazz, thanks so much for taking the time to document what you're seeing. That table is a work of art!
Within dbt Core and adapters, we haven't ever invested in performance testing or benchmarking. If I could wave a magic wand and introduce performance gating as part of our CI, I would in a heartbeat. This is certainly something that we could consider as a future investment.
Below I'll try to share as much context as I can; please tell me if this is helpful or where you disagree.
end user impact on performance
Jaffle Shop is exactly the project I'd reach for to do this benchmark. However, my gut instinct is that in "real-world" dbt projects in production, the majority of the time comes from the queries in the data warehouse itself. This thought is shared in recent comments on a post in r/dataengineering about a project that is dbt but 30X faster.
Before we get into some of the contributors to performance degradation, I'm curious to understand how you feel the "performance degradation" of upgrading from
contributors to performance degradation
I can, however, speak to the changes that have been made in migrating to
Hi @dataders, thank you so much for your reply. Before answering your questions, let me mention that I took some time to investigate further. I have attached a zip file containing the following files:
Some considerations from observing those results:
It seems as if the increase in execution time is due to less efficient query "scheduling".
Let's put the jaffle_shop project aside and move on to our use cases... We populate our DWH using many dbt projects running in distinct Prefect flows. Each project contains a limited number of models and tests, so the execution time of an entire project is not very long.
If you are interested in more details on our use cases, we are also available for a meeting to show you our projects :)
I tested with dbt 1.8 too. I didn't expect to get back to the execution times of version 1.3, but I also didn't expect to get values worse than version 1.7:
What could be the cause?
@lucapazz Thanks for bringing this to our attention. We have done a lot of work to understand and improve the overall performance of dbt-core in recent releases, but have not spent the same amount of time looking at the performance of specific adapters. I'll try to spend some time looking at Redshift specifically to understand where this extra time is coming from and to see if there is a way we can improve it. Jaffle shop is not always the best project for judging performance, since it is so small and simple, but in this case the increase in reported query times suggests an underlying problem.
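One way to look at the "reported query times" mentioned above is to compare the per-node timings that dbt records after each invocation. The sketch below is only an illustration and not something posted in this thread: it assumes the `run_results.json` artifact that dbt writes to the project's `target/` directory, with a top-level `elapsed_time` and per-result `unique_id` and `execution_time` fields (check the artifact schema for your dbt version). The function names `summarize_run_results` and `compare_runs` are hypothetical helpers.

```python
import json
from pathlib import Path


def summarize_run_results(path):
    """Return (total elapsed seconds, [(unique_id, seconds), ...]) sorted slowest first.

    Assumes the run_results.json layout described in the lead-in.
    """
    data = json.loads(Path(path).read_text())
    per_node = [
        (result["unique_id"], float(result.get("execution_time", 0.0)))
        for result in data.get("results", [])
    ]
    per_node.sort(key=lambda item: item[1], reverse=True)
    return float(data.get("elapsed_time", 0.0)), per_node


def compare_runs(baseline_path, candidate_path):
    """Map each node present in both runs to (baseline s, candidate s, difference)."""
    _, baseline = summarize_run_results(baseline_path)
    _, candidate = summarize_run_results(candidate_path)
    base_map, cand_map = dict(baseline), dict(candidate)
    return {
        node: (base_map[node], cand_map[node], cand_map[node] - base_map[node])
        for node in base_map.keys() & cand_map.keys()
    }
```

Saving a copy of `target/run_results.json` after a run on each adapter version and passing the two copies to `compare_runs` would show which nodes account for the difference. With several threads, the sum of per-node times can exceed the total elapsed time, so the two figures are best read separately.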
Is this a regression in a recent version of dbt-redshift?
Current Behavior
Using the Jaffle Shop demo project as an example, it can be observed that execution times increase significantly starting from version 1.5:
(Table of execution times by dbt-redshift version; columns: dbt run (empty target schema), dbt test, dbt run (tables and views already present in the target schema).)
1.3.0 is the version we currently use and 1.7 is the version we would like to update our dbt projects to.
1.5 is the version in which the redshift-connector replaced psycopg2.
1.7.4 is the version that uses a separate transaction for each SQL command produced by dbt.
1.7.latest is the branch that fixes the issue #693, restoring the sequence of SQL commands and the number of transactions to those of versions < 1.7.
Expected/Previous Behavior
Same execution times as version 1.3.
Steps To Reproduce
dbt-redshift==1.3.0
profiles.yml file in the demo project to connect to your Redshift database (we've used 4 threads)
Relevant log output
No response
Environment
Additional Context
No response
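For anyone repeating the measurements described in this issue, a minimal timing harness might look like the sketch below. It is not the script used for the original table: the helper names `time_command` and `benchmark` and the `DBT_COMMANDS` mapping are hypothetical, and it assumes that `dbt` is on the PATH of a virtual environment holding the dbt-redshift version being measured and that the working directory is the Jaffle Shop project.

```python
import subprocess
import time

# Assumed commands, matching the columns measured in the issue.
DBT_COMMANDS = {
    "dbt run": ["dbt", "run"],
    "dbt test": ["dbt", "test"],
}


def time_command(cmd, cwd=None):
    """Run cmd and return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return completed.returncode, time.perf_counter() - start


def benchmark(commands, repeats=3, cwd=None):
    """Time each command `repeats` times and return {label: [seconds, ...]}."""
    timings = {}
    for label, cmd in commands.items():
        runs = []
        for _ in range(repeats):
            code, seconds = time_command(cmd, cwd=cwd)
            if code != 0:
                # A failed run would distort the comparison, so stop early.
                raise RuntimeError(f"{label} exited with code {code}")
            runs.append(seconds)
        timings[label] = runs
    return timings
```

Calling `benchmark(DBT_COMMANDS)` once per installed adapter version, against the same target schema state each time, gives a list of wall-clock timings per command. Repeating each command several times helps separate adapter overhead from warehouse and network noise.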