Optimize DbtVirtualenvBaseOperator: Single virtualenv per task execution #1200
- Reuse virtualenv in single task execution to reduce creation overhead
- Improve temporary directory management to use TemporaryDirectory when virtualenv_dir is set to None
…mpochy/astronomer-cosmos into optimize-dbt-virtualenv-reuse
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1200      +/-   ##
==========================================
+ Coverage   95.72%   95.75%   +0.03%
==========================================
  Files          64       64
  Lines        3672     3675       +3
==========================================
+ Hits         3515     3519       +4
+ Misses        157      156       -1
    cmd: list[str],
    env: dict[str, str | bytes | os.PathLike[Any]],
    context: Context,
) -> FullOutputSubprocessResult | dbtRunnerResult:
Doesn't this fail when dbt is not installed in the Airflow worker node?
Thank you for your feedback!
This PR intends to install dbt within the virtual environment by running pip install. I have also added tests to verify that the pip install command is executed within the prepared virtual environment, so that dbt is available even if it's not pre-installed on the Airflow worker node.
Description
This PR optimizes the DbtVirtualenvBaseOperator by implementing virtualenv reuse within a single task execution. It reduces the overhead of creating new virtualenvs for each dbt command.
The DbtVirtualenvBaseOperator in [email protected] creates a temporary directory and prepares a virtualenv twice when virtualenv_dir is None and is_virtualenv_dir_temporary is True. This PR modifies it to create a directory and a virtualenv only once at the beginning of the run_command method, avoiding this overhead.
Additionally, I have added tests to ensure the directory for virtualenv will be deleted after the task execution. This is related to the issue reported in #958.
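For readers who want a picture of the flow, here is a minimal sketch of the "one virtualenv per task execution" idea using only the standard library. It is not the Cosmos implementation: the class VirtualenvTaskSketch, its _prepare_virtualenv and _run_steps helpers, and the envs_created counter are hypothetical names made up for illustration. Only virtualenv_dir, run_command, and the use of TemporaryDirectory come from the description above.

```python
import tempfile
import venv
from pathlib import Path
from typing import List, Optional


class VirtualenvTaskSketch:
    """Hypothetical stand-in for the operator; not the Cosmos implementation."""

    def __init__(self, virtualenv_dir: Optional[Path] = None) -> None:
        self.virtualenv_dir = virtualenv_dir
        self.envs_created = 0  # counts how many virtualenvs were built
        self.last_env_dir: Optional[Path] = None

    def _prepare_virtualenv(self, target: Path) -> Path:
        # with_pip=False keeps the sketch fast; a real operator would
        # go on to pip install dbt into this environment.
        venv.EnvBuilder(with_pip=False).create(target)
        self.envs_created += 1
        self.last_env_dir = target
        return target

    def _run_steps(self, env_dir: Path, steps: List[str]) -> List[str]:
        # Placeholder for invoking dbt from the environment; it only
        # records which environment each step would use.
        return [f"{env_dir}:{step}" for step in steps]

    def run_command(self, steps: List[str]) -> List[str]:
        if self.virtualenv_dir is not None:
            # A user-supplied directory is kept after the task finishes.
            env_dir = self._prepare_virtualenv(self.virtualenv_dir)
            return self._run_steps(env_dir, steps)
        # No directory given: build one environment in a temporary
        # directory, share it across all steps, and remove it on exit.
        with tempfile.TemporaryDirectory(prefix="cosmos-venv") as tmp_dir:
            env_dir = self._prepare_virtualenv(Path(tmp_dir))
            return self._run_steps(env_dir, steps)


task = VirtualenvTaskSketch(virtualenv_dir=None)
outputs = task.run_command(["deps", "run"])
```

In this sketch both steps share the single environment created at the start of run_command, and the context manager removes the temporary directory once the steps finish, which mirrors the cleanup behaviour the new tests are described as checking.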
The changes include:
Related Issue(s)
#958
Breaking Change?
I believe this is not a breaking change.
Checklist