Rewrite execution of microbatch models to avoid blocking the main thread #11332

Draft
wants to merge 19 commits into
base: main
QMalcolm

Resolves #11243

Problem

Solution

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

…stration to a runner

We're working to ensure the orchestration of microbatch batches doesn't block the main thread.
This will require disentangling a lot of logic that currently lives in run.py. As such, it made sense
to "quickly" stub out a guide of what needs to be done.
The `MicrobatchBatchRunner` will be for running individual batches,
whereas the `MicrobatchModelRunner` will handle the orchestration
of the batches to be run for a given model.
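
As a rough illustration of the split described above, here is a minimal sketch. The class bodies, the `BatchResult` type, and the `batch_count` parameter are all hypothetical stand-ins and are not taken from dbt-core; only the two class names come from this PR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchResult:
    """Hypothetical result type, used only for this sketch."""
    batch_index: int
    status: str


class MicrobatchBatchRunner:
    """Stand-in: runs exactly one batch of a model."""

    def __init__(self, model_name: str, batch_index: int) -> None:
        self.model_name = model_name
        self.batch_index = batch_index

    def run(self) -> BatchResult:
        # A real batch runner would execute the batch's SQL here.
        return BatchResult(self.batch_index, "success")


class MicrobatchModelRunner:
    """Stand-in: decides which batches to run, never runs SQL itself."""

    def __init__(self, model_name: str, batch_count: int) -> None:
        self.model_name = model_name
        self.batch_count = batch_count

    def run(self) -> List[BatchResult]:
        # Orchestration only: build one batch runner per batch and collect results.
        runners = [
            MicrobatchBatchRunner(self.model_name, index)
            for index in range(self.batch_count)
        ]
        return [runner.run() for runner in runners]
```

In this sketch the batches run sequentially inside the model runner; how batches are actually scheduled is outside what these commit messages describe.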
…Runner` directly

Previously `handle_job_queue` treated `MicrobatchModelRunner` as a special case, and delegated
to `handle_microbatch_model` to orchestrate the batches instead of delegating to the
`MicrobatchModelRunner` directly. Now that the `MicrobatchModelRunner` will be handling batch
orchestration, we can appropriately delegate to it and remove the special casing.
The function won't work as is, but I felt it better to straight copy, commit,
and then modify it to work in the runner context iteratively.
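
The removal of the special case in `handle_job_queue` can be pictured with the sketch below. It assumes a plain `ThreadPoolExecutor` and toy runner classes; the signatures here are illustrative and are not the ones in run.py. The point is only that the dispatching code submits every runner the same way, and batch orchestration happens inside `MicrobatchModelRunner.run` on a worker thread.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


class ModelRunner:
    """Toy runner; the real class has a much larger interface."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self) -> str:
        return f"ran {self.name}"


class MicrobatchModelRunner(ModelRunner):
    def run(self) -> str:
        # Batch orchestration lives here, so it executes on whichever
        # worker thread picked up this runner, not on the main thread.
        batches = [f"{self.name}[batch {i}]" for i in range(2)]
        return "ran " + ", ".join(batches)


def handle_job_queue(pool: ThreadPoolExecutor, runners: List[ModelRunner]) -> List[Future]:
    # No isinstance check for microbatch runners: every runner is submitted
    # to the pool in the same way and the main thread returns immediately.
    return [pool.submit(runner.run) for runner in runners]
```

A caller would then collect results from the returned futures, for example with `[f.result() for f in futures]`.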
@QMalcolm QMalcolm added the Skip Changelog Skips GHA to check for changelog file label Feb 24, 2025
@cla-bot cla-bot bot added the cla:yes label Feb 24, 2025
@dbt-labs dbt-labs deleted a comment from github-actions bot Feb 24, 2025

codecov bot commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 29.53020% with 105 lines in your changes missing coverage. Please review.

Project coverage is 82.92%. Comparing base (f7c4c3c) to head (3e857dd).

❗ There is a different number of reports uploaded between BASE (f7c4c3c) and HEAD (3e857dd).

HEAD has 56 uploads less than BASE

| Flag | BASE (f7c4c3c) | HEAD (3e857dd) |
|---|---|---|
| unit | 10 | 0 |
| integration | 50 | 4 |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11332      +/-   ##
==========================================
- Coverage   88.97%   82.92%   -6.05%     
==========================================
  Files         189      189              
  Lines       24182    24172      -10     
==========================================
- Hits        21516    20045    -1471     
- Misses       2666     4127    +1461     
| Flag | Coverage Δ |
|---|---|
| integration | 82.92% <29.53%> (-3.35%) ⬇️ |
| unit | ? |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| Unit Tests | 82.92% <29.53%> (-6.05%) ⬇️ |
| Integration Tests | 82.92% <29.53%> (-3.35%) ⬇️ |

We don't need these functions in `MicrobatchModelRunner` because the
versions inherited from `ModelRunner` will work for
our needs. Of note, we can probably also remove the need for these
functions in `MicrobatchBatchRunner` by renaming `print_batch_start_line`
and `print_batch_result_line` to the method names that the `ModelRunner`
methods call.
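
The renaming idea above amounts to overriding the hooks that the base class already calls, rather than overriding the callers. A minimal sketch follows; the method names `before_execute`, `after_execute`, `print_start_line`, and `print_result_line` are assumptions made for illustration, and the `log` list stands in for real log output.

```python
from typing import List


class ModelRunner:
    """Toy base class; method names are assumed for this sketch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []

    def print_start_line(self) -> None:
        self.log.append(f"START model {self.name}")

    def print_result_line(self, status: str) -> None:
        self.log.append(f"{status} model {self.name}")

    def before_execute(self) -> None:
        # The base class calls the hook; subclasses only override the hook.
        self.print_start_line()

    def after_execute(self, status: str) -> None:
        self.print_result_line(status)


class MicrobatchModelRunner(ModelRunner):
    # Nothing to override: the inherited methods are sufficient.
    pass


class MicrobatchBatchRunner(ModelRunner):
    def __init__(self, name: str, batch_index: int) -> None:
        super().__init__(name)
        self.batch_index = batch_index

    # Named to match what ModelRunner.before_execute / after_execute call,
    # so those two methods no longer need to be re-implemented here.
    def print_start_line(self) -> None:
        self.log.append(f"START batch {self.batch_index} of {self.name}")

    def print_result_line(self, status: str) -> None:
        self.log.append(f"{status} batch {self.batch_index} of {self.name}")
```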
…unner`

`MicrobatchModelRunner.compile` does nothing because `MicrobatchModelRunner`
only orchestrates the batches of the model to run, and doesn't actually run
the SQL of the model. Thus compilation is unnecessary in `MicrobatchModelRunner`.
Of note, implementing `on_skip` for `MicrobatchModelRunner` is unnecessary
because the inherited `on_skip` suffices.
This is necessary because the materialization executor needs the
`MicrobatchBuilder` in order to build the jinja context.
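
The `compile` and `on_skip` notes above can be sketched as follows. The dictionary-based node and the method signatures are hypothetical simplifications, not dbt-core's actual types; the sketch only shows a no-op `compile` override next to an inherited `on_skip`.

```python
from typing import Any, Dict

Node = Dict[str, Any]  # hypothetical stand-in for a manifest node


class ModelRunner:
    def compile(self, node: Node) -> Node:
        # Toy "compilation": attach rendered SQL to a copy of the node.
        return {**node, "compiled_sql": f"select * from {node['name']}"}

    def on_skip(self, node: Node) -> str:
        return f"SKIP {node['name']}"


class MicrobatchModelRunner(ModelRunner):
    def compile(self, node: Node) -> Node:
        # No-op: this runner only orchestrates batches and never runs SQL,
        # so the node is returned unchanged.
        return node

    # on_skip is intentionally not overridden; the inherited one is used.
```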
Labels
cla:yes Skip Changelog Skips GHA to check for changelog file
Development

Successfully merging this pull request may close these issues.

[Bug] Microbatch models shouldn't block the main thread in multi-threaded dbt runs.