Commit 4ef880b

Merge branch 'current' into cberger_add_git_strategies_blog

christineberger committed Feb 21, 2025
2 parents 04bdd8b + 768822b commit 4ef880b

Showing 109 changed files with 2,634 additions and 671 deletions.
10 changes: 6 additions & 4 deletions website/api/get-discourse-comments.js
@@ -9,10 +9,12 @@ const PREVIEW_ENV = 'deploy-preview-'
  // Set API endpoint and headers
  let discourse_endpoint = `https://discourse.getdbt.com`
  let headers = {
-   'Accept': 'application/json',
-   'Api-Key': DISCOURSE_DEVBLOG_API_KEY,
-   'Api-Username': DISCOURSE_USER_SYSTEM,
- }
+   Accept: "application/json",
+   "Api-Key": DISCOURSE_DEVBLOG_API_KEY,
+   "Api-Username": DISCOURSE_USER_SYSTEM,
+   // Cache comments in the browser (max-age) & CDN (s-maxage) for 1 day
+   "Cache-Control": "max-age=86400, s-maxage=86400 stale-while-revalidate",
+ };

async function getDiscourseComments(request, response) {
let topicId, comments, DISCOURSE_TOPIC_ID;
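An editorial aside on the new `Cache-Control` value: per RFC 9111, cache directives are a comma-separated list, and the committed string has no comma before `stale-while-revalidate`. A toy parser (a sketch of the spec's grammar, not how any particular CDN behaves) shows why that may be worth verifying:

```python
# Toy reading of a Cache-Control header per RFC 9111: directives are a
# comma-separated list, so a missing comma folds two directives into one value.
def parse_cache_control(header: str) -> dict:
    directives = {}
    for part in header.split(","):
        token = part.strip()
        if "=" in token:
            name, _, value = token.partition("=")
            directives[name.strip()] = value.strip()
        elif token:
            directives[token] = True
    return directives

print(parse_cache_control("max-age=86400, s-maxage=86400, stale-while-revalidate"))
# {'max-age': '86400', 's-maxage': '86400', 'stale-while-revalidate': True}

# As committed (no comma before stale-while-revalidate), the s-maxage value
# swallows the next directive, which strict caches may reject or ignore:
print(parse_cache_control("max-age=86400, s-maxage=86400 stale-while-revalidate"))
# {'max-age': '86400', 's-maxage': '86400 stale-while-revalidate'}
```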
10 changes: 6 additions & 4 deletions website/api/get-discourse-topics.js
@@ -9,10 +9,12 @@ async function getDiscourseTopics(request, response) {
  // Set API endpoint and headers
  let discourse_endpoint = `https://discourse.getdbt.com`
  let headers = {
-   'Accept': 'application/json',
-   'Api-Key': DISCOURSE_API_KEY,
-   'Api-Username': DISCOURSE_USER,
- }
+   Accept: "application/json",
+   "Api-Key": DISCOURSE_API_KEY,
+   "Api-Username": DISCOURSE_USER,
+   // Cache topics in the browser (max-age) & CDN (s-maxage) for 1 day
+   "Cache-Control": "max-age=86400, s-maxage=86400 stale-while-revalidate",
+ };

const query = buildQueryString(body)
if(!query) throw new Error('Unable to build query string.')
@@ -47,7 +47,7 @@ Here’s the challenge: monitoring tools, by their nature, look backward. They

[dbt Cloud](https://www.getdbt.com/product/dbt-cloud) unifies these perspectives into a single [control plane](https://www.getdbt.com/blog/data-control-plane-introduction), bridging proactive and retrospective capabilities:

- - **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/syntax#state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline.
+ - **Proactive planning**: In dbt, you declare the desired [state](https://docs.getdbt.com/reference/node-selection/state-selection) of your data before jobs even run — your architectural plans are baked into the pipeline.
- **Retrospective insights**: dbt Cloud surfaces [job logs](https://docs.getdbt.com/docs/deploy/run-visibility), performance metrics, and test results, providing the same level of insight as traditional monitoring tools.

But the real power lies in how dbt integrates these two perspectives. Transformation logic (the plans) and monitoring (the inspections) are tightly connected, creating a continuous feedback loop where issues can be identified and resolved faster, and pipelines can be optimized more effectively.
5 changes: 3 additions & 2 deletions website/blog/2025-01-23-levels-of-sql-comprehension.md
@@ -12,7 +12,6 @@ date: 2025-01-23
is_featured: true
---


Ever since [dbt Labs acquired SDF Labs last week](https://www.getdbt.com/blog/dbt-labs-acquires-sdf-labs), I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are *fascinating.*

For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a [string preprocessor](https://en.wikipedia.org/wiki/Preprocessor) and into fully comprehending SQL. **For the first time, SDF provides the technology necessary to make this possible.** Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.
@@ -145,6 +144,8 @@ In introducing these concepts, we’re still just scratching the surface. There'
- How this is all going to roll into a step change in the experience of working with data
- What it means for doing great data work

- Over the coming days, you'll be hearing more about all of this from the dbt Labs team - both familiar faces and our new friends from SDF Labs.
+ To learn more, check out [The key technologies behind SQL Comprehension](/blog/sql-comprehension-technologies).

+ Over the coming days, you'll hear more about all of this from the dbt Labs team - both familiar faces and our new friends from SDF Labs.

This is a special moment for the industry and the community. It's alive with possibilities, with ideas, and with new potential. We're excited to navigate this new frontier with all of you.
77 changes: 77 additions & 0 deletions website/blog/2025-02-19-faster-project-parsing-with-rust.md
@@ -0,0 +1,77 @@
---
title: "Parser, Better, Faster, Stronger: A peek at the new dbt engine"
description: "Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back."
slug: faster-project-parsing-with-rust

authors: [joel_labes]

tags: [data ecosystem]
hide_table_of_contents: false

date: 2025-02-19
is_featured: true
---
Remember how dbt felt when you had a small project? You pressed enter and stuff just happened immediately? We're bringing that back.

<Lightbox src="/img/blog/2025-02-19-faster-project-parsing-with-rust/parsing_10k.gif" width="100%" title="Benchmarking tip: always try to get data that's good enough that you don't need to do statistics on it" />

After a [series of deep dives](/blog/the-levels-of-sql-comprehension) into the [guts of SQL comprehension](/blog/sql-comprehension-technologies), let's talk about speed a little bit. Specifically, I want to talk about one of the most annoying slowdowns as your project grows: project parsing.

When you're waiting a few seconds or a few minutes for things to start happening after you invoke dbt, it's because parsing isn't finished yet. But Lukas' [SDF demo at last month's webinar](https://www.getdbt.com/resources/webinars/accelerating-dbt-with-sdf) didn't have a big wait, so why not?

<!-- truncate -->

## A primer on parsing

Parsing your project (remember: [not your SQL](/blog/the-levels-of-sql-comprehension)!) is how dbt builds the dependency graph of models and macros. If you've ever looked at a `manifest.json` and noticed all the `depends_on` blocks, that's what we're talking about.

Without the resolved dependencies, dbt can't filter down to a subset of your project – this is why parsing is always an all-or-nothing affair. You can't do `dbt parse --select my_model+` because parsing is what works out what's on the other side of that plus. (Of course, most projects use partial parsing, so they aren't starting from scratch every time.)
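Once the dependencies are resolved, "the other side of the plus" is just a graph walk. A sketch in a few lines (toy node names and a hypothetical manifest fragment, not dbt's actual implementation):

```python
from collections import defaultdict, deque

# Toy manifest fragment: each node lists the nodes it depends_on (its parents),
# which is exactly the information parsing produces.
manifest = {
    "stg_orders": {"depends_on": ["raw_orders"]},
    "orders": {"depends_on": ["stg_orders"]},
    "revenue": {"depends_on": ["orders"]},
}

# Invert the edges so we can walk downstream.
children = defaultdict(set)
for node, meta in manifest.items():
    for parent in meta["depends_on"]:
        children[parent].add(node)

def select_plus(node):
    """Resolve 'node+': the node itself plus everything downstream of it."""
    selected, queue = {node}, deque([node])
    while queue:
        for child in children[queue.popleft()]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected

print(sorted(select_plus("stg_orders")))  # ['orders', 'revenue', 'stg_orders']
```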

All those refs and macros are defined in Jinja. I don't know if you've ever thought about how Jinja gets from curly braces into text, but it's pretty weird! It's actually a two-step process: first it gets converted into Python code, and then that Python code is *itself run to generate a string*!
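Here's a toy version of that compile-then-execute pipeline (hypothetical helper names, nowhere near Jinja's real code generator, but the same two-step shape):

```python
import re

def compile_template(source: str) -> str:
    """Step 1: translate '{{ expr }}' markers into Python source code."""
    lines = ["def render(ctx):", "    parts = []"]
    pos = 0
    for m in re.finditer(r"\{\{(.*?)\}\}", source):
        if m.start() > pos:
            lines.append(f"    parts.append({source[pos:m.start()]!r})")
        lines.append(f"    parts.append(str(eval({m.group(1).strip()!r}, ctx)))")
        pos = m.end()
    if pos < len(source):
        lines.append(f"    parts.append({source[pos:]!r})")
    lines.append("    return ''.join(parts)")
    return "\n".join(lines)

# Step 2: the generated Python is itself executed to produce the final string.
py_src = compile_template("select * from {{ ref('my_model') }}")
namespace = {}
exec(py_src, namespace)
sql = namespace["render"]({"ref": lambda name: f'"analytics"."{name}"'})
print(sql)  # select * from "analytics"."my_model"
```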

This is kinda slow. Not so much as a one-off, but a project with 10,000 nodes might have 15-20,000 dependencies, so every millisecond adds up.

## What if we wanted it to be faster?

Since running the code is slow, one way to get results faster is to not run the code. Since v1.0, dbt's parser has [used a static analyzer](https://github.com/dbt-labs/dbt-core/blob/main/docs/guides/parsing-vs-compilation-vs-runtime.md#:~:text=Simple%20Jinja%2DSQL%20models%20(using%20just%20ref()%2C%20source()%2C%20%26/or%20config()%20with%20literal%20inputs)%20are%20also%20statically%20analyzed%2C%20using%20a%20thing%20we%20built.%20This%20is%20very%20fast%20(~0.3%20ms)) to resolve refs when possible, which is [about 3x faster](https://docs.getdbt.com/reference/parsing#:~:text=For%20now%2C%20the%20static%20parser,speedup%20in%20the%20model%20parser) than going through the whole rigmarole above.
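For the simple cases, static analysis can be as plain as pattern-matching literal `ref()` calls without executing anything. A simplified sketch of the idea (dbt's actual static parser is more capable than a single regex):

```python
import re

# Match only the simplest shape: ref() with a single string-literal argument.
# Anything fancier falls back to full Jinja rendering.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")

sql = """
select o.*, c.region
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c on o.customer_id = c.id
"""

# Dependencies recovered without running any template code:
print(REF_PATTERN.findall(sql))  # ['stg_orders', 'stg_customers']
```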

<Lightbox src="/img/blog/2025-02-19-faster-project-parsing-with-rust/evaluation_strategies_1.png" width="100%" />

The other way you could get the result faster is to run the code faster.

The original author of Jinja also wrote [minijinja](https://github.com/mitsuhiko/minijinja) – a Rust implementation of a subset of the original Jinja library.

This is not the post for a deep dive on *why* Rust and Python have such different performance characteristics, but the key takeaway is that [minijinja can *fully evaluate* a ref 30 times faster](https://github.com/mitsuhiko/minijinja/tree/main/benchmarks) than today's dbt can even *statically analyze* it.

<Lightbox src="/img/blog/2025-02-19-faster-project-parsing-with-rust/evaluation_strategies_2.png" width="100%" />

Our analysis in the leadup to dbt v1.0 showed that the static analyzer could handle 60% of models. Evaluating refs 30x faster in 60% of models would itself be great.

But recall that static analysis was the workaround for evaluating Jinja being slow. Since **we can now evaluate Jinja faster than we can statically analyze it**, let's just<sup>†</sup> evaluate everything!

<sup>†</sup>The word "just" is doing a *lot* of heavy lifting here. In practice, there's a lot happening behind the scenes to get both the performance of minijinja and the ability to process the full range of capabilities of a dbt project. Another story for another day.

## What does this mean in practice?

As you saw at the top of the post, I've been running some synthetic projects against an early build of the new dbt engine, and it's pretty snappy - **parsing a 10,000 model project in under 600ms**. Let's see how it goes with some other common project sizes:

<Lightbox src="/img/blog/2025-02-19-faster-project-parsing-with-rust/parse_time_comparison_linear.png" width="100%" title="You might have to squint, but I promise there's a yellow line on each of those groups" />

Even a 20,000-model project finished parsing in about a second. The equivalent cold parse in today's dbt takes well over a minute, and a partial parse (with no changed files) takes about 12 seconds.

Let's look at one more comparison: **100k models. I need to break out the log scale for this one:**

<Lightbox src="/img/blog/2025-02-19-faster-project-parsing-with-rust/parse_time_comparison_log.png" width="100%" />

The new dbt engine parsed our 100,000 model example project in under 10 seconds, compared with almost 20 minutes.

Let me be clear: I do not think you should put 100,000 models into your project! I mostly ran that one for the lols. But back in the realm of project sizes that actually exist:

- If your project isn't currently eligible for partial parsing, cold parses in Rust are fast enough to make it a moot point.
- Regardless of how your project parses today, your project will feel like it's a couple of orders of magnitude smaller than it is.

## We're just getting started

Speed is just one benefit to come from this integration, and pales in comparison to, say, [the importance of logical plans](https://roundup.getdbt.com/p/the-power-of-a-plan-how-logical-plans). But it sure is fun!

The teams are still hard at work integrating the two tools, and we'll have more to share on how the developer experience will change thanks to SDF's tech at our [Developer Day event in March](https://www.getdbt.com/resources/webinars/dbt-developer-day).
8 changes: 4 additions & 4 deletions website/blog/ctas.yml
@@ -30,8 +30,8 @@
    subheader: Catch up on Coalesce 2024 and register to access a select number of on-demand sessions.
    button_text: Register and watch
    url: https://coalesce.getdbt.com/register/online
- - name: spring_launch_2025
-   header: 2025 dbt Cloud Launch Showcase
-   subheader: Join us on March 19th or 20th to hear from our executives and product leaders about the latest features landing in dbt.
+ - name: developer_day_2025
+   header: dbt Developer Day
+   subheader: Join us on March 19th or 20th to hear from dbt Labs product leads about exciting new and coming-soon features designed to supercharge data developer workflows.
    button_text: Save your seat
-   url: https://www.getdbt.com/resources/webinars/2025-dbt-cloud-launch-showcase
+   url: https://www.getdbt.com/resources/webinars/dbt-developer-day
6 changes: 3 additions & 3 deletions website/blog/metadata.yml
@@ -2,11 +2,11 @@
  featured_image: ""

  # This CTA lives in right sidebar on blog index
- featured_cta: "spring_launch_2025"
+ featured_cta: "developer_day_2025"

  # Show or hide hero title, description, cta from blog index
- show_title: true
- show_description: true
+ show_title: false
+ show_description: false
  hero_button_url: "/blog/welcome"
  hero_button_text: "Start here"
  hero_button_new_tab: false