Replies: 9 comments 20 replies
-
Will this be limited to multiple projects using a single adapter? Or a single adapter version?
-
Hi @jtcohen6! You know I'm excited about this one! I have a couple of questions:
Thanks Jeremy! Super excited about this :)
-
I've renamed this discussion from "Multi-project deployments" to "Multi-project collaboration." Why? The primary goal of this initiative—the challenge it seeks to solve—is not finding the most elegant way to deploy projects that are defined in separate repositories. That is an important outcome, and a lot of it is already possible today.

Rather, our primary goal has been, and continues to be, enabling multiple teams to collaborate. To actually own their data models. To publish, share, and maintain those models with other teams in predictable ways. To do so with the recognition that those producing & consuming teams may have conflicting interests & incentives. One team should not operate in a vacuum, siloed off from every other. They should be able to leverage their colleagues' work, while still having control over the scope of their own project.

For folks who are using dbt at a smaller scale, and don't need to tackle organizational complexity with capabilities around model governance: that's okay! None of these capabilities is required for upgrading to v1.5 / v1.6, and there's a lot of other good stuff besides. I'd just say: know that we're invested in making dbt scale as a framework, if/when you need it.

For everyone who does need these capabilities: let's make the most of them, together.
-
Consider the following scenario:
With packages, this scenario was not possible: Project B could use Project A as a package, but adding Project B as a package into Project A would mean a "circular dependency." Think of domain A creating input for domain B, and domain B creating input for domain A on something completely unrelated. Will this scenario be supported in 1.5 or 1.6?
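To make the package-based limitation concrete, here is a sketch (with hypothetical org and repo names) of the disallowed setup: each project listing the other in its `packages.yml`, a cycle that `dbt deps` cannot resolve.

```yaml
# project_a/packages.yml -- hypothetical example
packages:
  - git: "https://github.com/example-org/project_b.git"
    revision: main

# project_b/packages.yml -- the reverse dependency closes the cycle
packages:
  - git: "https://github.com/example-org/project_a.git"
    revision: main
```

Because installing a package pulls in the full source of the upstream project, each `dbt deps` resolution would recurse into the other project indefinitely; cross-project `ref` avoids this by depending on metadata rather than source code.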
-
@jtcohen6 reading your previous comments, especially #6725 (reply in thread), I wonder if we can say that's gonna be more a
and the mention of
-
Hi @jtcohen6, many thanks for sharing all this information; it's very helpful. After being excited by the release of dbt-core 1.6, I was a bit disappointed to find out that cross-project refs would be available in dbt Cloud only, which led me to find and read this thread. Thanks again for all the explanations.

There's a basic technical point that's still not completely clear to me: why does cross-project ref need the state of the ref'ed project? After all, it doesn't need it when including that project as a package, so what's the difference? I think I have a guess, but I'd love for someone who actually knows to clarify. Apologies if this has already been explained somewhere!
-
Hello @jtcohen6, can you clarify in more detail the reasoning behind splitting up into multiple projects, and the number of ~500 models that you quoted? We're wondering if it's better for us to stick with a single dbt project instead of having to deal with cross-project references. Our dbt project currently contains 500 models and could grow to 1000 models over the next year, but it shouldn't grow much further than that. We don't use dbt Cloud. Unfortunately the GitHub repo is internal so I can't share it, but the user docs are public: https://user-guidance.analytical-platform.service.justice.gov.uk/tools/create-a-derived-table/

Parsing

I full-parsed a project containing about 500 models. This took ~20 secs.
I partial-parsed a project containing about 500 models. This took <2 secs.
Hence, as long as I don't make any changes that trigger a full parse, parsing becomes a non-issue?

Finding the right model

We split our models into business-facing domains, assigning a directory to each domain. There is also a data engineer responsible for each domain, to ensure consistency inter- and intra-domain. Analysts can search across the repo for the right model, or create the model in the right location. How would splitting the dbt project into multiple dbt projects by domain, whether in the same repo or in multiple repos, make it easier to find or add a model? Thanks!!
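For anyone wanting to reproduce numbers like these on their own project, one quick way to compare is the standalone `dbt parse` command (available in recent dbt-core versions), forcing a full parse with the `--no-partial-parse` flag:

```shell
# Full parse: ignore any saved partial-parse state
time dbt parse --no-partial-parse

# Partial parse: run again without the flag; dbt reuses
# target/partial_parse.msgpack and only re-parses changed files
time dbt parse
```

Exact timings will of course vary with hardware, model count, and how much Jinja each model uses.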
-
Not all heroes wear capes but they do write open source Python!
…On Thu, Sep 14, 2023 at 08:58 Nicholas A. Yager ***@***.***> wrote:
I know this discussion/thread has been disappointing for several
community members who were excited to be getting multi-project capabilities
this year.
Ooh, I'm in this picture 😆
As a disappointed community member, I have created dbt-loom
<https://github.com/nicholasyager/dbt-loom>, an Apache 2-licensed python
package that enables cross-project references in dbt-core and hybrid
Core/Cloud environments. While it may not be as refined as dbt Labs'
official approach in dbt Cloud, it effectively meets the needs discussed in
this thread.
@jtcohen6 <https://github.com/jtcohen6> As for the disappointment, no
hard feelings! You all are navigating a tricky balance between value
creation and commercial success. Looking ahead, I believe that the
community could benefit from clearer labeling of dbt Cloud-specific
functionality, and perhaps a separate forum for non-open-source dbt Labs
products. In any case, I remain confident in the team's good intentions and
eagerly anticipate the innovative tools and products to come.
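For readers curious how this works in practice: dbt-loom is configured with a small YAML file that points dbt at the manifest artifacts of upstream projects. The sketch below follows the format I understand from the dbt-loom README (project name and path are illustrative; check the repository linked above for the current syntax).

```yaml
# dbt_loom.config.yml -- illustrative; see the dbt-loom README for details
manifests:
  - name: project_a          # the upstream project's name, as used in ref('project_a', ...)
    type: file               # other manifest sources (e.g. remote storage) are also supported
    config:
      path: path/to/project_a/target/manifest.json
```

The key design idea is the same one discussed throughout this thread: the downstream project consumes only the upstream project's metadata (its manifest), not its source code.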
-
I'm going to close this discussion as resolved, though we're only just beginning our mesh-y journey together. Thank you to everyone for your full & honest participation since January. It's meant a lot to me personally. Check out the docs & guides, if you haven't already!
-
Or: how I learned to stop worrying and love the ✨ dbt mesh ✨. This discussion supersedes #5244.
The bigger a dbt project, the harder it gets: to develop with speed, to contribute with confidence, and to share with clarity. It's frustrating to see parse times get slower, to wait through longer IDE load times, or to disentangle a mess of conflicting CI builds—but above all, it becomes harder to find the right model (or even know which model is the right one). We want the other kind of network effect: the more of an organization's knowledge graph is encoded in dbt models, the more value dbt can deliver in disseminating that knowledge through the organization.
Background
There are more large projects, and they are getting larger. We see it in our anonymous usage data, and we hear about it firsthand from mature organizations rolling out dbt deployments to more collaborators than ever before.
I define the ideal project size as <500 models. This is an arbitrary line in the sand, but for me it reflects the point at which dbt Labs' own internal analytics project went from feeling "manageable" to "there's too much going on." The goal of this initiative is to enable large dbt projects—even ones maintained by relatively small data teams!—to separate their concerns, and collaborate more effectively.
Over the past year, the number of "large" projects (>500 models) has tripled. A year ago, of all known dbt models, one out of every four was running in a "large" project. Today, it's one out of every three, and I expect the trend to continue. Unless!
The opportunity
Today, a large organization adopting dbt has two choices:
The essential goal of this initiative is to break that dichotomy. We should enable teams to develop projects independently, with alacrity and assurance—while still providing them with the ability to share common datasets, and unified lineage as a given.
It should feel like this
When there are multiple teams developing dbt models, each team should have the ability to:
If we do this right, a single developer on a single team can be working in a project of reasonable size (<500 models), building out a well-organized DAG producing their own final set of public & contracted models, without needing to know about the thousands of private models that exist elsewhere in the org. At the same moment, a colleague with requisite permissions could be viewing the full dbt DAG, in all its glory, seeing the dependencies across projects and between every model—because the full lineage is always there.
There are data teams who have made attempts in this direction: by tracking after-the-fact lineage outside of dbt; by pushing isolated metadata to central data catalogs; or by stitching together cross-project links (models or exposures in one project, recreated as sources in another), to run in separate orchestration tools. It's better than nothing, but I don't believe it's good enough. I believe dbt must solve this problem natively. When all the pieces are in place, it should feel like this:
And it should just work.
The plan
Over the course of the day, I'll be opening another discussion for each of the three themes in "Phase 1" below. Each discussion will include narrative, motivations, and requirements for the user experience, supported by proposed specs and code snippets. The intent of those snippets will be to illustrate, rather than guarantee, the exact final syntax which you can expect to read about and beta-test over the coming months.
Over the next several days & weeks, @MichelleArk will also be opening narrower issues to track our intended implementation. We welcome comments here, there, everywhere. Bring us your thoughts, questions, doubts, challenges, enthusiasm.
Phase 1: Models as APIs
Goal: v1.5 (April)
Develop new constructs that enable dbt developers to create, contract, and communicate data models like software APIs. This work should enable more scalable monorepos, while also laying the foundation for Phase 2.
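As these constructs shipped in v1.5, "models as APIs" looks roughly like this in a model's YAML (model, group, and owner names here are illustrative, not from this thread):

```yaml
# models/_models.yml -- illustrative example of v1.5 governance constructs
groups:
  - name: finance
    owner:
      name: Finance Team
      email: finance@example.com

models:
  - name: dim_customers
    group: finance
    access: public          # models outside the group may ref this; private models may not be
    config:
      contract:
        enforced: true      # dbt verifies column names & data types before building
    columns:
      - name: customer_id
        data_type: integer
```

Together, groups scope ownership, `access` controls who may `ref` a model, and enforced contracts make the model's shape a stable, verifiable interface.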
Phase 2: Extend to many
Goal: v1.6 (July)
This is an ambitious timeline! If the dates need to change, we'll say when & why.
We will extend the constructs above to multiple projects. Cross-project `ref` is the tip of the iceberg. We must enable seamless experiences around development & deployment, enabled by dbt metadata. Developers in downstream projects do not need access to the full source code of upstream projects. Instead, they should get only & exactly the information they need, when they need it.
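Concretely, the cross-project experience that shipped in v1.6 (for dbt Cloud) has the downstream project declare its upstream in `dependencies.yml`, then reference the upstream's public models with a two-argument `ref` (project and model names below are illustrative):

```yaml
# dependencies.yml in the downstream project
projects:
  - name: jaffle_finance    # the upstream dbt project's name

# Then, in a downstream model's SQL, a two-argument ref:
#   select * from {{ ref('jaffle_finance', 'monthly_revenue') }}
```

At build time, dbt resolves that `ref` against the upstream project's metadata, so only models the upstream marked `access: public` can be referenced this way.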