Inline model documentation + tests #6853
Replies: 3 comments 18 replies
-
@jakebiesinger This was very cool to read. Thank you for kicking it off as a discussion, for having done your homework, and for sticking around through the years to see this idea through :) There are two challenges here:
It sounds like you'd be open to diving in and working on this, if we can set you up with a clear set of foundations to work in. Our parsing logic is still quite thorny (though better than it used to be); I couldn't promise a walk in the park. I'm getting ahead of myself, though. First thing we'd need is an agreeable answer to (1).

Finding a syntax

Let's check each of the options:
I'm skeptical of this for two reasons:
This specific idea (#979) got quite a bit of traction. As you say, I'm not sure how we'd do the "auto-attaching," except via some implicit magic with the naming convention (docs block named
This would require us to do SQL parsing, which isn't something dbt has done to date. While the answer is definitely not "never," I'm quite wary of taking an accidental half-step into this particular quagmire. It would be much more tenable for Python models, where we've already toyed with the idea of supporting model-level descriptions via docstring:

def model(dbt, session):
    """
    This is my awesome model!
    """
    df = ...

What about YFM?

Here's an idea, which you have every right to hate: What if we did this as yaml front matter (YFM)? After scanning through the older linked issues, I didn't see this come up; I may have missed it, or I may be proposing something that everyone else has tacitly agreed would be terrible. For whatever reason, I don't hate the idea, at least not at first glance.

Idea being, these remain two separate & independent sets of information, which just happen to live in the same file. The yaml in the front matter can include any of the key-value pairs that could also be passed to the

The front matter would either take priority over, or be mutually exclusive with, defining yaml properties for this model in a separate yaml file. If the model also defines

---
description: This is my model's description. It can reference a {{ doc('block') }} if it wants to.
config:
  materialized: incremental
  unique_key: id
columns:
  - name: first_col
    tests:
      - unique
      - not_null
---
select ...

---
columns:
  - name: first_col
    tests:
      - unique
      - not_null
---
def model(dbt, session):
    ...
This approach definitely doesn't help with the sheer model size, but yaml does offer a few line-saving options if you're willing to get a bit JSON-y:

---
columns:
  - {name: first_col, tests: [not_null, unique]}
  - {name: second_col, tests: [not_null], description: ...}
  - ...
---
select ...

To be clear, what I'm not proposing here is the ability to use Jinja or yaml to dynamically create/populate the front matter contents. I do think that remains, rightly, the domain of code-generation plugins/packages. (Our work on refactoring
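
To make the front matter idea concrete, here is a minimal sketch of how a model file could be split into its front matter and its SQL/Python body, and how the "takes priority over, or is mutually exclusive with" rule could be applied. It is an illustration only, not dbt's parser: the function names `split_front_matter` and `resolve_properties` and the `exclusive` flag are hypothetical, and the front matter is returned as raw text (parsing it with a yaml library is left out so the sketch stays stdlib-only).

```python
from typing import Optional, Tuple

DELIMITER = "---"  # assumed front matter fence, as in the examples above


def split_front_matter(source: str) -> Tuple[Optional[str], str]:
    """Split a model file into (front_matter_text, body).

    Returns (None, source) when the file does not open with a '---' line,
    or when the opening '---' is never closed.
    """
    lines = source.splitlines(keepends=True)
    if not lines or lines[0].strip() != DELIMITER:
        return None, source
    for index in range(1, len(lines)):
        if lines[index].strip() == DELIMITER:
            front_matter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return front_matter, body
    return None, source


def resolve_properties(
    front_matter: Optional[dict],
    schema_file: Optional[dict],
    exclusive: bool = True,
) -> dict:
    """Combine already-parsed properties from the two sources.

    exclusive=True  -> defining both is an error (mutually exclusive).
    exclusive=False -> front matter wins on conflicting top-level keys.
    """
    if exclusive and front_matter and schema_file:
        raise ValueError("model properties defined in both front matter and a yaml file")
    merged = dict(schema_file or {})
    merged.update(front_matter or {})
    return merged


model_sql = """---
description: This is my model's description.
columns:
  - {name: first_col, tests: [not_null, unique]}
---
select 1 as first_col
"""

front_matter_text, body = split_front_matter(model_sql)
```

Here `body` holds only the `select` statement, so the SQL handed to the rest of the pipeline is unchanged, while `front_matter_text` can be passed to whatever yaml loader the project already uses. The merge in `resolve_properties` is deliberately shallow; a real implementation would need to decide how nested keys such as `columns` combine.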
-
@jakebiesinger I took a look at your pull request, and I admire your enthusiasm for diving into some hairy code. However, creating two different file objects was not the way to go. Once I started thinking about it, it was hard to let it sit, so I borrowed some of your code from yaml_helpers.py and created a draft pull request that calls the model schema parser from the model parser: #7100. There's still a fair amount of work to do to handle edge cases, to deal with config defined in both the SQL files and a schema file, and so on, but I think as a proof of concept it looks pretty reasonable. Thanks for providing the inspiration :) -- we'll be sure to put your name on as co-contributor.
-
For proper code highlighting, your IDE probably requires a plugin.
-
The high-level requirement in my mind is "let me specify all the informational bits about a model inside the model itself". For me, this would be:
docs blocks)

This topic seems rather perennial: it keeps popping up again and again! Here's a quick citation list:

- {% docs %} blocks to be specified in .sql files #1042, and then to auto-attach them to models + columns based on the doc block name #1043

It seems clear that the community wants this feature in some form. Why not bring it into dbt-core and give the people what they want? 😸

Over the years, the dbt team has seemed to like this idea, but has raised concerns:
So far, we've seen proposals for syntax including:
- config jinja call to allow docs + descriptions + tests to be specified inline (so a single massive call to config would put all the things in place)
- docs blocks in sql files, with possible support to auto-attach them to models (since they appear in the model sql file anyway)
- /** */ blocks (which can help with IDE confusion around jinja blocks)

I am happy to (again) put in some elbow grease to get this done. Is there appetite for receiving it on the dbt team side?
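
As a rough illustration of the "docs blocks in sql files, auto-attached by name" option, the sketch below pulls `{% docs %}` blocks out of a model's SQL with a regular expression and attaches them by name. The naming convention is an assumption made up for this example (a block named after the model describes the model; a block named `<model>__<column>` describes a column), and `extract_docs_blocks` and `attach_docs` are hypothetical helpers, not dbt functions. A real implementation would go through dbt's Jinja handling rather than a regex.

```python
import re
from typing import Dict

# Matches {% docs name %} ... {% enddocs %}, tolerating Jinja's '-' whitespace control.
DOCS_BLOCK = re.compile(
    r"\{%-?\s*docs\s+(?P<name>\w+)\s*-?%\}(?P<body>.*?)\{%-?\s*enddocs\s*-?%\}",
    re.DOTALL,
)


def extract_docs_blocks(sql: str) -> Dict[str, str]:
    """Return {block_name: block_text} for every docs block in the file."""
    return {
        match.group("name"): match.group("body").strip()
        for match in DOCS_BLOCK.finditer(sql)
    }


def attach_docs(model_name: str, sql: str) -> dict:
    """Attach docs blocks using the assumed naming convention.

    '<model>'           -> model description
    '<model>__<column>' -> column description
    """
    blocks = extract_docs_blocks(sql)
    prefix = model_name + "__"
    return {
        "description": blocks.get(model_name, ""),
        "columns": {
            name[len(prefix):]: text
            for name, text in blocks.items()
            if name.startswith(prefix)
        },
    }


example_sql = """
{% docs orders %}
One row per order.
{% enddocs %}
{% docs orders__id %}Primary key.{% enddocs %}
select 1 as id
"""

attached = attach_docs("orders", example_sql)
```

The "implicit magic" concern raised in the replies is visible here: nothing in the file says that `orders__id` belongs to the `id` column except the name itself, so a typo in a block name would silently drop the description.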