[Feature] The insert_by_period materialization should graduate to part of the main project #4174
Comments
There's also an interesting It's been pretty handy. |
Thanks for opening this @joellabes, and getting the conversation started. All in all, I find myself agreeing with your assessment here:
I'm wary of our ability to modify the existing incremental materialization, without adding a ton of new complexity. The ability to
I sense more significant overlap with the proposal for sharded tables in #1637. When a table gets so big that it's impossible to create it with a single query (however long-running or expensive), perhaps the right answer is to make it multiple tables. Then, rather than inserting period chunks into one table, we create many tables, and a view to union them together. Sharding tables is a hard problem (there's a strict 1:1 mapping between database objects and project resources today, a rule we'd need to break), but it feels like the right hard problem to tackle.
Support for sharded tables will also, I believe, offer us a way toward native support for "lambda views". After all, the only difference is that the "latest" shard needs to be materialized as a view instead of a table.
BigQuery has long had a native approach to this, with some wonky/legacy syntax, as ingestion-time-partitioned tables. We've removed support for those from dbt-bigquery ahead of v1 (it was effectively deprecated code). Whether we re-add support for the native approach there, or opt for one in which dbt more actively coordinates between shards and the union view, as it will have to on other adapters, that should be an implementation detail to figure out when we have a chance to tackle this project in earnest.
So: Support for table sharding is on the list of big improvements we should consider making after v1. I think it may obviate the need to add |
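To make the "many tables, and a view to union them together" idea concrete, here is a minimal Python sketch that only builds the SQL text for such a union view. It is not how dbt implements (or would implement) sharding: the function name `union_view_sql`, the shard names, and the `events` view are all hypothetical, and `create or replace view` is assumed to be available on the target warehouse.

```python
from typing import List


def union_view_sql(view_name: str, shard_names: List[str]) -> str:
    """Build a CREATE VIEW statement that stitches shards together with UNION ALL."""
    if not shard_names:
        raise ValueError("at least one shard is required")
    selects = [f"select * from {shard}" for shard in shard_names]
    body = "\nunion all\n".join(selects)
    return f"create or replace view {view_name} as\n{body}"


# Hypothetical shard names: one table per month, plus a "latest" shard that,
# in the lambda-view variant, would itself be a view rather than a table.
shards = ["events_2021_09", "events_2021_10", "events_latest"]
sql = union_view_sql("events", shards)
print(sql)
```

The union view does not care whether each shard is a table or a view, which is why the "latest" shard can be swapped for a view without changing anything downstream.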
OK! Until the glory days of sharded tables are upon us, I'll start thinking a bit harder about how a modernised version of the current insert_by_period materialization should behave in utils. |
Ooh, this is a very exciting initiative! Sharded tables sound like the way to go, even if it breaks the 1:1 mapping between a model and the corresponding physical relations. I just wanted to add some comments since we've internally modified Models should run successfully (they don't hang) and within the execution window we have. On the other hand it's ok for us that the physical relation eventually catches up with the latest data, which sounds very similar to what you did in your previous company @joellabes. Some of the changes we did to
between dateadd(day, -27, '__PROCESSING_DATE__') and '__PROCESSING_DATE__'
|
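The fragment quoted in the comment above filters on a trailing window ending at a `__PROCESSING_DATE__` placeholder. The rest of that comment is cut off, so how the placeholder gets filled in is not known; the sketch below simply assumes a plain string substitution with an ISO date, and the helper names `render_window` and `window_bounds` are hypothetical.

```python
from datetime import date, timedelta
from typing import Tuple

# Fragment taken from the comment above; everything else here is an assumption.
WINDOW_FRAGMENT = (
    "between dateadd(day, -27, '__PROCESSING_DATE__') and '__PROCESSING_DATE__'"
)


def render_window(fragment: str, processing_date: date) -> str:
    """Substitute the placeholder with a concrete ISO-formatted date."""
    return fragment.replace("__PROCESSING_DATE__", processing_date.isoformat())


def window_bounds(processing_date: date) -> Tuple[date, date]:
    """Return the first and last day covered by the rendered filter."""
    return processing_date - timedelta(days=27), processing_date


rendered = render_window(WINDOW_FRAGMENT, date(2021, 10, 28))
first_day, last_day = window_bounds(date(2021, 10, 28))
print(rendered)
print((last_day - first_day).days + 1)  # number of days in the inclusive window
```

Because `between` is inclusive on both ends, an offset of -27 days yields a 28-day window when the filtered column is a date.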
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers. |
Closing this in favor of our epic to add a new "microbatch" incremental strategy: |
Is there an existing feature request for this?
Describe the Feature
Since 0.10.1, dbt-utils has contained a Redshift-specific macro called insert_by_period to do incremental loads.
Recently, there's been some interest from the community in modernising this:
__PERIOD_FILTER__ block.
Although we could leave it in utils, I think it's time for it to make the jump to being a first-class materialization.
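For readers who haven't used it, the general idea behind a period-by-period load can be sketched as follows: the model's SQL carries a `__PERIOD_FILTER__` placeholder, and the load runs once per period with the placeholder swapped for a date filter. This is a rough Python illustration under that assumption, not the macro's actual Jinja code; the `events` table, the `created_at` column, and both helper functions are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

# Hypothetical model SQL containing the placeholder.
SQL_TEMPLATE = "select * from events where __PERIOD_FILTER__"


def day_periods(start: date, stop: date, days: int) -> Iterator[Tuple[date, date]]:
    """Yield half-open [period_start, period_end) windows covering [start, stop)."""
    if days <= 0:
        raise ValueError("days must be positive")
    current = start
    while current < stop:
        period_end = min(current + timedelta(days=days), stop)
        yield current, period_end
        current = period_end


def render_period(template: str, column: str, lo: date, hi: date) -> str:
    """Replace the placeholder with a filter selecting a single period."""
    period_filter = f"{column} >= '{lo.isoformat()}' and {column} < '{hi.isoformat()}'"
    return template.replace("__PERIOD_FILTER__", period_filter)


statements = [
    render_period(SQL_TEMPLATE, "created_at", lo, hi)
    for lo, hi in day_periods(date(2021, 1, 1), date(2021, 1, 20), days=7)
]
for statement in statements:
    print(statement)
```

Each rendered statement would feed one insert, so a load that fails partway can resume from the last completed period instead of starting over.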
Describe alternatives you've considered
I'd be happiest if this could just be extra parameters on the existing incremental materialisation. IDK if that's actually possible; if not, then it would need to be the fifth inbuilt option and I'd probably swing closer to leaving it in utils because it's not as cut-and-dried and so adding the extra friction maybe isn't so bad.
Who will this benefit?
Are you interested in contributing this feature?
Sure am, despite how scary materialization code is
Anything else?
No response