[Column-level encoding] Add support for column-level encoding to Redshift adapter #896

Open
3 tasks done
gordonr opened this issue Aug 12, 2024 · 1 comment

gordonr commented Aug 12, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-redshift functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Currently, the Redshift adapter does not support column-level encoding, which Redshift uses to configure per-column compression. Redshift supports this in its CREATE TABLE statement, as documented here. One approach would be to translate a Redshift-specific dbt config setting into the corresponding CREATE TABLE modifications to achieve the desired outcome.
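
To make the request concrete, here is a rough sketch of the translation step described above: a per-column encoding setting rendered into a CREATE TABLE statement with ENCODE clauses. The helper name `render_create_table`, the shape of the column list, and the example table are all hypothetical and not part of dbt-redshift; only the per-column `ENCODE` clause itself is Redshift syntax.

```python
# Hypothetical helper for illustration only; dbt-redshift has no such function.
def render_create_table(table, columns):
    """Render a Redshift CREATE TABLE statement.

    columns: list of (name, data_type, encoding) tuples, where encoding
    is a Redshift compression encoding name or None to omit the clause.
    """
    defs = []
    for name, data_type, encoding in columns:
        col = f"{name} {data_type}"
        if encoding:
            # Column-level compression is set with ENCODE <encoding>.
            col += f" ENCODE {encoding}"
        defs.append(col)
    body = ",\n    ".join(defs)
    return f"CREATE TABLE {table} (\n    {body}\n);"


# Example table and columns are made up for this sketch.
ddl = render_create_table(
    "analytics.events",
    [
        ("event_id", "BIGINT", "az64"),
        ("payload", "VARCHAR(1024)", "zstd"),
        ("created_at", "TIMESTAMP", None),  # no encoding configured
    ],
)
print(ddl)
```

Columns without a configured encoding are rendered without an ENCODE clause, so Redshift would apply its default behavior to them.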

Describe alternatives you've considered

If a dbt model is incremental, it would be possible to manually modify column-level encoding after the initial build of the Redshift table, but those changes would be lost on a full refresh.
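
As a sketch of that manual workaround, the snippet below generates one `ALTER TABLE ... ALTER COLUMN ... ENCODE ...` statement per column, which is how I understand Redshift lets you change an existing column's encoding. The helper `render_alter_encoding` and the table and column names are hypothetical; the statements would have to be re-run by hand after every full refresh, which is the drawback noted above.

```python
# Hypothetical helper for illustration only; not part of dbt or dbt-redshift.
def render_alter_encoding(table, encodings):
    """Return one ALTER statement per column.

    encodings: dict mapping column name -> Redshift encoding name.
    """
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} ENCODE {encoding};"
        for column, encoding in encodings.items()
    ]


# Example table and columns are made up for this sketch.
statements = render_alter_encoding(
    "analytics.events",
    {"event_id": "az64", "payload": "zstd"},
)
for statement in statements:
    print(statement)
```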

Who will this benefit?

This would theoretically benefit all Redshift customers.

Are you interested in contributing this feature?

No response

Anything else?

No response

@TuringND

For what it's worth, one of the benefits of column-level encoding relates to incremental models. Currently, the compression post-hook makes a deep copy of the model.

This strategy, though sound for table materializations, works against incremental materializations because it results in a full copy of the data. When the model's data is large enough, an incremental materialization provides a substantial performance gain, and that gain is lost to the time spent in the deep copy.

I was going to post about this in a separate GitHub issue or create a topic on Discourse, but this feels more appropriate.
