[Feature] Support Data Profiling in dbt #1330

Open
3 tasks done
syou6162 opened this issue Sep 3, 2024 · 4 comments · May be fixed by #1392
Labels
enhancement New feature or request

Comments

@syou6162
Contributor

syou6162 commented Sep 3, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Dataplex data profiling lets you identify common statistical characteristics of the columns in your BigQuery tables. This information helps you to understand and analyze your data more effectively.

You can configure data profiling from the GUI or the API, but if a model uses materialized='table', the data profiling settings are deleted because the table is recreated on each run. If dbt could configure data profiling after the table is created, it would be much easier for dbt users to use the data profiling feature.
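
To make the "API" route concrete, here is a minimal sketch of the request that would have to be re-sent after each table rebuild. It assumes the Dataplex REST `dataScans` create endpoint and its `data.resource` / `dataProfileSpec` fields as I understand them from the public documentation. The function name, the scan ID naming scheme, and the example project and table names are hypothetical. It only builds the URL and JSON body and sends nothing, and it is not how #1392 implements the feature.

```python
import json

# Base URL of the Dataplex REST API (v1).
DATAPLEX_ENDPOINT = "https://dataplex.googleapis.com/v1"


def build_profile_scan_request(project, location, dataset, table,
                               sampling_percent=100.0):
    """Build the URL and JSON body for creating a data profile scan.

    Hypothetical helper: the scan ID scheme below is an illustration,
    not a convention used by dbt-bigquery or Dataplex.
    """
    # Scan IDs are limited to lowercase letters, digits and hyphens,
    # so underscores in dataset/table names are replaced.
    scan_id = f"{dataset}-{table}-profile".replace("_", "-").lower()

    url = (
        f"{DATAPLEX_ENDPOINT}/projects/{project}/locations/{location}"
        f"/dataScans?dataScanId={scan_id}"
    )
    body = {
        # Full resource name of the BigQuery table to profile.
        "data": {
            "resource": (
                f"//bigquery.googleapis.com/projects/{project}"
                f"/datasets/{dataset}/tables/{table}"
            )
        },
        # Profile-specific settings; sampling the full table by default.
        "dataProfileSpec": {"samplingPercent": sampling_percent},
    }
    return url, body


if __name__ == "__main__":
    url, body = build_profile_scan_request(
        "my-project", "us-central1", "analytics", "daily_orders"
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

Since the scan settings are lost when the table is recreated, something like this would need to run after every `dbt run` for the model, which is the manual step the feature request aims to remove.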

Describe alternatives you've considered

No response

Who will this benefit?

People who use BigQuery tables built with dbt. I think this will be a useful feature for data users, especially analysts and business developers, as they can see the statistics for each column without having to write a query.

Are you interested in contributing this feature?

Yes, very much! I'm interested in contributing to dbt, so I plan to send a pull request soon. I think I can do it by referring to the existing implementation that supports BigQuery policy tags.

Anything else?

No response

@syou6162 syou6162 added enhancement New feature or request triage labels Sep 3, 2024
@amychen1776 amychen1776 removed the triage label Sep 5, 2024
@amychen1776

Thank you for opening up this request! At this time, we will be unable to support this functionality but I'm happy to leave this issue open to collect more feedback (and see the community desire for this).

@syou6162 syou6162 linked a pull request Nov 3, 2024 that will close this issue
4 tasks
@syou6162
Contributor Author

syou6162 commented Nov 4, 2024

@amychen1776 I implemented this feature myself at #1392. Could you ask the development team to review my pull request?

@moinuddinmbd

> @amychen1776 I implemented this feature myself at #1392. Could you ask the development team to review my pull request?

Integrating a data profiling scan is an excellent idea; however, initiating it through a Dataplex scan may not be the most effective approach.

@amychen1776

amychen1776 commented Nov 25, 2024

Thank you @syou6162 for your PR - we will take a look at it in rotation and decide whether merging this feature is the right decision.

Please do not tag adapter maintainers directly :)
