Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate Source Yaml doesn't preserve capitalization on schema/table names, creates unusable yaml for BQ (probably others) #140

Closed
1 task done
ryantimjohn opened this issue Sep 19, 2023 · 7 comments
Labels
bug Something isn't working duplicate This issue or pull request already exists

Comments

@ryantimjohn
Copy link

ryantimjohn commented Sep 19, 2023

Describe the bug

Generate Source Yaml doesn't preserve capitalization on schema/table names, creates unusable yaml for BQ (probably others)

Steps to reproduce

Generate source yaml against a table with capitalized schema/table names, with generate a source yaml with lower-case schema and table names.

Expected results

Preserve capitalization, preferable as an true/false in the function call.

Actual results

Screenshot 2023-09-19 at 11 48 41 AM

Screenshot 2023-09-19 at 11 48 54 AM

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/codegen
    version: 0.10.0

Which database are you using dbt with?

  • bigquery

The output of dbt --version:
1.3

The operating system you're using:
Linux (dbt Cloud)

The output of python --version:
dbt Cloud

Are you interested in contributing the fix?

Yes! Point me to the lines of code that do this and I'll happily contribute to a fix!

@ryantimjohn ryantimjohn added bug Something isn't working triage labels Sep 19, 2023
@az-hienas
Copy link

+1 to this for Snowflake with dbt 1.5.0.

If I have a column name in my source called firstName (note the capitalisation) generate_source produces YAML with the column firstname, which in Snowflake will be interpreted as FIRSTNAME since I did not give (nor is there an option to quote the column names for the source YAML).

As a result if I add a not_null test on the firstname column, e.g config

- name: firstname
  data_type: varchar
  description: ""
  tests: 
    - not_null

the test will fail with:
000904 (42000): SQL compilation error: ... invalid identifier 'FIRSTNAME'.

That's obviously because the Snowflake column is firstName and neither selecting FIRSTNAME nor firstname will work.

Perhaps by default the generated YAML should be with lower case column names, but have an option to quote things / preserve capitalisation , e.g. for my case would produce

- name: firstName
  data_type: varchar
  description: ""
  quote: true

Then I can manually add my not null test to the above config and everything will work.

I'm working with a Snowflake source table with around 150 columns in which the capitalisation is all over the place (all lower, all upper, camel case, title case and combinations of the aforementioned) and I can't control the case of the source column names.

In this case codegen doesn't really help a lot since I would have to go through 150 columns and make sure the case is ok and manually add quote: true

@gwenwindflower
Copy link
Contributor

Makes sense that this should be a top-level option you can apply from codegen -- I'm going to be attempting some work to catch this package up based on the open issues -- thanks for sharing this! I'll keep you posted.

@gwenwindflower gwenwindflower self-assigned this Feb 22, 2024
@jenna-jordan
Copy link

Adding my vote that this would be a very useful parameter for have - both for generate_source_yaml and generate_model_yaml. Thank you @gwenwindflower for digging into this!

@gwenwindflower
Copy link
Contributor

there ended up being an open PR to do this, so i'm working with author of that to get these changes in, ideally covering the entire database -> schema -> table -> column chain in sources.

@ronlut
Copy link

ronlut commented Jun 2, 2024

@gwenwindflower Joining the party here - any plan/progress with supporting quote param per column? All our columns in SF are quotes hence we need to add quote: true to all generated source columns
Thanks :)

Currently I am using sed to add that param when generating:
sed -E "s/([[:space:]]+)data_type:/\1quote: true\n\1data_type:/g"

@urkle
Copy link

urkle commented Jul 12, 2024

This also affects the generate_model_yaml when the model file names have mixed-case. It generates model yaml output with all lower-cased table and field names.

@dbeatty10
Copy link
Contributor

There are two different macros discussed here, and they each are covered by other issues:

  • generate_source -- covered by #112
  • generate_model_yaml -- covered by #175

So I'm going to close this as a duplicate of those two.

The former was resolved by #168 and it will be included in the next release of dbt-codegen (already available in a beta: 0.13.0-b1).

The latter is still under consideration.

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 20, 2024
@dbeatty10 dbeatty10 added duplicate This issue or pull request already exists and removed triage labels Nov 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

7 participants