Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] During persist_docs, col names in the schema yml file should always be uppercased if we don't have a quote config on the column #1156

Open
2 tasks done
jeremyyeo opened this issue Aug 2, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Aug 2, 2024

Is this a new bug in dbt-snowflake?

  • I believe this is a new bug in dbt-snowflake
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

If users have columns.name key in a mixed case - dbt does not correctly match the uppercased column name that comes from a describe table ... with what is in the schema.yml file - therefor, column descriptions aren't added to the column.

Expected Behavior

During the "matching process" - we should always uppercase the columns.name key no matter the case if the user doesn't specify the quote config on the column (https://docs.getdbt.com/reference/resource-properties/quote).

Steps To Reproduce

Project setup:

# dbt_project.yml
name: my_dbt_project
profile: all
version: "1.0.0"

models:
   my_dbt_project:
      +materialized: table

# models/schema.yml
models:
  - name: foo
    columns:
      - name: mixed_Case
        description: Col description
-- models/foo.sql
{{ config(persist_docs={"columns": true}) }}
select 1 as mixed_Case

Run

$ dbt --debug run
22:05:24  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
22:05:24  Re-using an available connection from the pool (formerly list_development_jyeo_dbt_jyeo, now model.my_dbt_project.foo)
22:05:24  Began compiling node model.my_dbt_project.foo
22:05:24  Writing injected SQL for node "model.my_dbt_project.foo"
22:05:24  Began executing node model.my_dbt_project.foo
22:05:24  Writing runtime sql for node "model.my_dbt_project.foo"
22:05:24  Using snowflake connection "model.my_dbt_project.foo"
22:05:24  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as mixed_Case
        );
22:05:24  Opening a new connection, currently in state closed
22:05:26  SQL status: SUCCESS 1 in 2.0 seconds
22:05:26  Using snowflake connection "model.my_dbt_project.foo"
22:05:26  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
22:05:26  SQL status: SUCCESS 1 in 0.0 seconds
22:05:26  Using snowflake connection "model.my_dbt_project.foo"
22:05:26  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter
    
        "MIXED_CASE" COMMENT $$$$;
22:05:26  SQL status: SUCCESS 1 in 0.0 seconds
22:05:26  On model.my_dbt_project.foo: Close
22:05:27  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '102c02e3-7b4f-48a1-91be-ac0bb00f1129', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x13f491f50>]}
22:05:27  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.83s]

^ What happens here....

  1. We created the table column as select 1 as mixed_Case - and what happens is Snowflake has this column as a uppercased MIXED_CASE (this is the default behaviour of Snowflake).
  2. We run a describe table and the describe has returned that a column MIXED_CASE exist.
  3. This uppercase MIXED_CASE that comes from the describe, does not match what is in the schema.yml which has that as mixed_Case.
  4. "MIXED_CASE" != "mixed_Case" so we don't find a matching column description.
  5. Because there is no matching column description then the comment added is empty.

What can users do about this

(A) They can quote the column name in the model

-- models/foo.sql
{{ config(persist_docs={"columns": true}) }}
select 1 as "mixed_Case"
# models/schema.yml
models:
  - name: foo
    columns:
      - name: mixed_Case
        description: Col description
$ dbt --debug run

22:10:57  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as "mixed_Case"
        );
22:10:57  Opening a new connection, currently in state closed
22:10:59  SQL status: SUCCESS 1 in 2.0 seconds
22:10:59  Using snowflake connection "model.my_dbt_project.foo"
22:10:59  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
22:10:59  SQL status: SUCCESS 1 in 0.0 seconds
22:10:59  Using snowflake connection "model.my_dbt_project.foo"
22:10:59  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter
    
        "mixed_Case" COMMENT $$Col description$$;
22:10:59  SQL status: SUCCESS 1 in 0.0 seconds

Here Snowflake has created a literal mixed case column name mixed_Case - the describe returns this exactly and we get a match in the schema.yml file - "mixed_Case" = "mixed_Case" - therefor, we can apply the column description appropriately.

Cons

Their table will have a literal mixed case column name which they may not want.

(B) They can uppercase / lowercase the column name in the schema yaml

-- models/foo.sql
{{ config(persist_docs={"columns": true}) }}
select 1 as mixed_Case
# models/schema.yml
models:
  - name: foo
    columns:
      - name: mixed_case # This can also be 'MIXED_CASE' (all uppercase) - the net effect is the same.
        description: Col description
$ dbt --debug run
22:15:28  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as mixed_Case
        );
22:15:28  Opening a new connection, currently in state closed
22:15:30  SQL status: SUCCESS 1 in 2.0 seconds
22:15:30  Using snowflake connection "model.my_dbt_project.foo"
22:15:30  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
22:15:30  SQL status: SUCCESS 1 in 0.0 seconds
22:15:30  Using snowflake connection "model.my_dbt_project.foo"
22:15:30  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter
    
        "MIXED_CASE" COMMENT $$Col description$$;
22:15:31  SQL status: SUCCESS 1 in 0.0 seconds

For some reason, if the columns.name key is either:

  • All UPPERCASED.
  • All lowercased.

dbt has some internal methods to match this correctly.

Cons

User have to have a different column casing style between their model code and their schema yml files.

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11
- dbt-core: 1.8.4
- dbt-snowflake: 1.8.3

Additional Context

If users don't specify the quote column config (https://docs.getdbt.com/reference/resource-properties/quote) - I propose we always uppercase the columns.name key like (B) above - even if the user is going to specify a mixed case column name in the schema yml file.

@jeremyyeo jeremyyeo added bug Something isn't working triage labels Aug 2, 2024
@jeremyyeo
Copy link
Contributor Author

jeremyyeo commented Aug 2, 2024

{% if (column_name|upper in column_dict) -%}
{% set matched_column = column_name|upper -%}
{% elif (column_name|lower in column_dict) -%}
{% set matched_column = column_name|lower -%}
{% elif (column_name in column_dict) -%}
{% set matched_column = column_name -%}

^ Ah ok so here is why Case (B) works... we both upper and lower the column names and match it in the column dict... therefore always lowercased or UPPERCASED would match something for sure....


In the exact scenario above (mixed_Case in both model and yaml and not-quoted)... we can override the built in macro:

-- macros/get_column_comment_sql.sql
{% macro get_column_comment_sql(column_name, column_dict) -%}
  {#/* Create new uppercased key column dict. */#}
  {% set fixed_case_column_dict = {} %}
  {% for k, v in column_dict.items() %}
    {% do fixed_case_column_dict.update({k.upper(): v}) %}
  {% endfor %}

  {% if (column_name in fixed_case_column_dict) %}
    {% set matched_column = column_name -%}
  {% else %}
    {% set matched_column = None -%}
  {% endif %}

  {% if matched_column -%}
    {{ adapter.quote(column_name) }} COMMENT $${{ fixed_case_column_dict[matched_column]['description'] | replace('$', '[$]') }}$$
  {%- else -%}
    {{ adapter.quote(column_name) }} COMMENT $$$$
  {%- endif -%}
{% endmacro %}

Then:

$ dbt --debug run
01:31:36  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
create or replace transient table development_jyeo.dbt_jyeo.foo
         as
        (
select 1 as mixed_Case
        );
01:31:36  Opening a new connection, currently in state closed
01:31:37  SQL status: SUCCESS 1 in 2.0 seconds
01:31:38  Using snowflake connection "model.my_dbt_project.foo"
01:31:38  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
describe table development_jyeo.dbt_jyeo.foo
01:31:38  SQL status: SUCCESS 1 in 0.0 seconds
01:31:38  Using snowflake connection "model.my_dbt_project.foo"
01:31:38  On model.my_dbt_project.foo: /* {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "all", "target_name": "sf", "node_id": "model.my_dbt_project.foo"} */
alter table development_jyeo.dbt_jyeo.foo alter

  "MIXED_CASE" COMMENT $$Col description$$;
01:31:38  SQL status: SUCCESS 1 in 0.0 seconds
01:31:38  On model.my_dbt_project.foo: Close
01:31:38  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': 'e30c8d1b-5b26-4c3e-bc87-82e19958136a', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x13938f0d0>]}
01:31:38  1 of 1 OK created sql table model dbt_jyeo.foo ................................. [SUCCESS 1 in 2.78s]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants