Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature]: Ensure that apostrophes in all json objects are correctly handled #316

Open
3 tasks
glsdown opened this issue May 12, 2023 · 3 comments
Open
3 tasks
Labels
enhancement New feature or request help wanted Extra attention is needed quality of life A general improvement, that isn't a new feature or a bug

Comments

@glsdown
Copy link
Contributor

glsdown commented May 12, 2023

Overview

There are several issues that arise about apostrophes not being escaped correctly in various fields. These are being fixed as they come up, but it would be good to have a macro to fix them for all JSON objects across all databases, as the way they are handled is currently quite ad-hoc.

What problem would this solve?

  • Fewer issues raised
  • Greater consistency and extensibility

Would you be willing to contribute?

  • Yes - I'm on it!
  • Yes, but I'd need help getting started.
  • No
@glsdown glsdown added enhancement New feature or request help wanted Extra attention is needed quality of life A general improvement, that isn't a new feature or a bug labels May 12, 2023
@jeremyyeo
Copy link

jeremyyeo commented May 23, 2023

💯 this.

Just added another scenario:

-- models/foo.sql
select * from this.doesnt.exist

Note: purposely using a database/object that doesn't exist.

The following

$ dbt run -s foo
03:44:06  Running with dbt=1.3.4
03:44:08  Found 35 models, 0 tests, 0 snapshots, 0 analyses, 416 macros, 1 operation, 0 seed files, 0 sources, 0 exposures, 0 metrics
03:44:08  
03:44:13  Concurrency: 1 threads (target='default')
03:44:13  
03:44:13  1 of 1 START sql table model dbt_jyeo.foo ...................................... [RUN]
03:44:15  1 of 1 ERROR creating sql table model dbt_jyeo.foo ............................. [ERROR in 2.36s]
03:44:15  
03:44:15  Running 1 on-run-end hook
03:44:15  Uploading model executions
03:44:17  Database error while running on-run-end
03:44:18  Encountered an error:
Database Error
  001003 (42000): SQL compilation error:
  syntax error line 49 at position 12 unexpected 'THIS'.

Ends up generating invalid SQL (inspect debug logs):

�[0m03:44:15.968689 [debug] [MainThread]: On master: /* {"app": "dbt", "dbt_version": "1.3.4", "profile_name": "snowflake", "target_name": "default", "connection_name": "master"} */
insert into development.dbt_jyeo.model_executions
    
        
        
        select
            $1,
            $2,
            $3,
            $4,
            $5,
            $6,
            $7,
            $8,
            $9,
            $10,
            $11,
            $12,
            $13,
            $14,
            $15
        from values
        (
                'e114f958-de54-442c-868e-db67195d5a22', 
                'model.my_dbt_project.foo', 
                '2023-05-23 03:44:05.021379+00:00', 

                
                
                    
                
                'False', 

                'Thread-1 (worker)', 
                'error', 

                
                    null, 
                    null, 
                

                2.3553578853607178, 
                try_cast('' as int), 
                'table', 
                'dbt_jyeo', 
                'foo', 
                'foo', 
                'Database Error in model foo (models/foo.sql)
  002003 (02000): SQL compilation error:
  Database 'THIS' does not exist or not authorized.
  compiled Code at target/run/my_dbt_project/models/foo.sql' 
            )
�[0m03:44:15.969233 [debug] [MainThread]: Opening a new connection, currently in state init

^ Because of the quotes (just pulling out offending line):

'Database Error in model foo (models/foo.sql) 002003 (02000): SQL compilation error: Database 'THIS' does not exist or not authorized. compiled Code at target/run/my_dbt_project/models/foo.sql'
                                                                                              ^ This single quote terminates the string.

^ The above is actually in the model.message key so overriding snowflake__get_model_executions_dml_sql like so fixes it on v2.3.0 of this package:

-- macros/dbt_artifacts_overrides.sql
{% macro snowflake__get_model_executions_dml_sql(models) -%}
    {% if models != [] %}
        {% set model_execution_values %}
        select
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(1) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(2) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(3) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(4) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(5) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(6) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(7) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(8) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(9) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(10) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(11) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(12) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(13) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(14) }},
            {{ adapter.dispatch('column_identifier', 'dbt_artifacts')(15) }}
        from values
        {% for model in models -%}
            (
                '{{ invocation_id }}', {# command_invocation_id #}
                '{{ model.node.unique_id }}', {# node_id #}
                '{{ run_started_at }}', {# run_started_at #}

                {% set config_full_refresh = model.node.config.full_refresh %}
                {% if config_full_refresh is none %}
                    {% set config_full_refresh = flags.FULL_REFRESH %}
                {% endif %}
                '{{ config_full_refresh }}', {# was_full_refresh #}

                '{{ model.thread_id }}', {# thread_id #}
                '{{ model.status }}', {# status #}

                {% if model.timing != [] %}
                    {% for stage in model.timing if stage.name == "compile" %}
                        {% if loop.length == 0 %}
                            null, {# compile_started_at #}
                        {% else %}
                            '{{ stage.started_at }}', {# compile_started_at #}
                        {% endif %}
                    {% endfor %}

                    {% for stage in model.timing if stage.name == "execute" %}
                        {% if loop.length == 0 %}
                            null, {# query_completed_at #}
                        {% else %}
                            '{{ stage.completed_at }}', {# query_completed_at #}
                        {% endif %}
                    {% endfor %}
                {% else %}
                    null, {# compile_started_at #}
                    null, {# query_completed_at #}
                {% endif %}

                {{ model.execution_time }}, {# total_node_runtime #}
                try_cast('{{ model.adapter_response.rows_affected }}' as int), {# rows_affected #}
                '{{ model.node.config.materialized }}', {# materialization #}
                '{{ model.node.schema }}', {# schema #}
                '{{ model.node.name }}', {# name #}
                '{{ model.node.alias }}', {# alias #}
                '{{ model.message | replace("\\", "\\\\") | replace("'", "\\'") | replace('"', '\\"') }}' {# message #}
            )
            {%- if not loop.last %},{%- endif %}
        {%- endfor %}
        {% endset %}
        {{ model_execution_values }}
    {% else %}
        {{ return("") }}
    {% endif %}
{% endmacro -%}

^ I just copied whatever the replacement logic from the current main being applied to the adapter response and applied it to model.message:

'{{ tojson(model.adapter_response) | replace("\\", "\\\\") | replace("'", "\\'") | replace('"', '\\"') }}' {# adapter_response #}

And the resulting now valid SQL:

insert into development.dbt_jyeo.model_executions
    
        
        
        select
            $1,
            $2,
            $3,
            $4,
            $5,
            $6,
            $7,
            $8,
            $9,
            $10,
            $11,
            $12,
            $13,
            $14,
            $15
        from values
        (
                '5420e178-c4bd-4e38-84cf-70d12f3a3113', 
                'model.my_dbt_project.foo', 
                '2023-05-23 04:04:29.775636+00:00', 

                
                
                    
                
                'False', 

                'Thread-1 (worker)', 
                'error', 

                
                    null, 
                    null, 
                

                2.2468910217285156, 
                try_cast('' as int), 
                'table', 
                'dbt_jyeo', 
                'foo', 
                'foo', 
                'Database Error in model foo (models/foo.sql)
  002003 (02000): SQL compilation error:
  Database \'THIS\' does not exist or not authorized.
  compiled Code at target/run/my_dbt_project/models/foo.sql' 
            )

@jared-rimmer
Copy link
Member

This should be resolved by the merge shown above. We will open further PRs if there are any other fields that come up.

@adamantike
Copy link
Contributor

I have found the same issue, when DBT variables include single quotes. This is the debug log for the query:

insert into DBT.ARTIFACTS.invocations
    select
        $1,
        $2,
        $3,
        $4,
        $5,
        $6,
        $7,
        $8,
        $9,
        $10,
        nullif($11, ''),
        nullif($12, ''),
        nullif($13, ''),
        nullif($14, ''),
        nullif($15, ''),
        parse_json($16),
        parse_json($17),
        parse_json($18),
        parse_json($19)
    from values
    (
        '4768f502-9811-4b39-a5fb-ca57b95007e5',
        '1.4.5',
        'proj',
        '2023-06-09 20:32:34.092008+00:00',
        'test',
        'False',
        'test',
        'prod',
        'myschema',
        4,
        '',
        '',
        '',
        '',
        '',
        null,
        null,
        '{"write_json": true, "use_colors": true, "printer_width": 80, "version_check": true, "partial_parse": true, "static_parser": true, "profiles_dir": "/home/mike/dbt/config", "send_anonymous_usage_stats": true, "quiet": false, "no_print": false, "cache_selected_only": false, "target": "prod", "vars": "{airflow_ts: '2023-06-08T05:00:00+00:00'}", "indirect_selection": "eager", "select": ["mytest"], "which": "test", "rpc_method": "test"}',
        '{}'
    )

This is using DBT 1.4.5, and version 2.4.2 of this package!

adamantike added a commit to adamantike/dbt_artifacts that referenced this issue Jun 9, 2023
Related to brooklyn-data#316, this change fixes another occurrence, this time for
`invocations.invocation_args`. As DBT variables can contain single
quotes, they must be escaped.
glsdown pushed a commit that referenced this issue Jun 15, 2023
Related to #316, this change fixes another occurrence, this time for
`invocations.invocation_args`. As DBT variables can contain single
quotes, they must be escaped.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed quality of life A general improvement, that isn't a new feature or a bug
Projects
None yet
Development

No branches or pull requests

4 participants