[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #13

Bonnevie · 2023-03-31T12:44:55Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I've set-up a snapshot table in a postgres database, and it has a column named annotation_type that uses a custom enum datatype. This has worked without problems for weeks, but after the table had to be truncated and restored due to a faulty replica setup we encountered the following error when trying t:

12:22:53  Database Error in snapshot annotations_image_snapshot (snapshots/annotations_image_snapshot.sql)
12:22:53    syntax error at or near "USER"
12:22:53    LINE 3: ...ions_image_snapshot" add column "annotation_type" USER-DEFIN...

I cannot find the sql file that the error is in reference to.
the snapshot table was also suddenly lacking the annotation_type column, despite snapshots supposedly never deleting columns?
We are at a development stage where we can still afford to do a full restart of the snapshot, but we'd rather not encounter this in the future.
Our hypothesis is that the custom type is somehow not matched appropriately following regeneration?

Expected Behavior

I would expect snapshots to be robust to no-ops like this, and at least to never incur data loss.
If custom types are unsupported/problematic, it'd be nice to be warned.

Steps To Reproduce

Not easily reproduced, but most likely course:

build source table with custom enum type in postgres
take snapshot
truncate table, then rebuild
take snapshot

Relevant log output

No response

Environment

No response

Which database adapter are you using with dbt?

postgres

Additional Context

No response

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2023-04-05T04:03:46Z

Thanks for opening this @Bonnevie !

I made an attempt at reproducing this, but wasn't able to trigger the same error as you. Could you tweak the simple example below to reproduce what you were seeing?

I am using a local postgres database as described here so you'll need to tweak your database, schema, etc as-needed.

dbt project

models/_sources.yml

version: 2

sources:
  - name: dbt_dbeatty
    database: postgres
    schema: dbt_dbeatty 
    tables:
      - name: person

snapshots/my_snapshot.sql

{% snapshot my_snapshot %}

{{
    config(
      target_database='postgres',
      target_schema='snapshots',
      strategy='check',
      check_cols='all',
      unique_key='name'
    )
}}

select * from {{ source("dbt_dbeatty", "person") }}

{% endsnapshot %}

Steps

Build source table with custom enum type in postgres (using example here)

Using postgres IDE of choice:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

CREATE TABLE dbt_dbeatty.person (
    name text,
    current_mood mood
);
INSERT INTO dbt_dbeatty.person VALUES ('Moe', 'happy');

Take a snapshot:

dbt build

Truncate table, then rebuild

Using postgres IDE of choice again:

truncate TABLE dbt_dbeatty.person;
INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');

Take another snapshot:

dbt build

benorourke · 2023-04-25T14:32:46Z

I'm also experiencing this issue. The macro postgres__get_columns_in_relation, which is called when the snapshot checks for any column changes, selects data_type from the information_schema. For any UDT this will return USER-DEFINED. We can instead conditionally select udt_name to get the name of the enum.

As a temporary fix, I have overridden postgres__get_columns_in_relation in override_default_macros.sql to resolve udt names along with their schema:

{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}
      select
          column_name,
          case
            when (data_type = 'USER-DEFINED') then concat(udt_schema || '.' || udt_name)
            else data_type
          end as data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from {{ relation.information_schema('columns') }}
      where table_name = '{{ relation.identifier }}'
        {% if relation.schema %}
        and table_schema = '{{ relation.schema }}'
        {% endif %}
      order by ordinal_position

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

dbeatty10 · 2023-04-25T17:54:31Z

After attempting again with the following steps, I was able to reproduce the reported issue.

Also confirmed that the solution from @benorourke worked 🏅

Steps

Create a simple table that uses only built-in data types

Using postgres IDE of choice:

CREATE TABLE dbt_dbeatty.person (
    name text
);

INSERT INTO dbt_dbeatty.person VALUES ('Moe');

Take a snapshot:

dbt snapshot

Add a column that uses a custom data type

Build source table with custom enum type in postgres (using example here).

Using postgres IDE of choice again:

CREATE TYPE mood AS ENUM ('sad', 'ok', 'happy');

-- Yeah, a simple ALTER TABLE ... would be better here! But this is sufficient to reproduce
DROP TABLE dbt_dbeatty.person;
CREATE TABLE dbt_dbeatty.person (
    name text,
    current_mood mood
);

INSERT INTO dbt_dbeatty.person VALUES ('Curly', 'ok');

Take another snapshot:

dbt snapshot

💥

Override the postgres__get_columns_in_relation macro

Copy-paste the solution described here (untested) or here into macros/overrides.sql

Retry taking another snapshot:

dbt snapshot

✅

aBBDnGus · 2023-09-06T09:51:17Z

I have a similar bug, which relates to this:
If I have an incremental table with option on_schema_change='sync_all_columns'. If I know add a column with user-defined type to the incremental model, the same error occurs.
It can be fixed with the provided solution.

Environment:
PostgreSQL, dbt 1.6.1

dbeatty10 · 2023-09-28T03:13:59Z

Here is an alternative override for postgres__get_columns_in_relation that might be better (but I didn't test it out with this issue):

{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}

    select
      col.attname as column_name,

      -- Two options for formatting the data type:
      -- 1) do not include the type modifier (e.g., `geometry`)
      -- 2) include the type modifier when appliable (e.g., `geometry(Point,4326)`)

      -- Option 1:
      pg_catalog.format_type(col.atttypid, null) as data_type,

      -- Option 2:
      -- pg_catalog.format_type(col.atttypid, col.atttypmod) as data_type,

      null as character_maximum_length,
      null as numeric_precision,
      null as numeric_scale

    from pg_catalog.pg_namespace sch
    join pg_catalog.pg_class tbl on tbl.relnamespace = sch.oid
    join pg_catalog.pg_attribute col on col.attrelid = tbl.oid
    left outer join pg_catalog.pg_description tbl_desc on (tbl_desc.objoid = tbl.oid and tbl_desc.objsubid = 0)
    left outer join pg_catalog.pg_description col_desc on (col_desc.objoid = tbl.oid and col_desc.objsubid = col.attnum)

    where 1=1
      and not pg_is_other_temp_schema(sch.oid) -- not a temporary schema belonging to another session
      and tbl.relpersistence in ('p', 'u') -- [p]ermanent table or [u]nlogged table. Exclude [t]emporary tables
      and tbl.relkind in ('r', 'v', 'f', 'p') -- o[r]dinary table, [v]iew, [f]oreign table, [p]artitioned table. Other values are [i]ndex, [S]equence, [c]omposite type, [t]OAST table, [m]aterialized view
      and col.attnum > 0 -- negative numbers are used for system columns such as oid
      and not col.attisdropped -- column has not been dropped

      and tbl.relname = '{{ relation.identifier }}'
        {% if relation.schema %}
        and sch.nspname = '{{ relation.schema }}'
        {% endif %}

    order by
        sch.nspname,
        tbl.relname,
        col.attnum

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

github-actions · 2024-08-12T01:54:37Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

magno32 · 2024-08-19T14:38:37Z

This issue also exists in unit tests. The workaround provided by @dbeatty10 gets us past the USER-DEFINED cast in the generated SQL for the unit-test, but any default geometry values given in the input/expect rows are wrapped as strings. I'm guessing there is another macro somewhere that would fix that.

dbeatty10 · 2024-08-19T17:06:31Z

@magno32 could you please open an issue here for using the geometry data type within unit tests?

Bonnevie added bug Something isn't working triage labels Mar 31, 2023

github-actions bot changed the title ~~[Bug] snapshot with custom datatypes breaks when source table is regenerated~~ [CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated Mar 31, 2023

dbeatty10 added awaiting_response and removed triage labels Apr 5, 2023

github-actions bot added triage and removed awaiting_response labels Apr 25, 2023

benorourke mentioned this issue Apr 25, 2023

[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated dbt-labs/dbt-core#7459

Closed

6 tasks

dbeatty10 removed the triage label Apr 25, 2023

mikealfare transferred this issue from dbt-labs/dbt-core Feb 13, 2024

dbeatty10 added the snapshots Issues related to dbt's snapshot functionality label Feb 13, 2024

github-actions bot added the Stale label Aug 12, 2024

dbeatty10 removed the Stale label Aug 19, 2024

magno32 mentioned this issue Aug 20, 2024

[FEATURE] Support geometry data type for Unit Tests #142

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #13

[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #13

Bonnevie commented Mar 31, 2023

dbeatty10 commented Apr 5, 2023

benorourke commented Apr 25, 2023 •

edited

Loading

dbeatty10 commented Apr 25, 2023 •

edited

Loading

aBBDnGus commented Sep 6, 2023

dbeatty10 commented Sep 28, 2023

github-actions bot commented Aug 12, 2024

magno32 commented Aug 19, 2024

dbeatty10 commented Aug 19, 2024

[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #13

[CT-2343] [Bug] snapshot with custom datatypes breaks when source table is regenerated #13

Comments

Bonnevie commented Mar 31, 2023

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

dbeatty10 commented Apr 5, 2023

dbt project

Steps

benorourke commented Apr 25, 2023 • edited Loading

dbeatty10 commented Apr 25, 2023 • edited Loading

Steps

aBBDnGus commented Sep 6, 2023

dbeatty10 commented Sep 28, 2023

github-actions bot commented Aug 12, 2024

magno32 commented Aug 19, 2024

dbeatty10 commented Aug 19, 2024

benorourke commented Apr 25, 2023 •

edited

Loading

dbeatty10 commented Apr 25, 2023 •

edited

Loading