Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BigQuery] at_least_one test of nested struct field gives error since 1.1.0 #823

Closed
1 of 5 tasks
nicklubbers opened this issue Aug 11, 2023 · 4 comments
Closed
1 of 5 tasks
Labels
bug Something isn't working Stale triage

Comments

@nicklubbers
Copy link

nicklubbers commented Aug 11, 2023

Describe the bug

When I want to test at_least_one on a field_a within a struct field_struct, the test fails with BigQuery error:

unrecognized name `field_struct`

This occurs since version 1.1.0 and likely because of this commit.

When pasting the compiled test query into the BigQuery editor it shows:

select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (

select *
from (
    with pruned_rows as (
      select
        field_struct.field_a
      from `my_project`.`my_dataset`.`my_table`
      where field_struct.field_a is not null
      limit 1
    )
    select

      count(field_struct.field_a) as filler_column <-- BigQuery error: unrecognized name `field_struct`

    from pruned_rows

    having count(field_struct.field_a) = 0

) validation_errors

    ) dbt_internal_test

A possible solution would be to use an alias for the nested field:

select
        field_struct.field_a AS field_struct__field_a
 from `my_project`.`my_dataset`.`my_table`

and use that alias field_struct__field_a in the subsequent steps.

EDIT: you could also just use a default alias at_least_one_field:

select
        field_struct.field_a AS at_least_one_field
 from `my_project`.`my_dataset`.`my_table`

Steps to reproduce

Create a DBT model (table) containing a struct field:

STRUCT(
  "foo" AS field_a
) AS field_struct

Add dbt_utils.at_least_one test to the nested field_a.

schema yml:

- name: field_struct.field_a
   tests:
     - dbt_utils.at_least_one

Expected results

I would expect the test to succeed as it did with dbt_utils < 1.1.0

Actual results

Screenshots and log output

System information

The contents of your packages.yml file:

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake
  • other (specify: ____________)

The output of dbt --version:

Core:
  - installed: 1.5.4
  - latest:    1.6.0 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 1.5.5 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Additional context

no

Are you interested in contributing the fix?

currently not

@nicklubbers nicklubbers added bug Something isn't working triage labels Aug 11, 2023
@vinit2107
Copy link

I want to contribute to this, but before I proceed I have a question.
If I understand the bug correctly, the reason the test fails is that once you parse the RECORD datatype in BigQuery, the resultant column is the actual field in the RECORD column(field_struct.field1 -> field1).
The question I have regarding adding aliases is this would need to be done for other macros as well which take column name as a parameter and use CTEs. Along with column names, I would also need to rename the group by columns as someone can pass "field_struct.field1" as a group by column.
Please let me know if I'm on the right track.

@nicklubbers
Copy link
Author

nicklubbers commented Sep 20, 2023

Thanks @vinit2107 !

column(field_struct.field1 -> field1) -- that is correct

Along with column names, I would also need to rename the group by columns as someone can pass "field_struct.field1" as a group by column. -- correct. For any test the parameter column name could contain one or more dots like field_struct.field1.

WITH DATA AS (
  
  SELECT
    STRUCT(
      STRUCT(
        "John Doe" AS name
      ) AS more_nested
    ) AS my_struct 

)

SELECT my_struct.more_nested.name -- `my_struct.more_nested.name` as a whole would be the input column name
FROM data

results in (in json format):

[{
  "name": "John Doe"
}]

Thus the my_struct.more_nested. part is not included in the result.

For this particular test a solution would be:

    with pruned_rows as (
      select
        field_struct.field_a AS at_least_one_field
      from `my_project`.`my_dataset`.`my_table`
      where field_struct.field_a is not null
      limit 1
    )
    select

      count(at_least_one_field) as filler_column <-- BigQuery error: unrecognized name `field_struct`

    from pruned_rows

    having count(at_least_one_field) = 0
    

After the pruned_rows CTE you will have a field called at_least_one_field you can use in the next query.

Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Mar 19, 2024
Copy link

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working Stale triage
Projects
None yet
Development

No branches or pull requests

2 participants