Some ETLs and views may be silently unioning data incorrectly #7461

@data-sync-user

Description

While investigating an error related to unioning ping data, [~accountid:6047cd5cd7f56e0071965b2d] noticed that the type and enrollment columns in the ping_info.experiments[].value.extra struct appear in different orders in different pings. When such pings are unioned as-is, BigQuery won’t complain because the column types are compatible, so data could silently end up in the wrong column of the union output for some pings.
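To make the failure mode concrete, here is a toy Python model of a positional union. It is not BigQuery itself. It only mimics the behavior described above, where values are matched by position and the output takes its field names from the first input. The field orders and the sample values are assumptions made up for illustration.

```python
# Toy model of a positional union: values are matched by position and the
# output uses the field names of the first input. This is NOT BigQuery; it
# only imitates the behavior described in this issue.

def positional_union(first_fields, first_rows, other_rows):
    """Label every row's values with first_fields, ignoring any other names."""
    return [dict(zip(first_fields, row)) for row in first_rows + other_rows]


# Hypothetical field orders for the `extra` struct in two different pings.
ping_a_fields = ["type", "enrollment"]
ping_a_rows = [("rollout", "enroll-123")]

ping_b_fields = ["enrollment", "type"]  # same fields, swapped order
ping_b_rows = [("enroll-456", "experiment")]

unioned = positional_union(ping_a_fields, ping_a_rows, ping_b_rows)
for row in unioned:
    print(row)
```

Both fields are strings, so nothing fails, but the second row comes out with the enrollment value under type and the type value under enrollment.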

This has been manually worked around in a couple of cases recently (bigquery-etl#6878, bigquery-etl#6887), but there may be other such cases we don’t yet know about.

It’s possible the Schema.generate_compatible_select_expression() method (code) could be used to help with this situation (it’s currently used for unioning pings in the Glean app ping views).
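As a rough sketch of the general idea (not the actual Schema.generate_compatible_select_expression() implementation, whose details are not shown here), each side of the union can rebuild the struct by field name in one agreed order. The helper names, table names, and the flat extra struct below are hypothetical. The real column is nested inside an array, which would need additional handling.

```python
# Hypothetical helpers: build SQL text that selects struct fields by name so
# that every SELECT in the union emits them in the same order.

def reorder_struct_expression(struct_path, target_fields, alias):
    """Return a SQL expression rebuilding a struct with fields in target order."""
    parts = ", ".join(
        f"{struct_path}.`{field}` AS `{field}`" for field in target_fields
    )
    return f"STRUCT({parts}) AS `{alias}`"


def union_query(tables, struct_path, target_fields, alias):
    """Return a UNION ALL query whose branches all use the same field order."""
    expression = reorder_struct_expression(struct_path, target_fields, alias)
    selects = [f"SELECT {expression} FROM `{table}`" for table in tables]
    return "\nUNION ALL\n".join(selects)


# Table names are placeholders, not real datasets.
query = union_query(
    tables=["project.dataset.ping_a", "project.dataset.ping_b"],
    struct_path="extra",
    target_fields=["type", "enrollment"],
    alias="extra",
)
print(query)
```

Because each branch references fields by name, the source order of the struct fields in any individual ping no longer affects which output column a value lands in.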

┆Issue is synchronized with this Jira Bug
