Today, the Airbyte Protocol allows multiple primary keys. However, it is unclear whether this is intended to mean that there are 2 record properties which each behave as a PK (unique and non-null), or that together they have the properties of a primary key (i.e. together they form a unique identifier, while either of them individually can be null or repeated).
This is important to clarify because we recently had an outage involving sources like Facebook Ads, whose custom_real-_time _spend stream has multiple primary keys.
V2 Destinations interpreted this to mean that each of these record properties must be non-null, and added an index in the destination to that effect (#30779). However, records coming from the source had at least one of these columns set to null, causing sync failures (https://github.com/airbytehq/oncall/issues/3129) with errors like: Query error: Required field ad_id cannot be null.
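To make the two readings concrete, here is a minimal Python sketch that checks a batch of records under each interpretation. The function names and sample records are hypothetical (the field names are borrowed from the example in this issue). Under the first reading, a single null ad_id invalidates the whole batch, which is the kind of failure the V2 destinations ran into; under the second, only the combined tuple has to be unique.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def valid_as_independent_keys(records: List[Record], pk_fields: List[str]) -> bool:
    """Reading 1 (hypothetical helper): every listed field is its own primary
    key, so each field must be non-null and unique across all records."""
    for field in pk_fields:
        values = [r.get(field) for r in records]
        if any(v is None for v in values):
            return False
        # Values are assumed hashable (strings, numbers) for this sketch.
        if len(set(values)) != len(values):
            return False
    return True


def valid_as_composite_key(records: List[Record], pk_fields: List[str]) -> bool:
    """Reading 2 (hypothetical helper): the fields together form one key.
    Individual fields may repeat or be null; only the tuple must be unique."""
    keys = [tuple(r.get(f) for f in pk_fields) for r in records]
    return len(set(keys)) == len(keys)


# Hypothetical sample records: one has a null ad_id, and account_id repeats.
PK = ["ad_id", "account_id", "date_start"]
sample = [
    {"ad_id": "1", "account_id": "a", "date_start": "2023-10-01"},
    {"ad_id": None, "account_id": "a", "date_start": "2023-10-01"},
    {"ad_id": "1", "account_id": "a", "date_start": "2023-10-02"},
]
print(valid_as_independent_keys(sample, PK))  # False
print(valid_as_composite_key(sample, PK))     # True
```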
What's going on here? Is it that:
1. There's a bug in the source, and this stream doesn't have this many (or any) primary keys?
2. Multiple PKs should be treated as a composite index, e.g. a multi-column index in Postgres?
3. Each stream should only be allowed to have one PK, and if the source needs more than one real property to construct this PK for deduplication, it should create a virtual record property, e.g. {pk: "${ad_id}+${account_id}+${date_start}"}
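Option 3 could look roughly like the following minimal sketch, which builds the synthetic key by joining the source fields with "+", mirroring the template above. The helper names and the pk_name parameter are hypothetical, and rendering a null field as an empty string is an assumption; the issue does not say how nulls should appear in the virtual key.

```python
from typing import Any, Dict, List


def virtual_pk(record: Dict[str, Any], fields: List[str]) -> str:
    """Hypothetical helper: join the given fields with '+', like the
    "${ad_id}+${account_id}+${date_start}" template. Missing or null
    fields are rendered as empty strings (an assumption)."""
    return "+".join("" if record.get(f) is None else str(record[f]) for f in fields)


def with_virtual_pk(record: Dict[str, Any], fields: List[str], pk_name: str = "pk") -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the record with the synthetic key added."""
    out = dict(record)
    out[pk_name] = virtual_pk(record, fields)
    return out


row = {"ad_id": "123", "account_id": "act_1", "date_start": "2023-10-01"}
print(with_virtual_pk(row, ["ad_id", "account_id", "date_start"])["pk"])  # 123+act_1+2023-10-01
```

Plain "+" joining is ambiguous if a value can itself contain "+" (for example "a+b" and "c" versus "a" and "b+c"); encoding the field values as a JSON list would avoid that collision at the cost of a less readable key.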
Once we decide which of the above is the path forward, we should add validations to CAT and the platform. For example, if we go with route 3, it would be correct to assume that any record property marked as a PK can never be null, and the platform could fail a sync if a record comes in with a null PK.
evantahler changed the title from "Formally decide how the Airbyte Protocol should handle composite primary keys" to "Formally decide how the Airbyte Protocol should handle multiple primary keys" on Oct 27, 2023.
After an in-person discussion, we decided that multiple primary keys are intended to form a composite primary key. This implies that any individual PK field can be null, as long as not all of them are null (at least one must be non-null).
That means we can validate this in the platform via #31758.
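A minimal Python sketch of the agreed rule (at least one primary key component must be non-null), assuming the protocol represents a primary key as a list of field paths, where each path is a list of strings. It is not taken from #31758; resolve and check_composite_pk are hypothetical names, and treating a missing field the same as an explicit null is an assumption.

```python
from typing import Any, Dict, List, Optional

FieldPath = List[str]  # e.g. ["ad_id"], or ["nested", "field"] for a nested property


def resolve(record: Dict[str, Any], path: FieldPath) -> Optional[Any]:
    """Follow a field path into a (possibly nested) record; a missing key yields None."""
    value: Any = record
    for key in path:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def check_composite_pk(record: Dict[str, Any], primary_key: List[FieldPath]) -> None:
    """Raise ValueError only if every component of the composite key is null.
    Individual components are allowed to be null."""
    if not primary_key:
        return  # stream has no primary key, so there is nothing to validate
    if all(resolve(record, path) is None for path in primary_key):
        names = [".".join(p) for p in primary_key]
        raise ValueError(f"all primary key fields are null: {names}")


pk = [["ad_id"], ["account_id"], ["date_start"]]
check_composite_pk({"ad_id": None, "account_id": "act_1", "date_start": None}, pk)  # passes
```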
This is an extension of an action item from #p0-primay-keys-cannot-be-null.