Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formally decide how the Airbyte Protocol should handle multiple primary keys #31926

Closed
evantahler opened this issue Oct 27, 2023 · 1 comment
Closed
Assignees

Comments

@evantahler
Copy link
Contributor

evantahler commented Oct 27, 2023

This is is an extension of an action item from #p0-primay-keys-cannot-be-null

Today, the Airbyte Protocol allows multiple primary keys. However, it is unclear if this is intended mean that there are 2 record properties which each have the behavior of a PK (unique and non-null) or together they have the property of a primary key (e.g. together they form a unique identifier, and either of them individually can be null or repeated).

This is important to clarify because we had a recent outage where a sources like Facebook Ads and the custom_real-_time _spend stream - it has multiple primary keys.
Screenshot 2023-10-04 at 8 48 21 AM

V2 Destinations assumed that meant that each of these record properties would be required to be non-null, and added an index in the destination to that effect (#30779). But... records coming from the source had at least one of these columns as null, causing sync failures (https://github.com/airbytehq/oncall/issues/3129) looking like Query error: Required field ad_id cannot be null.

What's going on here? Is it that:

  1. There's a bug in the source, and this stream doesn't have this many (or any) primary keys?
  2. Multiple PKs should be treaded as a composite index, e.g. Multi-Column Index in Postgres
  3. Each stream should only be allowed to have one PK, and if the source needs more than one real property to construct this PK for deduplication, it should create a virtual recorord property, e.g. {pk: "${ad_id}+${account_id}+${date_start}"

Once we decide which of the above is the path forward, we should add validations to CAT and the platform. e.g. if we go with route 3, than means it would be correct to assume that any record property marked as a PK can never be null. The platform could fail a sync if a record comes in with a null PK.

@evantahler evantahler added team/platform-move team/destinations Destinations team's backlog labels Oct 27, 2023
@evantahler evantahler changed the title Formally decide how the Airbyte Protocol should handle composite primary keys Formally decide how the Airbyte Protocol should handle multiple primary keys Oct 27, 2023
@evantahler
Copy link
Contributor Author

evantahler commented Oct 31, 2023

After an in-person discussion we decided that the intent of having multiple primary keys is to be a composite primary key, implying that any of them can be null, as long as all PKs are not null (at least one must be non-null).

That means we can validate this in the platform via #31758

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants