[source-MySQL] Can't select incremental sync in CDC-based connection #38659

Closed
1 task done
mrmooon opened this issue May 25, 2024 · 6 comments
Assignees
Labels
area/connectors Connector related issues cdc community connectors/source/mysql team/db-dw-sources Backlog for Database and Data Warehouse Sources team type/bug Something isn't working

Comments

@mrmooon

mrmooon commented May 25, 2024

Connector Name

source-mysql

Connector Version

3.4.1

What step the error happened?

Configuring a new connector

Relevant information

We have a connection from MySQL to S3 using CDC as the update method. However, we can't select the incremental strategy for two particular streams. As I understand it, Airbyte should be able to retrieve the data incrementally based on the metadata of the binary logs, regardless of the structure of the table schema.

Screenshot 2024-05-25 at 11 54 02

Relevant log output

No response

Contribute

  • Yes, I want to contribute
@marcosmarxm
Member

@mrmooon it looks like this is affecting only 2 streams. Are you able to select incremental for the others?
Can you confirm both tables are configured to use CDC in MySQL?

@mrmooon
Author

mrmooon commented May 27, 2024

@mrmooon it looks like this is affecting only 2 streams. Are you able to select incremental for the others? Can you confirm both tables are configured to use CDC in MySQL?

Hey, @marcosmarxm. Yes, we update data incrementally with the rest of the streams (more than 150 tables split across multiple connections).

AFAIK all the table updates are written to the binary logs. The only difference from other streams is that these two streams don't have an id column in the source DB.

@akashkulk
Contributor

akashkulk commented May 28, 2024

Do the two particular streams have a PK defined? A PK is required for a stream to enable CDC
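
For anyone wanting to check this on their own database, here is a minimal sketch. The query relies on MySQL's standard `information_schema` views; the `%s` placeholder assumes a DB-API driver that uses that parameter style, and the helper `tables_without_pk` is a hypothetical name, not part of Airbyte or the connector.

```python
# Hypothetical helper, not part of Airbyte or source-mysql.
# Lists base tables in one schema that have no PRIMARY KEY constraint.
TABLES_WITHOUT_PK_SQL = """
SELECT t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
  ON c.table_schema = t.table_schema
 AND c.table_name = t.table_name
 AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_schema = %s
  AND t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
"""


def tables_without_pk(tables, constraints):
    """Pure-Python equivalent of the query, for already-fetched metadata.

    tables: iterable of table names.
    constraints: iterable of (table_name, constraint_type) pairs.
    """
    with_pk = {name for name, ctype in constraints if ctype == "PRIMARY KEY"}
    return sorted(t for t in tables if t not in with_pk)
```

The pure-Python function is only there so the logic can be checked without a live database; with a real connection you would run the query string with the schema name as its single parameter.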

@mrmooon
Author

mrmooon commented May 29, 2024

Hello @akashkulk! These tables don't have a PK. I understand this is a requirement for the Incremental Sync - Append + Deduped strategy, where the data in the destination is updated to maintain the state of the source system (type 1 SCD). But does incremental append require a PK in the source system to identify changed records? To my understanding, the binlog offset allows you to identify which was the last record ingested. Sorry if this is a trivial question.
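
To illustrate the reasoning in this question, here is a minimal sketch of offset-based incremental append. It is not how the connector is implemented: `ChangeEvent` and `read_incremental` are hypothetical names, the events are in-memory stand-ins for binlog entries, and the offset is assumed to be a `(binlog file, position)` pair whose file names sort lexicographically.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Offset = Tuple[str, int]  # assumed shape: (binlog file name, position)


@dataclass(frozen=True)
class ChangeEvent:
    """In-memory stand-in for one row change read from the binlog."""
    binlog_file: str
    position: int
    row: Dict[str, object]

    @property
    def offset(self) -> Offset:
        return (self.binlog_file, self.position)


def read_incremental(
    events: List[ChangeEvent], saved_offset: Optional[Offset]
) -> Tuple[List[Dict[str, object]], Optional[Offset]]:
    """Return rows strictly after saved_offset, plus the new offset to store.

    No primary key is consulted: ordering comes only from the offset.
    """
    new_events = [
        e for e in events if saved_offset is None or e.offset > saved_offset
    ]
    new_events.sort(key=lambda e: e.offset)
    rows = [e.row for e in new_events]
    new_offset = new_events[-1].offset if new_events else saved_offset
    return rows, new_offset
```

In this sketch two rows with identical contents are both appended, because the offset alone decides what counts as new. That is fine for append, and it is also why a de-duplicating destination needs a key.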

@akashkulk
Contributor

Your assessment is correct here but unfortunately this is not supported by the source at the moment :(

The long of it is that Append vs. De-dup is decided by the destination connector, and selecting a stream with no PK in de-dup mode causes problems. As a result, the source defensively disables incremental syncing for streams without a PK because it cannot determine the final destination.
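
A rough sketch of the trade-off described above, under stated assumptions: the function names are hypothetical, the mode strings are chosen to mirror Airbyte's sync mode names, and none of this is the connector's actual code.

```python
from typing import List, Sequence


def supported_sync_modes(primary_key: Sequence[str], cdc_enabled: bool) -> List[str]:
    """Source-side decision, made without knowing the destination mode.

    Defensive rule from the comment above: a CDC stream without a PK
    is offered full refresh only.
    """
    if cdc_enabled and not primary_key:
        return ["full_refresh"]
    return ["full_refresh", "incremental"]


def destination_mode_is_safe(destination_mode: str, primary_key: Sequence[str]) -> bool:
    """Destination-side view: only de-dup actually needs a key."""
    if destination_mode == "append_dedup":
        return bool(primary_key)
    return True
```

The gap this issue asks about is visible here: `destination_mode_is_safe("append", [])` is true, yet `supported_sync_modes([], cdc_enabled=True)` never offers incremental, because the source applies its rule before the destination mode is known.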

@akashkulk
Contributor

https://github.com/airbytehq/airbyte-internal-issues/issues/8009 to track.

But I don't think we will prioritize a fix for this in the near future :(
