MySQL CDC Plugin #3014
base: main
Initial pass - thanks!
Description("The key to store the last processed binlog position."),
service.NewStringField(fieldFlavor).
    Description("The flavor of MySQL to connect to.").
    Example("mysql"),
What is the alternative? Should this be the default?
The alternative is mariadb, since MariaDB is compatible with MySQL.
But if we release this connector for MySQL deployments only, I can remove this field.
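If the field stays, a minimal sketch of how it could be validated, assuming `mysql` becomes the default and `mariadb` is the only alternative. `resolveFlavor` is a hypothetical helper, not code from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveFlavor (hypothetical) normalizes the flavor field: an empty value
// falls back to "mysql", and only "mysql" or "mariadb" are accepted.
func resolveFlavor(raw string) (string, error) {
	switch f := strings.ToLower(strings.TrimSpace(raw)); f {
	case "":
		return "mysql", nil
	case "mysql", "mariadb":
		return f, nil
	default:
		return "", fmt.Errorf("unsupported flavor %q: expected mysql or mariadb", raw)
	}
}
```

Defaulting this way keeps the field optional for plain MySQL users while still rejecting typos early at config parse time.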
    Description("The flavor of MySQL to connect to.").
    Example("mysql"),
service.NewBoolField(fieldMaxSnapshotParallelTables).
    Description("Int specifies a number of tables to be streamed in parallel when taking a snapshot. If set to true, the connector will stream all tables in parallel. Otherwise, it will stream tables one by one."),
This is declared as a bool field, but the description starts with "Int specifies a number of tables".
service.NewStringField(fieldCheckpointKey).
    Description("The key to store the last processed binlog position."),
Don't we also need a cache field? Then this would be the key within that cache.
I was following the CockroachDB code example, which has only one field for the checkpointer/cache.
    Description("If set to true, the connector will query all the existing data as part of the snapshot process. Otherwise, it will start from the current binlog position."),
service.NewAutoRetryNacksToggleField(),
service.NewIntField(fieldCheckpointLimit).
    Description("The maximum number of messages that can be processed at a given time. Increasing this limit enables parallel processing and batching at the output level. Any given LSN will not be acknowledged unless all messages under that offset are delivered in order to preserve at least once delivery guarantees."),
LSN is a Postgres concept, not a MySQL one; MySQL tracks replication positions as a binlog file and offset (or GTIDs).
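The guarantee the description is after, restated in MySQL terms: a binlog position is only committed once every message read before it has been delivered. A minimal sketch of that ordering rule; `ackTracker` and `binlogPos` are hypothetical names, and the sketch assumes each tracked position is unique.

```go
package main

import "sync"

// binlogPos identifies a point in the binlog: file name plus offset.
type binlogPos struct {
	File string
	Pos  uint32
}

// ackTracker records positions in read order and only advances the
// committed position over a contiguous prefix of acknowledged entries,
// so an out-of-order ack never skips an undelivered message.
type ackTracker struct {
	mu        sync.Mutex
	pending   []binlogPos        // in read order
	acked     map[binlogPos]bool // acknowledged, not yet committable
	committed *binlogPos
}

func newAckTracker() *ackTracker {
	return &ackTracker{acked: map[binlogPos]bool{}}
}

// Track records a position that has been read and handed downstream.
func (t *ackTracker) Track(p binlogPos) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending = append(t.pending, p)
}

// Ack marks p delivered and returns a copy of the highest position that is
// safe to persist, or nil if nothing is committable yet.
func (t *ackTracker) Ack(p binlogPos) *binlogPos {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.acked[p] = true
	for len(t.pending) > 0 && t.acked[t.pending[0]] {
		head := t.pending[0]
		delete(t.acked, head)
		t.pending = t.pending[1:]
		t.committed = &head
	}
	if t.committed == nil {
		return nil
	}
	c := *t.committed
	return &c
}
```

The checkpoint limit then just caps `len(pending)`: the reader stops pulling new events while that many are in flight.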
    return nil, err
}

if streamInput.binLogCache, err = conf.FieldString(fieldCheckpointKey); err != nil {
Based on the docs, this isn't quite right.
What exactly? Again, I'm following the CockroachDB example.
}

// 2. Acquire global read lock (minimizing lock time)
if _, err := s.lockConn.ExecContext(ctx, "FLUSH TABLES WITH READ LOCK"); err != nil {
Is this WITH READ LOCK held only for the duration of this statement, or for the whole connection?
It's held on this connection until we execute UNLOCK TABLES a few lines down.
internal/impl/mysql/snapshot.go
    return nil, fmt.Errorf("failed to start consistent snapshot: %v", err)
}

// 2. Acquire global read lock (minimizing lock time)
Can we add a comment explaining why we need FLUSH TABLES?
internal/impl/mysql/snapshot.go
    return nil, fmt.Errorf("failed to start transaction: %v", err)
}

// Execute START TRANSACTION WITH CONSISTENT SNAPSHOT
This comment isn't helpful. It would be better to explain why we need this transaction option.

func (s *Snapshot) getRowsCount(table string) (int, error) {
    var count int
    if err := s.tx.QueryRowContext(s.ctx, "SELECT COUNT(*) FROM "+table).Scan(&count); err != nil {
Is this actually fast? Should we be querying the table stats instead? https://stackoverflow.com/a/61548683
From what I found, the table stats query is not accurate: it may return a lower number of records, which means we could miss some of the snapshot data.
Even the link you sent says: "Estimate but very performant".
So I'd stick with COUNT(*), but let me know your thoughts.
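If the count is only used to decide how many pages to read, keyset pagination (a swapped-in technique, not what the PR does) avoids needing any count: read pages ordered by the primary key, pass the last key seen as the cursor, and stop when a page returns fewer rows than the limit. A sketch of the query builder; `keysetQuery` and `quoteIdent` are hypothetical, and it also quotes the table name instead of concatenating it raw.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdent wraps a MySQL identifier in backticks, doubling embedded backticks.
func quoteIdent(name string) string {
	return "`" + strings.ReplaceAll(name, "`", "``") + "`"
}

// keysetQuery builds one page query over the primary-key columns pk (in key
// order). Without a cursor it returns the first page; with one, it selects
// rows whose key tuple is strictly greater than the last tuple seen, which
// the caller binds as len(pk) placeholder arguments.
func keysetQuery(table string, pk []string, limit int, hasCursor bool) string {
	cols := make([]string, len(pk))
	marks := make([]string, len(pk))
	for i, c := range pk {
		cols[i] = quoteIdent(c)
		marks[i] = "?"
	}
	colList := strings.Join(cols, ", ")
	var b strings.Builder
	fmt.Fprintf(&b, "SELECT * FROM %s", quoteIdent(table))
	if hasCursor {
		fmt.Fprintf(&b, " WHERE (%s) > (%s)", colList, strings.Join(marks, ", "))
	}
	fmt.Fprintf(&b, " ORDER BY %s LIMIT %d", colList, limit)
	return b.String()
}
```

Usage: `keysetQuery("orders", []string{"tenant", "id"}, 500, true)` yields a row-constructor comparison over both key columns, which is why the primary-key column order matters.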
internal/impl/mysql/snapshot.go
SELECT COLUMN_NAME
FROM information_schema.KEY_COLUMN_USAGE
WHERE TABLE_NAME = '%s' AND CONSTRAINT_NAME = 'PRIMARY';
Will this yield them in the right order?
In my tests, yes, but I'll add ORDER BY ORDINAL_POSITION just to be sure.
Adds support for MySQL CDC using the golang-canal library.
Features supported:
Versions tested: 8.0, 9.0, 9.1