Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cdc): support more metadata columns for MySQL, PG and MongoDB #17051

Merged
merged 15 commits into from
Jun 5, 2024

Conversation

StrikeW
Copy link
Contributor

@StrikeW StrikeW commented May 31, 2024

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support more metadata columns for cdc table

  • database_name, schema_name, table_name for PG and MySQL
  • collection_name for MongoDB

Example:

CREATE TABLE person (
    id int,
    name varchar,
    email_address varchar,
    credit_card varchar,
    city varchar,
    PRIMARY KEY (id)
) INCLUDE TIMESTAMP AS commit_ts
INCLUDE DATABASE_NAME as database_name
INCLUDE SCHEMA_NAME as schema_name
INCLUDE TABLE_NAME as table_name
FROM pg_source TABLE 'public.person';

close #16654

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

Support more metadata columns for cdc table

  • database_name, schema_name, table_name for PG and MySQL
  • collection_name for MongoDB

@StrikeW StrikeW changed the title feat(cdc): (WIP) support database_name and table_name meta column for MySQL, PG and MongoDB feat(cdc): support database_name and table_name meta column for MySQL, PG and MongoDB Jun 2, 2024
@StrikeW StrikeW marked this pull request as ready for review June 2, 2024 15:08
@StrikeW StrikeW changed the title feat(cdc): support database_name and table_name meta column for MySQL, PG and MongoDB feat(cdc): support more metadata columns for MySQL, PG and MongoDB Jun 2, 2024
@lmatz lmatz added the user-facing-changes Contains changes that are visible to users label Jun 3, 2024
@hzxa21 hzxa21 self-requested a review June 3, 2024 09:50
@@ -215,6 +224,10 @@ message AdditionalColumn {
AdditionalColumnHeader header_inner = 5;
AdditionalColumnFilename filename = 6;
AdditionalColumnHeaders headers = 7;
AdditionalDatabaseName database_name = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they're all empty messages, can we use the same message type instead? Or directly use google.protobuf.Empty?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, prost has some trouble with google well-known types.
& I use message structs for future extensibility on the column def, such as

message AdditionalColumnHeader {
  string inner_field = 1;
  data.DataType data_type = 2;
}

we won't want to handle the changes inside the catalog in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

message AdditionalColumn {
  oneof column_type {
    AdditionalColumnKey key = 1;
    AdditionalColumnTimestamp timestamp = 2;
    AdditionalColumnPartition partition = 3;
    AdditionalColumnOffset offset = 4;
    AdditionalColumnHeader header_inner = 5;
    AdditionalColumnFilename filename = 6;
    AdditionalColumnHeaders headers = 7;
  }
}

I think we need different message type for the oneof.

Copy link
Contributor

@tabVersion tabVersion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the impl LGTM, just to confirm cdc source does not need the additional column to maintain its states so no need for compatible code?

@StrikeW
Copy link
Contributor Author

StrikeW commented Jun 4, 2024

the impl LGTM, just to confirm cdc source does not need the additional column to maintain its states so no need for compatible code?

No need compatible code for cdc sources and tables defined in previous version. User need to recreate their source or table to get the metadata column.

Copy link
Collaborator

@hzxa21 hzxa21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LGTM

src/connector/src/parser/mod.rs Outdated Show resolved Hide resolved
@StrikeW StrikeW enabled auto-merge June 5, 2024 06:07
@StrikeW StrikeW added this pull request to the merge queue Jun 5, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jun 5, 2024
@StrikeW StrikeW enabled auto-merge June 5, 2024 07:10
@StrikeW StrikeW added this pull request to the merge queue Jun 5, 2024
Merged via the queue into main with commit e4f1c48 Jun 5, 2024
30 of 31 checks passed
@StrikeW StrikeW deleted the siyuan/cdc-metadata-dbname branch June 5, 2024 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

additional metadata when CDC via Debezium
5 participants