sink(ticdc): Add Support for Multiple MySQL-Compatible Downstream Addresses in TiCDC for High Availability #11527

Open
wants to merge 52 commits into base: master

wlwilliamx
Contributor

@wlwilliamx wlwilliamx commented Aug 27, 2024

What problem does this PR solve?

Issue Number: close #11475

What is changed and how it works?

Description:

This PR introduces a new feature in TiCDC to enhance the fault tolerance of Changefeeds that target MySQL-compatible downstream databases. Previously, TiCDC could connect to only one MySQL-compatible server in a downstream cluster. If that server became unavailable, restoring replication required manual intervention (i.e., recreating the Changefeed).

New Feature:

Automatic Failover for Multiple Downstream Addresses:

  • Background: Many deployments place a load balancer in front of the database cluster for high availability. However, the load balancer can itself become a single point of failure and adds complexity. To simplify deployment and enhance robustness, TiCDC now natively supports specifying multiple MySQL-compatible downstream addresses in the --sink-uri option during Changefeed creation or update.
  • Functionality: When a downstream database server becomes unavailable, TiCDC will automatically attempt to switch to another available server from the list of provided addresses, ensuring the continuity of the Changefeed without needing user intervention. This adds a layer of redundancy, making TiCDC more resilient in environments where load balancers may not be feasible.
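
To illustrate how one sink URI carrying several addresses could be expanded into one URI per downstream server, here is a minimal sketch. The comma-separated address syntax and the helper name `splitSinkURI` are assumptions made for this example. The description above does not spell out the exact URI format, so this is not necessarily how the PR parses it.

```go
package main

import (
	"fmt"
	"strings"
)

// splitSinkURI expands a sink URI whose authority lists several addresses
// into one URI per address. ASSUMPTION: addresses are comma-separated;
// the real syntax accepted by TiCDC may differ.
func splitSinkURI(sinkURI string) ([]string, error) {
	scheme, rest, ok := strings.Cut(sinkURI, "://")
	if !ok {
		return nil, fmt.Errorf("sink URI %q has no scheme", sinkURI)
	}
	// The authority ends at the first '/' or '?'; the tail (path and
	// query parameters) is shared by every generated URI.
	authority, tail := rest, ""
	if i := strings.IndexAny(rest, "/?"); i >= 0 {
		authority, tail = rest[:i], rest[i:]
	}
	// Everything up to the last '@' is the user info, also shared.
	userinfo, hostList := "", authority
	if i := strings.LastIndex(authority, "@"); i >= 0 {
		userinfo, hostList = authority[:i+1], authority[i+1:]
	}
	var uris []string
	for _, addr := range strings.Split(hostList, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
			return nil, fmt.Errorf("sink URI %q contains an empty address", sinkURI)
		}
		uris = append(uris, scheme+"://"+userinfo+addr+tail)
	}
	return uris, nil
}
```

Under that assumption, `splitSinkURI("mysql://root:pw@127.0.0.1:3306,127.0.0.1:3307/?time-zone=UTC")` yields two URIs that differ only in the address. The sketch does not handle a '/' or '?' inside the password; a real parser would need to.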

Key Additions and Changes:

  1. DBConnector Implementation:

    • Purpose: Manages the connection to MySQL-compatible databases and handles automatic reconnection and failover in case of a server failure. It works by rotating through a list of DSNs (Data Source Names) in a round-robin fashion to find an available database server.
    • Code Location: The DBConnector struct and methods have been added in the pkg/sink/mysql package. The core methods include:
      • SwitchToAnAvailableDB: Automatically tries to switch to another available database in the event of a failure.
      • ConfigureDBWhenSwitch: Allows custom configuration logic to be applied when switching to a new connection.
  2. Integration into Existing Components:

    • Replaced all instances of direct MySQL connection logic with the DBConnector. This includes:
      • DDL Sink
      • DML Sink
      • Observer
      • Syncpoint Store
    • By using DBConnector, these components now benefit from automatic failover, making TiCDC more resilient in the face of database outages.
  3. Unit Testing:

    • Thorough unit tests have been added for the DBConnector to ensure it handles reconnection and failover logic correctly. These tests can be found in pkg/sink/mysql/mysql_connector_test.go.
  4. Integration Testing:

    • Updated the TiCDC integration tests to verify that Changefeeds can handle multiple downstream addresses. The integration tests cover scenarios where downstream database servers become unavailable, and TiCDC successfully switches to another available server.
    • New Script: Added a script start_downstream_tidb_instances to start multiple TiDB instances for testing the failover functionality. The test_prepare file has been updated to register three downstream TiDB instance ports for these tests. While the integration tests now support up to three TiDB instances by default, more instances can be added if needed by modifying the registered ports.
  5. Minor Changes to Existing Scripts:

    • The ports for downstream TiDB instances in the integration test scripts (run.sh) have been modified to accommodate the new multi-instance setup. These changes are purely related to port assignments and do not alter the test logic.
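
The round-robin failover described for DBConnector in item 1 can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: the `connector` type, `newConnector`, `switchToAvailable`, and the injected `probe` function are hypothetical names. A real implementation would open and ping a `*sql.DB` instead of calling a probe function.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoDownstream is returned when every configured address fails its probe.
var errNoDownstream = errors.New("no downstream address is available")

// connector is a hypothetical, simplified stand-in for the DBConnector
// described above; it is not the type added by this PR.
type connector struct {
	dsns    []string
	current int // index of the DSN in use, -1 before the first connection
	// probe reports whether a DSN is reachable. It is injected so that the
	// sketch runs without a database driver.
	probe func(dsn string) error
	// onSwitch is an optional hook run against a newly selected DSN,
	// similar in spirit to ConfigureDBWhenSwitch.
	onSwitch func(dsn string) error
}

func newConnector(dsns []string, probe func(string) error) *connector {
	return &connector{dsns: dsns, current: -1, probe: probe}
}

// switchToAvailable keeps the current DSN while it is healthy. Otherwise it
// tries each DSN once in round-robin order, starting after the current one.
func (c *connector) switchToAvailable() (string, error) {
	if len(c.dsns) == 0 {
		return "", errNoDownstream
	}
	if c.current >= 0 && c.probe(c.dsns[c.current]) == nil {
		return c.dsns[c.current], nil
	}
	var lastErr error
	for i := 1; i <= len(c.dsns); i++ {
		idx := (c.current + i) % len(c.dsns)
		err := c.probe(c.dsns[idx])
		if err == nil && c.onSwitch != nil {
			err = c.onSwitch(c.dsns[idx])
		}
		if err != nil {
			lastErr = err
			continue
		}
		c.current = idx
		return c.dsns[idx], nil
	}
	return "", fmt.Errorf("%w: last error: %v", errNoDownstream, lastErr)
}
```

In this sketch a caller would invoke `switchToAvailable` after a failed statement and then retry on the returned DSN. Starting the scan after the current index means a server that just failed is tried last rather than first.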

Testing and Validation:

  • Unit Tests: Added detailed unit tests for DBConnector.
  • Integration Tests: Tested cdc cli changefeed create and cdc cli changefeed update with multiple downstream addresses. Ran Changefeeds against multiple TiDB instances and verified that TiCDC switches to another available instance when one fails, confirming that automatic failover works as expected.

This enhancement greatly improves TiCDC’s reliability and ease of use, especially in complex deployment environments, by reducing dependency on external load balancers and ensuring smooth failover between multiple downstream MySQL-compatible databases.

Check List

Tests

  • Unit test
  • Integration test

Questions

Will it cause performance regression or break compatibility?

No

Do you need to update user documentation, design documentation or monitoring documentation?

Yes, the user documentation should be updated to reflect the new support for specifying multiple downstream addresses in the --sink-uri option, along with instructions on how to configure and use this feature. Additionally, any design documentation that explains the architecture of TiCDC's database connection management should be updated to include details on the new DBConnector and its failover capabilities. Monitoring documentation should also be updated to account for the behavior and health of multiple downstream connections, including potential alerts when failover occurs.

Release note

Support automatic failover across multiple MySQL-compatible downstream addresses in the `--sink-uri`, ensuring high availability and improved fault tolerance.

…s before each retry attempt

Introduced the `WithPreExecutionWhenRetry` option, which specifies a PreExecution action to run
before each retry attempt. The action runs only after an execution has failed: if the initial
execution succeeds, no retry occurs and the PreExecution function is never executed.
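
The retry semantics described in this commit message can be sketched as a standalone helper. This is an illustration only: `retryWithPreExec` and its signature are hypothetical and do not reproduce the retry package's option-based API. Aborting when the pre-execution hook itself fails is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidTries is returned when the caller asks for fewer than one attempt.
var errInvalidTries = errors.New("maxTries must be at least 1")

// retryWithPreExec runs op up to maxTries times. preExec, if set, runs
// before every retry, that is, before attempts 2..maxTries, and never
// before the first attempt. Hypothetical helper written for this sketch.
func retryWithPreExec(maxTries int, op func() error, preExec func() error) error {
	if maxTries < 1 {
		return errInvalidTries
	}
	var err error
	for attempt := 1; attempt <= maxTries; attempt++ {
		if attempt > 1 && preExec != nil {
			// ASSUMPTION: a failing hook aborts the whole retry loop.
			if perr := preExec(); perr != nil {
				return fmt.Errorf("pre-execution before attempt %d failed: %w", attempt, perr)
			}
		}
		if err = op(); err == nil {
			return nil
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", maxTries, err)
}
```

In the failover setting, such a hook is a natural place to switch to another downstream address before the failed statement is retried.
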
…ying all DOWN_TIDB_STATUS to DOWN_TIDB_STATUS_1
Contributor

ti-chi-bot bot commented Aug 27, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Aug 27, 2024
@wlwilliamx
Contributor Author

/test cdc-integration-kafka-test

codecov bot commented Aug 28, 2024

Codecov Report

Attention: Patch coverage is 66.02317% with 88 lines in your changes missing coverage. Please review.

Project coverage is 57.5027%. Comparing base (6f697c4) to head (d804633).
Report is 5 commits behind head on master.

Additional details and impacted files

| Components | Coverage Δ |
|---|---|
| cdc | 61.2813% <66.0231%> (+0.0998%) ⬆️ |
| dm | 51.0354% <ø> (+0.0141%) ⬆️ |
| engine | 63.3879% <ø> (ø) |

| Flag | Coverage Δ |
|---|---|
| unit | 57.5027% <66.0231%> (+0.0597%) ⬆️ |

Flags with carried forward coverage won't be shown.

@@               Coverage Diff                @@
##             master     #11527        +/-   ##
================================================
+ Coverage   57.4429%   57.5027%   +0.0597%     
================================================
  Files           851        852         +1     
  Lines        126421     126580       +159     
================================================
+ Hits          72620      72787       +167     
+ Misses        48394      48363        -31     
- Partials       5407       5430        +23     

@wlwilliamx wlwilliamx force-pushed the feature/multi-mysql-addresses-in-sink-uri branch from 9708137 to 97db8dd Compare August 28, 2024 07:29
@wlwilliamx wlwilliamx changed the title sink(ticdc): Support Multiple Downstream Addresses for MySQL sink(ticdc): Add Support for Multiple MySQL-Compatible Downstream Addresses in TiCDC for High Availability Aug 29, 2024
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Aug 29, 2024
@flowbehappy flowbehappy added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 29, 2024
@ti-chi-bot ti-chi-bot bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 30, 2024
@wlwilliamx
Contributor Author

/test verify

@wlwilliamx
Contributor Author

/test cdc-integration-mysql-test
/test dm-integration-test

@wlwilliamx
Contributor Author

/test cdc-integration-mysql-test

@wlwilliamx
Contributor Author

/test cdc-integration-kafka-test

@wlwilliamx
Contributor Author

/test cdc-integration-pulsar-test

@ti-chi-bot ti-chi-bot bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 6, 2024
Contributor

ti-chi-bot bot commented Sep 6, 2024

PR needs rebase.

Labels
do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

ticdc: Support Multiple Downstream Addresses for MySQL
2 participants