Cli(ticdc): Add safepoint command support #11470

Open
wk989898 opened this issue Aug 6, 2024 · 1 comment · May be fixed by #11471

Labels
type/feature Issues about a new feature

Comments

@wk989898
Collaborator

wk989898 commented Aug 6, 2024

Is your feature request related to a problem?

At a certain moment, the GC safe point is at 10 and a changefeed has synced to 20. The changefeed then fails and stops, leaving behind a service safe point with a value of 20 and a TTL of 24 hours. Meanwhile, the GC worker's service safe point keeps advancing normally and reaches 30. The changefeed's service safe point blocks GC only until its 24-hour TTL expires. If the GC lifetime is then manually extended in an attempt to recover the changefeed, GC is no longer being blocked at 20, so it is unsafe to restart the changefeed after the 24-hour TTL.

In some cases, updating tidb_gc_life_time cannot stop GC, because blocking GC also depends on gc-ttl.
We therefore need a new command that prolongs the changefeed's TTL to avoid data loss.
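
The failure mode described above can be illustrated with a small Python model. This is a simplified sketch, not PD's actual implementation: it assumes the effective GC barrier is the smallest safe point among the service safe points whose TTL has not yet expired. The names ServiceSafePoint and min_live_safe_point are hypothetical, and the values 20 and 30 are the toy timestamps from the scenario.

```python
from dataclasses import dataclass

NEVER_EXPIRES = 2**63 - 1  # max int64, as shown for gc_worker in the sample output
DAY_SECONDS = 24 * 3600


@dataclass
class ServiceSafePoint:
    service_id: str
    safe_point: int   # timestamp that must not be garbage collected past
    expired_at: int   # Unix seconds after which this entry no longer blocks GC


def min_live_safe_point(points, now):
    """Return the unexpired entry with the smallest safe point (assumed model)."""
    live = [p for p in points if p.expired_at > now]
    return min(live, key=lambda p: p.safe_point)


points = [
    ServiceSafePoint("gc_worker", safe_point=30, expired_at=NEVER_EXPIRES),
    # Hypothetical service ID for the failed changefeed, created at time 0
    # with a 24-hour TTL.
    ServiceSafePoint("ticdc-changefeed", safe_point=20, expired_at=DAY_SECONDS),
]

# Within the TTL the changefeed entry holds GC back at 20.
within_ttl = min_live_safe_point(points, now=DAY_SECONDS - 1)
# After the TTL only gc_worker remains, so data between 20 and 30 is unprotected.
after_ttl = min_live_safe_point(points, now=DAY_SECONDS + 1)

print(within_ttl.service_id, within_ttl.safe_point)
print(after_ttl.service_id, after_ttl.safe_point)
```

In this model, extending the GC lifetime after the TTL has expired does not bring the barrier back to 20, which is why a way to prolong the TTL explicitly is needed.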

Describe the feature you'd like

We can add a service safe point at the changefeed's safe point. This new service safe point prolongs the TTL so that GC stays blocked. We can also delete it as soon as it is no longer needed, and it would be useful to be able to query all service safe points.

To address this, we add a new CLI command, cdc cli safepoint, with three subcommands:

  • cdc cli safepoint set: create a user defined service safe point to block GC.
  • cdc cli safepoint delete: delete a user defined service safe point.
  • cdc cli safepoint query: query all service safe points.

When a changefeed fails, users can run cdc cli safepoint set to create a new service safe point with a long TTL to block GC. Once the problem is fixed, they can run cdc cli safepoint delete to remove that service safe point, and cdc cli safepoint query to check that these operations took effect.

Note: Users can only create and delete service safe points that they defined themselves.

  1. Set
cdc cli safepoint set service-id-suffix=xxx start-ts=xxx ttl=xxx
# {
#   "service_gc_safe_points": [
#     {
#       "service_id": "gc_worker",
#       "expired_at": 9223372036854775807,
#       "safe_point": 451519635657850880
#     },
#     {
#       "service_id": "ticdc-default-15674009460217235928",
#       "expired_at": 1722651185,
#       "safe_point": 451560023174938623
#     }
#   ],
#   "min_service_gc_safe_point": 451519635657850880,
#   "gc_safe_point": 451519635657850880
# }
  • service-id-suffix: Specifies the suffix of the service ID for the user-defined service safe point. TiCDC generates a service ID in the format ticdc-clusterID-etcdClusterID and appends the suffix to it, giving ticdc-clusterID-etcdClusterID-service-id-suffix. The default suffix is "user-defined".
  • start-ts: The timestamp of the safe point to hold. It must be greater than or equal to minServiceSafePoint; otherwise, an error is reported.
  • ttl: The protection period in seconds. It must be greater than 0. The default value is 86400 seconds (24 hours).
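
The parameter rules for set can be sketched in Python as follows. This is an illustration of the checks described above, not the proposed TiCDC code: build_service_id and validate_set are hypothetical helper names, and the error messages are made up for the example.

```python
DEFAULT_SUFFIX = "user-defined"
DEFAULT_TTL_SECONDS = 86400  # 24 hours


def build_service_id(cluster_id, etcd_cluster_id, suffix=DEFAULT_SUFFIX):
    """Compose ticdc-clusterID-etcdClusterID-suffix as described in the proposal."""
    return f"ticdc-{cluster_id}-{etcd_cluster_id}-{suffix}"


def validate_set(start_ts, min_service_safe_point, ttl=DEFAULT_TTL_SECONDS):
    """Raise ValueError if the arguments violate the rules stated for `set`."""
    if ttl <= 0:
        raise ValueError("ttl must be greater than 0")
    if start_ts < min_service_safe_point:
        raise ValueError("start-ts must be >= minServiceSafePoint")


# Cluster ID and etcd cluster ID taken from the sample output in this issue.
service_id = build_service_id("default", "15674009460217235928")
print(service_id)

# A start-ts at or above the current minimum passes; one below it is rejected.
validate_set(start_ts=451560023174938623,
             min_service_safe_point=451519635657850880)
```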
  1. Delete
cdc cli safepoint delete serviceIDsuffix start-ts=xxx
# equivalent to `cdc cli safepoint set serviceID start-ts=xxx ttl=0`
# {
#   "service_gc_safe_points": [
#     {
#       "service_id": "gc_worker",
#       "expired_at": 9223372036854775807,
#       "safe_point": 451519635657850880
#     },
#     {
#       "service_id": "ticdc-default-15674009460217235928",
#       "expired_at": 1722651185,
#       "safe_point": 451560023174938623
#     }
#   ],
#   "min_service_gc_safe_point": 451519635657850880,
#   "gc_safe_point": 451519635657850880
# }
  1. Query
cdc cli safepoint query --pd http://localhost:2379 [--cdc]
# {
#   "service_gc_safe_points": [
#     {
#       "service_id": "gc_worker",
#       "expired_at": 9223372036854775807,
#       "safe_point": 451519635657850880
#     },
#     {
#       "service_id": "ticdc-default-15674009460217235928",
#       "expired_at": 1722651185,
#       "safe_point": 451560023174938623
#     }
#   ],
#   "min_service_gc_safe_point": 451519635657850880,
#   "gc_safe_point": 451519635657850880
# }
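
To read the output shown for the three subcommands, it can help to convert the raw numbers into wall-clock times. The Python sketch below assumes the standard TiDB TSO layout (physical milliseconds in the high bits, an 18-bit logical counter in the low bits) and treats expired_at as Unix seconds, with the max int64 value meaning the entry never expires. The function names are hypothetical; the JSON is copied from the sample output above.

```python
import json
from datetime import datetime, timezone

TSO_LOGICAL_BITS = 18      # low bits of a TSO hold the logical counter
NEVER_EXPIRES = 2**63 - 1  # max int64

SAMPLE_OUTPUT = """
{
  "service_gc_safe_points": [
    {"service_id": "gc_worker",
     "expired_at": 9223372036854775807,
     "safe_point": 451519635657850880},
    {"service_id": "ticdc-default-15674009460217235928",
     "expired_at": 1722651185,
     "safe_point": 451560023174938623}
  ],
  "min_service_gc_safe_point": 451519635657850880,
  "gc_safe_point": 451519635657850880
}
"""


def tso_to_utc(ts):
    """Convert a TSO to a UTC datetime using its physical (millisecond) part."""
    physical_ms = ts >> TSO_LOGICAL_BITS
    return datetime.fromtimestamp(physical_ms / 1000, tz=timezone.utc)


def describe(output):
    """Return (service_id, safe point time, expiry) tuples for each entry."""
    data = json.loads(output)
    rows = []
    for sp in data["service_gc_safe_points"]:
        if sp["expired_at"] == NEVER_EXPIRES:
            expiry = "never"
        else:
            expiry = datetime.fromtimestamp(
                sp["expired_at"], tz=timezone.utc).isoformat()
        rows.append((sp["service_id"],
                     tso_to_utc(sp["safe_point"]).isoformat(),
                     expiry))
    return rows


for row in describe(SAMPLE_OUTPUT):
    print(row)
```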

API

  • GET /api/v2/safepoint: This is equivalent to the query operation, retrieving the safe point.
  • POST /api/v2/safepoint: This is equivalent to the set operation, setting the safe point and TTL for the given service ID.
  • DELETE /api/v2/safepoint: This is equivalent to the delete operation, removing the safe point for the specified service ID.
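
The correspondence between the CLI subcommands and the HTTP endpoints can be written down as a small Python sketch. Only the methods and the path come from the list above; the base URL http://127.0.0.1:8300 is an assumption, build_request is a hypothetical helper, and request bodies are omitted because this issue does not specify them. The request is only constructed, not sent.

```python
from urllib.request import Request

SAFEPOINT_PATH = "/api/v2/safepoint"

# CLI subcommand -> HTTP method, as listed in the API section.
METHOD_FOR_SUBCOMMAND = {
    "query": "GET",
    "set": "POST",
    "delete": "DELETE",
}


def build_request(subcommand, base_url="http://127.0.0.1:8300"):
    """Build (but do not send) the request matching a CLI subcommand."""
    method = METHOD_FOR_SUBCOMMAND[subcommand]
    return Request(base_url.rstrip("/") + SAFEPOINT_PATH, method=method)


for name in ("query", "set", "delete"):
    req = build_request(name)
    print(name, req.get_method(), req.full_url)
```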

Describe alternatives you've considered

The changefeed needs to retain data beyond 20, but the 24-hour protection period has already expired. The GC worker only needs to retain data beyond 30, so it sets its own service safe point to 30. This means that the data between 20 and 30 has been nominally abandoned by the GC worker, although the actual deletion operation has not been executed.

Teachability, Documentation, Adoption, Migration Strategy

Documentation for the safepoint command will be added after this feature is implemented.

wk989898 added the type/feature (Issues about a new feature) label Aug 6, 2024
wk989898 changed the title from "TiCDC(Cli): Add safepoint command support" to "Cli(ticdc): Add safepoint command support" Aug 6, 2024
wk989898 linked a pull request Aug 6, 2024 that will close this issue
@flowbehappy
Collaborator

@benmeadowcroft PTAL
