
migrating tables with cdc enabled ends with Failed to execute #84

Open
carlo4002 opened this issue Jul 20, 2022 · 5 comments

Comments

@carlo4002

Hello guys

I am testing a migration of a table with CDC enabled on both the source (Cassandra) and the target (ScyllaDB). The job finishes with an error with the following message:

22/07/13 14:09:49 ERROR QueryExecutor: Failed to execute: com.datastax.spark.connector.writer.RichBoundStatementWrapper@6512ab5f
com.datastax.oss.driver.api.core.servererrors.InvalidQueryException: cdc: attempted to get a stream from an earlier generation than the currently used one. With CDC you cannot send writes with timestamps too far into the past, because that would break consistency properties (write timestamp: 2018/11/21 23:56:33, current generation started at: 2022/07/11 10:15:34)

I cannot change the current generation because it is tied to the date the cluster was created. Disabling CDC in the target fixes this problem, but we need CDC enabled during our dual writes (the migration is without downtime).

Is there a way to force these writes?
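For reference, disabling CDC on the target is a per-table property change in Scylla CQL; the sketch below shows the workaround mentioned above (ks.my_table is a placeholder for your keyspace and table):

```sql
-- Disable CDC on the target table for the duration of the bulk migration,
-- so the migrator can write with old (preserved) timestamps.
ALTER TABLE ks.my_table WITH cdc = {'enabled': false};

-- Re-enable CDC once the bulk load has finished.
ALTER TABLE ks.my_table WITH cdc = {'enabled': true};
```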

@tarzanek
Contributor

tarzanek commented Jul 27, 2022

This looked like a CDC bug; there is a way to fix the streams.

see
scylladb/scylladb#7127

@tarzanek tarzanek reopened this Jul 27, 2022
@tarzanek
Contributor

tarzanek commented Jul 27, 2022

The API for that was added in scylladb/scylladb#6498.

But all of this assumes the error comes from Scylla; I'm not sure about Cassandra, @carlo4002.

Looking closer, though, this is really about the migrator preserving timestamps and writing with old timestamps.

@tarzanek
Contributor

tarzanek commented Jul 27, 2022

Also, I am confused by your implementation of dual writes: you only need CDC on the source of the dual writes, then consume the CDC log and write it to the target. There is a Kafka CDC consumer to help with this. The target won't need CDC at all; I don't see why you would need it there.

Also note that the other option is to do dual writes from the application, in which case you won't need CDC anywhere (though a small code change would be needed in the client, of course).

@tarzanek
Contributor

tarzanek commented Jul 27, 2022

So I would just migrate with CDC disabled in the target.

The alternative, of course, is disabling preserveTimestamps in the migrator, but that way you risk overwriting dual-written data!
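A sketch of the relevant migrator setting, assuming the standard scylla-migrator config.yaml layout (the exact placement of the key may vary by migrator version; check the config.yaml.example shipped with your release):

```yaml
# scylla-migrator config.yaml fragment (sketch, not a full config).
# With preserveTimestamps disabled the migrator writes rows using the
# current time instead of the original write timestamps, which avoids
# the CDC "earlier generation" error -- but, as noted above, it risks
# overwriting data already written by the dual-write path.
preserveTimestamps: false
```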

@carlo4002
Author

Hello @tarzanek, sorry it took so long to get back to you with feedback. I am still working on this, and yes, my workaround for the moment was to disable CDC in Scylla.

The CDC in Scylla isn't for the migration (dual writes) but for some applications that use this database, so not all of the tables have CDC enabled.

So when I say I need migration without downtime, I mean that CDC must be enabled in the target for the tables that need it. However, we are going to switch those services over after the first load, so there is no need to have CDC on during the migration.
