Cannot seem to run ALTER command on replicas of the same shard #316

Open

sudhanvaghebbale opened this issue Jun 28, 2024 · 2 comments

@sudhanvaghebbale

Hi all, I wanted to reach out about an issue I have been facing while running ClickHouse schema migrations with alembic. It may not be related to alembic at all, but I wanted to know whether anyone else has experienced it. Here's what I have been trying to do:

ClickHouse setup

I have a cluster of two ClickHouse nodes that belong to the same shard, i.e. two replicas on one shard, coordinated by ZooKeeper. Both nodes are brought up with Docker.
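As a sanity check of the topology, both nodes should show up under the same shard in system.clusters. A minimal sketch of that check (the connection URL is a placeholder; I go through SQLAlchemy here since it is already in the stack):

import sqlalchemy as sa

# Placeholder URL; substitute your real host/credentials.
engine = sa.create_engine("clickhouse+native://default:@localhost:9000/test")

with engine.connect() as conn:
    # Both nodes should appear with the same shard_num and different replica_num.
    for row in conn.execute(sa.text(
        "SELECT cluster, shard_num, replica_num, host_name FROM system.clusters"
    )):
        print(row)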

These are the schema migrations I ran in the order mentioned:

Migration 1
def upgrade() -> None:
    op.execute(
        """
        CREATE TABLE IF NOT EXISTS table_local ON CLUSTER '{cluster}' (
            {fields here...}
        )
        ENGINE = ReplicatedMergeTree('/clickhouse/tables/{cluster}/{shard}/test/table', '{replica}')
        ORDER BY ({sorting key here...})
        PARTITION BY toYYYYMMDD(timestamp)
        TTL toDateTime(timestamp) + INTERVAL 3 MONTH DELETE;
        """
    )
Migration 2
def upgrade() -> None:
    op.execute(
        """
        CREATE TABLE IF NOT EXISTS table ON CLUSTER '{cluster}' AS table_local
        ENGINE = Distributed('{cluster}', test, table_local, rand());
        """
    )
Migration 3
import time

def upgrade() -> None:
    op.execute(
        """
        DROP TABLE IF EXISTS table ON CLUSTER '{cluster}' SYNC;
        """
    )

    time.sleep(30)

    op.execute(
        """
        ALTER TABLE table_local ON CLUSTER '{cluster}' ADD COLUMN IF NOT EXISTS new_column String;
        """
    )

    time.sleep(30)

    op.execute(
        """
        CREATE TABLE IF NOT EXISTS table ON CLUSTER '{cluster}' AS table_local
        ENGINE = Distributed('{cluster}', test, table_local, rand());
        """
    )
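A possible alternative to the fixed sleeps would be to poll the distributed DDL queue until every task reports Finished. A rough sketch of that idea (it assumes the system.distributed_ddl_queue table is available; wait_for_ddl is a hypothetical helper, not part of alembic):

import time
import sqlalchemy as sa
from alembic import op

def wait_for_ddl(timeout: float = 60.0) -> None:
    """Hypothetical helper: block until no distributed DDL task is pending."""
    conn = op.get_bind()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Count tasks on any host that have not reached the Finished state yet.
        pending = conn.execute(sa.text(
            "SELECT count() FROM system.distributed_ddl_queue WHERE status != 'Finished'"
        )).scalar()
        if pending == 0:
            return
        time.sleep(1)
    raise TimeoutError("distributed DDL tasks still pending after %.0fs" % timeout)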

After running these migrations, the alembic logs indicate that they completed, and I have verified that the alembic_version table contains the last/correct migration.

What I have observed is that only one replica in the shard has the new column; the other replica does not, even though the migration appears to have run successfully. I don't see any errors in clickhouse-server.err.log. In the clickhouse-server logs, I see this line corresponding to the ALTER query:

2024.06.25 15:23:46.767800 [ 700 ] {} <Debug> DDLWorker: Task query-0000000001 has already been executed by replica (clickhouse%2D02:9000) of the same shard.
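For debugging, the replication state of the lagging replica can be inspected along these lines (a minimal sketch; it assumes the test database from the Distributed definition and a connection that lands on the replica missing the column):

import sqlalchemy as sa
from alembic import op

conn = op.get_bind()

# Stuck entries here would explain why the metadata change never landed.
for row in conn.execute(sa.text(
    "SELECT type, create_time, last_exception "
    "FROM system.replication_queue WHERE table = 'table_local'"
)):
    print(row)

# Ask the replica to catch up with the shared replication log.
conn.execute(sa.text("SYSTEM SYNC REPLICA test.table_local"))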

I tried a few remedies to overcome this:

  1. Added a sleep between each DDL command to rule out a timing issue.
  2. Used the alter_sync setting in the query (see the sketch after this list) so the ALTER waits for the other replicas.
  3. Ran all of these commands with plain curl, without alembic, but hit the same issue.
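The alter_sync attempt from item 2 looked roughly like this (alter_sync = 2 tells the query to wait until all replicas have applied the ALTER; 0 = no wait, 1 = wait for the current replica only):

op.execute(
    """
    ALTER TABLE table_local ON CLUSTER '{cluster}'
    ADD COLUMN IF NOT EXISTS new_column String
    SETTINGS alter_sync = 2;
    """
)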

These are the versions I am running:

ClickHouse server version: 24.5.1
ClickHouse client version: 24.5.1.1763 (official build)
Alembic: 1.13.1
clickhouse-sqlalchemy: 0.3.2
sqlalchemy: 2.0.30

I want to know whether anyone else has faced this issue in the past and, if so, how they solved it. Any help will be greatly appreciated. Thanks!

@xzkostyan (Owner)

Hi.

Has alter_sync helped with this case?

@sudhanvaghebbale (Author)

No, I have had no luck with alter_sync either.
