
[clickhouse] Long running QA tests for replicated cluster #6953

Open
@karencfv


Overview

As part of the work to roll out replicated ClickHouse, we'll need some long-running testing to ensure the stability of the replicated cluster. Specifically, we want to know how stable the replicated cluster is when left alone under load for days or weeks at a time.

We'll need to monitor the system and periodically answer the following questions over an extended period (a month or so?):

  • Is data consistent across all replicas?
  • Is query performance acceptable under load?
  • Are the queue lengths acceptable under load?
  • Do queue lengths grow over time or are they consistent depending on the load?
  • <more?>
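A minimal sketch of how the consistency and queue-growth questions could be answered from periodically collected data. It assumes we can obtain, for each table, a per-replica row count plus an order-independent checksum, and timestamped queue-length samples; how those are computed is left open. `ReplicaDigest`, `inconsistent_replicas`, and `queue_growth_rate` are hypothetical names, not part of clickhouse-admin or ClickHouse.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ReplicaDigest:
    """Hypothetical per-replica summary of one table at one point in time."""
    row_count: int
    checksum: int  # assumed: an order-independent hash over the table's rows


def inconsistent_replicas(digests: Dict[str, ReplicaDigest]) -> List[str]:
    """Return the replicas whose digest differs from the most common digest.

    On a tie, the digest seen first wins (dict insertion order).
    """
    if not digests:
        return []
    counts: Dict[ReplicaDigest, int] = {}
    for digest in digests.values():
        counts[digest] = counts.get(digest, 0) + 1
    majority = max(counts, key=counts.get)
    return sorted(name for name, d in digests.items() if d != majority)


def queue_growth_rate(samples: Sequence[Tuple[float, int]]) -> float:
    """Least-squares slope of queue length over time.

    `samples` are (unix_timestamp_seconds, queue_length) pairs; the result is
    in queue entries per second. Returns 0.0 with fewer than two samples or
    when all timestamps are identical.
    """
    n = len(samples)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in samples) / n
    mean_q = sum(q for _, q in samples) / n
    cov = sum((t - mean_t) * (q - mean_q) for t, q in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    return cov / var if var else 0.0
```

Because replication is asynchronous, a replica whose digest differs while it still has entries queued isn't necessarily inconsistent; a mismatch is only meaningful once queues have drained, or after a re-check. A slope near zero over days suggests queue length tracks load rather than accumulating; what counts as an acceptable slope is still to be decided.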

Implementation

We'll probably want to use clickhouse-admin to extract information from the system. A clickhouse-admin binary is already installed on each clickhouse-{server|keeper} node.
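The interface clickhouse-admin would expose for this isn't specified here, so the sketch below assumes the underlying data comes from ClickHouse's `system.replicas` table, whose `queue_size`, `inserts_in_queue`, `merges_in_queue`, and `absolute_delay` columns map onto the queue questions above. `run_query` is a hypothetical executor standing in for however the query actually gets run (clickhouse-client, the HTTP interface, or a future clickhouse-admin endpoint); `ReplicaStatus`, `parse_replica_status`, and `sample` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List

# Columns from ClickHouse's system.replicas table; output as TabSeparated.
REPLICA_STATUS_QUERY = (
    "SELECT database, table, replica_name, queue_size, "
    "inserts_in_queue, merges_in_queue, absolute_delay "
    "FROM system.replicas FORMAT TabSeparated"
)


@dataclass(frozen=True)
class ReplicaStatus:
    database: str
    table: str
    replica: str
    queue_size: int
    inserts_in_queue: int
    merges_in_queue: int
    absolute_delay: int  # seconds this replica lags behind


def parse_replica_status(tsv: str) -> List[ReplicaStatus]:
    """Parse the TabSeparated output of REPLICA_STATUS_QUERY into records."""
    rows = []
    for line in tsv.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        db, table, replica, queue, inserts, merges, delay = line.split("\t")
        rows.append(
            ReplicaStatus(db, table, replica, int(queue), int(inserts),
                          int(merges), int(delay))
        )
    return rows


def sample(run_query: Callable[[str], str]) -> List[ReplicaStatus]:
    """Take one snapshot using a hypothetical query executor."""
    return parse_replica_status(run_query(REPLICA_STATUS_QUERY))
```

Calling `sample` on a schedule and storing each snapshot with a timestamp would give the time series needed to judge whether queue lengths stay flat or grow.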

To retrieve the information we need, we can leverage the following native ClickHouse tooling:

Altinity has a pretty cool ClickHouse stress test suite. We can probably use it for running stress tests, or take inspiration from it to create our own stress tests.
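Whichever stress tests we end up running, answering "Is query performance acceptable under load?" needs latency numbers collected while the load is applied. A minimal sketch, assuming `run_query` is a hypothetical zero-argument callable that issues one representative query; the percentile (nearest-rank) and the latency budget are placeholders we'd still need to agree on.

```python
import math
import time
from typing import Callable, List


def measure_latencies(run_query: Callable[[], object], iterations: int) -> List[float]:
    """Call a (hypothetical) query callable repeatedly; return wall-clock latencies in seconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def within_budget(latencies: List[float], pct: float, budget_s: float) -> bool:
    """True if the pct-th percentile latency is at or under budget_s seconds."""
    return percentile(latencies, pct) <= budget_s
```

Tracking a high percentile (for example p99) rather than the mean makes it easier to spot queries that degrade only occasionally, such as during heavy merges.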

Relevant links

Tasks
