Skip to content

Data Restoration

Larry Snizek edited this page Aug 23, 2022 · 6 revisions

The best kind of replication is one that requires no manual intervention, like ClickHouse's ReplicatedMergeTree. High-availability systems can be constructed that tolerate single node failures. Sooner or later, maintainers of larger scale setups also have to consider whole-cluster failures and those call for some degree of manual data restoration.

At the scale of my time-series deployment, several million messages/sec with several aggregations, I have been finding it useful to have a small restoration procedure separate from a big one, as described below.

Small One: The Patch

A quick replacement of a small gap in data in the order of tens to hundreds million rows (minutes of my time-series stream). Such gap may be caused by an awkward node restart or a network glitch. Use ClickHouse Copier or simply a manual DELETE-SELECT-INSERT. My typical procedure is:

  1. ALTER TABLE x DELETE WHERE t BETWEEN t0 AND t1
  2. SELECT FROM x WHERE t BETWEEN t0 AND t1 and INSERT INTO x, network transport between the two using netcat or rsync
  3. repeat for each table, rounding interval (t0,t1) appropriately

The rounding is necessary for those of my tables that are time-resolution aggregates.

This is an easy (bunch of SQL), flexible (one can transform schema in SELECT statement as desired + arbitrary resharding) and safe (native protocol command line client) procedure. It is slowed down by having to re-encode data types and re-merge inserted data. I might get a million rows/sec on my cluster with magnetic disk drives.

Big One: The Backfill

Repair of incidents lasting up to several days. Copy and attach entire merged parts. The procedure is (see script example):

  1. SELECT path FROM system.parts WHERE table = 'x' AND partition BETWEEN t0 and t1 to list parts in partitions overlapping given interval
  2. copy those parts from their filesystem location to destination cluster's table detached directory (I use netcat over TCP)
  3. ALTER TABLE x DELETE WHERE t BETWEEN t0 AND t1
  4. ALTER TABLE x ATTACH PART on each copied part
  5. repeat for each table

Interval is rounded to largest partition size of interest.

This procedure is faster than SELECT-INSERT patching. I might get a couple million rows/sec on my cluster. A decently formulated benchmark would be useful here, and I might add one in due course.

Main disadvantage is the potential for messing something up by accessing filesystem. Schema of source and destination tables must be identical, with no SQL tweaks possible. Finally, you cannot reshard many-to-one where part names collide.

Clone this wiki locally