Skip to content

backup: adjust concurrency description after v8.2 #20963

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 21, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 13 additions & 5 deletions br/br-snapshot-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,15 +27,13 @@
```shell
tiup br backup full --pd "${PD_IP}:2379" \
--backupts '2022-09-08 13:30:00 +08:00' \
--storage "s3://backup-101/snapshot-202209081330?access-key=${access-key}&secret-access-key=${secret-access-key}" \
--ratelimit 128 \
--storage "s3://backup-101/snapshot-202209081330?access-key=${access-key}&secret-access-key=${secret-access-key}"
```

In the preceding command:

- `--backupts`: The time point of the snapshot. The format can be [TSO](/tso.md) or timestamp, such as `400036290571534337` or `2018-05-11 01:42:23 +08:00`. If the data of this snapshot is garbage collected, the `tiup br backup` command returns an error and `br` exits. When backing up using a timestamp, it is recommended to specify the time zone as well. Otherwise, `br` uses the local time zone to construct the timestamp by default, which might lead to an incorrect backup time point. If you leave this parameter unspecified, `br` picks the snapshot corresponding to the backup start time.
- `--storage`: The storage address of the backup data. Snapshot backup supports Amazon S3, Google Cloud Storage, and Azure Blob Storage as backup storage. The preceding command uses Amazon S3 as an example. For more details, see [URI Formats of External Storage Services](/external-storage-uri.md).
- `--ratelimit`: The maximum speed **per TiKV** performing backup tasks. The unit is in MiB/s.

During backup, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the backup task is completed and statistics such as total backup time, average backup speed, and backup data size are displayed.

Expand Down Expand Up @@ -247,8 +245,18 @@

You can use the following methods to manually control the impact of backup tasks on cluster performance. However, these two methods also reduce the speed of backup tasks while reducing the impact of backup tasks on the cluster.

- Use the `--ratelimit` parameter to limit the speed of backup tasks. Note that this parameter limits the speed of **saving backup files to external storage**. When calculating the total size of backup files, use the `backup data size(after compressed)` as a benchmark. When `--ratelimit` is set, to avoid too many tasks causing the speed limit to fail, the `concurrency` parameter of br is automatically adjusted to `1`.
- Adjust the TiKV configuration item [`backup.num-threads`](/tikv-configuration-file.md#num-threads-1) to limit the number of threads used by backup tasks. According to internal tests, when BR uses no more than `8` threads for backup tasks, and the total CPU utilization of the cluster does not exceed 60%, the backup tasks have little impact on the cluster, regardless of the read and write workload.
- Recommended method: Adjust the TiKV configuration parameter [`backup.num-threads`](/tikv-configuration-file.md#num-threads-1), which controls the number of worker threads used by backup tasks. Because backup is a CPU-intensive operation, tuning this parameter allows for more precise control over TiKV’s CPU usage, enabling better resource isolation and predictability. In most scenarios, simply adjusting `num-threads` is sufficient to limit the impact of backup on the cluster. Internal testing shows that when the number of threads is set to `8` or fewer, and overall cluster CPU usage remains below 60%, the impact of backup on foreground workloads is negligible.

Check failure on line 248 in br/br-snapshot-guide.md

View workflow job for this annotation

GitHub Actions / vale

[vale] reported by reviewdog 🐶 [PingCAP.ZhPunctuation] Don't use Chinese punctuation (’) in English text. Raw Output: {"message": "[PingCAP.ZhPunctuation] Don't use Chinese punctuation (’) in English text.", "location": {"path": "br/br-snapshot-guide.md", "range": {"start": {"line": 248, "column": 304}}}, "severity": "ERROR"}

Check warning on line 248 in br/br-snapshot-guide.md

View workflow job for this annotation

GitHub Actions / vale

[vale] reported by reviewdog 🐶 [PingCAP.Ambiguous] Consider using a clearer word than 'sufficient' because it may cause confusion. Raw Output: {"message": "[PingCAP.Ambiguous] Consider using a clearer word than 'sufficient' because it may cause confusion.", "location": {"path": "br/br-snapshot-guide.md", "range": {"start": {"line": 248, "column": 426}}}, "severity": "INFO"}

- Alternative method: If you have already set `backup.num-threads` to a small value (for example, `1`), but still want to further reduce the impact of backup on the cluster, consider using the `--ratelimit` parameter. This option limits the bandwidth used to write backup files to external storage, specified in MiB/s. Note that the actual rate limiting effect depends on the size of the compressed data. You can refer to the `backup data size (after compressed)` field in the logs for more insight. When `--ratelimit` is enabled, BR automatically sets `--concurrency` to `1` to reduce the number of concurrent requests.

> **Note:**
>
> Enabling `--ratelimit` will further reduce backup throughput. In most cases, if you are already performing backups during off-peak hours and have reduced `backup.num-threads` to 1 but still observe backup impact on foreground workloads, it typically indicates that the cluster is approaching its resource limits.
>
> In such situations, consider the following alternatives:
>
> - [Scale out a cluster](/tiup/tiup-cluster.md#scale-out-a-cluster) to increase available resources.
> - Enable [`Log Backup`](/br/br-log-architecture.md) to offload backup pressure and minimize disruption to online workloads.

The impact of backup on cluster performance can be reduced by limiting the backup threads number, but this affects the backup performance. The preceding tests show that the backup speed is proportional to the number of backup threads. When the number of threads is small, the backup speed is about 20 MiB/thread. For example, 5 backup threads on a single TiKV node can reach a backup speed of 100 MiB/s.

Expand Down
5 changes: 0 additions & 5 deletions br/br-snapshot-manual.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,14 +36,12 @@ tiup br backup full \
--pd "${PD_IP}:2379" \
--backupts '2024-06-28 13:30:00 +08:00' \
--storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \
--ratelimit 128 \
--log-file backupfull.log
```

In the preceding command:

- `--backupts`: The time point of the snapshot. The format can be [TSO](/tso.md) or timestamp, such as `400036290571534337` or `2024-06-28 13:30:00 +08:00`. If the data of this snapshot is garbage collected, the `tiup br backup` command returns an error and 'br' exits. If you leave this parameter unspecified, `br` picks the snapshot corresponding to the backup start time.
- `--ratelimit`: The maximum speed **per TiKV** performing backup tasks. The unit is in MiB/s.
- `--log-file`: The target file where `br` log is written.

> **Note:**
Expand Down Expand Up @@ -72,7 +70,6 @@ tiup br backup db \
--pd "${PD_IP}:2379" \
--db test \
--storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \
--ratelimit 128 \
--log-file backuptable.log
```

Expand All @@ -90,7 +87,6 @@ tiup br backup table \
--db test \
--table usertable \
--storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \
--ratelimit 128 \
--log-file backuptable.log
```

Expand All @@ -107,7 +103,6 @@ tiup br backup full \
--pd "${PD_IP}:2379" \
--filter 'db*.tbl*' \
--storage "s3://${backup_collection_addr}/snapshot-${date}?access-key=${access-key}&secret-access-key=${secret-access-key}" \
--ratelimit 128 \
--log-file backupfull.log
```

Expand Down
2 changes: 1 addition & 1 deletion br/use-br-command-line-tool.md
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ A `tiup br` command consists of multiple layers of sub-commands. Currently, br c
* `--cert`: specifies the path to the SSL certificate in the PEM format.
* `--key`: specifies the path to the SSL certificate key in the PEM format.
* `--status-addr`: specifies the listening address through which `br` provides statistics to Prometheus.
* `--concurrency`: the number of concurrent tasks during the backup.
* `--concurrency`: controls how backup tasks are split into multiple requests and sent concurrently to the same TiKV node. This parameter primarily affects the granularity of request splitting from BR to TiKV, and no longer directly determines overall backup throughput. In most cases, it is not necessary to change this value. To improve backup performance, it is recommended to tune [`tikv.backup.num-threads`](/tikv-configuration-file.md#num-threads-1) instead.
* `--pitr-concurrency`: the number of concurrent tasks during log restore.
* `--tikv-max-restore-concurrency`: the maximum number of concurrent tasks per TiKV node during snapshot restore.
* `--compression`: determines the compression algorithm used for generating backup files. It supports `lz4`, `snappy`, and `zstd`, with the default being `zstd` (usually no need to modify). For guidance on choosing different compression algorithms, refer to [this document](https://github.com/EighteenZi/rocksdb_wiki/blob/master/Compression.md).
Expand Down