From ed764d7d014c420ee0cbcde99597020c4f75346d Mon Sep 17 00:00:00 2001
From: Peng Guanwen
Date: Fri, 15 Oct 2021 13:12:33 +0800
Subject: [PATCH] rawkv bulk load: add description for pause merge (#74)

* rawkv bulk load: add description for pause merge

Signed-off-by: Peng Guanwen

* Update text/0072-online-bulk-load-for-rawkv.md

Co-authored-by: Liangliang Gu
Signed-off-by: Peng Guanwen

* Add future improvements

Signed-off-by: Peng Guanwen
Co-authored-by: Liangliang Gu
---
 text/0072-online-bulk-load-for-rawkv.md | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/text/0072-online-bulk-load-for-rawkv.md b/text/0072-online-bulk-load-for-rawkv.md
index 73f74f11..60989627 100644
--- a/text/0072-online-bulk-load-for-rawkv.md
+++ b/text/0072-online-bulk-load-for-rawkv.md
@@ -147,6 +147,15 @@ There are some usecases:
 1. Load parquet from S3 to RawKV: Location=S3, FileFormat=Parquet, Encoder=TableEncoder, IngestAPI=RawKV
 2. Load CSV from HDFS to TxnKV: Location=HDFS, FileFormat=CSV, Encoder=TiDBEncoder, IngestAPI=TxnKV
 
+### Import mode & pause merge checker
+
+When ingesting, TiKV should be switched to import mode. Also, merging of the empty regions split by Spark should be prevented. To achieve this, a PD REST API to pause checkers (including the merge checker) should be added. The pause-checker API should be called periodically to keep the checker paused.
+
+| URL | Body | Description |
+|---|:---:|---|
+| POST /pd/api/v1/checker/{name} | `{ "delay": }` | Pause or resume a checker |
+| GET /pd/api/v1/checker/{name} | -- | Check if a checker is paused |
+
 ## Drawbacks
 
 Learning spark has a certain cost.
@@ -154,3 +163,11 @@ Learning spark has a certain cost.
 ## Others
 
 A new repository is required to hold the spark-related codes. I propose it be named `migration`.
+
+## Future improvements
+
+Sometimes only part of TiKV is doing import work. For example, we have two keyspaces and want one to keep serving while the other is importing.
+To minimize the effect of import mode and the paused merge checker, two strategies can be applied:
+
+1. The import mode can be made region-wise, so that other regions remain unaffected.
+2. A more fine-grained API can be added to let a PD checker skip only specific regions.
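
The periodic pause call described under "Import mode & pause merge checker" could look roughly like the sketch below. It is only an illustration under stated assumptions: the `delay` value is taken to be a number of seconds (the table leaves the value unspecified), the PD address is a placeholder, and the helper names `build_pause_request`, `send_request`, and `keep_paused` are hypothetical and not part of PD or of this RFC.

```python
import json
import time
import urllib.request


def build_pause_request(pd_addr: str, checker: str, delay: int) -> urllib.request.Request:
    """Build the POST /pd/api/v1/checker/{name} request from the table above.

    Assumption: `delay` is a pause duration in seconds.
    """
    url = f"{pd_addr.rstrip('/')}/pd/api/v1/checker/{checker}"
    body = json.dumps({"delay": delay}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send_request(req: urllib.request.Request) -> int:
    """Send the request to PD and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def keep_paused(pd_addr, checker, delay, rounds, send=send_request, sleep=time.sleep):
    """Re-issue the pause `rounds` times, sleeping half the delay between calls
    so that each pause is renewed before the previous one would run out."""
    sent = 0
    for i in range(rounds):
        send(build_pause_request(pd_addr, checker, delay))
        sent += 1
        if i + 1 < rounds:
            sleep(delay / 2)
    return sent
```

A caller would run something like `keep_paused("http://127.0.0.1:2379", "merge", 60, rounds)` for as long as the import job is active. Renewing at half the delay is a design choice of this sketch, not something the RFC prescribes.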