TiSpark Benchmark

Benchmark with TPC-H

Environment

10 machines, each with:
* CPU: 8 cores, Intel Xeon Processor (Ice Lake)
* Memory: 32 GB
* Disk: 500 GB

TiDB 5.4.0: 3 TiDB + 3 TiKV + 1 PD (TiDB and PD are on the same machine)

Spark 3.0.3 Standalone: 1 master + 3 workers

Parallelism

The degree of parallelism is bounded by the total number of executor cores: 3 workers * 8 cores = 24.
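
The arithmetic above can be sketched in Python. The helper names are hypothetical, and the "waves" figure is only a rough estimate that assumes one task per core and equally sized tasks; it is not a measured value. The 226-task example corresponds to the largest TiSpark write reported in this benchmark.

```python
import math


def total_executor_cores(workers: int, cores_per_worker: int) -> int:
    """Upper bound on the number of tasks that can run at the same time."""
    return workers * cores_per_worker


def task_waves(num_tasks: int, parallelism: int) -> int:
    """Rough number of scheduling rounds needed to run num_tasks tasks."""
    return math.ceil(num_tasks / parallelism)


parallelism = total_executor_cores(workers=3, cores_per_worker=8)
print(parallelism)                   # 24
print(task_waves(226, parallelism))  # 10
```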

Write Benchmark

Write data from HDFS to TiDB, using data generated by TPC-H (ORDERS table).

TiSpark write benchmark

Count(*)	Data size	Task number	Time (s)
1,500,000	164M	9	62
15,000,000	1.7G	23	396
150,000,000	17G	226	4722

Spark JDBC write benchmark

Count(*)	Data size	Task number	Time (s)
1,500,000	164M	24	23
15,000,000	1.7G	24	244
150,000,000	17G	133	2483
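
To compare the two write paths at a glance, the elapsed times can be converted to rows per second. The sketch below only does arithmetic on the figures in the two tables above; the function and variable names are hypothetical.

```python
# (rows, seconds) pairs copied from the two write tables above.
TISPARK_WRITE = [(1_500_000, 62), (15_000_000, 396), (150_000_000, 4722)]
JDBC_WRITE = [(1_500_000, 23), (15_000_000, 244), (150_000_000, 2483)]


def rows_per_second(rows: int, seconds: float) -> float:
    """Average write throughput for one benchmark run."""
    return rows / seconds


def throughput_table(results):
    """Throughput of each run, rounded to the nearest row per second."""
    return [round(rows_per_second(rows, secs)) for rows, secs in results]


print(throughput_table(TISPARK_WRITE))  # [24194, 37879, 31766]
print(throughput_table(JDBC_WRITE))     # [65217, 61475, 60411]
```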

Delete Benchmark

Delete data from TiDB with TiSpark (ORDERS table)

Count(*)	Data size	Task number	Time (s)
1,500,000	164M	3	31
15,000,000	1.7G	5	269
150,000,000	17G	33	3225

Select Benchmark

Run the 22 TPC-H queries and a full table scan.

Spark JDBC uses its default configuration, without partitionColumn, lowerBound, or upperBound to partition the table, so each table is read through a single partition.
TiSpark partitions the table automatically.
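
For reference, this is roughly what a partitioned Spark JDBC read would need. The benchmark did not use these settings, so this is only a sketch: `jdbc_partition_options` is a hypothetical helper, and the URL, database name, and bounds are placeholder assumptions. The option keys themselves (`url`, `dbtable`, `partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`) are standard Spark JDBC data source options.

```python
def jdbc_partition_options(url, table, column, lower, upper, num_partitions):
    """Build the option map for a range-partitioned Spark JDBC read."""
    if lower >= upper:
        raise ValueError("lowerBound must be smaller than upperBound")
    if num_partitions < 1:
        raise ValueError("numPartitions must be at least 1")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,      # must be a numeric, date, or timestamp column
        "lowerBound": str(lower),       # bounds only decide the partition stride,
        "upperBound": str(upper),       # they do not filter rows
        "numPartitions": str(num_partitions),
    }


opts = jdbc_partition_options(
    url="jdbc:mysql://127.0.0.1:4000/tpch",  # assumed address and database name
    table="orders",
    column="o_orderkey",
    lower=1,
    upper=6_000_000,                         # placeholder bound, adjust to the data
    num_partitions=24,                       # one partition per executor core
)
print(opts["numPartitions"])  # 24

# With an existing SparkSession named `spark`, the options would be passed as:
# df = spark.read.format("jdbc").options(**opts).load()
```

Choosing good bounds requires knowing the key distribution in advance, which is the manual step TiSpark avoids.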

Query	Data size	TiSpark (s)	Spark JDBC (s)
TPC-H 22 queries	1G	131	157
TPC-H 22 queries	10G	424	1793 (q21 OOM)
select * from orders	164M	5	10
select * from orders	1.7G	14	89
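
The relative difference can be read off the table as a simple ratio of elapsed times. The sketch below uses only the numbers above; the names are hypothetical. Note that the 10G Spark JDBC run hit an OOM on q21, so that ratio is not a like-for-like comparison.

```python
# (label, TiSpark seconds, Spark JDBC seconds) copied from the table above.
RESULTS = [
    ("TPC-H 22 queries, 1G", 131, 157),
    ("TPC-H 22 queries, 10G", 424, 1793),  # Spark JDBC: q21 OOM
    ("select * from orders, 164M", 5, 10),
    ("select * from orders, 1.7G", 14, 89),
]


def speedup(tispark_s: float, jdbc_s: float) -> float:
    """How many times faster TiSpark finished than Spark JDBC."""
    return jdbc_s / tispark_s


for label, tispark_s, jdbc_s in RESULTS:
    print(f"{label}: {speedup(tispark_s, jdbc_s):.1f}x")
```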

If you want to benchmark TiSpark yourself, here is a reference (Chinese only for now).
