TiSpark Benchmark

Benchmark with TPC-H

Environment

10 machines, each with:
* CPU: 8 cores, Intel Xeon Processor (Ice Lake)
* Memory: 32G
* Disk: 500G

TiDB 5.4.0: 3 TiDB + 3 TiKV + 1 PD (TiDB and PD are on the same machine)

Spark 3.0.3 Standalone: 1 master + 3 workers

Parallel Number

The degree of parallelism is bounded by the total number of executor cores: 3 workers * 8 cores = 24
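
The arithmetic above can be sketched as follows. This is a minimal illustration, assuming every core on each worker is given to executors and each task occupies one core (Spark's default); the function names are hypothetical and not part of TiSpark.

```python
import math


def total_executor_cores(workers: int, cores_per_worker: int) -> int:
    """Upper bound on the number of tasks that can run at the same time."""
    return workers * cores_per_worker


def scheduling_waves(num_tasks: int, total_cores: int) -> int:
    """Number of rounds needed to run num_tasks when each task takes one core."""
    return math.ceil(num_tasks / total_cores)


cores = total_executor_cores(workers=3, cores_per_worker=8)
print(cores)                         # 24
print(scheduling_waves(226, cores))  # 10
```

A job with fewer tasks than cores (for example, 9 tasks on 24 cores) finishes in a single wave but leaves some cores idle.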

Write Benchmark

Write TPC-H generated data (ORDERS table) from HDFS to TiDB

TiSpark write benchmark

Count(*)	Data size	Task number	Time (s)
1,500,000	164M	9	62
15,000,000	1.7G	23	396
150,000,000	17G	226	4722

Spark JDBC write benchmark

Count(*)	Data size	Task number	Time (s)
1,500,000	164M	24	23
15,000,000	1.7G	24	244
150,000,000	17G	133	2483
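
To compare the two write paths at a glance, the timings can be converted to throughput. The sketch below only re-derives numbers from the two write tables above; the helper names are hypothetical.

```python
# (row count, seconds) copied from the two write tables above
TISPARK_WRITE = [(1_500_000, 62), (15_000_000, 396), (150_000_000, 4722)]
JDBC_WRITE = [(1_500_000, 23), (15_000_000, 244), (150_000_000, 2483)]


def rows_per_second(rows: int, seconds: float) -> float:
    """Average write throughput for one run."""
    return rows / seconds


def time_ratios(tispark, jdbc):
    """TiSpark time divided by Spark JDBC time, per data size."""
    return [t_s / j_s for (_, t_s), (_, j_s) in zip(tispark, jdbc)]


for (rows, t_s), (_, j_s) in zip(TISPARK_WRITE, JDBC_WRITE):
    print(
        f"{rows:>11,} rows: "
        f"TiSpark {rows_per_second(rows, t_s):,.0f} rows/s, "
        f"Spark JDBC {rows_per_second(rows, j_s):,.0f} rows/s, "
        f"time ratio {t_s / j_s:.2f}x"
    )
```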

Delete Benchmark

Delete data from TiDB with TiSpark (ORDERS table)

Count(*)	Data size	Task number	Time (s)
1,500,000	164M	3	31
15,000,000	1.7G	5	269
150,000,000	17G	33	3225

Select Benchmark

Run the 22 TPC-H queries and a full table scan

Spark JDBC uses its default configuration, without partitionColumn, lowerBound, or upperBound, so it does not partition the table
TiSpark partitions the table automatically

Query	Data size	TiSpark (s)	Spark JDBC (s)
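
For reference, a partitioned Spark JDBC read needs these options set explicitly. The sketch below builds the option map only. It assumes the standard Spark JDBC data source option names (url, dbtable, partitionColumn, lowerBound, upperBound, numPartitions); the helper name, the connection URL, and the bounds are hypothetical placeholders and were not used in this benchmark.

```python
def jdbc_read_options(url, table, partition_column=None,
                      lower_bound=None, upper_bound=None, num_partitions=None):
    """Build the option map for a Spark JDBC read (credentials omitted)."""
    opts = {"url": url, "dbtable": table}
    partitioning = (partition_column, lower_bound, upper_bound, num_partitions)
    if any(v is not None for v in partitioning):
        # Spark requires these four options to be given together.
        if any(v is None for v in partitioning):
            raise ValueError("partitioning options must all be set together")
        opts.update(
            partitionColumn=partition_column,
            lowerBound=str(lower_bound),
            upperBound=str(upper_bound),
            numPartitions=str(num_partitions),
        )
    return opts


# Default configuration as used in this benchmark: no partitioning options.
default_opts = jdbc_read_options("jdbc:mysql://tidb-host:4000/tpch", "orders")

# Hypothetical partitioned read: 24 partitions on o_orderkey, placeholder bounds.
partitioned_opts = jdbc_read_options(
    "jdbc:mysql://tidb-host:4000/tpch", "orders",
    partition_column="o_orderkey", lower_bound=1,
    upper_bound=6_000_000, num_partitions=24,
)
print(partitioned_opts)
# With a SparkSession named spark, the map could be passed as:
#   spark.read.format("jdbc").options(**partitioned_opts).load()
```

Without the partitioning options, Spark reads the JDBC table through a single partition, which is the configuration measured here.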
TPC-H 22 queries	1G	131	157
TPC-H 22 queries	10G	424	1793 (Q21 OOM)
select * from orders	164M	5	10
select * from orders	1.7G	14	89

If you want to run your own benchmark for TiSpark, here is a reference (Chinese only for now)

Benchmark with TPC-DS

Environment

2 machines, each with:
* CPU: 48 cores
* Memory: 187G

TiDB v6.0.0: 3 TiKV

Spark v3.1.3: Local Mode

The first machine runs 2 TiKV instances; the second machine runs 1 TiKV instance and Spark

Data

Load 50G of TPC-DS data into TiDB. See here for details on loading the data

Query

Some of the queries are not compatible with Spark SQL and need the following changes:

* Change every date_add(start_date, interval 30 day) to date_add(start_date, 30)
* Change aliases quoted as 'name' to backtick-quoted `name`
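
The two rewrites can be automated with a small script. This is a rough sketch, not the tool used for this benchmark: it assumes the first argument of date_add contains no commas and that aliases are written as AS 'name'. The function name is hypothetical, and the output should be reviewed by hand.

```python
import re

# Matches date_add(<expr>, interval <n> day); <expr> must not contain commas.
_DATE_ADD = re.compile(
    r"date_add\(\s*([^,]+?)\s*,\s*interval\s+(\d+)\s+days?\s*\)",
    re.IGNORECASE,
)
# Matches an alias written as: AS 'name'
_QUOTED_ALIAS = re.compile(r"(\bas\s+)'([^']+)'", re.IGNORECASE)


def to_spark_sql(query: str) -> str:
    """Apply both rewrites: the date_add form first, then alias quoting."""
    query = _DATE_ADD.sub(r"date_add(\1, \2)", query)
    return _QUOTED_ALIAS.sub(r"\1`\2`", query)


example = (
    "select date_add(cast('2000-02-01' as date), interval 30 day) "
    "as 'end_date' from t"
)
print(to_spark_sql(example))
```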

Select Benchmark

Execute the 99 TPC-DS queries on 50G of data

Storage	Total time (s)
TiSpark on TiKV	7504
TiSpark on TiFlash	2928 (Q5 failed)
