TiSpark Benchmark

shiyuhang0 edited this page May 11, 2022 · 4 revisions

Benchmark with TPC-H

Environment

10 machines, each with:
* CPU: 8 cores, Intel Xeon Processor (Icelake)
* Memory: 32G
* Disk: 500G

TiDB 5.4.0: 3 TiDB + 3 TiKV + 1 PD (TiDB and PD are on the same machine)

Spark 3.0.3 Standalone: 1 master + 3 workers

Parallel Number

The parallel number is bounded by the total number of executor cores: 3 workers * 8 cores = 24
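The arithmetic above can be sketched in a few lines. This is only an illustration: it assumes each task occupies one core (Spark's default `spark.task.cpus=1`) and that every worker offers all 8 cores to executors. The function names are hypothetical, and the task counts are the ones reported in the TiSpark write results on this page.

```python
import math


def total_executor_cores(workers: int, cores_per_worker: int) -> int:
    """Upper bound on concurrently running tasks, assuming one core per task."""
    return workers * cores_per_worker


def task_waves(num_tasks: int, total_cores: int) -> int:
    """Scheduling rounds needed when at most `total_cores` tasks run at once."""
    return math.ceil(num_tasks / total_cores)


# 3 workers with 8 cores each, as in the environment described above.
cores = total_executor_cores(workers=3, cores_per_worker=8)

# Task counts taken from the TiSpark write results.
for tasks in (9, 23, 226):
    print(f"{tasks} tasks on {cores} cores: {task_waves(tasks, cores)} wave(s)")
```

A job with fewer tasks than cores leaves some cores idle, while a job with many more tasks than cores runs in several rounds.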

Write Benchmark

Write data from HDFS to TiDB, using data generated by TPC-H (ORDERS table)

TiSpark write benchmark

| Count(*) | Data size | Task number | Time (s) |
| --- | --- | --- | --- |
| 1,500,000 | 164M | 9 | 62 |
| 15,000,000 | 1.7G | 23 | 396 |
| 150,000,000 | 17G | 226 | 4722 |

Spark JDBC write benchmark

| Count(*) | Data size | Task number | Time (s) |
| --- | --- | --- | --- |
| 1,500,000 | 164M | 24 | 23 |
| 15,000,000 | 1.7G | 24 | 244 |
| 150,000,000 | 17G | 133 | 2483 |
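To make the two write results easier to compare, the timings can be turned into rows per second and a time ratio. The sketch below only re-derives numbers from the measurements above. The helper names are hypothetical, and the ratio says nothing about why the two write paths differ.

```python
# (row count, TiSpark seconds, Spark JDBC seconds), copied from the write results above.
WRITE_RESULTS = [
    (1_500_000, 62, 23),
    (15_000_000, 396, 244),
    (150_000_000, 4722, 2483),
]


def rows_per_second(rows: int, seconds: float) -> float:
    """Average write throughput over the whole job."""
    return rows / seconds


def summarize(results):
    """Return one summary dict per data size."""
    summary = []
    for rows, tispark_s, jdbc_s in results:
        summary.append({
            "rows": rows,
            "tispark_rows_per_s": rows_per_second(rows, tispark_s),
            "jdbc_rows_per_s": rows_per_second(rows, jdbc_s),
            # Values above 1 mean the TiSpark write took longer than the JDBC write.
            "tispark_to_jdbc_time_ratio": tispark_s / jdbc_s,
        })
    return summary


for row in summarize(WRITE_RESULTS):
    print(
        f"{row['rows']:>11,} rows: "
        f"TiSpark {row['tispark_rows_per_s']:,.0f} rows/s, "
        f"JDBC {row['jdbc_rows_per_s']:,.0f} rows/s, "
        f"time ratio {row['tispark_to_jdbc_time_ratio']:.2f}"
    )
```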

Delete Benchmark

Delete data from TiDB with TiSpark (ORDERS table)

| Count(*) | Data size | Task number | Time (s) |
| --- | --- | --- | --- |
| 1,500,000 | 164M | 3 | 31 |
| 15,000,000 | 1.7G | 5 | 269 |
| 150,000,000 | 17G | 33 | 3225 |

Select Benchmark

Select with the 22 TPC-H queries and a full table scan

  • Spark JDBC uses its default config, without partitionColumn, lowerBound, or upperBound to partition the table
  • TiSpark partitions the table automatically

| Query | Data size | TiSpark (s) | Spark JDBC (s) |
| --- | --- | --- | --- |
| TPC-H 22 queries | 1G | 131 | 157 |
| TPC-H 22 queries | 10G | 424 | 1793 (q21 OOM) |
| select * from orders | 164M | 5 | 10 |
| select * from orders | 1.7G | 14 | 89 |
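The Spark JDBC numbers above were measured without partitioning. For readers who want to try a partitioned JDBC read, the sketch below builds the options that Spark's JDBC source uses to split a read: `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. The URL, the column `o_orderkey`, the bounds, and the helper name are all assumptions for illustration. This configuration was not part of the benchmark, so no results are implied.

```python
def jdbc_partitioned_read_options(url, table, column, lower, upper, num_partitions):
    """Build Spark JDBC options that split a read into ranges over `column`.

    Spark expects partitionColumn, lowerBound, upperBound and numPartitions
    to be supplied together; the column must be numeric, date, or timestamp.
    """
    if lower >= upper:
        raise ValueError("lower bound must be smaller than upper bound")
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": column,
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


# Hypothetical values: adjust the URL and bounds to the actual cluster and data.
opts = jdbc_partitioned_read_options(
    url="jdbc:mysql://127.0.0.1:4000/tpch",  # assumed TiDB address and database
    table="orders",
    column="o_orderkey",                     # assumed partition column
    lower=1,
    upper=6_000_000,                         # assumed key range
    num_partitions=24,                       # matches the 24 executor cores
)

# With a live SparkSession (not created here) the options would be used as:
#   df = spark.read.format("jdbc").options(**opts).load()
```

Credentials (`user`, `password`) would also have to be added to the options before running the read.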

If you want to benchmark TiSpark yourself, here is a reference (Chinese only for now)