Getting Started
shiyuhang0 edited this page May 11, 2022 · 16 revisions
This guide uses spark-shell as an example. Make sure you have deployed Spark and obtained the TiSpark jar.
To use TiSpark in spark-shell
- Add the following configuration to spark-defaults.conf
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses ${your_pd_address}
spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
spark.sql.catalog.tidb_catalog.pd.addresses ${your_pd_address}
- Start spark-shell with the --jars option
spark-shell --jars tispark-assembly-{version}.jar
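As a rough sketch, the settings above can also be kept in one place and rendered either as spark-defaults.conf lines or as `--conf key=value` arguments for spark-shell. The `TiSparkSettings` object and its method names are hypothetical and not part of TiSpark; only the keys and class names come from the configuration listed above.

```scala
// Hypothetical helper (not part of TiSpark): collects the settings listed
// above and renders them in the two forms Spark accepts.
object TiSparkSettings {
  // Keys and class names are copied from the configuration above.
  // pdAddresses is your PD endpoint, for example "127.0.0.1:2379".
  def settings(pdAddresses: String): Seq[(String, String)] = Seq(
    "spark.sql.extensions" -> "org.apache.spark.sql.TiExtensions",
    "spark.tispark.pd.addresses" -> pdAddresses,
    "spark.sql.catalog.tidb_catalog" -> "org.apache.spark.sql.catalyst.catalog.TiCatalog",
    "spark.sql.catalog.tidb_catalog.pd.addresses" -> pdAddresses
  )

  // One "key value" line per setting, as used in spark-defaults.conf.
  def asDefaultsConf(pdAddresses: String): String =
    settings(pdAddresses).map { case (k, v) => s"$k $v" }.mkString("\n")

  // The same settings as --conf arguments for spark-shell or spark-submit.
  def asConfArgs(pdAddresses: String): Seq[String] =
    settings(pdAddresses).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

Passing the settings with `--conf` can be convenient when you do not want to edit the shared spark-defaults.conf of a cluster.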
You can use Spark SQL to read from TiKV
spark.sql("use tidb_catalog")
spark.sql("select count(*) from ${database}.${table}").show
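Note that `${database}` and `${table}` in the query above are placeholders: a plain Scala string literal does not interpolate them, so they must be replaced by hand or filled in with the `s` interpolator. A minimal sketch, assuming a hypothetical `Queries.countQuery` helper and example names `tpch_test` and `customer`:

```scala
// Hypothetical helper (not part of TiSpark): builds the count query with the
// s interpolator so the database and table names are actually substituted.
object Queries {
  // Backticks quote the identifiers in Spark SQL.
  def countQuery(database: String, table: String): String =
    s"select count(*) from `$database`.`$table`"
}
```

In spark-shell this could be used as `spark.sql(Queries.countQuery("tpch_test", "customer")).show`.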
You can use the Spark DataSource API to write to TiKV with ACID guarantees (the INSERT statement is not supported yet)
val tidbOptions: Map[String, String] = Map(
  "tidb.addr" -> "127.0.0.1",
  "tidb.password" -> "",
  "tidb.port" -> "4000",
  "tidb.user" -> "root",
  "spark.tispark.pd.addresses" -> "127.0.0.1:2379"
)
val customerDF = spark.sql("select * from customer limit 100000")
customerDF.write
  .format("tidb")
  .option("database", "tpch_test")
  .option("table", "cust_test_select")
  .options(tidbOptions)
  .mode("append")
  .save()
See here for more details.
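For the DataSource write above, the option map can be built from a small value class so that the password does not have to be hard-coded in the shell session. This is only a sketch: the `TiDBConnection` class and the `TIDB_PASSWORD` environment variable are assumptions for illustration, while the option keys and default values are taken from the example above.

```scala
// Hypothetical helper (not part of TiSpark): holds the connection settings
// from the write example and turns them into DataSource options.
final case class TiDBConnection(
    addr: String = "127.0.0.1",
    port: Int = 4000,
    user: String = "root",
    // TIDB_PASSWORD is an assumed variable name; if it is unset, this falls
    // back to the empty password used in the example above.
    password: String = sys.env.getOrElse("TIDB_PASSWORD", ""),
    pdAddresses: String = "127.0.0.1:2379"
) {
  require(port > 0 && port <= 65535, s"invalid TiDB port: $port")

  // Option keys are copied verbatim from the example above.
  def toOptions: Map[String, String] = Map(
    "tidb.addr" -> addr,
    "tidb.password" -> password,
    "tidb.port" -> port.toString,
    "tidb.user" -> user,
    "spark.tispark.pd.addresses" -> pdAddresses
  )
}
```

The result can then be passed to the writer in place of the literal map, for example `.options(TiDBConnection().toOptions)`.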
You can use Spark SQL to delete from TiKV (supported on the TiSpark master branch)
spark.sql("use tidb_catalog")
spark.sql("delete from ${database}.${table} where xxx").show
See here for more details.
The following example also uses spark-shell, this time without the catalog configuration.
To use TiSpark in spark-shell
- Add the following configuration to spark-defaults.conf
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses ${your_pd_address}
- Start spark-shell with the --jars option
spark-shell --jars tispark-assembly-{version}.jar
You can use Spark SQL to read from TiKV
spark.sql("select count(*) from ${database}.${table}").show
You can use the Spark DataSource API to write to TiKV with ACID guarantees (the INSERT statement is not supported yet)
val tidbOptions: Map[String, String] = Map(
  "tidb.addr" -> "127.0.0.1",
  "tidb.password" -> "",
  "tidb.port" -> "4000",
  "tidb.user" -> "root",
  "spark.tispark.pd.addresses" -> "127.0.0.1:2379"
)
val customerDF = spark.sql("select * from customer limit 100000")
customerDF.write
  .format("tidb")
  .option("database", "tpch_test")
  .option("table", "cust_test_select")
  .options(tidbOptions)
  .mode("append")
  .save()
See here for more details.