Getting Started
Take spark-shell as an example. Make sure you have deployed Spark and obtained the TiSpark jar.
To use TiSpark in spark-shell:
- Add the following configuration in spark-defaults.conf
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses ${your_pd_address}
spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
spark.sql.catalog.tidb_catalog.pd.addresses ${your_pd_address}
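Note that spark.tispark.pd.addresses accepts a comma-separated list when the cluster has more than one PD server. For example (the addresses below are placeholders for your own PD endpoints):
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379
spark.sql.catalog.tidb_catalog org.apache.spark.sql.catalyst.catalog.TiCatalog
spark.sql.catalog.tidb_catalog.pd.addresses 10.0.0.1:2379,10.0.0.2:2379,10.0.0.3:2379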
- Start spark-shell with the --jars option
spark-shell --jars tispark-assembly-{version}.jar
spark.sql("select ti_version()").collect
You can use Spark SQL to read from TiKV
spark.sql("use tidb_catalog")
spark.sql("select count(*) from ${database}.${table}").show
You can use the Spark DataSource API to write to TiKV with ACID guarantees (the INSERT statement is not supported yet)
val tidbOptions: Map[String, String] = Map(
"tidb.addr" -> "127.0.0.1",
"tidb.password" -> "",
"tidb.port" -> "4000",
"tidb.user" -> "root"
)
val customerDF = spark.sql("select * from customer limit 100000")
customerDF.write
.format("tidb")
.option("database", "tpch_test")
.option("table", "cust_test_select")
.options(tidbOptions)
.mode("append")
.save()
See here for more details.
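After the write finishes, you can read the rows back through the catalog to verify, using the database and table from the example above:
spark.sql("use tidb_catalog")
spark.sql("select count(*) from tpch_test.cust_test_select").show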
You can use Spark SQL to delete from TiKV (supported on the TiSpark master branch)
spark.sql("use tidb_catalog")
spark.sql("delete from ${database}.${table} where xxx")
See here for more details.
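For example, a delete with a concrete predicate, using the hypothetical test.orders table from above:
spark.sql("use tidb_catalog")
// Rows matching the predicate are deleted from the TiDB table
spark.sql("delete from test.orders where status = 'cancelled'")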
You can use multiple catalogs to read from different sources.
// read from hive
spark.sql("select * from spark_catalog.default.t").show
// join hive and tidb
spark.sql("select t1.id,t2.id from spark_catalog.default.t t1 left join tidb_catalog.test.t t2").show
For TiSpark versions that do not support the TiDB catalog, again take spark-shell as an example.
To use TiSpark in spark-shell:
- Add the following configuration in spark-defaults.conf
spark.sql.extensions org.apache.spark.sql.TiExtensions
spark.tispark.pd.addresses ${your_pd_address}
- Start spark-shell with the --jars option
spark-shell --jars tispark-assembly-{version}.jar
spark.sql("select ti_version()").collect
You can use Spark SQL to read from TiKV
spark.sql("select count(*) from ${database}.${table}").show
You can use the Spark DataSource API to write to TiKV with ACID guarantees (the INSERT statement is not supported yet)
val tidbOptions: Map[String, String] = Map(
"tidb.addr" -> "127.0.0.1",
"tidb.password" -> "",
"tidb.port" -> "4000",
"tidb.user" -> "root"
)
val customerDF = spark.sql("select * from customer limit 100000")
customerDF.write
.format("tidb")
.option("database", "tpch_test")
.option("table", "cust_test_select")
.options(tidbOptions)
.mode("append")
.save()
See here for more details.