Note: this version of the connector requires the delta_object field of a TsFile to have the format `key:value(+key:value)*`, where all keys are distinct. The connector exposes one or more TsFiles as a single table in SparkSQL. You may specify a single directory, or use wildcards to match multiple directories. When multiple TsFiles are loaded, the table schema keeps the union of the sensors found in the individual TsFiles.
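For example, the sample data further below uses a single-key value like `turbineId:turbine1`; a two-key value such as `turbineId:turbine1+city:beijing` (the second key/value pair here is hypothetical) also satisfies this format.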
Example test suite: src/test/scala/cn.edu.thu.kvtsfile.spark.TSFileSuit
The directory layout determines how files are loaded.

Hive-style partitioned layout:

```
basefolder/key=1/file1.tsfile
basefolder/key=2/file2.tsfile
```

Specifying basefolder as the path adds an extra column key to the table, with value 1 or 2.

e.g. path=basefolder

If the path is specified with wildcards, it is not treated as a partition.

e.g. path=basefolder/*/*.tsfile

Flat layout:

```
basefolder/file1.tsfile
basefolder/file2.tsfile
```

Specifying basefolder merges the schemas of the TsFiles, keeping the union of their sensors.

e.g. path=basefolder
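A minimal sketch of reading the partitioned layout above (the basefolder path is hypothetical; the kvtsfile reader comes from this library, as in the examples further below):

```scala
import cn.edu.thu.kvtsfile._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// Reading the base folder: the key=1 / key=2 subdirectories become
// an extra partition column named "key" with values 1 and 2.
val df = spark.read.kvtsfile("basefolder")
df.show()
```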
The versions required for Spark and Java are as follows:
Spark Version | Scala Version | Java Version |
---|---|---|
2.0+ | 2.11 | 1.8 |
This library uses the following mapping from TsFile data types to SparkSQL data types:
TsFile | SparkSQL |
---|---|
INT32 | IntegerType |
INT64 | LongType |
FLOAT | FloatType |
DOUBLE | DoubleType |
The set of time-series data in section "Time-series Data" is used here to illustrate the mapping from TsFile Schema to SparkSQL Table Structure.
turbineId:turbine1 | | | | | |
---|---|---|---|---|---|
sensor_1 | | sensor_2 | | sensor_3 | |
time | value | time | value | time | value |
1 | 1.2 | 1 | 20 | 2 | 50 |
3 | 1.4 | 2 | 20 | 4 | 51 |
5 | 1.1 | 3 | 21 | 6 | 52 |
7 | 1.8 | 4 | 20 | 8 | 53 |
There is only one reserved column in the Spark SQL Table:
time
: Timestamp, LongType
The SparkSQL Table Structure is as follows:
time(LongType) | turbineId(StringType) | sensor_1(FloatType) | sensor_2(IntegerType) | sensor_3(IntegerType) |
---|---|---|---|---|
1 | turbine1 | 1.2 | 20 | null |
2 | turbine1 | null | 20 | 50 |
3 | turbine1 | 1.4 | 21 | null |
4 | turbine1 | null | 20 | 51 |
5 | turbine1 | 1.1 | null | null |
6 | turbine1 | null | null | 52 |
7 | turbine1 | 1.8 | null | null |
8 | turbine1 | null | null | 53 |
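To verify this mapping, the inferred schema of a loaded TsFile can be printed: it should contain the reserved time column plus one column per sensor in the union (a quick sketch, using the same test.ts sample as the examples below):

```scala
import cn.edu.thu.kvtsfile._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// Prints the reserved time column plus one typed column per sensor.
spark.read.kvtsfile("test.ts").printSchema()
```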
- Example 1

```scala
// import this library and Spark
import cn.edu.thu.kvtsfile._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// read data in TsFile and create a table
val df = spark.read.kvtsfile("test.ts")
df.createOrReplaceTempView("TsFile_table")

// query with filter
val newDf = spark.sql("select * from TsFile_table where sensor_1 > 1.2").cache()
newDf.show()
```
- Example 2

```scala
import cn.edu.thu.kvtsfile._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// read data in TsFile via the DataSource API
val df = spark.read
  .format("cn.edu.thu.kvtsfile")
  .load("test.ts")
df.filter("sensor_1 > 1.2").show()
```
- Example 3

```scala
import cn.edu.thu.kvtsfile._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local").getOrCreate()

// create a table in SparkSQL and build relation with a TsFile
spark.sql("create temporary view TsFile using cn.edu.thu.kvtsfile options(path = \"test.ts\")")
spark.sql("select * from TsFile where sensor_1 > 1.2").show()
```
The project can be packaged for use in spark-shell:

```
mvn clean scala:compile compile package
```

The jar is located at: target/kvtsfile-spark-connector-0.1.0.jar
```
$ bin/spark-shell --jars kvtsfile-spark-connector-0.1.0.jar,tsfile-0.1.0.jar

scala> sql("CREATE TEMPORARY TABLE TsFile_table USING cn.edu.thu.kvtsfile.spark OPTIONS (path \"hdfs://localhost:9000/test.ts\")")
scala> sql("select * from TsFile_table where sensor_1 > 1.2").show()
```