seatunnel

SeaTunnel was formerly named Waterdrop and was renamed SeaTunnel on October 12, 2021.

SeaTunnel is an easy-to-use, ultra-high-performance distributed data integration platform that supports real-time synchronization of massive data. It can stably and efficiently synchronize tens of billions of records every day, and has been used in production by nearly 100 companies.

Why do we need SeaTunnel

SeaTunnel aims to solve the problems commonly encountered when synchronizing massive data:

Data loss and duplication
Task accumulation and delay
Low throughput
Long lead time before going into production
Lack of monitoring of application running status

SeaTunnel use scenarios

Mass data synchronization
Mass data integration
ETL with massive data
Mass data aggregation
Multi-source data processing

Features of SeaTunnel

Easy to use, flexible configuration, low code development
Real-time streaming
Offline multi-source data analysis
High-performance, massive data processing capabilities
Modular and plug-in mechanism, easy to extend
Supports data processing and aggregation with SQL
Supports Spark Structured Streaming
Supports Spark 2.x
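To illustrate what "data processing and aggregation with SQL" amounts to inside a pipeline step, here is a minimal Python sketch that registers a batch of records as a table and runs an aggregation query over it. It is only an analogy: it uses the standard-library `sqlite3` module as a stand-in SQL engine (SeaTunnel hands SQL to its underlying engine rather than to SQLite), and the `sql_filter` function and the `access_log` table name are hypothetical, not SeaTunnel APIs.

```python
import sqlite3
from typing import Dict, List, Sequence, Tuple


def sql_filter(records: Sequence[Dict[str, object]], table: str, query: str) -> List[Tuple]:
    """Hypothetical SQL step: load records into an in-memory table, then run `query`."""
    if not records:
        # Simplification: with no records there is no schema to build a table from.
        return []
    columns = list(records[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        col_list = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({col_list})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(r[c] for c in columns) for r in records],
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    events = [
        {"city": "beijing", "bytes": 10},
        {"city": "shanghai", "bytes": 5},
        {"city": "beijing", "bytes": 7},
    ]
    rows = sql_filter(
        events,
        "access_log",
        "SELECT city, SUM(bytes) FROM access_log GROUP BY city ORDER BY city",
    )
    print(rows)  # [('beijing', 17), ('shanghai', 5)]
```

The point of the design is that the user writes only the query; registering the incoming data as a named table is the step's responsibility.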

Workflow of SeaTunnel

Input[Data Source Input] -> Filter[Data Processing] -> Output[Result Output]

The data processing pipeline is made up of multiple filters that together meet a variety of data processing needs. If you are used to SQL, you can also build a data processing pipeline directly with SQL, which is simple and efficient. The list of filters supported by SeaTunnel is still being expanded, and because the whole system is easy to extend, you can also develop your own data processing plug-ins.
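The Input -> Filter -> Output flow can be modeled as plain function composition. The following Python sketch is a conceptual illustration only, not SeaTunnel's implementation or plugin API: `fake_input`, `split_filter`, `lowercase_filter`, and `run_pipeline` are hypothetical names, loosely modeled on the Fake, Split, and Lowercase plugins listed in this README.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Filter = Callable[[Record], Record]
Sink = Callable[[Record], None]


def fake_input(lines: Iterable[str]) -> Iterator[Record]:
    """Input stage (hypothetical): wrap each raw line in a record."""
    for line in lines:
        yield {"raw_message": line}


def split_filter(field: str, targets: List[str], delimiter: str) -> Filter:
    """Filter (hypothetical): split `field` on `delimiter` into the `targets` fields."""
    def apply(record: Record) -> Record:
        out = dict(record)  # copy so the incoming record is not mutated
        for name, value in zip(targets, record[field].split(delimiter)):
            out[name] = value.strip()
        return out
    return apply


def lowercase_filter(field: str) -> Filter:
    """Filter (hypothetical): lowercase a single field."""
    def apply(record: Record) -> Record:
        return {**record, field: record[field].lower()}
    return apply


def run_pipeline(source: Iterable[Record], filters: List[Filter], sink: Sink) -> None:
    """Input -> Filter -> Output: send each record through every filter in order."""
    for record in source:
        for f in filters:
            record = f(record)
        sink(record)


if __name__ == "__main__":
    collected: List[Record] = []
    run_pipeline(
        fake_input(["Hello World, InterestingLab"]),
        [split_filter("raw_message", ["msg", "name"], ","), lowercase_filter("name")],
        collected.append,  # stand-in for an Output plugin such as Stdout
    )
    print(collected[0]["msg"], "|", collected[0]["name"])  # Hello World | interestinglab
```

Because each filter takes a record and returns a new one, filters can be added or reordered without touching the input or output stage, which is what makes a filter chain easy to extend.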

Plugins supported by SeaTunnel

Input plugins: Fake, File, Hdfs, Kafka, S3, Socket, self-developed Input plugins
Filter plugins: Add, Checksum, Convert, Date, Drop, Grok, Json, Kv, Lowercase, Remove, Rename, Repartition, Replace, Sample, Split, Sql, Table, Truncate, Uppercase, Uuid, self-developed Filter plugins
Output plugins: Elasticsearch, File, Hdfs, Jdbc, Kafka, Mysql, S3, Stdout, self-developed Output plugins
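A plug-in mechanism like this is commonly built around a registry that maps the plugin name used in a configuration to an implementation, so a self-developed plugin only has to register itself. The Python sketch below shows that general idea; it does not model how SeaTunnel (which runs on the JVM) actually discovers plugins, and `register`, `create`, and the `MaskDigits` filter are hypothetical names.

```python
from typing import Callable, Dict

# Hypothetical registry: plugin kind ("input", "filter", "output") -> lowercase name -> class.
_REGISTRY: Dict[str, Dict[str, type]] = {"input": {}, "filter": {}, "output": {}}


def register(kind: str, name: str) -> Callable[[type], type]:
    """Class decorator that records a plugin under its kind and case-insensitive name."""
    def decorator(cls: type) -> type:
        if kind not in _REGISTRY:
            raise ValueError(f"unknown plugin kind: {kind}")
        _REGISTRY[kind][name.lower()] = cls
        return cls
    return decorator


def create(kind: str, name: str, **options):
    """Look up a plugin by the name a config would use and instantiate it with options."""
    try:
        cls = _REGISTRY[kind][name.lower()]
    except KeyError:
        raise LookupError(f"no {kind} plugin named {name!r}") from None
    return cls(**options)


@register("filter", "Uppercase")
class Uppercase:
    """Toy built-in filter: uppercase one field."""
    def __init__(self, field: str) -> None:
        self.field = field

    def process(self, record: dict) -> dict:
        return {**record, self.field: record[self.field].upper()}


@register("filter", "MaskDigits")
class MaskDigits:
    """Toy self-developed filter: registering it is enough to make it addressable by name."""
    def __init__(self, field: str, mask: str = "*") -> None:
        self.field = field
        self.mask = mask

    def process(self, record: dict) -> dict:
        masked = "".join(self.mask if ch.isdigit() else ch for ch in record[self.field])
        return {**record, self.field: masked}


if __name__ == "__main__":
    f = create("filter", "maskdigits", field="phone")
    print(f.process({"phone": "tel 123-45"}))  # {'phone': 'tel ***-**'}
```

Adding a new plugin touches neither the registry nor the code that runs pipelines, which is the property the README refers to as "easy to extend".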

Environment dependencies

Java runtime environment: Java >= 8
If you want to run SeaTunnel in a cluster environment, any of the following Spark cluster environments can be used:

Spark on Yarn
Spark Standalone

If the data volume is small, or you only want to verify functionality, you can also start SeaTunnel in local mode without a cluster, since SeaTunnel supports standalone operation. Note: SeaTunnel 2.0 supports running on both Spark and Flink.

Downloads

Ready-to-run release packages can be downloaded from: https://github.com/InterestingLab/SeaTunnel/releases

Quick start

Quick start: https://interestinglab.github.io/seatunnel-docs/#/zh-cn/v1/quick-start

Detailed documentation on SeaTunnel: https://interestinglab.github.io/seatunnel-docs/#/

Application practice cases

Weibo, Value-added Business Department Data Platform

Weibo uses an internally customized version of SeaTunnel, together with its sub-project Guardian for monitoring SeaTunnel on Yarn tasks, to run hundreds of real-time streaming computing tasks.

Sina, Big Data Operation Analysis Platform

Sina Data Operation Analysis Platform uses SeaTunnel to perform real-time and offline analysis of operations and maintenance data for Sina News, CDN, and other services, and writes the results into ClickHouse.

Sogou, Sogou Qiqian System

Sogou Qiqian System uses SeaTunnel as an ETL tool to help build a real-time data warehouse.

Qutoutiao, Qutoutiao Data Center

Qutoutiao Data Center uses SeaTunnel for offline ETL tasks from MySQL to Hive and for real-time backfilling from Hive to ClickHouse, covering most of its offline and real-time task needs.

Yixia Technology, Yizhibo Data Platform

Yonghui Superstores Founders' Alliance-Yonghui Yunchuang Technology, Member E-commerce Data Analysis Platform

SeaTunnel provides real-time streaming and offline SQL computing of e-commerce user behavior data for Yonghui Life, a new retail brand of Yonghui Yunchuang Technology.

Shuidichou, Data Platform

Shuidichou uses SeaTunnel for real-time streaming and regular offline batch processing on Yarn, processing an average of 3 to 4 TB of data daily and then writing the data to ClickHouse.

For more use cases, please refer to: https://interestinglab.github.io/seatunnel-docs/#/zh-cn/case_study/

Contribute ideas and code

Submit issues and advice: https://github.com/InterestingLab/SeaTunnel/issues

Contribute code: https://github.com/InterestingLab/SeaTunnel/pulls

Developer

Thanks to all developers: https://github.com/InterestingLab/SeaTunnel/graphs/contributors
