A First Look at SeaTunnel

A data integration component, in the same space as DataX, Flume, Logstash, and Filebeat.

Introduction

A highly usable, high-performance component for processing massive data volumes, in both real-time (streaming) and offline (batch) modes.
It is built on top of Flink/Spark.

Scenarios
ETL over massive data
Aggregation over massive data
Multi-source data processing

Workflow
Input/Source -> Filter/Transform -> Output/Sink
The available plugins for each stage are listed below; a minimal config skeleton follows the lists.

Input/Source
Fake, File, HDFS, Kafka, S3, Socket, custom
Filter/Transform
Add, CheckSum, Convert, Date, Drop, Grok, Json, KV,
LowerCase, Remove, Rename, Repartition, Replace,
Sample, Split, Sql, Table, Truncate, UpperCase,
UUID, custom
Output/Sink
ES, File, HDFS, JDBC, Kafka, MySQL, ClickHouse,
Stdout, custom
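
All of these are wired together in a single HOCON-style configuration file with one block per stage. A minimal sketch of the layout (concrete plugin names and options depend on the engine and version; everything below is a placeholder):

env {
  # engine-level settings: parallelism, checkpointing, app name, ...
}

source {
  # one or more input plugins; each emits rows into a (named) table
}

transform {
  # zero or more transforms, applied in order
}

sink {
  # one or more output plugins
}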

Based on Flink

Step 1: Flink environment
Assumed to be in place; not covered in detail here.

Step 2: SeaTunnel setup
Not covered in detail here.

Step 3: SeaTunnel configuration
# Point to FLINK_HOME
vi config/waterdrop-env.sh
# Create a new application config
vi config/application.conf
env {
  # You can set flink configuration here
  execution.parallelism = 1
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  # Read lines from a socket (localhost:9999 by default) into table "fake",
  # with each line stored in the field "info"
  SocketStream {
    result_table_name = "fake"
    field_name = "info"
  }
}

transform {
  # Split declares how to cut the string apart and registers a split() UDF
  Split {
    separator = "#"
    fields = ["name", "age"]
  }
  # The sql step applies the UDF: split(info) expands into the declared fields
  sql {
    sql = "select * from (select info, split(info) as info_row from fake) t1"
  }
}

sink {
  # Print each row to the TaskManager's stdout
  ConsoleSink {}
}

Step 4: Start a socket listener
nc -l -p 9999
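
The flags above assume a GNU/traditional netcat. On the BSD variant (e.g. macOS), -p is not combined with -l; the equivalent listener would be:
nc -l 9999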

Step 5: Launch SeaTunnel
cd seatunnel
./bin/start-waterdrop-flink.sh --config ./config/application.conf
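
A naming caveat: Waterdrop was renamed to SeaTunnel, so the env file and launcher scripts differ across releases (waterdrop-* vs seatunnel-*). On a release that has completed the rename, the same step would presumably be, subject to what is actually in your bin/ directory:
./bin/start-seatunnel-flink.sh --config ./config/application.conf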

Step 6: Send data
xg#1995

Step 7: Check the result
Open http://localhost:8081/#/task-manager and look at the TaskManager's Stdout log. The input line is kept in info while split() fills name and age:
xg#1995,xg,1995

Based on Spark

Step 1: Spark environment
Assumed to be in place; not covered in detail here.

Step 2: SeaTunnel setup
Not covered in detail here.

Step 3: SeaTunnel configuration
# Point to SPARK_HOME
vi config/waterdrop-env.sh
# Create a new application config
vi config/application.conf
env {
  # seatunnel defined streaming batch duration in seconds
  spark.streaming.batchDuration = 5

  spark.app.name = "seatunnel"
  spark.ui.port = 13000
}

source {
  # Read lines from a socket (localhost:9999 by default); each line lands in raw_message
  socketStream {}
}

transform {
  # Split raw_message (the default source field) on the delimiter into msg and name
  split {
    fields = ["msg", "name"]
    delimiter = ","
  }
}

sink {
  # Print the resulting rows to stdout
  console {}
}
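
socketStream {} above relies on the plugin defaults. If the listener runs elsewhere, host and port can be set explicitly; a sketch matching the v1 Spark plugin's documented options:

source {
  socketStream {
    host = "localhost"
    port = 9999
  }
}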

Step 4: Start a socket listener
nc -l -p 9999

Step 5: Launch SeaTunnel
cd seatunnel
./bin/start-seatunnel-spark.sh --master local[4] --deploy-mode client --config ./config/application.conf
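
--master local[4] --deploy-mode client runs everything locally; the script hands these options to spark-submit, so a cluster submission follows the usual Spark form. For example, on YARN (assuming a working Hadoop/YARN environment) it would presumably be:
./bin/start-seatunnel-spark.sh --master yarn --deploy-mode cluster --config ./config/application.conf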

Step 6: Send data
Hello World, seatunnel

Step 7: Check the result
The printed rows appear in the SeaTunnel log; raw_message keeps the input line while split fills msg and name:
+----------------------+-----------+---------+
|raw_message |msg |name |
+----------------------+-----------+---------+
|Hello World, seatunnel|Hello World|seatunnel|
+----------------------+-----------+---------+

Takeaways

Low-code really does look set to become a trend in big data.