Flume Knowledge Notes #26

Open
chenpengcong opened this issue Feb 7, 2019 · 0 comments

chenpengcong commented Feb 7, 2019

Controlling the size of files Flume writes to HDFS

When the HDFS sink writes events to HDFS, rolling (rotation) of the output file is controlled by the following parameters:

| Name | Default | Description |
| --- | --- | --- |
| hdfs.rollInterval | 30 | Number of seconds to wait before rolling current file (0 = never roll based on time interval) |
| hdfs.rollSize | 1024 | File size to trigger roll, in bytes (0: never roll based on file size) |
| hdfs.rollCount | 10 | Number of events written to file before it rolled (0 = never roll based on number of events) |

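As soon as any one of these three conditions is met, the currently open file is closed and subsequent events are written to a new file, which starts a new round. So to control the size of the files written to HDFS, set rollInterval and rollCount to 0 and set rollSize to the desired file size.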
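As a rough illustration of the size-only rolling described above, here is a minimal config fragment. The agent name `a1`, sink name `k1`, channel name `c1`, the HDFS path, and the 128 MB target are all placeholders chosen for the example, and the source and channel definitions are omitted.

```properties
# Hypothetical agent "a1" with an HDFS sink "k1"; names and path are placeholders.
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events

# Disable time-based and count-based rolling.
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0

# Roll only on size: 128 * 1024 * 1024 = 134217728 bytes.
a1.sinks.k1.hdfs.rollSize = 134217728
```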

Reference

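Passing data between multiple agents: which type should the previous agent's sink and the current agent's source use?

Use the avro type (verified to work in practice). The "Setting multi-agent flow" section states: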

In order to flow the data across multiple agents or hops, the sink of the previous agent and source of the current hop need to be avro type with the sink pointing to the hostname (or IP address) and port of the source.
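The pairing in the quote above might look roughly like this. The agent names (`agent1`, `agent2`), component names, hostname, and port are assumptions made up for the sketch; the two halves would normally live in separate config files on separate hosts, and the remaining sources, channels, and sinks are omitted.

```properties
# Upstream host, hypothetical agent "agent1": Avro sink pointing at the downstream source.
agent1.sinks.k1.type = avro
agent1.sinks.k1.channel = c1
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4545

# Downstream host, hypothetical agent "agent2": Avro source listening on the same port.
agent2.sources.r1.type = avro
agent2.sources.r1.channels = c1
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
```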

Using batchsize to tune performance

batchsize

The batch size is the maximum number of events that a sink or client will attempt to take from a channel in a single transaction.

Increasing the batch size can improve throughput, but more data is duplicated when a transaction fails.

See Flume Performance Tuning - part 1 for details.

How FileChannel works
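A sketch of where the batch size is set, assuming an HDFS sink fed by a memory channel. The agent and component names and all numeric values are illustrative, not recommendations. The property name differs per sink type: the HDFS sink uses `hdfs.batchSize`, while the Avro sink uses `batch-size`. The channel's `transactionCapacity` needs to be at least as large as the batch size, since a sink cannot take more events in one transaction than the channel allows.

```properties
# Hypothetical agent "a1"; values are illustrative only.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
# Upper bound on events per transaction; keep it >= the sink's batch size.
a1.channels.c1.transactionCapacity = 1000

a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
# Events taken from the channel and flushed to HDFS per transaction.
a1.sinks.k1.hdfs.batchSize = 1000
```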

Apache Flume - FileChannel

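Custom interceptors

[Flume Cookbook] Implementing Custom Interceptors is very detailed.

In practice, log output showed that the Builder's configure method is called first, then the build method, and only then the interceptor's constructor.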
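The observed order follows from the builder pattern: the framework configures the builder, asks it to build, and the constructor only runs inside build. The sketch below simulates this with stand-in types that merely imitate the shape of Flume's `Interceptor` and `Interceptor.Builder`. `CallOrderDemo`, `InterceptorBuilder`, `PrefixInterceptor`, and the `prefix` key are hypothetical names, and a plain `Map` stands in for Flume's `Context`, so the code runs without Flume on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in sketch: the nested types only imitate the shape of Flume's
// Interceptor / Interceptor.Builder; they are NOT the real org.apache.flume classes.
class CallOrderDemo {
    // Records the order in which the lifecycle methods run.
    static final List<String> CALLS = new ArrayList<>();

    // Hypothetical stand-in for Flume's Interceptor (trimmed to one method).
    interface Interceptor {
        String intercept(String eventBody);
    }

    // Hypothetical stand-in for Interceptor.Builder; a Map replaces Flume's Context.
    interface InterceptorBuilder {
        void configure(Map<String, String> context);
        Interceptor build();
    }

    static class PrefixInterceptor implements Interceptor {
        private final String prefix;

        // Private constructor: only the nested Builder can create instances,
        // so it necessarily runs inside build().
        private PrefixInterceptor(String prefix) {
            CALLS.add("constructor");
            this.prefix = prefix;
        }

        @Override
        public String intercept(String eventBody) {
            return prefix + eventBody;
        }

        static class Builder implements InterceptorBuilder {
            private String prefix = "";

            @Override
            public void configure(Map<String, String> context) {
                CALLS.add("configure");
                prefix = context.getOrDefault("prefix", "");
            }

            @Override
            public Interceptor build() {
                CALLS.add("build");
                return new PrefixInterceptor(prefix);
            }
        }
    }

    // Drives the builder the way a framework would: configure, then build.
    static List<String> run() {
        CALLS.clear();
        Map<String, String> context = new HashMap<>();
        context.put("prefix", "[agent1] ");
        InterceptorBuilder builder = new PrefixInterceptor.Builder();
        builder.configure(context);
        builder.build();
        return new ArrayList<>(CALLS);
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [configure, build, constructor]
    }
}
```

Because the settings are read in configure and passed to the constructor from build, anything the constructor needs from the configuration has to be captured in the builder's fields first.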

chenpengcong changed the title from "Flume Knowledge Notes.md" to "Flume Knowledge Notes" on Feb 7, 2019