Flume Multiplexing Channel Selector: A Usage Example

Preface

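The Multiplexing Channel Selector maps each Event to the specified Channels according to the value of a configured field in the Event's Header, so that the Channel Processor can then deliver the Event to the matching Channels. In Flume, the Multiplexing Channel Selector is usually paired with an Interceptor, because a freshly created Event typically has an empty Header and an Interceptor is needed to fill in the required field.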
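The routing rule described above can be modeled in a few lines of plain Java. This is only a minimal sketch of the idea, not Flume's own implementation: the class `MultiplexingSketch` and its method `route` are hypothetical names, channels are represented by their names as strings, and optional channels are left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for multiplexing selection; NOT Flume's own classes.
class MultiplexingSketch {
    private final String headerName;                 // header field to inspect, e.g. "type"
    private final Map<String, List<String>> mapping; // header value -> channel names
    private final List<String> defaultChannels;      // used when nothing matches

    MultiplexingSketch(String headerName,
                       Map<String, List<String>> mapping,
                       List<String> defaultChannels) {
        this.headerName = headerName;
        this.mapping = mapping;
        this.defaultChannels = defaultChannels;
    }

    // Returns the names of the channels an Event with these headers is routed to.
    List<String> route(Map<String, String> headers) {
        String value = headers.get(headerName);
        if (value == null) {
            return defaultChannels; // header missing: fall back to the default
        }
        return mapping.getOrDefault(value, defaultChannels);
    }

    public static void main(String[] args) {
        Map<String, List<String>> mapping = new HashMap<>();
        mapping.put("Startup", List.of("c1"));
        mapping.put("Event", List.of("c2"));
        // An empty default list models "unmatched Events go nowhere".
        MultiplexingSketch selector =
            new MultiplexingSketch("type", mapping, Collections.emptyList());

        System.out.println(selector.route(Map.of("type", "Startup"))); // [c1]
        System.out.println(selector.route(Map.of("type", "Event")));   // [c2]
        System.out.println(selector.route(Map.of()));                  // []
    }
}
```

The last call shows why the Interceptor matters: an Event whose Header lacks the field matches no mapping and falls through to the default list, which is empty here.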

Configuration

1) flume1.properties

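# flume1: this configuration tails one or more specified files. Each Event generated from
# appended content first passes through the custom TypeInterceptor, which adds a type field
# to the Header based on the content of the Body. The Multiplexing Channel Selector then
# routes Events of different types to different Channels, and they are finally printed on
# the consoles of flume2 and flume3.
# a1: TAILDIR Source -> TypeInterceptor -> Multiplexing Channel Selector ->
#   Memory Channel -> Avro Sink

# Agent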
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Sources
# a1.sources.r1
a1.sources.r1.type = TAILDIR
# Path of the JSON position file (an absolute path is recommended).
# It records each file's inode, absolute path, and last read position.
a1.sources.r1.positionFile = /opt/module/flume-1.8.0/.position/taildir_position.json
# File groups to monitor
a1.sources.r1.filegroups = f1
# Files monitored in group f1. Note: the file name is a regular expression,
# not a Linux shell wildcard.
a1.sources.r1.filegroups.f1 = /tmp/logs/^.*log$

# Interceptor
# a1.sources.r1.interceptors
# Interceptor chain; Interceptors are invoked in the order in which they are listed
a1.sources.r1.interceptors = typeInterceptor
# Fully qualified class name of the custom Interceptor, referencing its static nested class Builder.
# To use a custom Interceptor, package the implementation as a jar and place it in $FLUME_HOME/lib;
# Flume adds this directory to the classpath when it starts.
a1.sources.r1.interceptors.typeInterceptor.type = com.tomandersen.interceptors.TypeInterceptor$Builder

# Channels
# a1.channels.c1
# Memory-backed buffer / maximum number of buffered Events / number of Events per transaction
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# a1.channels.c2
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

# Channel Selector
# a1.sources.r1.selector
# Use the Multiplexing Channel Selector
a1.sources.r1.selector.type = multiplexing
# Header field to match on
a1.sources.r1.selector.header = type
# Map each value of the field to a Channel. No selector.default is set here,
# so Events with any other value are dropped.
a1.sources.r1.selector.mapping.Startup = c1
a1.sources.r1.selector.mapping.Event = c2

# Sinks
# a1.sinks.k1
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
# a1.sinks.k2
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop103
a1.sinks.k2.port = 4141

# Bind
# r1->TypeInterceptor->Multiplexing Channel Selector->c1->k1
# r1->TypeInterceptor->Multiplexing Channel Selector->c2->k2
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
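
The comment on the file group stresses that the file name is a regular expression rather than a shell wildcard. The difference can be checked with a small Java sketch, assuming the file-name part of the path (`^.*log$`) is evaluated as an ordinary Java regular expression against the file name only. The class name `FileGroupRegexSketch` and the sample file names are made up for illustration.

```java
import java.util.regex.Pattern;

// Illustrative check of the file-name pattern; not part of Flume.
class FileGroupRegexSketch {
    // File-name part of the filegroup path used in flume1.properties.
    static final Pattern FILE_NAME = Pattern.compile("^.*log$");

    static boolean matches(String fileName) {
        return FILE_NAME.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        // Hypothetical file names in /tmp/logs
        for (String name : new String[] {"app.log", "app.log.1", "catalog"}) {
            System.out.println(name + " -> " + matches(name));
        }
        // app.log -> true
        // app.log.1 -> false
        // catalog -> true
    }
}
```

Under that assumption the pattern accepts any file name ending in "log", including one without a dot such as `catalog`, whereas a shell wildcard like `*.log` would not. If only files with the `.log` extension are wanted, a pattern such as `.*\.log` is more precise.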

2) flume2.properties

# flume2: this configuration prints data received on the specified Avro port to the console
# a2: Avro Source -> Memory Channel -> Logger Sink

# Agent
a2.sources = r1
a2.channels = c1
a2.sinks = k1

# Sources
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4141

# Channels
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

# Sinks
# Start the agent with -Dflume.root.logger=INFO,console to print Events to the console in real time
a2.sinks.k1.type = logger
# Maximum number of bytes of the Event Body to write to the log (default: 16)
a2.sinks.k1.maxBytesToLog = 256

# Bind
# r1->c1->k1
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

3) flume3.properties

# flume3: this configuration prints data received on the specified Avro port to the console
# a3: Avro Source -> Memory Channel -> Logger Sink

# Agent
a3.sources = r1
a3.channels = c1
a3.sinks = k1

# Sources
a3.sources.r1.type = avro
a3.sources.r1.bind = 0.0.0.0
a3.sources.r1.port = 4141

# Channels
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

# Sinks
# Start the agent with -Dflume.root.logger=INFO,console to print Events to the console in real time
a3.sinks.k1.type = logger
# Maximum number of bytes of the Event Body to write to the log (default: 16)
a3.sinks.k1.maxBytesToLog = 256

# Bind
# r1->c1->k1
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1

4) Startup commands

Flume Agents a1, a2, and a3 run on hosts hadoop101, hadoop102, and hadoop103, respectively.

./bin/flume-ng agent -n a1 -c conf -f flume1.properties
./bin/flume-ng agent -n a2 -c conf -f flume2.properties -Dflume.root.logger=INFO,console
./bin/flume-ng agent -n a3 -c conf -f flume3.properties -Dflume.root.logger=INFO,console

5) Resulting behavior

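Agent a1 monitors the specified local files and assembles the appended data into Events. The custom TypeInterceptor adds a type field to each Event's Header, with a value derived from the content of the Body. The Multiplexing Channel Selector then sends Events of different types to different Channels, and they are finally printed on the consoles of a2 and a3, respectively.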
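
The source of TypeInterceptor is not shown in this article, so the following is only a sketch of the tagging step in plain Java, without the Flume dependency. The classification rule (a Body containing the text "start" is tagged Startup, everything else Event) is an assumption made purely for illustration, and `TypeTaggingSketch`, `classify`, and `tag` are hypothetical names. A real interceptor would implement `org.apache.flume.interceptor.Interceptor` and expose a nested `Builder`, as the configuration of a1 expects.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of "set a type header based on the body"; NOT the real TypeInterceptor.
class TypeTaggingSketch {
    // Hypothetical rule: the article does not say how the real interceptor decides.
    static String classify(byte[] body) {
        String text = new String(body, StandardCharsets.UTF_8);
        return text.contains("start") ? "Startup" : "Event";
    }

    // Writes the type field into the given headers and returns the same map.
    static Map<String, String> tag(Map<String, String> headers, byte[] body) {
        headers.put("type", classify(body));
        return headers;
    }

    public static void main(String[] args) {
        byte[] startupLine = "app start at 10:00".getBytes(StandardCharsets.UTF_8);
        byte[] otherLine = "user clicked button".getBytes(StandardCharsets.UTF_8);

        System.out.println(tag(new HashMap<>(), startupLine)); // {type=Startup}
        System.out.println(tag(new HashMap<>(), otherLine));   // {type=Event}
    }
}
```

The two values produced here, Startup and Event, are exactly the keys used in `selector.mapping.Startup` and `selector.mapping.Event`, which is what lets the selector route the tagged Events to c1 and c2.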