Pulling Data from Kafka into HDFS with Flume
Source: Internet · Editor: 程序博客网 · Published: 2024/05/22 05:33
Create a new Flume .properties file: vim flume-conf-kafka2hdfs.properties
# ------------------- Define the data flow -------------------
# Name of the source
flume2HDFS_agent.sources = source_from_kafka
# Name of the channel (naming by type is recommended)
flume2HDFS_agent.channels = mem_channel
# Name of the sink (naming by destination is recommended)
flume2HDFS_agent.sinks = hdfs_sink
#auto.commit.enable = true

## Kerberos config ##
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume/datanode2.hdfs.alpha.com@OMGHADOOP.COM
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.kerberosKeytab = /root/apache-flume-1.6.0-bin/conf/flume.keytab

# -------- KafkaSource configuration --------
# Define the source type
flume2HDFS_agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flume2HDFS_agent.sources.source_from_kafka.channels = mem_channel
flume2HDFS_agent.sources.source_from_kafka.batchSize = 5000
# Kafka broker addresses
#flume2HDFS_agent.sources.source_from_kafka.zookeeperConnect = 10.129.142.46:2181,10.166.141.46:2181,10.166.141.47:2181/testkafka
flume2HDFS_agent.sources.source_from_kafka.kafka.bootstrap.servers = 192.168.2.86:9092,192.168.2.87:9092
# Kafka topic to consume
#flume2HDFS_agent.sources.source_from_kafka.topic = itil_topic_4097
flume2HDFS_agent.sources.source_from_kafka.kafka.topics = mtopic
# Kafka consumer group id
#flume2HDFS_agent.sources.source_from_kafka.groupId = flume4097
flume2HDFS_agent.sources.source_from_kafka.kafka.consumer.group.id = flumetest

# -------- hdfsSink configuration --------
flume2HDFS_agent.sinks.hdfs_sink.type = hdfs
# Specify the channel the sink should use (note: "channel", singular)
flume2HDFS_agent.sinks.hdfs_sink.channel = mem_channel
#flume2HDFS_agent.sinks.hdfs_sink.filePrefix = %{host}
flume2HDFS_agent.sinks.hdfs_sink.hdfs.path = hdfs://192.168.2.xx:8020/tmp/ds=%Y%m%d
# File size to trigger a roll, in bytes (0: never roll based on file size)
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollSize = 0
# Number of events written before the file rolls (0: never roll based on event count)
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollCount = 0
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollInterval = 3600
flume2HDFS_agent.sinks.hdfs_sink.hdfs.threadsPoolSize = 30
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.codeC = gzip
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
flume2HDFS_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flume2HDFS_agent.sinks.hdfs_sink.hdfs.writeFormat = Text

# -------- memoryChannel configuration --------
# Channel type
flume2HDFS_agent.channels.mem_channel.type = memory
# Maximum number of events the channel can hold
flume2HDFS_agent.channels.mem_channel.capacity = 100000
# Transaction capacity
flume2HDFS_agent.channels.mem_channel.transactionCapacity = 10000
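With hdfs.rollSize and hdfs.rollCount both set to 0, only hdfs.rollInterval (one hour here) triggers file rolling, so the sink produces one file per hour instead of many small files. A minimal Python sketch of that decision logic (a hypothetical helper mirroring the settings above, not Flume's actual code):

```python
def should_roll(elapsed_seconds, bytes_written, event_count,
                roll_interval=3600, roll_size=0, roll_count=0):
    """Decide whether to roll the current HDFS file.

    Mirrors the config above: a value of 0 disables that trigger,
    so only the hourly interval can cause a roll.
    """
    if roll_interval and elapsed_seconds >= roll_interval:
        return True
    if roll_size and bytes_written >= roll_size:
        return True
    if roll_count and event_count >= roll_count:
        return True
    return False

print(should_roll(3600, 10**9, 10**6))  # True: the hourly interval elapsed
print(should_roll(1800, 10**9, 10**6))  # False: size/count triggers are disabled
```

Raising rollInterval (or re-enabling rollSize) is the usual knob to turn if the resulting hourly files are still too small for HDFS.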
Adjust the broker addresses, HDFS path, topic, and so on for your environment, and that's it.
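One thing to note when adapting hdfs.path: Flume expands escape sequences such as %Y%m%d from the event's timestamp header, with the same meaning as strftime patterns, giving one directory per day. (If your events carry no timestamp header, setting hdfs.useLocalTimeStamp = true on the sink is a common fix.) A sketch of the resulting layout, assuming the hdfs.path above:

```python
from datetime import datetime

def daily_partition(base, ts):
    """Predict the directory Flume's %Y%m%d escape yields for a given event time."""
    return f"{base}/ds={ts.strftime('%Y%m%d')}"

print(daily_partition("hdfs://192.168.2.xx:8020/tmp", datetime(2017, 3, 15)))
# hdfs://192.168.2.xx:8020/tmp/ds=20170315
```

Downstream jobs (e.g. Hive) can then treat ds as a daily partition column.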
Run the agent:
>flume-ng agent -c conf -n flume2HDFS_agent -f flume-conf-kafka2hdfs.properties
Then open a console producer and send a few test messages:
>kafka-console-producer --broker-list cdh5:9092 --topic mtopic