Pulling Data from Kafka into HDFS with Flume
Source: Internet · Editor: 程序博客网 · Published: 2024/05/22 05:33
Create a new Flume .properties file: vim flume-conf-kafka2hdfs.properties
# ------------------- Define the data flow -------------------
# Name of the source
flume2HDFS_agent.sources = source_from_kafka
# Name of the channel (naming by type is recommended)
flume2HDFS_agent.channels = mem_channel
# Name of the sink (naming by destination is recommended)
flume2HDFS_agent.sinks = hdfs_sink
#auto.commit.enable = true

## Kerberos config ##
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.kerberosPrincipal = flume/datanode2.hdfs.alpha.com@OMGHADOOP.COM
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.kerberosKeytab = /root/apache-flume-1.6.0-bin/conf/flume.keytab

# -------- KafkaSource configuration --------
# Define the source type
flume2HDFS_agent.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flume2HDFS_agent.sources.source_from_kafka.channels = mem_channel
flume2HDFS_agent.sources.source_from_kafka.batchSize = 5000
# Kafka broker addresses
#flume2HDFS_agent.sources.source_from_kafka.zookeeperConnect = 10.129.142.46:2181,10.166.141.46:2181,10.166.141.47:2181/testkafka
flume2HDFS_agent.sources.source_from_kafka.kafka.bootstrap.servers = 192.168.2.86:9092,192.168.2.87:9092
# Kafka topic to consume
#flume2HDFS_agent.sources.source_from_kafka.topic = itil_topic_4097
flume2HDFS_agent.sources.source_from_kafka.kafka.topics = mtopic
# Kafka consumer group id
#flume2HDFS_agent.sources.source_from_kafka.groupId = flume4097
flume2HDFS_agent.sources.source_from_kafka.kafka.consumer.group.id = flumetest

# -------- hdfsSink configuration --------
flume2HDFS_agent.sinks.hdfs_sink.type = hdfs
# Specify the channel the sink should use (note: "channel", singular)
flume2HDFS_agent.sinks.hdfs_sink.channel = mem_channel
#flume2HDFS_agent.sinks.hdfs_sink.filePrefix = %{host}
flume2HDFS_agent.sinks.hdfs_sink.hdfs.path = hdfs://192.168.2.xx:8020/tmp/ds=%Y%m%d
# File size to trigger a roll, in bytes (0: never roll based on file size)
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollSize = 0
# Number of events written before the file rolls (0: never roll based on event count)
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollCount = 0
flume2HDFS_agent.sinks.hdfs_sink.hdfs.rollInterval = 3600
flume2HDFS_agent.sinks.hdfs_sink.hdfs.threadsPoolSize = 30
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.codeC = gzip
#flume2HDFS_agent.sinks.hdfs_sink.hdfs.fileType = CompressedStream
flume2HDFS_agent.sinks.hdfs_sink.hdfs.fileType = DataStream
flume2HDFS_agent.sinks.hdfs_sink.hdfs.writeFormat = Text

# -------- memoryChannel configuration --------
# Channel type
flume2HDFS_agent.channels.mem_channel.type = memory
# Maximum number of events the channel can hold
flume2HDFS_agent.channels.mem_channel.capacity = 100000
# Transaction capacity
flume2HDFS_agent.channels.mem_channel.transactionCapacity = 10000
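With hdfs.rollSize and hdfs.rollCount both set to 0, only hdfs.rollInterval (one hour here) triggers file rolling, so the sink produces one file per hour instead of many small files. A minimal Python sketch of that decision logic (a hypothetical helper mirroring the settings above, not Flume's actual code):

```python
def should_roll(elapsed_seconds, bytes_written, event_count,
                roll_interval=3600, roll_size=0, roll_count=0):
    """Decide whether to roll the current HDFS file.

    Mirrors the config above: a value of 0 disables that trigger,
    so only the hourly interval can cause a roll.
    """
    if roll_interval and elapsed_seconds >= roll_interval:
        return True
    if roll_size and bytes_written >= roll_size:
        return True
    if roll_count and event_count >= roll_count:
        return True
    return False

print(should_roll(3600, 10**9, 10**6))  # True: the hourly interval elapsed
print(should_roll(1800, 10**9, 10**6))  # False: size/count triggers are disabled
```

Raising rollInterval (or re-enabling rollSize) is the usual knob to turn if the resulting hourly files are still too small for HDFS.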
Adjust the broker addresses, HDFS path, topic, and so on for your environment, and that's it.
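One thing to note when adapting hdfs.path: Flume expands escape sequences such as %Y%m%d from the event's timestamp header, with the same meaning as strftime patterns, giving one directory per day. (If your events carry no timestamp header, setting hdfs.useLocalTimeStamp = true on the sink is a common fix.) A sketch of the resulting layout, assuming the hdfs.path above:

```python
from datetime import datetime

def daily_partition(base, ts):
    """Predict the directory Flume's %Y%m%d escape yields for a given event time."""
    return f"{base}/ds={ts.strftime('%Y%m%d')}"

print(daily_partition("hdfs://192.168.2.xx:8020/tmp", datetime(2017, 3, 15)))
# hdfs://192.168.2.xx:8020/tmp/ds=20170315
```

Downstream jobs (e.g. Hive) can then treat ds as a daily partition column.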
Run the agent:
>flume-ng agent -c conf -n flume2HDFS_agent -f flume-conf-kafka2hdfs.properties
Then open a console producer and send a few test messages:
>kafka-console-producer --broker-list cdh5:9092 --topic mtopic