Hands-On: Collecting DB Data into Kafka with Apache Flume
Source: Internet | Editor: 程序博客网 | Date: 2024/05/22 00:10
Flume is a capable, if somewhat heavyweight, data-collection component. In essence, the SQL source assembles the results of a SQL query into opencsv-formatted records; the default field delimiter is a comma (,), and it can be changed by overriding the relevant opencsv classes.
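The CSV assembly step can be pictured with Python's csv module (a sketch only; the real source is Java and uses opencsv, but the quote-everything output matches what the consumer prints in step 9):

```python
import csv
import io

def rows_to_csv(rows):
    """Mimic how the SQL source turns query result rows into
    comma-delimited CSV lines (the real code path is opencsv)."""
    buf = io.StringIO()
    # opencsv quotes every field by default, hence QUOTE_ALL here
    writer = csv.writer(buf, delimiter=",", quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()

print(rows_to_csv([(1, "zhangsan", 12), (3, "444", 23)]))
```

Each row comes out as a fully quoted line such as `"1","zhangsan","12"`, which is exactly the shape of the records the Kafka consumer receives later in this walkthrough.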
1. Download
[root@hadoop0 bigdata]# wget http://apache.fayea.com/flume/1.6.0/apache-flume-1.6.0-bin.tar.gz
2. Extract
[root@hadoop0 bigdata]# tar -zxvf apache-flume-1.6.0-bin.tar.gz
[root@hadoop0 bigdata]# ls
apache-flume-1.6.0-bin apache-hive-2.0.1-bin.tar.gz hadoop272 hbase-1.1.5-bin.tar.gz kafka sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz taokeeper-monitor.tar.gz zookeeper
apache-flume-1.6.0-bin.tar.gz apache-tomcat-7.0.69.zip hbase-1.1.5 hive2.0 sqoop-1.4.6 stomr096 tomcat7 zookeeper.out
3. Build flume-ng-sql-source.jar
flume-ng-sql-source-develop_1.2.1 (author: Luis Lázaro <lalazaro@keedio.com>)
<groupId>org.keedio.flume.flume-ng-sources</groupId>
<artifactId>flume-ng-sql-source</artifactId>
<version>1.2.1-SNAPSHOT</version>
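A typical way to build the jar from the POM coordinates above (a sketch; the repository URL and the Flume install path are assumptions, adjust to your environment):

```shell
# Fetch and build the keedio flume-ng-sql-source (version per the POM above)
git clone https://github.com/keedio/flume-ng-sql-source.git
cd flume-ng-sql-source
mvn clean package -DskipTests

# Put the built jar, plus the MySQL JDBC driver, where Flume can load them
cp target/flume-ng-sql-source-1.2.1-SNAPSHOT.jar /opt/bigdata/apache-flume-1.6.0-bin/lib/
cp mysql-connector-java-*.jar /opt/bigdata/apache-flume-1.6.0-bin/lib/
```

Without the JDBC driver on Flume's classpath, the source will fail at startup when it tries to open the `jdbc:mysql://` connection configured below.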
4. Configure the data source (Kafka sinks from two different authors are shown; one is commented out)
[root@hadoop0 apache-flume-1.6.0-bin]# vi conf/agent.conf
agent.sources = sql-source
agent.channels=c1
agent.sinks=r
agent.sources.sql-source.type = org.keedio.flume.source.SQLSource
# URL to connect to database (currently only mysql is supported)
agent.sources.sql-source.connection.url = jdbc:mysql://192.168.1.100:3306/test
# Database connection properties
agent.sources.sql-source.user = root
agent.sources.sql-source.password = 123
agent.sources.sql-source.table = sdfs
agent.sources.sql-source.database = database
# Columns to import to kafka (default * imports the entire row)
agent.sources.sql-source.columns.to.select = *
# Increment column properties
agent.sources.sql-source.incremental.column.name = id
# Incremental value from which to start taking data (0 imports the entire table)
agent.sources.sql-source.incremental.value = 0
# Query delay: the query is run every this many milliseconds
agent.sources.sql-source.run.query.delay=10000
# The status file is used to save the last read row
agent.sources.sql-source.status.file.path = /tmp
agent.sources.sql-source.status.file.name = sql-source.status
#Custom query
agent.sources.sql-source.custom.query = SELECT * FROM users WHERE 1=1 AND @
agent.sources.sql-source.batch.size = 1000
agent.sources.sql-source.max.rows = 10000
agent.channels.c1.type = memory
agent.channels.c1.capacity = 100
agent.channels.c1.transactionCapacity = 100
agent.channels.c1.byteCapacityBufferPercentage = 20
agent.channels.c1.byteCapacity = 800
#flume-ng-kafka-sink-1.6.0.jar
#agent.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
#agent.sinks.r.brokerList=localhost:9092
#agent.sinks.r.batchSize=1
#agent.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition
#agent.sinks.r.serializer.class=kafka.serializer.StringEncoder
#agent.sinks.r.requiredAcks=0
#agent.sinks.r.topic=test
# GitHub: beyondj2ee flumeng-kafka-plugin.jar
agent.sinks.r.type = org.apache.flume.plugins.KafkaSink
agent.sinks.r.metadata.broker.list=localhost:9092
agent.sinks.r.partition.key=0
agent.sinks.r.partitioner.class=org.apache.flume.plugins.SinglePartition
agent.sinks.r.serializer.class=kafka.serializer.StringEncoder
agent.sinks.r.request.required.acks=0
agent.sinks.r.max.message.size=1000000
agent.sinks.r.producer.type=sync
agent.sinks.r.custom.encoding=UTF-8
agent.sinks.r.custom.topic.name=test
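The incremental polling that `incremental.column.name`, `incremental.value`, and the status file configure can be sketched in plain Python (hypothetical function names; the real source is Java and persists the offset in `sql-source.status`):

```python
import sqlite3

def poll_new_rows(conn, last_id):
    """One polling cycle: fetch rows past the stored incremental value
    and return them with the new offset (the role of sql-source.status)."""
    cur = conn.execute(
        "SELECT id, name FROM users WHERE id > ? ORDER BY id", (last_id,))
    rows = cur.fetchall()
    new_last = rows[-1][0] if rows else last_id
    return rows, new_last

# In-memory stand-in for the MySQL table configured above
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "zhangsan"), (2, "lisi")])

rows, last = poll_new_rows(conn, 0)       # first cycle: imports the whole table
rows2, last2 = poll_new_rows(conn, last)  # next cycle: nothing new yet
```

This is why `incremental.value = 0` imports the entire table on the first run, and why deleting the status file (done in step 8) makes the source start over from that configured value.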
5. Prepare the database
6. Start ZooKeeper
[root@hadoop0 ~]# cd /opt/bigdata/
[root@hadoop0 bigdata]# ls
apache-flume-1.6.0-bin apache-hive-2.0.1-bin.tar.gz hadoop272 hbase-1.1.5-bin.tar.gz kafka sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz taokeeper-monitor.tar.gz zookeeper
apache-flume-1.6.0-bin.tar.gz apache-tomcat-7.0.69.zip hbase-1.1.5 hive2.0 sqoop-1.4.6 stomr096 tomcat7 zookeeper.out
[root@hadoop0 bigdata]# cd zookeeper/bin/
[root@hadoop0 bin]# ./zkServer.sh start
JMX enabled by default
Using config: /opt/bigdata/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
7. Start Kafka
[root@hadoop0 bin]# cd ../../kafka/bin/
[root@hadoop0 bin]# ./kafka-server-start.sh ../config/server.properties &
[1] 32613
[root@hadoop0 bin]# [1999-05-25 12:34:44,651] INFO KafkaConfig values:
request.timeout.ms = 30000
log.roll.hours = 168
inter.broker.protocol.version = 0.9.0.X
log.preallocate = false
security.inter.broker.protocol = PLAINTEXT
controller.socket.timeout.ms = 30000
broker.id.generation.enable = true
ssl.keymanager.algorithm = SunX509
ssl.key.password = null
log.cleaner.enable = true
ssl.provider = null
[root@hadoop0 bin]# ./kafka-topics.sh --zookeeper localhost --list
test
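If the topic does not exist yet, it can be created first (Kafka 0.9-era syntax; the partition and replication values here are just an example):

```shell
./kafka-topics.sh --create --zookeeper localhost \
  --replication-factor 1 --partitions 1 --topic test
```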
8. Start Flume
Delete the old status file first, so the source re-imports from the configured incremental.value:
[root@hadoop0 apache-flume-1.6.0-bin]# rm -rf /tmp/sql-source.status
[root@hadoop0 apache-flume-1.6.0-bin]# ./bin/flume-ng agent -n agent -c conf -f conf/agent.conf -Dflume.root.logger=INFO,console
Info: Including Hadoop libraries found via (/opt/bigdata/hadoop272/bin/hadoop) for HDFS access
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including HBASE libraries found via (/opt/bigdata/hbase-1.1.5/bin/hbase) for HBASE access
Info: Excluding /opt/bigdata/hbase-1.1.5/lib/slf4j-api-1.7.7.jar from classpath
Info: Excluding /opt/bigdata/hbase-1.1.5/lib/slf4j-log4j12-1.7.5.jar from classpath
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-api-1.7.10.jar from classpath
Info: Excluding /opt/bigdata/hadoop272/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar from classpath
Info: Including Hive libraries found via (/opt/bigdata/hive2.0) for Hive access
9. Start a consumer to read the data
[root@hadoop0 bin]# ./kafka-console-consumer.sh --zookeeper localhost --topic test --from-beginning
test-message
gaojs
软件设计
1
2
gaojs
nihao
tesdhdhsdhgf
vdxgdgsdg
dfhfdhd
gaojs
gaojingsong
2015-09-02342
535435353
"1","zhangsan","12","17-May-2016 20:06:38"
"3","444","23","17-May-2016 20:06:38"
"4","wan-flume","23","17-May-2016 20:06:38"
"5","gaojs-flume","23","17-May-2016 20:06:38"
"1","zhangsan","12","17-May-2016 20:06:38"
"3","444","23","17-May-2016 20:06:38"
"4","wan-flume","23","17-May-2016 20:06:38"
"5","gaojs-flume","23","17-May-2016 20:06:38"
10. Verify the results
Check the Flume log output while the consumer is running.