flume-ng-sql-source: Reading Incremental Data from Oracle
I. Download and build flume-ng-sql-source
Download the source from https://github.com/keedio/flume-ng-sql-source.git, then build it and copy the resulting jar following the project's README.
If you would rather skip the build, a prebuilt jar can be downloaded from CSDN: http://download.csdn.net/detail/chongxin1/9892184
The latest version at the time of writing is flume-ng-sql-source-1.4.3.jar; this is the jar that enables Flume to connect to databases.
II. Put flume-ng-sql-source-1.4.3.jar into Flume's lib directory
III. Put the Oracle JDBC driver into Flume's lib directory
Oracle is the database used in this walkthrough. Its JDBC driver ships with the database installation, e.g. under D:\app\product\11.2.0\dbhome_1\jdbc\lib.
Copy ojdbc5.jar from there into Flume's lib directory.
IV. Run the demo
1. Create the database table
- create table flume_ng_sql_source (
- id varchar2(32) primary key,
- msg varchar2(32),
- createTime date not null
- );
- insert into flume_ng_sql_source(id,msg,createTime) values('1','Test increment Data',to_date('2017-08-01 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('2','Test increment Data',to_date('2017-08-02 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('3','Test increment Data',to_date('2017-08-03 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('4','Test increment Data',to_date('2017-08-04 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('5','Test increment Data',to_date('2017-08-05 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('6','Test increment Data',to_date('2017-08-06 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- commit;
2. Create flume-sql.conf
- touch /usr/local/flume/flume-sql.conf
- sudo gedit /usr/local/flume/flume-sql.conf
- agentTest.channels = channelTest
- agentTest.sources = sourceTest
- agentTest.sinks = sinkTest
- ###########sql source#################
- # For each Test of the sources, the type is defined
- agentTest.sources.sourceTest.type = org.keedio.flume.source.SQLSource
- agentTest.sources.sourceTest.hibernate.connection.url = jdbc:oracle:thin:@192.168.168.100:1521/orcl
- # Hibernate Database connection properties
- agentTest.sources.sourceTest.hibernate.connection.user = flume
- agentTest.sources.sourceTest.hibernate.connection.password = 1234
- agentTest.sources.sourceTest.hibernate.connection.autocommit = true
- agentTest.sources.sourceTest.hibernate.dialect = org.hibernate.dialect.Oracle10gDialect
- agentTest.sources.sourceTest.hibernate.connection.driver_class = oracle.jdbc.driver.OracleDriver
- agentTest.sources.sourceTest.run.query.delay=1
- agentTest.sources.sourceTest.status.file.path = /usr/local/flume
- agentTest.sources.sourceTest.status.file.name = agentTest.sqlSource.status
- # Custom query
- agentTest.sources.sourceTest.start.from = '2017-07-31 07:06:20'
- agentTest.sources.sourceTest.custom.query = SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39),ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE CREATETIME > TO_DATE($@$,'YYYY-MM-DD HH24:MI:SS') ORDER BY CREATETIME ASC
- agentTest.sources.sourceTest.batch.size = 6000
- agentTest.sources.sourceTest.max.rows = 1000
- agentTest.sources.sourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
- agentTest.sources.sourceTest.hibernate.c3p0.min_size=1
- agentTest.sources.sourceTest.hibernate.c3p0.max_size=10
- ##############################
- agentTest.channels.channelTest.type = memory
- agentTest.channels.channelTest.capacity = 10000
- agentTest.channels.channelTest.transactionCapacity = 10000
- agentTest.channels.channelTest.byteCapacityBufferPercentage = 20
- agentTest.channels.channelTest.byteCapacity = 1600000
- agentTest.sinks.sinkTest.type = org.apache.flume.sink.kafka.KafkaSink
- agentTest.sinks.sinkTest.topic = TestTopic
- agentTest.sinks.sinkTest.brokerList = 192.168.168.200:9092
- agentTest.sinks.sinkTest.requiredAcks = 1
- agentTest.sinks.sinkTest.batchSize = 20
- agentTest.sinks.sinkTest.channel = channelTest
- agentTest.sources.sourceTest.channels=channelTest
3. Start Flume with flume-sql.conf and test
Start a Kafka console consumer listening on the topic:
- kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
Start the Flume agent with flume-sql.conf:
- flume-ng agent --conf conf --conf-file /usr/local/flume/flume-sql.conf --name agentTest -Dflume.root.logger=INFO,console
The TestTopic consumer console prints:
- [root@master ~]# kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
- "'2017-08-01 07:06:20'","1","Test increment Data"
- "'2017-08-02 07:06:20'","2","Test increment Data"
- "'2017-08-03 07:06:20'","3","Test increment Data"
- "'2017-08-04 07:06:20'","4","Test increment Data"
- "'2017-08-05 07:06:20'","5","Test increment Data"
- "'2017-08-06 07:06:20'","6","Test increment Data"
Check the status file /usr/local/flume/agentTest.sqlSource.status, whose location comes from these two settings:
- agentTest.sources.sourceTest.status.file.path = /usr/local/flume
- agentTest.sources.sourceTest.status.file.name = agentTest.sqlSource.status
- {"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-06 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}
From "LastIndex":"'2017-08-06 07:06:20'" we can see that the date of the last incremental row read is '2017-08-06 07:06:20'; in other words, on the next run of WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$, the value substituted for $@$ will be '2017-08-06 07:06:20'.
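Because the status file is plain JSON on one line, the checkpoint can be inspected directly from the shell. A minimal sketch (the file content below is a sample modeled on the status file shown above; `sed` is used so that no `jq` dependency is assumed):

```shell
# Write a sample status file to a temp path (content modeled on the run above)
cat > /tmp/agentTest.sqlSource.status <<'EOF'
{"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521/orcl","LastIndex":"'2017-08-06 07:06:20'","Query":"SELECT ID,MSG FROM FLUME_NG_SQL_SOURCE"}
EOF

# Extract the LastIndex checkpoint value with sed
last_index=$(sed -n 's/.*"LastIndex":"\([^"]*\)".*/\1/p' /tmp/agentTest.sqlSource.status)
echo "$last_index"    # prints '2017-08-06 07:06:20'
```

Deleting this file (while the agent is stopped) resets the checkpoint, causing the source to re-read from the configured start.from value.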
Insert more incremental rows into the flume_ng_sql_source table:
- insert into flume_ng_sql_source(id,msg,createTime) values('7','Test increment Data',to_date('2017-08-07 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('8','Test increment Data',to_date('2017-08-08 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('9','Test increment Data',to_date('2017-08-09 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('10','Test increment Data',to_date('2017-08-10 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- commit;
The TestTopic consumer console now prints:
- [root@master ~]# kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
- "'2017-08-01 07:06:20'","1","Test increment Data"
- "'2017-08-02 07:06:20'","2","Test increment Data"
- "'2017-08-03 07:06:20'","3","Test increment Data"
- "'2017-08-04 07:06:20'","4","Test increment Data"
- "'2017-08-05 07:06:20'","5","Test increment Data"
- "'2017-08-06 07:06:20'","6","Test increment Data"
- "'2017-08-07 07:06:20'","7","Test increment Data"
- "'2017-08-08 07:06:20'","8","Test increment Data"
- "'2017-08-09 07:06:20'","9","Test increment Data"
- "'2017-08-10 07:06:20'","10","Test increment Data"
Check the status file /usr/local/flume/agentTest.sqlSource.status again:
- {"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-10 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}
"LastIndex" has advanced to "'2017-08-10 07:06:20'".
With that, incremental data reading from Oracle via flume-ng-sql-source is working!
V. Notes on the configuration parameters
https://github.com/keedio/flume-ng-sql-source
The notes below on custom queries are quoted from the project README (the full parameter table, with mandatory properties marked, is available at the link above):
Custom Query
A custom query is supported, bringing the possibility of using the entire SQL language. This is powerful but risky; be careful with the custom queries you use.
To avoid exporting repeated rows, use the $@$ special marker in the WHERE clause to incrementally export unprocessed rows and newly inserted ones.
IMPORTANT: for the custom query to work properly, ensure that the incremental field is returned in the first position of the query result.
Example:
- agent.sources.sql-source.custom.query = SELECT incrementalField,field2 FROM table1 WHERE incrementalField > $@$
In short: to avoid problems, put the incremental field in the first position of the query's select list.
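To make the $@$ mechanism concrete, here is a rough bash sketch of what the source effectively does before each run (illustrative only; the real substitution happens inside the source's Java code):

```shell
#!/bin/bash
# The configured custom query with the $@$ placeholder
query='SELECT incrementalField,field2 FROM table1 WHERE incrementalField > $@$'

# The checkpoint value kept in the status file (example value from the run above)
last_index="'2017-08-06 07:06:20'"

# Replace the placeholder with the checkpoint to get the query actually executed
effective_query=${query//'$@$'/$last_index}
echo "$effective_query"
```

Each polling cycle the source runs the substituted query, emits the returned rows as events, and then rewrites the status file with the new last value of the first (incremental) column.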