Writing to Hive from Flume with Hive Streaming
Source: Internet · 程序博客网 · 2024/06/01 08:40
1. Create the table in Hive:
create table customers (
  id string,
  name string,
  email string,
  street_address string,
  company string
)
partitioned by (time string)
clustered by (id) into 5 buckets
stored as orc
location '/user/iteblog/salescust'
TBLPROPERTIES ('transactional'='true');

Note: the table must be stored as ORC and clustered into buckets — Hive transactions (and therefore the Flume Hive sink) only work against bucketed ORC tables.
To enable transactions in Hive, we need at minimum the following setting:
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
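In practice a few companion properties usually go along with the transaction manager. A sketch of the relevant hive-site.xml entries (property names come from the Hive transactions documentation; the worker-thread count is an illustrative value, tune it for your cluster):

```xml
<!-- hive-site.xml: settings commonly required for Hive ACID tables -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```

The compactor settings matter for streaming ingest in particular: Hive Streaming writes many small delta files, and the compactor is what merges them back into base ORC files in the background.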
2. Configure Flume:
$ vi flumetohive.conf

flumeagent1.sources = source_from_kafka
flumeagent1.channels = mem_channel
flumeagent1.sinks = hive_sink

# Define / Configure source
flumeagent1.sources.source_from_kafka.type = org.apache.flume.source.kafka.KafkaSource
flumeagent1.sources.source_from_kafka.zookeeperConnect = sandbox.hortonworks.com:2181
flumeagent1.sources.source_from_kafka.topic = SalesDBTransactions
flumeagent1.sources.source_from_kafka.groupID = flume
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sources.source_from_kafka.interceptors = i1
flumeagent1.sources.source_from_kafka.interceptors.i1.type = timestamp
flumeagent1.sources.source_from_kafka.consumer.timeout.ms = 1000

# Hive Sink
flumeagent1.sinks.hive_sink.type = hive
flumeagent1.sinks.hive_sink.hive.metastore = thrift://sandbox.hortonworks.com:9083
flumeagent1.sinks.hive_sink.hive.database = raj
flumeagent1.sinks.hive_sink.hive.table = customers
flumeagent1.sinks.hive_sink.hive.txnsPerBatchAsk = 2
flumeagent1.sinks.hive_sink.hive.partition = %y-%m-%d-%H-%M
flumeagent1.sinks.hive_sink.batchSize = 10
flumeagent1.sinks.hive_sink.serializer = DELIMITED
flumeagent1.sinks.hive_sink.serializer.delimiter = ,
flumeagent1.sinks.hive_sink.serializer.fieldnames = id,name,email,street_address,company

# Use a channel which buffers events in memory
flumeagent1.channels.mem_channel.type = memory
flumeagent1.channels.mem_channel.capacity = 10000
flumeagent1.channels.mem_channel.transactionCapacity = 100

# Bind the source and sink to the channel
flumeagent1.sources.source_from_kafka.channels = mem_channel
flumeagent1.sinks.hive_sink.channel = mem_channel

1) The source is Kafka;
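With serializer = DELIMITED, each Flume event body must be a delimiter-separated record whose fields line up, in order, with serializer.fieldnames. A minimal producer-side sketch of that contract (plain Python; the record values are made-up sample data, and real input would need escaping if values can contain the delimiter):

```python
# Field order must match flumeagent1.sinks.hive_sink.serializer.fieldnames
FIELDNAMES = ["id", "name", "email", "street_address", "company"]
DELIMITER = ","  # matches serializer.delimiter in the config above

def to_event_body(record: dict) -> str:
    """Render a record as the delimited body the Hive sink expects.

    Fields are emitted in the serializer's declared order; missing keys
    become empty strings so the column count stays fixed.
    """
    return DELIMITER.join(str(record.get(f, "")) for f in FIELDNAMES)

sample = {
    "id": "42",
    "name": "Alice",
    "email": "alice@example.com",
    "street_address": "1 Main St",
    "company": "Acme",
}
print(to_event_body(sample))
# -> 42,Alice,alice@example.com,1 Main St,Acme
```

A message in this shape, published to the SalesDBTransactions topic, is what the Kafka source hands to the Hive sink; the sink then maps each field to the matching table column.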
2) The sink is Hive. We do not use the alternative approach of writing files to HDFS and creating an external Hive table over them; instead, Flume's Hive sink uses the Hive Streaming API internally to append records to the ORC-backed transactional table. The benefits are smaller files and higher ingest efficiency.
Reference: https://www.iteblog.com/archives/1771.html