Sending Data to Flume Through a Thrift Source: A Python Implementation
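Flume now supports a Thrift source: a Thrift service collects the data (the same approach scribe takes), and the events are then passed through the configured channel to a sink. The implementation steps follow.
Environment: Python 2.7.5, CDH 4.3 (Flume 1.3), Thrift 0.9
First we need a Python Flume client module that speaks the Thrift protocol. It can be generated automatically from the Thrift definition. Start by downloading the Flume tarball for CDH 4.3 from Cloudera's site: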
wget http://archive.cloudera.com/cdh4/cdh/4/
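After downloading, unpack the tarball. The Thrift definition file is in the directory flume-ng-sdk/src/main/thrift; use it to generate the client module:
tar xzvf flume-ng-1.3.0-cdh4.3.0.tar.gz
cd apache-flume-1.3.0-cdh4.3.0-bin/flume-ng-sdk/src/main/thrift
thrift --gen py flume.thrift
This produces a directory named gen-py in the current directory. Rename it to genpy and move it onto Python's module search path: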
mv gen-py/ /usr/local/lib/python2.7/site-packages/genpy
You can now import the module:
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from genpy import flume
>>> dir(flume)
['__all__', '__builtins__', '__doc__', '__file__', '__name__', '__package__', '__path__']
>>>
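Note that dir(flume) lists only package attributes: the generated submodules (ThriftSourceProtocol, ttypes) still have to be imported explicitly. As a quick sanity check before writing the client, something like the following sketch can confirm that they resolve. The helper name can_import is made up for this example, and the module names assume the genpy layout described above.

```python
import importlib


def can_import(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


if __name__ == '__main__':
    # Module names assume gen-py was renamed to genpy as described above.
    for name in ('genpy.flume.ThriftSourceProtocol', 'genpy.flume.ttypes'):
        print('%s importable: %s' % (name, can_import(name)))
```

If either check fails, the usual suspects are the genpy directory not being on the module search path or the thrift Python library itself not being installed.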
Next, we use this module to build a client wrapper. Note that the protocol used by Flume's Thrift source server is TTupleProtocol, which extends TCompactProtocol:
public final class TTupleProtocol extends TCompactProtocol {...
The Thrift Python module (as of 0.9) offers only two candidate protocols, TCompactProtocol and TBinaryProtocol, so we clearly need the former. With TBinaryProtocol, the server reports the following error:
18 Jul 2013 18:25:29,447 ERROR [pool-5-thread-4] (org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run:213) - Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Expected protocol id ffffff82 but got ffffff80
    at org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:472)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
    at java.lang.Thread.run(Thread.java:662)
and the client fails with:
Traceback (most recent call last):
  File "pyflume.py", line 65, in <module>
    flume_client.send({'a':'hello', 'b':'world'}, 'events under hello world')
  File "pyflume.py", line 53, in send
    self.client.append(event)
  File "/usr/local/lib/python2.7/site-packages/genpy/flume/ThriftSourceProtocol.py", line 49, in append
    return self.recv_append()
  File "/usr/local/lib/python2.7/site-packages/genpy/flume/ThriftSourceProtocol.py", line 60, in recv_append
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "/usr/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
    sz = self.readI32()
  File "/usr/local/lib/python2.7/site-packages/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
    buff = self.trans.readAll(4)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 271, in read
    self.readFrame()
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 275, in readFrame
    buff = self.__trans.readAll(4)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TTransport.py", line 58, in readAll
    chunk = self.read(sz - have)
  File "/usr/local/lib/python2.7/site-packages/thrift/transport/TSocket.py", line 118, in read
    message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
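The two hex values in the server log can be traced back to the first byte each protocol writes. A compact-protocol message starts with the protocol id 0x82, while a strict binary-protocol message starts with the version word 0x80010000, whose first byte is 0x80. Java treats bytes as signed, so when they are printed as integers they appear sign-extended as ffffff82 and ffffff80. The sketch below reproduces that arithmetic with the standard library only; the function names are made up for illustration, and the sign-extension step is my reading of how the Java side formats the message rather than something stated in the log.

```python
import struct

# First byte of every TCompactProtocol message.
COMPACT_PROTOCOL_ID = 0x82
# Version word that opens a strict TBinaryProtocol message.
BINARY_VERSION_1 = 0x80010000
# Thrift message type for an ordinary request (CALL).
MESSAGE_TYPE_CALL = 1


def binary_first_byte(message_type=MESSAGE_TYPE_CALL):
    """Return the first byte a strict TBinaryProtocol writer sends."""
    header = struct.pack('!I', BINARY_VERSION_1 | message_type)
    return bytearray(header)[0]


def java_hex(byte_value):
    """Format an unsigned byte as Java prints a sign-extended byte in hex."""
    signed = byte_value - 256 if byte_value > 127 else byte_value
    return '%x' % (signed & 0xffffffff)


if __name__ == '__main__':
    print('expected: %s' % java_hex(COMPACT_PROTOCOL_ID))
    print('got:      %s' % java_hex(binary_first_byte()))
```

The client-side traceback is consistent with this: it shows the framed transport waiting for a 4-byte frame header in readFrame, and getting 0 bytes back, which is what one would expect if the server dropped the connection after rejecting the message.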
Here is the full implementation, which you can adapt as needed:
#coding=utf-8
'''
Created on 2013-07-18

@author: Felix
'''
import random

from genpy.flume import ThriftSourceProtocol
from genpy.flume.ttypes import ThriftFlumeEvent
from thrift.transport import TTransport, TSocket
from thrift.protocol import TCompactProtocol


class _Transport(object):
    '''Owns the socket and the framed transport wrapped around it.'''

    def __init__(self, thrift_host, thrift_port, timeout=None, unix_socket=None):
        self.thrift_host = thrift_host
        self.thrift_port = thrift_port
        self.timeout = timeout
        self.unix_socket = unix_socket

        self._socket = TSocket.TSocket(self.thrift_host, self.thrift_port, self.unix_socket)
        self._transport_factory = TTransport.TFramedTransportFactory()
        self._transport = self._transport_factory.getTransport(self._socket)

    def connect(self):
        '''Open the transport if it is not already open.'''
        try:
            if self.timeout:
                self._socket.setTimeout(self.timeout)
            if not self.is_open():
                self._transport = self._transport_factory.getTransport(self._socket)
                self._transport.open()
        except Exception as e:  # "as" form is valid on Python 2.6+ and 3
            print(e)
            self.close()

    def is_open(self):
        return self._transport.isOpen()

    def get_transport(self):
        return self._transport

    def close(self):
        self._transport.close()


class FlumeClient(object):
    '''Thin wrapper around the generated ThriftSourceProtocol client.'''

    def __init__(self, thrift_host, thrift_port, timeout=None, unix_socket=None):
        self._transObj = _Transport(thrift_host, thrift_port, timeout=timeout, unix_socket=unix_socket)
        # The Flume Thrift source expects the compact protocol.
        self._protocol = TCompactProtocol.TCompactProtocol(trans=self._transObj.get_transport())
        self.client = ThriftSourceProtocol.Client(iprot=self._protocol, oprot=self._protocol)
        self._transObj.connect()

    def send(self, event):
        '''Send a single event; errors are printed, not raised.'''
        try:
            self.client.append(event)
        except Exception as e:
            print(e)
        finally:
            # Reconnect if the send left the transport closed.
            self._transObj.connect()

    def send_batch(self, events):
        '''Send a list of events in one call; errors are printed, not raised.'''
        try:
            self.client.appendBatch(events)
        except Exception as e:
            print(e)
        finally:
            self._transObj.connect()

    def close(self):
        self._transObj.close()


if __name__ == '__main__':
    flume_client = FlumeClient('192.168.1.141', 4141)
    event = ThriftFlumeEvent({'a':'hello', 'b':'world'}, 'events under hello world2')
    events = [ThriftFlumeEvent({'a':'hello', 'b':'world'},
                               'events under hello world%s' % random.randint(0, 1000))
              for _ in range(100)]
    flume_client.send(event)
    flume_client.send_batch(events)
    flume_client.close()
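The demo sends all 100 events in a single appendBatch call. For larger volumes it may be preferable to cap the batch size. A minimal sketch of that idea follows; chunked and send_in_batches are hypothetical helpers (not part of pyflume or Thrift), the default of 50 is an arbitrary choice, and the only assumption about client is that it exposes the send_batch(events) method defined above.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    if size <= 0:
        raise ValueError('size must be positive')
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_in_batches(client, events, batch_size=50):
    """Send events via client.send_batch in batches; return the batch count."""
    count = 0
    for batch in chunked(events, batch_size):
        client.send_batch(batch)
        count += 1
    return count
```

With the client from the listing, this would be called as send_in_batches(flume_client, events, batch_size=50). Because send_batch prints exceptions rather than raising them, the returned count reflects batches attempted, not batches confirmed by the server.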
The code above is also on GitHub: https://github.com/sinolambda/pyflume