Using Flume's HTTP Source
Flume ships with many sources, such as exec, kafka, and so on. One of the simplest is the HTTP source: when the agent starts, it brings up a web service that listens on the configured IP and port. A common use case: in environments where the Flume SDK and its dependencies cannot be deployed, an application can send data over HTTP instead of Flume's RPC, and the HTTP source receives that data into Flume.
1. HTTP source parameters:

- type: http (org.apache.flume.source.http.HTTPSource)
- bind: IP address or hostname to bind to
- port: port number to bind to
- enableSSL: whether to enable SSL (default: false)
- keystore: path to the keystore file to use
- keystorePassword: password for accessing the keystore
- handler: handler class used by the HTTP source (default: JSONHandler)
- handler.*: any parameters for the handler class can be passed in under this prefix
1) handler:

Flume uses a pluggable "handler" to turn HTTP requests into events. If none is specified, the default is JSONHandler, which handles events in the JSON format shown below. Users can also supply their own handler, which must implement the HTTPSourceHandler interface.
JSON data format:

[
  {"headers": {"header1": "value1", "header2": "value2"}, "body": "the first event"},
  {"headers": {"header1": "value1", "header2": "value2"}, "body": "the second event"}
]
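As an illustration (not part of the original article), a batch in this format can be built and POSTed from any language with an HTTP client; here is a minimal Python sketch using only the standard library. The host and port match the agent configuration used later in this article.

```python
import json
import urllib.request

# Build a batch of two events in the JSONHandler format:
# each event is an object with "headers" (string -> string) and a string "body".
events = [
    {"headers": {"h1": "v1"}, "body": "the first event"},
    {"headers": {"h1": "v1"}, "body": "the second event"},
]
payload = json.dumps(events).encode("utf-8")

# POST the batch to the HTTP source (assumes an agent is listening
# on localhost:50000, as configured below):
req = urllib.request.Request(
    "http://localhost:50000",
    data=payload,
    headers={"Content-Type": "application/json; charset=utf-8"},
)
# urllib.request.urlopen(req)  # uncomment once the agent is running
```

Each object in the array becomes one Flume event, so a single POST can deliver a whole batch.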
2. A quick look at Flume's logger sink:

It logs events at INFO level and is generally used for debugging. It is the sink type used in this article. Its configurable properties:
- type: logger
- maxBytesToLog: maximum number of bytes of the event body to log (default: 16)
3. A simple HTTP source example:

1) Download and unpack Flume:
cd /usr/local
wget http://mirror.bit.edu.cn/apache/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
tar -xvzf apache-flume-1.7.0-bin.tar.gz
Set Flume's environment variables:

vim /etc/profile
export PS1="[\u@`/sbin/ifconfig eth0|grep 'inet '|awk -F'[: ]+' '{print $4}'` \W]"'$ '
export FLUME_HOME=/usr/local/apache-flume-1.7.0-bin
export PATH=$PATH:$FLUME_HOME/bin
2) Install the JDK and configure its environment variables.

3) Configure Flume:
cd /usr/local/flume/conf
vim flume-env.sh

Set JAVA_HOME there, and also put the following log4j.properties in place:
### set log levels ###
log4j.rootLogger = info, stdout, D, E

### console ###
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n

### main log file ###
log4j.appender.D = org.apache.log4j.DailyRollingFileAppender
log4j.appender.D.File = /data/logs/flume/flume.log
log4j.appender.D.Append = true
log4j.appender.D.Threshold = info
log4j.appender.D.layout = org.apache.log4j.PatternLayout
log4j.appender.D.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n

### errors in a separate file ###
log4j.appender.E = org.apache.log4j.DailyRollingFileAppender
log4j.appender.E.File = /data/logs/flume/flume_error.log
log4j.appender.E.Append = true
log4j.appender.E.Threshold = ERROR
log4j.appender.E.layout = org.apache.log4j.PatternLayout
log4j.appender.E.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n

### sink ###
log4j.logger.com.iqiyi.ttbrain.log.flume.sink.MysqlSink = INFO, F, EE
log4j.additivity.com.iqiyi.ttbrain.log.flume.sink.MysqlSink = false
log4j.appender.F = org.apache.log4j.DailyRollingFileAppender
log4j.appender.F.File = /data/logs/flume/flume_sink.log
log4j.appender.F.Append = true
log4j.appender.F.Threshold = info
log4j.appender.F.layout = org.apache.log4j.PatternLayout
log4j.appender.F.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
log4j.appender.EE = org.apache.log4j.DailyRollingFileAppender
log4j.appender.EE.File = /data/logs/flume/flume_sink_error.log
log4j.appender.EE.Append = true
log4j.appender.EE.Threshold = ERROR
log4j.appender.EE.layout = org.apache.log4j.PatternLayout
log4j.appender.EE.layout.ConversionPattern = [%d{MM-dd HH:mm:ss}] [%p] [%c:%L] %m%n
4) Configure the HTTP source:

cd /usr/local/flume/conf
vim http_test.conf

a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.bind=localhost
a1.sources.r1.port=50000
a1.sources.r1.channels=c1
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
5) Start Flume:
flume-ng agent -c /usr/local/flume/conf/ -f /usr/local/flume/conf/http_test.conf -n a1
6) Test:

Open another shell window and run:
curl -X POST -d '[{"headers":{"h1":"v1","h2":"v2"},"body":"hello body"}]' http://localhost:50000
In /data/logs/flume/flume.log you should see:
[09-29 10:31:12] [INFO] [org.apache.flume.sink.LoggerSink:94] Event: { headers:{h1=v1, h2=v2} body: 68 65 6C 6C 6F 20 62 6F 64 79 hello body }
4. A custom handler:

Suppose requests come in as XML; the expected format is:
<events>
  <event>
    <headers><header1>value1</header1></headers>
    <body>test</body>
  </event>
  <event>
    <headers><header1>value1</header1></headers>
    <body>test2</body>
  </event>
</events>
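To make the target mapping concrete before diving into the Java code, here is a small Python sketch (an illustration, not part of the original article) that parses this XML the same way the handler will: each `<event>` becomes one event, the child elements of `<headers>` become Flume headers, and the `<body>` text becomes the event body.

```python
import xml.etree.ElementTree as ET

xml_payload = """
<events>
  <event>
    <headers><header1>value1</header1></headers>
    <body>test</body>
  </event>
  <event>
    <headers><header1>value1</header1></headers>
    <body>test2</body>
  </event>
</events>
"""

def parse_events(xml_text):
    """Turn an XML batch into a list of (headers, body) pairs."""
    root = ET.fromstring(xml_text)
    assert root.tag == "events"          # same root-element check the handler makes
    parsed = []
    for event in root.findall("event"):
        headers = {}
        for headers_node in event.findall("headers"):
            for header in headers_node:  # each child element is one header
                headers[header.tag] = header.text
        body = event.find("body").text
        parsed.append((headers, body))
    return parsed

print(parse_events(xml_payload))
# → [({'header1': 'value1'}, 'test'), ({'header1': 'value1'}, 'test2')]
```

The Java handler below implements the same logic, plus the thread-safety and error-handling details the HTTP source requires.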
1)pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.pq</groupId>
  <artifactId>flume-demo</artifactId>
  <packaging>jar</packaging>
  <version>1.0</version>
  <name>flume-demo Maven jar</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.8.2</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
      <version>1.7.7</version>
      <scope>compile</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.flume</groupId>
      <artifactId>flume-ng-core</artifactId>
      <version>1.7.0</version>
      <scope>compile</scope>
    </dependency>
  </dependencies>
  <build>
    <finalName>flume-demo</finalName>
  </build>
</project>

2) The custom handler:
package org.pq.flumeDemo.sources;

import com.google.common.base.Preconditions;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.source.http.HTTPBadRequestException;
import org.apache.flume.source.http.HTTPSourceHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.servlet.http.HttpServletRequest;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HTTPSourceXMLHandler implements HTTPSourceHandler {
    private final String ROOT = "events";
    private final String EVENT_TAG = "event";
    private final String HEADERS_TAG = "headers";
    private final String BODY_TAG = "body";
    private final String CONF_INSERT_TIMESTAMP = "insertTimestamp";
    private final String TIMESTAMP_HEADER = "timestamp";
    private final DocumentBuilderFactory documentBuilderFactory =
            DocumentBuilderFactory.newInstance();

    // Document builders are not thread-safe,
    // so make sure we have one for each thread.
    private final ThreadLocal<DocumentBuilder> docBuilder =
            new ThreadLocal<DocumentBuilder>();
    private boolean insertTimestamp;
    private static final Logger LOG =
            LoggerFactory.getLogger(HTTPSourceXMLHandler.class);

    public List<Event> getEvents(HttpServletRequest httpServletRequest)
            throws HTTPBadRequestException, Exception {
        if (docBuilder.get() == null) {
            docBuilder.set(documentBuilderFactory.newDocumentBuilder());
        }
        Document doc;
        final List<Event> events;
        try {
            doc = docBuilder.get().parse(httpServletRequest.getInputStream());
            Element root = doc.getDocumentElement();
            root.normalize();
            // Verify that the root element is "events"
            Preconditions.checkState(ROOT.equalsIgnoreCase(root.getTagName()));
            NodeList nodes = root.getElementsByTagName(EVENT_TAG);
            LOG.info("get nodes={}", nodes);
            int eventCount = nodes.getLength();
            events = new ArrayList<Event>(eventCount);
            for (int i = 0; i < eventCount; i++) {
                Element event = (Element) nodes.item(i);
                // Get all headers. If there are multiple header sections,
                // combine them.
                NodeList headerNodes = event.getElementsByTagName(HEADERS_TAG);
                Map<String, String> eventHeaders = new HashMap<String, String>();
                for (int j = 0; j < headerNodes.getLength(); j++) {
                    Node headerNode = headerNodes.item(j);
                    NodeList headers = headerNode.getChildNodes();
                    for (int k = 0; k < headers.getLength(); k++) {
                        Node header = headers.item(k);
                        // Read only element nodes
                        if (header.getNodeType() != Node.ELEMENT_NODE) {
                            continue;
                        }
                        // Make sure a header is inserted only once,
                        // else the event is malformed
                        Preconditions.checkState(
                                !eventHeaders.containsKey(header.getNodeName()),
                                "Header expected only once " + header.getNodeName());
                        eventHeaders.put(header.getNodeName(), header.getTextContent());
                    }
                }
                Node body = event.getElementsByTagName(BODY_TAG).item(0);
                if (insertTimestamp) {
                    eventHeaders.put(TIMESTAMP_HEADER,
                            String.valueOf(System.currentTimeMillis()));
                }
                events.add(EventBuilder.withBody(
                        body.getTextContent().getBytes(
                                httpServletRequest.getCharacterEncoding()),
                        eventHeaders));
            }
        } catch (SAXException ex) {
            throw new HTTPBadRequestException(
                    "Request could not be parsed into valid XML", ex);
        } catch (Exception ex) {
            throw new HTTPBadRequestException(
                    "Request is not in expected format. "
                            + "Please refer documentation for expected format.", ex);
        }
        return events;
    }

    public void configure(Context context) {
        insertTimestamp = context.getBoolean(CONF_INSERT_TIMESTAMP, false);
    }
}

Package it (together with its dependencies) into a jar and put it in Flume's lib directory.
3) Flume configuration file:

a1.sources=r1
a1.sinks=k1
a1.channels=c1
a1.sources.r1.type=http
a1.sources.r1.bind=localhost
a1.sources.r1.port=50000
a1.sources.r1.channels=c1
a1.sources.r1.handler=org.pq.flumeDemo.sources.HTTPSourceXMLHandler
a1.sources.r1.handler.insertTimestamp=true
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100

4) Start the agent:
$ bin/flume-ng agent -c conf -f conf/http_test.conf -n a1 -Dflume.root.logger=INFO,console