Two Ways to Connect Flume Directly to Spark Streaming
The usual pipeline is Flume -> Kafka -> Spark Streaming. If you want to feed data from Flume directly into Spark Streaming instead, there are two ways to do it:
- Method 1: Push (Flume pushes to Spark Streaming)

The Spark Streaming program is as follows:
```scala
package cn.lijie

import org.apache.log4j.Level
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * User: lijie
  * Date: 2017/8/3
  * Time: 15:19
  */
object Flume2SparkStreaming01 {

  // Adds the counts from the current batch to each word's running total
  def myFunc = (it: Iterator[(String, Seq[Int], Option[Int])]) => {
    it.map(x => {
      (x._1, x._2.sum + x._3.getOrElse(0))
    })
  }

  def main(args: Array[String]): Unit = {
    MyLog.setLogLeavel(Level.ERROR)
    val conf = new SparkConf().setAppName("fs01").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))
    // Push mode: Spark Streaming runs an Avro receiver on this host:port,
    // and Flume's avro sink pushes events to it
    val ds = FlumeUtils.createStream(ssc, "10.1.9.102", 6666)
    // updateStateByKey requires a checkpoint directory
    sc.setCheckpointDir("C:\\Users\\Administrator\\Desktop\\checkpoint")
    val res = ds.flatMap(x => {
      new String(x.event.getBody.array()).split(" ")
    }).map((_, 1)).updateStateByKey(myFunc, new HashPartitioner(sc.defaultParallelism), true)
    res.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```
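The state-update function `myFunc` above sums the counts seen in the current batch with the previous running total for each word. Its core logic can be exercised without any Spark machinery (a standalone sketch; `StateUpdateDemo` is a name introduced here for illustration):

```scala
object StateUpdateDemo {
  // Same shape as myFunc: (word, counts seen this batch, previous running total)
  def update(it: Iterator[(String, Seq[Int], Option[Int])]): Iterator[(String, Int)] =
    it.map { case (word, counts, prev) => (word, counts.sum + prev.getOrElse(0)) }

  def main(args: Array[String]): Unit = {
    // "spark" had a previous total of 2 and appears 3 times in this batch;
    // "flume" has no previous state and appears once
    val batch = Iterator(("spark", Seq(1, 1, 1), Some(2)), ("flume", Seq(1), None))
    update(batch).foreach(println) // prints (spark,5) then (flume,1)
  }
}
```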
The Flume configuration is as follows:
```properties
# Agent, source, channel, and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: spool a directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/monitor

# Channel: in-memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Sink: avro sink pushing events to the Spark Streaming receiver
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 10.1.9.102
a1.sinks.k1.port = 6666

# Wire source, channel, and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
Start Flume:
```shell
/usr/java/flume/bin/flume-ng agent -n a1 -c conf -f /usr/java/flume/mytest/push.properties
```
Result: (the original post shows a console screenshot here)
- Method 2: Poll (Spark Streaming pulls from Flume)

This method requires Spark's custom Flume sink jar (see the Spark Streaming + Flume integration guide in the official documentation; for this build that is the spark-streaming-flume-sink_2.10 artifact matching Spark 1.6.1). Download the jar and place it in the lib directory of the Flume installation.

The program is as follows:
```scala
package cn.lijie

import java.net.InetSocketAddress
import org.apache.log4j.Level
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * User: lijie
  * Date: 2017/8/3
  * Time: 15:19
  */
object Flume2SparkStreaming02 {

  // Adds the counts from the current batch to each word's running total
  def myFunc = (it: Iterator[(String, Seq[Int], Option[Int])]) => {
    it.map(x => {
      (x._1, x._2.sum + x._3.getOrElse(0))
    })
  }

  def main(args: Array[String]): Unit = {
    MyLog.setLogLeavel(Level.WARN)
    val conf = new SparkConf().setAppName("fs01").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))
    // Poll mode: Spark Streaming pulls from the SparkSink(s) running inside
    // the Flume agent(s); multiple agent addresses can be listed here
    val addrs = Seq(new InetSocketAddress("192.168.80.123", 10086))
    val ds = FlumeUtils.createPollingStream(ssc, addrs, StorageLevel.MEMORY_AND_DISK_2)
    // updateStateByKey requires a checkpoint directory
    sc.setCheckpointDir("C:\\Users\\Administrator\\Desktop\\checkpointt")
    val res = ds.flatMap(x => {
      new String(x.event.getBody.array()).split(" ")
    }).map((_, 1)).updateStateByKey(myFunc, new HashPartitioner(sc.defaultParallelism), true)
    res.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```
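In both jobs, each received event carries its payload as a `java.nio.ByteBuffer`, which the `flatMap` decodes to a `String` and splits on spaces. That decoding step in isolation looks like this (a minimal sketch; `EventBodyDemo` is a name introduced here for illustration):

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object EventBodyDemo {
  // Mirrors new String(x.event.getBody.array()).split(" ") from the jobs above
  def bodyToWords(body: ByteBuffer): Array[String] =
    new String(body.array(), StandardCharsets.UTF_8).split(" ")

  def main(args: Array[String]): Unit = {
    val body = ByteBuffer.wrap("hello spark streaming".getBytes(StandardCharsets.UTF_8))
    bodyToWords(body).foreach(println) // prints hello, spark, streaming on separate lines
  }
}
```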
The Flume configuration is as follows:
```properties
# Agent, source, channel, and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: spool a directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/monitor

# Channel: in-memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Sink: Spark's custom sink; Spark Streaming polls events from it
a1.sinks.k1.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.k1.hostname = 192.168.80.123
a1.sinks.k1.port = 10086

# Wire source, channel, and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
Start Flume:
```shell
/usr/java/flume/bin/flume-ng agent -n a1 -c conf -f /usr/java/flume/mytest/push.properties
```
Result: (the original post shows a console screenshot here)
Shared code used by both programs:

The MyLog class:
```scala
package cn.lijie

import org.apache.log4j.{Level, Logger}
import org.apache.spark.Logging

/**
  * User: lijie
  * Date: 2017/8/3
  * Time: 15:36
  */
object MyLog extends Logging {

  /**
    * Set the log level, but only if no appenders have been configured yet
    *
    * @param level
    */
  def setLogLeavel(level: Level): Unit = {
    val flag = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!flag) {
      logInfo("set log level ->" + level)
      Logger.getRootLogger.setLevel(level)
    }
  }
}
```
The POM file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>flume-sparkstreaming</groupId>
    <artifactId>flume-sparkstreaming</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.10.6</scala.version>
        <spark.version>1.6.1</spark.version>
        <hadoop.version>2.6.4</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-flume_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>cn.lijie.Flume2SparkStreaming01</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```