Two Ways to Connect Flume Directly to Spark Streaming
The usual pipeline is Flume -> Kafka -> Spark Streaming. If you want to feed data from Flume directly into Spark Streaming instead, there are two ways to do it:
- Method 1: Push (Flume pushes to Spark Streaming)

The Spark Streaming program is as follows:
```scala
package cn.lijie

import org.apache.log4j.Level
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * User: lijie
  * Date: 2017/8/3
  * Time: 15:19
  */
object Flume2SparkStreaming01 {

  // Adds the counts from the current batch to each word's running total
  def myFunc = (it: Iterator[(String, Seq[Int], Option[Int])]) => {
    it.map(x => {
      (x._1, x._2.sum + x._3.getOrElse(0))
    })
  }

  def main(args: Array[String]): Unit = {
    MyLog.setLogLeavel(Level.ERROR)
    val conf = new SparkConf().setAppName("fs01").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))
    // Push mode: Spark Streaming runs an Avro receiver on this host:port,
    // and Flume's avro sink pushes events to it
    val ds = FlumeUtils.createStream(ssc, "10.1.9.102", 6666)
    // updateStateByKey requires a checkpoint directory
    sc.setCheckpointDir("C:\\Users\\Administrator\\Desktop\\checkpoint")
    val res = ds.flatMap(x => {
      new String(x.event.getBody.array()).split(" ")
    }).map((_, 1)).updateStateByKey(myFunc, new HashPartitioner(sc.defaultParallelism), true)
    res.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```
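The state-update function `myFunc` above sums the counts seen in the current batch with the previous running total for each word. Its core logic can be exercised without any Spark machinery (a standalone sketch; `StateUpdateDemo` is a name introduced here for illustration):

```scala
object StateUpdateDemo {
  // Same shape as myFunc: (word, counts seen this batch, previous running total)
  def update(it: Iterator[(String, Seq[Int], Option[Int])]): Iterator[(String, Int)] =
    it.map { case (word, counts, prev) => (word, counts.sum + prev.getOrElse(0)) }

  def main(args: Array[String]): Unit = {
    // "spark" had a previous total of 2 and appears 3 times in this batch;
    // "flume" has no previous state and appears once
    val batch = Iterator(("spark", Seq(1, 1, 1), Some(2)), ("flume", Seq(1), None))
    update(batch).foreach(println) // prints (spark,5) then (flume,1)
  }
}
```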
The Flume configuration is as follows:
```properties
# Agent, source, channel, and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: spool a directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/monitor

# Channel: in-memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Sink: avro sink pushing events to the Spark Streaming receiver
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 10.1.9.102
a1.sinks.k1.port = 6666

# Wire source, channel, and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
Start Flume:
```shell
/usr/java/flume/bin/flume-ng agent -n a1 -c conf -f /usr/java/flume/mytest/push.properties
```
Result: (the original post shows a console screenshot here)
- Method 2: Poll (Spark Streaming pulls from Flume)

This method requires Spark's custom Flume sink jar (see the Spark Streaming + Flume integration guide in the official documentation; for this build that is the spark-streaming-flume-sink_2.10 artifact matching Spark 1.6.1). Download the jar and place it in the lib directory of the Flume installation.

The program is as follows:
```scala
package cn.lijie

import java.net.InetSocketAddress
import org.apache.log4j.Level
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

/**
  * User: lijie
  * Date: 2017/8/3
  * Time: 15:19
  */
object Flume2SparkStreaming02 {

  // Adds the counts from the current batch to each word's running total
  def myFunc = (it: Iterator[(String, Seq[Int], Option[Int])]) => {
    it.map(x => {
      (x._1, x._2.sum + x._3.getOrElse(0))
    })
  }

  def main(args: Array[String]): Unit = {
    MyLog.setLogLeavel(Level.WARN)
    val conf = new SparkConf().setAppName("fs01").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))
    // Poll mode: Spark Streaming pulls from the SparkSink(s) running inside
    // the Flume agent(s); multiple agent addresses can be listed here
    val addrs = Seq(new InetSocketAddress("192.168.80.123", 10086))
    val ds = FlumeUtils.createPollingStream(ssc, addrs, StorageLevel.MEMORY_AND_DISK_2)
    // updateStateByKey requires a checkpoint directory
    sc.setCheckpointDir("C:\\Users\\Administrator\\Desktop\\checkpointt")
    val res = ds.flatMap(x => {
      new String(x.event.getBody.array()).split(" ")
    }).map((_, 1)).updateStateByKey(myFunc, new HashPartitioner(sc.defaultParallelism), true)
    res.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```
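In both jobs, each received event carries its payload as a `java.nio.ByteBuffer`, which the `flatMap` decodes to a `String` and splits on spaces. That decoding step in isolation looks like this (a minimal sketch; `EventBodyDemo` is a name introduced here for illustration):

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object EventBodyDemo {
  // Mirrors new String(x.event.getBody.array()).split(" ") from the jobs above
  def bodyToWords(body: ByteBuffer): Array[String] =
    new String(body.array(), StandardCharsets.UTF_8).split(" ")

  def main(args: Array[String]): Unit = {
    val body = ByteBuffer.wrap("hello spark streaming".getBytes(StandardCharsets.UTF_8))
    bodyToWords(body).foreach(println) // prints hello, spark, streaming on separate lines
  }
}
```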
The Flume configuration is as follows:
```properties
# Agent, source, channel, and sink names
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: spool a directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/monitor

# Channel: in-memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Sink: Spark's custom sink; Spark Streaming polls events from it
a1.sinks.k1.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.k1.hostname = 192.168.80.123
a1.sinks.k1.port = 10086

# Wire source, channel, and sink together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```
Start Flume:
```shell
/usr/java/flume/bin/flume-ng agent -n a1 -c conf -f /usr/java/flume/mytest/push.properties
```
Result: (the original post shows a console screenshot here)
Shared code used by both programs:

The MyLog class:
```scala
package cn.lijie

import org.apache.log4j.{Level, Logger}
import org.apache.spark.Logging

/**
  * User: lijie
  * Date: 2017/8/3
  * Time: 15:36
  */
object MyLog extends Logging {

  /**
    * Set the log level, but only if no appenders have been configured yet
    *
    * @param level
    */
  def setLogLeavel(level: Level): Unit = {
    val flag = Logger.getRootLogger.getAllAppenders.hasMoreElements
    if (!flag) {
      logInfo("set log level ->" + level)
      Logger.getRootLogger.setLevel(level)
    }
  }
}
```
The POM file:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>flume-sparkstreaming</groupId>
    <artifactId>flume-sparkstreaming</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>1.7</maven.compiler.source>
        <maven.compiler.target>1.7</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.10.6</scala.version>
        <spark.version>1.6.1</spark.version>
        <hadoop.version>2.6.4</hadoop.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-flume_2.10</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.38</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.4.3</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>cn.lijie.Flume2SparkStreaming01</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```