Spark Streaming: Output Operations

Output Operations on DStream

Output operations allow the data in a DStream to be pushed out to external systems such as databases or file systems. According to the official documentation, the available output operations are print(), saveAsTextFiles(prefix, [suffix]), saveAsObjectFiles(prefix, [suffix]), saveAsHadoopFiles(prefix, [suffix]) and foreachRDD(func).


1. The print() function

/**
 * Print the first ten elements of each RDD generated in this DStream. This is an output
 * operator, so this DStream will be registered as an output stream and there materialized.
 */
def print(): Unit = ssc.withScope {
  print(10)
}

/**
 * Print the first num elements of each RDD generated in this DStream. This is an output
 * operator, so this DStream will be registered as an output stream and there materialized.
 */
def print(num: Int): Unit = ssc.withScope {
  def foreachFunc: (RDD[T], Time) => Unit = {
    (rdd: RDD[T], time: Time) => {
      val firstNum = rdd.take(num + 1)
      // scalastyle:off println
      println("-------------------------------------------")
      println(s"Time: $time")
      println("-------------------------------------------")
      firstNum.take(num).foreach(println)
      if (firstNum.length > num) println("...")
      println()
      // scalastyle:on println
    }
  }
  foreachRDD(context.sparkContext.clean(foreachFunc), displayInnerRDDOps = false)
}
As the source code shows, calling print() with no argument prints the first 10 elements of each RDD by default. The default output looks like this:
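Based on the source above, each batch prints a banner with the batch time followed by up to ten elements (the timestamp and the word counts shown here are only illustrative, not taken from an actual run):

-------------------------------------------
Time: 1465223270000 ms
-------------------------------------------
(hello,3)
(spark,2)
(streaming,1)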



2. The saveAsTextFiles() function


Here is the save call I wrote:

data.saveAsTextFiles("file:///root/application/test","txt")
What gets saved is a series of directories, one for each batch interval:
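With the prefix file:///root/application/test and the suffix txt, every 2-second batch produces a directory named test-TIME_IN_MS.txt under /root/application. A listing might look roughly like this (the timestamps are made up for illustration):

/root/application/
    test-1465223268000.txt/
    test-1465223270000.txt/
    test-1465223272000.txt/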


Opening one of them shows:
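Each batch directory is ordinary Spark text-file output: one part file per partition plus a _SUCCESS marker, and the part files hold the (word, count) tuples as plain text. Roughly (file names and contents illustrative):

/root/application/test-1465223268000.txt/
    part-00000      contains lines such as (hello,3)
    part-00001
    _SUCCESS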


3. The saveAsObjectFiles() function

It saves output in the same way as saveAsTextFiles, but the resulting files cannot be opened as plain text.


To see why, first look at the description of this function:

  Save this DStream's contents as SequenceFiles of serialized objects. The file name for each batch interval is generated from prefix and suffix: "prefix-TIME_IN_MS[.suffix]". This is not available in the Python API.

In other words, what is saved is a SequenceFile of serialized objects, which is why it cannot be read as plain text.
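Because the output is a SequenceFile of serialized objects, it has to be read back with SparkContext.objectFile using the element type that was written, rather than with a text editor. A minimal sketch (the path, the timestamp and the element type (String, Int) are assumptions based on the program below):

// read one batch's output back and print it
val restored = sc.objectFile[(String, Int)]("file:///root/application/test-1465223268000.txt")
restored.collect().foreach(println)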

import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.{SparkConf, SparkContext}

object scalaOutput {
  def main(args: Array[String]) {
    // silence the noisy Spark and Jetty logging
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

    val conf = new SparkConf().setAppName("scalaOutput test").setMaster("local[4]")
    val sc = new SparkContext(conf)
    // 2-second batch interval
    val ssc = new StreamingContext(sc, Seconds(2))

    // set the checkpoint directory (required by reduceByKeyAndWindow with an inverse function)
    ssc.checkpoint("./Res")

    // get the socket streaming data
    val socketStreaming = ssc.socketTextStream("master", 9999)

    // word count over a 6-second window sliding every 2 seconds;
    // the second argument is the inverse reduce function that subtracts
    // the values of data falling out of the window
    val data = socketStreaming.map(x => (x, 1))
      .reduceByKeyAndWindow(_ + _, _ - _, Seconds(6), Seconds(2))

    data.saveAsTextFiles("file:///root/application/test", "txt")
    //data.saveAsObjectFiles("file:///root/application/test", "txt")
    //data.saveAsHadoopFiles("file:///root/application/test", "txt")
    //data.saveAsHadoopFiles("hdfs://master:9000/examples/test-", "txt")

    /*
    data.map(evt => {
      val str = new ArrayBuffer[String]() // StringBuffer(String)
      "string received: " + str
    }).saveAsTextFiles("file:///root/application/test", "txt")
    */

    data.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
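To drive this example, something must be writing text to port 9999 on the host master while the job runs; one common way (an assumption, not part of the original post) is to start netcat there and type words into it:

nc -lk 9999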

