Spark 2.2.0 Study Notes 5: SparkStreamingWordCountDemo


Spark Streaming: the Spark component for stream processing of real-time data, built on a micro-batch architecture

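  • Spark Streaming uses the discretized stream, called a DStream, as its abstraction
  • A DStream is a sequence of data received over time; it supports two kinds of operations
    • Transformations, which produce a new DStream
      • Stateless: processing a batch does not depend on data from earlier batches
      • Stateful: computing the current batch requires data or intermediate results from earlier batches
    • Output operations, which write data to an external system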
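The stateless/stateful distinction can be illustrated without Spark at all. The following is a minimal sketch in plain Scala that models a DStream as a sequence of batches; it does not use the Spark Streaming API, and the names `MicroBatchSketch`, `batches`, and `countBatch` are hypothetical, chosen only for illustration.

```scala
object MicroBatchSketch {
  // Hypothetical stand-in for a DStream: one inner Seq of lines per batch interval.
  val batches: Seq[Seq[String]] = Seq(
    Seq("a b a"),
    Seq("b c"),
    Seq("a")
  )

  // Word count over the lines of a single batch.
  def countBatch(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" ")).groupBy(identity).map { case (w, ws) => (w, ws.size) }

  // Stateless: every batch is counted on its own, ignoring earlier batches.
  val stateless: Seq[Map[String, Int]] = batches.map(countBatch)

  // Stateful: each result is the previous running total merged with the current batch.
  val stateful: Seq[Map[String, Int]] =
    batches.scanLeft(Map.empty[String, Int]) { (totals, lines) =>
      countBatch(lines).foldLeft(totals) { case (acc, (w, n)) =>
        acc.updated(w, acc.getOrElse(w, 0) + n)
      }
    }.tail // drop the initial empty state

  def main(args: Array[String]): Unit = {
    println("per-batch counts:")
    stateless.foreach(println)
    println("running totals:")
    stateful.foreach(println)
  }
}
```

In the stateless result the third batch only knows about its own single "a", while in the stateful result the same batch reports the total accumulated since the first batch.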


  • Extract nc.rar, then run nc -L -p 9999 -v in cmd to start a Netcat server listening on port 9999
  • nc.rar is located at doc\software\nc.rar in this code directory


/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package spark30.streaming

import org.apache.spark.storage.StorageLevel

import spark30.basic.{SparkContextUtil, StreamingExamples}

/**
  * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
  *
  * Usage: NetworkWordCount <hostname> <port>
  * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
  *
  * To run this on your local machine, you need to first run a Netcat server
  * `$ nc -lk 9999`
  * and then run the example
  * `$ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999`
  */
object SparkStreamingWordCountDemo {
  def main(args: Array[String]): Unit = {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 1 second batch size
    val ssc = SparkContextUtil.getStreamingContext("NetworkWordCount")

    // Create a socket stream on target ip:port and count the
    // words in input stream of \n delimited text (eg. generated by 'nc')
    // Note that no duplication in storage level only for running locally.
    // Replication necessary in distributed scenario for fault tolerance.
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    // Pair each word with 1, then sum the counts per word within the batch
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println

Part of the code in SparkContextUtil:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: the socket receiver occupies one thread, so at least one more is needed for processing
val master2: String = "local[2]"

def getStreamingContext(appName: String): StreamingContext = {
  val sparkConf = new SparkConf().setAppName(appName).setMaster(master2)
  // Batch interval of 1 second
  val ssc = new StreamingContext(sparkConf, Seconds(1))
  ssc
}
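The demo above is a stateless word count: each one-second batch is counted independently. As a contrast, here is a hedged sketch of a stateful variant that keeps running totals with `updateStateByKey`. It is not part of the original project: the object name `StatefulWordCountSketch`, the helper `updateTotal`, the hard-coded `localhost:9999` source, and the `checkpoint` directory name are all assumptions made for illustration, and it requires Spark Streaming on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatefulWordCountSketch {
  // Merge the counts seen for one word in the current batch into its running total.
  def updateTotal(newCounts: Seq[Int], total: Option[Int]): Option[Int] =
    Some(newCounts.sum + total.getOrElse(0))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StatefulWordCountSketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    // Stateful transformations need a checkpoint directory; "checkpoint" is a placeholder path.
    ssc.checkpoint("checkpoint")

    // Assumed source: the same Netcat server used by the demo, on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)
    val totals = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey[Int](updateTotal _)

    totals.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

With this variant, typing the same word into Netcat in different seconds should make its printed count keep growing, whereas the stateless demo starts from zero in every batch.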


    • spark30.streaming.SparkStreamingWordCountDemo
    • spark30.basic.SparkContextUtil
