Spark 2.2.0 Study Notes 5: SparkStreamingWordCountDemo
Source: Internet · Editor: 程序博客网 · Date: 2024/06/07 13:49
Info
Spark Streaming is Spark's component for stream processing of real-time data, built on a micro-batch architecture.
- Spark Streaming uses a discretized stream, called a DStream, as its abstraction.
- A DStream is a sequence of data received over time.
- DStreams support two kinds of operations. One kind is transformations, which produce a new DStream:
  - stateless: processing each batch does not depend on data from previous batches
  - stateful: computing the current batch uses data or intermediate results from previous batches
- The other kind is output operations, which write data to an external system.
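The stateless/stateful distinction can be sketched without Spark at all. The plain-Scala example below (names `countBatch` and `updateState` are illustrative, not part of the demo) models each micro-batch as a `Seq[String]` of words: the stateless path counts each batch in isolation, while the stateful path merges the new batch into the running totals, which is what `updateStateByKey` does in Spark Streaming.

```scala
// Stateless: each batch is counted on its own, analogous to
// map + reduceByKey applied independently per batch.
def countBatch(batch: Seq[String]): Map[String, Int] =
  batch.groupBy(identity).map { case (w, ws) => (w, ws.size) }

// Stateful: running totals from earlier batches are merged with the
// current batch's counts, analogous to updateStateByKey.
def updateState(state: Map[String, Int], batch: Seq[String]): Map[String, Int] = {
  val current = countBatch(batch)
  (state.keySet ++ current.keySet).map { w =>
    w -> (state.getOrElse(w, 0) + current.getOrElse(w, 0))
  }.toMap
}

val batch1 = Seq("spark", "streaming", "spark")
val batch2 = Seq("spark", "dstream")

val stateless2 = countBatch(batch2)                      // Map(spark -> 1, dstream -> 1)
val stateful2  = updateState(countBatch(batch1), batch2) // spark -> 3 across both batches
```

The word-count demo below is stateless: every one-second batch prints its own counts and the totals are forgotten.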
Example
- Unpack nc.rar, then from cmd run `nc -L -p 9999 -v`
- nc.rar is located in this repo at doc\software\nc.rar
Code
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package spark30.streaming

import org.apache.spark.storage.StorageLevel
import spark30.basic.{SparkContextUtil, StreamingExamples}

/**
 * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
 *
 * Usage: NetworkWordCount <hostname> <port>
 * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
 *
 * To run this on your local machine, you need to first run a Netcat server
 *    `$ nc -lk 9999`
 * and then run the example
 *    `$ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999`
 */
object SparkStreamingWordCountDemo {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 1 second batch size
    val ssc = SparkContextUtil.getStreamingContext("NetworkWordCount")

    // Create a socket stream on target ip:port and count the
    // words in input stream of \n delimited text (eg. generated by 'nc')
    // Note that no duplication in storage level only for running locally.
    // Replication necessary in distributed scenario for fault tolerance.
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
```

Excerpt from SparkContextUtil (imports added for context):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val master2: String = "local[2]"

def getStreamingContext(appName: String): StreamingContext = {
  val sparkConf = new SparkConf().setAppName(appName).setMaster(master2)
  val ssc = new StreamingContext(sparkConf, Seconds(1))
  ssc
}
```
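The notes above also mention stateful operations. A hedged sketch of how this demo could be made stateful: Spark Streaming's `updateStateByKey` takes an update function of shape `(Seq[V], Option[S]) => Option[S]` and additionally requires a checkpoint directory so the state survives failures. The update function itself is pure and shown runnable below; the Spark wiring is shown only as comments since it needs a live StreamingContext.

```scala
// Pure update function for updateStateByKey: merge this batch's new
// counts for a key into its running total (None means no prior state).
def updateFunc(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(newValues.sum + runningCount.getOrElse(0))

// In the demo above, the stateful variant would look like:
//   ssc.checkpoint("checkpoint")   // required for stateful operations
//   val runningCounts = words.map(x => (x, 1)).updateStateByKey[Int](updateFunc _)
//   runningCounts.print()          // cumulative counts instead of per-batch counts

val afterBatch1 = updateFunc(Seq(1, 1), None)     // Some(2): two occurrences in batch 1
val afterBatch2 = updateFunc(Seq(1), afterBatch1) // Some(3): one more in batch 2
```

The `"checkpoint"` directory name is an arbitrary placeholder; any reliable path (e.g. HDFS in a cluster) works.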
Repository
- https://github.com/undergrowthlinear/bigdata-learn.git
- spark30.streaming.SparkStreamingWordCountDemo
- spark30.basic.SparkContextUtil