Spark 2.2.0 Study Notes 5: SparkStreamingWordCountDemo
Source: Internet · Editor: 程序博客网 · Date: 2024/06/07 13:49
Info
Spark Streaming is Spark's component for stream processing of real-time data, built on a micro-batch architecture.
- Spark Streaming uses a discretized stream, called a DStream, as its abstraction.
- A DStream is a sequence of data received over time.
- DStreams support two kinds of operations. One kind is transformations, which produce a new DStream:
  - stateless: processing each batch does not depend on data from previous batches
  - stateful: computing the current batch uses data or intermediate results from previous batches
- The other kind is output operations, which write data to an external system.
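The stateless/stateful distinction can be sketched without Spark at all. The plain-Scala example below (names `countBatch` and `updateState` are illustrative, not part of the demo) models each micro-batch as a `Seq[String]` of words: the stateless path counts each batch in isolation, while the stateful path merges the new batch into the running totals, which is what `updateStateByKey` does in Spark Streaming.

```scala
// Stateless: each batch is counted on its own, analogous to
// map + reduceByKey applied independently per batch.
def countBatch(batch: Seq[String]): Map[String, Int] =
  batch.groupBy(identity).map { case (w, ws) => (w, ws.size) }

// Stateful: running totals from earlier batches are merged with the
// current batch's counts, analogous to updateStateByKey.
def updateState(state: Map[String, Int], batch: Seq[String]): Map[String, Int] = {
  val current = countBatch(batch)
  (state.keySet ++ current.keySet).map { w =>
    w -> (state.getOrElse(w, 0) + current.getOrElse(w, 0))
  }.toMap
}

val batch1 = Seq("spark", "streaming", "spark")
val batch2 = Seq("spark", "dstream")

val stateless2 = countBatch(batch2)                      // Map(spark -> 1, dstream -> 1)
val stateful2  = updateState(countBatch(batch1), batch2) // spark -> 3 across both batches
```

The word-count demo below is stateless: every one-second batch prints its own counts and the totals are forgotten.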
Example
- Unpack nc.rar, then from cmd run `nc -L -p 9999 -v`
- nc.rar is located in this repo at doc\software\nc.rar
Code
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package spark30.streaming

import org.apache.spark.storage.StorageLevel
import spark30.basic.{SparkContextUtil, StreamingExamples}

/**
 * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
 *
 * Usage: NetworkWordCount <hostname> <port>
 * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
 *
 * To run this on your local machine, you need to first run a Netcat server
 *    `$ nc -lk 9999`
 * and then run the example
 *    `$ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999`
 */
object SparkStreamingWordCountDemo {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 1 second batch size
    val ssc = SparkContextUtil.getStreamingContext("NetworkWordCount")

    // Create a socket stream on target ip:port and count the
    // words in input stream of \n delimited text (eg. generated by 'nc')
    // Note that no duplication in storage level only for running locally.
    // Replication necessary in distributed scenario for fault tolerance.
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
```

Excerpt from SparkContextUtil (imports added for context):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val master2: String = "local[2]"

def getStreamingContext(appName: String): StreamingContext = {
  val sparkConf = new SparkConf().setAppName(appName).setMaster(master2)
  val ssc = new StreamingContext(sparkConf, Seconds(1))
  ssc
}
```
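The notes above also mention stateful operations. A hedged sketch of how this demo could be made stateful: Spark Streaming's `updateStateByKey` takes an update function of shape `(Seq[V], Option[S]) => Option[S]` and additionally requires a checkpoint directory so the state survives failures. The update function itself is pure and shown runnable below; the Spark wiring is shown only as comments since it needs a live StreamingContext.

```scala
// Pure update function for updateStateByKey: merge this batch's new
// counts for a key into its running total (None means no prior state).
def updateFunc(newValues: Seq[Int], runningCount: Option[Int]): Option[Int] =
  Some(newValues.sum + runningCount.getOrElse(0))

// In the demo above, the stateful variant would look like:
//   ssc.checkpoint("checkpoint")   // required for stateful operations
//   val runningCounts = words.map(x => (x, 1)).updateStateByKey[Int](updateFunc _)
//   runningCounts.print()          // cumulative counts instead of per-batch counts

val afterBatch1 = updateFunc(Seq(1, 1), None)     // Some(2): two occurrences in batch 1
val afterBatch2 = updateFunc(Seq(1), afterBatch1) // Some(3): one more in batch 2
```

The `"checkpoint"` directory name is an arbitrary placeholder; any reliable path (e.g. HDFS in a cluster) works.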
Repository
- https://github.com/undergrowthlinear/bigdata-learn.git
- spark30.streaming.SparkStreamingWordCountDemo
- spark30.basic.SparkContextUtil