Spark 2.2.0 Study Notes 5: SparkStreamingWordCountDemo


Info

Spark Streaming is the Spark component for stream processing of real-time data; it uses a micro-batch architecture.

  • Spark Streaming uses an abstraction called a discretized stream, or DStream.
  • A DStream is a sequence of data received over time.
  • DStreams support two kinds of operations:
    • Transformations, which produce a new DStream:
      • stateless — each batch is processed independently of previous batches
      • stateful — processing the current batch uses data or intermediate results from previous batches
    • Output operations, which write data to an external system.
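The stateless/stateful distinction can be made concrete with a stateful variant of the word count. The sketch below assumes an existing input DStream; the names `runningWordCount` and `checkpointDir` are illustrative and not part of the original demo. `updateStateByKey` carries a running total per word across batches and requires a checkpoint directory:

```scala
import org.apache.spark.streaming.dstream.DStream

// Stateful word count sketch: unlike the per-batch reduceByKey in the demo,
// updateStateByKey folds each batch's counts into state kept across batches.
def runningWordCount(lines: DStream[String], checkpointDir: String): DStream[(String, Long)] = {
  lines.context.checkpoint(checkpointDir)  // stateful operations require checkpointing
  lines.flatMap(_.split(" "))
    .map(word => (word, 1L))
    .updateStateByKey[Long] { (newCounts: Seq[Long], state: Option[Long]) =>
      // add this batch's counts for the word to its running total
      Some(state.getOrElse(0L) + newCounts.sum)
    }
}
```

With this in place of the demo's `reduceByKey`, each printed batch shows cumulative counts since the stream started, not counts for that batch alone.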

Example

  • Extract nc.rar and run nc -L -p 9999 -v from a cmd window
  • nc.rar is located in this repository at doc\software\nc.rar
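A typical session looks like the following sketch (the jar name is illustrative; on Linux/macOS the Netcat flags differ from the Windows build used above):

```shell
# Terminal 1: start a Netcat server that the demo will connect to.
# Windows build (from nc.rar):   nc -L -p 9999 -v
# Linux/macOS:                   nc -lk 9999
nc -lk 9999

# Terminal 2: run the demo against that server (class name from this repo;
# jar name is a placeholder for however you package the project).
spark-submit --class spark30.streaming.SparkStreamingWordCountDemo \
  --master "local[2]" bigdata-learn.jar localhost 9999
```

Lines typed into the Netcat terminal then appear as word counts in the demo's output once per one-second batch.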

Code

/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package spark30.streaming

import org.apache.spark.storage.StorageLevel
import spark30.basic.{SparkContextUtil, StreamingExamples}

/**
 * Counts words in UTF8 encoded, '\n' delimited text received from the network every second.
 *
 * Usage: NetworkWordCount <hostname> <port>
 * <hostname> and <port> describe the TCP server that Spark Streaming would connect to receive data.
 *
 * To run this on your local machine, you need to first run a Netcat server
 *   `$ nc -lk 9999`
 * and then run the example
 *   `$ bin/run-example org.apache.spark.examples.streaming.NetworkWordCount localhost 9999`
 */
object SparkStreamingWordCountDemo {
  def main(args: Array[String]) {
    if (args.length < 2) {
      System.err.println("Usage: NetworkWordCount <hostname> <port>")
      System.exit(1)
    }
    StreamingExamples.setStreamingLogLevels()

    // Create the context with a 1 second batch size
    val ssc = SparkContextUtil.getStreamingContext("NetworkWordCount")

    // Create a socket stream on target ip:port and count the words in the
    // input stream of \n delimited text (e.g. generated by 'nc').
    // Note that a storage level without replication is acceptable only when
    // running locally; in a distributed scenario replication is necessary
    // for fault tolerance.
    val lines = ssc.socketTextStream(args(0), args(1).toInt, StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println

Relevant part of SparkContextUtil (imports added here for completeness):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val master2: String = "local[2]"

def getStreamingContext(appName: String): StreamingContext = {
  val sparkConf = new SparkConf().setAppName(appName).setMaster(master2)
  val ssc = new StreamingContext(sparkConf, Seconds(1))
  ssc
}
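To see what the transformation chain computes within a single micro-batch, the same flatMap/map/reduceByKey pipeline can be sketched with plain Scala collections (`reduceByKey` is approximated here by groupBy plus a per-key sum; this is an illustration of the per-batch logic, not Spark code):

```scala
// One micro-batch's worth of input lines
val batch = Seq("to be or", "not to be")

val counts = batch
  .flatMap(_.split(" "))                              // to, be, or, not, to, be
  .map(w => (w, 1))                                   // (to,1), (be,1), ...
  .groupBy(_._1)                                      // group pairs by word
  .map { case (w, pairs) => (w, pairs.map(_._2).sum) } // sum counts per word
// counts contains: to -> 2, be -> 2, or -> 1, not -> 1
```

A DStream applies exactly this computation to each batch's RDD, so every print interval shows the counts for that batch only.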

Repository

  • https://github.com/undergrowthlinear/bigdata-learn.git
    • spark30.streaming.SparkStreamingWordCountDemo
    • spark30.basic.SparkContextUtil