spark-streaming-[1]-Streaming Basics: NetworkWordCount
一、Programming Framework
Define context
val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
val ssc = new StreamingContext(conf, Seconds(1))
After a context is defined, you have to do the following.
1. Define the input sources by creating input DStreams. For example:
val lines = ssc.socketTextStream("localhost", 9999)
2. Define the streaming computations by applying transformation and output operations to DStreams.
Basic Sources:
[1]File Streams: For reading data from files on any file system compatible with the HDFS API (that is, HDFS, S3, NFS, etc.), a DStream can be created as:
[2]Streams based on Custom Actors: DStreams can be created with data streams received through Akka actors by using streamingContext.actorStream(actorProps, actor-name)
[3]Queue of RDDs as a Stream
3. Start receiving data and processing it using streamingContext.start().
4. Wait for the processing to be stopped (manually or due to any error) using streamingContext.awaitTermination().
5. The processing can be manually stopped using streamingContext.stop().
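As a sketch of the file-based source mentioned above: `textFileStream` watches a directory and turns each newly arrived file into a batch. The object name and directory path here are hypothetical, and a local Spark installation is assumed:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: word count over files dropped into a watched directory
object FileStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("FileStream")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Each new file in this (hypothetical) directory becomes one batch of lines;
    // no receiver is involved, unlike socketTextStream
    val lines = ssc.textFileStream("./srcFile/streamingDir")
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```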
二、Points to Remember
Set local[n] >= the number of stream sources (receivers), so that at least one thread remains free to process the received data.
Once a context has been started, no new streaming computations can be set up or added to it. Once a context has been stopped, it cannot be restarted.
Only one StreamingContext can be active in a JVM at the same time. stop() on StreamingContext also stops the SparkContext.
To stop only the StreamingContext, set the optional parameter of stop() called stopSparkContext to false.
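The stop-and-reuse semantics above can be sketched as follows. This is a minimal illustration (object name `ReuseSparkContext` is hypothetical, and a local Spark installation is assumed), not part of the original example:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: reuse one SparkContext across two StreamingContexts
object ReuseSparkContext {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Reuse"))

    val ssc1 = new StreamingContext(sc, Seconds(1))
    // ... define sources and computations on ssc1, then start/await ...
    ssc1.stop(stopSparkContext = false)  // stop streaming only; SparkContext stays alive

    // Legal because the previous StreamingContext was stopped first
    val ssc2 = new StreamingContext(sc, Seconds(5))
    // ... define a new streaming job on ssc2 ...
    ssc2.stop(stopSparkContext = true)   // finally stop everything
  }
}
```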
A SparkContext can be re-used to create multiple StreamingContexts, as long as the previous StreamingContext is stopped (without stopping the SparkContext) before the next StreamingContext is created.
三、Basic Sources
[1]TCP socket
[2]File Streams: For reading data from files on any file system compatible with the HDFS API (that is, HDFS, S3, NFS, etc.)
[3]Streams based on Custom Actors: DStreams can be created with data streams received through Akka actors by using streamingContext.actorStream(actorProps, actor-name).
[4]Queue of RDDs as a Stream
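For the last source type, a queue of RDDs can drive a test stream without any external input. This sketch follows the shape of Spark's QueueStream example (object name is hypothetical; assumes a local Spark installation):

```scala
import scala.collection.mutable
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Sketch: feed a queue of RDDs into a DStream, useful for testing pipelines
object QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("QueueStream")
    val ssc = new StreamingContext(conf, Seconds(1))

    val rddQueue = new mutable.Queue[RDD[Int]]()
    val inputStream = ssc.queueStream(rddQueue)
    inputStream.map(x => (x % 10, 1)).reduceByKey(_ + _).print()

    ssc.start()
    // Push one RDD per second; each becomes one batch
    for (_ <- 1 to 5) {
      rddQueue.synchronized { rddQueue += ssc.sparkContext.makeRDD(1 to 100, 2) }
      Thread.sleep(1000)
    }
    ssc.stop()
  }
}
```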
Test 1: NetCat
Official example: NetworkWordCount
1: Run the program
2: Start the netcat server: root@sparkmaster:~/streaming# nc -lk 9999
3: Type words into the netcat session
package com.dt.spark.main.Streaming.tcp

import org.apache.log4j.{Logger, Level}
import org.apache.spark._
import org.apache.spark.streaming._ // not necessary since Spark 1.3

/**
 * Created by hjw on 17/4/17.
 *
 * Point to remember: when running a Spark Streaming program locally, do not use
 * "local" or "local[1]" as the master URL. Either of these means that only one
 * thread will be used for running tasks locally. If you are using an input DStream
 * based on a receiver (e.g. sockets, Kafka, Flume, etc.), the single thread will
 * be used to run the receiver, leaving no thread for processing the received data.
 * Hence, when running locally, always use "local[n]" as the master URL, where
 * n > number of receivers to run.
 */
object NetworkWordCount {
  Logger.getLogger("org").setLevel(Level.ERROR)

  def main(args: Array[String]): Unit = {
    // Create a local StreamingContext with two working threads and a batch interval of 1 second.
    // The master requires 2 cores to prevent a starvation scenario.
    val conf = new SparkConf().setMaster("local[2]").setAppName("NetworkWordCount")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)

    // Split each line into words
    val words = lines.flatMap(_.split(" "))

    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)

    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()

    ssc.start()             // Start the computation
    ssc.awaitTermination()  // Wait for the computation to terminate
  }
}

// Sample output:
// -------------------------------------------
// Time: 1492433839000 ms
// -------------------------------------------
// (word,1)
// (hello,1)
//
// -------------------------------------------
// Time: 1492433903000 ms
// -------------------------------------------
// (word,1)
// (hello,1)
Test 2: Simulating a Spout over a Socket
First run the simulator below, then run the NetworkWordCount program above. The contents of NetworkWordCountData.txt are as follows:
hello world
hello java
hello c
hello c++
hjw hjw

Program arguments: ./srcFile/NetworkWordCountData.txt 9999 1000
The output is as follows:
-------------------------------------------
Time: 1493642176000 ms
-------------------------------------------
-------------------------------------------
Time: 1493642177000 ms
-------------------------------------------
-------------------------------------------
Time: 1493642178000 ms
-------------------------------------------
(,1)
(hello,1)
(world,1)
-------------------------------------------
Time: 1493642179000 ms
-------------------------------------------
(,1)
(hello,1)
(java,1)
-------------------------------------------
Time: 1493642180000 ms
-------------------------------------------
(,2)
(hjw,2)
-------------------------------------------
Time: 1493642181000 ms
-------------------------------------------
(,1)
(hello,1)
(java,1)
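Each batch above is just a word count over the lines received in that interval. The core pipeline (flatMap → map → reduceByKey) can be mimicked on plain Scala collections; `BatchWordCount` and `countWords` are hypothetical names used only for this illustration, not part of the original program:

```scala
object BatchWordCount {
  // Mimic the DStream pipeline on a single batch of lines:
  // flatMap (split into words) -> map (pair with 1) -> reduceByKey (sum per word)
  def countWords(batch: Seq[String]): Map[String, Int] =
    batch
      .flatMap(_.split(" "))                          // split each line into words
      .map(word => (word, 1))                         // pair each word with a count of 1
      .groupBy(_._1)                                  // group pairs by word (reduceByKey equivalent)
      .map { case (w, ps) => (w, ps.map(_._2).sum) }  // sum the counts per word

  def main(args: Array[String]): Unit = {
    println(countWords(Seq("hello world", "hello java")))
  }
}
```

Spark distributes the same shape of computation across partitions; on one batch the result is identical to this local version.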
package com.dt.spark.main.Streaming.tcp

import java.io.PrintWriter
import java.net.ServerSocket

import scala.io.Source

/**
 * Created by hjw on 17/5/1.
 */
object StreamingSimulation {

  // Return a random index in [0, length)
  def index(length: Int) = {
    import java.util.Random
    val rdm = new Random()
    rdm.nextInt(length)
  }

  def main(args: Array[String]) {
    if (args.length != 3) {
      System.err.println("Usage: <filename> <port> <millisecond>")
      System.exit(1)
    }

    val filename = args(0)
    val lines = Source.fromFile(filename).getLines().toList
    val fileRow = lines.length

    // Listen on the given port; build a connection whenever a request arrives
    val listener = new ServerSocket(args(1).toInt)
    while (true) {
      val socket = listener.accept()
      new Thread() {
        override def run() = {
          println("Got client connect from: " + socket.getInetAddress)
          val out = new PrintWriter(socket.getOutputStream, true)
          while (true) {
            Thread.sleep(args(2).toLong)
            // Send a random line of the file to the client
            val content = lines(index(fileRow))
            println(content)
            out.write(content + '\n')
            out.flush()
          }
          socket.close()
        }
      }.start()
    }
  }
}