Spark Streaming 3: Transformations
Source: Internet  Editor: 程序博客网  Date: 2024/06/06 07:42
Reference: Spark Streaming Programming Guide, version 1.6.2: http://spark.apache.org/docs/1.6.2/streaming-programming-guide.html
Transformations on DStreams
Like RDDs, DStreams support many transformations. The commonly used ones are listed below.
... spark.default.parallelism) to do the grouping. You can pass an optional numTasks argument to set a different number of tasks.
- join(otherStream, [numTasks])
When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
- cogroup(otherStream, [numTasks])
When called on a DStream of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples.
- transform(func)
Return a new DStream by applying a RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.
- updateStateByKey(func)
Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key. This can be used to maintain arbitrary state data for each key.

- transform(func)
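Applies func to each RDD inside the DStream, so any RDD operation can be used on streaming data, including ones the DStream API does not expose directly.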
- updateStateByKey(func)
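Returns a new DStream. The given func combines the state left by earlier batches with the values from the current batch, which lets Spark Streaming results accumulate across batches.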
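As an illustration of transform(func), here is a minimal sketch that drops unwanted words from a stream of (word, count) pairs. The function names and the blacklist are hypothetical, chosen only for this example; the sketch relies on nothing beyond `DStream.transform` and `RDD.filter`.

```python
def drop_blacklisted(rdd, blacklist):
    """RDD-to-RDD function: keep (word, count) pairs whose word is not blacklisted."""
    return rdd.filter(lambda pair: pair[0] not in blacklist)


def without_blacklisted(word_counts, blacklist):
    """Apply drop_blacklisted to every batch RDD of a DStream of (word, count) pairs."""
    return word_counts.transform(lambda rdd: drop_blacklisted(rdd, blacklist))


# Hypothetical usage inside a streaming job, where `word_counts` is a DStream
# of (word, count) pairs:
#     cleaned = without_blacklisted(word_counts, {"spam", ""})
#     cleaned.pprint()
```

Keeping the RDD-to-RDD function separate from the call to transform makes it possible to try the same logic on an ordinary batch RDD first.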
Example: a word count that keeps a running total across batches
# encoding=utf8
"""SimpleApp: word count that keeps a running total across batches."""
from pyspark import SparkContext
from pyspark.streaming import StreamingContext


# Update function for updateStateByKey: newValues holds this batch's counts
# for a key, state is the previous running total (None the first time).
def updateFunc(newValues, state):
    return sum(newValues) + (state or 0)


sc = SparkContext("local[2]", "streamApp")
ssc = StreamingContext(sc, 30)  # 30-second batch interval
# updateStateByKey requires a checkpoint directory
ssc.checkpoint('file:///input/checkpoint')

# Per-batch word counts from comma-separated lines in newly arrived files
counts = (ssc.textFileStream("file:///input/flume")
          .flatMap(lambda line: line.split(','))
          .map(lambda x: (x, 1))
          .reduceByKey(lambda x, y: x + y))

# Fold each batch's counts into the running totals
output = counts.updateStateByKey(updateFunc)
output.pprint()

ssc.start()
ssc.awaitTermination()
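To see how the update function carries counts from one batch to the next, the following pure-Python sketch traces the state without running Spark. It is a simplified model rather than Spark's implementation: `run_batches` and the sample batches are made up for illustration. It assumes that the update function runs for every key that has either new values or existing state, and that returning None drops the key.

```python
def update_func(new_values, state):
    """Same logic as the streaming example: add this batch's counts to the total."""
    return sum(new_values) + (state or 0)


def run_batches(batches, update):
    """Simplified model of updateStateByKey over a list of batches of (key, value) pairs."""
    state = {}
    history = []
    for batch in batches:
        # Group this batch's values by key
        grouped = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # Update every key that has new values or existing state
        new_state = {}
        for key in set(grouped) | set(state):
            result = update(grouped.get(key, []), state.get(key))
            if result is not None:  # returning None removes the key
                new_state[key] = result
        state = new_state
        history.append(dict(state))
    return history


batches = [
    [("a", 2), ("b", 1)],
    [("a", 1)],
    [("b", 3), ("c", 1)],
]
for i, snapshot in enumerate(run_batches(batches, update_func), start=1):
    print(i, sorted(snapshot.items()))
# 1 [('a', 2), ('b', 1)]
# 2 [('a', 3), ('b', 1)]
# 3 [('a', 3), ('b', 4), ('c', 1)]
```

In the second batch, "b" has no new values, so the update function receives an empty list and the previous total is kept unchanged.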