Spark Programming Guide (Part 3)
Working with Key-Value Pairs
While most Spark operations work on RDDs containing any type of objects, a few special operations are only available on RDDs of key-value pairs. The most common ones are distributed “shuffle” operations, such as grouping or aggregating the elements by a key.
In Scala, these operations are automatically available on RDDs containing Tuple2 objects (the built-in tuples in the language, created by simply writing (a, b)). The key-value pair operations are available in the PairRDDFunctions class, which automatically wraps around an RDD of tuples.
For example, the following code uses the reduceByKey operation on key-value pairs to count how many times each line of text occurs in a file:
val lines = sc.textFile("data.txt")
val pairs = lines.map(s => (s, 1))
val counts = pairs.reduceByKey((a, b) => a + b)
We could also use counts.sortByKey(), for example, to sort the pairs alphabetically, and finally counts.collect() to bring them back to the driver program as an array of objects.
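Continuing the word-count example, here is a short sketch of how sortByKey and collect fit together. It assumes a live SparkContext sc and the counts RDD built above, so it is illustrative rather than standalone-runnable:

```scala
// Sort the (line, count) pairs alphabetically by key, then
// bring the results back to the driver as a local array.
val sorted = counts.sortByKey()
val result: Array[(String, Int)] = sorted.collect()
result.foreach { case (line, n) => println(s"$line: $n") }
```

Keep in mind that collect() pulls the entire RDD into driver memory, so it is only appropriate when the result is small.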
Note: when using custom objects as the key in key-value pair operations, you must be sure that a custom equals() method is accompanied with a matching hashCode() method. For full details, see the contract outlined in the Object.hashCode() documentation.
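As an illustration of that contract (the class name here is hypothetical), a Scala case class is a safe choice of key because the compiler generates consistent equals() and hashCode(); a hand-written class used as a key must override both:

```scala
// Case classes get matching equals()/hashCode() for free,
// so equal keys are guaranteed to hash identically during a shuffle.
case class UserKey(id: Long, region: String)

val events = sc.parallelize(Seq(
  (UserKey(1L, "us"), 10),
  (UserKey(1L, "us"), 5),
  (UserKey(2L, "eu"), 7)
))
// The two UserKey(1, "us") records are combined correctly:
val totals = events.reduceByKey(_ + _)
```

If equals() matched but hashCode() did not, equal keys could hash to different partitions and never be aggregated together.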
Transformations
The following table lists some of the common transformations supported by Spark. Refer to the RDD API doc (Scala, Java, Python, R) and pair RDD functions doc (Scala, Java) for details.
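As a brief sketch of a few of those transformations (assuming an existing SparkContext sc):

```scala
val nums    = sc.parallelize(1 to 5)
val doubled = nums.map(_ * 2)            // transformation: applies a function to each element
val evens   = doubled.filter(_ % 4 == 0) // transformation: keeps only matching elements
val words   = sc.parallelize(Seq("a b", "c")).flatMap(_.split(" ")) // one-to-many
```

Transformations are lazy: none of these lines triggers any computation until an action is invoked.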
Actions
The following table lists some of the common actions supported by Spark. Refer to the RDD API doc (Scala, Java, Python, R) and pair RDD functions doc (Scala, Java) for details.
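A few of the most common actions, sketched under the same assumption of an existing SparkContext sc:

```scala
val nums = sc.parallelize(Seq(1, 2, 3, 4))
nums.count()        // action: returns 4L, the number of elements
nums.take(2)        // action: returns Array(1, 2)
nums.reduce(_ + _)  // action: returns 10, the sum of all elements
```

Unlike transformations, each action triggers an actual job and returns a value to the driver program.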