examples / Dataset Wordcount
Source: Internet. Editor: 程序博客网. Date: 2024/06/06 03:00
https://docs.cloud.databricks.com/docs/spark/1.6/index.html#examples/Dataset%20Wordcount.html
In this example, we take lines of text and split them into words. Next, we count the number of occurrences of each word in the set using several Spark APIs.
>dbutils.fs.put("/home/spark/1.6/lines","""
Hello hello world
Hello how are you world
""", true)
Wrote 43 bytes.
res0: Boolean = true
>
import org.apache.spark.sql.functions._
// Load a text file and interpret each line as a java.lang.String
val ds = sqlContext.read.text("/home/spark/1.6/lines").as[String]
val result = ds
  .flatMap(_.split(" "))               // Split each line on spaces
  .filter(_ != "")                     // Filter empty words
  .toDF()                              // Convert to DataFrame to perform aggregation / sorting
  .groupBy($"value")                   // Count number of occurrences of each word
  .agg(count("*") as "numOccurances")
  .orderBy($"numOccurances" desc)      // Show most common words first
display(result)
value numOccurances
world 2
Hello 2
are 1
hello 1
how 1
you 1
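To make the split, filter, group, and count steps easier to follow without a cluster, here is a minimal sketch of the same pipeline on plain Scala collections. It is an illustration, not the notebook's code: the names `lines` and `counts` are assumptions, the input is typed in by hand to mirror the file written above (including its blank first line), and the tie-breaking sort by word is added only to make the order deterministic.

```scala
// Plain Scala collections sketch of the word count; no Spark required.
// `lines` and `counts` are illustrative names, not part of the notebook.
// The first element is empty because the triple-quoted string starts with a newline.
val lines: Seq[String] = Seq("", "Hello hello world", "Hello how are you world")

val counts: Seq[(String, Int)] = lines
  .flatMap(_.split(" "))   // "".split(" ") yields Array(""), hence the filter below
  .filter(_.nonEmpty)      // drop empty words
  .groupBy(identity)       // group equal words together (case-sensitive)
  .map { case (word, occurrences) => (word, occurrences.size) }
  .toSeq
  .sortBy { case (word, n) => (-n, word) } // most common first, ties ordered by word

counts.foreach { case (word, n) => println(s"$word $n") }
```

The counts agree with the table above; only the order of tied rows may differ, since Spark does not guarantee an order among rows with equal counts.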
It is also possible to perform the aggregation in pure Scala instead of switching to DataFrames. In the following example, we perform the same word count but normalize the case of each word, so that "hello" and "Hello" are grouped together.
>
val wordCount =
  ds
    .flatMap(_.split(" "))
    .filter(_ != "")
    .groupBy(_.toLowerCase()) // Instead of grouping on a column expression (e.g. $"value") we pass a lambda function
    .count()
display(wordCount.toDF())
are 1
hello 3
how 1
world 2
you 1
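The effect of grouping by a key function rather than by the raw value can be sketched on an ordinary Scala collection. This is a hedged illustration only: `words` is a hand-written list of the non-empty words from the input file, and `byLowerCase` is a hypothetical name.

```scala
// Grouping by a derived key: words that differ only in case share one group.
// `words` and `byLowerCase` are illustrative names, not part of the notebook.
val words: Seq[String] =
  Seq("Hello", "hello", "world", "Hello", "how", "are", "you", "world")

val byLowerCase: Map[String, Int] = words
  .groupBy(_.toLowerCase)  // the key is computed per element by the lambda
  .map { case (key, group) => (key, group.size) }

// Print in key order so the output is deterministic.
byLowerCase.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word $n") }
```

Here "Hello" (twice) and "hello" (once) collapse into a single key with a count of 3, which matches the output above.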