Spark 2.x Learning Notes: 17. Spark Streaming's HdfsWordCount
17.1 HdfsWordCount Source Code Walkthrough
```scala
// scalastyle:off println
package org.apache.spark.examples.streaming

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Counts words in new text files created in the given directory
 * Usage: HdfsWordCount <directory>
 *   <directory> is the directory that Spark Streaming will use to find and read new text files.
 *
 * To run this on your local machine on directory `localdir`, run this example
 *    $ bin/run-example \
 *       org.apache.spark.examples.streaming.HdfsWordCount localdir
 *
 * Then create a text file in `localdir` and the words in the file will get counted.
 */
object HdfsWordCount {
  def main(args: Array[String]) {
    if (args.length < 1) {
      System.err.println("Usage: HdfsWordCount <directory>")
      System.exit(1)
    }

    StreamingExamples.setStreamingLogLevels()
    val sparkConf = new SparkConf().setAppName("HdfsWordCount")
    // Create the context
    val ssc = new StreamingContext(sparkConf, Seconds(2))

    // Create the FileInputDStream on the directory and use the
    // stream to count words in new files created
    val lines = ssc.textFileStream(args(0))
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
// scalastyle:on println
```
The comments tell us:
- HdfsWordCount counts the words in new text files created in a given directory.
- It is run with:

```
run-example org.apache.spark.examples.streaming.HdfsWordCount localdir
```

where `localdir` is the directory that Spark Streaming will use to find and read new text files.
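The core of the program is the three-step pipeline `flatMap` → `map` → `reduceByKey`. Its effect can be sketched on a plain Scala collection, independent of Spark (the input lines here are hypothetical, standing in for the contents of one new file):

```scala
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical input, standing in for lines read from a new file
    val lines = Seq("spark streaming on hdfs", "spark streaming wordcount")
    // Same shape as the DStream pipeline: split into words, pair each with 1, sum per key
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words
      .map(x => (x, 1))
      .foldLeft(Map.empty[String, Int]) { case (acc, (w, n)) =>
        acc.updated(w, acc.getOrElse(w, 0) + n)
      }
    println(wordCounts)
  }
}
```

In the real program `reduceByKey(_ + _)` performs the same per-key summation, but distributed across the cluster rather than in local memory.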
17.2 Test Run
(1) Create the directory
```
[root@node1 ~]# hdfs dfs -mkdir /streaming
[root@node1 ~]# hdfs dfs -ls /streaming
[root@node1 ~]#
```
(2) Upload an initial file
```
[root@node1 ~]# hdfs dfs -put data/word1.txt /streaming
[root@node1 ~]# hdfs dfs -ls /streaming
Found 1 items
-rw-r--r--   3 root supergroup         30 2017-11-04 09:21 /streaming/word1.txt
[root@node1 ~]#
```
Upload at least one file to the watched directory before starting the job; otherwise, uploading a file while HdfsWordCount is running can fail with:

```
java.io.FileNotFoundException: File does not exist: /streaming/books.txt._COPYING_
```

This happens because `hdfs dfs -put` first writes to a temporary `<name>._COPYING_` file and renames it once the copy finishes; if the streaming job lists the directory mid-copy, it can pick up the temporary file, which no longer exists by the time it is read.
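One way around this, if you build the input stream yourself instead of using the bundled example, is the `StreamingContext.fileStream` overload that takes a `Path => Boolean` filter, and have the filter skip the HDFS client's in-flight temp files. A minimal sketch of just the predicate (the object name is illustrative):

```scala
object CopyingFilter {
  // Skip files still being written by `hdfs dfs -put`, which creates
  // a temporary "<name>._COPYING_" file and renames it when done.
  def isComplete(fileName: String): Boolean = !fileName.endsWith("._COPYING_")

  def main(args: Array[String]): Unit = {
    println(isComplete("books.txt"))           // prints true: finished file, keep it
    println(isComplete("books.txt._COPYING_")) // prints false: in-flight copy, skip it
  }
}
```

In a real job this predicate would be passed as `ssc.fileStream[...](dir, (p: Path) => CopyingFilter.isComplete(p.getName), newFilesOnly = true)`. Another common workaround is to upload to a staging directory and then `hdfs dfs -mv` the file into the watched directory, since a rename within HDFS is atomic.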
(3) Start the job
```
[root@node1 ~]# run-example org.apache.spark.examples.streaming.HdfsWordCount /streaming
17/11/04 09:22:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
-------------------------------------------
Time: 1509801734000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801736000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801738000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801740000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801742000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801744000 ms
-------------------------------------------
```
(4) Upload a file to be processed
Open another terminal and upload a file:
```
[root@node1 ~]# hdfs dfs -put data/books.txt /streaming
```
The counts now appear in the HdfsWordCount output:
```
-------------------------------------------
Time: 1509801746000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801748000 ms
-------------------------------------------
(2001 49.0 S2 Java,1)
(3003 49.0 S3 Hive教程,1)
(3002 98.0 S3 Spark基础,1)
(3004 56.0 S3 HBase教程,1)
(3005 49.5 S3 大数据概论,1)
(1002 39.0 S1 C语言,1)
(2071 99.0 S2 Oracle,1)
(1021 45.0 S1 数据结构,1)
(1001 39.0 S1 计算机基础,1)
(2091 69.0 S2 Linux,1)
...

-------------------------------------------
Time: 1509801750000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801752000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801754000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801756000 ms
-------------------------------------------
```
Upload another file:
```
[root@node1 ~]# hdfs dfs -put data/Hamlet.txt /streaming
```
Again, the new counts appear in the HdfsWordCount output:
```
-------------------------------------------
Time: 1509801758000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801760000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801762000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801764000 ms
-------------------------------------------
(weary,,1)
(pate,4)
(whereof,,1)
(joy.,1)
(rises.,1)
(lug,1)
(stuck,,1)
(shot,7)
(line:,1)
(order,2)
...

-------------------------------------------
Time: 1509801766000 ms
-------------------------------------------

-------------------------------------------
Time: 1509801768000 ms
-------------------------------------------
```
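Two details of this output are worth noting. The trailing `...` appears because `DStream.print()` shows only the first 10 elements of each batch. And the batch timestamps advance in steps of exactly 2000 ms, matching the `Seconds(2)` interval passed to the `StreamingContext`; a quick sanity check over the timestamps copied from the run above:

```scala
object BatchIntervalCheck {
  def main(args: Array[String]): Unit = {
    val intervalMs = 2000L // Seconds(2), as configured in HdfsWordCount
    // Batch timestamps copied from the output above
    val batches = Seq(1509801758000L, 1509801760000L, 1509801762000L, 1509801764000L)
    // Every consecutive pair of batches is exactly one interval apart
    val evenlySpaced = batches.sliding(2).forall { case Seq(a, b) => b - a == intervalMs }
    println(evenlySpaced) // prints true
  }
}
```

Every two seconds the job scans `/streaming` for new files; a batch only produces output when a new file appeared during that interval, which is why most batches print an empty block.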