A First Hadoop Program: WordCount
Source: Internet | Published by: 新华产权交易所软件 | Editor: 程序博客网 | Date: 2024/04/29 13:52
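Let's run the WordCount program that ships with the Hadoop examples jar (hadoop-examples-1.0.3.jar). It counts how many times each word occurs in its input.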
1) In the Hadoop home directory on Ubuntu1, create a file named test.txt with the following content:
Hello world
Hello world
Hello world
Hello world
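To double-check the input file, here is a small Python sketch (an illustration added here, not part of the original steps) that writes the same four lines and reports the file size. It assumes Unix "\n" line endings and a trailing newline after the last line; under those assumptions the file is 4 × 12 = 48 bytes, which matches the "Bytes Read=48" counter in the job log of step 5.

```python
from pathlib import Path

# The four input lines from step 1.
LINES = ["Hello world"] * 4


def build_test_bytes() -> bytes:
    # Assumption: Unix "\n" endings, including a trailing newline after the last line.
    return "".join(line + "\n" for line in LINES).encode("utf-8")


def write_test_file(path: Path) -> int:
    """Write the sample input as raw bytes (no newline translation) and return its size."""
    path.write_bytes(build_test_bytes())
    return path.stat().st_size


if __name__ == "__main__":
    size = write_test_file(Path("test.txt"))
    # 4 lines x 12 bytes ("Hello world" plus "\n") = 48 bytes
    print(f"test.txt: {len(LINES)} lines, {size} bytes")
```

Writing raw bytes rather than text keeps the size the same on every OS, since text mode on Windows would turn each "\n" into "\r\n".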
2) Create an input directory in HDFS with the following command:
$ hadoop fs -mkdir /user/hadoop/input
3) Upload test.txt to the input directory in HDFS:
$ hadoop fs -put /opt/hadoop-1.0.3/test.txt /user/hadoop/input/
(Here /opt/hadoop-1.0.3 is the directory where Hadoop is installed; adjust it to your own installation path.)
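4) Check that the file was uploaded successfully, for example by listing the input directory. The result is shown in Figure 1.
$ hadoop fs -ls /user/hadoop/input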
5) Run the word-count example from hadoop-examples-1.0.3.jar with the following commands:
$ cd /opt/hadoop-1.0.3
$ hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hadoop/input/test.txt /user/hadoop/output
13/04/20 00:47:07 INFO input.FileInputFormat: Total input paths to process : 1
13/04/20 00:47:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/20 00:47:07 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/20 00:47:08 INFO mapred.JobClient: Running job: job_201304200039_0001
13/04/20 00:47:09 INFO mapred.JobClient: map 0% reduce 0%
13/04/20 00:47:45 INFO mapred.JobClient: map 100% reduce 0%
13/04/20 00:48:08 INFO mapred.JobClient: map 100% reduce 100%
13/04/20 00:48:13 INFO mapred.JobClient: Job complete: job_201304200039_0001
13/04/20 00:48:13 INFO mapred.JobClient: Counters: 29
13/04/20 00:48:13 INFO mapred.JobClient: Job Counters
13/04/20 00:48:13 INFO mapred.JobClient: Launched reduce tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=28822
13/04/20 00:48:13 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/20 00:48:13 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/04/20 00:48:13 INFO mapred.JobClient: Launched map tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: Data-local map tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18236
13/04/20 00:48:13 INFO mapred.JobClient: File Output Format Counters
13/04/20 00:48:13 INFO mapred.JobClient: Bytes Written=16
13/04/20 00:48:13 INFO mapred.JobClient: FileSystemCounters
13/04/20 00:48:13 INFO mapred.JobClient: FILE_BYTES_READ=30
13/04/20 00:48:13 INFO mapred.JobClient: HDFS_BYTES_READ=159
13/04/20 00:48:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43053
13/04/20 00:48:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=16
13/04/20 00:48:13 INFO mapred.JobClient: File Input Format Counters
13/04/20 00:48:13 INFO mapred.JobClient: Bytes Read=48
13/04/20 00:48:13 INFO mapred.JobClient: Map-Reduce Framework
13/04/20 00:48:13 INFO mapred.JobClient: Map output materialized bytes=30
13/04/20 00:48:13 INFO mapred.JobClient: Map input records=4
13/04/20 00:48:13 INFO mapred.JobClient: Reduce shuffle bytes=30
13/04/20 00:48:13 INFO mapred.JobClient: Spilled Records=4
13/04/20 00:48:13 INFO mapred.JobClient: Map output bytes=80
13/04/20 00:48:13 INFO mapred.JobClient: CPU time spent (ms)=2870
13/04/20 00:48:13 INFO mapred.JobClient: Total committed heap usage (bytes)=210698240
13/04/20 00:48:13 INFO mapred.JobClient: Combine input records=8
13/04/20 00:48:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=111
13/04/20 00:48:13 INFO mapred.JobClient: Reduce input records=2
13/04/20 00:48:13 INFO mapred.JobClient: Reduce input groups=2
13/04/20 00:48:13 INFO mapred.JobClient: Combine output records=2
13/04/20 00:48:13 INFO mapred.JobClient: Physical memory (bytes) snapshot=180101120
13/04/20 00:48:13 INFO mapred.JobClient: Reduce output records=2
13/04/20 00:48:13 INFO mapred.JobClient: Virtual memory (bytes) snapshot=749068288
13/04/20 00:48:13 INFO mapred.JobClient: Map output records=8
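The record counters in this log follow directly from the four-line input. The Python sketch below is a local, single-process simulation (not Hadoop code and not the example's actual implementation) that assumes whitespace tokenization, a combiner that sums counts the same way the reducer does, and one map and one reduce task ("Launched map tasks=1", "Launched reduce tasks=1"). Under those assumptions it reproduces Map input records=4, Map output records=8, Combine output records=2 and Reduce output records=2. The Map output bytes=80 figure additionally assumes Hadoop's Text/IntWritable serialization: a 1-byte length prefix plus the UTF-8 bytes for the word, and 4 bytes for the count.

```python
from collections import Counter

# Input lines from step 1.
LINES = ["Hello world"] * 4


def map_phase(lines):
    """Emit (word, 1) for every whitespace-separated token of every line."""
    return [(word, 1) for line in lines for word in line.split()]


def sum_by_key(pairs):
    """Sum values per key and return them sorted by key.

    Used here both as the combiner and as the reducer (an assumption
    consistent with the Combine counters in the log).
    """
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def serialized_size(word: str) -> int:
    # Assumed encoding: Text = 1-byte VInt length (valid for words under
    # 128 bytes) + UTF-8 bytes; IntWritable = 4 bytes.
    return 1 + len(word.encode("utf-8")) + 4


def simulate(lines):
    map_out = map_phase(lines)
    combined = sum_by_key(map_out)   # map-side combine
    reduced = sum_by_key(combined)   # reduce after the shuffle
    return {
        "Map input records": len(lines),
        "Map output records": len(map_out),
        "Map output bytes": sum(serialized_size(w) for w, _ in map_out),
        "Combine input records": len(map_out),
        "Combine output records": len(combined),
        "Reduce input records": len(combined),
        "Reduce input groups": len({w for w, _ in combined}),
        "Reduce output records": len(reduced),
    }


if __name__ == "__main__":
    for name, value in simulate(LINES).items():
        print(f"{name}={value}")
```

The combiner is what shrinks the 8 map output records to 2 before the shuffle, which is why the reducer sees only 2 input records rather than 8.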
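6) View the results, as shown in Figure 2. For example, print the output file with:
$ hadoop fs -cat /user/hadoop/output/part-r-00000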
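For reference, this Python sketch (an assumption-based illustration, not the author's code) computes what the result file should contain for test.txt. It assumes the default TextOutputFormat layout of one "word<TAB>count" line per key, in sorted key order. The result is two lines, Hello 4 and world 4, totalling 16 bytes, which agrees with "Bytes Written=16" in the job log.

```python
from collections import Counter

# Input lines from step 1.
LINES = ["Hello world"] * 4


def expected_output(lines) -> str:
    """Render word counts as "word<TAB>count" lines, sorted by word (assumed layout)."""
    counts = Counter(word for line in lines for word in line.split())
    return "".join(f"{word}\t{count}\n" for word, count in sorted(counts.items()))


if __name__ == "__main__":
    text = expected_output(LINES)
    print(text, end="")
    # Compare with "Bytes Written=16" / "HDFS_BYTES_WRITTEN=16" in the job log.
    print(f"{len(text.encode('utf-8'))} bytes")
```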
OK! At this point the three-node Hadoop cluster is installed and has been tested successfully.