Running Hadoop's First WordCount Program


We will run the WordCount program bundled with hadoop-examples.jar, which counts how many times each word occurs in the input.

1) In the Hadoop home directory on Ubuntu1, create a test.txt file with the following content.
Hello world
Hello world
Hello world
Hello world
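As a sanity check before running the job, the counts WordCount should produce for this file can be sketched in a few lines of Python. This is only a plain split-and-count simulation, not Hadoop's actual Java implementation:

```python
from collections import Counter

# The contents of test.txt from step 1: the same line four times.
lines = ["Hello world"] * 4

# WordCount tokenizes each line on whitespace and counts each token.
counts = Counter(word for line in lines for word in line.split())

for word, n in sorted(counts.items()):
    print(word, n)
# Hello 4
# world 4
```

Since "Hello" and "world" each appear once per line across four lines, the job should report a count of 4 for each word.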
2) Create an input directory on HDFS with the following command.
$ hadoop fs -mkdir /user/hadoop/input
3) Upload the test.txt file you just created to the input directory on HDFS:
$ hadoop fs -put /opt/hadoop-1.0.3/test.txt /user/hadoop/input/   (where /opt/hadoop-1.0.3 is your Hadoop installation path)

4) Verify that the file was uploaded successfully; the result is shown in Figure 1.

[Figure 1: listing of the uploaded file]
5) Run the word-count example in hadoop-examples-1.0.3.jar with the following commands.
$ cd /opt/hadoop-1.0.3
$ hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hadoop/input/test.txt /user/hadoop/output
13/04/20 00:47:07 INFO input.FileInputFormat: Total input paths to process : 1
13/04/20 00:47:07 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/20 00:47:07 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/20 00:47:08 INFO mapred.JobClient: Running job: job_201304200039_0001
13/04/20 00:47:09 INFO mapred.JobClient: map 0% reduce 0%
13/04/20 00:47:45 INFO mapred.JobClient: map 100% reduce 0%
13/04/20 00:48:08 INFO mapred.JobClient: map 100% reduce 100%
13/04/20 00:48:13 INFO mapred.JobClient: Job complete: job_201304200039_0001
13/04/20 00:48:13 INFO mapred.JobClient: Counters: 29
13/04/20 00:48:13 INFO mapred.JobClient: Job Counters
13/04/20 00:48:13 INFO mapred.JobClient: Launched reduce tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=28822
13/04/20 00:48:13 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/04/20 00:48:13 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/04/20 00:48:13 INFO mapred.JobClient: Launched map tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: Data-local map tasks=1
13/04/20 00:48:13 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=18236
13/04/20 00:48:13 INFO mapred.JobClient: File Output Format Counters
13/04/20 00:48:13 INFO mapred.JobClient: Bytes Written=16
13/04/20 00:48:13 INFO mapred.JobClient: FileSystemCounters
13/04/20 00:48:13 INFO mapred.JobClient: FILE_BYTES_READ=30
13/04/20 00:48:13 INFO mapred.JobClient: HDFS_BYTES_READ=159
13/04/20 00:48:13 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43053
13/04/20 00:48:13 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=16
13/04/20 00:48:13 INFO mapred.JobClient: File Input Format Counters
13/04/20 00:48:13 INFO mapred.JobClient: Bytes Read=48
13/04/20 00:48:13 INFO mapred.JobClient: Map-Reduce Framework
13/04/20 00:48:13 INFO mapred.JobClient: Map output materialized bytes=30
13/04/20 00:48:13 INFO mapred.JobClient: Map input records=4
13/04/20 00:48:13 INFO mapred.JobClient: Reduce shuffle bytes=30
13/04/20 00:48:13 INFO mapred.JobClient: Spilled Records=4
13/04/20 00:48:13 INFO mapred.JobClient: Map output bytes=80
13/04/20 00:48:13 INFO mapred.JobClient: CPU time spent (ms)=2870
13/04/20 00:48:13 INFO mapred.JobClient: Total committed heap usage (bytes)=210698240
13/04/20 00:48:13 INFO mapred.JobClient: Combine input records=8
13/04/20 00:48:13 INFO mapred.JobClient: SPLIT_RAW_BYTES=111
13/04/20 00:48:13 INFO mapred.JobClient: Reduce input records=2
13/04/20 00:48:13 INFO mapred.JobClient: Reduce input groups=2
13/04/20 00:48:13 INFO mapred.JobClient: Combine output records=2
13/04/20 00:48:13 INFO mapred.JobClient: Physical memory (bytes) snapshot=180101120
13/04/20 00:48:13 INFO mapred.JobClient: Reduce output records=2
13/04/20 00:48:13 INFO mapred.JobClient: Virtual memory (bytes) snapshot=749068288
13/04/20 00:48:13 INFO mapred.JobClient: Map output records=8
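The record counters in the log above can be reproduced with a small simulation of the map, combine, and reduce stages. This is a hedged Python sketch of the data flow, not Hadoop's actual implementation; it only illustrates where each counter value comes from:

```python
from collections import defaultdict

lines = ["Hello world"] * 4           # Map input records = 4

# Map: emit a (word, 1) pair for every token in every line.
map_out = [(w, 1) for line in lines for w in line.split()]
print(len(map_out))                   # 8 -> Map output records = 8

# Combine: pre-aggregate pairs on the map side before the shuffle.
combined = defaultdict(int)
for w, n in map_out:                  # Combine input records = 8
    combined[w] += n
print(len(combined))                  # 2 -> Combine output records = 2

# Reduce: with one mapper, the reducer just receives the combined pairs.
reduced = dict(combined)              # Reduce input/output records = 2
print(sorted(reduced.items()))        # [('Hello', 4), ('world', 4)]
```

The combiner is why the reducer sees only 2 input records instead of 8: the 8 map outputs collapse to one pair per distinct word before being shuffled.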

6) View the result of the run, shown in Figure 2.

[Figure 2: WordCount result]
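Figure 2 presumably shows the contents of the output directory. Assuming the default output file name for a single reducer in Hadoop 1.x (part-r-00000), the result can also be viewed from the command line; the expected contents follow from the input file above:

```
$ hadoop fs -cat /user/hadoop/output/part-r-00000
Hello	4
world	4
```

These two tab-separated lines total 16 bytes, which matches the HDFS_BYTES_WRITTEN=16 counter in the job log.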
OK! At this point the three-node Hadoop cluster is fully installed and has passed its test.