Running the WordCount Example on Hadoop


Hadoop ships a hadoop-mapreduce-examples-2.6.0.jar under its share/hadoop/mapreduce directory, which includes a WordCount job.

This walkthrough uses Hadoop 2.6.0; in other Hadoop versions the examples jar has a different name and location.
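If you are running a different release, you can locate the jar yourself rather than guessing the path. A minimal sketch, assuming the same /home/sky/hadoop install directory used throughout this walkthrough:

[root@localhost hadoop]# find /home/sky/hadoop -name 'hadoop-mapreduce-examples-*.jar'
[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar

Running the jar without any arguments prints the list of bundled example programs (wordcount, grep, pi, and so on), which confirms you have the right jar before going further.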

The steps are as follows:

1. Create a test folder on the local disk; I created mine under /home/sky

[root@localhost sky]# cd /home/sky/
[root@localhost sky]# mkdir test
[root@localhost sky]# ls
Desktop    eclipse  Music     Public   spark      Videos
Documents  hadoop   mysql     pycharm  Templates  workspace
Downloads  hive     Pictures  scala    test
[root@localhost sky]#

2. Create the two files to be counted, test1.txt and test2.txt

[root@localhost sky]# cd test/
[root@localhost test]# echo "hello world,hello hadoop" > test1.txt
[root@localhost test]# echo "hello world,hello spark"  > test2.txt
[root@localhost test]# cat test1.txt
hello world,hello hadoop

3. Create an /input directory on HDFS and list it to confirm

[root@localhost test]# cd /home/sky/hadoop/
[root@localhost hadoop]# bin/hadoop fs -mkdir /input
[root@localhost hadoop]# bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x   - root supergroup          0 2015-10-26 19:47 /home
drwxr-xr-x   - root supergroup          0 2015-10-29 21:56 /input
drwxr-xr-x   - root supergroup          0 2015-10-29 20:54 /output
drwx-wx-wx   - root supergroup          0 2015-10-29 20:57 /tmp
drwxr-xr-x   - root supergroup          0 2015-10-28 21:32 /user
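As with the local mkdir, Hadoop 2.x's fs -mkdir accepts a -p flag that creates any missing parent directories, which is handy if you prefer a nested path instead of the flat /input used here (the path below is just a hypothetical example):

[root@localhost hadoop]# bin/hadoop fs -mkdir -p /user/root/wordcount/input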


4. Upload the local test files to the newly created /input directory on HDFS

[root@localhost hadoop]# bin/hadoop fs -put /home/sky/test/test*.txt /input
[root@localhost hadoop]# bin/hadoop fs -ls /input
Found 2 items
-rw-r--r--   1 root supergroup         25 2015-10-29 22:02 /input/test1.txt
-rw-r--r--   1 root supergroup         24 2015-10-29 22:02 /input/test2.txt
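Before launching the job, it is worth printing one of the files back out of HDFS to confirm the upload landed intact:

[root@localhost hadoop]# bin/hadoop fs -cat /input/test1.txt
hello world,hello hadoop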

5. Run the job and observe its progress

[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount2
15/10/29 22:07:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/29 22:07:46 INFO input.FileInputFormat: Total input paths to process : 2
15/10/29 22:07:46 INFO mapreduce.JobSubmitter: number of splits:2
15/10/29 22:07:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445687376888_0005
15/10/29 22:07:48 INFO impl.YarnClientImpl: Submitted application application_1445687376888_0005
15/10/29 22:07:48 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1445687376888_0005/
15/10/29 22:07:48 INFO mapreduce.Job: Running job: job_1445687376888_0005
15/10/29 22:08:03 INFO mapreduce.Job: Job job_1445687376888_0005 running in uber mode : false
15/10/29 22:08:03 INFO mapreduce.Job:  map 0% reduce 0%
15/10/29 22:08:23 INFO mapreduce.Job:  map 100% reduce 0%
15/10/29 22:08:34 INFO mapreduce.Job:  map 100% reduce 100%
15/10/29 22:08:35 INFO mapreduce.Job: Job job_1445687376888_0005 completed successfully
15/10/29 22:08:35 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=91
        FILE: Number of bytes written=316819
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=253
        HDFS: Number of bytes written=39
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=34517
        Total time spent by all reduces in occupied slots (ms)=8834
        Total time spent by all map tasks (ms)=34517
        Total time spent by all reduce tasks (ms)=8834
        Total vcore-seconds taken by all map tasks=34517
        Total vcore-seconds taken by all reduce tasks=8834
        Total megabyte-seconds taken by all map tasks=35345408
        Total megabyte-seconds taken by all reduce tasks=9046016
    Map-Reduce Framework
        Map input records=2
        Map output records=6
        Map output bytes=73
        Map output materialized bytes=97
        Input split bytes=204
        Combine input records=6
        Combine output records=6
        Reduce input groups=4
        Reduce shuffle bytes=97
        Reduce input records=6
        Reduce output records=4
        Spilled Records=12
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=645
        CPU time spent (ms)=3820
        Physical memory (bytes) snapshot=505044992
        Virtual memory (bytes) snapshot=6221623296
        Total committed heap usage (bytes)=355999744
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=49
    File Output Format Counters
        Bytes Written=39
[root@localhost hadoop]#
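One caveat before rerunning: MapReduce refuses to write into an existing output directory and aborts with a FileAlreadyExistsException, so /output/wordcount2 must be deleted first if you want to run the job again:

[root@localhost hadoop]# bin/hadoop fs -rm -r /output/wordcount2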

6. View the result (note: WordCount splits words on whitespace only, which is why "world,hello" appears as a single word)

[root@localhost hadoop]# bin/hdfs dfs -cat /output/wordcount2/*
hadoop	1
hello	2
spark	1
world,hello	2
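Because the example's mapper tokenizes on whitespace only, punctuation stays glued to words. If you want "world" and "hello" counted separately, normalize the input before uploading; a sketch using tr (the cleaned filename is my own):

[root@localhost test]# tr ',' ' ' < test1.txt > test1_clean.txt
[root@localhost test]# cat test1_clean.txt
hello world hello hadoop

To copy the result down to the local disk, bin/hadoop fs -get /output/wordcount2/part-r-00000 . also works; part-r-00000 is the standard name of a single reducer's output file.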

