Running the WordCount Example on Hadoop
Source: Internet | Editor: 程序博客网 | Date: 2024/05/21
Hadoop ships with hadoop-mapreduce-examples-2.6.0.jar under share/hadoop/mapreduce in the install directory, and this jar can run the WordCount job.
This walkthrough uses Hadoop 2.6.0; the examples jar lives at a different path in other Hadoop versions.
The steps are as follows:
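Since the jar's location varies between releases, one quick way to locate it is to search the install tree. This is a sketch, not part of the original walkthrough; HADOOP_HOME is an assumption, with the /home/sky/hadoop fallback mirroring the install path used below.

```shell
# Search the Hadoop install tree for the examples jar. HADOOP_HOME is an
# assumed environment variable; substitute your own install path if unset
# (this walkthrough installs Hadoop under /home/sky/hadoop).
find "${HADOOP_HOME:-/home/sky/hadoop}" -name 'hadoop-mapreduce-examples-*.jar'
```

On a 2.6.0 install this should print a single path ending in share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar.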
1. Create a test folder on the local disk (here under /home/sky)
[root@localhost sky]# cd /home/sky/
[root@localhost sky]# mkdir test
[root@localhost sky]# ls
Desktop    eclipse  Music     Public   spark      Videos
Documents  hadoop   mysql     pycharm  Templates  workspace
Downloads  hive     Pictures  scala    test
[root@localhost sky]#
2. Create the two files to be counted, test1.txt and test2.txt
[root@localhost sky]# cd test/
[root@localhost test]# echo "hello world,hello hadoop" > test1.txt
[root@localhost test]# echo "hello world,hello spark" > test2.txt
[root@localhost test]# cat test1.txt
hello world,hello hadoop
3. Create the input directory on HDFS and list it
[root@localhost test]# cd /home/sky/hadoop/
[root@localhost hadoop]# bin/hadoop fs -mkdir /input
[root@localhost hadoop]# bin/hadoop fs -ls /
Found 5 items
drwxr-xr-x   - root supergroup          0 2015-10-26 19:47 /home
drwxr-xr-x   - root supergroup          0 2015-10-29 21:56 /input
drwxr-xr-x   - root supergroup          0 2015-10-29 20:54 /output
drwx-wx-wx   - root supergroup          0 2015-10-29 20:57 /tmp
drwxr-xr-x   - root supergroup          0 2015-10-28 21:32 /user
4. Upload the test files to the HDFS /input directory and verify
[root@localhost hadoop]# bin/hadoop fs -put /home/sky/test/test*.txt /input
[root@localhost hadoop]# bin/hadoop fs -ls /input
Found 2 items
-rw-r--r-- 1 root supergroup 25 2015-10-29 22:02 /input/test1.txt
-rw-r--r-- 1 root supergroup 24 2015-10-29 22:02 /input/test2.txt
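As a quick sanity check (not part of the original steps), the 25 and 24 bytes reported by `fs -ls` should match the local files; `echo` appends a trailing newline, which accounts for one byte in each. The 49-byte total also matches the job's "File Input Format Counters: Bytes Read=49" later on.

```shell
# Recreate the two one-line inputs locally and count their bytes; echo adds
# a trailing '\n', so "hello world,hello hadoop" (24 chars) becomes 25 bytes.
echo "hello world,hello hadoop" > test1.txt
echo "hello world,hello spark"  > test2.txt
wc -c test1.txt test2.txt
```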
5. Run the job and watch its progress (note: MapReduce refuses to overwrite an existing output directory, so /output/wordcount2 must not already exist)
[root@localhost hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output/wordcount2
15/10/29 22:07:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/10/29 22:07:46 INFO input.FileInputFormat: Total input paths to process : 2
15/10/29 22:07:46 INFO mapreduce.JobSubmitter: number of splits:2
15/10/29 22:07:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445687376888_0005
15/10/29 22:07:48 INFO impl.YarnClientImpl: Submitted application application_1445687376888_0005
15/10/29 22:07:48 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1445687376888_0005/
15/10/29 22:07:48 INFO mapreduce.Job: Running job: job_1445687376888_0005
15/10/29 22:08:03 INFO mapreduce.Job: Job job_1445687376888_0005 running in uber mode : false
15/10/29 22:08:03 INFO mapreduce.Job:  map 0% reduce 0%
15/10/29 22:08:23 INFO mapreduce.Job:  map 100% reduce 0%
15/10/29 22:08:34 INFO mapreduce.Job:  map 100% reduce 100%
15/10/29 22:08:35 INFO mapreduce.Job: Job job_1445687376888_0005 completed successfully
15/10/29 22:08:35 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=91
		FILE: Number of bytes written=316819
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=253
		HDFS: Number of bytes written=39
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=34517
		Total time spent by all reduces in occupied slots (ms)=8834
		Total time spent by all map tasks (ms)=34517
		Total time spent by all reduce tasks (ms)=8834
		Total vcore-seconds taken by all map tasks=34517
		Total vcore-seconds taken by all reduce tasks=8834
		Total megabyte-seconds taken by all map tasks=35345408
		Total megabyte-seconds taken by all reduce tasks=9046016
	Map-Reduce Framework
		Map input records=2
		Map output records=6
		Map output bytes=73
		Map output materialized bytes=97
		Input split bytes=204
		Combine input records=6
		Combine output records=6
		Reduce input groups=4
		Reduce shuffle bytes=97
		Reduce input records=6
		Reduce output records=4
		Spilled Records=12
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=645
		CPU time spent (ms)=3820
		Physical memory (bytes) snapshot=505044992
		Virtual memory (bytes) snapshot=6221623296
		Total committed heap usage (bytes)=355999744
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=49
	File Output Format Counters
		Bytes Written=39
[root@localhost hadoop]#
6. View the result (note: words are split on whitespace only, which is why "world,hello" appears as a single token)
[root@localhost hadoop]# bin/hdfs dfs -cat /output/wordcount2/*
hadoop	1
hello	2
spark	1
world,hello	2
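The comma survives because WordCount's mapper tokenizes on whitespace only. The same counts can be reproduced locally with standard shell tools, no cluster needed, which is a handy way to sanity-check the job's output (a rough emulation: `tr` here splits only on spaces, which is sufficient for this input):

```shell
# Emulate WordCount's whitespace-only split: put each token on its own line,
# then count duplicates. "world,hello" stays a single token.
printf 'hello world,hello hadoop\nhello world,hello spark\n' \
  | tr ' ' '\n' | sort | uniq -c
```

This prints 1 hadoop, 2 hello, 1 spark, and 2 world,hello, agreeing with the HDFS output above.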