Hadoop Cluster Testing (Word Count)
Once the Hadoop cluster is installed, you can test its basic functionality. Hadoop ships with an examples jar (hadoop-examples-0.20.205.0.jar; the exact name varies with the version) containing a wordcount program that counts how many times each word occurs in the input. Let's try it out first.
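Before running the job, it may help to see what wordcount actually computes. Below is a minimal Python sketch of the map and reduce phases — not Hadoop's actual Java implementation, just the same logic applied to the two test files used in the walkthrough:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word on every input line
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce: sum the counts for each distinct word
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# The two test files below contain "hello world" and "hello hadoop"
pairs = list(map_phase(["hello world", "hello hadoop"]))
result = reduce_phase(pairs)
print(len(pairs), result)   # 4 map output records, 3 distinct words
```

Note how these numbers line up with the job counters printed below: 4 words in total (Map output records=4) collapse to 3 distinct words (Reduce output records=3).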
[hadoop@master ~]$ mkdir input                          # create an input directory
[hadoop@master ~]$ cd input/
[hadoop@master input]$ echo "hello world">text1.txt     # put the input files in that directory
[hadoop@master input]$ echo "hello hadoop">text2.txt
[hadoop@master input]$ ls
text1.txt text2.txt
[hadoop@master input]$ cat text1.txt
hello world
[hadoop@master input]$ cat text2.txt
hello hadoop
[hadoop@master input]$ cd ..
[hadoop@master ~]$ ls
input log 公共的 模板 视频 图片 文档 下载 新文件~ 音乐 桌面
[hadoop@master ~]$ /usr/bin/hadoop dfs -put ./input in  # copy the two files from the input directory into HDFS
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls ./in/*       # view the two files in HDFS
-rw-r--r-- 2 hadoop supergroup 12 2012-09-13 16:16 /user/hadoop/in/text1.txt
-rw-r--r-- 2 hadoop supergroup 13 2012-09-13 16:16 /user/hadoop/in/text2.txt
# Run the wordcount program from the examples jar shipped with Hadoop.
# It counts word occurrences; the input is the two files in the in
# directory, and the result is written to the out directory.
[hadoop@master ~]$ /usr/bin/hadoop jar /usr/hadoop-examples-0.20.205.0.jar wordcount in out
12/09/13 16:20:32 INFO input.FileInputFormat: Total input paths to process : 2
12/09/13 16:20:36 INFO mapred.JobClient: Running job: job_201209131425_0001
12/09/13 16:20:37 INFO mapred.JobClient:  map 0% reduce 0%
12/09/13 16:23:38 INFO mapred.JobClient:  map 50% reduce 0%
12/09/13 16:24:31 INFO mapred.JobClient:  map 100% reduce 16%
12/09/13 16:24:40 INFO mapred.JobClient:  map 100% reduce 100%
12/09/13 16:24:45 INFO mapred.JobClient: Job complete: job_201209131425_0001
12/09/13 16:24:45 INFO mapred.JobClient: Counters: 29
12/09/13 16:24:45 INFO mapred.JobClient:   Job Counters
12/09/13 16:24:45 INFO mapred.JobClient:     Launched reduce tasks=1
12/09/13 16:24:45 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=230205
12/09/13 16:24:45 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/09/13 16:24:45 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/09/13 16:24:45 INFO mapred.JobClient:     Launched map tasks=3
12/09/13 16:24:45 INFO mapred.JobClient:     Data-local map tasks=3
12/09/13 16:24:45 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=58667
12/09/13 16:24:45 INFO mapred.JobClient:   File Output Format Counters
12/09/13 16:24:45 INFO mapred.JobClient:     Bytes Written=25
12/09/13 16:24:45 INFO mapred.JobClient:   FileSystemCounters
12/09/13 16:24:45 INFO mapred.JobClient:     FILE_BYTES_READ=55
12/09/13 16:24:45 INFO mapred.JobClient:     HDFS_BYTES_READ=241
12/09/13 16:24:45 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=64354
12/09/13 16:24:45 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
12/09/13 16:24:45 INFO mapred.JobClient:   File Input Format Counters
12/09/13 16:24:45 INFO mapred.JobClient:     Bytes Read=25
12/09/13 16:24:45 INFO mapred.JobClient:   Map-Reduce Framework
12/09/13 16:24:45 INFO mapred.JobClient:     Map output materialized bytes=61
12/09/13 16:24:45 INFO mapred.JobClient:     Map input records=2
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce shuffle bytes=61
12/09/13 16:24:45 INFO mapred.JobClient:     Spilled Records=8
12/09/13 16:24:45 INFO mapred.JobClient:     Map output bytes=41
12/09/13 16:24:45 INFO mapred.JobClient:     CPU time spent (ms)=13840
12/09/13 16:24:45 INFO mapred.JobClient:     Total committed heap usage (bytes)=319361024
12/09/13 16:24:45 INFO mapred.JobClient:     Combine input records=4
12/09/13 16:24:45 INFO mapred.JobClient:     SPLIT_RAW_BYTES=216
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce input records=4
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce input groups=3
12/09/13 16:24:45 INFO mapred.JobClient:     Combine output records=4
12/09/13 16:24:45 INFO mapred.JobClient:     Physical memory (bytes) snapshot=329932800
12/09/13 16:24:45 INFO mapred.JobClient:     Reduce output records=3
12/09/13 16:24:45 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1133260800
12/09/13 16:24:45 INFO mapred.JobClient:     Map output records=4
# When the job completes, a new out directory appears. Note that HDFS has
# no notion of a current directory, so there is no cd command.
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2012-09-13 16:16 /user/hadoop/in
drwxr-xr-x - hadoop supergroup 0 2012-09-13 16:24 /user/hadoop/out
[hadoop@master ~]$ /usr/bin/hadoop dfs -ls ./out        # list the out directory
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2012-09-13 16:24 /user/hadoop/out/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2012-09-13 16:20 /user/hadoop/out/_logs
-rw-r--r-- 2 hadoop supergroup 25 2012-09-13 16:24 /user/hadoop/out/part-r-00000
[hadoop@master ~]$ /usr/bin/hadoop dfs -cat ./out/part-r-00000    # view the result
hadoop	1
hello	2
world	1
[hadoop@master ~]$
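As a sanity check, the contents of part-r-00000 can be reproduced locally: wordcount writes one tab-separated `word<TAB>count` line per distinct word, sorted by key. A small Python sketch of that output format:

```python
from collections import Counter

# The same two inputs as the walkthrough: text1.txt and text2.txt
lines = ["hello world", "hello hadoop"]

# Count every word, then format like part-r-00000:
# tab-separated and sorted by key
counts = Counter(word for line in lines for word in line.split())
part_r_00000 = "\n".join(f"{w}\t{n}" for w, n in sorted(counts.items()))
print(part_r_00000)
# hadoop	1
# hello	2
# world	1
```

The printed lines match what `hadoop dfs -cat ./out/part-r-00000` shows above.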
For a long-running job, you can monitor its status in a browser: port 50030 on the master node (http://masterip:50030) shows the JobTracker status, and port 50070 on the master node shows the cluster's dfs information.
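A quick way to confirm those web UIs are reachable is a plain TCP connect to the two ports. A minimal sketch (the "masterip" host name is a placeholder for your actual master node):

```python
import socket

def port_open(host, port, timeout=3.0):
    # Return True if a TCP connection to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# On a real cluster, substitute the master node's address:
#   port_open("masterip", 50030)   # JobTracker web UI
#   port_open("masterip", 50070)   # dfs (NameNode) web UI
```

If either call returns False, check that the daemons are running and that no firewall is blocking the ports.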
Screenshots:
[Screenshot: JobTracker web UI]
[Screenshot: dfs usage]