Compiling and Running the Hadoop WordCount Example
First, make sure Hadoop is correctly installed and running.
Copy WordCount.java out of the Hadoop source tree:
$ cp ./src/examples/org/apache/hadoop/examples/WordCount.java /home/hadoop/
Create a folder in the current directory to hold the compiled class files:
$ mkdir class
Compile WordCount.java, putting the Hadoop core jar and commons-cli on the classpath and writing the class files into the class folder:
$ javac -classpath /usr/local/hadoop/hadoop-core-0.20.203.0.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar WordCount.java -d class
After compilation, an org folder appears under class:
$ ls class
org
Package the compiled classes into a jar:
$ cd class
$ jar cvf WordCount.jar *
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/apache/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1793) (out= 746)(deflated 58%)
adding: org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
At this point, compiling and packaging the Java source is complete.
Next, prepare a test file and start Hadoop.
Because the input paths given to a Hadoop job must be files in HDFS, the test file has to be copied from the local file system into HDFS first.
$ hadoop fs -mkdir input
$ hadoop fs -ls
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:39 /user/hadoop/input
$ hadoop fs -put file input
$ hadoop fs -ls input
Found 1 items
-rw-r--r--   2 hadoop supergroup         75 2014-03-26 10:40 /user/hadoop/input/file
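The `-mkdir`/`-put` commands above have a straightforward local-filesystem analogy. The sketch below mimics them with java.nio; it is a hypothetical helper for illustration, not the Hadoop FileSystem API:

```java
import java.io.IOException;
import java.nio.file.*;

public class PutFile {
    // Local-filesystem analogy of `hadoop fs -mkdir input` + `hadoop fs -put file input`:
    // create the input directory and copy a local file into it.
    public static Path put(Path localFile, Path inputDir) throws IOException {
        Files.createDirectories(inputDir);                 // like `hadoop fs -mkdir input`
        Path target = inputDir.resolve(localFile.getFileName());
        // like `hadoop fs -put file input`
        return Files.copy(localFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("file", ".txt");
        Files.write(file, "Hello World Bye World".getBytes());
        Path copied = put(file, Files.createTempDirectory("hdfs").resolve("input"));
        System.out.println(Files.readAllLines(copied).get(0)); // the copied test data
    }
}
```

The real `-put` additionally splits the file into HDFS blocks and replicates them (the `2` in the `-ls` listing is the replication factor); the analogy only captures the "copy into the job's input area" step.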
Run the program:
$ cd class
$ ls
org  WordCount.jar
$ hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
14/03/26 10:57:39 INFO input.FileInputFormat: Total input paths to process : 1
14/03/26 10:57:40 INFO mapred.JobClient: Running job: job_201403261015_0001
14/03/26 10:57:41 INFO mapred.JobClient:  map 0% reduce 0%
14/03/26 10:57:54 INFO mapred.JobClient:  map 100% reduce 0%
14/03/26 10:58:06 INFO mapred.JobClient:  map 100% reduce 100%
14/03/26 10:58:11 INFO mapred.JobClient: Job complete: job_201403261015_0001
14/03/26 10:58:11 INFO mapred.JobClient: Counters: 25
14/03/26 10:58:11 INFO mapred.JobClient:   Job Counters
14/03/26 10:58:11 INFO mapred.JobClient:     Launched reduce tasks=1
14/03/26 10:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=12321
14/03/26 10:58:11 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/26 10:58:11 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/26 10:58:11 INFO mapred.JobClient:     Launched map tasks=1
14/03/26 10:58:11 INFO mapred.JobClient:     Data-local map tasks=1
14/03/26 10:58:11 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10303
14/03/26 10:58:11 INFO mapred.JobClient:   File Output Format Counters
14/03/26 10:58:11 INFO mapred.JobClient:     Bytes Written=51
14/03/26 10:58:11 INFO mapred.JobClient:   FileSystemCounters
14/03/26 10:58:11 INFO mapred.JobClient:     FILE_BYTES_READ=85
14/03/26 10:58:11 INFO mapred.JobClient:     HDFS_BYTES_READ=184
14/03/26 10:58:11 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=42541
14/03/26 10:58:11 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=51
14/03/26 10:58:11 INFO mapred.JobClient:   File Input Format Counters
14/03/26 10:58:11 INFO mapred.JobClient:     Bytes Read=75
14/03/26 10:58:11 INFO mapred.JobClient:   Map-Reduce Framework
14/03/26 10:58:11 INFO mapred.JobClient:     Reduce input groups=7
14/03/26 10:58:11 INFO mapred.JobClient:     Map output materialized bytes=85
14/03/26 10:58:11 INFO mapred.JobClient:     Combine output records=7
14/03/26 10:58:11 INFO mapred.JobClient:     Map input records=1
14/03/26 10:58:11 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/03/26 10:58:11 INFO mapred.JobClient:     Reduce output records=7
14/03/26 10:58:11 INFO mapred.JobClient:     Spilled Records=14
14/03/26 10:58:11 INFO mapred.JobClient:     Map output bytes=131
14/03/26 10:58:11 INFO mapred.JobClient:     Combine input records=14
14/03/26 10:58:11 INFO mapred.JobClient:     Map output records=14
14/03/26 10:58:11 INFO mapred.JobClient:     SPLIT_RAW_BYTES=109
14/03/26 10:58:11 INFO mapred.JobClient:     Reduce input records=7
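The counters tell the story of the job: TokenizerMapper emits one (word, 1) pair per token (Map output records=14), the combiner sums duplicates locally (Combine input records=14, output=7), and IntSumReducer produces one final count per distinct word (Reduce output records=7). A plain-Java sketch of that same pipeline, using a made-up input line (the post does not show the actual contents of the test file):

```java
import java.util.*;

public class WordCountLocal {
    // In-memory analogy of WordCount's map -> combine/reduce pipeline
    // (not the Hadoop API; just the same logic on one machine).
    public static Map<String, Integer> count(String text) {
        // "map" phase: emit (word, 1) for every whitespace-separated token,
        // like TokenizerMapper does
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) emitted.add(new AbstractMap.SimpleEntry<>(token, 1));
        }
        // "combine"/"reduce" phase: sum the 1s per key, like IntSumReducer;
        // TreeMap gives the sorted-by-key order the framework produces
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> e : emitted) {
            counts.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // hypothetical input consistent with the counters above
        String line = "Hello World Hello World Hello World Bye Bye Bye hello hello bye Word world";
        Map<String, Integer> counts = count(line);
        System.out.println("map output records=" + line.split("\\s+").length); // 14
        System.out.println("reduce output records=" + counts.size());          // 7
        System.out.println(counts);
    }
}
```

Note that `Map input records=1` counts lines, not words: the 75-byte test file is a single line, which the mapper then splits into 14 tokens.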
View the results:
$ hadoop fs -ls
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:40 /user/hadoop/input
drwxr-xr-x   - hadoop supergroup          0 2014-03-26 10:58 /user/hadoop/output
A new output directory has appeared in HDFS. List the files inside it:
$ hadoop fs -ls output
Found 3 items
-rw-r--r--   2 hadoop supergroup          0 2014-03-26 11:04 /user/hadoop/output/_SUCCESS
drwxr-xr-x   - hadoop supergroup          0 2014-03-26 11:04 /user/hadoop/output/_logs
-rw-r--r--   2 hadoop supergroup         65 2014-03-26 11:04 /user/hadoop/output/part-r-00000
View the word counts:
$ hadoop fs -cat output/part-r-00000
Bye	3
Hello	3
Word	1
World	3
bye	1
hello	2
world	1
This completes the WordCount example run on Hadoop.
To run the job again, the output directory must be deleted first; otherwise the job fails with an exception like the following:
14/03/26 11:41:30 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201403261015_0003
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
	at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:134)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:830)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
	at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Delete the output directory as follows:
$ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/hadoop/output
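The stack trace shows the check happens in FileOutputFormat.checkOutputSpecs before the job starts: Hadoop refuses to run a job whose output directory already exists, so previous results are never silently overwritten. A minimal local-filesystem sketch of that guard (an analogy using java.nio, not Hadoop's actual implementation):

```java
import java.io.IOException;
import java.nio.file.*;

public class OutputCheck {
    // Analogy of FileOutputFormat.checkOutputSpecs: fail fast when the
    // output directory already exists, before doing any work.
    public static void checkOutputSpecs(Path output) throws IOException {
        if (Files.exists(output)) {
            throw new IOException("Output directory " + output + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("output");
        try {
            checkOutputSpecs(out);   // throws: directory exists
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
        Files.delete(out);           // the equivalent of `hadoop fs -rmr output`
        checkOutputSpecs(out);       // now passes; the job could run again
    }
}
```

An alternative to deleting the old results is simply passing a different output path on each run.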
Alternatively, skip the manual build and run the precompiled examples jar that ships with Hadoop:
$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.203.0.jar wordcount input output
14/03/28 17:02:33 INFO input.FileInputFormat: Total input paths to process : 2
14/03/28 17:02:33 INFO mapred.JobClient: Running job: job_201403281439_0004
14/03/28 17:02:34 INFO mapred.JobClient:  map 0% reduce 0%
14/03/28 17:02:49 INFO mapred.JobClient:  map 100% reduce 0%
14/03/28 17:03:01 INFO mapred.JobClient:  map 100% reduce 100%
14/03/28 17:03:06 INFO mapred.JobClient: Job complete: job_201403281439_0004
14/03/28 17:03:06 INFO mapred.JobClient: Counters: 25
14/03/28 17:03:06 INFO mapred.JobClient:   Job Counters
14/03/28 17:03:06 INFO mapred.JobClient:     Launched reduce tasks=1
14/03/28 17:03:06 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17219
14/03/28 17:03:06 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/28 17:03:06 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/03/28 17:03:06 INFO mapred.JobClient:     Launched map tasks=2
14/03/28 17:03:06 INFO mapred.JobClient:     Data-local map tasks=2
14/03/28 17:03:06 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10398
14/03/28 17:03:06 INFO mapred.JobClient:   File Output Format Counters
14/03/28 17:03:06 INFO mapred.JobClient:     Bytes Written=65
14/03/28 17:03:06 INFO mapred.JobClient:   FileSystemCounters
14/03/28 17:03:06 INFO mapred.JobClient:     FILE_BYTES_READ=131
14/03/28 17:03:06 INFO mapred.JobClient:     HDFS_BYTES_READ=343
14/03/28 17:03:06 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=63840
14/03/28 17:03:06 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=65
14/03/28 17:03:06 INFO mapred.JobClient:   File Input Format Counters
14/03/28 17:03:06 INFO mapred.JobClient:     Bytes Read=124
14/03/28 17:03:06 INFO mapred.JobClient:   Map-Reduce Framework
14/03/28 17:03:06 INFO mapred.JobClient:     Reduce input groups=9
14/03/28 17:03:06 INFO mapred.JobClient:     Map output materialized bytes=137
14/03/28 17:03:06 INFO mapred.JobClient:     Combine output records=11
14/03/28 17:03:06 INFO mapred.JobClient:     Map input records=2
14/03/28 17:03:06 INFO mapred.JobClient:     Reduce shuffle bytes=85
14/03/28 17:03:06 INFO mapred.JobClient:     Reduce output records=9
14/03/28 17:03:06 INFO mapred.JobClient:     Spilled Records=22
14/03/28 17:03:06 INFO mapred.JobClient:     Map output bytes=216
14/03/28 17:03:06 INFO mapred.JobClient:     Combine input records=23
14/03/28 17:03:06 INFO mapred.JobClient:     Map output records=23
14/03/28 17:03:06 INFO mapred.JobClient:     SPLIT_RAW_BYTES=219
14/03/28 17:03:06 INFO mapred.JobClient:     Reduce input records=11