Hadoop: The Definitive Guide, ch05: Developing a MapReduce Application


1. Introduction

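Developing a MapReduce application follows a specific workflow. First, write the map and reduce functions, ideally with unit tests to confirm that they behave as expected. Then write a driver program to run the job. The driver can be run from the IDE against a small subset of the data to check that it works.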
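The first step above (write the map and reduce functions, then unit test them) can be sketched in plain Java without any Hadoop types, which keeps the logic easy to test in isolation. This is only an illustrative sketch: the class name MaxTemperatureSketch, the run helper, and the tab-separated "year, temperature" record layout are assumptions made for the example, not the format or code used by the book's MaxTemperatureDriver.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Sketch only: plain Java, no Hadoop classes, so each function can be unit tested directly.
class MaxTemperatureSketch {

    // Map step. ASSUMPTION: each record is "<year>\t<temperature>" (hypothetical layout).
    // Malformed records are skipped rather than failing the whole run.
    static Optional<Map.Entry<String, Integer>> map(String line) {
        String[] fields = line.split("\t");
        if (fields.length != 2) {
            return Optional.empty();
        }
        try {
            Map.Entry<String, Integer> kv =
                new SimpleEntry<>(fields[0], Integer.parseInt(fields[1].trim()));
            return Optional.of(kv);
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    // Reduce step: the maximum of all temperatures seen for one key (year).
    static int reduce(Iterable<Integer> temperatures) {
        int max = Integer.MIN_VALUE;
        for (int t : temperatures) {
            max = Math.max(max, t);
        }
        return max;
    }

    // Tiny in-memory stand-in for the framework: map each line, group by key, reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            map(line).ifPresent(kv ->
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue()));
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((year, temps) -> result.put(year, reduce(temps)));
        return result;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("1901\t-78", "1901\t317", "1902\t244", "garbage");
        System.out.println(run(sample)); // prints {1901=317, 1902=244}
    }
}
```

Keeping the parsing and the max computation in small static functions means a test can call map and reduce directly, and the Hadoop-specific Mapper and Reducer classes only need to delegate to them.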

2. GenericOptionsParser, Tool, and ToolRunner

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop ConfigurationPrinter -conf conf/hadoop-localhost.xml | grep mapred.job.tracker=
mapred.job.tracker=localhost:8021

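3. Problem: Hadoop keeps complaining that a class cannot be found
Solution (this resets the cluster and erases everything stored in HDFS):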
1. stop-all.sh
2. rm -rf /tmp/hadoop-nomad2/*
3. hadoop namenode -format
4. start-all.sh
5. jps, to confirm that the DataNode process has started
6. Re-run the program. Note that the jar file here is on HADOOP_CLASSPATH, not in HDFS.

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch05.jar v3.MaxTemperatureDriver input/ncdc/all max-temp
12/07/03 01:33:40 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 01:33:40 INFO mapred.JobClient: Running job: job_201207030133_0001
12/07/03 01:33:41 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 01:33:55 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 01:34:07 INFO mapred.JobClient:  map 100% reduce 100%
12/07/03 01:34:12 INFO mapred.JobClient: Job complete: job_201207030133_0001
12/07/03 01:34:12 INFO mapred.JobClient: Counters: 26
12/07/03 01:34:12 INFO mapred.JobClient:   Job Counters
12/07/03 01:34:12 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/03 01:34:12 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16347
12/07/03 01:34:12 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 01:34:12 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 01:34:12 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 01:34:12 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 01:34:12 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10004
12/07/03 01:34:12 INFO mapred.JobClient:   File Input Format Counters
12/07/03 01:34:12 INFO mapred.JobClient:     Bytes Read=147972
12/07/03 01:34:12 INFO mapred.JobClient:   File Output Format Counters
12/07/03 01:34:12 INFO mapred.JobClient:     Bytes Written=18
12/07/03 01:34:12 INFO mapred.JobClient:   FileSystemCounters
12/07/03 01:34:12 INFO mapred.JobClient:     FILE_BYTES_READ=28
12/07/03 01:34:12 INFO mapred.JobClient:     HDFS_BYTES_READ=148184
12/07/03 01:34:12 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=62923
12/07/03 01:34:12 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=18
12/07/03 01:34:12 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 01:34:12 INFO mapred.JobClient:     Map output materialized bytes=34
12/07/03 01:34:12 INFO mapred.JobClient:     Map input records=13130
12/07/03 01:34:12 INFO mapred.JobClient:     Reduce shuffle bytes=34
12/07/03 01:34:12 INFO mapred.JobClient:     Spilled Records=4
12/07/03 01:34:12 INFO mapred.JobClient:     Map output bytes=118161
12/07/03 01:34:12 INFO mapred.JobClient:     Map input bytes=1777168
12/07/03 01:34:12 INFO mapred.JobClient:     Combine input records=13129
12/07/03 01:34:12 INFO mapred.JobClient:     SPLIT_RAW_BYTES=212
12/07/03 01:34:12 INFO mapred.JobClient:     Reduce input records=2
12/07/03 01:34:12 INFO mapred.JobClient:     Reduce input groups=2
12/07/03 01:34:12 INFO mapred.JobClient:     Combine output records=2
12/07/03 01:34:12 INFO mapred.JobClient:     Reduce output records=2
12/07/03 01:34:12 INFO mapred.JobClient:     Map output records=13129

4. MapReduce web UI

http://server:50030


Job details: