Hadoop: The Definitive Guide, ch08 MapReduce Features


1. Counters

1) Built-in counters

2) User-defined Java counters

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar MaxTemperatureWithCounters input/ncdc/all max-temp
12/07/03 19:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 19:53:21 INFO mapred.JobClient: Running job: job_201207030133_0002
12/07/03 19:53:22 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 19:53:37 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 19:53:49 INFO mapred.JobClient:  map 100% reduce 100%
12/07/03 19:53:54 INFO mapred.JobClient: Job complete: job_201207030133_0002
12/07/03 19:53:54 INFO mapred.JobClient: Counters: 29
12/07/03 19:53:54 INFO mapred.JobClient:   Job Counters
12/07/03 19:53:54 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/03 19:53:54 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16305
12/07/03 19:53:54 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 19:53:54 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 19:53:54 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 19:53:54 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 19:53:54 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10068
12/07/03 19:53:54 INFO mapred.JobClient:   File Input Format Counters
12/07/03 19:53:54 INFO mapred.JobClient:     Bytes Read=147972
12/07/03 19:53:54 INFO mapred.JobClient:   File Output Format Counters
12/07/03 19:53:54 INFO mapred.JobClient:     Bytes Written=18
12/07/03 19:53:54 INFO mapred.JobClient:   FileSystemCounters
12/07/03 19:53:54 INFO mapred.JobClient:     FILE_BYTES_READ=28
12/07/03 19:53:54 INFO mapred.JobClient:     HDFS_BYTES_READ=148184
12/07/03 19:53:54 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=62992
12/07/03 19:53:54 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=18
12/07/03 19:53:54 INFO mapred.JobClient:   TemperatureQuality
12/07/03 19:53:54 INFO mapred.JobClient:     1=13129
12/07/03 19:53:54 INFO mapred.JobClient:     9=1
12/07/03 19:53:54 INFO mapred.JobClient:   Air Temperature Records
12/07/03 19:53:54 INFO mapred.JobClient:     Missing=1
12/07/03 19:53:54 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 19:53:54 INFO mapred.JobClient:     Map output materialized bytes=34
12/07/03 19:53:54 INFO mapred.JobClient:     Map input records=13130
12/07/03 19:53:54 INFO mapred.JobClient:     Reduce shuffle bytes=34
12/07/03 19:53:54 INFO mapred.JobClient:     Spilled Records=4
12/07/03 19:53:54 INFO mapred.JobClient:     Map output bytes=118161
12/07/03 19:53:54 INFO mapred.JobClient:     Map input bytes=1777168
12/07/03 19:53:54 INFO mapred.JobClient:     Combine input records=13129
12/07/03 19:53:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=212
12/07/03 19:53:54 INFO mapred.JobClient:     Reduce input records=2
12/07/03 19:53:54 INFO mapred.JobClient:     Reduce input groups=2
12/07/03 19:53:54 INFO mapred.JobClient:     Combine output records=2
12/07/03 19:53:54 INFO mapred.JobClient:     Reduce output records=2
12/07/03 19:53:54 INFO mapred.JobClient:     Map output records=13129
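
The TemperatureQuality and Air Temperature Records groups in this output come from the job itself rather than from the framework. As a rough illustration of the two styles of user counter, the following sketch tallies an enum-defined counter and a dynamically named one in plain Java. It is not Hadoop code: the class name, the process method, and the record shape are assumptions made for this example. In a real job using the old API, the increments would go through Reporter.incrCounter instead of these maps.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for job counters; no Hadoop classes are used here.
class CounterSketch {
    // Enum-defined counters: the enum acts as the group, its constants as the counters.
    enum Temperature { MISSING, MALFORMED }

    final Map<Temperature, Long> enumCounters = new EnumMap<>(Temperature.class);
    // Dynamic counters: the names (here, quality codes) are only known at run time.
    final Map<String, Long> qualityCounters = new TreeMap<>();

    // Hypothetical record shape: temperature is null when the field is missing.
    void process(Integer temperature, String qualityCode) {
        if (temperature == null) {
            enumCounters.merge(Temperature.MISSING, 1L, Long::sum);
        }
        qualityCounters.merge(qualityCode, 1L, Long::sum);
    }

    public static void main(String[] args) {
        CounterSketch counters = new CounterSketch();
        counters.process(22, "1");    // made-up records
        counters.process(-11, "1");
        counters.process(null, "9");
        System.out.println(counters.enumCounters);    // {MISSING=1}
        System.out.println(counters.qualityCounters); // {1=2, 9=1}
    }
}
```

The enum form suits a fixed set of counters known at compile time, while the string-keyed form suits values that are only discovered while reading the data.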

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar MissingTemperatureFields job_201207030133_0002
Records with missing temperature fields: 0.01%
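
The 0.01% figure can be checked by hand from two counters of job_201207030133_0002: Missing=1 and Map input records=13130. The sketch below only redoes that arithmetic. The class and method names are made up for this example, and it does not retrieve counters from a cluster.

```java
import java.util.Locale;

// Hypothetical helper; the two counter values are copied from the job output above.
class MissingFieldsRatio {
    // Percentage of input records whose temperature field was missing, to two decimals.
    static String percentMissing(long missing, long totalInputRecords) {
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * missing / totalInputRecords);
    }

    public static void main(String[] args) {
        long missing = 1;     // "Air Temperature Records" group, Missing
        long total = 13130;   // "Map-Reduce Framework" group, Map input records
        // 100 * 1 / 13130 is about 0.0076, which rounds to 0.01 at two decimals.
        System.out.println("Records with missing temperature fields: "
                + percentMissing(missing, total));
    }
}
```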

2. Sorting

Sorting data is at the heart of MapReduce.

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar SortDataPreprocessor input/ncdc/all input/ncdc/all-seq
12/07/03 20:55:15 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 20:55:16 INFO mapred.JobClient: Running job: job_201207030133_0003
12/07/03 20:55:17 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 20:55:30 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 20:55:35 INFO mapred.JobClient: Job complete: job_201207030133_0003
12/07/03 20:55:35 INFO mapred.JobClient: Counters: 16
12/07/03 20:55:35 INFO mapred.JobClient:   Job Counters
12/07/03 20:55:35 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16560
12/07/03 20:55:35 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 20:55:35 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 20:55:35 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 20:55:35 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 20:55:35 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/07/03 20:55:35 INFO mapred.JobClient:   File Input Format Counters
12/07/03 20:55:35 INFO mapred.JobClient:     Bytes Read=147972
12/07/03 20:55:35 INFO mapred.JobClient:   File Output Format Counters
12/07/03 20:55:35 INFO mapred.JobClient:     Bytes Written=163409
12/07/03 20:55:35 INFO mapred.JobClient:   FileSystemCounters
12/07/03 20:55:35 INFO mapred.JobClient:     HDFS_BYTES_READ=148184
12/07/03 20:55:35 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=41754
12/07/03 20:55:35 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=163409
12/07/03 20:55:35 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 20:55:35 INFO mapred.JobClient:     Map input records=13130
12/07/03 20:55:35 INFO mapred.JobClient:     Spilled Records=0
12/07/03 20:55:35 INFO mapred.JobClient:     Map input bytes=1777168
12/07/03 20:55:35 INFO mapred.JobClient:     Map output records=13129
12/07/03 20:55:35 INFO mapred.JobClient:     SPLIT_RAW_BYTES=212

Partial sort

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar SortByTemperatureUsingHashPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashsort
12/07/03 22:28:32 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 22:28:33 INFO mapred.JobClient: Running job: job_201207030133_0004
12/07/03 22:28:34 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 22:28:47 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 22:28:59 INFO mapred.JobClient:  map 100% reduce 3%
12/07/03 22:29:02 INFO mapred.JobClient:  map 100% reduce 6%
12/07/03 22:29:08 INFO mapred.JobClient:  map 100% reduce 10%
12/07/03 22:29:11 INFO mapred.JobClient:  map 100% reduce 13%
12/07/03 22:29:23 INFO mapred.JobClient:  map 100% reduce 20%
12/07/03 22:29:32 INFO mapred.JobClient:  map 100% reduce 23%
12/07/03 22:29:38 INFO mapred.JobClient:  map 100% reduce 26%
12/07/03 22:29:41 INFO mapred.JobClient:  map 100% reduce 30%
12/07/03 22:29:47 INFO mapred.JobClient:  map 100% reduce 33%
12/07/03 22:29:56 INFO mapred.JobClient:  map 100% reduce 36%
12/07/03 22:30:02 INFO mapred.JobClient:  map 100% reduce 40%
12/07/03 22:30:05 INFO mapred.JobClient:  map 100% reduce 43%
12/07/03 22:30:11 INFO mapred.JobClient:  map 100% reduce 46%
12/07/03 22:30:14 INFO mapred.JobClient:  map 100% reduce 50%
12/07/03 22:30:23 INFO mapred.JobClient:  map 100% reduce 53%
12/07/03 22:30:29 INFO mapred.JobClient:  map 100% reduce 56%
12/07/03 22:30:35 INFO mapred.JobClient:  map 100% reduce 60%
12/07/03 22:30:38 INFO mapred.JobClient:  map 100% reduce 63%
12/07/03 22:30:44 INFO mapred.JobClient:  map 100% reduce 66%
12/07/03 22:30:47 INFO mapred.JobClient:  map 100% reduce 70%
12/07/03 22:30:59 INFO mapred.JobClient:  map 100% reduce 73%
12/07/03 22:31:02 INFO mapred.JobClient:  map 100% reduce 76%
12/07/03 22:31:08 INFO mapred.JobClient:  map 100% reduce 80%
12/07/03 22:31:11 INFO mapred.JobClient:  map 100% reduce 83%
12/07/03 22:31:17 INFO mapred.JobClient:  map 100% reduce 87%
12/07/03 22:31:23 INFO mapred.JobClient:  map 100% reduce 90%
12/07/03 22:31:32 INFO mapred.JobClient:  map 100% reduce 93%
12/07/03 22:31:35 INFO mapred.JobClient:  map 100% reduce 96%
12/07/03 22:31:41 INFO mapred.JobClient:  map 100% reduce 100%
12/07/03 22:31:46 INFO mapred.JobClient: Job complete: job_201207030133_0004
12/07/03 22:31:46 INFO mapred.JobClient: Counters: 26
12/07/03 22:31:46 INFO mapred.JobClient:   Job Counters
12/07/03 22:31:46 INFO mapred.JobClient:     Launched reduce tasks=30
12/07/03 22:31:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16282
12/07/03 22:31:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 22:31:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 22:31:46 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 22:31:46 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 22:31:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=335658
12/07/03 22:31:46 INFO mapred.JobClient:   File Input Format Counters
12/07/03 22:31:46 INFO mapred.JobClient:     Bytes Read=163409
12/07/03 22:31:46 INFO mapred.JobClient:   File Output Format Counters
12/07/03 22:31:46 INFO mapred.JobClient:     Bytes Written=180399
12/07/03 22:31:46 INFO mapred.JobClient:   FileSystemCounters
12/07/03 22:31:46 INFO mapred.JobClient:     FILE_BYTES_READ=1882171
12/07/03 22:31:46 INFO mapred.JobClient:     HDFS_BYTES_READ=163635
12/07/03 22:31:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4431596
12/07/03 22:31:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=180399
12/07/03 22:31:46 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 22:31:46 INFO mapred.JobClient:     Map output materialized bytes=1882351
12/07/03 22:31:46 INFO mapred.JobClient:     Map input records=13129
12/07/03 22:31:46 INFO mapred.JobClient:     Reduce shuffle bytes=1278651
12/07/03 22:31:46 INFO mapred.JobClient:     Spilled Records=26258
12/07/03 22:31:46 INFO mapred.JobClient:     Map output bytes=1842641
12/07/03 22:31:46 INFO mapred.JobClient:     Map input bytes=163159
12/07/03 22:31:46 INFO mapred.JobClient:     Combine input records=0
12/07/03 22:31:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=226
12/07/03 22:31:46 INFO mapred.JobClient:     Reduce input records=13129
12/07/03 22:31:46 INFO mapred.JobClient:     Reduce input groups=116
12/07/03 22:31:46 INFO mapred.JobClient:     Combine output records=0
12/07/03 22:31:46 INFO mapred.JobClient:     Reduce output records=13129
12/07/03 22:31:46 INFO mapred.JobClient:     Map output records=13129
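
With a hash partitioner and 30 reducers, each output file is sorted internally, but the files cannot simply be concatenated into one sorted sequence. The following plain-Java sketch shows why. It uses the same sign-bit mask and modulus arithmetic as Hadoop's default HashPartitioner, but the class, the partialSort method, and the sample keys are assumptions made for this illustration, and it uses 3 partitions instead of 30 to keep the output short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration; no Hadoop classes are used.
class HashPartitionSketch {
    // Mask the sign bit, then take the modulus, so the result is in [0, numPartitions).
    static int partitionFor(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Sends each key to a partition and sorts every partition on its own.
    static List<List<Integer>> partialSort(int[] keys, int numPartitions) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (int key : keys) {
            partitions.get(partitionFor(key, numPartitions)).add(key);
        }
        for (List<Integer> partition : partitions) {
            Collections.sort(partition);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int[] temperatures = {250, -11, 22, 0, 111, 78, 139, 22}; // made-up keys
        List<List<Integer>> parts = partialSort(temperatures, 3);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("part-" + i + ": " + parts.get(i));
        }
    }
}
```

Each printed partition is in ascending order, yet neighbouring partitions overlap in key range, which is what "partial sort" means here.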

Lookups in a partitioned MapFile

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar SortByTemperatureToMapFile -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashmapso>
12/07/03 22:35:53 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 22:35:53 INFO mapred.JobClient: Running job: job_201207030133_0005
12/07/03 22:35:54 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 22:36:08 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 22:36:20 INFO mapred.JobClient:  map 100% reduce 3%
12/07/03 22:36:23 INFO mapred.JobClient:  map 100% reduce 6%
12/07/03 22:36:29 INFO mapred.JobClient:  map 100% reduce 10%
12/07/03 22:36:32 INFO mapred.JobClient:  map 100% reduce 13%
12/07/03 22:36:44 INFO mapred.JobClient:  map 100% reduce 20%
12/07/03 22:36:53 INFO mapred.JobClient:  map 100% reduce 23%
12/07/03 22:36:56 INFO mapred.JobClient:  map 100% reduce 26%
12/07/03 22:37:02 INFO mapred.JobClient:  map 100% reduce 30%
12/07/03 22:37:05 INFO mapred.JobClient:  map 100% reduce 33%
12/07/03 22:37:17 INFO mapred.JobClient:  map 100% reduce 40%
12/07/03 22:37:26 INFO mapred.JobClient:  map 100% reduce 43%
12/07/03 22:37:29 INFO mapred.JobClient:  map 100% reduce 46%
12/07/03 22:37:35 INFO mapred.JobClient:  map 100% reduce 50%
12/07/03 22:37:38 INFO mapred.JobClient:  map 100% reduce 53%
12/07/03 22:37:51 INFO mapred.JobClient:  map 100% reduce 60%
12/07/03 22:38:00 INFO mapred.JobClient:  map 100% reduce 63%
12/07/03 22:38:03 INFO mapred.JobClient:  map 100% reduce 66%
12/07/03 22:38:09 INFO mapred.JobClient:  map 100% reduce 70%
12/07/03 22:38:12 INFO mapred.JobClient:  map 100% reduce 73%
12/07/03 22:38:18 INFO mapred.JobClient:  map 100% reduce 74%
12/07/03 22:38:21 INFO mapred.JobClient:  map 100% reduce 77%
12/07/03 22:38:24 INFO mapred.JobClient:  map 100% reduce 80%
12/07/03 22:38:33 INFO mapred.JobClient:  map 100% reduce 83%
12/07/03 22:38:36 INFO mapred.JobClient:  map 100% reduce 86%
12/07/03 22:38:42 INFO mapred.JobClient:  map 100% reduce 90%
12/07/03 22:38:45 INFO mapred.JobClient:  map 100% reduce 93%
12/07/03 22:38:57 INFO mapred.JobClient:  map 100% reduce 100%
12/07/03 22:39:02 INFO mapred.JobClient: Job complete: job_201207030133_0005
12/07/03 22:39:02 INFO mapred.JobClient: Counters: 26
12/07/03 22:39:02 INFO mapred.JobClient:   Job Counters
12/07/03 22:39:02 INFO mapred.JobClient:     Launched reduce tasks=30
12/07/03 22:39:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16299
12/07/03 22:39:02 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 22:39:02 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 22:39:02 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 22:39:02 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 22:39:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=330354
12/07/03 22:39:02 INFO mapred.JobClient:   File Input Format Counters
12/07/03 22:39:02 INFO mapred.JobClient:     Bytes Read=163409
12/07/03 22:39:02 INFO mapred.JobClient:   File Output Format Counters
12/07/03 22:39:02 INFO mapred.JobClient:     Bytes Written=186935
12/07/03 22:39:02 INFO mapred.JobClient:   FileSystemCounters
12/07/03 22:39:02 INFO mapred.JobClient:     FILE_BYTES_READ=1882171
12/07/03 22:39:02 INFO mapred.JobClient:     HDFS_BYTES_READ=163635
12/07/03 22:39:02 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4431532
12/07/03 22:39:02 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=186935
12/07/03 22:39:02 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 22:39:02 INFO mapred.JobClient:     Map output materialized bytes=1882351
12/07/03 22:39:02 INFO mapred.JobClient:     Map input records=13129
12/07/03 22:39:02 INFO mapred.JobClient:     Reduce shuffle bytes=1054827
12/07/03 22:39:02 INFO mapred.JobClient:     Spilled Records=26258
12/07/03 22:39:02 INFO mapred.JobClient:     Map output bytes=1842641
12/07/03 22:39:02 INFO mapred.JobClient:     Map input bytes=163159
12/07/03 22:39:02 INFO mapred.JobClient:     Combine input records=0
12/07/03 22:39:02 INFO mapred.JobClient:     SPLIT_RAW_BYTES=226
12/07/03 22:39:02 INFO mapred.JobClient:     Reduce input records=13129
12/07/03 22:39:02 INFO mapred.JobClient:     Reduce input groups=116
12/07/03 22:39:02 INFO mapred.JobClient:     Combine output records=0
12/07/03 22:39:02 INFO mapred.JobClient:     Reduce output records=13129
12/07/03 22:39:02 INFO mapred.JobClient:     Map output records=13129

Total sort

>> hadoop jar ch08.jar SortByTemperatureUsingTotalOrderPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-totalsort
12/07/03 23:35:45 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:35:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/03 23:35:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO lib.InputSampler: Using 1339 samples
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new compressor
12/07/03 23:35:45 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:35:45 INFO mapred.JobClient: Running job: job_201207030133_0006
12/07/03 23:35:46 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 23:36:01 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 23:36:13 INFO mapred.JobClient:  map 100% reduce 3%
12/07/03 23:36:16 INFO mapred.JobClient:  map 100% reduce 6%
12/07/03 23:36:25 INFO mapred.JobClient:  map 100% reduce 10%
12/07/03 23:36:28 INFO mapred.JobClient:  map 100% reduce 13%
12/07/03 23:36:37 INFO mapred.JobClient:  map 100% reduce 20%
12/07/03 23:36:49 INFO mapred.JobClient:  map 100% reduce 26%
12/07/03 23:36:58 INFO mapred.JobClient:  map 100% reduce 30%
12/07/03 23:37:01 INFO mapred.JobClient:  map 100% reduce 33%
12/07/03 23:37:10 INFO mapred.JobClient:  map 100% reduce 36%
12/07/03 23:37:16 INFO mapred.JobClient:  map 100% reduce 40%
12/07/03 23:37:19 INFO mapred.JobClient:  map 100% reduce 43%
12/07/03 23:37:25 INFO mapred.JobClient:  map 100% reduce 46%
12/07/03 23:37:31 INFO mapred.JobClient:  map 100% reduce 50%
12/07/03 23:37:40 INFO mapred.JobClient:  map 100% reduce 56%
12/07/03 23:37:49 INFO mapred.JobClient:  map 100% reduce 60%
12/07/03 23:37:52 INFO mapred.JobClient:  map 100% reduce 63%
12/07/03 23:38:01 INFO mapred.JobClient:  map 100% reduce 66%
12/07/03 23:38:04 INFO mapred.JobClient:  map 100% reduce 70%
12/07/03 23:38:13 INFO mapred.JobClient:  map 100% reduce 76%
12/07/03 23:38:22 INFO mapred.JobClient:  map 100% reduce 80%
12/07/03 23:38:25 INFO mapred.JobClient:  map 100% reduce 83%
12/07/03 23:38:34 INFO mapred.JobClient:  map 100% reduce 87%
12/07/03 23:38:37 INFO mapred.JobClient:  map 100% reduce 90%
12/07/03 23:38:40 INFO mapred.JobClient:  map 100% reduce 91%
12/07/03 23:38:46 INFO mapred.JobClient:  map 100% reduce 93%
12/07/03 23:38:49 INFO mapred.JobClient:  map 100% reduce 96%
12/07/03 23:38:58 INFO mapred.JobClient:  map 100% reduce 100%
12/07/03 23:39:03 INFO mapred.JobClient: Job complete: job_201207030133_0006
12/07/03 23:39:03 INFO mapred.JobClient: Counters: 26
12/07/03 23:39:03 INFO mapred.JobClient:   Job Counters
12/07/03 23:39:03 INFO mapred.JobClient:     Launched reduce tasks=30
12/07/03 23:39:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=18040
12/07/03 23:39:03 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 23:39:03 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 23:39:03 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 23:39:03 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 23:39:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=336193
12/07/03 23:39:03 INFO mapred.JobClient:   File Input Format Counters
12/07/03 23:39:03 INFO mapred.JobClient:     Bytes Read=163409
12/07/03 23:39:03 INFO mapred.JobClient:   File Output Format Counters
12/07/03 23:39:03 INFO mapred.JobClient:     Bytes Written=177339
12/07/03 23:39:03 INFO mapred.JobClient:   FileSystemCounters
12/07/03 23:39:03 INFO mapred.JobClient:     FILE_BYTES_READ=1882171
12/07/03 23:39:03 INFO mapred.JobClient:     HDFS_BYTES_READ=165067
12/07/03 23:39:03 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=4462828
12/07/03 23:39:03 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=177339
12/07/03 23:39:03 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 23:39:03 INFO mapred.JobClient:     Map output materialized bytes=1882351
12/07/03 23:39:03 INFO mapred.JobClient:     Map input records=13129
12/07/03 23:39:03 INFO mapred.JobClient:     Reduce shuffle bytes=1138806
12/07/03 23:39:03 INFO mapred.JobClient:     Spilled Records=26258
12/07/03 23:39:03 INFO mapred.JobClient:     Map output bytes=1842641
12/07/03 23:39:03 INFO mapred.JobClient:     Map input bytes=163159
12/07/03 23:39:03 INFO mapred.JobClient:     Combine input records=0
12/07/03 23:39:03 INFO mapred.JobClient:     SPLIT_RAW_BYTES=226
12/07/03 23:39:03 INFO mapred.JobClient:     Reduce input records=13129
12/07/03 23:39:03 INFO mapred.JobClient:     Reduce input groups=116
12/07/03 23:39:03 INFO mapred.JobClient:     Combine output records=0
12/07/03 23:39:03 INFO mapred.JobClient:     Reduce output records=13129
12/07/03 23:39:03 INFO mapred.JobClient:     Map output records=13129
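
The "Using 1339 samples" line shows that keys are sampled before the job starts. The samples are used to choose split points so that every key in partition i is no larger than any key in partition i + 1. A minimal sketch of that idea in plain Java follows. The class and method names are assumptions, and the evenly spaced choice of split points is a simplification rather than the exact procedure used by Hadoop's InputSampler.

```java
import java.util.Arrays;

// Plain-Java illustration of range partitioning; no Hadoop classes are used.
class TotalOrderSketch {
    // Picks numPartitions - 1 evenly spaced split points from a sample of keys.
    // Assumes numPartitions >= 1 and a non-empty sample.
    static int[] splitPointsFrom(int[] sample, int numPartitions) {
        int[] sorted = sample.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // splitPoints must be sorted ascending; n split points define n + 1 partitions.
    static int partitionFor(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // Found at index i: the key equals split point i and goes to partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1, so recover insertionPoint.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        int[] sample = {50, -120, 300, 0, 210, -30, 90, 150}; // made-up sampled keys
        int[] splits = splitPointsFrom(sample, 4);
        System.out.println("split points: " + Arrays.toString(splits));
        for (int key : new int[]{-200, 0, 75, 400}) {
            System.out.println(key + " -> partition " + partitionFor(key, splits));
        }
    }
}
```

Because partition numbers grow with the key, sorting each partition independently and reading the output files in order yields a globally sorted result.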

Secondary sort

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar MaxTemperatureUsingSecondarySort input/ncdc/all output-secondarysort
12/07/03 23:59:15 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:59:15 INFO mapred.JobClient: Running job: job_201207030133_0007
12/07/03 23:59:16 INFO mapred.JobClient:  map 0% reduce 0%
12/07/03 23:59:31 INFO mapred.JobClient:  map 100% reduce 0%
12/07/03 23:59:43 INFO mapred.JobClient:  map 100% reduce 100%
12/07/03 23:59:48 INFO mapred.JobClient: Job complete: job_201207030133_0007
12/07/03 23:59:48 INFO mapred.JobClient: Counters: 26
12/07/03 23:59:48 INFO mapred.JobClient:   Job Counters
12/07/03 23:59:48 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/03 23:59:48 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=16330
12/07/03 23:59:48 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 23:59:48 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 23:59:48 INFO mapred.JobClient:     Launched map tasks=2
12/07/03 23:59:48 INFO mapred.JobClient:     Data-local map tasks=2
12/07/03 23:59:48 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9967
12/07/03 23:59:48 INFO mapred.JobClient:   File Input Format Counters
12/07/03 23:59:48 INFO mapred.JobClient:     Bytes Read=147972
12/07/03 23:59:48 INFO mapred.JobClient:   File Output Format Counters
12/07/03 23:59:48 INFO mapred.JobClient:     Bytes Written=18
12/07/03 23:59:48 INFO mapred.JobClient:   FileSystemCounters
12/07/03 23:59:48 INFO mapred.JobClient:     FILE_BYTES_READ=131296
12/07/03 23:59:48 INFO mapred.JobClient:     HDFS_BYTES_READ=148184
12/07/03 23:59:48 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=326482
12/07/03 23:59:48 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=18
12/07/03 23:59:48 INFO mapred.JobClient:   Map-Reduce Framework
12/07/03 23:59:48 INFO mapred.JobClient:     Map output materialized bytes=131302
12/07/03 23:59:48 INFO mapred.JobClient:     Map input records=13130
12/07/03 23:59:48 INFO mapred.JobClient:     Reduce shuffle bytes=131302
12/07/03 23:59:48 INFO mapred.JobClient:     Spilled Records=26258
12/07/03 23:59:48 INFO mapred.JobClient:     Map output bytes=105032
12/07/03 23:59:48 INFO mapred.JobClient:     Map input bytes=1777168
12/07/03 23:59:48 INFO mapred.JobClient:     Combine input records=0
12/07/03 23:59:48 INFO mapred.JobClient:     SPLIT_RAW_BYTES=212
12/07/03 23:59:48 INFO mapred.JobClient:     Reduce input records=0
12/07/03 23:59:48 INFO mapred.JobClient:     Reduce input groups=2
12/07/03 23:59:48 INFO mapred.JobClient:     Combine output records=0
12/07/03 23:59:48 INFO mapred.JobClient:     Reduce output records=2
12/07/03 23:59:48 INFO mapred.JobClient:     Map output records=13129
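
The idea behind a secondary sort is to put the value to be ordered into the key: sort on the composite (year, temperature) key, but group by year alone, so that the first record of each group already carries the maximum temperature. The sketch below imitates that with an in-memory list. The class name, the int[] record shape, and the sample data are assumptions made for this illustration, and no Hadoop partitioner or comparator classes are involved.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the composite-key trick; no Hadoop classes are used.
class SecondarySortSketch {
    // Each record is {year, temperature}.
    static Map<Integer, Integer> maxPerYear(List<int[]> records) {
        List<int[]> sorted = new ArrayList<>(records);
        // Sort order: year ascending, then temperature descending.
        sorted.sort((a, b) -> {
            int byYear = Integer.compare(a[0], b[0]);
            return byYear != 0 ? byYear : Integer.compare(b[1], a[1]);
        });
        // Grouping looks at the year only, so the first record per year is the maximum.
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int[] record : sorted) {
            result.putIfAbsent(record[0], record[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        List<int[]> records = new ArrayList<>();   // made-up readings
        records.add(new int[]{1902, 15});
        records.add(new int[]{1901, -20});
        records.add(new int[]{1901, 40});
        records.add(new int[]{1902, 33});
        System.out.println(maxPerYear(records));   // {1901=40, 1902=33}
    }
}
```

The reducer never has to scan all values of a year, which is the point of pushing the ordering work into the sort phase.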

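3. Joins

4. Side Data Distribution

Distributed cache

Rather than serializing side data in the job configuration, it is preferable to distribute datasets with Hadoop's distributed cache mechanism. It provides a service for copying files and archives to the task nodes in time for the tasks to use them when they run. To save network bandwidth, files are normally copied to any particular node once per job.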

>> hadoop jar ch08.jar MaxTemperatureByStationNameUsingDistributedCacheFile -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output
12/07/04 00:18:14 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/04 00:18:14 INFO mapred.JobClient: Running job: job_201207030133_0008
12/07/04 00:18:15 INFO mapred.JobClient:  map 0% reduce 0%
12/07/04 00:18:29 INFO mapred.JobClient:  map 100% reduce 0%
12/07/04 00:18:41 INFO mapred.JobClient:  map 100% reduce 100%
12/07/04 00:18:46 INFO mapred.JobClient: Job complete: job_201207030133_0008
12/07/04 00:18:46 INFO mapred.JobClient: Counters: 26
12/07/04 00:18:46 INFO mapred.JobClient:   Job Counters
12/07/04 00:18:46 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/04 00:18:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=17800
12/07/04 00:18:46 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/04 00:18:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/04 00:18:46 INFO mapred.JobClient:     Launched map tasks=2
12/07/04 00:18:46 INFO mapred.JobClient:     Data-local map tasks=2
12/07/04 00:18:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10372
12/07/04 00:18:46 INFO mapred.JobClient:   File Input Format Counters
12/07/04 00:18:46 INFO mapred.JobClient:     Bytes Read=147972
12/07/04 00:18:46 INFO mapred.JobClient:   File Output Format Counters
12/07/04 00:18:46 INFO mapred.JobClient:     Bytes Written=170
12/07/04 00:18:46 INFO mapred.JobClient:   FileSystemCounters
12/07/04 00:18:46 INFO mapred.JobClient:     FILE_BYTES_READ=234
12/07/04 00:18:46 INFO mapred.JobClient:     HDFS_BYTES_READ=148184
12/07/04 00:18:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=66722
12/07/04 00:18:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=170
12/07/04 00:18:46 INFO mapred.JobClient:   Map-Reduce Framework
12/07/04 00:18:46 INFO mapred.JobClient:     Map output materialized bytes=240
12/07/04 00:18:46 INFO mapred.JobClient:     Map input records=13130
12/07/04 00:18:46 INFO mapred.JobClient:     Reduce shuffle bytes=120
12/07/04 00:18:46 INFO mapred.JobClient:     Spilled Records=24
12/07/04 00:18:46 INFO mapred.JobClient:     Map output bytes=223193
12/07/04 00:18:46 INFO mapred.JobClient:     Map input bytes=1777168
12/07/04 00:18:46 INFO mapred.JobClient:     Combine input records=13129
12/07/04 00:18:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=212
12/07/04 00:18:46 INFO mapred.JobClient:     Reduce input records=12
12/07/04 00:18:46 INFO mapred.JobClient:     Reduce input groups=6
12/07/04 00:18:46 INFO mapred.JobClient:     Combine output records=12
12/07/04 00:18:46 INFO mapred.JobClient:     Reduce output records=6
12/07/04 00:18:46 INFO mapred.JobClient:     Map output records=13129
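
A file shipped with -files is available to each task as a local file, so a task can load it into memory once and use it as a lookup table. The sketch below illustrates that pattern in plain Java. Everything specific in it is an assumption made for this example: the class and method names, the stations.tsv temporary file, and the tab-separated "id, name" layout, which is not the fixed-width layout of the real station metadata file.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of using a local side file as a lookup table.
class SideDataLookup {
    // Loads "id<TAB>name" lines into a map (hypothetical format, for illustration only).
    static Map<String, String> load(Path sideFile) throws IOException {
        Map<String, String> names = new HashMap<>();
        for (String line : Files.readAllLines(sideFile, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                names.put(parts[0], parts[1]);
            }
        }
        return names;
    }

    // Falls back to the raw ID when the side data has no entry for it.
    static String nameFor(Map<String, String> names, String stationId) {
        return names.getOrDefault(stationId, stationId);
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the file that the framework would place locally.
        Path file = Files.createTempFile("stations", ".tsv");
        Files.write(file, "S1\tHypothetical Station\n".getBytes(StandardCharsets.UTF_8));
        Map<String, String> names = load(file);
        System.out.println(nameFor(names, "S1")); // Hypothetical Station
        System.out.println(nameFor(names, "S2")); // S2
        Files.delete(file);
    }
}
```

Loading the table once per task, rather than once per record, keeps the per-record cost down to a map lookup.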

