Hadoop: The Definitive Guide, ch08 MapReduce Features
1. Counters
1) Built-in counters
2) User-defined Java counters
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar MaxTemperatureWithCounters input/ncdc/all max-temp
12/07/03 19:53:21 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 19:53:21 INFO mapred.JobClient: Running job: job_201207030133_0002
12/07/03 19:53:22 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 19:53:37 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 19:53:49 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 19:53:54 INFO mapred.JobClient: Job complete: job_201207030133_0002
12/07/03 19:53:54 INFO mapred.JobClient: Counters: 29
12/07/03 19:53:54 INFO mapred.JobClient: Job Counters
12/07/03 19:53:54 INFO mapred.JobClient: Launched reduce tasks=1
12/07/03 19:53:54 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16305
12/07/03 19:53:54 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 19:53:54 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 19:53:54 INFO mapred.JobClient: Launched map tasks=2
12/07/03 19:53:54 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 19:53:54 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10068
12/07/03 19:53:54 INFO mapred.JobClient: File Input Format Counters
12/07/03 19:53:54 INFO mapred.JobClient: Bytes Read=147972
12/07/03 19:53:54 INFO mapred.JobClient: File Output Format Counters
12/07/03 19:53:54 INFO mapred.JobClient: Bytes Written=18
12/07/03 19:53:54 INFO mapred.JobClient: FileSystemCounters
12/07/03 19:53:54 INFO mapred.JobClient: FILE_BYTES_READ=28
12/07/03 19:53:54 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/03 19:53:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=62992
12/07/03 19:53:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
12/07/03 19:53:54 INFO mapred.JobClient: TemperatureQuality
12/07/03 19:53:54 INFO mapred.JobClient: 1=13129
12/07/03 19:53:54 INFO mapred.JobClient: 9=1
12/07/03 19:53:54 INFO mapred.JobClient: Air Temperature Records
12/07/03 19:53:54 INFO mapred.JobClient: Missing=1
12/07/03 19:53:54 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 19:53:54 INFO mapred.JobClient: Map output materialized bytes=34
12/07/03 19:53:54 INFO mapred.JobClient: Map input records=13130
12/07/03 19:53:54 INFO mapred.JobClient: Reduce shuffle bytes=34
12/07/03 19:53:54 INFO mapred.JobClient: Spilled Records=4
12/07/03 19:53:54 INFO mapred.JobClient: Map output bytes=118161
12/07/03 19:53:54 INFO mapred.JobClient: Map input bytes=1777168
12/07/03 19:53:54 INFO mapred.JobClient: Combine input records=13129
12/07/03 19:53:54 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
12/07/03 19:53:54 INFO mapred.JobClient: Reduce input records=2
12/07/03 19:53:54 INFO mapred.JobClient: Reduce input groups=2
12/07/03 19:53:54 INFO mapred.JobClient: Combine output records=2
12/07/03 19:53:54 INFO mapred.JobClient: Reduce output records=2
12/07/03 19:53:54 INFO mapred.JobClient: Map output records=13129
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar MissingTemperatureFields job_201207030133_0002
Records with missing temperature fields: 0.01%
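The 0.01% figure can be checked by hand from the counters in the job output. The sketch below is plain Java with no Hadoop dependency. It assumes the percentage is the "Missing" counter divided by "Map input records"; the class and method names are made up for illustration and are not taken from ch08.jar.

```java
import java.util.Locale;

final class MissingFieldsSketch {
    // Share of records with a missing field, formatted with two decimals.
    static String percentMissing(long missing, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        return String.format(Locale.ROOT, "%.2f%%", 100.0 * missing / total);
    }

    public static void main(String[] args) {
        // Counter values copied from the job output:
        // "Missing=1" and "Map input records=13130".
        System.out.println("Records with missing temperature fields: "
                + percentMissing(1, 13130));
    }
}
```

With these two values the ratio is about 0.0076%, which rounds to the 0.01% shown in the run.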
2. Sorting
Sorting data is at the heart of MapReduce.
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar SortDataPreprocessor input/ncdc/all input/ncdc/all-seq
12/07/03 20:55:15 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 20:55:16 INFO mapred.JobClient: Running job: job_201207030133_0003
12/07/03 20:55:17 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 20:55:30 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 20:55:35 INFO mapred.JobClient: Job complete: job_201207030133_0003
12/07/03 20:55:35 INFO mapred.JobClient: Counters: 16
12/07/03 20:55:35 INFO mapred.JobClient: Job Counters
12/07/03 20:55:35 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16560
12/07/03 20:55:35 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 20:55:35 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 20:55:35 INFO mapred.JobClient: Launched map tasks=2
12/07/03 20:55:35 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 20:55:35 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/07/03 20:55:35 INFO mapred.JobClient: File Input Format Counters
12/07/03 20:55:35 INFO mapred.JobClient: Bytes Read=147972
12/07/03 20:55:35 INFO mapred.JobClient: File Output Format Counters
12/07/03 20:55:35 INFO mapred.JobClient: Bytes Written=163409
12/07/03 20:55:35 INFO mapred.JobClient: FileSystemCounters
12/07/03 20:55:35 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/03 20:55:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=41754
12/07/03 20:55:35 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=163409
12/07/03 20:55:35 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 20:55:35 INFO mapred.JobClient: Map input records=13130
12/07/03 20:55:35 INFO mapred.JobClient: Spilled Records=0
12/07/03 20:55:35 INFO mapred.JobClient: Map input bytes=1777168
12/07/03 20:55:35 INFO mapred.JobClient: Map output records=13129
12/07/03 20:55:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
Partial sort
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar SortByTemperatureUsingHashPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashsort <
12/07/03 22:28:32 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 22:28:33 INFO mapred.JobClient: Running job: job_201207030133_0004
12/07/03 22:28:34 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 22:28:47 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 22:28:59 INFO mapred.JobClient: map 100% reduce 3%
12/07/03 22:29:02 INFO mapred.JobClient: map 100% reduce 6%
12/07/03 22:29:08 INFO mapred.JobClient: map 100% reduce 10%
12/07/03 22:29:11 INFO mapred.JobClient: map 100% reduce 13%
12/07/03 22:29:23 INFO mapred.JobClient: map 100% reduce 20%
12/07/03 22:29:32 INFO mapred.JobClient: map 100% reduce 23%
12/07/03 22:29:38 INFO mapred.JobClient: map 100% reduce 26%
12/07/03 22:29:41 INFO mapred.JobClient: map 100% reduce 30%
12/07/03 22:29:47 INFO mapred.JobClient: map 100% reduce 33%
12/07/03 22:29:56 INFO mapred.JobClient: map 100% reduce 36%
12/07/03 22:30:02 INFO mapred.JobClient: map 100% reduce 40%
12/07/03 22:30:05 INFO mapred.JobClient: map 100% reduce 43%
12/07/03 22:30:11 INFO mapred.JobClient: map 100% reduce 46%
12/07/03 22:30:14 INFO mapred.JobClient: map 100% reduce 50%
12/07/03 22:30:23 INFO mapred.JobClient: map 100% reduce 53%
12/07/03 22:30:29 INFO mapred.JobClient: map 100% reduce 56%
12/07/03 22:30:35 INFO mapred.JobClient: map 100% reduce 60%
12/07/03 22:30:38 INFO mapred.JobClient: map 100% reduce 63%
12/07/03 22:30:44 INFO mapred.JobClient: map 100% reduce 66%
12/07/03 22:30:47 INFO mapred.JobClient: map 100% reduce 70%
12/07/03 22:30:59 INFO mapred.JobClient: map 100% reduce 73%
12/07/03 22:31:02 INFO mapred.JobClient: map 100% reduce 76%
12/07/03 22:31:08 INFO mapred.JobClient: map 100% reduce 80%
12/07/03 22:31:11 INFO mapred.JobClient: map 100% reduce 83%
12/07/03 22:31:17 INFO mapred.JobClient: map 100% reduce 87%
12/07/03 22:31:23 INFO mapred.JobClient: map 100% reduce 90%
12/07/03 22:31:32 INFO mapred.JobClient: map 100% reduce 93%
12/07/03 22:31:35 INFO mapred.JobClient: map 100% reduce 96%
12/07/03 22:31:41 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 22:31:46 INFO mapred.JobClient: Job complete: job_201207030133_0004
12/07/03 22:31:46 INFO mapred.JobClient: Counters: 26
12/07/03 22:31:46 INFO mapred.JobClient: Job Counters
12/07/03 22:31:46 INFO mapred.JobClient: Launched reduce tasks=30
12/07/03 22:31:46 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16282
12/07/03 22:31:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 22:31:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 22:31:46 INFO mapred.JobClient: Launched map tasks=2
12/07/03 22:31:46 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 22:31:46 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=335658
12/07/03 22:31:46 INFO mapred.JobClient: File Input Format Counters
12/07/03 22:31:46 INFO mapred.JobClient: Bytes Read=163409
12/07/03 22:31:46 INFO mapred.JobClient: File Output Format Counters
12/07/03 22:31:46 INFO mapred.JobClient: Bytes Written=180399
12/07/03 22:31:46 INFO mapred.JobClient: FileSystemCounters
12/07/03 22:31:46 INFO mapred.JobClient: FILE_BYTES_READ=1882171
12/07/03 22:31:46 INFO mapred.JobClient: HDFS_BYTES_READ=163635
12/07/03 22:31:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4431596
12/07/03 22:31:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=180399
12/07/03 22:31:46 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 22:31:46 INFO mapred.JobClient: Map output materialized bytes=1882351
12/07/03 22:31:46 INFO mapred.JobClient: Map input records=13129
12/07/03 22:31:46 INFO mapred.JobClient: Reduce shuffle bytes=1278651
12/07/03 22:31:46 INFO mapred.JobClient: Spilled Records=26258
12/07/03 22:31:46 INFO mapred.JobClient: Map output bytes=1842641
12/07/03 22:31:46 INFO mapred.JobClient: Map input bytes=163159
12/07/03 22:31:46 INFO mapred.JobClient: Combine input records=0
12/07/03 22:31:46 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
12/07/03 22:31:46 INFO mapred.JobClient: Reduce input records=13129
12/07/03 22:31:46 INFO mapred.JobClient: Reduce input groups=116
12/07/03 22:31:46 INFO mapred.JobClient: Combine output records=0
12/07/03 22:31:46 INFO mapred.JobClient: Reduce output records=13129
12/07/03 22:31:46 INFO mapred.JobClient: Map output records=13129
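Why this run produces 30 files that are each sorted but not sorted as a whole can be seen in a small stand-alone sketch. It uses plain Java collections in place of reducers and the same arithmetic as Hadoop's default HashPartitioner (hash code with the sign bit cleared, modulo the number of partitions). The class name, the helper names and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class HashPartitionSketch {
    // Clear the sign bit so the result is never negative, then take the modulus.
    static int partition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Spread keys over partitions, then sort each partition on its own,
    // which is what each reducer's output looks like.
    static List<List<Integer>> partialSort(List<Integer> keys, int numPartitions) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (Integer key : keys) {
            parts.get(partition(key, numPartitions)).add(key);
        }
        for (List<Integer> part : parts) {
            Collections.sort(part);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical temperature keys (tenths of a degree).
        List<Integer> temps = Arrays.asList(111, -11, 0, 78, 22, -5, 3);
        System.out.println(partialSort(temps, 3));
    }
}
```

Each inner list is ordered, but neighbouring keys can land in different partitions, so concatenating the partitions does not give a globally sorted sequence.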
Lookups in a partitioned MapFile
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar SortByTemperatureToMapFile -D mapred.reduce.tasks=30 input/ncdc/all-seq output-hashmapso>
12/07/03 22:35:53 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 22:35:53 INFO mapred.JobClient: Running job: job_201207030133_0005
12/07/03 22:35:54 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 22:36:08 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 22:36:20 INFO mapred.JobClient: map 100% reduce 3%
12/07/03 22:36:23 INFO mapred.JobClient: map 100% reduce 6%
12/07/03 22:36:29 INFO mapred.JobClient: map 100% reduce 10%
12/07/03 22:36:32 INFO mapred.JobClient: map 100% reduce 13%
12/07/03 22:36:44 INFO mapred.JobClient: map 100% reduce 20%
12/07/03 22:36:53 INFO mapred.JobClient: map 100% reduce 23%
12/07/03 22:36:56 INFO mapred.JobClient: map 100% reduce 26%
12/07/03 22:37:02 INFO mapred.JobClient: map 100% reduce 30%
12/07/03 22:37:05 INFO mapred.JobClient: map 100% reduce 33%
12/07/03 22:37:17 INFO mapred.JobClient: map 100% reduce 40%
12/07/03 22:37:26 INFO mapred.JobClient: map 100% reduce 43%
12/07/03 22:37:29 INFO mapred.JobClient: map 100% reduce 46%
12/07/03 22:37:35 INFO mapred.JobClient: map 100% reduce 50%
12/07/03 22:37:38 INFO mapred.JobClient: map 100% reduce 53%
12/07/03 22:37:51 INFO mapred.JobClient: map 100% reduce 60%
12/07/03 22:38:00 INFO mapred.JobClient: map 100% reduce 63%
12/07/03 22:38:03 INFO mapred.JobClient: map 100% reduce 66%
12/07/03 22:38:09 INFO mapred.JobClient: map 100% reduce 70%
12/07/03 22:38:12 INFO mapred.JobClient: map 100% reduce 73%
12/07/03 22:38:18 INFO mapred.JobClient: map 100% reduce 74%
12/07/03 22:38:21 INFO mapred.JobClient: map 100% reduce 77%
12/07/03 22:38:24 INFO mapred.JobClient: map 100% reduce 80%
12/07/03 22:38:33 INFO mapred.JobClient: map 100% reduce 83%
12/07/03 22:38:36 INFO mapred.JobClient: map 100% reduce 86%
12/07/03 22:38:42 INFO mapred.JobClient: map 100% reduce 90%
12/07/03 22:38:45 INFO mapred.JobClient: map 100% reduce 93%
12/07/03 22:38:57 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 22:39:02 INFO mapred.JobClient: Job complete: job_201207030133_0005
12/07/03 22:39:02 INFO mapred.JobClient: Counters: 26
12/07/03 22:39:02 INFO mapred.JobClient: Job Counters
12/07/03 22:39:02 INFO mapred.JobClient: Launched reduce tasks=30
12/07/03 22:39:02 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16299
12/07/03 22:39:02 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 22:39:02 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 22:39:02 INFO mapred.JobClient: Launched map tasks=2
12/07/03 22:39:02 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 22:39:02 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=330354
12/07/03 22:39:02 INFO mapred.JobClient: File Input Format Counters
12/07/03 22:39:02 INFO mapred.JobClient: Bytes Read=163409
12/07/03 22:39:02 INFO mapred.JobClient: File Output Format Counters
12/07/03 22:39:02 INFO mapred.JobClient: Bytes Written=186935
12/07/03 22:39:02 INFO mapred.JobClient: FileSystemCounters
12/07/03 22:39:02 INFO mapred.JobClient: FILE_BYTES_READ=1882171
12/07/03 22:39:02 INFO mapred.JobClient: HDFS_BYTES_READ=163635
12/07/03 22:39:02 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4431532
12/07/03 22:39:02 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=186935
12/07/03 22:39:02 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 22:39:02 INFO mapred.JobClient: Map output materialized bytes=1882351
12/07/03 22:39:02 INFO mapred.JobClient: Map input records=13129
12/07/03 22:39:02 INFO mapred.JobClient: Reduce shuffle bytes=1054827
12/07/03 22:39:02 INFO mapred.JobClient: Spilled Records=26258
12/07/03 22:39:02 INFO mapred.JobClient: Map output bytes=1842641
12/07/03 22:39:02 INFO mapred.JobClient: Map input bytes=163159
12/07/03 22:39:02 INFO mapred.JobClient: Combine input records=0
12/07/03 22:39:02 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
12/07/03 22:39:02 INFO mapred.JobClient: Reduce input records=13129
12/07/03 22:39:02 INFO mapred.JobClient: Reduce input groups=116
12/07/03 22:39:02 INFO mapred.JobClient: Combine output records=0
12/07/03 22:39:02 INFO mapred.JobClient: Reduce output records=13129
12/07/03 22:39:02 INFO mapred.JobClient: Map output records=13129
Total sort
>> hadoop jar ch08.jar SortByTemperatureUsingTotalOrderPartitioner -D mapred.reduce.tasks=30 input/ncdc/all-seq output-totalsort <
12/07/03 23:35:45 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:35:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/07/03 23:35:45 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new decompressor
12/07/03 23:35:45 INFO lib.InputSampler: Using 1339 samples
12/07/03 23:35:45 INFO compress.CodecPool: Got brand-new compressor
12/07/03 23:35:45 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:35:45 INFO mapred.JobClient: Running job: job_201207030133_0006
12/07/03 23:35:46 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 23:36:01 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 23:36:13 INFO mapred.JobClient: map 100% reduce 3%
12/07/03 23:36:16 INFO mapred.JobClient: map 100% reduce 6%
12/07/03 23:36:25 INFO mapred.JobClient: map 100% reduce 10%
12/07/03 23:36:28 INFO mapred.JobClient: map 100% reduce 13%
12/07/03 23:36:37 INFO mapred.JobClient: map 100% reduce 20%
12/07/03 23:36:49 INFO mapred.JobClient: map 100% reduce 26%
12/07/03 23:36:58 INFO mapred.JobClient: map 100% reduce 30%
12/07/03 23:37:01 INFO mapred.JobClient: map 100% reduce 33%
12/07/03 23:37:10 INFO mapred.JobClient: map 100% reduce 36%
12/07/03 23:37:16 INFO mapred.JobClient: map 100% reduce 40%
12/07/03 23:37:19 INFO mapred.JobClient: map 100% reduce 43%
12/07/03 23:37:25 INFO mapred.JobClient: map 100% reduce 46%
12/07/03 23:37:31 INFO mapred.JobClient: map 100% reduce 50%
12/07/03 23:37:40 INFO mapred.JobClient: map 100% reduce 56%
12/07/03 23:37:49 INFO mapred.JobClient: map 100% reduce 60%
12/07/03 23:37:52 INFO mapred.JobClient: map 100% reduce 63%
12/07/03 23:38:01 INFO mapred.JobClient: map 100% reduce 66%
12/07/03 23:38:04 INFO mapred.JobClient: map 100% reduce 70%
12/07/03 23:38:13 INFO mapred.JobClient: map 100% reduce 76%
12/07/03 23:38:22 INFO mapred.JobClient: map 100% reduce 80%
12/07/03 23:38:25 INFO mapred.JobClient: map 100% reduce 83%
12/07/03 23:38:34 INFO mapred.JobClient: map 100% reduce 87%
12/07/03 23:38:37 INFO mapred.JobClient: map 100% reduce 90%
12/07/03 23:38:40 INFO mapred.JobClient: map 100% reduce 91%
12/07/03 23:38:46 INFO mapred.JobClient: map 100% reduce 93%
12/07/03 23:38:49 INFO mapred.JobClient: map 100% reduce 96%
12/07/03 23:38:58 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 23:39:03 INFO mapred.JobClient: Job complete: job_201207030133_0006
12/07/03 23:39:03 INFO mapred.JobClient: Counters: 26
12/07/03 23:39:03 INFO mapred.JobClient: Job Counters
12/07/03 23:39:03 INFO mapred.JobClient: Launched reduce tasks=30
12/07/03 23:39:03 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=18040
12/07/03 23:39:03 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 23:39:03 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 23:39:03 INFO mapred.JobClient: Launched map tasks=2
12/07/03 23:39:03 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 23:39:03 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=336193
12/07/03 23:39:03 INFO mapred.JobClient: File Input Format Counters
12/07/03 23:39:03 INFO mapred.JobClient: Bytes Read=163409
12/07/03 23:39:03 INFO mapred.JobClient: File Output Format Counters
12/07/03 23:39:03 INFO mapred.JobClient: Bytes Written=177339
12/07/03 23:39:03 INFO mapred.JobClient: FileSystemCounters
12/07/03 23:39:03 INFO mapred.JobClient: FILE_BYTES_READ=1882171
12/07/03 23:39:03 INFO mapred.JobClient: HDFS_BYTES_READ=165067
12/07/03 23:39:03 INFO mapred.JobClient: FILE_BYTES_WRITTEN=4462828
12/07/03 23:39:03 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=177339
12/07/03 23:39:03 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 23:39:03 INFO mapred.JobClient: Map output materialized bytes=1882351
12/07/03 23:39:03 INFO mapred.JobClient: Map input records=13129
12/07/03 23:39:03 INFO mapred.JobClient: Reduce shuffle bytes=1138806
12/07/03 23:39:03 INFO mapred.JobClient: Spilled Records=26258
12/07/03 23:39:03 INFO mapred.JobClient: Map output bytes=1842641
12/07/03 23:39:03 INFO mapred.JobClient: Map input bytes=163159
12/07/03 23:39:03 INFO mapred.JobClient: Combine input records=0
12/07/03 23:39:03 INFO mapred.JobClient: SPLIT_RAW_BYTES=226
12/07/03 23:39:03 INFO mapred.JobClient: Reduce input records=13129
12/07/03 23:39:03 INFO mapred.JobClient: Reduce input groups=116
12/07/03 23:39:03 INFO mapred.JobClient: Combine output records=0
12/07/03 23:39:03 INFO mapred.JobClient: Reduce output records=13129
12/07/03 23:39:03 INFO mapred.JobClient: Map output records=13129
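The InputSampler line in this log is where the run picks split points from a sample of the keys. Once split points exist, assigning a key to a partition is a range lookup, so partition i only ever holds keys smaller than those in partition i+1. The sketch below shows that lookup in plain Java. It is a simplified illustration, not Hadoop's TotalOrderPartitioner code, and the three split points are hypothetical rather than the ones sampled in this run.

```java
import java.util.Arrays;

final class TotalOrderSketch {
    // Index of the range containing key, given ascending split points.
    // n split points define n + 1 partitions; a key equal to a split point
    // goes to the partition on its right.
    static int partition(int key, int[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        // Hypothetical split points in tenths of a degree: -10.0, 0.0 and 10.0 C.
        int[] splitPoints = {-100, 0, 100};
        for (int key : new int[] {-150, -100, -50, 0, 50, 100, 200}) {
            System.out.println(key + " -> partition " + partition(key, splitPoints));
        }
    }
}
```

Because the ranges do not overlap and are ordered, concatenating the sorted partition outputs in partition order gives a globally sorted result. How evenly the work is spread depends on how well the sampled split points match the key distribution.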
Secondary sort
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]>> hadoop jar ch08.jar MaxTemperatureUsingSecondarySort input/ncdc/all output-secondarysort
12/07/03 23:59:15 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/03 23:59:15 INFO mapred.JobClient: Running job: job_201207030133_0007
12/07/03 23:59:16 INFO mapred.JobClient: map 0% reduce 0%
12/07/03 23:59:31 INFO mapred.JobClient: map 100% reduce 0%
12/07/03 23:59:43 INFO mapred.JobClient: map 100% reduce 100%
12/07/03 23:59:48 INFO mapred.JobClient: Job complete: job_201207030133_0007
12/07/03 23:59:48 INFO mapred.JobClient: Counters: 26
12/07/03 23:59:48 INFO mapred.JobClient: Job Counters
12/07/03 23:59:48 INFO mapred.JobClient: Launched reduce tasks=1
12/07/03 23:59:48 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=16330
12/07/03 23:59:48 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/03 23:59:48 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/03 23:59:48 INFO mapred.JobClient: Launched map tasks=2
12/07/03 23:59:48 INFO mapred.JobClient: Data-local map tasks=2
12/07/03 23:59:48 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=9967
12/07/03 23:59:48 INFO mapred.JobClient: File Input Format Counters
12/07/03 23:59:48 INFO mapred.JobClient: Bytes Read=147972
12/07/03 23:59:48 INFO mapred.JobClient: File Output Format Counters
12/07/03 23:59:48 INFO mapred.JobClient: Bytes Written=18
12/07/03 23:59:48 INFO mapred.JobClient: FileSystemCounters
12/07/03 23:59:48 INFO mapred.JobClient: FILE_BYTES_READ=131296
12/07/03 23:59:48 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/03 23:59:48 INFO mapred.JobClient: FILE_BYTES_WRITTEN=326482
12/07/03 23:59:48 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
12/07/03 23:59:48 INFO mapred.JobClient: Map-Reduce Framework
12/07/03 23:59:48 INFO mapred.JobClient: Map output materialized bytes=131302
12/07/03 23:59:48 INFO mapred.JobClient: Map input records=13130
12/07/03 23:59:48 INFO mapred.JobClient: Reduce shuffle bytes=131302
12/07/03 23:59:48 INFO mapred.JobClient: Spilled Records=26258
12/07/03 23:59:48 INFO mapred.JobClient: Map output bytes=105032
12/07/03 23:59:48 INFO mapred.JobClient: Map input bytes=1777168
12/07/03 23:59:48 INFO mapred.JobClient: Combine input records=0
12/07/03 23:59:48 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
12/07/03 23:59:48 INFO mapred.JobClient: Reduce input records=0
12/07/03 23:59:48 INFO mapred.JobClient: Reduce input groups=2
12/07/03 23:59:48 INFO mapred.JobClient: Combine output records=0
12/07/03 23:59:48 INFO mapred.JobClient: Reduce output records=2
12/07/03 23:59:48 INFO mapred.JobClient: Map output records=13129
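The idea behind a secondary sort for maximum temperature can be sketched without Hadoop: make the key a (year, temperature) pair, order pairs by year ascending and temperature descending, and group by year only, so the first record in each group already carries that year's maximum. The plain Java below assumes that approach. The class name, helper names and sample records are hypothetical, and a sorted list stands in for the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SecondarySortSketch {
    // Composite-key order: year ascending, then temperature descending.
    static int compareKeys(int[] a, int[] b) {
        int byYear = Integer.compare(a[0], b[0]);
        return byYear != 0 ? byYear : Integer.compare(b[1], a[1]);
    }

    // Each record is {year, temperature}. After sorting, the first record
    // seen for a year is its maximum, so later ones are ignored.
    static Map<Integer, Integer> maxPerYear(List<int[]> records) {
        List<int[]> sorted = new ArrayList<>(records);
        sorted.sort(SecondarySortSketch::compareKeys);
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int[] record : sorted) {
            result.putIfAbsent(record[0], record[1]); // grouping by year only
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample records, temperatures in tenths of a degree.
        List<int[]> records = Arrays.asList(
                new int[] {1901, 250}, new int[] {1902, 100},
                new int[] {1901, 317}, new int[] {1902, 244},
                new int[] {1901, -50});
        System.out.println(maxPerYear(records));
    }
}
```

In a real job the ordering, the grouping and the partitioning (by year, so all records for a year reach the same reducer) are configured as three separate pieces; the sketch folds them into one method for brevity.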
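3. Joins
4. Side Data Distribution
Distributed cache
Rather than serializing side data in the job configuration, it is preferable to distribute datasets using Hadoop's distributed cache mechanism. It provides a service for copying files and archives to the task nodes in time for the tasks to use them when they run. To save network bandwidth, files are normally copied to any particular node once per job.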
>> hadoop jar ch08.jar MaxTemperatureByStationNameUsingDistributedCacheFile -files input/ncdc/metadata/stations-fixed-width.txt input/ncdc/all output <
12/07/04 00:18:14 INFO mapred.FileInputFormat: Total input paths to process : 2
12/07/04 00:18:14 INFO mapred.JobClient: Running job: job_201207030133_0008
12/07/04 00:18:15 INFO mapred.JobClient: map 0% reduce 0%
12/07/04 00:18:29 INFO mapred.JobClient: map 100% reduce 0%
12/07/04 00:18:41 INFO mapred.JobClient: map 100% reduce 100%
12/07/04 00:18:46 INFO mapred.JobClient: Job complete: job_201207030133_0008
12/07/04 00:18:46 INFO mapred.JobClient: Counters: 26
12/07/04 00:18:46 INFO mapred.JobClient: Job Counters
12/07/04 00:18:46 INFO mapred.JobClient: Launched reduce tasks=1
12/07/04 00:18:46 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17800
12/07/04 00:18:46 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/04 00:18:46 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/07/04 00:18:46 INFO mapred.JobClient: Launched map tasks=2
12/07/04 00:18:46 INFO mapred.JobClient: Data-local map tasks=2
12/07/04 00:18:46 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10372
12/07/04 00:18:46 INFO mapred.JobClient: File Input Format Counters
12/07/04 00:18:46 INFO mapred.JobClient: Bytes Read=147972
12/07/04 00:18:46 INFO mapred.JobClient: File Output Format Counters
12/07/04 00:18:46 INFO mapred.JobClient: Bytes Written=170
12/07/04 00:18:46 INFO mapred.JobClient: FileSystemCounters
12/07/04 00:18:46 INFO mapred.JobClient: FILE_BYTES_READ=234
12/07/04 00:18:46 INFO mapred.JobClient: HDFS_BYTES_READ=148184
12/07/04 00:18:46 INFO mapred.JobClient: FILE_BYTES_WRITTEN=66722
12/07/04 00:18:46 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=170
12/07/04 00:18:46 INFO mapred.JobClient: Map-Reduce Framework
12/07/04 00:18:46 INFO mapred.JobClient: Map output materialized bytes=240
12/07/04 00:18:46 INFO mapred.JobClient: Map input records=13130
12/07/04 00:18:46 INFO mapred.JobClient: Reduce shuffle bytes=120
12/07/04 00:18:46 INFO mapred.JobClient: Spilled Records=24
12/07/04 00:18:46 INFO mapred.JobClient: Map output bytes=223193
12/07/04 00:18:46 INFO mapred.JobClient: Map input bytes=1777168
12/07/04 00:18:46 INFO mapred.JobClient: Combine input records=13129
12/07/04 00:18:46 INFO mapred.JobClient: SPLIT_RAW_BYTES=212
12/07/04 00:18:46 INFO mapred.JobClient: Reduce input records=12
12/07/04 00:18:46 INFO mapred.JobClient: Reduce input groups=6
12/07/04 00:18:46 INFO mapred.JobClient: Combine output records=12
12/07/04 00:18:46 INFO mapred.JobClient: Reduce output records=6
12/07/04 00:18:46 INFO mapred.JobClient: Map output records=13129
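The -files option in the command above ships the station metadata file alongside the job, so a task can open it as an ordinary local file. The usual pattern on the task side is to load the side data into memory once and then do a cheap lookup per record. The sketch below shows that pattern in plain Java, with no Hadoop classes. The fixed-width layout (station ID in columns 0-11, name from column 13) and the two sample rows are assumptions for illustration, not the verified format of stations-fixed-width.txt.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SideDataLookupSketch {
    // Assumed layout: station ID in columns 0-11, station name from column 13.
    static Map<String, String> parseStations(List<String> lines) {
        Map<String, String> names = new HashMap<>();
        for (String line : lines) {
            if (line.length() < 14) {
                continue; // skip blank or truncated lines
            }
            names.put(line.substring(0, 12).trim(), line.substring(13).trim());
        }
        return names;
    }

    // Done once per task, before any records are processed.
    static Map<String, String> loadStations(Path file) throws IOException {
        return parseStations(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // Done per record: fall back to the raw ID when the station is unknown.
    static String stationName(Map<String, String> names, String stationId) {
        String name = names.get(stationId);
        return (name == null || name.isEmpty()) ? stationId : name;
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the locally cached copy.
        Path file = Files.createTempFile("stations", ".txt");
        Files.write(file, Arrays.asList(
                "011990-99999 SIHCCAJAVRI",
                "012650-99999 TYNSET-HANSMOEN"), StandardCharsets.UTF_8);
        Map<String, String> names = loadStations(file);
        System.out.println(stationName(names, "011990-99999"));
        System.out.println(stationName(names, "000000-00000"));
        Files.delete(file);
    }
}
```

Loading the table once keeps the per-record cost to a hash lookup, which suits side data small enough to fit in a task's memory.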