A Small Hadoop Exercise: Computing Averages with MapReduce


Having studied the ideas behind MapReduce and grasped a little of them, it's time to strike while the iron is hot and do a small exercise; theory only sticks once it is consolidated in practice, and practice is the sole criterion for testing truth.

This example uses MapReduce to compute average scores. I'm following an approach a senior engineer once described, which I find very sensible:

Work out what the map phase takes as input, what the map step does, and what it outputs; then what the reduce phase takes as input, what it does, and what it outputs. Once those points are clear, a MapReduce program practically writes itself. Here:

Map — input: a data set in a fixed format (e.g. "张三  60", one name and one score per record); what it does: split each record and write the pieces to the Context as a key-value pair; output: key-value pairs of the declared types (e.g. (new Text("张三"), new IntWritable(60))).

Reduce — input: the map output; what it does: total each student's scores, then divide by that student's number of courses; output: key-value pairs of the declared types (name and average score).
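
Concretely, with the data used below: map turns the lines "A   55", "B   65", ..., "A   45", ... into the pairs (A,55), (B,65), ..., (A,45), ...; the framework then groups the values by key, so reduce receives (A, [55,45,51,85,35,65,85,50]) for student A's eight scores in score1.txt and writes (A, 58), since 471 / 8 rounds down to 58 in integer division.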

Given the map and reduce steps above, we arrive at the following code:

package com.linxiaosheng.test;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class ScoreAvgTest {

    /**
     * KEYIN:    byte offset at which each input line starts (0, 1, 8, 9, ...)
     * VALUEIN:  the text of one input line
     * KEYOUT:   the student name
     * VALUEOUT: the score
     */
    public static class MapperClass extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable score = new IntWritable();
        private Text name = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String lineText = value.toString();
            System.out.println("Before Map:" + key + "," + lineText);
            StringTokenizer stringTokenizer = new StringTokenizer(lineText);
            while (stringTokenizer.hasMoreTokens()) {
                name.set(stringTokenizer.nextToken());
                score.set(Integer.parseInt(stringTokenizer.nextToken()));
                // debug trace; the "Aefore" spelling appears verbatim in the log below
                System.out.println("Aefore Map:" + name + "," + score);
                context.write(name, score);
            }
        }
    }

    /**
     * KEYIN:    the student name
     * VALUEIN:  that student's scores
     * KEYOUT:   the student name
     * VALUEOUT: the computed average score
     */
    public static class ReducerClass extends Reducer<Text, IntWritable, Text, IntWritable> {

        public IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text name, Iterable<IntWritable> scores, Context context)
                throws IOException, InterruptedException {
            StringBuffer sb = new StringBuffer();
            int sum = 0;
            int num = 0;
            for (IntWritable score : scores) {
                int s = score.get();
                sum += s;
                num++;
                sb.append(s + ",");
            }
            int avg = sum / num;
            System.out.println("Bfter Reducer:" + name + "," + sb.toString());
            System.out.println("After Reducer:" + name + "," + avg);
            result.set(avg);
            context.write(name, result);
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: ScoreAvgTest <in1> <in2> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "ScoreAvgTest");
        job.setJarByClass(ScoreAvgTest.class);
        job.setMapperClass(MapperClass.class);
        job.setCombinerClass(ReducerClass.class);
        job.setReducerClass(ReducerClass.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Data set: I created the data by hand myself, mainly because I wanted to watch the MapReduce job run, so I made two files; naturally, no thought went into whether the scores follow a normal distribution...

The data covers eleven students, A through K, and sixteen courses in total (eight score records per student in each file). The concrete data:

score1.txt:

A   55
B   65
C   44
D   87
E   66
F   90
G   70
H   59
I   61
J   58
K   40
A   45
B   62
C   64
D   77
E   36
F   50
G   80
H   69
I   71
J   70
K   49
A   51
B   64
C   74
D   37
E   76
F   80
G   50
H   51
I   81
J   68
K   80
A   85
B   55
C   49
D   67
E   69
F   50
G   80
H   79
I   81
J   68
K   80
A   35
B   55
C   40
D   47
E   60
F   72
G   76
H   79
I   68
J   78
K   50
A   65
B   45
C   74
D   57
E   56
F   50
G   60
H   59
I   61
J   58
K   60
A   85
B   45
C   74
D   67
E   86
F   70
G   50
H   79
I   81
J   78
K   60
A   50
B   69
C   40
D   89
E   69
F   95
G   75
H   59
I   60
J   59
K   45

score2.txt:

A   65
B   75
C   64
D   67
E   86
F   70
G   90
H   79
I   81
J   78
K   60
A   65
B   82
C   84
D   97
E   66
F   70
G   80
H   89
I   91
J   90
K   69
A   71
B   84
C   94
D   67
E   96
F   80
G   70
H   71
I   81
J   98
K   80
A   85
B   75
C   69
D   87
E   89
F   80
G   70
H   99
I   81
J   88
K   60
A   65
B   75
C   60
D   67
E   80
F   92
G   76
H   79
I   68
J   78
K   70
A   85
B   85
C   74
D   87
E   76
F   60
G   60
H   79
I   81
J   78
K   80
A   85
B   65
C   74
D   67
E   86
F   70
G   70
H   79
I   81
J   78
K   60
A   70
B   69
C   60
D   89
E   69
F   95
G   75
H   59
I   60
J   79
K   65

First, configure the run arguments:
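
Judging from the input and output paths that appear in the log below, the three program arguments (two input files, one output directory) would be equivalent to:

hdfs://localhost:9000/user/hadoop/input/score1.txt hdfs://localhost:9000/user/hadoop/input/score2.txt hdfs://localhost:9000/user/hadoop/output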

The information printed to the console during execution:
15/07/05 20:35:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/05 20:35:33 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
15/07/05 20:35:33 INFO input.FileInputFormat: Total input paths to process : 2
15/07/05 20:35:33 WARN snappy.LoadSnappy: Snappy native library not loaded
15/07/05 20:35:34 INFO mapred.JobClient: Running job: job_local577202179_0001
15/07/05 20:35:34 INFO mapred.LocalJobRunner: Waiting for map tasks
15/07/05 20:35:34 INFO mapred.LocalJobRunner: Starting task: attempt_local577202179_0001_m_000000_0
15/07/05 20:35:34 INFO util.ProcessTree: setsid exited with exit code 0
15/07/05 20:35:34 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1a6a5f9
15/07/05 20:35:34 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/score1.txt:0+704
15/07/05 20:35:34 INFO mapred.MapTask: io.sort.mb = 100
15/07/05 20:35:34 INFO mapred.MapTask: data buffer = 79691776/99614720
15/07/05 20:35:34 INFO mapred.MapTask: record buffer = 262144/327680
15/07/05 20:35:35 INFO mapred.JobClient:  map 0% reduce 0%
Before Map:0,
Before Map:1,A   55
Aefore Map:A,55
Before Map:8,
Before Map:9,B   65
Aefore Map:B,65
...
Before Map:696,
Before Map:697,K   45
Aefore Map:K,45
15/07/05 20:35:39 INFO mapred.MapTask: Starting flush of map output
Bfter Reducer:A,55,45,51,85,35,65,85,50,
After Reducer:A,58
Bfter Reducer:B,45,64,65,45,55,69,62,55,
After Reducer:B,57
Bfter Reducer:C,64,49,44,74,74,40,40,74,
After Reducer:C,57
Bfter Reducer:D,67,67,77,37,87,57,89,47,
After Reducer:D,66
Bfter Reducer:E,36,66,76,86,69,69,60,56,
After Reducer:E,64
Bfter Reducer:F,90,95,70,50,80,50,50,72,
After Reducer:F,69
Bfter Reducer:G,60,76,50,50,80,70,75,80,
After Reducer:G,67
Bfter Reducer:H,59,69,51,79,59,79,59,79,
After Reducer:H,66
Bfter Reducer:I,60,61,81,81,61,71,68,81,
After Reducer:I,70
Bfter Reducer:J,58,59,78,68,78,68,70,58,
After Reducer:J,67
Bfter Reducer:K,40,50,49,60,60,45,80,80,
After Reducer:K,58
15/07/05 20:35:39 INFO mapred.MapTask: Finished spill 0
15/07/05 20:35:39 INFO mapred.Task: Task:attempt_local577202179_0001_m_000000_0 is done. And is in the process of commiting
15/07/05 20:35:39 INFO mapred.LocalJobRunner: 
15/07/05 20:35:39 INFO mapred.Task: Task 'attempt_local577202179_0001_m_000000_0' done.
15/07/05 20:35:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local577202179_0001_m_000000_0
15/07/05 20:35:39 INFO mapred.LocalJobRunner: Starting task: attempt_local577202179_0001_m_000001_0
15/07/05 20:35:39 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@10c2696
15/07/05 20:35:39 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/score2.txt:0+704
15/07/05 20:35:39 INFO mapred.MapTask: io.sort.mb = 100
15/07/05 20:35:39 INFO mapred.JobClient:  map 50% reduce 0%
15/07/05 20:35:39 INFO mapred.MapTask: data buffer = 79691776/99614720
15/07/05 20:35:39 INFO mapred.MapTask: record buffer = 262144/327680
Before Map:0,
Before Map:1,A   65
Aefore Map:A,65
Before Map:8,
Before Map:9,B   75
Aefore Map:B,75
...
Before Map:696,
Before Map:697,K   65
Aefore Map:K,65
15/07/05 20:35:42 INFO mapred.MapTask: Starting flush of map output
Bfter Reducer:A,65,65,71,85,65,85,85,70,
After Reducer:A,73
Bfter Reducer:B,65,84,75,85,75,69,82,75,
After Reducer:B,76
Bfter Reducer:C,84,69,64,74,94,60,60,74,
After Reducer:C,72
Bfter Reducer:D,67,87,97,67,67,87,89,67,
After Reducer:D,78
Bfter Reducer:E,66,86,96,86,89,69,80,76,
After Reducer:E,81
Bfter Reducer:F,70,95,70,70,80,60,80,92,
After Reducer:F,77
Bfter Reducer:G,60,76,70,70,80,90,75,70,
After Reducer:G,73
Bfter Reducer:H,79,89,71,99,59,79,79,79,
After Reducer:H,79
Bfter Reducer:I,60,81,81,81,81,91,68,81,
After Reducer:I,78
Bfter Reducer:J,78,79,78,88,78,98,90,78,
After Reducer:J,83
Bfter Reducer:K,60,70,69,60,80,65,60,80,
After Reducer:K,68
15/07/05 20:35:42 INFO mapred.MapTask: Finished spill 0
15/07/05 20:35:42 INFO mapred.Task: Task:attempt_local577202179_0001_m_000001_0 is done. And is in the process of commiting
15/07/05 20:35:42 INFO mapred.LocalJobRunner: 
15/07/05 20:35:42 INFO mapred.Task: Task 'attempt_local577202179_0001_m_000001_0' done.
15/07/05 20:35:42 INFO mapred.LocalJobRunner: Finishing task: attempt_local577202179_0001_m_000001_0
15/07/05 20:35:42 INFO mapred.LocalJobRunner: Map task executor complete.
15/07/05 20:35:42 INFO mapred.JobClient:  map 100% reduce 0%
15/07/05 20:35:43 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@8f544b
15/07/05 20:35:43 INFO mapred.LocalJobRunner: 
15/07/05 20:35:43 INFO mapred.Merger: Merging 2 sorted segments
15/07/05 20:35:43 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 180 bytes
15/07/05 20:35:43 INFO mapred.LocalJobRunner: 
Bfter Reducer:A,58,73,
After Reducer:A,65
Bfter Reducer:B,76,57,
After Reducer:B,66
Bfter Reducer:C,57,72,
After Reducer:C,64
Bfter Reducer:D,78,66,
After Reducer:D,72
Bfter Reducer:E,64,81,
After Reducer:E,72
Bfter Reducer:F,77,69,
After Reducer:F,73
Bfter Reducer:G,67,73,
After Reducer:G,70
Bfter Reducer:H,79,66,
After Reducer:H,72
Bfter Reducer:I,70,78,
After Reducer:I,74
Bfter Reducer:J,83,67,
After Reducer:J,75
Bfter Reducer:K,58,68,
After Reducer:K,63
15/07/05 20:35:44 INFO mapred.Task: Task:attempt_local577202179_0001_r_000000_0 is done. And is in the process of commiting
15/07/05 20:35:44 INFO mapred.LocalJobRunner: 
15/07/05 20:35:44 INFO mapred.Task: Task attempt_local577202179_0001_r_000000_0 is allowed to commit now
15/07/05 20:35:44 INFO output.FileOutputCommitter: Saved output of task 'attempt_local577202179_0001_r_000000_0' to hdfs://localhost:9000/user/hadoop/output
15/07/05 20:35:44 INFO mapred.LocalJobRunner: reduce > reduce
15/07/05 20:35:44 INFO mapred.Task: Task 'attempt_local577202179_0001_r_000000_0' done.
15/07/05 20:35:44 INFO mapred.JobClient:  map 100% reduce 100%
15/07/05 20:35:44 INFO mapred.JobClient: Job complete: job_local577202179_0001
15/07/05 20:35:44 INFO mapred.JobClient: Counters: 22
15/07/05 20:35:44 INFO mapred.JobClient:   File Output Format Counters 
15/07/05 20:35:44 INFO mapred.JobClient:     Bytes Written=55
15/07/05 20:35:44 INFO mapred.JobClient:   FileSystemCounters
15/07/05 20:35:44 INFO mapred.JobClient:     FILE_BYTES_READ=1703
15/07/05 20:35:44 INFO mapred.JobClient:     HDFS_BYTES_READ=3520
15/07/05 20:35:44 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=205902
15/07/05 20:35:44 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=55
15/07/05 20:35:44 INFO mapred.JobClient:   File Input Format Counters 
15/07/05 20:35:44 INFO mapred.JobClient:     Bytes Read=1408
15/07/05 20:35:44 INFO mapred.JobClient:   Map-Reduce Framework
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce input groups=11
15/07/05 20:35:44 INFO mapred.JobClient:     Map output materialized bytes=188
15/07/05 20:35:44 INFO mapred.JobClient:     Combine output records=22
15/07/05 20:35:44 INFO mapred.JobClient:     Map input records=352
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce shuffle bytes=0
15/07/05 20:35:44 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce output records=11
15/07/05 20:35:44 INFO mapred.JobClient:     Spilled Records=44
15/07/05 20:35:44 INFO mapred.JobClient:     Map output bytes=1056
15/07/05 20:35:44 INFO mapred.JobClient:     CPU time spent (ms)=0
15/07/05 20:35:44 INFO mapred.JobClient:     Total committed heap usage (bytes)=444071936
15/07/05 20:35:44 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
15/07/05 20:35:44 INFO mapred.JobClient:     Combine input records=176
15/07/05 20:35:44 INFO mapred.JobClient:     Map output records=176
15/07/05 20:35:44 INFO mapred.JobClient:     SPLIT_RAW_BYTES=230
15/07/05 20:35:44 INFO mapred.JobClient:     Reduce input records=22
Check the output of the job:
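
The result file under hdfs://localhost:9000/user/hadoop/output, as reconstructed from the final "After Reducer" lines in the log above (TextOutputFormat writes each name and average separated by a tab, so eleven 5-byte lines also match the HDFS_BYTES_WRITTEN=55 counter):

A	65
B	66
C	64
D	72
E	72
F	73
G	70
H	72
I	74
J	75
K	63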



Two problems I ran into along the way deserve a mention:
(1) The map function uses StringTokenizer token = new StringTokenizer(textline). What does this actually do? StringTokenizer(String text) takes a string. Back in the WordCount example I never quite understood how the file's text got broken into individual words. It finally clicked: the input arrives through the default TextInputFormat, which has already split the file into lines, so each call to map receives a single line as its record; the tokenizer then splits that line on whitespace into individual tokens. StringTokenizer also has the constructor StringTokenizer(String str, String delim), where delim is the delimiter set; the default is " \t\n\r\f" (space, tab, newline, carriage return, form feed). Back to this program: all we need to handle is how one line splits, and since the name and the score are separated by whitespace, new StringTokenizer(textline) works as-is; there is no need to pass a delimiter explicitly. (Earlier I had passed "\t" in explicitly and the line would not split; presumably what I typed was not recognized as an actual tab character.)
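
Here is a minimal standalone sketch of that splitting behavior (the class name TokenizerDemo and the hard-coded record are mine, for illustration only):

import java.util.StringTokenizer;

public class TokenizerDemo {
    public static void main(String[] args) {
        // One record as TextInputFormat would hand it to map(): name, whitespace, score
        String line = "A   55";
        // The no-arg constructor uses the default delimiter set " \t\n\r\f",
        // so runs of spaces or tabs between name and score are consumed automatically
        StringTokenizer st = new StringTokenizer(line);
        String name = st.nextToken();                 // "A"
        int score = Integer.parseInt(st.nextToken()); // 55
        System.out.println(name + " -> " + score);    // prints: A -> 55
    }
}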
(2) The information printed during execution puzzled me at first. From the log it looks as though score1.txt is split and fed to the map phase line by line, and the moment its last record is mapped, a reduce seems to run immediately; score2.txt then goes through the same sequence; and only after both files have each been through a map and an apparent reduce does one more reduce run. If that were really the case, it would contradict the theory we saw earlier, which says reduce runs only after map has finished. So is it?

Looking at the code, we find the answer:

  job.setMapperClass(MapperClass.class);
  job.setCombinerClass(ReducerClass.class);
  job.setReducerClass(ReducerClass.class);

  Yes, that's the clue. The real execution order is: map runs first, which is why every score record is printed; the "reduce" that appears right after each map is actually the combiner step that sits between map and reduce. So what is this Combiner class? The API reveals that a combine is essentially a local reduce pass run on each map's output, and here the combiner and the reducer are explicitly set to the same class, ReducerClass. That is why each file appears to go through map and then reduce, and why the true reduce only runs once both maps have finished. One caveat worth adding: because ReducerClass computes an average, reusing it as the combiner makes the final reducer average the per-file averages rather than the raw scores. In exact arithmetic that is only valid when every key has the same number of records on each map (true here: eight per student per file), and even then the repeated integer division drifts; student A's sixteen scores truly average 66, while the job prints 65. In general a combiner must be an associative, commutative aggregation.
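
For reference, a common combiner-safe pattern (my own sketch, not code from this post) is to emit partial (sum, count) pairs and divide only once, in the reducer. The class names AvgMapper, SumCountCombiner, and AvgReducer are hypothetical; they would be nested in a driver class just like MapperClass and ReducerClass above, with job.setCombinerClass(SumCountCombiner.class) and job.setMapOutputValueClass(Text.class) set in main, since the map output value type now differs from the final output type:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emit "score,1" so downstream steps can merge sums and counts.
public static class AvgMapper extends Mapper<Object, Text, Text, Text> {
    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer st = new StringTokenizer(value.toString());
        if (st.countTokens() < 2) return; // ignore blank lines
        String name = st.nextToken();
        int score = Integer.parseInt(st.nextToken());
        context.write(new Text(name), new Text(score + ",1"));
    }
}

// Combiner: merge partial (sum, count) pairs; no division happens here.
public static class SumCountCombiner extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text name, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0, count = 0;
        for (Text v : values) {
            String[] p = v.toString().split(",");
            sum += Long.parseLong(p[0]);
            count += Long.parseLong(p[1]);
        }
        context.write(name, new Text(sum + "," + count));
    }
}

// Reducer: merge the remaining pairs and divide exactly once at the end.
public static class AvgReducer extends Reducer<Text, Text, Text, IntWritable> {
    @Override
    protected void reduce(Text name, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0, count = 0;
        for (Text v : values) {
            String[] p = v.toString().split(",");
            sum += Long.parseLong(p[0]);
            count += Long.parseLong(p[1]);
        }
        context.write(name, new IntWritable((int) (sum / count)));
    }
}

Because the combiner only merges pairs and never divides, the framework is free to run it zero, one, or several times per key without changing the final result.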


