MapReduce Global Variables: A Bug Hunt

Source: Internet  Published by: 处方软件  Editor: 程序博客网  Time: 2024/05/23 23:09

Global variables

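A MapReduce program sometimes needs global variables. There are three common ways to implement them:
  • Pass the value through the job's Configuration: call conf.set() when the job is initialized, then read it back with conf.get() wherever it is needed. Drawback: this is not suitable for sharing large amounts of data.
  • Use DistributedCache.
  • Use HDFS: write the global variable to a file and read it back from that file when it is needed.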

Finding the problem

The global variable was set with the code below, yet the Mapper could not read the "deadline" setting from its Configuration.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }

    Job job = new Job(conf, "word count");
    //job.getCluster().getClusterStatus().getMapSlotCapacity();
    // "deadline" is set on conf AFTER the Job has been created;
    // this is the value the Mapper fails to read.
    conf.set("deadline", new Date().toString());
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }

Solving the problem

A colleague's code, however, works. Here it is:
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }

    Job job = new Job(conf, "word count");
    // Set "deadline" on the Configuration that the Job itself holds.
    job.getConfiguration().set("deadline", new Date().toString());
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
Or:
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // Set "deadline" BEFORE the Job is created from conf.
    conf.set("deadline", new Date().toString());

    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }

Analyzing the problem

Tracing the code:
Job job = new Job(conf, "word count");
  @Deprecated
  public Job(Configuration conf, String jobName) throws IOException {
    this(conf);
    setJobName(jobName);
  }

  @Deprecated
  public Job(Configuration conf) throws IOException {
    this(new JobConf(conf));
  }
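The constructor wraps the conf it receives in new JobConf(conf), which copies its properties into a new object. From that point on, the conf inside the Job and the conf in main() are no longer the same object, so a conf.set() made in main() after the Job has been constructed never reaches the job. That is what caused the problem.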
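The copy behaviour can be reproduced without a cluster. The sketch below is only a simplified model: Conf and Job here are hypothetical stand-ins written for illustration, not the Hadoop classes, and "some-deadline" is a placeholder value. It assumes nothing beyond what the constructors above show, namely that the Job keeps a copy of the configuration it is given.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfCopyDemo {

    /** Hypothetical stand-in for Configuration: a plain string-to-string map. */
    public static class Conf {
        private final Map<String, String> props = new HashMap<>();

        public Conf() {}

        /** Copy constructor: takes a snapshot of the other object's properties. */
        public Conf(Conf other) {
            props.putAll(other.props);
        }

        public void set(String key, String value) { props.put(key, value); }

        public String get(String key) { return props.get(key); }
    }

    /** Hypothetical stand-in for Job: it keeps its own copy of the Conf. */
    public static class Job {
        private final Conf conf;

        public Job(Conf conf) { this.conf = new Conf(conf); }

        public Conf getConfiguration() { return conf; }
    }

    /** The failing pattern: set on the original conf after the Job exists. */
    public static String setAfterJobCreation() {
        Conf conf = new Conf();
        Job job = new Job(conf);
        conf.set("deadline", "some-deadline");
        return job.getConfiguration().get("deadline");
    }

    /** First fix: set on the Job's own configuration. */
    public static String setOnJobConfiguration() {
        Conf conf = new Conf();
        Job job = new Job(conf);
        job.getConfiguration().set("deadline", "some-deadline");
        return job.getConfiguration().get("deadline");
    }

    /** Second fix: set on conf before the Job copies it. */
    public static String setBeforeJobCreation() {
        Conf conf = new Conf();
        conf.set("deadline", "some-deadline");
        Job job = new Job(conf);
        return job.getConfiguration().get("deadline");
    }

    public static void main(String[] args) {
        System.out.println("set after Job creation:  " + setAfterJobCreation());
        System.out.println("set on job's own conf:   " + setOnJobConfiguration());
        System.out.println("set before Job creation: " + setBeforeJobCreation());
    }
}
```

In this model only the first method returns null; the other two return the value, which matches the behaviour of the three main() variants shown earlier.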

Summary

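Why the Configuration global variable was never set: the Configuration on which the parameter was set and the Configuration from which it was read were not the same object.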

