setNumMapTasks() has no effect when running in Eclipse


Scenario:

    I was using TotalOrderPartitioner to do a total sort, but the job kept failing with java.io.IOException: Wrong number of partitions in keyset:

14/05/11 17:22:56 INFO input.FileInputFormat: Total input paths to process : 1
14/05/11 17:22:56 WARN snappy.LoadSnappy: Snappy native library is available
14/05/11 17:22:56 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/05/11 17:22:56 INFO snappy.LoadSnappy: Snappy native library loaded
14/05/11 17:22:56 INFO partition.InputSampler: Using 81 samples
14/05/11 17:22:56 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/05/11 17:22:56 INFO compress.CodecPool: Got brand-new compressor
14/05/11 17:35:13 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/05/11 17:35:13 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/05/11 17:35:13 INFO input.FileInputFormat: Total input paths to process : 1
14/05/11 17:35:28 INFO mapred.JobClient: Running job: job_local2039601594_0001
14/05/11 17:35:29 INFO mapred.JobClient:  map 0% reduce 0%
14/05/11 17:35:58 INFO mapred.LocalJobRunner: Waiting for map tasks
14/05/11 17:35:58 INFO mapred.LocalJobRunner: Starting task: attempt_local2039601594_0001_m_000000_0
14/05/11 17:36:13 INFO util.ProcessTree: setsid exited with exit code 0
14/05/11 17:36:13 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@1b5dc81
14/05/11 17:36:13 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/unsorted_data:0+7151
14/05/11 17:36:13 INFO mapred.MapTask: io.sort.mb = 100
14/05/11 17:36:22 INFO mapred.MapTask: data buffer = 79691776/99614720
14/05/11 17:36:22 INFO mapred.MapTask: record buffer = 262144/327680
14/05/11 17:36:32 INFO compress.CodecPool: Got brand-new decompressor
14/05/11 17:36:32 INFO mapred.LocalJobRunner: Map task executor complete.
14/05/11 17:36:32 WARN mapred.LocalJobRunner: job_local2039601594_0001
java.lang.Exception: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.IllegalArgumentException: Can't read partitions file
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:676)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Wrong number of partitions in keyset
	at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:90)
	... 11 more
14/05/11 17:36:33 INFO mapred.JobClient: Job complete: job_local2039601594_0001
14/05/11 17:36:33 INFO mapred.JobClient: Counters: 0



Stepping through with the debugger showed that the exception is thrown by this code:
  // In TotalOrderPartitioner.java
  public void setConf(Configuration conf) {
    try {
      this.conf = conf;
      String parts = getPartitionFile(conf);
      final Path partFile = new Path(parts);
      final FileSystem fs = (DEFAULT_PATH.equals(parts))
        ? FileSystem.getLocal(conf)     // assume in DistributedCache
        : partFile.getFileSystem(conf);


      Job job = new Job(conf);
      Class<K> keyClass = (Class<K>)job.getMapOutputKeyClass();
      K[] splitPoints = readPartitions(fs, partFile, keyClass, conf);
      // Values observed in the debugger:
      // splitPoints.length      : 3
      // job.getNumReduceTasks() : 1

      if (splitPoints.length != job.getNumReduceTasks() - 1) {
        throw new IOException("Wrong number of partitions in keyset");
      }
      ...
    }
  }
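The check above simply compares the number of split points against numReduceTasks - 1: a partition file for R reducers must contain exactly R - 1 split points. Here is a pure-JDK sketch of that invariant and of the binary-search partitioning it guards (no Hadoop dependency; the class and method names are illustrative, not Hadoop's API):

```java
import java.util.Arrays;

public class SplitPointCheck {

    // Mimics TotalOrderPartitioner's consistency check in setConf().
    static void checkPartitions(String[] splitPoints, int numReduceTasks) {
        if (splitPoints.length != numReduceTasks - 1) {
            throw new IllegalStateException("Wrong number of partitions in keyset");
        }
    }

    // Binary-search partitioning over the sorted split points: keys below the
    // first split point go to partition 0, and so on.
    static int getPartition(String key, String[] splitPoints) {
        int pos = Arrays.binarySearch(splitPoints, key);
        return pos < 0 ? -pos - 1 : pos + 1;
    }

    public static void main(String[] args) {
        String[] splitPoints = {"g", "p"};   // partition file written for 3 reducers

        checkPartitions(splitPoints, 3);     // 2 == 3 - 1, passes
        System.out.println(getPartition("a", splitPoints)); // 0
        System.out.println(getPartition("k", splitPoints)); // 1
        System.out.println(getPartition("z", splitPoints)); // 2

        // If the framework silently drops the job to 1 reducer, 2 != 1 - 1 and
        // the check fails with the same message seen in the log above.
        try {
            checkPartitions(splitPoints, 1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

So the exception is not about the partition file being unreadable at all; it means the reducer count at task time no longer matches the count the partition file was built for.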


In my main() I had explicitly set the number of reduce tasks:
job.setNumReduceTasks( 3 );


Clearly, the MapReduce framework itself was overwriting this value somewhere during execution!

Digging further:

// In JobClient.java (the original post said JobConf, but init(JobConf) lives in JobClient)
public void init(JobConf conf) throws IOException {
    // Here
    String tracker = conf.get("mapred.job.tracker", "local");
    tasklogtimeout = conf.getInt(
        TASKLOG_PULL_TIMEOUT_KEY, DEFAULT_TASKLOG_TIMEOUT);
    this.ugi = UserGroupInformation.getCurrentUser();
    // Here
    if ("local".equals(tracker)) {
      conf.setNumMapTasks(1);
      this.jobSubmitClient = new LocalJobRunner(conf);
    } else {
      this.rpcJobSubmitClient =
          createRPCProxy(JobTracker.getAddress(conf), conf);
      this.jobSubmitClient = createProxy(this.rpcJobSubmitClient, conf);
    }
}

The mapred.job.tracker property specifies the host and port of the JobTracker (e.g. localhost:9001). My deployment is pseudo-distributed Hadoop, so that property is set in mapred-site.xml; an Eclipse run, however, never sees it and falls back to the default "local".
Note that the snippet above only forces the number of map tasks to 1. In the Hadoop 1.x sources, LocalJobRunner likewise allows at most one reduce task in local mode, which is why job.getNumReduceTasks() came back as 1 instead of 3.
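The local-mode fallback boils down to the property lookup itself. A plain-JDK sketch of that lookup (illustrative names; Hadoop's real Configuration.get(key, default) behaves the same way for this purpose):

```java
import java.util.HashMap;
import java.util.Map;

public class TrackerLookup {

    // Mirrors conf.get("mapred.job.tracker", "local") in JobClient.init().
    static String tracker(Map<String, String> conf) {
        return conf.getOrDefault("mapred.job.tracker", "local");
    }

    public static void main(String[] args) {
        // Eclipse run: no mapred-site.xml on the classpath, so nothing is loaded.
        Map<String, String> eclipseRun = new HashMap<>();

        // Cluster run: the hadoop launcher loads mapred-site.xml.
        Map<String, String> clusterRun = new HashMap<>();
        clusterRun.put("mapred.job.tracker", "localhost:9001");

        System.out.println(tracker(eclipseRun));  // "local" -> LocalJobRunner
        System.out.println(tracker(clusterRun));  // "localhost:9001" -> real JobTracker
    }
}
```

An unconfigured Eclipse run therefore always takes the "local" branch, regardless of what the job sets in code.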


Cause:

    The program was run from Eclipse through the hadoop-eclipse plugin, which only submits jobs in local mode. Running in any other mode requires additional configuration.

Solution:

    Package the program into a jar and submit it with the hadoop command instead of running it from Eclipse.
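Concretely, the workflow looks roughly like this (the jar name, driver class name, and HDFS paths below are hypothetical; adjust them to your project):

```shell
# Package the compiled classes into a jar instead of launching from Eclipse:
jar cf sort.jar -C bin/ .

# Submit with the hadoop launcher, which loads conf/mapred-site.xml and
# therefore talks to the real (pseudo-distributed) JobTracker, so
# setNumReduceTasks(3) is honored:
hadoop jar sort.jar TotalSort /input/unsorted_data /output/sorted
```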
