A Brief Introduction to hadoop-mapreduce-examples, a Hadoop 2.6.0 Subproject


Introduction

Anyone learning Hadoop knows how to run the examples that ship with it. Taking the well-known wordcount as an example, you would enter the following command:

hadoop org.apache.hadoop.examples.WordCount -D mapreduce.input.fileinputformat.split.maxsize=1 /wordcount/input /wordcount/output/result1

Some people use the following alternative instead:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /wordcount/input /wordcount/output/result1

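Compared with the first form, the jar command spares us from typing the fully qualified class name. The hadoop-mapreduce-examples project also provides other examples, such as one that estimates pi; we only need to remember its short name, pi, to run it: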

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 5 10

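These are only ready-made examples, so there is no strict need to dig any deeper, but this concise way of invoking them is well worth borrowing. This article analyzes how it is implemented, for readers who are interested.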

Source Code Analysis

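In this section we analyze the key source code of the hadoop-mapreduce-examples project to understand how the concise invocation works. The project's pom.xml configures org.apache.hadoop.examples.ExampleDriver as the entry point for the jar command, as follows: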

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-jar-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>org.apache.hadoop.examples.ExampleDriver</mainClass>
          </manifest>
        </archive>
      </configuration>
    </plugin>

As a result, when hadoop-mapreduce-examples-2.6.0.jar is run with the jar command, it is the main method of ExampleDriver that actually executes. ExampleDriver is implemented as follows:

    public class ExampleDriver {

      public static void main(String argv[]){
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
          pgd.addClass("wordcount", WordCount.class,
                       "A map/reduce program that counts the words in the input files.");
          // registration of other examples omitted
          pgd.addClass("pi", QuasiMonteCarlo.class, QuasiMonteCarlo.DESCRIPTION);
          // registration of other examples omitted
          exitCode = pgd.run(argv);
        }
        catch(Throwable e){
          e.printStackTrace();
        }

        System.exit(exitCode);
      }
    }

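The code above constructs a ProgramDriver instance and calls its addClass method with three arguments: the example's name (such as wordcount or pi), the Class that implements the example, and a description of the example. ProgramDriver's addClass method is implemented as follows: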

    public void addClass(String name, Class<?> mainClass, String description)
        throws Throwable {
      programs.put(name , new ProgramDescription(mainClass, description));
    }

First, it constructs a ProgramDescription object, whose constructor is as follows:

    public ProgramDescription(Class<?> mainClass,
                              String description)
        throws SecurityException, NoSuchMethodException {
      this.main = mainClass.getMethod("main", paramTypes);
      this.description = description;
    }

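Here main is of type java.lang.reflect.Method and holds the main method of the example's Class. The paramTypes field is not shown in this excerpt; for getMethod to find a standard main method, it has to describe a single String[] parameter.
Then the example name (such as wordcount or pi) and the ProgramDescription instance are registered in programs, whose type is defined as follows: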

    /**
     * A description of a program based on its class and a
     * human-readable description.
     */
    Map<String, ProgramDescription> programs;
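The registration step can be tried outside Hadoop with a small stand-alone sketch. Everything named here (RegistrySketch, HelloExample, NoMainExample, register) is hypothetical and not part of Hadoop, and the use of TreeMap is only a choice made for this sketch. It stores the resolved Method directly instead of a ProgramDescription, and shows that a class lacking a suitable main method is rejected when it is registered rather than when it is run.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an example program; not part of Hadoop.
class HelloExample {
    public static void main(String[] args) {
        System.out.println("hello, " + args.length + " argument(s)");
    }
}

// Hypothetical class without a main method, used to show early rejection.
class NoMainExample {
}

class RegistrySketch {
    // Short name -> resolved main method (assumption: a sorted map is used here
    // only so that the names print in a stable order).
    static final Map<String, Method> PROGRAMS = new TreeMap<>();

    // Resolve "public static void main(String[])" up front, so that a class
    // without it fails at registration time with NoSuchMethodException.
    static void register(String name, Class<?> mainClass) throws NoSuchMethodException {
        Method main = mainClass.getMethod("main", String[].class);
        PROGRAMS.put(name, main);
    }

    public static void main(String[] args) throws Exception {
        register("hello", HelloExample.class);
        System.out.println("registered: " + PROGRAMS.keySet());
        try {
            register("broken", NoMainExample.class);
        } catch (NoSuchMethodException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Resolving the method eagerly means a typo in the driver's registration code shows up as soon as the driver starts, not only when a user happens to pick that example.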

At the end of its main method, ExampleDriver calls ProgramDriver's run method, which is implemented as follows:

    public int run(String[] args)
      throws Throwable
    {
      // Make sure they gave us a program name.
      if (args.length == 0) {
        System.out.println("An example program must be given as the" +
                           " first argument.");
        printUsage(programs);
        return -1;
      }

      // And that it is good.
      ProgramDescription pgm = programs.get(args[0]);
      if (pgm == null) {
        System.out.println("Unknown program '" + args[0] + "' chosen.");
        printUsage(programs);
        return -1;
      }

      // Remove the leading argument and call main
      String[] new_args = new String[args.length - 1];
      for(int i=1; i < args.length; ++i) {
        new_args[i-1] = args[i];
      }
      pgm.invoke(new_args);
      return 0;
    }

ProgramDriver's run method performs the following steps:

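Check that at least one argument was given;
Use the first argument to look up the corresponding ProgramDescription instance in programs;
Pass the remaining arguments to the ProgramDescription's invoke method, which in turn runs the corresponding example.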
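The three steps above can be sketched in isolation as follows. To keep the focus on the dispatch logic, this sketch swaps the reflective Method for a plain lambda; DispatchSketch, the echo program, and the lastArgs field are hypothetical names introduced for illustration, and Arrays.copyOfRange replaces the manual copy loop of the original.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

class DispatchSketch {
    // Hypothetical registry: lambdas stand in for the reflective main methods.
    static final Map<String, Consumer<String[]>> PROGRAMS = new TreeMap<>();

    // Arguments received by the most recently run program, kept so that the
    // effect of the dispatch can be observed.
    static String[] lastArgs;

    static {
        PROGRAMS.put("echo", args -> {
            lastArgs = args;
            System.out.println(String.join(" ", args));
        });
    }

    static int run(String[] args) {
        // Step 1: a program name is required.
        if (args.length == 0) {
            System.out.println("A program name must be given as the first argument.");
            return -1;
        }
        // Step 2: look the name up in the registry.
        Consumer<String[]> program = PROGRAMS.get(args[0]);
        if (program == null) {
            System.out.println("Unknown program '" + args[0] + "' chosen.");
            return -1;
        }
        // Step 3: drop the leading name and hand the rest to the program.
        program.accept(Arrays.copyOfRange(args, 1, args.length));
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(run(new String[] {"echo", "5", "10"}));
        System.out.println(run(new String[] {"nope"}));
        System.out.println(run(new String[0]));
    }
}
```

As in the original run method, the return value only reports whether dispatch succeeded; what the program itself does with its arguments is up to the program.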

ProgramDescription's invoke method is implemented as follows:

    public void invoke(String[] args)
        throws Throwable {
      try {
        main.invoke(null, new Object[]{args});
      } catch (InvocationTargetException except) {
        throw except.getCause();
      }
    }

From this we can see that an example is ultimately executed by calling the main method of its Class through reflection.
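Two details of the reflective call are easy to miss: the String[] is wrapped in an Object[] before being passed to Method.invoke, and InvocationTargetException is unwrapped so the caller sees the exception that main actually threw. The following is a minimal sketch of both, using hypothetical names (InvokeSketch, FailingExample, invokeMain) that do not exist in Hadoop.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical program whose main fails when it receives no arguments.
class FailingExample {
    public static void main(String[] args) {
        if (args.length == 0) {
            throw new IllegalStateException("no input given");
        }
        System.out.println("got " + args.length + " argument(s)");
    }
}

class InvokeSketch {
    static void invokeMain(Class<?> mainClass, String[] args) throws Throwable {
        Method main = mainClass.getMethod("main", String[].class);
        try {
            // The receiver is null because main is static. Method.invoke takes
            // Object... so the String[] is wrapped in an Object[] to be passed
            // as ONE argument instead of being spread into several.
            main.invoke(null, new Object[] {args});
        } catch (InvocationTargetException e) {
            // Reflection wraps anything thrown by main; rethrow the original.
            throw e.getCause();
        }
    }

    public static void main(String[] args) throws Throwable {
        invokeMain(FailingExample.class, new String[] {"a", "b"});
        try {
            invokeMain(FailingExample.class, new String[0]);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Without the unwrapping step, a caller such as ExampleDriver would print a stack trace headed by InvocationTargetException, with the real failure buried as its cause.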

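Postscript: my book 《深入理解Spark：核心思想与源码分析》 (Understanding Spark in Depth: Core Ideas and Source Code Analysis) has now been officially published and is available on JD.com, Dangdang, Tmall, and other sites. Interested readers are welcome to buy it.

JD.com (currently running a "50 off orders over 150" promotion): http://item.jd.com/11846120.html

Dangdang: http://product.dangdang.com/23838168.html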

0 0