An Analysis of How ToolRunner Runs Hadoop Programs
To simplify running jobs from the command line, Hadoop ships with a few helper classes. GenericOptionsParser is a class that interprets the common Hadoop command-line options and, as needed, sets the corresponding values on a Configuration object. GenericOptionsParser is usually not used directly; the more convenient approach is to implement the Tool interface and run the application through ToolRunner, which calls GenericOptionsParser internally. From the ToolRunner Javadoc:
A utility to help run Tools.
ToolRunner can be used to run classes implementing Tool interface. It works in conjunction with GenericOptionsParser to parse the generic hadoop command line arguments and modifies the Configuration of the Tool. The application-specific options are passed along without being modified.
run
public static int run(Configuration conf, Tool tool, String[] args) throws Exception
- Runs the given Tool by Tool.run(String[]), after parsing with the given generic arguments. Uses the given Configuration, or builds one if null. Sets the Tool's configuration with the possibly modified version of the conf.
- Parameters:
- conf - Configuration for the Tool.
- tool - Tool to run.
- args - command-line arguments to the tool.
- Returns:
- exit code of the Tool.run(String[]) method.
- Throws:
- Exception
run
public static int run(Tool tool, String[] args) throws Exception
- Runs the Tool with its Configuration. Equivalent to run(tool.getConf(), tool, args).
- Parameters:
- tool - Tool to run.
- args - command-line arguments to the tool.
- Returns:
- exit code of the Tool.run(String[]) method.
- Throws:
- Exception
Both run methods are static, so they can be invoked directly through the class name.
In addition, there is one more method:
static void printGenericCommandUsage(PrintStream out)
Prints generic command-line arguments and usage information.
4. ToolRunner provides the following two conveniences:
(1) It creates a Configuration object for the Tool.
(2) It lets the program read parameter settings from the command line conveniently.
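ToolRunner's flow can be sketched without Hadoop at all. The following is a minimal, illustrative stand-in (the names `MiniToolRunner` and `parseGenericOptions` are hypothetical, not Hadoop classes): it consumes space-separated `-D key=value` pairs into a plain map, which plays the role of the Configuration, and hands only the remaining arguments to the tool. The real GenericOptionsParser also understands options such as `-conf`, `-fs`, and `-jt`, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in showing how generic options are split off
// from tool-specific arguments before Tool.run is called.
public class MiniToolRunner {

    /** Consumes "-D key=value" pairs into conf; returns the remaining args. */
    public static String[] parseGenericOptions(Map<String, String> conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        String[] toolArgs = parseGenericOptions(conf,
                new String[] {"-D", "color=yellow", "input", "output"});
        System.out.println(conf);                       // {color=yellow}
        System.out.println(String.join(" ", toolArgs)); // input output
    }
}
```

This is exactly why a Tool's run method receives only application-specific arguments: the generic ones have already been absorbed into the Configuration.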
The complete source code of ToolRunner is as follows:
package org.apache.hadoop.util;

import java.io.PrintStream;

import org.apache.hadoop.conf.Configuration;

/**
 * A utility to help run {@link Tool}s.
 *
 * <p><code>ToolRunner</code> can be used to run classes implementing
 * <code>Tool</code> interface. It works in conjunction with
 * {@link GenericOptionsParser} to parse the
 * <a href="{@docRoot}/org/apache/hadoop/util/GenericOptionsParser.html#GenericOptions">
 * generic hadoop command line arguments</a> and modifies the
 * <code>Configuration</code> of the <code>Tool</code>. The
 * application-specific options are passed along without being modified.
 * </p>
 *
 * @see Tool
 * @see GenericOptionsParser
 */
public class ToolRunner {

  /**
   * Runs the given <code>Tool</code> by {@link Tool#run(String[])}, after
   * parsing with the given generic arguments. Uses the given
   * <code>Configuration</code>, or builds one if null.
   *
   * Sets the <code>Tool</code>'s configuration with the possibly modified
   * version of the <code>conf</code>.
   *
   * @param conf <code>Configuration</code> for the <code>Tool</code>.
   * @param tool <code>Tool</code> to run.
   * @param args command-line arguments to the tool.
   * @return exit code of the {@link Tool#run(String[])} method.
   */
  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

  /**
   * Runs the <code>Tool</code> with its <code>Configuration</code>.
   *
   * Equivalent to <code>run(tool.getConf(), tool, args)</code>.
   *
   * @param tool <code>Tool</code> to run.
   * @param args command-line arguments to the tool.
   * @return exit code of the {@link Tool#run(String[])} method.
   */
  public static int run(Tool tool, String[] args) throws Exception {
    return run(tool.getConf(), tool, args);
  }

  /**
   * Prints generic command-line arguments and usage information.
   *
   * @param out stream to write usage information to.
   */
  public static void printGenericCommandUsage(PrintStream out) {
    GenericOptionsParser.printGenericCommandUsage(out);
  }
}
Unless explicitly turned off, Hadoop by default specifies two resources, loaded in-order from the classpath:
- core-default.xml : Read-only defaults for hadoop.
- core-site.xml: Site-specific configuration for a given hadoop installation.
This loading happens in Configuration's static initializer:

static {
  // print deprecation warning if hadoop-site.xml is found in classpath
  ClassLoader cL = Thread.currentThread().getContextClassLoader();
  if (cL == null) {
    cL = Configuration.class.getClassLoader();
  }
  if (cL.getResource("hadoop-site.xml") != null) {
    LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
        "Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
        + "mapred-site.xml and hdfs-site.xml to override properties of " +
        "core-default.xml, mapred-default.xml and hdfs-default.xml " +
        "respectively");
  }
  addDefaultResource("core-default.xml");
  addDefaultResource("core-site.xml");
}
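The practical effect of loading resources in order is that a later resource (core-site.xml) overrides an earlier one (core-default.xml) for any key they both define. A minimal plain-Java mimic of that semantics (the class `ResourceOrderDemo` and its maps are illustrative, not Hadoop code):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics Configuration's in-order resource loading,
// where a later resource overrides an earlier one on shared keys.
public class ResourceOrderDemo {

    /** Applies resources in order; later resources win on key collisions. */
    public static Map<String, String> loadInOrder(List<Map<String, String>> resources) {
        Map<String, String> conf = new HashMap<>();
        for (Map<String, String> r : resources) {
            conf.putAll(r);
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> coreDefault = new HashMap<>();
        coreDefault.put("hadoop.native.lib", "true");      // read-only default
        Map<String, String> coreSite = new HashMap<>();
        coreSite.put("hadoop.native.lib", "false");        // site-specific override
        Map<String, String> conf = loadInOrder(Arrays.asList(coreDefault, coreSite));
        System.out.println(conf.get("hadoop.native.lib")); // false
    }
}
```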
At the same time, the initializer checks whether hadoop-site.xml is still present on the classpath; if it is, a warning is printed reminding the user that this configuration file is deprecated.
Because Configuration implements Iterable over its entries, they can be traversed directly:

for (Entry<String, String> entry : conf) {
    .....
}
(4) About Tool
1. The sources of the Tool and Configurable interfaces are as follows:

package org.apache.hadoop.util;

import org.apache.hadoop.conf.Configurable;

public interface Tool extends Configurable {
    int run(String[] args) throws Exception;
}

package org.apache.hadoop.conf;

public interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}
2. The source of Configured is as follows:
package org.apache.hadoop.conf;

public class Configured implements Configurable {

    private Configuration conf;

    public Configured() {
        this(null);
    }

    public Configured(Configuration conf) {
        setConf(conf);
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Configuration getConf() {
        return conf;
    }
}
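The wiring between these pieces can be mimicked without Hadoop. In this sketch every name is hypothetical, not a Hadoop class: `MiniConfigured` plays Configured's role of simply holding the configuration, `EchoTool` plays a Tool that reads settings only through `getConf()`, and the static `run` plays ToolRunner.run's role of building a configuration when given null and injecting it via `setConf` before invoking the tool.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative, non-Hadoop mimic of the Configurable/Configured/Tool wiring.
public class WiringDemo {

    interface MiniConfigurable {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
    }

    // Plays the role of org.apache.hadoop.conf.Configured: stores the conf.
    static class MiniConfigured implements MiniConfigurable {
        private Map<String, String> conf;
        public void setConf(Map<String, String> conf) { this.conf = conf; }
        public Map<String, String> getConf() { return conf; }
    }

    // Plays the role of a Tool: succeeds only if the injected conf has "color".
    static class EchoTool extends MiniConfigured {
        int run(String[] args) {
            return getConf().containsKey("color") ? 0 : 1;
        }
    }

    // Plays the role of ToolRunner.run: builds a conf if null, injects it.
    static int run(Map<String, String> conf, EchoTool tool, String[] args) {
        if (conf == null) {
            conf = new HashMap<>();
        }
        conf.put("color", "yellow"); // stands in for generic-option parsing
        tool.setConf(conf);
        return tool.run(args);
    }

    public static void main(String[] args) {
        System.out.println(run(null, new EchoTool(), new String[0])); // 0
    }
}
```

The key point the sketch shows: the tool never constructs its own configuration; it trusts the runner to inject one before run is called.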
3. A simple example, ToolRunnerDemo, which prints every entry of its Configuration:

package org.jediael.hadoopdemo.toolrunnerdemo;

import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolRunnerDemo extends Configured implements Tool {

    static {
        //Configuration.addDefaultResource("hdfs-default.xml");
        //Configuration.addDefaultResource("hdfs-site.xml");
        //Configuration.addDefaultResource("mapred-default.xml");
        //Configuration.addDefaultResource("mapred-site.xml");
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        for (Entry<String, String> entry : conf) {
            System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ToolRunnerDemo(), args);
        System.exit(exitCode);
    }
}
Sample output (excerpt):

io.seqfile.compress.blocksize=1000000
keep.failed.task.files=false
mapred.disk.healthChecker.interval=60000
dfs.df.interval=60000
dfs.datanode.failed.volumes.tolerated=0
mapreduce.reduce.input.limit=-1
mapred.task.tracker.http.address=0.0.0.0:50060
mapred.used.genericoptionsparser=true
mapred.userlog.retain.hours=24
dfs.max.objects=0
mapred.jobtracker.jobSchedulable=org.apache.hadoop.mapred.JobSchedulable
mapred.local.dir.minspacestart=0
hadoop.native.lib=true
color=yello
68 68 3028
(5) A complete example: WordCount

package org.jediael.hadoopdemo.toolrunnerdemo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the Configuration injected by ToolRunner instead of creating a
        // fresh one, so that generic options (-D, -conf, ...) take effect.
        Configuration conf = getConf();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : -1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
[root@jediael project]# hadoop fs -mkdir wcin2
[root@jediael project]# hadoop fs -copyFromLocal /opt/jediael/apache-nutch-2.2.1/CHANGES.txt wcin2
[root@jediael project]# hadoop jar wordcount2.jar org.jediael.hadoopdemo.toolrunnerdemo.WordCount wcin2 wcout2