Hadoop Streaming Input and Output

Source: Internet | Editor: 程序博客网 | Time: 2024/06/05 16:54

StreamJob.java

run() method:

init(); creates the Environment object env_

preProcessArgs();

parseArgv(); parses the Hadoop Streaming command-line arguments and assigns them to StreamJob member variables

postProcessArgs(); checks the input arguments for completeness, validity, and sufficiency

setJobConf(); configures the MapReduce job's parameters from the command-line arguments parsed above
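The call order in run() can be sketched as a plain-Java skeleton with no Hadoop dependency. This is a hypothetical model, not the StreamJob source: the class name StreamJobSketch, the step log, the "-name value" argument format, the list of required options, and the map standing in for jobConf_ are all assumptions for illustration. Only the five method names and their order come from the notes above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical skeleton of StreamJob.run(): mirrors the call order only,
// not Hadoop's real implementation.
class StreamJobSketch {
    private final String[] argv;
    Map<String, String> env;                              // stands in for env_
    final Map<String, String> options = new HashMap<>(); // stands in for StreamJob member variables
    final Map<String, String> jobConf = new HashMap<>(); // stands in for jobConf_
    final List<String> steps = new ArrayList<>();        // records the order of the calls

    StreamJobSketch(String[] argv) { this.argv = argv; }

    int run() {
        init();
        preProcessArgs();
        parseArgv();
        postProcessArgs();
        setJobConf();
        return 0;
    }

    void init() {
        steps.add("init");
        // Assumption: env_ is modeled as a copy of the process environment.
        env = new HashMap<>(System.getenv());
    }

    void preProcessArgs() { steps.add("preProcessArgs"); }

    void parseArgv() {
        steps.add("parseArgv");
        // Assumption: every option is a "-name value" pair.
        for (int i = 0; i + 1 < argv.length; i += 2) {
            options.put(argv[i], argv[i + 1]);
        }
    }

    void postProcessArgs() {
        steps.add("postProcessArgs");
        // Assumption: these three options are treated as required.
        for (String required : new String[] {"-input", "-output", "-mapper"}) {
            if (!options.containsKey(required)) {
                throw new IllegalArgumentException("Required argument: " + required);
            }
        }
    }

    void setJobConf() {
        steps.add("setJobConf");
        // Copy the validated options into the (simulated) job configuration.
        jobConf.put("input", options.get("-input"));
        jobConf.put("output", options.get("-output"));
        jobConf.put("mapper", options.get("-mapper"));
    }
}
```

Validation sits between parsing and configuration so that setJobConf() can assume every value it reads is present.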

JobConf: jobConf_ : general MapRed job properties

Configuration: config_ : passed as the parameter when creating the JobConf object.

Class fmt = TextInputFormat.class;  (TextInputFormat is the default input format)
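The default-or-override choice behind fmt can be modeled in plain Java. This is a sketch under stated assumptions: InputFormatLike, TextInputFormatStub, KeyValueInputFormatStub and InputFormatChooser are hypothetical stand-ins rather than Hadoop classes, and the Class.forName lookup is a generic reflection technique, not necessarily how StreamJob resolves a user-supplied input format name.

```java
// Hypothetical stand-ins for Hadoop's input format types (no Hadoop dependency).
interface InputFormatLike {}
class TextInputFormatStub implements InputFormatLike {}
class KeyValueInputFormatStub implements InputFormatLike {}

class InputFormatChooser {
    // Returns the user-specified input format class if one was given,
    // otherwise falls back to the text input format stub.
    static Class<? extends InputFormatLike> choose(String inputFormatSpec)
            throws ClassNotFoundException {
        if (inputFormatSpec == null) {
            return TextInputFormatStub.class; // the default, like "fmt = TextInputFormat.class"
        }
        // Load the named class and reject anything that is not an input format
        // (asSubclass throws ClassCastException in that case).
        return Class.forName(inputFormatSpec).asSubclass(InputFormatLike.class);
    }
}
```

For example, choose(null) yields TextInputFormatStub.class, while choose(KeyValueInputFormatStub.class.getName()) yields the override.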

TextInputFormat extends FileInputFormat, which implements the InputFormat interface:

public interface InputFormat<K,V>

InputFormat describes the input-specification for a Map-Reduce job.

The Map-Reduce framework relies on the InputFormat of the job to:

  1. Validate the input-specification of the job.
  2. Split-up the input file(s) into logical InputSplits, each of which is then assigned to an individual Mapper.
  3. Provide the RecordReader implementation to be used to glean input records from the logical InputSplit for processing by the Mapper.
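The three responsibilities can be illustrated with a small in-memory model of what a line-oriented format like TextInputFormat does: validate the input, cut it into byte-range splits, and read (byte offset, line) records from each split. This is a hypothetical sketch, not Hadoop code. LineSplitModel and its methods are invented names, splits are fixed-size ranges over a byte array, and only '\n' line endings are handled. Each split owns the lines whose first byte falls inside it, so it skips a leading partial line and reads past its end to finish its last line; Hadoop's line record reader likewise ensures every line is read exactly once.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of TextInputFormat-style input handling.
class LineSplitModel {
    // A logical split: the byte range [start, end) of the input.
    record Split(int start, int end) {}

    // A record as a line-oriented format presents it: line's byte offset, line text.
    record Line(long offset, String text) {}

    // 1. Validate the input specification.
    static void validate(byte[] data, int splitSize) {
        if (data == null) throw new IllegalArgumentException("no input");
        if (splitSize <= 0) throw new IllegalArgumentException("splitSize must be positive");
    }

    // 2. Cut the input into fixed-size logical splits, one per mapper.
    static List<Split> getSplits(byte[] data, int splitSize) {
        validate(data, splitSize);
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < data.length; start += splitSize) {
            splits.add(new Split(start, Math.min(start + splitSize, data.length)));
        }
        return splits;
    }

    // 3. Read the records of one split: every line whose first byte lies in
    //    [start, end); the last such line may extend past end.
    static List<Line> readRecords(byte[] data, Split split) {
        int pos = split.start();
        if (pos != 0 && data[pos - 1] != '\n') {
            // Starting mid-line: that line belongs to the previous split, so skip it.
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step over the newline itself
        }
        List<Line> records = new ArrayList<>();
        while (pos < split.end()) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            String text = new String(data, lineStart, pos - lineStart, StandardCharsets.UTF_8);
            records.add(new Line(lineStart, text));
            pos++; // step over the newline (or past the end of the data)
        }
        return records;
    }
}
```

With the input "alpha\nbeta\ngamma\n" and a split size of 8 bytes, the three splits together yield (0, alpha), (6, beta) and (11, gamma). The third split produces no records because it starts inside the last line.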

