Pro Hadoop (Part 9) - The Basics of a MapReduce Job - Running the Job
1.1 Running the Job
The end goal of configuring your MapReduce job is, of course, to run it. The MapReduceIntro.java example program demonstrates a simple way to run a job, as shown in Listing 2-1:
logger.info("Launching the job.");
/** Send the job configuration to the framework
 *  and request that the job be run. */
final RunningJob job = JobClient.runJob(conf);
logger.info("The job has completed.");
The runJob() method submits the configuration information to the framework and then blocks until the framework has finished the job. The returned job object holds the result information. The RunningJob class provides a number of methods for examining the result; probably the most useful one is job.isSuccessful().
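A minimal sketch of checking the result, using the same old-style org.apache.hadoop.mapred API as the book's example (the class name JobStatusCheck is hypothetical, and the job configuration details are elided; this requires a Hadoop runtime and is not runnable on its own):

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class JobStatusCheck {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(JobStatusCheck.class);
        // ... set input/output paths, mapper, reducer, key/value types here ...

        // runJob() blocks until the framework has finished the job.
        RunningJob job = JobClient.runJob(conf);

        // Inspect the completed job's status.
        if (job.isSuccessful()) {
            System.out.println("The job completed successfully.");
        } else {
            System.err.println("The job failed.");
            System.exit(1);
        }
    }
}
```

Because runJob() only returns after completion, isSuccessful() can be called immediately on the returned object; for jobs submitted asynchronously you would poll isComplete() first.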
Now run MapReduceIntro.java (using the ch2.jar file from the book's downloadable code):
hadoop jar DOWNLOAD_PATH/ch2.jar \
com.apress.hadoopbook.examples.ch2.MapReduceIntro
The output is as follows:
ch2.MapReduceIntroConfig: Generating 3 input files of random data, each record
is a random number TAB the input file name
ch2.MapReduceIntro: Launching the job.
jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
mapred.JobClient: Use GenericOptionsParser for parsing the arguments.
Applications should implement Tool for the same.
mapred.FileInputFormat: Total input paths to process : 3
mapred.FileInputFormat: Total input paths to process : 3
mapred.FileInputFormat: Total input paths to process : 3
mapred.FileInputFormat: Total input paths to process : 3
mapred.JobClient: Running job: job_local_0001
mapred.MapTask: numReduceTasks: 1
mapred.MapTask: io.sort.mb = 1
mapred.MapTask: data buffer = 796928/996160
mapred.MapTask: record buffer = 2620/3276
mapred.MapTask: Starting flush of map output
mapred.MapTask: bufstart = 0; bufend = 664; bufvoid = 996160
mapred.MapTask: kvstart = 0; kvend = 14; length = 3276
mapred.MapTask: Index: (0, 694, 694)
mapred.MapTask: Finished spill 0
mapred.LocalJobRunner: file:/tmp/MapReduceIntroInput/file-2:0+664
mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000000_0' to
file:/tmp/MapReduceIntroOutput
mapred.MapTask: numReduceTasks: 1
mapred.MapTask: io.sort.mb = 1
mapred.MapTask: data buffer = 796928/996160
mapred.MapTask: record buffer = 2620/3276
mapred.MapTask: Starting flush of map output
mapred.MapTask: bufstart = 0; bufend = 3418; bufvoid = 996160
mapred.MapTask: kvstart = 0; kvend = 72; length = 3276
mapred.MapTask: Index: (0, 3564, 3564)
mapred.MapTask: Finished spill 0
mapred.LocalJobRunner: file:/tmp/MapReduceIntroInput/file-1:0+3418
mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000001_0' to
file:/tmp/MapReduceIntroOutput
mapred.MapTask: numReduceTasks: 1
mapred.MapTask: io.sort.mb = 1
mapred.MapTask: data buffer = 796928/996160
mapred.MapTask: record buffer = 2620/3276
mapred.MapTask: Starting flush of map output
mapred.MapTask: bufstart = 0; bufend = 3986; bufvoid = 996160
mapred.MapTask: kvstart = 0; kvend = 84; length = 3276
mapred.MapTask: Index: (0, 4156, 4156)
mapred.MapTask: Finished spill 0
mapred.LocalJobRunner: file:/tmp/MapReduceIntroInput/file-0:0+3986
mapred.TaskRunner: Task 'attempt_local_0001_m_000002_0' done.
mapred.TaskRunner: Saved output of task 'attempt_local_0001_m_000002_0' to
file:/tmp/MapReduceIntroOutput
mapred.ReduceTask: Initiating final on-disk merge with 3 files
mapred.Merger: Merging 3 sorted segments
mapred.Merger: Down to the last merge-pass, with 3 segments left of total size:
8414 bytes
mapred.LocalJobRunner: reduce > reduce
mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
mapred.TaskRunner: Saved output of task 'attempt_local_0001_r_000000_0' to
file:/tmp/MapReduceIntroOutput
mapred.JobClient: Job complete: job_local_0001
mapred.JobClient: Counters: 11
mapred.JobClient: File Systems
mapred.JobClient: Local bytes read=230060
mapred.JobClient: Local bytes written=319797
mapred.JobClient: Map-Reduce Framework
mapred.JobClient: Reduce input groups=170
mapred.JobClient: Combine output records=0
mapred.JobClient: Map input records=170
mapred.JobClient: Reduce output records=170
mapred.JobClient: Map output bytes=8068
mapred.JobClient: Map input bytes=8068
mapred.JobClient: Combine input records=0
mapred.JobClient: Map output records=170
mapred.JobClient: Reduce input records=170
ch2.MapReduceIntro: The job has completed.
ch2.MapReduceIntro: The job completed successfully.
Congratulations, you have just run your first MapReduce job.
The reduce task produces a single output file, /tmp/MapReduceIntroOutput/part-00000, which contains a series of lines, each in the following format:
Number TAB file:/tmp/MapReduceIntroInput/file-N
The first thing you will notice is that the numbers are not in sequential order. The code that generated the input produced a random number as the key for each input line, but the example program told the framework that the keys are of type Text. The framework therefore sorted the numbers as character strings, not as numbers, which is not the ordering we wanted.
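This lexicographic ordering can be reproduced with plain Java string sorting (a standalone illustration, not Hadoop code; Text keys compare byte by byte, which for these ASCII digits matches String.compareTo):

```java
import java.util.Arrays;

public class TextSortDemo {
    public static void main(String[] args) {
        // Numeric keys, but stored as strings, as Text keys would be.
        String[] keys = {"9", "10", "2", "100", "25"};

        // Sorts character by character, not by numeric value.
        Arrays.sort(keys);

        System.out.println(Arrays.toString(keys));
        // Prints: [10, 100, 2, 25, 9]
    }
}
```

Every key starting with '1' sorts before every key starting with '2', regardless of magnitude. To get numeric ordering, the job would need to use a numeric key type such as LongWritable instead of Text.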