Hadoop2.x编程入门实例:MaxTemperature

来源:互联网 发布:jdk 7u9 windows x64 编辑:程序博客网 时间:2024/05/11 05:22

Hadoop2.x编程入门实例:MaxTemperature

@(博客文章)[hadoop]

  • Hadoop2x编程入门实例MaxTemperature
    • 一前期准备
    • 二编写代码
      • 1创建Map
      • 2创建Reduce
      • 3创建main方法
      • 4导出成MaxTempjar并上传至运行程序的服务器
    • 三运行程序
      • 1创建input目录并将sampletxt复制到input目录
      • 2运行程序
      • 3查看结果

注意:以下内容在2.x版本与1.x版本同样适用,已在2.4.1与1.2.0进行测试。

一、前期准备

1、创建伪分布Hadoop环境,请参考官方文档。或者http://blog.csdn.net/jediael_lu/article/details/38637277

2、准备数据文件如下sample.txt:

123456798676231190101234567986762311901012345679867623119010123456798676231190101234561+00121534567890356123456798676231190101234567986762311901012345679867623119010123456798676231190101234562+01122934567890456123456798676231190201234567986762311901012345679867623119010123456798676231190101234562+02120234567893456123456798676231190401234567986762311901012345679867623119010123456798676231190101234561+00321234567803456123456798676231190101234567986762311902012345679867623119010123456798676231190101234561+00429234567903456123456798676231190501234567986762311902012345679867623119010123456798676231190101234561+01021134568903456123456798676231190201234567986762311902012345679867623119010123456798676231190101234561+01124234578903456123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+04121234678903456123456798676231190301234567986762311905012345679867623119010123456798676231190101234561+00821235678903456

二、编写代码

1、创建Map

package org.jediael.hadoopDemo.maxtemperature;import java.io.IOException;import org.apache.hadoop.io.IntWritable;import org.apache.hadoop.io.LongWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Mapper;public class MaxTemperatureMapper extends        Mapper<LongWritable, Text, Text, IntWritable> {    private static final int MISSING = 9999;    @Override    public void map(LongWritable key, Text value, Context context)            throws IOException, InterruptedException {        String line = value.toString();        String year = line.substring(15, 19);        int airTemperature;        if (line.charAt(87) == '+') { // parseInt doesn't like leading plus                                        // signs            airTemperature = Integer.parseInt(line.substring(88, 92));        } else {            airTemperature = Integer.parseInt(line.substring(87, 92));        }        String quality = line.substring(92, 93);        if (airTemperature != MISSING && quality.matches("[01459]")) {            context.write(new Text(year), new IntWritable(airTemperature));        }    }}

2、创建Reduce

package org.jediael.hadoopDemo.maxtemperature;import java.io.IOException;import org.apache.hadoop.io.IntWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Reducer;public class MaxTemperatureReducer extends        Reducer<Text, IntWritable, Text, IntWritable> {    @Override    public void reduce(Text key, Iterable<IntWritable> values, Context context)            throws IOException, InterruptedException {        int maxValue = Integer.MIN_VALUE;        for (IntWritable value : values) {            maxValue = Math.max(maxValue, value.get());        }        context.write(key, new IntWritable(maxValue));    }}

3、创建main方法

package org.jediael.hadoopDemo.maxtemperature;import org.apache.hadoop.fs.Path;import org.apache.hadoop.io.IntWritable;import org.apache.hadoop.io.Text;import org.apache.hadoop.mapreduce.Job;import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;public class MaxTemperature {    public static void main(String[] args) throws Exception {        if (args.length != 2) {            System.err                    .println("Usage: MaxTemperature <input path> <output path>");            System.exit(-1);        }        Job job = new Job();        job.setJarByClass(MaxTemperature.class);        job.setJobName("Max temperature");        FileInputFormat.addInputPath(job, new Path(args[0]));        FileOutputFormat.setOutputPath(job, new Path(args[1]));        job.setMapperClass(MaxTemperatureMapper.class);        job.setReducerClass(MaxTemperatureReducer.class);        job.setOutputKeyClass(Text.class);        job.setOutputValueClass(IntWritable.class);        System.exit(job.waitForCompletion(true) ? 0 : 1);    }}

4、导出成MaxTemp.jar,并上传至运行程序的服务器。

三、运行程序

1、创建input目录并将sample.txt复制到input目录

hadoop fs -put sample.txt /

2、运行程序

export HADOOP_CLASSPATH=MaxTemp.jar hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output10

注意输出目录不能已经存在,否则会创建失败。

3、查看结果

(1)查看结果

[jediael@jediael44 code]$  hadoop fs -cat output10/*14/07/09 14:51:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable1901    421902    2121903    4121904    321905    102

(2)运行时输出

[jediael@jediael44 code]$  hadoop org.jediael.hadoopDemo.maxtemperature.MaxTemperature /sample.txt output1014/07/09 14:50:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable14/07/09 14:50:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:803214/07/09 14:50:42 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.14/07/09 14:50:43 INFO input.FileInputFormat: Total input paths to process : 114/07/09 14:50:43 INFO mapreduce.JobSubmitter: number of splits:114/07/09 14:50:44 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1404888618764_000114/07/09 14:50:44 INFO impl.YarnClientImpl: Submitted application application_1404888618764_000114/07/09 14:50:44 INFO mapreduce.Job: The url to track the job: http://jediael44:8088/proxy/application_1404888618764_0001/14/07/09 14:50:44 INFO mapreduce.Job: Running job: job_1404888618764_000114/07/09 14:50:57 INFO mapreduce.Job: Job job_1404888618764_0001 running in uber mode : false14/07/09 14:50:57 INFO mapreduce.Job:  map 0% reduce 0%14/07/09 14:51:05 INFO mapreduce.Job:  map 100% reduce 0%14/07/09 14:51:15 INFO mapreduce.Job:  map 100% reduce 100%14/07/09 14:51:15 INFO mapreduce.Job: Job job_1404888618764_0001 completed successfully14/07/09 14:51:16 INFO mapreduce.Job: Counters: 49        File System Counters                FILE: Number of bytes read=94                FILE: Number of bytes written=185387                FILE: Number of read operations=0                FILE: Number of large read operations=0                FILE: Number of write operations=0                HDFS: Number of bytes read=1051                HDFS: Number of bytes written=43                HDFS: Number of read operations=6                HDFS: Number of large read operations=0                HDFS: Number of write operations=2        Job Counters                 Launched map tasks=1                Launched reduce tasks=1                Data-local map tasks=1                Total time spent by all maps in occupied slots (ms)=5812                Total time spent by all reduces in occupied slots (ms)=7023                Total time spent by all map tasks (ms)=5812                Total time spent by all reduce tasks (ms)=7023                Total vcore-seconds taken by all map tasks=5812                Total vcore-seconds taken by all reduce tasks=7023                Total megabyte-seconds taken by all map tasks=5951488                Total megabyte-seconds taken by all reduce tasks=7191552        Map-Reduce Framework                Map input records=9                Map output records=8                Map output bytes=72                Map output materialized bytes=94                Input split bytes=97                Combine input records=0                Combine output records=0                Reduce input groups=5                Reduce shuffle bytes=94                Reduce input records=8                Reduce output records=5                Spilled Records=16                Shuffled Maps =1                Failed Shuffles=0                Merged Map outputs=1                GC time elapsed (ms)=154                CPU time spent (ms)=1450                Physical memory (bytes) snapshot=303112192                Virtual memory (bytes) snapshot=1685733376                Total committed heap usage (bytes)=136515584        Shuffle Errors                BAD_ID=0                CONNECTION=0                IO_ERROR=0                WRONG_LENGTH=0                WRONG_MAP=0                WRONG_REDUCE=0        File Input Format Counters                 Bytes Read=954        File Output Format Counters                 Bytes Written=43
0 0