Hadoop MapReduce Programming

Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 02:28

Map Phase

Problem Definition: the SELECT Clause

Source Code

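import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// AirlineDataUtils is the author's own helper class (not shown in this post);
// import it from whatever package it lives in.
public class SelectClauseMRJob extends Configured implements Tool {

    public static class SelectClauseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Skip the CSV header row; emit the selected columns of every other row
            if (!AirlineDataUtils.isHeader(value)) {
                StringBuilder output = AirlineDataUtils.mergeStringArray(
                        AirlineDataUtils.getSelectResultsPerRow(value), ",");
                context.write(NullWritable.get(), new Text(output.toString()));
            }
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SelectClauseMRJob.class);
        job.setInputFormatClass(TextInputFormat.class);   // input format: one record per line
        job.setOutputFormatClass(TextOutputFormat.class); // output format: plain text
        job.setOutputKeyClass(NullWritable.class);        // output is CSV lines, so the key is NullWritable
        job.setOutputValueClass(Text.class);              // output value type; declare it explicitly
        job.setMapperClass(SelectClauseMapper.class);
        job.setNumReduceTasks(0);                         // map-only job: no Reducer

        // ToolRunner has already parsed the generic options (-D, -files, ...),
        // so args holds only the input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean status = job.waitForCompletion(true);     // true: print job progress
        return status ? 0 : 1;
    }

    public static void main(String... args) throws Exception {
        // Hand the Configuration to ToolRunner and propagate the job's exit code
        int exitCode = ToolRunner.run(new Configuration(), new SelectClauseMRJob(), args);
        System.exit(exitCode);
    }
}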
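The mapper depends on the author's AirlineDataUtils helper, which the post does not show. Below is a minimal, hypothetical stand-in written against plain Strings so it runs without Hadoop; in the real mapper you would pass value.toString(). The column indexes are an assumption based on the 29-column layout of the public airline on-time dataset, and the selected columns and their order are inferred from the sample output at the end of this post, so the author's actual helper may differ.

```java
// Hypothetical stand-in for the author's AirlineDataUtils (NOT the original code).
// Assumed airline on-time CSV layout (0-based indexes):
// 0 Year, 1 Month, 2 DayofMonth, 4 DepTime, 6 ArrTime, 11 ActualElapsedTime,
// 12 CRSElapsedTime, 14 ArrDelay, 15 DepDelay, 16 Origin, 17 Dest, 18 Distance
class AirlineSelectSketch {

    // Assumption: the header row of each yearly file starts with the column name "Year"
    static boolean isHeader(String line) {
        return line.startsWith("Year,");
    }

    // Project one CSV row onto the columns seen in the sample output:
    // MM/DD/YYYY, DepTime, ArrTime, Origin, Dest, Distance,
    // ActualElapsedTime, CRSElapsedTime, DepDelay, ArrDelay
    static String[] selectColumns(String line) {
        String[] f = line.split(",", -1); // -1 keeps trailing empty fields
        String date = String.format("%02d/%02d/%s",
                Integer.parseInt(f[1]), Integer.parseInt(f[2]), f[0]);
        return new String[] {
            date, f[4], f[6], f[16], f[17], f[18], f[11], f[12], f[15], f[14]
        };
    }

    // Join the selected fields with a separator, mirroring mergeStringArray(..., ",")
    static StringBuilder mergeStringArray(String[] parts, String sep) {
        return new StringBuilder(String.join(sep, parts));
    }

    public static void main(String[] args) {
        String header = "Year,Month,DayofMonth,DayOfWeek,DepTime";
        // Illustrative row: selected fields copied from the sample output, all others "NA"
        String row = "1988,12,11,NA,1325,NA,2051,NA,NA,NA,NA,326,309,NA,17,0,"
                + "HNL,LAX,2556,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA";
        for (String line : new String[] {header, row}) {
            if (!isHeader(line)) {
                System.out.println(mergeStringArray(selectColumns(line), ","));
                // prints 12/11/1988,1325,2051,HNL,LAX,2556,326,309,0,17
            }
        }
    }
}
```

Returning a StringBuilder from mergeStringArray only mirrors how the original mapper calls output.toString(); a plain String would work just as well.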

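Running the Job

The job is launched with hadoop jar, passing the HDFS input and output directories. Note that the log shows LocalJobRunner and the job ID job_local1082158209_0001: the job ran in local mode, reading from and writing to HDFS at 0.0.0.0:9000, rather than being submitted to YARN.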

hadoop jar '/home/atalisas/桌面/hadoop_final.jar' /user/atalisas/sampledata /user/atalisas/output/c5_select

17/09/16 21:39:26 INFO input.FileInputFormat: Total input files to process : 2
17/09/16 21:39:26 INFO mapreduce.JobSubmitter: number of splits:2
17/09/16 21:39:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1082158209_0001
17/09/16 21:39:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/09/16 21:39:27 INFO mapreduce.Job: Running job: job_local1082158209_0001
17/09/16 21:39:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/09/16 21:39:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:39:27 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:39:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/09/16 21:39:27 INFO mapred.LocalJobRunner: Waiting for map tasks
17/09/16 21:39:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1082158209_0001_m_000000_0
17/09/16 21:39:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:39:27 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:39:27 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/09/16 21:39:27 INFO mapred.MapTask: Processing split: hdfs://0.0.0.0:9000/user/atalisas/sampledata/1988.csv.bz2:0+49499025
17/09/16 21:39:27 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
17/09/16 21:39:27 INFO compress.CodecPool: Got brand-new decompressor [.bz2]
17/09/16 21:39:28 INFO mapreduce.Job: Job job_local1082158209_0001 running in uber mode : false
17/09/16 21:39:28 INFO mapreduce.Job:  map 0% reduce 0%
17/09/16 21:39:39 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:40 INFO mapreduce.Job:  map 11% reduce 0%
17/09/16 21:39:45 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:46 INFO mapreduce.Job:  map 17% reduce 0%
17/09/16 21:39:51 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:52 INFO mapreduce.Job:  map 23% reduce 0%
17/09/16 21:39:57 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:58 INFO mapreduce.Job:  map 28% reduce 0%
17/09/16 21:40:03 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:04 INFO mapreduce.Job:  map 34% reduce 0%
17/09/16 21:40:09 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:10 INFO mapreduce.Job:  map 40% reduce 0%
17/09/16 21:40:15 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:16 INFO mapreduce.Job:  map 46% reduce 0%
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:19 INFO mapred.Task: Task:attempt_local1082158209_0001_m_000000_0 is done. And is in the process of committing
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:19 INFO mapred.Task: Task attempt_local1082158209_0001_m_000000_0 is allowed to commit now
17/09/16 21:40:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1082158209_0001_m_000000_0' to hdfs://0.0.0.0:9000/user/atalisas/output/c5_select/_temporary/0/task_local1082158209_0001_m_000000
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map
17/09/16 21:40:19 INFO mapred.Task: Task 'attempt_local1082158209_0001_m_000000_0' done.
17/09/16 21:40:19 INFO mapred.LocalJobRunner: Finishing task: attempt_local1082158209_0001_m_000000_0
17/09/16 21:40:19 INFO mapred.LocalJobRunner: Starting task: attempt_local1082158209_0001_m_000001_0
17/09/16 21:40:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:40:19 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:40:19 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
17/09/16 21:40:19 INFO mapred.MapTask: Processing split: hdfs://0.0.0.0:9000/user/atalisas/sampledata/1987.csv.bz2:0+12652442
17/09/16 21:40:20 INFO mapreduce.Job:  map 100% reduce 0%
17/09/16 21:40:31 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapreduce.Job:  map 96% reduce 0%
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapred.Task: Task:attempt_local1082158209_0001_m_000001_0 is done. And is in the process of committing
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapred.Task: Task attempt_local1082158209_0001_m_000001_0 is allowed to commit now
17/09/16 21:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1082158209_0001_m_000001_0' to hdfs://0.0.0.0:9000/user/atalisas/output/c5_select/_temporary/0/task_local1082158209_0001_m_000001
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map
17/09/16 21:40:32 INFO mapred.Task: Task 'attempt_local1082158209_0001_m_000001_0' done.
17/09/16 21:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1082158209_0001_m_000001_0
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map task executor complete.
17/09/16 21:40:33 INFO mapreduce.Job:  map 100% reduce 0%
17/09/16 21:40:33 INFO mapreduce.Job: Job job_local1082158209_0001 completed successfully
17/09/16 21:40:33 INFO mapreduce.Job: Counters: 20
    File System Counters
        FILE: Number of bytes read=77331343
        FILE: Number of bytes written=78582382
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=111678140
        HDFS: Number of bytes written=528452993
        HDFS: Number of read operations=18
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=8
    Map-Reduce Framework
        Map input records=6513924
        Map output records=6513922
        Input split bytes=244
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=422
        Total committed heap usage (bytes)=567279616
    File Input Format Counters
        Bytes Read=62169899
    File Output Format Counters
        Bytes Written=293774629

Map output records is exactly 2 less than Map input records: the mapper skipped the header row of each of the two input files.

# Look at the last few lines of the output
hdfs dfs -tail output/c5_select/part-m-00000
12/10/1988,1325,2258,HNL,LAX,2556,453,309,0,144
12/11/1988,1325,2051,HNL,LAX,2556,326,309,0,17
12/12/1988,1325,2043,HNL,LAX,2556,318,309,0,9
12/13/1988,1325,2038,HNL,LAX,2556,313,309,0,4
12/14/1988,1325,2045,HNL,LAX,2556,320,309,0,11
12/01/1988,2027,2152,ATL,MCO,403,85,78,0,7
12/02/1988,2106,2229,ATL,MCO,403,83,78,39,44
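Because setNumReduceTasks(0) makes this a map-only job, there is no shuffle or sort: each map task writes its records straight to its own part-m-NNNNN file. That is consistent with the log above, which shows two map tasks (one per .bz2 file, since each file fit in a single split), Spilled Records=0 and Failed Shuffles=0, and an output file named part-m-00000. The stdlib-only simulation below is a hypothetical illustration of that flow, not Hadoop code: each inner list stands in for one input split, the header check is an assumed simplification, and no column projection is performed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stdlib-only simulation of a map-only job; this is NOT Hadoop code.
class MapOnlyJobSketch {

    // Per-task output files (keyed by file name) plus the two record counters
    static final class Result {
        final Map<String, List<String>> files = new LinkedHashMap<>();
        long mapInputRecords;
        long mapOutputRecords;
    }

    // Placeholder header check, assumed to behave like AirlineDataUtils.isHeader
    static boolean isHeader(String line) {
        return line.startsWith("Year,");
    }

    // One "map task" per input split; with zero reducers there is no shuffle,
    // so each task's records go straight into its own part-m-NNNNN file
    static Result run(List<List<String>> splits) {
        Result result = new Result();
        for (int task = 0; task < splits.size(); task++) {
            List<String> records = new ArrayList<>();
            for (String line : splits.get(task)) {
                result.mapInputRecords++;        // every line read counts as input
                if (!isHeader(line)) {
                    records.add(line);           // placeholder: no column projection here
                    result.mapOutputRecords++;
                }
            }
            result.files.put(String.format("part-m-%05d", task), records);
        }
        return result;
    }

    public static void main(String[] args) {
        // Two toy "files", each starting with a header row
        List<List<String>> splits = List.of(
                List.of("Year,Month,DayofMonth", "toy row 1", "toy row 2"),
                List.of("Year,Month,DayofMonth", "toy row 3"));
        Result r = run(splits);
        System.out.println(r.files.keySet());
        // prints [part-m-00000, part-m-00001]
        System.out.println("Map input records=" + r.mapInputRecords
                + ", Map output records=" + r.mapOutputRecords);
        // prints Map input records=5, Map output records=3
    }
}
```

The gap of one record per input file matches the 6513924 versus 6513922 counters in the real run.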