Hadoop MapReduce Programming
Source: Internet  Editor: 程序博客网  Date: 2024/05/22 02:28
The Map Phase

Problem definition – the SELECT clause

Source code
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SelectClauseMRJob extends Configured implements Tool {

    public static class SelectClauseMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Skip the CSV header row; for data rows, emit only the projected columns.
            if (!AirlineDataUtils.isHeader(value)) {
                StringBuilder output = AirlineDataUtils.mergeStringArray(
                        AirlineDataUtils.getSelectResultsPerRow(value), ",");
                context.write(NullWritable.get(), new Text(output.toString()));
            }
        }
    }

    @Override
    public int run(String[] strings) throws Exception {
        Job job = Job.getInstance(getConf());
        job.setJarByClass(SelectClauseMRJob.class);
        job.setInputFormatClass(TextInputFormat.class);   // input is consumed line by line
        job.setOutputFormatClass(TextOutputFormat.class); // output data format
        job.setOutputKeyClass(NullWritable.class);        // CSV output, so the key is null
        job.setOutputValueClass(Text.class);              // defaults to the input type; override if different
        job.setMapperClass(SelectClauseMapper.class);
        job.setNumReduceTasks(0);                         // map-only job: no reducer

        String[] args = new GenericOptionsParser(getConf(), strings).getRemainingArgs();
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean status = job.waitForCompletion(true); // true: print progress while running
        return status ? 0 : 1;
    }

    public static void main(String... args) throws Exception {
        Configuration configuration = new Configuration();
        // Pass the Configuration to ToolRunner (the original created it but never used it)
        // and propagate the job's exit code to the shell.
        System.exit(ToolRunner.run(configuration, new SelectClauseMRJob(), args));
    }
}
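The mapper above delegates all the real work to an AirlineDataUtils helper class that this post never shows. As a rough, self-contained sketch of what those three methods might look like — the header test ("first field is Year") and the projected column indices (the first three fields) are assumptions for illustration, not the actual implementation:

```java
import java.util.Arrays;

// A minimal sketch of the AirlineDataUtils helpers used by SelectClauseMapper.
// Method names come from the mapper code; the header check and the column
// selection below are assumptions, not the original implementation.
public class AirlineDataUtilsSketch {

    // The airline on-time CSV files begin with a header row such as
    // "Year,Month,DayofMonth,..." — assume any row starting with "Year" is the header.
    public static boolean isHeader(CharSequence row) {
        return row.toString().startsWith("Year");
    }

    // Project a fixed subset of columns out of a full CSV row
    // (here: the first three fields; the real helper picks the SELECT-clause columns).
    public static String[] getSelectResultsPerRow(CharSequence row) {
        String[] fields = row.toString().split(",");
        return Arrays.copyOfRange(fields, 0, 3);
    }

    // Join the projected fields back into one delimited output line.
    public static StringBuilder mergeStringArray(String[] values, String delimiter) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append(delimiter);
            sb.append(values[i]);
        }
        return sb;
    }

    public static void main(String[] args) {
        String header = "Year,Month,DayofMonth,DepTime,Origin,Dest";
        String row = "1988,12,10,1325,HNL,LAX";
        System.out.println(isHeader(header)); // true -> the mapper would skip this line
        System.out.println(mergeStringArray(getSelectResultsPerRow(row), ",")); // 1988,12,10
    }
}
```

With helpers of this shape, the mapper reduces to "skip the header, project each row, emit the joined fields with a null key", which is exactly why no reducer is needed.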
Running on the cluster
hadoop jar '/home/atalisas/桌面/hadoop_final.jar' /user/atalisas/sampledata /user/atalisas/output/c5_select
17/09/16 21:39:26 INFO input.FileInputFormat: Total input files to process : 2
17/09/16 21:39:26 INFO mapreduce.JobSubmitter: number of splits:2
17/09/16 21:39:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1082158209_0001
17/09/16 21:39:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/09/16 21:39:27 INFO mapreduce.Job: Running job: job_local1082158209_0001
17/09/16 21:39:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/09/16 21:39:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:39:27 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:39:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/09/16 21:39:27 INFO mapred.LocalJobRunner: Waiting for map tasks
17/09/16 21:39:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1082158209_0001_m_000000_0
17/09/16 21:39:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:39:27 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:39:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/09/16 21:39:27 INFO mapred.MapTask: Processing split: hdfs://0.0.0.0:9000/user/atalisas/sampledata/1988.csv.bz2:0+49499025
17/09/16 21:39:27 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
17/09/16 21:39:27 INFO compress.CodecPool: Got brand-new decompressor [.bz2]
17/09/16 21:39:28 INFO mapreduce.Job: Job job_local1082158209_0001 running in uber mode : false
17/09/16 21:39:28 INFO mapreduce.Job:  map 0% reduce 0%
17/09/16 21:39:39 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:40 INFO mapreduce.Job:  map 11% reduce 0%
17/09/16 21:39:45 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:46 INFO mapreduce.Job:  map 17% reduce 0%
17/09/16 21:39:51 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:52 INFO mapreduce.Job:  map 23% reduce 0%
17/09/16 21:39:57 INFO mapred.LocalJobRunner: map > map
17/09/16 21:39:58 INFO mapreduce.Job:  map 28% reduce 0%
17/09/16 21:40:03 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:04 INFO mapreduce.Job:  map 34% reduce 0%
17/09/16 21:40:09 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:10 INFO mapreduce.Job:  map 40% reduce 0%
17/09/16 21:40:15 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:16 INFO mapreduce.Job:  map 46% reduce 0%
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:19 INFO mapred.Task: Task:attempt_local1082158209_0001_m_000000_0 is done. And is in the process of committing
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:19 INFO mapred.Task: Task attempt_local1082158209_0001_m_000000_0 is allowed to commit now
17/09/16 21:40:19 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1082158209_0001_m_000000_0' to hdfs://0.0.0.0:9000/user/atalisas/output/c5_select/_temporary/0/task_local1082158209_0001_m_000000
17/09/16 21:40:19 INFO mapred.LocalJobRunner: map
17/09/16 21:40:19 INFO mapred.Task: Task 'attempt_local1082158209_0001_m_000000_0' done.
17/09/16 21:40:19 INFO mapred.LocalJobRunner: Finishing task: attempt_local1082158209_0001_m_000000_0
17/09/16 21:40:19 INFO mapred.LocalJobRunner: Starting task: attempt_local1082158209_0001_m_000001_0
17/09/16 21:40:19 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/09/16 21:40:19 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
17/09/16 21:40:19 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/09/16 21:40:19 INFO mapred.MapTask: Processing split: hdfs://0.0.0.0:9000/user/atalisas/sampledata/1987.csv.bz2:0+12652442
17/09/16 21:40:20 INFO mapreduce.Job:  map 100% reduce 0%
17/09/16 21:40:31 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapreduce.Job:  map 96% reduce 0%
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapred.Task: Task:attempt_local1082158209_0001_m_000001_0 is done. And is in the process of committing
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map > map
17/09/16 21:40:32 INFO mapred.Task: Task attempt_local1082158209_0001_m_000001_0 is allowed to commit now
17/09/16 21:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1082158209_0001_m_000001_0' to hdfs://0.0.0.0:9000/user/atalisas/output/c5_select/_temporary/0/task_local1082158209_0001_m_000001
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map
17/09/16 21:40:32 INFO mapred.Task: Task 'attempt_local1082158209_0001_m_000001_0' done.
17/09/16 21:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1082158209_0001_m_000001_0
17/09/16 21:40:32 INFO mapred.LocalJobRunner: map task executor complete.
17/09/16 21:40:33 INFO mapreduce.Job:  map 100% reduce 0%
17/09/16 21:40:33 INFO mapreduce.Job: Job job_local1082158209_0001 completed successfully
17/09/16 21:40:33 INFO mapreduce.Job: Counters: 20
        File System Counters
                FILE: Number of bytes read=77331343
                FILE: Number of bytes written=78582382
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=111678140
                HDFS: Number of bytes written=528452993
                HDFS: Number of read operations=18
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Map-Reduce Framework
                Map input records=6513924
                Map output records=6513922
                Input split bytes=244
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=422
                Total committed heap usage (bytes)=567279616
        File Input Format Counters
                Bytes Read=62169899
        File Output Format Counters
                Bytes Written=293774629

# Look at the last few lines of the output
hdfs dfs -tail output/c5_select/part-m-00000
12/10/1988,1325,2258,HNL,LAX,2556,453,309,0,144
12/11/1988,1325,2051,HNL,LAX,2556,326,309,0,17
12/12/1988,1325,2043,HNL,LAX,2556,318,309,0,9
12/13/1988,1325,2038,HNL,LAX,2556,313,309,0,4
12/14/1988,1325,2045,HNL,LAX,2556,320,309,0,11
12/01/1988,2027,2152,ATL,MCO,403,85,78,0,7
12/02/1988,2106,2229,ATL,MCO,403,83,78,39,44