File Copy Example
Specifying the map output types
Author: woodbow  Date: February 25, 2013
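When writing a map/reduce job in Hadoop, if the map output types are not specified, they default to the job's output types, so map and reduce emit the same types. setOutputKeyClass and setOutputValueClass set the output types for both map and reduce; to set the map output types separately, use setMapOutputKeyClass and setMapOutputValueClass. The following example, which copies a file with map/reduce, illustrates this.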
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class cpFileMR {

    // map: pass each (byte offset, line) pair through unchanged
    public static class cpFileMRMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, value);
        }
    }

    // reduce: write each line as the output key, with an empty value
    public static class cpFileMRReducer extends Reducer<LongWritable, Text, Text, Text> {
        public void reduce(LongWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text val : values) {
                context.write(val, new Text(""));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: cpFileMR <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf, "first program");
        job.setJarByClass(cpFileMR.class);
        job.setMapperClass(cpFileMRMapper.class);
        job.setReducerClass(cpFileMRReducer.class);
        // Job (reduce) output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Map output types differ from the reduce output types, so set them separately
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    } // END: main
} // END: cpFileMR
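The fallback rule described above can be modeled in a few lines of plain Java. This is only a sketch: OutputTypeFallback is a hypothetical stand-in rather than the Hadoop Job class, class names are represented as strings, and the settings keys are invented for illustration. It shows why the example has to call setMapOutputKeyClass: the mapper emits LongWritable keys while the job output key type is Text.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a job's type settings. This is NOT the Hadoop Job
// class; class names are plain strings and the settings keys are hypothetical.
class OutputTypeFallback {
    private final Map<String, String> settings = new HashMap<>();

    void setOutputKeyClass(String cls) { settings.put("output.key", cls); }
    void setMapOutputKeyClass(String cls) { settings.put("map.output.key", cls); }

    String getOutputKeyClass() { return settings.get("output.key"); }

    // Use the map-specific key type if one was set; otherwise fall back to
    // the job output key type.
    String getMapOutputKeyClass() {
        String cls = settings.get("map.output.key");
        return cls != null ? cls : getOutputKeyClass();
    }

    // True when the key type emitted by the mapper matches the expected one.
    boolean mapKeyAccepted(String emittedClass) {
        return emittedClass.equals(getMapOutputKeyClass());
    }

    public static void main(String[] args) {
        OutputTypeFallback job = new OutputTypeFallback();
        job.setOutputKeyClass("Text");
        System.out.println(job.getMapOutputKeyClass());          // Text (fallback)
        System.out.println(job.mapKeyAccepted("LongWritable"));  // false

        job.setMapOutputKeyClass("LongWritable");
        System.out.println(job.getMapOutputKeyClass());          // LongWritable
        System.out.println(job.mapKeyAccepted("LongWritable"));  // true
    }
}
```

In the real job, leaving out the two setMapOutput*Class calls would make the framework expect Text keys from a mapper that actually writes LongWritable keys, which is the mismatch the model reports as false.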
# Run
[hadoop@node01 yaoyao]$ cat a.sh
javac -d cpFileMR cpFileMR.java
jar -cvf cpFileMR.jar -C cpFileMR/ .
[hadoop@node01 yaoyao]$ sh a.sh
added manifest
adding: cpFileMR$cpFileMRReducer.class(in = 1616) (out= 662)(deflated 59%)
adding: cpFileMR.class(in = 1881) (out= 994)(deflated 47%)
adding: cpFileMR$cpFileMRMapper.class(in = 1325) (out= 487)(deflated 63%)
adding: lib/(in = 0) (out= 0)(stored 0%)
adding: lib/hadoop-0.20.2-core.jar(in = 2689741) (out= 2503110)(deflated 6%)
adding: lib/commons-cli-1.2.jar(in = 41123) (out= 37596)(deflated 8%)
[hadoop@node01 yaoyao]$ hadoop jar cpFileMR.jar cpFileMR /in1 /ou4
13/02/25 05:30:51 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
13/02/25 05:30:52 INFO input.FileInputFormat: Total input paths to process : 1
13/02/25 05:30:53 INFO mapred.JobClient: Running job: job_local_0001
13/02/25 05:30:53 INFO input.FileInputFormat: Total input paths to process : 1
13/02/25 05:30:53 INFO mapred.MapTask: io.sort.mb = 100
13/02/25 05:30:54 INFO mapred.MapTask: data buffer = 79691776/99614720
13/02/25 05:30:54 INFO mapred.MapTask: record buffer = 262144/327680
13/02/25 05:30:54 INFO mapred.MapTask: Starting flush of map output
13/02/25 05:30:54 INFO mapred.MapTask: Finished spill 0
13/02/25 05:30:54 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
13/02/25 05:30:54 INFO mapred.LocalJobRunner:
13/02/25 05:30:54 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
13/02/25 05:30:54 INFO mapred.JobClient: map 100% reduce 0%
13/02/25 05:30:54 INFO mapred.LocalJobRunner:
13/02/25 05:30:54 INFO mapred.Merger: Merging 1 sorted segments
13/02/25 05:30:54 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 94 bytes
13/02/25 05:30:54 INFO mapred.LocalJobRunner:
13/02/25 05:30:54 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
13/02/25 05:30:54 INFO mapred.LocalJobRunner:
13/02/25 05:30:54 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
13/02/25 05:30:54 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to /ou4
13/02/25 05:30:54 INFO mapred.LocalJobRunner: reduce > reduce
13/02/25 05:30:54 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
13/02/25 05:30:55 INFO mapred.JobClient: map 100% reduce 100%
13/02/25 05:30:55 INFO mapred.JobClient: Job complete: job_local_0001
13/02/25 05:30:55 INFO mapred.JobClient: Counters: 14
13/02/25 05:30:55 INFO mapred.JobClient: FileSystemCounters
13/02/25 05:30:55 INFO mapred.JobClient: FILE_BYTES_READ=5088062
13/02/25 05:30:55 INFO mapred.JobClient: HDFS_BYTES_READ=33830
13/02/25 05:30:55 INFO mapred.JobClient: FILE_BYTES_WRITTEN=34524
13/02/25 05:30:55 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=5122082
13/02/25 05:30:55 INFO mapred.JobClient: Map-Reduce Framework
13/02/25 05:30:55 INFO mapred.JobClient: Reduce input groups=6
13/02/25 05:30:55 INFO mapred.JobClient: Combine output records=0
13/02/25 05:30:55 INFO mapred.JobClient: Map input records=6
13/02/25 05:30:55 INFO mapred.JobClient: Reduce shuffle bytes=0
13/02/25 05:30:55 INFO mapred.JobClient: Reduce output records=6
13/02/25 05:30:55 INFO mapred.JobClient: Spilled Records=12
13/02/25 05:30:55 INFO mapred.JobClient: Map output bytes=80
13/02/25 05:30:55 INFO mapred.JobClient: Combine input records=0
13/02/25 05:30:55 INFO mapred.JobClient: Map output records=6
13/02/25 05:30:55 INFO mapred.JobClient: Reduce input records=6
[hadoop@node01 yaoyao]$ hadoop fs -cat /ou4/part-r-00000
wps
cced
wps
cced
office
oracle
[hadoop@node01 yaoyao]$
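To make the record flow behind this run easier to follow, here is a rough plain-Java simulation of it. It is only a sketch under stated assumptions: CopyFlowSketch and its methods are hypothetical names, a TreeMap stands in for the shuffle's sort by key, the sample input is invented, and the output line format assumes the default text output behaviour of writing key, a tab, then value.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical simulation of the copy job's data flow; no Hadoop classes used.
class CopyFlowSketch {

    // Map phase: key each line by its byte offset in the input, then let the
    // TreeMap order the pairs by key, standing in for the shuffle/sort step.
    static TreeMap<Long, String> mapPhase(String input) {
        TreeMap<Long, String> byOffset = new TreeMap<>();
        long offset = 0;
        for (String line : input.split("\n")) {
            byOffset.put(offset, line);
            // +1 accounts for the newline that terminated this line
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return byOffset;
    }

    // Reduce phase: emit (line, "") and format it as key TAB value.
    static List<String> reducePhase(TreeMap<Long, String> sorted) {
        List<String> out = new ArrayList<>();
        for (String line : sorted.values()) {
            out.add(line + "\t" + "");
        }
        return out;
    }

    public static void main(String[] args) {
        // Invented sample input, not the contents of /in1
        for (String row : reducePhase(mapPhase("alpha\nbeta\ngamma"))) {
            System.out.println(row.replace("\t", "<TAB>"));
        }
    }
}
```

Because the keys are byte offsets, sorting by key keeps the lines in their original order for a single input file. Under the assumption above, each output line would also end with a tab, since the empty Text value is still written after the separator; emitting NullWritable as the value would be one way to avoid that.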