Using MultipleOutputs to Write Multiple Files in MapReduce

Source: Internet | Editor: 程序博客网 | Time: 2024/06/03 19:51

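By default, MapReduce names its output files part-* (for example part-r-00000 for reducer output).

MultipleOutputs lets you write different key-value pairs to different files whose names you choose.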
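As a rough illustration of the naming convention, here is a plain-Java sketch (not Hadoop code; the class and method names are hypothetical). File-based output formats name each file `<base name>-<m|r>-<task number padded to 5 digits>`. The base name is `part` by default; with MultipleOutputs it becomes the string you pass in.

```java
// Hypothetical helper that mimics how Hadoop's file-based output formats name files:
// <baseName>-<m|r>-<task number padded to 5 digits>.
class OutputNaming {
    static final String DEFAULT_BASE_NAME = "part";

    static String fileName(String baseName, boolean reduceSide, int taskId) {
        return String.format("%s-%s-%05d", baseName, reduceSide ? "r" : "m", taskId);
    }

    public static void main(String[] args) {
        System.out.println(fileName(DEFAULT_BASE_NAME, true, 0)); // part-r-00000
        System.out.println(fileName("China", true, 0));           // China-r-00000
        System.out.println(fileName("Canada", true, 3));          // Canada-r-00003
    }
}
```

Only the base name is under your control. The task-type letter and the padded task number are still appended by the framework, so two reducers writing the same base name never collide.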

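The mechanism is the third argument of the call output.write(key, new IntWritable(total), key.toString()).

In the signature public void write(KEYOUT key, VALUEOUT value, String baseOutputPath), baseOutputPath sets the prefix of the output file name, and the framework appends a task suffix such as -r-00000. If you pass a different baseOutputPath for different keys, the values of each key go to a separate file. For example, you could write all records from the same day to a file named after that date.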
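For the date example, the only thing that changes is how the third argument is derived. The sketch below assumes a hypothetical record layout `<ISO-8601 timestamp>\t<payload>` and uses a hypothetical helper `baseOutputPathFor`. In a real reducer, the returned string would be passed as `baseOutputPath` to `output.write(...)`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: derive a per-day baseOutputPath from a record whose first
// tab-separated field is an ISO-8601 local timestamp such as "2024-06-03T19:51:00".
class DailyRouting {
    static String baseOutputPathFor(String record) {
        String timestamp = record.split("\t", 2)[0];
        // Throws DateTimeParseException if the timestamp is malformed.
        LocalDateTime time = LocalDateTime.parse(timestamp);
        return time.format(DateTimeFormatter.ISO_LOCAL_DATE); // yyyy-MM-dd
    }

    public static void main(String[] args) {
        // The first two records share a day, so they would land in the same file.
        System.out.println(baseOutputPathFor("2024-06-03T19:51:00\tlogin"));  // 2024-06-03
        System.out.println(baseOutputPathFor("2024-06-03T23:59:59\tlogout")); // 2024-06-03
        System.out.println(baseOutputPathFor("2024-06-04T00:00:01\tlogin"));  // 2024-06-04
    }
}
```

With MultipleOutputs, the first two records would end up in a file named like 2024-06-03-r-00000.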

Test data: ip-to-hosts.txt

18.217.167.70	United States
206.96.54.107	United States
196.109.151.139	Mauritius
174.52.58.113	United States
142.111.216.8	Canada
162.100.49.185	United States
146.38.26.54	United States
36.35.107.36	China
95.214.95.13	Spain
2.96.191.111	United Kingdom
62.177.119.177	Czech Republic
21.165.189.3	United States
46.190.32.115	Greece
113.173.113.29	Vietnam
42.65.172.142	Taiwan
197.91.198.199	South Africa
68.165.71.27	United States
110.119.165.104	China
171.50.76.89	India
171.207.52.113	Singapore
40.174.30.170	United States
191.170.95.175	United States
17.81.129.101	United States
91.212.157.202	France
173.83.82.99	United States
129.75.56.220	United States
149.25.104.198	United States
103.110.22.19	Indonesia
204.188.117.122	United States
138.23.10.72	United States
172.50.15.32	United States
85.88.38.58	Belgium
49.15.14.6	India
19.84.175.5	United States
50.158.140.215	United States
161.114.120.34	United States
118.211.174.52	Australia
220.98.113.71	Japan
182.101.16.171	China
25.45.75.194	United Kingdom
168.16.162.99	United States
155.60.219.154	Australia
26.216.17.198	United States
68.34.157.157	United States
89.176.196.28	Czech Republic
173.11.51.134	United States
116.207.191.159	China
164.210.124.152	United States
168.17.158.38	United States
174.24.173.11	United States
143.64.173.176	United States
160.164.158.125	Italy
15.111.128.4	United States
22.71.176.163	United States
105.57.100.182	Morocco
111.147.83.42	China
137.157.65.89	Australia

Each line has two tab-separated (\t) fields: an IP address and the country that IP address belongs to.
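The mapper only needs the second field of each line. Below is a plain-Java sketch of that parsing step; the class name is hypothetical. Unlike a bare `split(...)[1]`, it skips blank or malformed lines instead of throwing ArrayIndexOutOfBoundsException.

```java
import java.util.Optional;
import java.util.regex.Pattern;

// Hypothetical stand-alone version of the mapper's parsing logic:
// each input line is "<ip>\t<country>", and only the country (field 1) is kept.
class IpCountryParser {
    private static final Pattern TAB = Pattern.compile("\\t");
    private static final int COUNTRY_POS = 1;

    static Optional<String> country(String line) {
        String[] fields = TAB.split(line);
        if (fields.length <= COUNTRY_POS || fields[COUNTRY_POS].isEmpty()) {
            return Optional.empty(); // blank or malformed line
        }
        return Optional.of(fields[COUNTRY_POS]);
    }

    public static void main(String[] args) {
        System.out.println(country("18.217.167.70\tUnited States")); // Optional[United States]
        System.out.println(country(""));                              // Optional.empty
    }
}
```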


The code:

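    // Static nested class of NamedCountryOutputJob.
    // Raw MultipleOutputs on purpose: the two write() calls emit different value types
    // (NullWritable and IntWritable), which MultipleOutputs<Text, IntWritable> would reject.
    @SuppressWarnings({"rawtypes", "unchecked"})
    public static class IPCountryReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private MultipleOutputs output;

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            output = new MultipleOutputs(context);
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable value : values) {
                total += value.get();
            }
            // Third argument = baseOutputPath: every country gets its own file,
            // e.g. "China-r-00000", inside the job output directory.
            // Nothing goes through context.write(), so the default part-r-* files stay empty.
            output.write(new Text("Output by MultipleOutputs"), NullWritable.get(), key.toString());
            output.write(key, new IntWritable(total), key.toString());
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Closing flushes and closes every file MultipleOutputs opened.
            output.close();
        }
    }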
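Because the first write() in the reducer above passes NullWritable as the value, each country file starts with a bare header line. The sketch below approximates in plain Java how the default TextOutputFormat lays out a record. Key and value are separated by a tab, and a NullWritable, modeled here as Java null, suppresses its side and the separator. It then shows what the reducer writes for one key. This is an illustration, not Hadoop code; the names are hypothetical, and List.of needs Java 9 or later.

```java
import java.util.List;

// Plain-Java approximation of TextOutputFormat's line layout; null stands in for NullWritable.
// (The real writer emits nothing at all when both sides are null; this sketch returns "".)
class TextLineSketch {
    static String line(Object key, Object value) {
        boolean nullKey = key == null;
        boolean nullValue = value == null;
        StringBuilder sb = new StringBuilder();
        if (!nullKey) sb.append(key);
        if (!nullKey && !nullValue) sb.append('\t'); // default key/value separator
        if (!nullValue) sb.append(value);
        return sb.toString();
    }

    // The two lines IPCountryReducer writes into "<country>-r-NNNNN" for one key.
    static List<String> reducerLines(String country, int total) {
        return List.of(
                line("Output by MultipleOutputs", null),
                line(country, total));
    }

    public static void main(String[] args) {
        // Prints "Output by MultipleOutputs", then "Canada<TAB>1".
        reducerLines("Canada", 1).forEach(System.out::println);
    }
}
```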
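In the reducer's setup() method, create the instance:

    output = new MultipleOutputs(context);

Then, in reduce(), write through this output object to send records to different files, and close it in cleanup().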
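The rest of the job class, with the driver and the mapper:

    import java.io.IOException;
    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    // NullWritable, Reducer and MultipleOutputs are used by IPCountryReducer.

    public class NamedCountryOutputJob implements Tool {

        private Configuration conf;
        public static final String NAME = "named_output";

        public static void main(String[] args) throws Exception {
            // Hard-coded test paths from the author's cluster; delete this line to use command-line arguments.
            args = new String[] {"hdfs://caozw:9100/user/hadoop/hadooprealword",
                    "hdfs://caozw:9100/user/hadoop/hadooprealword/output"};
            System.exit(ToolRunner.run(new Configuration(), new NamedCountryOutputJob(), args));
        }

        @Override
        public int run(String[] args) throws Exception {
            if (args.length != 2) {
                System.err.println("Usage: named_output <input> <output>");
                return 1;
            }
            Job job = Job.getInstance(getConf(), "IP count by country to named files");
            job.setJarByClass(NamedCountryOutputJob.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setMapperClass(IPCountryMapper.class);
            job.setReducerClass(IPCountryReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            // Final output types (missing in the original).
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            // The output directory must not exist yet, or the job fails at submission.
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Exit code 0 signals success (the original returned 1 on success).
            return job.waitForCompletion(true) ? 0 : 1;
        }

        @Override
        public void setConf(Configuration conf) {
            this.conf = conf;
        }

        @Override
        public Configuration getConf() {
            return conf;
        }

        public static class IPCountryMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final int COUNTRY_POS = 1;
            private static final Pattern TAB = Pattern.compile("\\t");
            private static final IntWritable ONE = new IntWritable(1);
            private final Text country = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = TAB.split(value.toString());
                if (fields.length <= COUNTRY_POS) {
                    return; // skip blank or malformed lines
                }
                country.set(fields[COUNTRY_POS]);
                context.write(country, ONE);
            }
        }

        // Paste IPCountryReducer (shown above) here; it is also a static nested class of this job.
    }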
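To predict what the job produces without a cluster, the map, shuffle and reduce steps can be simulated locally. This is a hypothetical plain-Java sketch with made-up class and method names, assuming a single reduce task. It counts lines per country and keys each total by the file name MultipleOutputs would create for reducer 0.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical local simulation of the job with one reduce task:
// map (line -> country), shuffle (group and sort by country), reduce (sum),
// then route each total to "<country>-r-00000" as MultipleOutputs would.
class LocalCountrySimulation {
    static Map<String, Integer> countsPerFile(List<String> lines) {
        // TreeMap keeps keys sorted, like the sorted input a reducer sees.
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split("\t");
            if (fields.length < 2) continue; // skip malformed lines
            totals.merge(fields[1], 1, Integer::sum);
        }
        Map<String, Integer> files = new TreeMap<>();
        totals.forEach((country, total) ->
                files.put(String.format("%s-r-%05d", country, 0), total));
        return files;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "18.217.167.70\tUnited States",
                "196.109.151.139\tMauritius",
                "142.111.216.8\tCanada",
                "36.35.107.36\tChina",
                "206.96.54.107\tUnited States");
        countsPerFile(sample).forEach((file, total) -> System.out.println(file + " -> " + total));
    }
}
```

Running it over the full ip-to-hosts.txt gives the list of files the real job should leave in its output directory, next to the empty part-r-00000.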

Test results:

[Screenshot of the job output not preserved]