Hadoop MapReduce: Changing Output File Names with MultipleOutputs

Source: Internet | Editor: 程序博客网 | Time: 2024/05/28 04:52

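Requirement: change the names of MapReduce output files to names of your own choosing.

Tool: MultipleOutputs

Default file names: part-r-xxxxx (e.g. part-r-00000) or 000178_0

After the change: customName-r-xxxxx. The -r-xxxxx suffix is still there and has not been removed yet.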
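
The -r-xxxxx suffix comes from the way Hadoop's FileOutputFormat builds a unique file name for each task. The name is made of the base name, then a task-type letter (m for map, r for reduce), then the partition number zero-padded to five digits. MultipleOutputs only replaces the base name, which is why the suffix survives. Below is a minimal sketch that imitates this rule in plain Java so it can be checked without a cluster. OutputNameSketch and uniqueFileName are hypothetical names. The real logic lives in FileOutputFormat.getUniqueFile and is not called here.

```java
// Hypothetical helper that imitates (does not call) Hadoop's
// FileOutputFormat.getUniqueFile naming rule: <base>-<m|r>-<5-digit partition><extension>
class OutputNameSketch {

    static String uniqueFileName(String baseName, boolean isReduce, int partition, String extension) {
        char taskType = isReduce ? 'r' : 'm';
        // %05d pads the partition number to at least five digits: 0 -> 00000, 12 -> 00012
        return String.format("%s-%c-%05d%s", baseName, taskType, partition, extension);
    }

    public static void main(String[] args) {
        System.out.println(uniqueFileName("part", true, 0, "")); // part-r-00000 (default base name)
        System.out.println(uniqueFileName("01", true, 3, ""));   // 01-r-00003 (custom base name)
    }
}
```

Only the base name is under the caller's control through MultipleOutputs.write. The task letter and partition number are filled in per task, so two reduce tasks writing the same base name produce two separate files.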

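Main steps:

1. Declare a MultipleOutputs field in the reducer.

2. Initialize it in the setup method.

3. In the reduce method, call MultipleOutputs' public void write(KEYOUT key, VALUEOUT value, String baseOutputPath), passing the custom file name as baseOutputPath.

4. Close it in the cleanup method.

5. Suppress the original default output files in the driver with LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class); With this set, a default part-r-xxxxx file is only created once a record is actually written to it through context.write, so no empty part files are left behind.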
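
Step 5 works because LazyOutputFormat wraps the real output format (here TextOutputFormat) and creates the underlying record writer only when the first record arrives. A reducer that writes only through MultipleOutputs never calls context.write, so its default part-r-xxxxx file is never created. The sketch below shows the same lazy-creation idea with plain java.nio.file so it runs without Hadoop. LazyFileWriter and writeLine are hypothetical names, and the class illustrates the pattern rather than reproducing Hadoop's implementation.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of the lazy-creation pattern behind LazyOutputFormat:
// the file is opened (and therefore created) only when the first line is written.
class LazyFileWriter implements Closeable {
    private final Path path;
    private BufferedWriter writer; // stays null until the first write

    LazyFileWriter(Path path) {
        this.path = path;
    }

    void writeLine(String line) throws IOException {
        if (writer == null) {
            writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8); // file is created here
        }
        writer.write(line);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        if (writer != null) { // if nothing was written, no file exists and nothing needs closing
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("lazy-demo");
        try (LazyFileWriter unused = new LazyFileWriter(dir.resolve("part-r-00000"))) {
            // no writes, so part-r-00000 is never created
        }
        try (LazyFileWriter used = new LazyFileWriter(dir.resolve("01-r-00000"))) {
            used.writeLine("a");
        }
        System.out.println(Files.exists(dir.resolve("part-r-00000"))); // false
        System.out.println(Files.exists(dir.resolve("01-r-00000")));   // true
    }
}
```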

package com.writer;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

import java.io.IOException;

/**
 * @author anjinlong
 * @create 2017-07-06 10:42
 * @description description
 **/
public class HistoryReduce extends Reducer<Text, Text, NullWritable, Text> {

  // Step 1: declare MultipleOutputs
  private MultipleOutputs<NullWritable, Text> multipleOutputs;

  // Step 2: initialize it in setup()
  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
  }

  // Step 3: write through MultipleOutputs, passing the custom base file name
  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    String keyString = key.toString();
    // The last two characters of the key become the file name (assumes the key has at least 2 characters)
    String fileName = keyString.substring(keyString.length() - 2);
    for (Text val : values) {
      // Written to <fileName>-r-xxxxx instead of part-r-xxxxx
      multipleOutputs.write(NullWritable.get(), val, fileName);
    }
  }

  // Step 4: close MultipleOutputs in cleanup() so its writers are flushed and closed
  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    multipleOutputs.close();
  }
}

// In the job driver: suppress empty default files such as part-r-00000
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
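
You can check which records land in which file without running a cluster by simulating the routing rule in HistoryReduce in plain Java. Under that rule, the last two characters of the key become the base output name. This is a hypothetical sketch. RoutingSketch, baseNameFor, route, and the sample keys are made up for illustration, and like the reducer it assumes every key has at least two characters. On a real cluster each base name yields one file per reduce task that receives matching keys, for example 01-r-00000 and 01-r-00001.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of HistoryReduce's routing rule; no Hadoop classes are used.
class RoutingSketch {

    // Same rule as the reducer: the last two characters of the key are the base output name.
    // Assumes key.length() >= 2, exactly like the reducer.
    static String baseNameFor(String key) {
        return key.substring(key.length() - 2);
    }

    // Groups (key, value) pairs by base name, i.e. by the file prefix MultipleOutputs would write to.
    static Map<String, List<String>> route(List<String[]> records) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String[] record : records) {
            files.computeIfAbsent(baseNameFor(record[0]), k -> new ArrayList<>()).add(record[1]);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> records = List.of(
                new String[] {"user_01", "a"},
                new String[] {"user_02", "b"},
                new String[] {"item_01", "c"});
        System.out.println(route(records)); // {01=[a, c], 02=[b]}
    }
}
```

Because the output key is NullWritable, TextOutputFormat writes only the value on each line, so each simulated list corresponds to the lines of one output file prefix.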
