Compressing files with Gzip in Hadoop

Source: Internet. Editor: 程序博客网. Time: 2024/06/15 15:06
As described in the introduction section, if the input files are compressed, they will be decompressed automatically as they are read by MapReduce, using the filename extension to determine which codec to use. This is input compression.
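
The extension-based lookup can be pictured with a small plain-Java sketch. This is a simplified illustration, not Hadoop's own code: in Hadoop the lookup is done by CompressionCodecFactory, while the CodecLookup class and its codecFor method below are hypothetical names, and only three common extensions are listed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: mimics the idea of picking a codec from the file extension.
class CodecLookup {
    // File extension -> Hadoop codec class name (only a few common formats).
    private static final Map<String, String> CODECS = new LinkedHashMap<>();
    static {
        CODECS.put(".gz", "org.apache.hadoop.io.compress.GzipCodec");
        CODECS.put(".bz2", "org.apache.hadoop.io.compress.BZip2Codec");
        CODECS.put(".deflate", "org.apache.hadoop.io.compress.DefaultCodec");
    }

    // Returns the codec class name for a file name, or null when the
    // extension is not a known compressed format (file is read as-is).
    static String codecFor(String fileName) {
        for (Map.Entry<String, String> e : CODECS.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(codecFor("part-r-00000.gz"));
        System.out.println(codecFor("input.txt"));
    }
}
```

A file whose name ends in .gz is therefore decompressed with the Gzip codec, while a file with no recognized extension is read uncompressed.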

Here we list some code for setting up output compression in Hadoop for some common compression formats.
Gzip
For final output, we can use the static convenience methods on FileOutputFormat to set the properties.

FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
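
Output written this way is in the ordinary gzip format, so a part file can be checked outside Hadoop with any gzip reader. The sketch below uses only the JDK's java.util.zip classes to round-trip a string and inspect the gzip magic bytes; the GzipRoundTrip class and the sample text are hypothetical, and no real job output is read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: compresses and decompresses text in the gzip format.
class GzipRoundTrip {
    // Compresses the UTF-8 bytes of the text into a gzip stream.
    static byte[] compress(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Decompresses a gzip stream back into a UTF-8 string.
    static String decompress(byte[] gzipped) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] packed = compress("key\tvalue\n");
        // Every gzip stream starts with the magic bytes 0x1f 0x8b.
        System.out.printf("%02x %02x%n", packed[0] & 0xff, packed[1] & 0xff);
        System.out.print(decompress(packed));
    }
}
```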

For map output:

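Configuration conf = new Configuration();
// Hadoop 2+ renames these properties to mapreduce.map.output.compress and
// mapreduce.map.output.compress.codec; the old names below are deprecated.
conf.setBoolean("mapred.compress.map.output", true);
conf.setClass("mapred.map.output.compression.codec", GzipCodec.class, CompressionCodec.class);
// new Job(conf) is deprecated in Hadoop 2+ in favor of Job.getInstance(conf).
Job job = new Job(conf);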
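With the old MapReduce API, compressing the map output takes two calls, where conf is a JobConf rather than a plain Configuration:
conf.setCompressMapOutput(true);
conf.setMapOutputCompressorClass(GzipCodec.class);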