Hadoop compression: Snappy
After downloading Apache hadoop-1.2.1 (the bin.tar.gz package) and setting up a cluster, running wordcount produces the warning: WARN snappy.LoadSnappy: Snappy native library not loaded.
We want to add Snappy compression support to the Hadoop cluster. Many Hadoop distributions ship with snappy/lzo compression built in, e.g. Cloudera CDH and Hortonworks HDP, but most Apache release packages do not. (The Apache hadoop-1.2.1 RPM (hadoop-1.2.1-1.x86_64.rpm) does include Snappy support, but hadoop-1.2.1-bin.tar.gz does not.)
1. Installing Snappy
1. Install g++ on the OS:
CentOS:
yum -y update gcc
yum -y install gcc+ gcc-c++
Ubuntu:
apt-get update
apt-get install g++
2. Download the snappy source from http://code.google.com/p/snappy/downloads/list (e.g. snappy-1.1.1.tar.gz) and extract it (default directory: snappy-1.1.1).
In the extracted directory, run in order:
1) ./configure
2) make
3) make check
4) make install
Snappy installs to /usr/local/lib by default; ls /usr/local/lib should show libsnappy.so and related files.
3. Copy the generated libsnappy.so into $HADOOP_HOME/lib/native/Linux-amd64-64 and restart the Hadoop cluster. Hadoop now has Snappy compression support.
4. Before running wordcount, set LD_LIBRARY_PATH so that it includes the directories containing libsnappy.so ( export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native/Linux-amd64-64:/usr/local/lib )
5. Run wordcount again; the earlier warning is gone.
If the job output is configured for Snappy compression, the output directory on HDFS will contain a file such as part-r-00000.snappy.
------ The above was verified on CentOS 6.6 64-bit minimal and Ubuntu 12.04 64-bit server.
2. Using compression in Hadoop jobs:
Compression can be configured in mapred-site.xml, or per job in code.
Compressing intermediate (map) output:
---mrV1:
<property>
<name>mapred.compress.map.output</name>
<value>true</value>
</property>
<property>
<name>mapred.map.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
---YARN:
<property>
<name>mapreduce.map.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.map.output.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
Compressing the final job output:
---mrV1:
<property>
<name>mapred.output.compress</name>
<value>true</value>
</property>
<property>
<name>mapred.output.compression.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapred.output.compression.type</name>
<value>BLOCK</value>
<description> For SequenceFile outputs, what type of compression should be used (NONE, RECORD, or BLOCK). BLOCK is recommended. </description>
</property>
---YARN:
<property>
<name>mapreduce.output.fileoutputformat.compress</name>
<value>true</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.compress.type</name>
<value>BLOCK</value>
<description>For SequenceFile outputs, what type of compression should be used (NONE, RECORD, or BLOCK). BLOCK is recommended.</description>
</property>
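As noted above, these settings can also be applied per job in driver code instead of mapred-site.xml. A minimal sketch using the YARN-era (Hadoop 2.x) mapreduce API; the class name and job name here are illustrative, not from the original post:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SnappyJobConfig {
    public static Job newCompressedJob(Configuration conf) throws Exception {
        // Compress intermediate (map) output with Snappy.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec",
                SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "snappy-example");

        // Compress the final job output with Snappy.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);
        // For SequenceFile outputs, BLOCK compression is recommended.
        SequenceFileOutputFormat.setOutputCompressionType(job,
                SequenceFile.CompressionType.BLOCK);
        return job;
    }
}
```

On Hadoop 1.2.1 the equivalent calls go through the older mapred property names shown in the mrV1 sections above.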
3. Code example: reading Snappy-compressed output from HDFS
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionInputStream;

/**
 * At run time, LD_LIBRARY_PATH must include the directory containing
 * libsnappy.so; on Windows, PATH must include the directory containing
 * snappy.dll.
 *
 * @param file hdfs file, such as
 *        hdfs://hadoop-master-node:9000/user/hadoop/wordcount/output/part-r-00000.snappy
 * @throws Exception
 */
public void testReadOutput_Snappy2(String file) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://hadoop-master-node:9000");
    FileSystem fs = FileSystem.get(conf);
    // The factory picks the codec from the file extension (.snappy).
    CompressionCodecFactory factory = new CompressionCodecFactory(conf);
    CompressionCodec codec = factory.getCodec(new Path(file));
    if (codec == null) {
        System.out.println("Cannot find codec for file " + file);
        return;
    }
    CompressionInputStream in = codec.createInputStream(fs.open(new Path(file)));
    BufferedReader br = null;
    String line;
    try {
        br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    } finally {
        if (br != null) {
            br.close();
        } else {
            in.close();
        }
    }
}
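The same codec machinery works in the write direction. A hedged sketch of producing a Snappy-compressed file on HDFS (the NameNode URI, path, and method name are placeholders matching the read example; libsnappy must be on LD_LIBRARY_PATH at run time):

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class SnappyWriteExample {
    public static void writeSnappy(String file) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://hadoop-master-node:9000");
        FileSystem fs = FileSystem.get(conf);
        // Instantiate the codec via ReflectionUtils so it receives the conf.
        CompressionCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);
        // The file should carry the codec's extension (".snappy") so readers
        // like the CompressionCodecFactory above can detect it.
        CompressionOutputStream out =
                codec.createOutputStream(fs.create(new Path(file)));
        BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(out, "UTF-8"));
        try {
            bw.write("hello snappy");
            bw.newLine();
        } finally {
            bw.close(); // also closes the underlying compression stream
        }
    }
}
```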