Personal notes on the main steps for using LZO compression with Hadoop, the follow-up problems it caused, and their fixes
The hadoop-lzo installation guide is at:
https://github.com/twitter/hadoop-lzo
Download hadoop-lzo from:
https://github.com/twitter/hadoop-lzo/zipball/master
1. As the README notes, first install the native lzo library locally. Download it from http://www.oberhumer.com/opensource/lzo/#download, extract it, then build and install following its instructions. I recommend specifying an install prefix:

tar -zxvf lzo-2.06.tar.gz -C /opt/tool/
cd /opt/tool/lzo-2.06/
mkdir /usr/local/lzo
./configure --enable-shared --prefix=/usr/local/lzo
make && sudo make install
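Since the hadoop-lzo build in the next step forces -m64, the liblzo2 shared library built here must itself be 64-bit. A quick, stdlib-only way to sanity-check that is to look at the ELF header of the .so file; this is an illustrative sketch, and the library path is an assumption based on the --prefix used above:

```python
# Check whether a shared library is 32- or 64-bit by inspecting its
# ELF header: bytes 0-3 are the magic b"\x7fELF", byte 4 is the ELF
# class (1 = 32-bit, 2 = 64-bit).
def elf_class(header: bytes):
    """Return 32, 64, or None given at least the first 5 bytes of a file."""
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None  # not an ELF file
    return {1: 32, 2: 64}.get(header[4])

# Usage on the build host (path assumed from --prefix=/usr/local/lzo):
# with open("/usr/local/lzo/lib/liblzo2.so", "rb") as f:
#     print(elf_class(f.read(5)))  # expect 64 for an -m64 build
```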
2. With the native library installed, go back and build the hadoop-lzo jar. Unzip the downloaded hadoop-lzo-master.zip:
unzip hadoop-lzo-master.zip
Then, in the unpacked directory, run:

export CFLAGS=-m64
export CXXFLAGS=-m64
export LIBRARY_PATH=/usr/local/lzo/lib
C_INCLUDE_PATH=/usr/local/lzo/include \
LIBRARY_PATH=/usr/local/lzo/lib \
mvn clean package -Dmaven.test.skip=true
cd target/native/Linux-amd64-64
tar -cBf - -C lib . | tar -xBvf - -C ~
mv ~/libgplcompression* $HADOOP_HOME/lib/native/
3. Copy the hadoop-lzo-0.4.20-SNAPSHOT.jar produced by mvn into Hadoop's common directory:
cp hadoop-lzo-0.4.20-SNAPSHOT.jar $HADOOP_HOME/share/hadoop/common/
4. Configure compression of intermediate MapReduce output
Add the following to core-site.xml:
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
Then set the following in mapred-site.xml:
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
5. Testing
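Note that the config lists two LZO codecs: LzopCodec handles the lzop file format (the .lzo files stored on HDFS), while LzoCodec writes a raw LZO stream and is used here for intermediate map output. When reading input files, Hadoop's CompressionCodecFactory picks a codec by file suffix. The toy Python sketch below (not Hadoop's actual code) illustrates that suffix lookup, using the default suffixes of the codecs configured above:

```python
# Toy illustration of CompressionCodecFactory-style lookup: Hadoop maps
# a file's suffix to a codec class from io.compression.codecs.
CODECS = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
    ".lzo_deflate": "com.hadoop.compression.lzo.LzoCodec",
    ".lzo": "com.hadoop.compression.lzo.LzopCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
}

def codec_for(path):
    """Return the codec class name for the longest matching suffix, or None."""
    for suffix in sorted(CODECS, key=len, reverse=True):
        if path.endswith(suffix):
            return CODECS[suffix]
    return None  # no codec: file is read as uncompressed

print(codec_for("/test/input/part-00000.lzo"))  # .lzo files go to LzopCodec
print(codec_for("/test/input/part-00000.txt"))  # plain text, no codec
```

This is why data files on HDFS should carry the .lzo extension: that is what routes them to LzopCodec at read time.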
Building an index for an LZO file in Hadoop:
hadoop jar $HADOOP_HOME/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar com.hadoop.compression.lzo.LzoIndexer /test/input
Using LzoCodec compression directly for MR jobs in Hadoop:
Use LzoTextInputFormat as the job's InputFormat; it lets MR jobs split lzo-compressed input at block boundaries instead of feeding each whole file to a single mapper.
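The reason indexing matters: an .lzo file is not splittable on its own, so LzoIndexer writes a sidecar .index file recording where each compressed block starts, and LzoTextInputFormat uses those offsets to align splits. As I understand the hadoop-lzo format, the index file is simply consecutive 8-byte big-endian block offsets; the sketch below illustrates that layout with fabricated offsets (it is not hadoop-lzo's actual code):

```python
# Sketch of the hadoop-lzo .index file layout (as I understand it):
# consecutive 8-byte big-endian longs, one per compressed block offset.
import io
import struct

def write_index(out, block_offsets):
    """Write each block's byte offset as a signed 64-bit big-endian long."""
    for off in block_offsets:
        out.write(struct.pack(">q", off))

def read_index(buf):
    """Read back the list of block offsets."""
    offsets = []
    while True:
        chunk = buf.read(8)
        if len(chunk) < 8:
            break
        offsets.append(struct.unpack(">q", chunk)[0])
    return offsets

# Round-trip demo with made-up offsets
buf = io.BytesIO()
write_index(buf, [0, 262144, 524288])
buf.seek(0)
print(read_index(buf))  # [0, 262144, 524288]
```

An input format can then start a split at any recorded offset, which is what makes indexed .lzo files parallelizable.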
Creating a Hive table backed by lzo-compressed files:

create table lzo(id int, name string)
row format delimited fields terminated by '^'
stored as
inputformat 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
outputformat 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
load data local inpath '/home/hadoop/test/hive/lzo.txt.lzo' into table lzo;
select * from lzo;

Follow-up problems:
6. After switching Hive to the Tez engine, MapReduce-on-Tez jobs threw exceptions, and Hive-on-Tez could not even start the Hive CLI. Cause: the LZO jar was missing from Tez's classpath.
Fix:
Do not use the full tez-0.8.4.tar.gz produced by the build; use tez-0.8.4-minimal.tar.gz from tez-dist/target/ instead. Extract it locally and set the environment variables:
export TEZ_HOME=/opt/single/tez
export TEZ_CONF_DIR=$TEZ_HOME/conf
export TEZ_JARS=$TEZ_HOME
Then add to hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$TEZ_CONF_DIR:$TEZ_JARS/*:$TEZ_JARS/lib/*
Upload tez-0.8.4-minimal.tar.gz to hdfs://hadoop:9000/apps/tez-0.8.4/.
Create a conf directory under $TEZ_HOME and put a tez-site.xml in it:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://hadoop:9000/apps/tez-0.8.4/tez-0.8.4-minimal.tar.gz</value>
  </property>
  <property>
    <name>tez.use.cluster.hadoop-libs</name>
    <value>true</value>
  </property>
</configuration>
These are only the Tez settings relevant to the LZO problem; see other articles for the rest of the Tez configuration.
7. With Spark configured on this Hadoop setup, Spark applications failed to run.
The exception looked roughly like this:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:112)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:78)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
    at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:185)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:198)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    ... (spark-shell REPL and spark-submit frames omitted)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
    ... 76 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 81 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 83 more
Fix:
Add the following to spark-env.sh (everything else unchanged):
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:/opt/single/spark/lib:/usr/local/lzo/lib
export SPARK_CLASSPATH=$SPARK_CLASSPATH:/opt/single/hadoop-2.7.2/share/hadoop/common/hadoop-lzo-0.4.20-SNAPSHOT.jar
export HADOOP_HOME=/opt/single/hadoop-2.7.2
export HADOOP_CONF_DIR=/opt/single/hadoop-2.7.2/etc/hadoop