Upgrading Hadoop 2.2.0 to 2.7.2
A test cluster with 1 master and 2 slaves is configured, starts normally, and has some test data loaded.
2. Upgrade to a manual-failover HA cluster (matching the production environment)
2.1 Configure manual-failover HDFS HA (ZooKeeper is not needed here; only automatic failover depends on ZK)
---backup
cp -r /home/test/hadoop-2.2.0/etc/hadoop /home/test/hadoop-2.2.0/etc/hadoopbak
---core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://testcluster</value>
</property>
---hdfs-site.xml
delete: dfs.namenode.secondary.http-address
add:
<property>
<name>dfs.nameservices</name>
<value>testcluster</value>
</property>
<property>
<name>dfs.ha.namenodes.testcluster</name>
<value>master,slave1</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testcluster.master</name>
<value>master:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.testcluster.slave1</name>
<value>slave1:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.testcluster.master</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.testcluster.slave1</name>
<value>slave1:50070</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled.testcluster</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;slave1:8485;slave2:8485/testcluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.testcluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/test/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/test/tmp/journal</value>
</property>
---copy the updated config to the slaves
scp core-site.xml hdfs-site.xml slave1:/home/test/hadoop-2.2.0/etc/hadoop
scp core-site.xml hdfs-site.xml slave2:/home/test/hadoop-2.2.0/etc/hadoop
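Before touching the NameNodes, it can be worth confirming that the new nameservice settings are actually picked up on each node. A minimal check (a sketch; run wherever the updated config is on the classpath):
hdfs getconf -confKey fs.defaultFS                  # expect hdfs://testcluster
hdfs getconf -confKey dfs.ha.namenodes.testcluster  # expect master,slave1
hdfs getconf -namenodes                             # expect: master slave1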
---Initialize the JournalNodes; run on every JN node
hadoop-daemon.sh start journalnode
http://master:8480/
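A quick sanity check that each JournalNode actually came up (a sketch; 8480 is the HTTP port shown above, 8485 the RPC port from dfs.namenode.shared.edits.dir):
jps | grep JournalNode                                          # the JournalNode JVM should be listed
netstat -lnt | grep -E '8480|8485'                              # HTTP and RPC ports listening
curl -s -o /dev/null -w '%{http_code}\n' http://master:8480/    # expect 200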
---Format all JournalNodes (running this on the master node is enough)
hdfs namenode -initializeSharedEdits -force
Note: this step formats all JournalNodes and copies the NameNode metadata files from master to every JournalNode.
---Sync the NameNode metadata from master to slave1 (run the command on slave1)
hdfs namenode -bootstrapStandby
---Start the cluster
start-all.sh
---After startup both NameNodes are in standby; manually transition one NameNode to active
hdfs haadmin -transitionToActive master
---Verify the NameNode states
hdfs haadmin -getServiceState master
hdfs haadmin -getServiceState slave1
---Active/standby failover command
hdfs haadmin -failover master slave1
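A small loop to print both NameNode states before and after a failover (a sketch; the IDs master and slave1 are the ones declared in dfs.ha.namenodes.testcluster above):
for nn in master slave1; do
  printf '%s: ' "$nn"
  hdfs haadmin -getServiceState "$nn"   # prints active or standby
done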
3. Upgrade steps from Hadoop 2.2.0 to 2.7.2
3.1 Prepare the 2.7.2 distribution, rename its etc directory, and symlink the existing configuration into place
mv etc etcbak
ln -s /home/test/hadoop-2.2.0/etc etc
3.2 Stop external applications
3.3 Back up the NameNode metadata
b. Enter safe mode: hadoop dfsadmin -safemode enter
c. Merge the edits and checkpoint the NameNode metadata: hadoop dfsadmin -saveNamespace
d. Back up: cp -r /data/hadoop/dfs/name /data/hadoop/dfs/name_bak
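Steps b-d can also be run as one small script so the copy only happens after the checkpoint succeeds (a sketch using the paths above; run on the active NameNode as the HDFS user):
#!/bin/bash
set -e                                                    # stop at the first failure
hadoop dfsadmin -safemode enter                           # b. block new writes
hadoop dfsadmin -saveNamespace                            # c. merge edits into a fresh fsimage
cp -r /data/hadoop/dfs/name /data/hadoop/dfs/name_bak     # d. copy the metadata directory
hadoop dfsadmin -safemode get                             # should still report: Safe mode is ON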
3.4 Stop HDFS
stop-all.sh
3.5 Update the environment variables on every cluster node
./dcopy /home/test/.bash_profile /home/test/.bash_profile_`date +%Y%m%d%H%M`
./drun "ls -la /home/test/"
./drun "sed -i 's/hadoop-2.2.0/hadoop-2.7.2/g' /home/test/.bash_profile"
./drun "source /home/test/.bash_profile"
./drun "grep -i \"hadoop-2.7.2\" /home/test/.bash_profile"
./drun "echo $HADOOP_HOME"
3.6 Start the upgrade
a. Start the JournalNodes on the corresponding nodes: hadoop-daemon.sh start journalnode
b. Upgrade the NameNodes:
Upgrade one NameNode first: hadoop-daemon.sh start namenode -upgrade
Then sync the other NameNode: hdfs namenode -bootstrapStandby && hadoop-daemon.sh start namenode
c. Upgrade the DataNodes:
hadoop-daemons.sh start datanode
c2. Adjust the Spark configuration:
Update the Hadoop-related settings in spark-env.sh
Check data integrity: hadoop fsck /
Spot-check some files: hadoop fs -cat .....
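One way to make the integrity check more concrete is to save an fsck summary before the upgrade and diff it afterwards (a sketch; the /tmp paths and the sample file path are placeholders):
hdfs fsck / > /tmp/fsck_after_upgrade.txt
grep -E 'Total (size|files|blocks)|Corrupt blocks|Missing replicas' /tmp/fsck_after_upgrade.txt
diff /tmp/fsck_before_upgrade.txt /tmp/fsck_after_upgrade.txt   # assumes the same capture was taken before the upgrade
hadoop fs -cat /path/to/some/known/file | head                  # placeholder path for a manual spot check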
e. Roll back if something goes wrong:
Method 1 (not tried):
hadoop-daemon.sh start namenode -rollback
hadoop-daemons.sh start datanode -rollback
Method 2:
./dcopy /home/test/.bash_profile /home/test/.bash_profile_`date +%Y%m%d%H%M`
./drun "ls -la /home/test/"
./drun "rm -rf /home/test/.bash_profile"
./drun "mv /home/test/.bash_profile_201711162042 /home/test/.bash_profile"
./drun "source /home/test/.bash_profile"
./drun "grep -i \"hadoop-2.2.0\" /home/test/.bash_profile"
./drun "echo $HADOOP_HOME"
mv /data/hadoop/dfs/name /data/hadoop/dfs/name_new
mv /data/hadoop/dfs/name_bak /data/hadoop/dfs/name
hadoop-daemons.sh start journalnode
Copy the metadata to the JournalNodes:
./drun "rm -rf /data/test/tmp/journal/"
hdfs namenode -initializeSharedEdits -force
hadoop-daemon.sh start namenode
Start the other NN:
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
hadoop-daemons.sh start datanode
hdfs haadmin -transitionToActive master
hdfs haadmin -failover master slave1
f. After the cluster has run stably for a while, finalize the upgrade:
hdfs dfsadmin -finalizeUpgrade
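Until the upgrade is finalized, each storage directory keeps a previous/ snapshot of the pre-upgrade state, and that snapshot is what a rollback would restore; finalizeUpgrade removes it and frees the space. A quick check (a sketch, using the paths from this document and the drun helper):
ls -d /data/hadoop/dfs/name/previous 2>/dev/null && echo "NameNode still holds pre-upgrade metadata"
./drun "ls -d /data/hadoop/dfs/data/current/*/previous 2>/dev/null"   # DataNode block-pool previous/ dirs
# after hdfs dfsadmin -finalizeUpgrade these directories should be gone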
h. Restart the cluster:
stop-dfs.sh
start-all.sh
hdfs haadmin -transitionToActive master
hdfs haadmin -failover master slave1
FAQ:
Q1: After the upgrade, the HDFS web UI reports an error
Opening the page on the active node fails,
while the standby node's page is fine.
Entering the following URL directly works:
http://192.168.130.136:50070/dfshealth.html#tab-overview
The cause is not yet clear.
Q2: During rollback, starting the JournalNode fails with the following error in its log:
2017-11-17 10:50:27,513 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /data/test/tmp/journal/testcluster. The directory is already locked
2017-11-17 10:50:27,514 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:test (auth:SIMPLE) cause:java.io.IOException: Cannot lock storage /data/test/tmp/journal/testcluster. The directory is already locked
2017-11-17 10:50:27,514 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.getEditLogManifest from 192.168.130.136:50636 Call#26 Retry#0: error: java.io.IOException: Cannot lock storage /data/test/tmp/journal/testcluster. The directory is already locked
java.io.IOException: Cannot lock storage /data/test/tmp/journal/testcluster. The directory is already locked
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:637)
    at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:460)
    at org.apache.hadoop.hdfs.qjournal.server.JNStorage.analyzeStorage(JNStorage.java:193)
    at org.apache.hadoop.hdfs.qjournal.server.JNStorage.<init>(JNStorage.java:73)
    at org.apache.hadoop.hdfs.qjournal.server.Journal.<init>(Journal.java:140)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNode.getOrCreateJournal(JournalNode.java:83)
    at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:181)
    at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
    at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:17453)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2048)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2042)
Solution:
Delete the JournalNode directory: ./drun "rm -rf /data/test/tmp/journal/"
Copy the metadata to the JournalNodes again: hdfs namenode -initializeSharedEdits -force
Q3: During rollback, DataNode startup fails with a cluster ID/version problem:
org.apache.hadoop.hdfs.server.common.Storage: Lock on /data/hadoop/dfs/data/in_use.lock acquired by nodename 11414@master
java.io.IOException: Incompatible clusterIDs in /data/hadoop/dfs/data: namenode clusterID = CID-be66f5bb-6419-45c5-b95a-7681be449e15; datanode clusterID =
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:745)
Solution:
Look up the current clusterID in the NameNode metadata directory, then run:
./drun "sed -i 's/clusterID=/clusterID=CID-be66f5bb-6419-45c5-b95a-7681be449e15/g' /data/hadoop/dfs/data/current/VERSION"
./drun "cat /data/hadoop/dfs/data/current/VERSION"
After running this, the following error still appears:
FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1638541588-192.168.130.136-1510661480810 (storage id DS1875847850) service to master/192.168.130.136:9000
org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /data/hadoop/dfs/data/current/BP-1638541588-192.168.130.136-1510661480810. Reported: -56. Expecting = -47.
    at org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1082)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
    at org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:921)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
    at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:837)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:808)
    at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
    at java.lang.Thread.run(Thread.java:745)
Solution:
./drun "cat /data/hadoop/dfs/data/current/BP-1638541588-192.168.130.136-1510661480810/current/VERSION"
./drun "cp /data/hadoop/dfs/data/current/BP-1638541588-192.168.130.136-1510661480810/current/VERSION /data/hadoop/dfs/data/current/BP-1638541588-192.168.130.136-1510661480810/current/VERSION-bak1"
./drun "sed -i 's/layoutVersion=-56/layoutVersion=-47/g' /data/hadoop/dfs/data/current/BP-1638541588-192.168.130.136-1510661480810/current/VERSION"
After that, the error below still occurs:
2017-11-17 15:06:35,405 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: For namenode master/192.168.130.136:9000 using DELETEREPORT_INTERVAL of 300000 msec BLOCKREPORT_INTERVAL of 21600000 msec Initial delay: 0msec; heartBeatInterval=3000
2017-11-17 15:06:35,405 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-1638541588-192.168.130.136-1510661480810 (storage id DS347578212) service to master/192.168.130.136:9000
java.lang.NullPointerException
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:439)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
    at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
    at java.lang.Thread.run(Thread.java:745)
This happens because the cTime in the DataNode's VERSION file does not match the NameNode's cTime; update it to match:
./drun "sed -i 's/cTime=1510838012825/cTime=0/g' /data/hadoop/dfs/data/current/BP-1638541588-192.168.130.136-1510661480810/current/VERSION"
Q4: Hive reports an internal error
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hive.common.util.ReflectionUtil.setJobConf(ReflectionUtil.java:112)
    ... 21 more
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.lzo.LzoCodec not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 26 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 28 more
Cause: the LZO libraries cannot be found under hadoop-2.7.2.
Copy the LZO jar and native libraries over from the old installation:
./drun "cp /home/hadoop/hadoop-2.2.0/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar /home/hadoop/hadoop-2.7.2/share/hadoop/common/"
./drun "cp /home/hadoop/hadoop-2.2.0/lib/native/libgplcompression* /home/hadoop/hadoop-2.7.2/lib/native/"
Q5: After enabling HA, running HQL fails:
create table test_data_tmp2 as select * from test_data_tmp1;
Moving data to: hdfs://bis-newdatanode-s2b-80:9000/user/hive/warehouse/test.db/.hive-staging_hive_2017-11-20_21-12-31_880_1843842567826987200-1/-ext-10001
Failed with exception Wrong FS: hdfs://bis-newdatanode-s2b-80:9000/user/hive/warehouse/test.db/.hive-staging_hive_2017-11-20_21-12-31_880_1843842567826987200-1/-ext-10003, expected: hdfs://testcluster
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1   Cumulative CPU: 4.78 sec   HDFS Read: 10883134   HDFS Write: 10880084   SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 780 msec
Solution:
This is a Hive metastore issue. Two tables in the metastore database are involved:
DBS: the root location of each Hive database
SDS: the storage location of each Hive table/partition
The metastore still stores the old HDFS URIs; replace them with the HA nameservice alias:
update DBS set DB_LOCATION_URI=REPLACE(DB_LOCATION_URI,'bis-newdatanode-s2b-80:9000','testcluster');
update SDS set LOCATION=REPLACE(LOCATION,'bis-newdatanode-s2b-80:9000','testcluster');
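Before and after running the UPDATEs it can help to count how many rows still reference the old address. A sketch against a MySQL-backed metastore; the connection user and database name (hive/hive) are assumptions and should be replaced with the real ones:
mysql -u hive -p hive -e "
  SELECT COUNT(*) FROM DBS WHERE DB_LOCATION_URI LIKE '%bis-newdatanode-s2b-80:9000%';
  SELECT COUNT(*) FROM SDS WHERE LOCATION LIKE '%bis-newdatanode-s2b-80:9000%';"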