hadoop配置HA原理
来源:互联网 发布:xbox手柄映射软件 编辑:程序博客网 时间:2024/06/05 10:44
Architecture
In a typical HA clusiter, two separate machines are configured as NameNodes. At any point in time, exactly one of the NameNodes is in an Active state, and the other is in a Standby state. The Active NameNode is responsible for all client operations in the cluster, while the Standby is simply acting as a slave, maintaining enough state to provide a fast failover if necessary.
In order for the Standby node to keep its state synchronized with the Active node, both nodes communicate with a group of separate daemons called “JournalNodes” (JNs). When any namespace modification is performed by the Active node, it durably logs a record of the modification to a majority of these JNs. The Standby node is capable of reading the edits from the JNs, and is constantly watching them for changes to the edit log. As the Standby Node sees the edits, it applies them to its own namespace. In the event of a failover, the Standby will ensure that it has read all of the edits from the JounalNodes before promoting itself to the Active state. This ensures that the namespace state is fully synchronized before a failover occurs.
In order to provide a fast failover, it is also necessary that the Standby node have up-to-date information regarding the location of blocks in the cluster. In order to achieve this, the DataNodes are configured with the location of both NameNodes, and send block location information and heartbeats to both.
It is vital for the correct operation of an HA cluster that only one of the NameNodes be Active at a time. Otherwise, the namespace state would quickly diverge between the two, risking data loss or other incorrect results. In order to ensure this property and prevent the so-called “split-brain scenario,” the JournalNodes will only ever allow a single NameNode to be a writer at a time. During a failover, the NameNode which is to become active will simply take over the role of writing to the JournalNodes, which will effectively prevent the other NameNode from continuing in the Active state, allowing the new Active to safely proceed with failover.
HardWare Resources
In order to deploy an HA cluster, you should prepare the following:
- NameNode machines - the machines on which you run the Active and Standby NameNodes should have equivalent hardware to each other, and equivalent hardware to what would be used in a non-HA cluster.
- JournalNode machines - the machines on which you run the JournalNodes. The JournalNode daemon is relatively lightweight, so these daemons may reasonably be collocated on machines with other Hadoop daemons, for example NameNodes, the JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode daemons, since edit log modifications must be written to a majority of JNs. This will allow the system to tolerate the failure of a single machine. You may also run more than 3 JournalNodes, but in order to actually increase the number of failures the system can tolerate, you should run an odd number of JNs, (i.e. 3, 5, 7, etc.). Note that when running with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures and continue to function normally.
Note that, in an HA cluster, the Standby NameNode also performs checkpoints of the namespace state, and thus it is not necessary to run a Secondary NameNode, CheckpointNode, or BackupNode in an HA cluster. In fact, to do so would be an error. This also allows one who is reconfiguring a non-HA-enabled HDFS cluster to be HA-enabled to reuse the hardware which they had previously dedicated to the Secondary NameNode.
Deployment
hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>192.168.153.201:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>192.168.153.205:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>192.168.153.201:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>192.168.153.205:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://192.168.153.202:8485;192.168.153.203:8485;192.168.153.204:8485/mycluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/centos/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/centos/hadoop/hdfs/journal</value>
</property>
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
Notes
1、Currently, only a maximum of two NameNodes may be configured per nameservice.
阅读全文
0 0
- hadoop配置HA原理
- hadoop HA原理
- Hadoop HA MR1配置
- hadoop-ha配置
- Hadoop HA 配置
- Hadoop 2.0 ResourceManager HA原理
- hadoop(二):hdfs HA原理
- Hadoop 配置及hadoop HA 的配置
- Hadoop 2.2.0 HA配置
- hadoop HA集群环境配置
- Hadoop完全分布式+HA配置
- Hadoop-2.4.1 HA配置
- Hadoop HA模式升级配置
- Hadoop HA高可用配置
- Hadoop Ha (High avilable)配置
- Hadoop HA的安装配置
- hadoop配置HA简单总结
- hadoop HA环境安装配置
- SSS1629|替代CM108|替代CM118|替代CM108AH|台湾鑫创
- 关于劫持快照代码。
- Column count doesn't match value count at row 1
- 文本特征值提取,采用结巴将文本分词,tf-idf算法得到特征值,以及给出了idf词频文件的训练方法
- centos 下安装lamp环境
- hadoop配置HA原理
- JVM学习笔记一 之 内存结构
- MeasureSpec计算分析
- BZOJ3436:小K的农场
- 17090202_Win7(64位)下光盘安装/构建CentOS7(64位)双系统教程
- 关于任务二(用户兴趣标注)的总结
- ubuntu 14.04 perl: warning: Setting locale failed.
- Linux服务器安装SVN
- sql server 异地备份与删除 (保留用于参考)