A Brief Source-Code Analysis of the Hadoop Cluster Balancer
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 13:22
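The Hadoop balancer does not rebalance a cluster in a single pass; it works in multiple iterations. In each iteration a single datanode moves or receives no more than the lesser of 10 GB or the threshold fraction of its capacity, and each iteration runs for no more than 20 minutes. At the end of each iteration, the balancer refreshes the disk-usage information of the datanodes from the namenode. The class comment in the source says: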
* <p>The tool moves blocks from highly utilized datanodes to poorly
* utilized datanodes iteratively. In each iteration a datanode moves or
* receives no more than the lesser of 10G bytes or the threshold fraction
* of its capacity. Each iteration runs no more than 20 minutes.
* At the end of each iteration, the balancer obtains updated datanodes
* information from the namenode.
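The per-iteration limit described in this comment can be sketched as below. This is a minimal illustration based only on the comment text, not the balancer's actual code: the class name `IterationLimitSketch`, the method `maxBytesToMove`, and its parameters are hypothetical, and the real implementation may take further factors into account (for example the remaining space on the target).

```java
/** Hypothetical helper; the names are illustrative and not taken from Hadoop. */
final class IterationLimitSketch {

    /** The 10 GB per-datanode, per-iteration cap mentioned in the comment. */
    static final long MAX_BYTES_PER_ITERATION = 10L * 1024 * 1024 * 1024;

    /**
     * Returns the lesser of 10 GB and the threshold fraction of the capacity.
     *
     * @param capacityBytes    total capacity of the datanode, in bytes
     * @param thresholdPercent balancer threshold as a percentage (10.0 means 10%)
     */
    static long maxBytesToMove(long capacityBytes, double thresholdPercent) {
        long thresholdBytes = (long) (capacityBytes * thresholdPercent / 100.0);
        return Math.min(MAX_BYTES_PER_ITERATION, thresholdBytes);
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // 10% of 1 TB is about 102 GB, so the 10 GB cap applies.
        System.out.println(maxBytesToMove(1024 * gb, 10.0)); // prints 10737418240
        // 10% of 50 GB is 5 GB, which is below the cap.
        System.out.println(maxBytesToMove(50 * gb, 10.0));   // prints 5368709120
    }
}
```

With a 10% threshold, only datanodes smaller than 100 GB are limited by the threshold term; larger nodes are limited by the 10 GB cap.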
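The balancer exits as soon as any one of the following five conditions is met:
1. The cluster is balanced
2. No block can be moved
3. No block has been moved for five consecutive iterations
4. An exception occurs while communicating with the namenode
5. Another balancer is already running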
* <p>The balancer automatically exits when any of the following five
* conditions is satisfied:
* <ol>
* <li>The cluster is balanced;
* <li>No block can be moved;
* <li>No block has been moved for five consecutive iterations;
* <li>An IOException occurs while communicating with the namenode;
* <li>Another balancer is running.
* </ol>
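To make the five exit conditions concrete, here is a small sketch of an iteration loop that checks them. It is an assumption-laden illustration, not the balancer's real control flow: the `Cluster` interface, its methods, the `ExitReason` enum, and the order in which the conditions are checked are all hypothetical.

```java
import java.io.IOException;

/** Illustrative only: these types are hypothetical and are not Hadoop classes. */
final class BalancerLoopSketch {

    enum ExitReason { BALANCED, NO_MOVABLE_BLOCK, NO_PROGRESS, IO_ERROR, ALREADY_RUNNING }

    /** Hypothetical view of the cluster as seen by the loop. */
    interface Cluster {
        boolean anotherBalancerRunning();
        boolean isBalanced() throws IOException;
        /** Bytes that could be scheduled this iteration; 0 means no block can be moved. */
        long planMoves() throws IOException;
        /** Bytes actually moved in this iteration. */
        long executeMoves() throws IOException;
    }

    /** Five consecutive iterations without progress end the run. */
    static final int MAX_IDLE_ITERATIONS = 5;

    static ExitReason run(Cluster cluster) {
        if (cluster.anotherBalancerRunning()) {
            return ExitReason.ALREADY_RUNNING;            // condition 5
        }
        int idleIterations = 0;
        try {
            while (true) {
                if (cluster.isBalanced()) {
                    return ExitReason.BALANCED;           // condition 1
                }
                if (cluster.planMoves() == 0) {
                    return ExitReason.NO_MOVABLE_BLOCK;   // condition 2
                }
                if (cluster.executeMoves() > 0) {
                    idleIterations = 0;                   // progress resets the counter
                } else if (++idleIterations >= MAX_IDLE_ITERATIONS) {
                    return ExitReason.NO_PROGRESS;        // condition 3
                }
            }
        } catch (IOException e) {
            return ExitReason.IO_ERROR;                   // condition 4
        }
    }
}
```

Calling `BalancerLoopSketch.run(...)` with a `Cluster` whose `executeMoves()` always returns 0 yields `NO_PROGRESS` after five iterations, which mirrors condition 3.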
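When a datanode needs balancing, the balancer first prefers to copy its blocks to a node on the same rack. The check is implemented in isGoodBlockCandidate: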
private boolean isGoodBlockCandidate(Source source,
    BalancerDatanode target, BalancerBlock block) {
  // check if the block is moved or not
  if (movedBlocks.contains(block)) {
    return false;
  }
  if (block.isLocatedOnDatanode(target)) {
    return false;
  }
  boolean goodBlock = false;
  if (cluster.isOnSameRack(source.getDatanode(), target.getDatanode())) {
    // good if source and target are on the same rack
    goodBlock = true;
  } else {
    boolean notOnSameRack = true;
    synchronized (block) {
      for (BalancerDatanode loc : block.locations) {
        if (cluster.isOnSameRack(loc.datanode, target.datanode)) {
          notOnSameRack = false;
          break;
        }
      }
    }
    if (notOnSameRack) {
      // good if target is not on the same rack as any replica
      goodBlock = true;
    } else {
      // good if source is on the same rack as one of the replicas
      for (BalancerDatanode loc : block.locations) {
        if (loc != source &&
            cluster.isOnSameRack(loc.datanode, source.datanode)) {
          goodBlock = true;
          break;
        }
      }
    }
  }
  return goodBlock;
}
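The rules for copying a block from the source machine to the target machine are:
1. The block is not currently being moved and has not already been moved
2. The target machine does not already hold a replica of the block
3. After the move, the number of racks holding replicas of the block does not decrease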
/* Decide if it is OK to move the given block from source to target
* A block is a good candidate if
* 1. the block is not in the process of being moved/has not been moved;
* 2. the block does not have a replica on the target;
* 3. doing the move does not reduce the number of racks that the block has
*/
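Rules 2 and 3 can also be checked directly by counting racks before and after a move, which may be easier to follow than the case analysis in isGoodBlockCandidate. The sketch below does exactly that with plain strings and maps. It is not the Hadoop implementation: the class `RackRuleSketch`, its methods, and the node and rack names are all made up for illustration, and rule 1 (the moved-blocks check) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a direct rack-counting check, not the Hadoop implementation. */
final class RackRuleSketch {

    /** Number of distinct racks that hold a replica. */
    static int rackCount(Collection<String> replicaNodes, Map<String, String> rackOf) {
        Set<String> racks = new HashSet<>();
        for (String node : replicaNodes) {
            racks.add(rackOf.get(node));
        }
        return racks.size();
    }

    /** True if moving the replica from source to target satisfies rules 2 and 3. */
    static boolean isGoodMove(String source, String target,
                              List<String> replicaNodes, Map<String, String> rackOf) {
        if (replicaNodes.contains(target)) {
            return false;                       // rule 2: target already holds a replica
        }
        List<String> after = new ArrayList<>(replicaNodes);
        after.remove(source);                   // the replica leaves the source node...
        after.add(target);                      // ...and appears on the target node
        return rackCount(after, rackOf) >= rackCount(replicaNodes, rackOf); // rule 3
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new HashMap<>();
        rackOf.put("dn1", "/rack1");
        rackOf.put("dn2", "/rack1");
        rackOf.put("dn3", "/rack2");
        rackOf.put("dn4", "/rack2");

        // Replicas on dn1 (/rack1) and dn3 (/rack2): two racks.
        List<String> replicas = Arrays.asList("dn1", "dn3");
        System.out.println(isGoodMove("dn1", "dn2", replicas, rackOf)); // prints true (same rack)
        System.out.println(isGoodMove("dn3", "dn2", replicas, rackOf)); // prints false (leaves only /rack1)
    }
}
```

The second call is rejected because both replicas would end up on /rack1, reducing the rack count from two to one.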