HDFS Storage Strategy for Multi-Disk Nodes
http://hi.baidu.com/thinkdifferent/blog/item/95de0e2416c4da3fc89559b8.html
I had never found an authoritative reference for how HDFS handles storage on nodes with multiple disks; the Hadoop site only notes, roughly, that nodes with multiple disks should be managed internally. Today I came across a blog post that quotes the relevant code directly. Reposted below:
from: kzk's blog
To use multiple disks in a Hadoop DataNode, add comma-separated directories to dfs.data.dir in hdfs-site.xml. The following is an example using four disks.
<property>
  <name>dfs.data.dir</name>
  <value>/disk1,/disk2,/disk3,/disk4</value>
</property>
But how does Hadoop use these disks? I found the following code snippet in ./hdfs/org/apache/hadoop/hdfs/server/datanode/FSDataset.java in hadoop-0.20.1.
synchronized FSVolume getNextVolume(long blockSize) throws IOException {
  int startVolume = curVolume;
  while (true) {
    FSVolume volume = volumes[curVolume];
    curVolume = (curVolume + 1) % volumes.length;
    if (volume.getAvailable() > blockSize) {
      return volume;
    }
    if (curVolume == startVolume) {
      throw new DiskOutOfSpaceException("Insufficient space for an additional block");
    }
  }
}
FSVolume represents a single directory specified in dfs.data.dir. This code places blocks onto the multiple disks in round-robin fashion, while checking the available capacity of each disk.
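The round-robin selection above can be sketched outside Hadoop. In this toy version, VolumePicker and its availableBytes array are hypothetical stand-ins for the FSVolume objects and their getAvailable() calls; only the selection logic mirrors the snippet.

```java
// Minimal sketch of FSDataset's round-robin volume selection.
// VolumePicker and availableBytes are hypothetical stand-ins for
// FSVolume objects; only the selection logic mirrors the snippet above.
public class VolumePicker {
    private final long[] availableBytes; // free space per configured directory
    private int curVolume = 0;

    public VolumePicker(long[] availableBytes) {
        this.availableBytes = availableBytes;
    }

    // Returns the index of the next volume with room for blockSize bytes,
    // cycling through all volumes at most once before giving up.
    public synchronized int getNextVolume(long blockSize) {
        int startVolume = curVolume;
        while (true) {
            int candidate = curVolume;
            curVolume = (curVolume + 1) % availableBytes.length;
            if (availableBytes[candidate] > blockSize) {
                return candidate;
            }
            if (curVolume == startVolume) {
                throw new RuntimeException("Insufficient space for an additional block");
            }
        }
    }

    public static void main(String[] args) {
        // Four "disks"; the second one is nearly full and gets skipped.
        VolumePicker picker = new VolumePicker(new long[]{100, 1, 100, 100});
        System.out.println(picker.getNextVolume(10)); // 0
        System.out.println(picker.getNextVolume(10)); // 2 (disk 1 skipped)
        System.out.println(picker.getNextVolume(10)); // 3
        System.out.println(picker.getNextVolume(10)); // wraps back to 0
    }
}
```

Note how a nearly full disk is silently skipped: the rotation continues past it, so the remaining disks keep receiving blocks evenly.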
One more thing: if a disk reaches 100% utilization, other important data (e.g. error logs) can no longer be written to it. To prevent this, Hadoop provides the "dfs.datanode.du.reserved" value. When Hadoop calculates disk capacity, this value is always subtracted from the real capacity. Setting it to several hundred megabytes should be safe.
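For example, reserving 500 MB per volume would look like this in hdfs-site.xml (the property takes a byte count; the specific value here is just an illustration):

```xml
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- 500 MB per volume, expressed in bytes: 500 * 1024 * 1024 -->
  <value>524288000</value>
</property>
```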
This is Hadoop's default strategy, but I think it would be better to also consider each disk's load average. If one disk is busy, Hadoop should avoid using it. However, with that method the block distribution would no longer be even across the disks, so read performance would drop. This is a very difficult problem. Can you come up with a better strategy?