Hadoop DataNode Startup: asyncBlockReport
Source: Internet  Editor: 程序博客网  Time: 2024/05/21 08:01
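A DataNode (DN) periodically sends block reports to the NameNode (NN) so that the NN knows which blocks the DN holds and can serve clients accordingly. Hadoop usually stores a very large amount of data, so scanning every block synchronously each time a report is due is clearly impractical. What is needed is a service that prepares the block report ahead of time so that reporting stays efficient. That service is asyncBlockReport, a background daemon thread that is started when the DN creates its FSDataset.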
    public FSDataset(DataStorage storage, Configuration conf) throws IOException {
      .....
      .....
      FSVolume[] volArray = new FSVolume[storage.getNumStorageDirs()];
      for (int idx = 0; idx < storage.getNumStorageDirs(); idx++) {
        volArray[idx] = new FSVolume(storage.getStorageDir(idx).getCurrentDir(), conf);
      }
      // Build the volume set
      volumes = new FSVolumeSet(volArray);
      // Build the block-to-block-file mapping and store it in a HashMap
      volumes.getVolumeMap(volumeMap);
      // Create the asynchronous block report instance and start it
      asyncBlockReport = new AsyncBlockReport(this);
      asyncBlockReport.start();
      File[] roots = new File[storage.getNumStorageDirs()];
      for (int idx = 0; idx < storage.getNumStorageDirs(); idx++) {
        roots[idx] = storage.getStorageDir(idx).getCurrentDir();
      }
      asyncDiskService = new FSDatasetAsyncDiskService(roots);
      registerMBean(storage.getStorageID());
    }
Here is the body of the thread:
    public void run() {
      while (shouldRun) {
        try {
          // Wait for a scan request; one scan is requested during DN startup
          waitForReportRequest();
          assert requested && scan == null;

          // Log the start and record the start time
          DataNode.LOG.info("Starting asynchronous block report scan");
          long st = System.currentTimeMillis();
          // Run the scan and build the block report
          HashMap<Block, File> result = fsd.roughBlockScan();
          DataNode.LOG.info("Finished asynchronous block report scan in "
              + (System.currentTimeMillis() - st) + "ms");

          // Publish the result as the pending block report
          synchronized (this) {
            assert scan == null;
            this.scan = result;
          }
        } catch (InterruptedException ie) {
          // interrupted to end scanner
        } catch (Throwable t) {
          DataNode.LOG.error("Async Block Report thread caught exception", t);
          try {
            // Avoid busy-looping in the case that we have entered some invalid
            // state -- don't want to flood the error log with exceptions.
            Thread.sleep(2000);
          } catch (InterruptedException e) {
          }
        }
      }
    }
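The handoff between whoever asks for a report and the scanning thread can be sketched roughly as follows. This is a minimal sketch, not the Hadoop source: the class AsyncScanner, the methods request(), getAndReset() and shutdown(), and the 5-second wait timeout are all names and values made up for illustration. Only the idea (wait for a request, scan without holding the lock, publish the result under the lock) is taken from the run() body above.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for AsyncBlockReport: scans on request in a background thread. */
class AsyncScanner<K, V> extends Thread {
    private final Supplier<Map<K, V>> scanner; // the slow scan, e.g. a directory walk
    private boolean requested = false;         // guarded by "this"
    private Map<K, V> scan = null;             // guarded by "this"
    private volatile boolean shouldRun = true;

    AsyncScanner(Supplier<Map<K, V>> scanner) {
        this.scanner = scanner;
        setDaemon(true);
    }

    /** Ask for a scan; returns immediately. */
    synchronized void request() {
        requested = true;
        notifyAll();
    }

    /** Returns the finished scan and clears it, or null if none is ready yet. */
    synchronized Map<K, V> getAndReset() {
        if (scan == null) {
            return null;
        }
        Map<K, V> result = scan;
        scan = null;
        requested = false;
        return result;
    }

    /** Blocks until a scan has been requested and no finished scan is pending. */
    private synchronized void waitForReportRequest() throws InterruptedException {
        while (!(requested && scan == null)) {
            wait(5000); // assumed timeout, only so the loop re-checks periodically
        }
    }

    @Override
    public void run() {
        while (shouldRun) {
            try {
                waitForReportRequest();
                Map<K, V> result = scanner.get(); // runs without holding the lock
                synchronized (this) {
                    scan = result;                // publish under the lock
                }
            } catch (InterruptedException ie) {
                // interrupted to shut down; the loop re-checks shouldRun
            }
        }
    }

    void shutdown() {
        shouldRun = false;
        interrupt();
    }
}
```

A caller would call request() ahead of time and later poll getAndReset(); a null return simply means the scan has not finished yet.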
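What we care about is how the scan is done, so let's look at roughBlockScan. The scan does not lock the directories, so updates may be in progress while it runs. The resulting block report is therefore a rough one, but skipping the lock gives better performance.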
    HashMap<Block, File> roughBlockScan() {
      int expectedNumBlocks;
      synchronized (this) {
        expectedNumBlocks = volumeMap.size();
      }
      HashMap<Block, File> seenOnDisk =
          new HashMap<Block, File>(expectedNumBlocks, 1.1f);
      // Start the scan
      volumes.scanBlockFilesInconsistent(seenOnDisk);
      return seenOnDisk;
    }
Next, the scanBlockFilesInconsistent function:
    void scanBlockFilesInconsistent(Map<Block, File> results) {
      // Take a snapshot of the volume array in case it changes during the scan
      FSVolume volumesCopy[];
      synchronized (this) {
        volumesCopy = Arrays.copyOf(volumes, volumes.length);
      }
      for (FSVolume vol : volumesCopy) {
        vol.scanBlockFilesInconsistent(results); // note this call
      }
    }
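The snapshot trick used here (copy the array while holding the lock, then iterate over the copy without it) can be shown in isolation. The sketch below is only an illustration: VolumeSet, replace() and visitAll() are hypothetical names, and plain strings stand in for FSVolume objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical volume set; strings stand in for FSVolume objects. */
class VolumeSet {
    private final String[] volumes;

    VolumeSet(String... volumes) {
        this.volumes = volumes.clone();
    }

    /** Mutates the shared array in place, as a concurrent update might. */
    synchronized void replace(int index, String volume) {
        volumes[index] = volume;
    }

    /**
     * Visits every volume from a snapshot taken under the lock.
     * duringScan runs after each visit to simulate an update arriving mid-scan.
     */
    List<String> visitAll(Runnable duringScan) {
        String[] snapshot;
        synchronized (this) {
            snapshot = Arrays.copyOf(volumes, volumes.length);
        }
        // The lock is released here, so the slow per-volume work does not block writers.
        List<String> visited = new ArrayList<>();
        for (String v : snapshot) {
            visited.add(v);
            duringScan.run();
        }
        return visited;
    }
}
```

The lock is held only for the copy, which is cheap. The price is that the scan may work from a view that is already slightly out of date, which is acceptable here because the report is reconciled later anyway.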
The call is delegated one more level down:
    void scanBlockFilesInconsistent(Map<Block, File> results) {
      scanBlockFilesInconsistent(dataDir.dir, results);
    }
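Now for the scanBlockFilesInconsistent overload that does the real work. Note that the block report produced here is not consistent, because blocks may be added or deleted while the scan is running. For that reason the report is checked again by reconcileRoughBlockScan before it is sent to the NN.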
    private void scanBlockFilesInconsistent(
        File dir, Map<Block, File> results) {
      // List all files in the data directory
      File filesInDir[] = dir.listFiles();
      if (filesInDir != null) {
        for (File f : filesInDir) {
          // Is this a block file?
          if (Block.isBlockFilename(f)) {
            long blockLen = f.length();
            // Check that the file still exists; it may have been deleted during the scan
            if (blockLen == 0 && !f.exists()) {
              // length 0 could indicate a race where this file was removed
              // while we were in the middle of generating the report.
              continue;
            }
            // Look up the block's generation stamp and create the Block instance with it
            long genStamp = FSDataset.getGenerationStampFromFile(filesInDir, f);
            Block b = new Block(f, blockLen, genStamp);
            // Add a block report entry to the HashMap
            results.put(b, f);
          } else if (f.getName().startsWith("subdir")) {
            // Recurse into subdirectories
            scanBlockFilesInconsistent(f, results);
          }
        }
      }
    }
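To try the same recursive walk outside of Hadoop, something like the following should do. It is a simplified sketch under stated assumptions: RoughScan is a made-up class, the map holds file name to length instead of Block to File, generation stamps are ignored, and the file name pattern "blk_<id>" is my assumption about what Block.isBlockFilename accepts.

```java
import java.io.File;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical, simplified version of the unlocked recursive block scan. */
class RoughScan {
    // Assumption: block data files are named "blk_<id>" with a possibly negative id.
    // Files such as "blk_<id>_<genstamp>.meta" do not match and are skipped.
    private static final Pattern BLOCK_FILE = Pattern.compile("blk_-?\\d+");

    /** Records name -> length for every block file under dir, without any locking. */
    static void scan(File dir, Map<String, Long> results) {
        File[] files = dir.listFiles();
        if (files == null) {
            return; // not a directory, or it vanished or could not be read
        }
        for (File f : files) {
            String name = f.getName();
            if (BLOCK_FILE.matcher(name).matches()) {
                long len = f.length();
                if (len == 0 && !f.exists()) {
                    continue; // removed between listFiles() and length()
                }
                results.put(name, len);
            } else if (f.isDirectory() && name.startsWith("subdir")) {
                scan(f, results); // recurse into subdirectories
            }
        }
    }
}
```

File.length() returns 0 for a file that does not exist, which is why a zero length is followed by an exists() check rather than being taken at face value.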
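Every data directory goes through the same procedure. When the function returns, a possibly inconsistent block report has been produced. Once that report has been reconciled, it is sent to the NN.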
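The source of reconcileRoughBlockScan is not shown in this post, so the following is only a guess at the general idea, not the actual Hadoop logic: wherever the rough scan and the in-memory block map disagree, look at the disk again before deciding. Reconciler, reconcile() and the existsOnDisk predicate are all hypothetical, and a real implementation would presumably run this while holding the dataset lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical reconciliation of a rough scan against the in-memory block map. */
class Reconciler {
    /**
     * @param inMemory     blocks the dataset currently believes it has (name -> length)
     * @param roughScan    blocks seen by the unlocked scan (name -> length)
     * @param existsOnDisk re-check used only for blocks the two views disagree on
     */
    static Map<String, Long> reconcile(Map<String, Long> inMemory,
                                       Map<String, Long> roughScan,
                                       Predicate<String> existsOnDisk) {
        Map<String, Long> report = new HashMap<>(roughScan);

        // In memory but missed by the scan: probably added while scanning.
        // Include it if it really is on disk.
        for (Map.Entry<String, Long> e : inMemory.entrySet()) {
            if (!report.containsKey(e.getKey()) && existsOnDisk.test(e.getKey())) {
                report.put(e.getKey(), e.getValue());
            }
        }

        // Seen by the scan but unknown in memory: probably deleted while scanning.
        // Drop it unless it is still on disk.
        report.keySet().removeIf(
            name -> !inMemory.containsKey(name) && !existsOnDisk.test(name));

        return report;
    }
}
```

Only the disputed blocks are re-checked, so this step stays cheap compared with the full scan, which is the point of doing the expensive walk ahead of time and without the lock.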