Hadoop Source Code Analysis: the Client's open, seek and read Operations
Source: Internet  Editor: 程序博客网  Time: 2024/06/04 01:04
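Hadoop does not offer the full set of POSIX operations, but the basic file operations it does provide (open, create, delete, write, seek, read) still let users work with files conveniently.

The open call ultimately returns a DFSInputStream object, and the openInfo function of DFSInputStream is the core of the whole open sequence; both the constructor and openInfo are listed further down. Inside openInfo, callGetBlockLocations talks to the namenode over RPC to fetch the locations of the blocks covering the first prefetchSize bytes of the file (prefetchSize comes from the dfs.read.prefetch.size setting and by default spans 10 blocks), and those block locations are stored in the stream. The updateBlockInfo function that follows takes the last block, compares the information held by its datanode with the information on the namenode and, if the two disagree, follows the datanode (the namenode and the datanodes can be out of sync).

Below is a typical piece of Hadoop code that opens a file and reads its contents: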
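- // hdfsPath and p are Path objects, conf is a Configuration, place is a byte offset
- FileSystem hdfs = hdfsPath.getFileSystem(conf);
- FSDataInputStream inFsData = hdfs.open(p);
- inFsData.seek(place);               // jump to the offset
- long value = inFsData.readLong();   // read 8 bytes starting at that offset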
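hdfs is an instance of FileSystem. FileSystem is an abstract class: depending on the URL held in conf, the hdfs object that comes back may be an instance of the local file system or of the distributed file system. For HDFS, the class that actually carries out the file operations is DistributedFileSystem.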
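The scheme-based choice described above can be pictured with a small stand-alone model. This is only a sketch: ToyFileSystem, ToyLocalFileSystem and ToyDistributedFileSystem are hypothetical names invented here, and the real lookup inside Hadoop is driven by the configuration and is more involved than a single if statement.

```java
import java.net.URI;

// Toy stand-ins for an abstract file system and two implementations.
// None of these are Hadoop classes; they only illustrate the dispatch idea.
abstract class ToyFileSystem {
    abstract String describe();

    // Choose an implementation from the URI scheme, falling back to local.
    static ToyFileSystem get(URI uri) {
        if ("hdfs".equals(uri.getScheme())) {
            return new ToyDistributedFileSystem(uri.getHost(), uri.getPort());
        }
        return new ToyLocalFileSystem();
    }
}

final class ToyLocalFileSystem extends ToyFileSystem {
    @Override
    String describe() {
        return "local";
    }
}

final class ToyDistributedFileSystem extends ToyFileSystem {
    private final String host;
    private final int port;

    ToyDistributedFileSystem(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    String describe() {
        return "distributed@" + host + ":" + port;
    }
}

class SchemeDispatchDemo {
    public static void main(String[] args) {
        // prints "local"
        System.out.println(ToyFileSystem.get(URI.create("file:///tmp/data.bin")).describe());
        // prints "distributed@namenode.example:9000"
        System.out.println(
            ToyFileSystem.get(URI.create("hdfs://namenode.example:9000/data.bin")).describe());
    }
}
```

The caller only ever sees the abstract type, which is why the same open, seek and read code works against either kind of file system.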
Now let's look at the open operation of DistributedFileSystem:
- public FSDataInputStream open(Path f, int bufferSize) throws IOException {
- statistics.incrementReadOps(1);
- return new DFSClient.DFSDataInputStream(
- dfs.open(getPathName(f), bufferSize, verifyChecksum, statistics));
- }
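As the code shows, open returns an FSDataInputStream. Internally it constructs an instance of DFSDataInputStream, an inner class of DFSClient, and one of the constructor arguments is the return value of DFSClient's open function. Here is DFSClient's open function: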
- public DFSInputStream open(String src, int buffersize, boolean verifyChecksum,
- FileSystem.Statistics stats
- ) throws IOException {
- checkOpen();
- // Get block info from namenode
- return new DFSInputStream(src, buffersize, verifyChecksum);
- }
The DFSInputStream constructor:
- DFSInputStream(String src, int buffersize, boolean verifyChecksum
- ) throws IOException {
- this.verifyChecksum = verifyChecksum;
- this.buffersize = buffersize;
- this.src = src;
- prefetchSize = conf.getLong("dfs.read.prefetch.size", prefetchSize);
- openInfo();
- }
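The prefetchSize read in this constructor is handed to callGetBlockLocations in openInfo as a length starting at offset 0. Assuming it is a byte count (as far as I can tell, the Hadoop 1.x source defaults it to 10 times the default block size), the number of blocks whose locations come back depends on the file length. A minimal sketch of that arithmetic follows; PrefetchCoverage, blocksCovering and the 64 MB block size are illustrative assumptions, not Hadoop code.

```java
// Hypothetical helper: how many blocks overlap a byte range of a file.
class PrefetchCoverage {

    // Number of blocks that overlap [start, start + length) in a file of
    // fileLength bytes split into fixed-size blocks of blockSize bytes.
    static long blocksCovering(long fileLength, long blockSize, long start, long length) {
        if (fileLength <= 0 || length <= 0 || start >= fileLength) {
            return 0;
        }
        long end = Math.min(fileLength, start + length); // exclusive upper bound
        long firstBlock = start / blockSize;
        long lastBlock = (end - 1) / blockSize;
        return lastBlock - firstBlock + 1;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;      // assumed block size for the demo
        long prefetch = 10 * blockSize; // assumed default prefetch length

        // A 1 GB file has 16 blocks, but only the first 10 are prefetched: prints 10
        System.out.println(blocksCovering(1024 * mb, blockSize, 0, prefetch));
        // A 100 MB file has just 2 blocks, all of them prefetched: prints 2
        System.out.println(blocksCovering(100 * mb, blockSize, 0, prefetch));
    }
}
```

For small files the whole block list therefore arrives in a single namenode call; for large files the stream has to go back to the namenode once a read moves past the prefetched range.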
The openInfo function of DFSInputStream:
- synchronized void openInfo() throws IOException {
- LocatedBlocks newInfo = callGetBlockLocations(namenode, src, 0, prefetchSize);
- if (newInfo == null) {
- throw new FileNotFoundException("File does not exist: " + src);
- }
- // I think this check is not correct. A file could have been appended to
- // between two calls to openInfo().
- if (locatedBlocks != null && !locatedBlocks.isUnderConstruction() &&
- !newInfo.isUnderConstruction()) {
- Iterator<LocatedBlock> oldIter = locatedBlocks.getLocatedBlocks().iterator();
- Iterator<LocatedBlock> newIter = newInfo.getLocatedBlocks().iterator();
- while (oldIter.hasNext() && newIter.hasNext()) {
- if (! oldIter.next().getBlock().equals(newIter.next().getBlock())) {
- throw new IOException("Blocklist for " + src + " has changed!");
- }
- }
- }
- updateBlockInfo(newInfo);
- this.locatedBlocks = newInfo;
- this.currentNode = null;
- }
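The consistency check in the middle of openInfo walks the old and the new block lists in lockstep and stops at the shorter one. The sketch below reproduces only that loop, with plain Long ids standing in for blocks; BlockListCheck and its method names are made up for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the pairwise block-list comparison in openInfo.
class BlockListCheck {

    // True when the two lists agree on every position they both have.
    static boolean commonPrefixMatches(List<Long> oldBlocks, List<Long> newBlocks) {
        Iterator<Long> oldIter = oldBlocks.iterator();
        Iterator<Long> newIter = newBlocks.iterator();
        while (oldIter.hasNext() && newIter.hasNext()) {
            if (!oldIter.next().equals(newIter.next())) {
                return false;
            }
        }
        return true;
    }

    // Same check, reported the way openInfo reports it.
    static void verifyUnchanged(String src, List<Long> oldBlocks, List<Long> newBlocks)
            throws IOException {
        if (!commonPrefixMatches(oldBlocks, newBlocks)) {
            throw new IOException("Blocklist for " + src + " has changed!");
        }
    }

    public static void main(String[] args) throws IOException {
        List<Long> before = Arrays.asList(1L, 2L);

        // Extra blocks at the end are not inspected, so this passes silently.
        verifyUnchanged("/demo/file", before, Arrays.asList(1L, 2L, 3L));

        // A different block in a shared position is rejected: prints false
        System.out.println(commonPrefixMatches(before, Arrays.asList(1L, 9L)));
    }
}
```

Because only the shared positions are compared, a new list that simply carries additional blocks at the end does not trigger the exception.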
The seek and read calls that follow both operate on this stream. Here is the seek function of DFSInputStream:
- public synchronized void seek(long targetPos) throws IOException {
- if (targetPos > getFileLength()) {
- throw new IOException("Cannot seek after EOF");
- }
- boolean done = false;
- if (pos <= targetPos && targetPos <= blockEnd) {
- //
- // If this seek is to a positive position in the current
- // block, and this piece of data might already be lying in
- // the TCP buffer, then just eat up the intervening data.
- //
- int diff = (int)(targetPos - pos);
- if (diff <= TCP_WINDOW_SIZE) {
- try {
- pos += blockReader.skip(diff);
- if (pos == targetPos) {
- done = true;
- }
- } catch (IOException e) {//make following read to retry
- LOG.debug("Exception while seek to " + targetPos + " from "
- + currentBlock +" of " + src + " from " + currentNode +
- ": " + StringUtils.stringifyException(e));
- }
- }
- }
- if (!done) {
- pos = targetPos;
- blockEnd = -1;
- }
- }
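The two outcomes of seek (skip forward inside the open block, or just record the new position and force a reopen later) can be tried out with a small model. This is a sketch under stated assumptions: ToySeekableStream is a hypothetical class, a ByteArrayInputStream stands in for the block reader, the file is a single block, and SKIP_WINDOW is an arbitrary small value playing the role of TCP_WINDOW_SIZE.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical single-block model of the seek logic shown above.
class ToySeekableStream {
    static final int SKIP_WINDOW = 8; // stands in for TCP_WINDOW_SIZE; demo value

    private final long fileLength;
    private final InputStream blockReader; // stands in for the open block reader
    private long pos = 0;
    private long blockEnd; // last offset of the open block, -1 when none is open

    ToySeekableStream(byte[] block) {
        this.fileLength = block.length;
        this.blockReader = new ByteArrayInputStream(block);
        this.blockEnd = block.length - 1;
    }

    void seek(long targetPos) throws IOException {
        if (targetPos > fileLength) {
            throw new IOException("Cannot seek after EOF");
        }
        boolean done = false;
        if (pos <= targetPos && targetPos <= blockEnd) {
            int diff = (int) (targetPos - pos);
            if (diff <= SKIP_WINDOW) {
                // Short forward hop: consume the intervening bytes in place.
                pos += blockReader.skip(diff);
                done = (pos == targetPos);
            }
        }
        if (!done) {
            // Anything else: remember the target and invalidate the block,
            // so the next read has to open a block at the new position.
            pos = targetPos;
            blockEnd = -1;
        }
    }

    long position() {
        return pos;
    }

    boolean needsReopen() {
        return blockEnd == -1;
    }

    public static void main(String[] args) throws IOException {
        ToySeekableStream in = new ToySeekableStream(new byte[32]);
        in.seek(5);
        System.out.println(in.position() + " " + in.needsReopen()); // prints "5 false"
        in.seek(20);
        System.out.println(in.position() + " " + in.needsReopen()); // prints "20 true"
    }
}
```

Note that seek itself never contacts a datanode in the second case; setting blockEnd to -1 only defers the work to the next read.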