An Analysis of the HDFS Write Path
Source: Internet | Editor: 程序博客网 | 2024/05/23 17:45
Throughout the write path, DFSClient is the key class. As the name suggests, it is the client side of the HDFS RPC mechanism. Our operations on HDFS go through FileSystem, which wraps DFSClient and is the API handed to users: the methods we call on FileSystem delegate to methods inside DFSClient. Since FileSystem is how we drive HDFS, that is where we start tracing the write. When we write a file to HDFS we call FileSystem.create(Path path). Following the chain of overloads in the source shows that this method is abstract, with no implementation, so we must look to a subclass. One subclass of FileSystem is DistributedFileSystem, and it is the class in use in our pseudo-distributed environment. The abstract declaration in FileSystem looks like this:
/**
 * Opens an FSDataOutputStream at the indicated Path with write-progress
 * reporting.
 * @param f the file name to open
 * @param permission
 * @param overwrite if a file with this name already exists, then if true,
 *   the file will be overwritten, and if false an error will be thrown.
 * @param bufferSize the size of the buffer to be used.
 * @param replication required block replication for the file.
 * @param blockSize
 * @param progress
 * @throws IOException
 * @see #setPermission(Path, FsPermission)
 */
public abstract FSDataOutputStream create(Path f,
    FsPermission permission,
    boolean overwrite,
    int bufferSize,
    short replication,
    long blockSize,
    Progressable progress) throws IOException;
Click on create, press Ctrl+T, and select DistributedFileSystem to jump into its implementation. Note the return type, FSDataOutputStream: the method constructs this return object directly, and the first argument to its constructor is the result of dfs.create(). We want to know what the dfs object is and what its create method does. Here is the implementation:
public FSDataOutputStream create(Path f, FsPermission permission,
    boolean overwrite,
    int bufferSize, short replication, long blockSize,
    Progressable progress) throws IOException {
  statistics.incrementWriteOps(1);
  return new FSDataOutputStream
      (dfs.create(getPathName(f), permission,
          overwrite, true, replication, blockSize, progress, bufferSize),
       statistics);
}
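The dispatch just described, an abstract method on the base class filled in by a concrete subclass that delegates to an inner client object, can be illustrated with a minimal, self-contained sketch. All class names here are hypothetical stand-ins, not the real Hadoop classes:

```java
// Plays the role of the abstract FileSystem.create(Path, ...).
abstract class FileSystemSketch {
    public abstract String create(String path);
}

// Plays the role of ClientSketch's owner, DistributedFileSystem:
// the subclass supplies the body and delegates to its client ("dfs").
class DistributedFileSystemSketch extends FileSystemSketch {
    private final ClientSketch dfs = new ClientSketch();

    @Override
    public String create(String path) {
        // Mirrors: return new FSDataOutputStream(dfs.create(...), statistics)
        return "FSDataOutputStream[" + dfs.create(path) + "]";
    }
}

// Plays the role of DFSClient.
class ClientSketch {
    String create(String path) {
        return "DFSOutputStream:" + path;
    }
}

public class DispatchDemo {
    public static void main(String[] args) {
        // Callers program against the abstract type; the subclass decides.
        FileSystemSketch fs = new DistributedFileSystemSketch();
        System.out.println(fs.create("/tmp/a.txt"));
    }
}
```

This is why tracing FileSystem.create in the source leads nowhere until you jump (Ctrl+T) to the concrete subclass.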
The dfs field here is a DFSClient. Its create method is where the output stream is actually built:
/**
 * Create a new dfs file with the specified block replication
 * with write-progress reporting and return an output stream for writing
 * into the file.
 *
 * @param src stream name
 * @param permission The permission of the directory being created.
 *   If permission == null, use {@link FsPermission#getDefault()}.
 * @param overwrite do not check for file existence if true
 * @param createParent create missing parent directory if true
 * @param replication block replication
 * @return output stream
 * @throws IOException
 * @see ClientProtocol#create(String, FsPermission, String, boolean, short, long)
 */
public OutputStream create(String src,
                           FsPermission permission,
                           boolean overwrite,
                           boolean createParent,
                           short replication,
                           long blockSize,
                           Progressable progress,
                           int buffersize) throws IOException {
  checkOpen();
  if (permission == null) {
    permission = FsPermission.getDefault();
  }
  FsPermission masked = permission.applyUMask(FsPermission.getUMask(conf));
  LOG.debug(src + ": masked=" + masked);
  final DFSOutputStream result = new DFSOutputStream(src, masked,
      overwrite, createParent, replication, blockSize, progress, buffersize,
      conf.getInt("io.bytes.per.checksum", 512));
  beginFileLease(src, result);
  return result;
}
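The applyUMask call above clears the umask bits from the requested permission before the file is created. The bitwise operation behind it can be sketched in a few lines; the default permission 0777 and umask 022 below are assumed example values, and UmaskSketch is a hypothetical name:

```java
public class UmaskSketch {
    // FsPermission.applyUMask conceptually computes: permission & ~umask.
    static int applyUMask(int permission, int umask) {
        return permission & ~umask;
    }

    public static void main(String[] args) {
        int permission = 0777; // assumed default permission
        int umask = 0022;      // a typical process umask
        int masked = applyUMask(permission, umask);
        System.out.println(Integer.toOctalString(masked)); // prints 755
    }
}
```

So a request for rwxrwxrwx under umask 022 becomes rwxr-xr-x, matching the `masked` value logged by the debug statement.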
The return value is the object constructed here. What is special about this class? Click into DFSOutputStream and look at its source: it is an inner class of DFSClient. Inside it, the constructor calls namenode.create() to register the new file with the NameNode. Next we need to know what type the namenode object is.
/**
 * Create a new output stream to the given DataNode.
 * @see ClientProtocol#create(String, FsPermission, String, boolean, short, long)
 */
DFSOutputStream(String src, FsPermission masked, boolean overwrite,
    boolean createParent, short replication, long blockSize, Progressable progress,
    int buffersize, int bytesPerChecksum) throws IOException {
  this(src, blockSize, progress, bytesPerChecksum, replication);
  computePacketChunkSize(writePacketSize, bytesPerChecksum);
  try {
    // Make sure the regular create() is done through the old create().
    // This is done to ensure that newer clients (post-1.0) can talk to
    // older clusters (pre-1.0). Older clusters lack the new create()
    // method accepting createParent as one of the arguments.
    if (createParent) {
      namenode.create(
          src, masked, clientName, overwrite, replication, blockSize);
    } else {
      namenode.create(
          src, masked, clientName, overwrite, false, replication, blockSize);
    }
  } catch (RemoteException re) {
    throw re.unwrapRemoteException(AccessControlException.class,
        FileAlreadyExistsException.class,
        FileNotFoundException.class,
        NSQuotaExceededException.class,
        DSQuotaExceededException.class);
  }
  streamer.start();
}
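The computePacketChunkSize(writePacketSize, bytesPerChecksum) call decides how many checksummed chunks fit into one packet sent down the DataNode pipeline. A rough sketch of that arithmetic, assuming a 4-byte CRC32 checksum per 512-byte chunk and the 64 KB dfs.write.packet.size default seen in the DFSClient constructor (exact header accounting in the real code is omitted here):

```java
public class PacketChunkSketch {
    static int chunksPerPacket(int writePacketSize, int bytesPerChecksum,
                               int checksumSize) {
        // Each chunk on the wire carries its data plus its checksum.
        int chunkSize = bytesPerChecksum + checksumSize;
        // At least one chunk per packet; otherwise as many as fit.
        return Math.max(writePacketSize / chunkSize, 1);
    }

    public static void main(String[] args) {
        // 64 KB packets, 512-byte chunks, 4-byte checksums (assumed values).
        System.out.println(chunksPerPacket(64 * 1024, 512, 4)); // prints 127
    }
}
```

With these numbers a packet carries about 127 chunks, i.e. roughly 63.5 KB of file data per packet, which is why streaming writes are batched rather than sent byte by byte.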
We can see that namenode is in fact declared as the ClientProtocol interface. So when is this object created? Open the Outline view and click into the DFSClient constructor below:
public final ClientProtocol namenode;

/**
 * Create a new DFSClient connected to the given nameNodeAddr or rpcNamenode.
 * Exactly one of nameNodeAddr or rpcNamenode must be null.
 */
DFSClient(InetSocketAddress nameNodeAddr, ClientProtocol rpcNamenode,
    Configuration conf, FileSystem.Statistics stats)
    throws IOException {
  this.conf = conf;
  this.stats = stats;
  this.nnAddress = nameNodeAddr;
  this.socketTimeout = conf.getInt("dfs.socket.timeout",
                                   HdfsConstants.READ_TIMEOUT);
  this.datanodeWriteTimeout = conf.getInt("dfs.datanode.socket.write.timeout",
                                          HdfsConstants.WRITE_TIMEOUT);
  this.timeoutValue = this.socketTimeout;
  this.socketFactory = NetUtils.getSocketFactory(conf, ClientProtocol.class);
  // dfs.write.packet.size is an internal config variable
  this.writePacketSize = conf.getInt("dfs.write.packet.size", 64*1024);
  this.maxBlockAcquireFailures = getMaxBlockAcquireFailures(conf);
  this.hdfsTimeout = Client.getTimeout(conf);
  ugi = UserGroupInformation.getCurrentUser();
  this.authority = nameNodeAddr == null ? "null" :
      nameNodeAddr.getHostName() + ":" + nameNodeAddr.getPort();
  String taskId = conf.get("mapred.task.id", "NONMAPREDUCE");
  this.clientName = "DFSClient_" + taskId + "_" +
      r.nextInt() + "_" + Thread.currentThread().getId();
  defaultBlockSize = conf.getLong("dfs.block.size", DEFAULT_BLOCK_SIZE);
  defaultReplication = (short) conf.getInt("dfs.replication", 3);
  if (nameNodeAddr != null && rpcNamenode == null) {
    this.rpcNamenode = createRPCNamenode(nameNodeAddr, conf, ugi);
    this.namenode = createNamenode(this.rpcNamenode, conf);
  } else if (nameNodeAddr == null && rpcNamenode != null) {
    // This case is used for testing.
    this.namenode = this.rpcNamenode = rpcNamenode;
  } else {
    throw new IllegalArgumentException(
        "Expecting exactly one of nameNodeAddr and rpcNamenode being null: "
        + "nameNodeAddr=" + nameNodeAddr + ", rpcNamenode=" + rpcNamenode);
  }
  // read directly from the block file if configured.
  this.shortCircuitLocalReads = conf.getBoolean(
      DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_KEY,
      DFSConfigKeys.DFS_CLIENT_READ_SHORTCIRCUIT_DEFAULT);
  if (LOG.isDebugEnabled()) {
    LOG.debug("Short circuit read is " + shortCircuitLocalReads);
  }
  this.connectToDnViaHostname = conf.getBoolean(
      DFSConfigKeys.DFS_CLIENT_USE_DN_HOSTNAME,
      DFSConfigKeys.DFS_CLIENT_USE_DN_HOSTNAME_DEFAULT);
  if (LOG.isDebugEnabled()) {
    LOG.debug("Connect to datanode via hostname is " + connectToDnViaHostname);
  }
  String[] localInterfaces =
      conf.getStrings(DFSConfigKeys.DFS_CLIENT_LOCAL_INTERFACES);
  if (null == localInterfaces) {
    localInterfaces = new String[0];
  }
  this.localInterfaceAddrs = getLocalInterfaceAddrs(localInterfaces);
  if (LOG.isDebugEnabled() && 0 != localInterfaces.length) {
    LOG.debug("Using local interfaces [" +
        StringUtils.join(",", localInterfaces) + "] with addresses [" +
        StringUtils.join(",", localInterfaceAddrs) + "]");
  }
}
The `this.namenode = createNamenode(this.rpcNamenode, conf)` assignment shows that the namenode object is created when the DFSClient constructor runs, so by the time a DFSClient exists, namenode already exists. In summary: operating on HDFS through the FileSystem API really means operating on HDFS through a DFSClient that invokes methods on the NameNode. DFSClient is the client side of the RPC mechanism and NameNode is the server-side call target; the whole call sequence is shown in the figure.
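The ClientProtocol object that createNamenode returns is not a local implementation but an RPC proxy: each method call on it is intercepted, and in real Hadoop the call's name and arguments are serialized and shipped over a socket to the NameNode. The mechanism can be sketched with Java's own dynamic proxies; the Protocol interface and RpcProxySketch class below are simplified stand-ins, not the real Hadoop types:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class RpcProxySketch {
    // A simplified stand-in for the ClientProtocol interface.
    interface Protocol {
        String create(String src);
    }

    static Protocol getProxy() {
        // In real Hadoop RPC the handler serializes the method name and
        // arguments and sends them to the server; here we only record
        // what would have been sent over the wire.
        InvocationHandler handler = (proxy, method, args) ->
            "rpc call: " + method.getName() + "(" + args[0] + ")";
        return (Protocol) Proxy.newProxyInstance(
            RpcProxySketch.class.getClassLoader(),
            new Class<?>[] { Protocol.class },
            handler);
    }

    public static void main(String[] args) {
        // The caller sees an ordinary ClientProtocol-style method call.
        System.out.println(getProxy().create("/user/a.txt"));
    }
}
```

This is why the namenode field can be typed as an interface: the client never holds a NameNode instance, only a proxy whose invocation handler forwards every call across the network.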