HDFS Source Code Analysis (2): Client and Server

Source: Internet  Published: 单片机程序编写  Editor: 程序博客网  Time: 2024/04/27 17:34

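An RPC exchange has two ends: the Client and the Server. What separates them is not the physical location of the machines but the logical role each plays in the exchange. The Client initiates the communication; the Server receives it.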

HDFS has four kinds of C/S communication: (1) client to NameNode; (2) client to DataNode; (3) DataNode to NameNode; (4) DataNode to DataNode. "Client" here means the application that uses the HDFS cluster.

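PS: Notice that there is no NameNode-to-DataNode communication, and that is indeed how it works. DataNodes send heartbeats and node information to the NameNode; the NameNode only learns about them passively and never initiates a request of its own (anything it wants to tell a DataNode travels back in the reply to a heartbeat). To read or write data, a client first talks to the NameNode to find out which DataNodes to read from or write to, and then opens client-to-DataNode communication directly. The NameNode does not relay the data.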

Client

org.apache.hadoop.ipc.Client is the base class on the client side. It is declared as follows:

/** A client for an IPC service.  IPC calls take a single {@link Writable} as a
 * parameter, and return a {@link Writable} as their value.  A service runs on
 * a port and is defined by a parameter class and a value class.
 *
 * @see Server
 */
@Public
@InterfaceStability.Evolving
public class Client implements AutoCloseable {

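The official definition is clear: "a client is for an IPC service; an IPC call takes a single Writable object as its parameter and returns a Writable object as its value."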

The Client class contains the following nested classes:

ClientExecutorServiceFactory
Call
Connection
ConnectionId
IpcStreams

The important ones are covered below.

Call

Call is the abstraction of a single RPC invocation.
Its fields and constructor, shown below, reveal some of the mechanics of an RPC call:

/**
 * Class that represents an RPC call
 */
static class Call {
  final int id;               // call id
  final int retry;           // retry count
  final Writable rpcRequest;  // the serialized rpc request
  Writable rpcResponse;       // null if rpc has error
  IOException error;          // exception, null if success
  final RPC.RpcKind rpcKind;      // Rpc EngineKind
  boolean done;               // true when call is done
  private final Object externalHandler;

  private Call(RPC.RpcKind rpcKind, Writable param) {
    this.rpcKind = rpcKind;
    this.rpcRequest = param;

    final Integer id = callId.get();
    if (id == null) {
      this.id = nextCallId();
    } else {
      callId.set(null);
      this.id = id;
    }

    final Integer rc = retryCount.get();
    if (rc == null) {
      this.retry = 0;
    } else {
      this.retry = rc;
    }

    this.externalHandler = EXTERNAL_CALL_HANDLER.get();
  }

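From this we can see that an RPC call carries a retry count (retry), which implies a retry mechanism; an id that identifies the call; a Writable request and a Writable response as the call's input and output; an IOException for failures; and an rpcKind, whose purpose is not yet clear but is evidently important and worth studying later.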
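To make the roles of these fields concrete, here is a minimal sketch of a call object. It is not Hadoop's code: SimpleCall, NEXT_ID, complete and await are names made up for this illustration, and a plain String stands in for Writable. It only shows how an id, a retry count and a done flag can work together so that one thread waits while another delivers the result.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

class CallSketch {
    /** Hypothetical, simplified stand-in for Client.Call. */
    static class SimpleCall {
        private static final AtomicInteger NEXT_ID = new AtomicInteger();

        final int id;          // identifies the call on a shared connection
        final int retry;       // how many times this request has been retried
        final String request;  // stands in for the Writable request
        String response;       // null if the call failed
        IOException error;     // null if the call succeeded
        boolean done;          // true once a response or an error has arrived

        SimpleCall(String request, int retry) {
            this.id = NEXT_ID.getAndIncrement();
            this.retry = retry;
            this.request = request;
        }

        /** Record the outcome and wake up any thread waiting in await(). */
        synchronized void complete(String response, IOException error) {
            this.response = response;
            this.error = error;
            this.done = true;
            notifyAll();
        }

        /** Block until the call is done, then return the response or throw the error. */
        synchronized String await() throws IOException, InterruptedException {
            while (!done) {
                wait();
            }
            if (error != null) {
                throw error;
            }
            return response;
        }
    }

    public static void main(String[] args) throws Exception {
        SimpleCall call = new SimpleCall("getFileInfo /tmp", 0);
        // A second thread plays the role of the connection's reader thread.
        Thread reader = new Thread(() -> call.complete("ok", null));
        reader.start();
        System.out.println("call " + call.id + " -> " + call.await());
        reader.join();
    }
}
```

The caller blocks in await() while the reader thread fills in the result, which is the same division of labour the fields of the real Call suggest.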

Connection

The Connection class is declared as follows:

/** Thread that reads responses and notifies callers.  Each connection owns a
 * socket connected to a remote address.  Calls are multiplexed through this
 * socket: responses may be delivered out of order. */
private class Connection extends Thread {

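Each Connection runs as its own thread. Once the connection is established and requests have been sent, the thread keeps reading the socket until the responses arrive.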

calls

// currently active calls
private Hashtable<Integer, Call> calls = new Hashtable<Integer, Call>();

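As shown above, a Connection holds a hash table of all the calls currently in flight on it. This tells us that multiple Calls can share one Connection, with the id telling them apart. Reusing a Connection saves the cost of setting up and tearing down TCP/IP connections.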

addCall

/**
 * Add a call to this connection's call queue and notify
 * a listener; synchronized.
 * Returns false if called during shutdown.
 * @param call to add
 * @return true if the call was added.
 */
private synchronized boolean addCall(Call call) {
  if (shouldCloseConnection.get())
    return false;
  calls.put(call.id, call);
  notify();
  return true;
}

An RPC call is attached to a Connection through addCall.
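The multiplexing idea can be sketched in a few lines. This is an illustration only, under the assumption that every response carries the id of the call it answers; SimpleConnection, PendingCall and receiveResponse are hypothetical names and not part of Hadoop.

```java
import java.util.Hashtable;
import java.util.Map;

class MultiplexSketch {
    /** Hypothetical pending call: just an id and a slot for the result. */
    static class PendingCall {
        final int id;
        String response;

        PendingCall(int id) {
            this.id = id;
        }
    }

    /** Hypothetical stand-in for Connection: tracks in-flight calls by id. */
    static class SimpleConnection {
        private final Map<Integer, PendingCall> calls = new Hashtable<>();
        private boolean closing = false;

        /** Register a call; refused once the connection is shutting down. */
        synchronized boolean addCall(PendingCall call) {
            if (closing) {
                return false;
            }
            calls.put(call.id, call);
            return true;
        }

        /** Match a response to its call by id, whatever order it arrives in. */
        synchronized void receiveResponse(int callId, String payload) {
            PendingCall call = calls.remove(callId);
            if (call != null) {
                call.response = payload;
            }
        }

        synchronized void close() {
            closing = true;
        }

        synchronized int inFlight() {
            return calls.size();
        }
    }

    public static void main(String[] args) {
        SimpleConnection conn = new SimpleConnection();
        PendingCall first = new PendingCall(1);
        PendingCall second = new PendingCall(2);
        conn.addCall(first);
        conn.addCall(second);

        // Responses arrive out of order; the id routes each to the right call.
        conn.receiveResponse(2, "second result");
        conn.receiveResponse(1, "first result");
        System.out.println(first.response + ", " + second.response);
    }
}
```

Because the table is keyed by call id, the order in which responses come back does not matter, which is what allows several calls to share one socket.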

ConnectionId

The Client class holds a ConcurrentHashMap named connections, keyed by ConnectionId with Connection as the value:

private ConcurrentMap<ConnectionId, Connection> connections =
    new ConcurrentHashMap<>();

It holds the information for all of this Client's RPC connections.

A Connection owns one socket connection, together with a ConnectionId object that identifies it:

/**
 * This class holds the address and the user ticket. The client connections
 * to servers are uniquely identified by <remoteAddress, protocol, ticket>
 */
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Evolving
public static class ConnectionId {

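ConnectionId holds an InetSocketAddress and a "user ticket".
Looking closer, the ticket is an instance of UserGroupInformation. As the name suggests, it carries user and group information and so determines permissions, which makes "ticket", in the sense of a pass, a reasonable name for it.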
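Using ConnectionId as a map key only works if two ids with the same address, protocol and ticket compare equal. The sketch below illustrates that idea under simplifying assumptions: SimpleConnectionId, SimpleConnection and getConnection are hypothetical names, a plain user name stands in for the UserGroupInformation ticket, and the host name and port are made up.

```java
import java.net.InetSocketAddress;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ConnectionCacheSketch {
    /** Hypothetical key: <remoteAddress, protocol, ticket>, with a user name as the ticket. */
    static final class SimpleConnectionId {
        final InetSocketAddress address;
        final String protocol;
        final String user;

        SimpleConnectionId(InetSocketAddress address, String protocol, String user) {
            this.address = address;
            this.protocol = protocol;
            this.user = user;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof SimpleConnectionId)) {
                return false;
            }
            SimpleConnectionId other = (SimpleConnectionId) o;
            return address.equals(other.address)
                && protocol.equals(other.protocol)
                && user.equals(other.user);
        }

        @Override
        public int hashCode() {
            return Objects.hash(address, protocol, user);
        }
    }

    /** Hypothetical connection; a real one would own a socket. */
    static final class SimpleConnection {
        final SimpleConnectionId remoteId;

        SimpleConnection(SimpleConnectionId remoteId) {
            this.remoteId = remoteId;
        }
    }

    private final ConcurrentMap<SimpleConnectionId, SimpleConnection> connections =
        new ConcurrentHashMap<>();

    /** Return the cached connection for this id, creating it on first use. */
    SimpleConnection getConnection(SimpleConnectionId id) {
        return connections.computeIfAbsent(id, SimpleConnection::new);
    }

    public static void main(String[] args) {
        ConnectionCacheSketch client = new ConnectionCacheSketch();
        // Made-up host and port; createUnresolved avoids a DNS lookup.
        InetSocketAddress addr = InetSocketAddress.createUnresolved("namenode.example", 8020);

        SimpleConnection a = client.getConnection(new SimpleConnectionId(addr, "ClientProtocol", "alice"));
        SimpleConnection b = client.getConnection(new SimpleConnectionId(addr, "ClientProtocol", "alice"));
        SimpleConnection c = client.getConnection(new SimpleConnectionId(addr, "ClientProtocol", "bob"));

        System.out.println("same user reuses connection: " + (a == b));
        System.out.println("different user gets its own: " + (a != c));
    }
}
```

Two requests from the same user to the same address and protocol end up on one connection, while a different user gets a separate one, since the ticket is part of the key.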

Server

With Client covered, we turn to Server.

Server.call()

The core method of Server is call:

/** Called for each call. */
public abstract Writable call(RPC.RpcKind rpcKind, String protocol,
    Writable param, long receiveTime) throws Exception;

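Its parameters are an RPC.RpcKind (an enum that selects the kind of RPC call, Writable or protobuf, shown below), the protocol, a Writable object, and the time at which the call was received.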

public enum RpcKind {
  RPC_BUILTIN ((short) 1),         // Used for built in calls by tests
  RPC_WRITABLE ((short) 2),        // Use WritableRpcEngine
  RPC_PROTOCOL_BUFFER ((short) 3); // Use ProtobufRpcEngine
  final static short MAX_INDEX = RPC_PROTOCOL_BUFFER.value; // used for array size
  private final short value;

  RpcKind(short val) {
    this.value = val;
  }
}

Server is an abstract class and call is an abstract method; what it actually does is up to each concrete Server implementation.
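The shape of this design can be sketched as follows. It is a simplified illustration rather than Hadoop's API: SimpleServer, EchoServer and Kind are hypothetical names, String stands in for Writable, and the sketch leaves out the `throws Exception` of the real signature.

```java
class ServerCallSketch {
    /** Mirrors the three kinds listed in RPC.RpcKind. */
    enum Kind { BUILTIN, WRITABLE, PROTOCOL_BUFFER }

    /** Hypothetical stand-in for the abstract Server. */
    abstract static class SimpleServer {
        /** Called for each call; subclasses supply the actual behaviour. */
        abstract String call(Kind kind, String protocol, String param, long receiveTime);
    }

    /** A concrete server that only echoes what it was asked. */
    static class EchoServer extends SimpleServer {
        @Override
        String call(Kind kind, String protocol, String param, long receiveTime) {
            return protocol + "/" + kind + ": " + param;
        }
    }

    public static void main(String[] args) {
        SimpleServer server = new EchoServer();
        String result = server.call(Kind.WRITABLE, "ClientProtocol", "ping",
            System.currentTimeMillis());
        System.out.println(result);
    }
}
```

The surrounding machinery only needs to know the abstract call method; each concrete server decides what a call means.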

Server.Call

Client has a nested class Call, and Server has a similar one:

/** A generic call queued for handling. */
public static class Call implements Schedulable,
    PrivilegedExceptionAction<Void> {
  final int callId;            // the client's call id
  final int retryCount;        // the retry count of the call
  long timestamp;              // time received when response is null
                               // time served when response is not null
  private AtomicInteger responseWaitCount = new AtomicInteger(1);
  final RPC.RpcKind rpcKind;
  final byte[] clientId;
  private final TraceScope traceScope; // the HTrace scope on the server side
  private final CallerContext callerContext; // the call context
  private boolean deferredResponse = false;
  private int priorityLevel;
  // the priority level assigned by scheduler, 0 by default

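Server also has a Connection class. Because Server uses NIO, it does not need one thread per connection; a single thread is enough to accept requests and read data from the sockets. The reason is that a Client has to wait for the Server's response, whereas the Server, once it has received a request, hands it off for processing without waiting, so blocking is not a concern. In Server this single thread is wrapped in an inner class, Listener: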

/** Listens on the socket. Creates jobs for the handler threads*/
private class Listener extends Thread {

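The Listener keeps putting incoming requests, as Call objects, onto the call queue, where they are picked up by Handlers. Handling is multi-threaded: each Handler's run method loops, taking one Call off the queue, invoking Server.call, and then placing the result on the Responder's response queue.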
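The flow from Listener to Handler to Responder can be sketched with two blocking queues. This is a toy model under stated assumptions, not Hadoop's implementation: ServerPipelineSketch, QueuedCall, Response, accept, handle and handlerLoop are hypothetical names, there is no real socket or NIO here, and the business logic is a simple echo.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ServerPipelineSketch {
    /** Hypothetical queued request. */
    static final class QueuedCall {
        final int callId;
        final String param;

        QueuedCall(int callId, String param) {
            this.callId = callId;
            this.param = param;
        }
    }

    /** Hypothetical response waiting to be written back. */
    static final class Response {
        final int callId;
        final String value;

        Response(int callId, String value) {
            this.callId = callId;
            this.value = value;
        }
    }

    final BlockingQueue<QueuedCall> callQueue = new LinkedBlockingQueue<>();
    final BlockingQueue<Response> responseQueue = new LinkedBlockingQueue<>();

    /** Listener role: turn an incoming request into a queued call. */
    void accept(int callId, String param) {
        callQueue.add(new QueuedCall(callId, param));
    }

    /** Stands in for the abstract Server.call; here it only echoes. */
    String call(String param) {
        return "echo:" + param;
    }

    /** Handler role, one step: run the call and queue the result for the responder. */
    void handle(QueuedCall queued) {
        responseQueue.add(new Response(queued.callId, call(queued.param)));
    }

    /** Body of each handler thread: take calls until interrupted. */
    void handlerLoop() {
        try {
            while (true) {
                handle(callQueue.take());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ServerPipelineSketch server = new ServerPipelineSketch();
        Thread[] handlers = new Thread[2];
        for (int i = 0; i < handlers.length; i++) {
            handlers[i] = new Thread(server::handlerLoop, "handler-" + i);
            handlers[i].setDaemon(true);
            handlers[i].start();
        }
        for (int id = 1; id <= 3; id++) {
            server.accept(id, "request-" + id);
        }
        // Responder role: drain results; their order depends on thread scheduling.
        for (int i = 0; i < 3; i++) {
            Response r = server.responseQueue.take();
            System.out.println("call " + r.callId + " -> " + r.value);
        }
        for (Thread h : handlers) {
            h.interrupt();
        }
    }
}
```

The thread that accepts requests never waits for a call to finish; it only enqueues. The handler threads do the work in parallel, and the call id carried on each response lets the result find its way back to the right caller.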

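The source code is large, and it is neither possible nor necessary to understand all of it in depth. Please bear with any mistakes here, or any differences caused by version changes.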
