Hadoop RPC Analysis (Part 1) -- Client

Source: Internet | Editor: 程序博客网 | Time: 2024/05/22 14:02

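[Hadoop RPC Call Entry Points]

In the basic Hadoop RPC framework, a client obtains a proxy object through getProxy and uses that object to send RPC requests to the server.

getProxy has several overloads; all of them end up calling the following method:

(from org.apache.hadoop.ipc.RPC)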

    public static <T> ProtocolProxy<T> getProtocolProxy(Class<T> protocol,
                                  long clientVersion,
                                  InetSocketAddress addr,
                                  UserGroupInformation ticket,
                                  Configuration conf,
                                  SocketFactory factory,
                                  int rpcTimeout,
                                  RetryPolicy connectionRetryPolicy) throws IOException {
      if (UserGroupInformation.isSecurityEnabled()) {
        SaslRpcServer.init(conf);
      }
      return getProtocolEngine(protocol, conf).getProxy(protocol, clientVersion,
          addr, ticket, conf, factory, rpcTimeout, connectionRetryPolicy);
    }

On the server side, the build method constructs a Server object:

(from org.apache.hadoop.ipc.RPC.Builder)

    /**
     * Build the RPC Server.
     * @throws IOException on error
     * @throws HadoopIllegalArgumentException when mandatory fields are not set
     */
    public Server build() throws IOException, HadoopIllegalArgumentException {
      if (this.conf == null) {
        throw new HadoopIllegalArgumentException("conf is not set");
      }
      if (this.protocol == null) {
        throw new HadoopIllegalArgumentException("protocol is not set");
      }
      if (this.instance == null) {
        throw new HadoopIllegalArgumentException("instance is not set");
      }
      return getProtocolEngine(this.protocol, this.conf).getServer(
          this.protocol, this.instance, this.bindAddress, this.port,
          this.numHandlers, this.numReaders, this.queueSizePerHandler,
          this.verbose, this.conf, this.secretManager, this.portRangeConfig);
    }

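These two entry points produce the objects that the client and the server each need for remote calls.

The getProtocolEngine call above obtains an RPC engine. The default is WritableRpcEngine (newer versions seem to have moved to ProtobufRpcEngine), and this article traces the source through WritableRpcEngine.

The trace path, in brief:

Client: WritableRpcEngine.getProxy() ---> Invoker ---> Client

This uses the JDK dynamic proxy mechanism. Invoker implements the InvocationHandler interface, and its invoke method is implemented by calling Client.call, as shown below: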

    @Override
    public Object invoke(Object proxy, Method method, Object[] args)
        throws Throwable {
      long startTime = 0;
      if (LOG.isDebugEnabled()) {
        startTime = Time.now();
      }
      ObjectWritable value = (ObjectWritable)
          client.call(RPC.RpcKind.RPC_WRITABLE, new Invocation(method, args), remoteId);
      if (LOG.isDebugEnabled()) {
        long callTime = Time.now() - startTime;
        LOG.debug("Call: " + method.getName() + " " + callTime);
      }
      return value.get();
    }

Our study of the client side therefore centers on the Client class.
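The proxy-to-client hand-off can be illustrated with a small, self-contained sketch of the JDK dynamic proxy pattern. It is not Hadoop's code: EchoProtocol, FakeClient, and this simplified Invoker are hypothetical stand-ins, and the "remote call" is just a local lambda.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;

class ProxySketch {
    /** Hypothetical protocol interface; stands in for a Hadoop RPC protocol. */
    interface EchoProtocol {
        String echo(String message);
    }

    /** Hypothetical stand-in for Client: receives the method name and arguments. */
    interface FakeClient {
        Object call(String methodName, Object[] args);
    }

    /** Plays the role of Invoker: every interface call becomes client.call(...). */
    static class Invoker implements InvocationHandler {
        private final FakeClient client;

        Invoker(FakeClient client) {
            this.client = client;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            return client.call(method.getName(), args);
        }
    }

    /** Builds a proxy whose calls are all routed through the Invoker. */
    static <T> T getProxy(Class<T> protocol, FakeClient client) {
        Object proxy = Proxy.newProxyInstance(
            protocol.getClassLoader(), new Class<?>[] {protocol}, new Invoker(client));
        return protocol.cast(proxy);
    }

    static String demo() {
        // The "server" here is a lambda that echoes the method name and arguments.
        FakeClient client = (name, args) -> name + Arrays.toString(args);
        EchoProtocol proxy = getProxy(EchoProtocol.class, client);
        return proxy.echo("hi");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The caller only sees the EchoProtocol interface; the decision to forward the call elsewhere lives entirely in the handler, which is the same division of labor as between Invoker and Client.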

Server: WritableRpcEngine.getServer() ---> Server

getServer creates a Server object with new, so our study of the server side centers on the Server class.

[Hadoop RPC Client: Client]

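The client's job can be summarized as: send the information about the invoked method to the server over the network, then wait for the server's reply. In essence, RPC wraps ordinary network access so that it looks like a local call.

Here we focus on what a single RPC looks like on the client side and try to locate the code that implements each step.

The classes most closely related to Client are the following (both are inner classes of Client):

Client.Connection -------- A Connection object represents one connection channel to the server. It holds low-level channel information that is independent of any particular call and serves as the underlying infrastructure.

Client.Call -------- A Call represents one remote procedure call. It holds the request for that call and its result, and serves as the unit of RPC work.

Because the low-level channel is independent of the individual calls, several calls can share one channel; each Connection keeps track of the calls currently outstanding on it.

Since a channel is independent of the calls it carries, a client and a server can have several channels in parallel. Client keeps a pool of Connection objects, and each Connection is itself a thread.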
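The relationship described above amounts to a two-level table: connections keyed by remote id, and inside each connection the outstanding calls keyed by call id. The following is a minimal sketch of that shape only. The class names mirror the article, but the String remote ids, the submit method, and the host names are assumptions made for illustration, not Hadoop's API.

```java
import java.util.Hashtable;
import java.util.Map;

class PoolStructureSketch {
    /** One remote procedure call: an id plus the request it carries. */
    static class Call {
        final int id;
        final String request;

        Call(int id, String request) {
            this.id = id;
            this.request = request;
        }
    }

    /** One channel to a server; it tracks the calls currently in flight on it. */
    static class Connection {
        final String remoteId;
        final Map<Integer, Call> calls = new Hashtable<>();

        Connection(String remoteId) {
            this.remoteId = remoteId;
        }
    }

    /** Client side: one Connection per remote id, shared by every call to it. */
    static class Client {
        final Map<String, Connection> connections = new Hashtable<>();
        private int nextCallId = 0;

        // Hypothetical helper: registers a call on the connection for remoteId.
        synchronized Connection submit(String remoteId, String request) {
            Connection c = connections.get(remoteId);
            if (c == null) {
                c = new Connection(remoteId);
                connections.put(remoteId, c);
            }
            Call call = new Call(nextCallId++, request);
            c.calls.put(call.id, call);
            return c;
        }
    }

    /** Returns {number of connections, calls on the first one, 1 if it was reused}. */
    static int[] demo() {
        Client client = new Client();
        Connection a1 = client.submit("nn1:8020", "getFileInfo");
        Connection a2 = client.submit("nn1:8020", "mkdirs");
        client.submit("nn2:8020", "getFileInfo");
        return new int[] {client.connections.size(), a1.calls.size(), a1 == a2 ? 1 : 0};
    }
}
```

Two calls to the same remote id share one Connection, while a call to a different remote id gets its own.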

First, the fields of Client:

(from org.apache.hadoop.ipc.Client)

    /** A counter for generating call IDs. */
    private static final AtomicInteger callIdCounter = new AtomicInteger();
    private static final ThreadLocal<Integer> callId = new ThreadLocal<Integer>();
    private static final ThreadLocal<Integer> retryCount = new ThreadLocal<Integer>();
    private Hashtable<ConnectionId, Connection> connections =
        new Hashtable<ConnectionId, Connection>();
    private Class<? extends Writable> valueClass;             // class of call values
    private AtomicBoolean running = new AtomicBoolean(true);  // if client runs
    final private Configuration conf;
    private SocketFactory socketFactory;                      // how to create sockets
    private int refCount = 1;
    private final int connectionTimeout;
    private final boolean fallbackAllowed;
    private final byte[] clientId;
    final static int CONNECTION_CONTEXT_CALL_ID = -3;

As the connections table shows, a Client can hold several connections to servers.

Next, the fields of Connection:

(from org.apache.hadoop.ipc.Client.Connection)

    private InetSocketAddress server;          // server ip:port
    private final ConnectionId remoteId;       // connection id
    private AuthMethod authMethod;             // authentication method
    private AuthProtocol authProtocol;
    private int serviceClass;
    private SaslRpcClient saslRpcClient;
    private Socket socket = null;              // connected socket
    private DataInputStream in;
    private DataOutputStream out;
    private int rpcTimeout;
    private int maxIdleTime;                   // connections will be culled if it was idle for
                                               // maxIdleTime msecs
    private final RetryPolicy connectionRetryPolicy;
    private int maxRetriesOnSocketTimeouts;
    private boolean tcpNoDelay;                // if T then disable Nagle's Algorithm
    private boolean doPing;                    // do we need to send ping message
    private int pingInterval;                  // how often sends ping to the server in msecs
    private ByteArrayOutputStream pingRequest; // ping message
    // currently active calls
    private Hashtable<Integer, Call> calls = new Hashtable<Integer, Call>();
    private AtomicLong lastActivity = new AtomicLong();              // last I/O activity time
    private AtomicBoolean shouldCloseConnection = new AtomicBoolean(); // indicate if the connection is closed
    private IOException closeException;        // close reason
    private final Object sendRpcRequestLock = new Object();

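Most of these are the basic settings needed to establish a connection to the server. The calls field stores the Call objects that have been submitted to this connection.

A Call object represents one remote procedure call, so it carries the parameters that call needs. Its fields are:

(from org.apache.hadoop.ipc.Client.Call)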

    /**
     * Class that represents an RPC call
     */
    static class Call {
      final int id;               // call id
      final int retry;            // retry count
      final Writable rpcRequest;  // the serialized rpc request
      Writable rpcResponse;       // null if rpc has error
      IOException error;          // exception, null if success
      final RPC.RpcKind rpcKind;  // Rpc EngineKind
      boolean done;               // true when call is done
      // (constructor and methods omitted)
    }

With the roles of these classes clear, we can look at the flow of one remote call on the client. Client.call is all we need to study; the code is as follows:

    public Writable call(RPC.RpcKind rpcKind, Writable rpcRequest,
        ConnectionId remoteId, int serviceClass) throws IOException {
      final Call call = createCall(rpcKind, rpcRequest);
      Connection connection = getConnection(remoteId, call, serviceClass);
      try {
        connection.sendRpcRequest(call);        // send the rpc request
      } catch (RejectedExecutionException e) {
        throw new IOException("connection has been closed", e);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        LOG.warn("interrupted waiting to send rpc request to server", e);
        throw new IOException(e);
      }

      boolean interrupted = false;
      synchronized (call) {
        while (!call.done) {
          try {
            call.wait();                        // wait for the result
          } catch (InterruptedException ie) {
            // save the fact that we were interrupted
            interrupted = true;
          }
        }

        if (interrupted) {
          // set the interrupt flag now that we are done waiting
          Thread.currentThread().interrupt();
        }

        if (call.error != null) {
          if (call.error instanceof RemoteException) {
            call.error.fillInStackTrace();
            throw call.error;
          } else { // local exception
            InetSocketAddress address = connection.getRemoteAddress();
            throw NetUtils.wrapException(address.getHostName(),
                address.getPort(),
                NetUtils.getHostname(),
                call.error);
          }
        } else {
          return call.getRpcResponse();
        }
      }
    }

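The steps are:

1. createCall creates the object that represents this remote call.

2. getConnection obtains a usable connection object. This is where the connection is initialized and the socket connection to the server is established; the submitted call is also stored in the connection object.

3. connection.sendRpcRequest performs the remote call by sending the request.

4. call.wait waits for the result to come back.

5. call.getRpcResponse returns the result of the remote call.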
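Steps 4 and 5 depend on the caller blocking on the Call object until some other thread marks it done, and then either returning the response or throwing the stored error. Here is a minimal sketch of that hand-off, assuming a simplified Call with String payloads; setResponse, setError, and waitForResult are illustrative names, not Hadoop's exact methods.

```java
import java.io.IOException;

class CallWaitSketch {
    static class Call {
        private String response;    // set on success
        private IOException error;  // set on failure
        private boolean done;       // guarded by this Call's monitor

        synchronized void setResponse(String value) {
            response = value;
            complete();
        }

        synchronized void setError(IOException e) {
            error = e;
            complete();
        }

        // Always called with the monitor held by the two setters above.
        private void complete() {
            done = true;
            notify();  // wake the caller blocked in waitForResult()
        }

        synchronized String waitForResult() throws IOException {
            boolean interrupted = false;
            while (!done) {
                try {
                    wait();
                } catch (InterruptedException e) {
                    interrupted = true;  // remember it, but keep waiting
                }
            }
            if (interrupted) {
                Thread.currentThread().interrupt();  // restore the flag
            }
            if (error != null) {
                throw error;
            }
            return response;
        }
    }

    /** Blocks on the call while a second thread, playing the receiver, completes it. */
    static String run(final Call call, final Runnable receiver) {
        new Thread(() -> {
            try {
                Thread.sleep(50);  // simulated network delay
            } catch (InterruptedException ignored) {
                // fall through and complete the call anyway
            }
            receiver.run();
        }).start();
        try {
            return "ok:" + call.waitForResult();
        } catch (IOException e) {
            return "error:" + e.getMessage();
        }
    }

    static String demoSuccess() {
        final Call call = new Call();
        return run(call, () -> call.setResponse("pong"));
    }

    static String demoFailure() {
        final Call call = new Call();
        return run(call, () -> call.setError(new IOException("connection reset")));
    }
}
```

The while loop around wait() is what lets the caller survive both interrupts and early wakeups: it only leaves once done has really been set.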

That is the overall flow. Steps 2 and 3 rely on Connection, so we analyze Connection further.

[Connection]

First, getConnection. The code below keeps only the main path.

    /** Get a connection from the pool, or create a new one and add it to the
     * pool. Connections to a given ConnectionId are reused. */
    private Connection getConnection(ConnectionId remoteId,
        Call call, int serviceClass) throws IOException {
      Connection connection;
      do {
        synchronized (connections) {
          connection = connections.get(remoteId);
          if (connection == null) {
            connection = new Connection(remoteId, serviceClass);
            connections.put(remoteId, connection);
          }
        }
      } while (!connection.addCall(call));
      connection.setupIOstreams();
      return connection;
    }

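As the code shows, getConnection obtains a Connection object, adds the current call to that Connection's calls table, and sets up the network connection. No request data has been sent over the network at this point.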

    private synchronized boolean addCall(Call call) {
      if (shouldCloseConnection.get())
        return false;
      calls.put(call.id, call);
      notify();
      return true;
    }

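Note the notify() call in addCall. Connection extends Thread and runs as a separate thread, but this notify is not issued from the connection's own thread; it is triggered by Client.call on the caller's thread. The matching wait call appears later. connection.setupIOstreams initializes the input and output streams and is not examined in detail here.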
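The interplay between getConnection and addCall is a get-or-create lookup with a retry: if the pooled connection refuses the call because it is closing, the loop looks again and ends up with a fresh one. The sketch below shows that pattern under simplifying assumptions. Pool, its close method, and the String ids are hypothetical, and the closed connection is removed from the pool in the same step that marks it closed, which the excerpt above does not show.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

class GetConnectionSketch {
    static class Connection {
        final AtomicBoolean shouldClose = new AtomicBoolean();
        final Map<Integer, String> calls = new HashMap<>();

        /** Refuses new calls once the connection is closing. */
        synchronized boolean addCall(int id, String request) {
            if (shouldClose.get()) {
                return false;
            }
            calls.put(id, request);
            notify();  // would wake the connection thread if it were waiting for work
            return true;
        }
    }

    static class Pool {
        final Map<String, Connection> connections = new HashMap<>();

        Connection getConnection(String remoteId, int callId, String request) {
            Connection connection;
            do {
                synchronized (connections) {
                    connection = connections.get(remoteId);
                    if (connection == null) {
                        connection = new Connection();
                        connections.put(remoteId, connection);
                    }
                }
            } while (!connection.addCall(callId, request));  // retry if it was closing
            return connection;
        }

        // Hypothetical: mark the connection closed and drop it from the pool.
        void close(String remoteId) {
            synchronized (connections) {
                Connection c = connections.remove(remoteId);
                if (c != null) {
                    c.shouldClose.set(true);
                }
            }
        }
    }

    /** Returns {second lookup reused the connection, new one after close, call registered}. */
    static boolean[] demo() {
        Pool pool = new Pool();
        Connection first = pool.getConnection("nn1:8020", 0, "a");
        Connection same = pool.getConnection("nn1:8020", 1, "b");
        pool.close("nn1:8020");
        Connection fresh = pool.getConnection("nn1:8020", 2, "c");
        return new boolean[] {first == same, fresh != first, fresh.calls.containsKey(2)};
    }
}
```

Registering the call inside the loop, rather than after it, is what closes the gap between "found a connection" and "the connection accepted my call".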

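Step 3 above, sendRpcRequest, sends the remote call request; this is where the network send happens. Because its implementation matters for the discussion that follows, its main-path code is listed here: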

    public void sendRpcRequest(final Call call)
        throws InterruptedException, IOException {
      // d (its construction is omitted from this excerpt) is the buffer that holds
      // the serialized RpcRequestHeader + RpcRequest; exception handling is omitted too
      synchronized (sendRpcRequestLock) {
        Future<?> senderFuture = SEND_PARAMS_EXECUTOR.submit(new Runnable() {
          @Override
          public void run() {
            synchronized (Connection.this.out) {
              byte[] data = d.getData();
              int totalLength = d.getLength();
              out.writeInt(totalLength);        // Total Length
              out.write(data, 0, totalLength);  // RpcRequestHeader + RpcRequest
              out.flush();
            }
          }
        });
        senderFuture.get();  // wait until the request has been written out
      }
    }

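After the thread pool sends the data over the network, sendRpcRequest does not wait synchronously for the response. Client.call, however, blocks until the call completes. So somewhere the network response must be received and the call marked as done; only then can Client.call return.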
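The writeInt/write pair in sendRpcRequest is length-prefixed framing: a 4-byte length followed by that many payload bytes, so the reader knows where one message ends and the next begins. A minimal round-trip sketch follows, using in-memory streams in place of a socket; writeFrame, readFrame, and roundTrip are illustrative names, and the payloads are plain strings rather than serialized RPC headers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class FramingSketch {
    /** Writes one frame: 4-byte big-endian length, then the payload bytes. */
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        synchronized (out) {  // keep concurrent writers from interleaving frames
            out.writeInt(payload.length);
            out.write(payload, 0, payload.length);
            out.flush();
        }
    }

    /** Reads one frame written by writeFrame. */
    static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);  // blocks until the whole payload has arrived
        return payload;
    }

    /** Sends two messages back to back and reads them out as separate frames. */
    static String roundTrip(String first, String second) {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(wire);
            writeFrame(out, first.getBytes(StandardCharsets.UTF_8));
            writeFrame(out, second.getBytes(StandardCharsets.UTF_8));

            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));
            String a = new String(readFrame(in), StandardCharsets.UTF_8);
            String b = new String(readFrame(in), StandardCharsets.UTF_8);
            return a + "|" + b;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Synchronizing on the output stream serves the same purpose as synchronized (Connection.this.out) in the excerpt: a frame's length and body must reach the wire without another writer's bytes in between.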

As noted earlier, Connection itself runs as a thread, so we need to look at its run method. With the exception branches removed, the code is:

    @Override
    public void run() {
      while (waitForWork()) {  // wait here for work - read or close connection
        receiveRpcResponse();
      }
      close();
    }

Here is the logic of waitForWork:

    private synchronized boolean waitForWork() {
      if (calls.isEmpty() && !shouldCloseConnection.get() && running.get()) {
        long timeout = maxIdleTime -
            (Time.now() - lastActivity.get());
        if (timeout > 0) {
          try {
            wait(timeout);
          } catch (InterruptedException e) {}
        }
      }

      if (!calls.isEmpty() && !shouldCloseConnection.get() && running.get()) {
        return true;
      } else if (shouldCloseConnection.get()) {
        return false;
      } else if (calls.isEmpty()) {  // idle connection closed or stopped
        markClosed(null);
        return false;
      } else {  // get stopped but there are still pending requests
        markClosed((IOException) new IOException().initCause(
            new InterruptedException()));
        return false;
      }
    }

Note the wait call here: it pairs with the notify in addCall shown earlier.
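The wait/notify pairing with an idle timeout can be sketched in isolation. The Worker class below is hypothetical and deliberately smaller than Connection: it has only a call queue and an idle limit. It also differs from the excerpt in one respect, stated here as a design choice: it re-checks the condition in a loop against a deadline, so an early wakeup does not count as an idle timeout.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class WaitForWorkSketch {
    static class Worker {
        private final Queue<String> calls = new ArrayDeque<>();
        private final long maxIdleMillis;
        private final long lastActivity = System.currentTimeMillis();

        Worker(long maxIdleMillis) {
            this.maxIdleMillis = maxIdleMillis;
        }

        /** Called from another thread: queue work and wake the waiter. */
        synchronized void addCall(String call) {
            calls.add(call);
            notify();
        }

        /** Returns true if work arrived, false if the idle limit passed first. */
        synchronized boolean waitForWork() {
            long deadline = lastActivity + maxIdleMillis;
            while (calls.isEmpty()) {
                long timeout = deadline - System.currentTimeMillis();
                if (timeout <= 0) {
                    break;  // idle for too long
                }
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            return !calls.isEmpty();
        }
    }

    /** A call arrives after 50 ms, well inside the 5 s idle limit. */
    static boolean wakesOnCall() {
        final Worker worker = new Worker(5000);
        new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException ignored) {
                // fall through and add the call anyway
            }
            worker.addCall("getFileInfo");
        }).start();
        return worker.waitForWork();
    }

    /** Nothing arrives, so the 50 ms idle limit expires. */
    static boolean timesOutWhenIdle() {
        return new Worker(50).waitForWork();
    }
}
```

Both addCall and waitForWork synchronize on the same Worker, which is required for wait and notify to pair up at all.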

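As for receiveRpcResponse, it reads the result of the remote call from the network, extracts the callId, looks up the corresponding Call object, and sets its state.

To summarize the design of Connection: the Connection thread waits in wait() until an outside request arrives and triggers notify; the request also updates the callId-to-Call mapping kept inside the Connection and is then sent over the network.

While running, the Connection thread reads data from the network, and each response it reads carries the callId it belongs to. Because responses are read asynchronously, the callId is used to find the matching Call object and mark it as done; only then can the corresponding Client.call return normally. Otherwise it would wait forever (ignoring timeouts).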
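The matching of asynchronous responses to waiting callers can be sketched as follows. This is an illustration under assumptions, not Hadoop's receiveRpcResponse: a BlockingQueue stands in for the socket input stream, the Response class and the negative-id shutdown marker are hypothetical, and payloads are plain strings.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class DemuxSketch {
    static class Call {
        final int id;
        private String response;
        private boolean done;

        Call(int id) {
            this.id = id;
        }

        synchronized void complete(String value) {
            response = value;
            done = true;
            notify();  // wake the thread blocked in await()
        }

        synchronized String await() throws InterruptedException {
            while (!done) {
                wait();
            }
            return response;
        }
    }

    /** A response as it might arrive from the network: call id plus payload. */
    static class Response {
        final int callId;
        final String payload;

        Response(int callId, String payload) {
            this.callId = callId;
            this.payload = payload;
        }
    }

    /** Reader thread: takes responses off the "wire" and completes the matching Call. */
    static class Connection extends Thread {
        final Map<Integer, Call> calls = new ConcurrentHashMap<>();
        final BlockingQueue<Response> wire = new LinkedBlockingQueue<>();

        @Override
        public void run() {
            try {
                while (true) {
                    Response r = wire.take();
                    if (r.callId < 0) {
                        return;  // hypothetical shutdown marker
                    }
                    Call call = calls.remove(r.callId);
                    if (call != null) {
                        call.complete(r.payload);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    static String demo() {
        Connection conn = new Connection();
        conn.setDaemon(true);
        conn.start();

        Call c1 = new Call(1);
        Call c2 = new Call(2);
        conn.calls.put(c1.id, c1);
        conn.calls.put(c2.id, c2);

        // Responses arrive in the opposite order from the requests.
        conn.wire.add(new Response(2, "second"));
        conn.wire.add(new Response(1, "first"));
        try {
            String result = c1.await() + "," + c2.await();
            conn.wire.add(new Response(-1, null));  // stop the reader thread
            return result;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because each response names its call id, the order of arrival does not matter: every caller gets its own result even though one reader thread serves them all.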

1 0