[Hadoop] RPC Overview

Source: Internet | Editor: 程序博客网 | Date: 2024/06/05 05:11

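In Hadoop, different processes and different nodes communicate with each other via RPC, so Hadoop implements its own internal RPC mechanism on top of TCP and its own serialization mechanism. The package org.apache.hadoop.ipc provides two basic classes, Client and Server, which, as the names suggest, are used by the client side and the server side of the communication, respectively.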

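The RPC Client and Server classes encapsulate all the low-level work, such as the wire protocol and serialization, but which remote calls exist has to be defined by the layer above. As the accompanying figure shows, the two sides first agree on an RPC protocol, that is, a Java interface declaring the remote calls the server implements. The client creates a dynamic proxy that implements this interface and is bound to the InvocationHandler provided by the RPC package. Every call the client makes on the proxy is intercepted by the RPC Client and then sent to the server.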
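The interception step can be tried out with nothing but the JDK's java.lang.reflect.Proxy. The sketch below is not Hadoop code: the interface EchoProtocol and the class RecordingHandler are hypothetical, and instead of sending anything over the network the handler only records the method name, parameter types, and arguments it caught, which is the kind of information RPC.Invoker hands on to the RPC Client.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProxyDemo {
    // Hypothetical RPC protocol: an interface declaring the "remote" calls.
    interface EchoProtocol {
        String echo(String message);
        int add(int a, int b);
    }

    // Hypothetical stand-in for RPC.Invoker: every call on the proxy lands here.
    static class RecordingHandler implements InvocationHandler {
        final List<String> intercepted = new ArrayList<>();

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            // A real RPC handler would serialize this information and send it
            // to the server; this sketch only records it.
            StringBuilder sig = new StringBuilder(method.getName()).append('(');
            Class<?>[] types = method.getParameterTypes();
            for (int i = 0; i < types.length; i++) {
                if (i > 0) sig.append(", ");
                sig.append(types[i].getSimpleName());
            }
            intercepted.add(sig.append(") args=").append(Arrays.toString(args)).toString());
            // Return a type-compatible placeholder (a primitive int cannot be null).
            return method.getReturnType() == int.class ? 0 : null;
        }
    }

    static List<String> run() {
        RecordingHandler handler = new RecordingHandler();
        EchoProtocol proxy = (EchoProtocol) Proxy.newProxyInstance(
                EchoProtocol.class.getClassLoader(),
                new Class<?>[] { EchoProtocol.class },
                handler);
        proxy.echo("hi");   // intercepted; no EchoProtocol implementation exists
        proxy.add(1, 2);
        return handler.intercepted;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Calling proxy.echo("hi") never reaches an implementation of EchoProtocol; the only code that runs is the handler's invoke method, which is the hook an RPC layer needs.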

Below is a brief analysis of how Client and Server are implemented.

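The Client's tasks are:

1.1 Package an arbitrary remote call (method name, parameter types, parameter values, etc.) and serialize it;

1.2 Connect to the Server and send it the serialized remote call;

1.3 Wait for the Server to return the result of the call.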

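The Server's tasks are:

2.1 Listen on a port, accept the calls sent by Clients, and deserialize the packaged calls;

2.2 Use the call information to invoke the concrete implementation method;

2.3 Package the result of the remote call (return type, return value, etc.) and serialize it;

2.4 Send the serialized result back to the Client.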

Step 1.1 is performed by org.apache.hadoop.ipc.RPC.Invoker, 1.2 by org.apache.hadoop.ipc.Client, 2.1 and 2.4 by the abstract class org.apache.hadoop.ipc.Server, and 2.2 and 2.3 by its concrete subclass org.apache.hadoop.ipc.RPC.Server.

1.1 Code analysis

Before the Invoker calls the Client, it first packages the remote call. This is done by org.apache.hadoop.ipc.RPC.Invocation: the Invoker passes the call information to the Invocation constructor:

ObjectWritable value = (ObjectWritable)
    client.call(new Invocation(method, args), remoteId);

Invocation serializes the call information as follows:

UTF8.writeString(out, methodName);
out.writeInt(parameterClasses.length);
for (int i = 0; i < parameterClasses.length; i++) {
  ObjectWritable.writeObject(out, parameters[i], parameterClasses[i], conf);
}

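As you can see, every parameter is wrapped by ObjectWritable, so its class name is also written into the serialized stream. This serialization method is called later, right before the request is actually sent; see 1.2.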
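A stripped-down version of the same layout (method name, parameter count, then each parameter preceded by its class name) can be written with DataOutputStream and read back with DataInputStream. This is a sketch under assumptions: SimpleInvocation is hypothetical, it uses writeUTF instead of Hadoop's UTF8.writeString, and it supports only int and String parameters rather than everything ObjectWritable handles.

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical, simplified stand-in for RPC.Invocation + ObjectWritable.
final class SimpleInvocation {
    final String methodName;
    final Class<?>[] parameterClasses;
    final Object[] parameters;

    SimpleInvocation(String methodName, Class<?>[] parameterClasses, Object[] parameters) {
        this.methodName = methodName;
        this.parameterClasses = parameterClasses;
        this.parameters = parameters;
    }

    // Same field order as Invocation's serialization: name, count, parameters.
    void write(DataOutput out) throws IOException {
        out.writeUTF(methodName);
        out.writeInt(parameterClasses.length);
        for (int i = 0; i < parameterClasses.length; i++) {
            // Like ObjectWritable, write the declared class name before the value.
            out.writeUTF(parameterClasses[i].getName());
            if (parameterClasses[i] == int.class) {
                out.writeInt((Integer) parameters[i]);
            } else if (parameterClasses[i] == String.class) {
                out.writeUTF((String) parameters[i]);
            } else {
                throw new IOException("unsupported type: " + parameterClasses[i]);
            }
        }
    }

    static SimpleInvocation read(DataInput in) throws IOException {
        String name = in.readUTF();
        int n = in.readInt();
        Class<?>[] classes = new Class<?>[n];
        Object[] values = new Object[n];
        for (int i = 0; i < n; i++) {
            String className = in.readUTF();   // rebuild the parameter type from its name
            if (className.equals("int")) {
                classes[i] = int.class;
                values[i] = in.readInt();
            } else if (className.equals("java.lang.String")) {
                classes[i] = String.class;
                values[i] = in.readUTF();
            } else {
                throw new IOException("unsupported type: " + className);
            }
        }
        return new SimpleInvocation(name, classes, values);
    }

    static SimpleInvocation roundTrip(SimpleInvocation inv) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        inv.write(new DataOutputStream(bytes));
        return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    }

    public static void main(String[] args) throws IOException {
        SimpleInvocation copy = roundTrip(new SimpleInvocation("add",
                new Class<?>[] { int.class, int.class }, new Object[] { 1, 2 }));
        System.out.println(copy.methodName + " " + Arrays.toString(copy.parameters));
    }
}
```

Writing the class name before each value is what lets the receiving side rebuild the parameter types it needs for the method lookup in 2.2.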

1.2 Code analysis

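Inside the Client, the Invocation is further wrapped in a Client.Call, which adds an id, the call status, and other information. The Client uses Client.Connection to abstract a socket connection. Multiple Calls can be sent over the same connection, which reduces connection overhead. The code that sends a Call is: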

public Writable call(Writable param, ConnectionId remoteId)
    throws InterruptedException, IOException {
  Call call = new Call(param);
  Connection connection = getConnection(remoteId, call);
  connection.sendParam(call);                 // send the parameter
  // ... (rest of the method omitted: waiting for the result, see 1.3)
}
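The connection sharing described above can be sketched as a map from remote address to connection, plus a counter that gives every Call its own id. All names here (ConnKey, PendingCall, FakeConnection, ConnectionCache) are hypothetical simplifications of ConnectionId, Client.Call, and Client.Connection. No sockets are opened; sendParam only records what would be sent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConnectionCache {
    // Hypothetical stand-in for ConnectionId: one connection per (host, port).
    record ConnKey(String host, int port) {}

    // Hypothetical stand-in for Client.Call: a parameter tagged with a unique id.
    record PendingCall(int id, String param) {}

    // Hypothetical stand-in for Client.Connection; records instead of writing to a socket.
    static final class FakeConnection {
        final List<PendingCall> sent = new ArrayList<>();
        synchronized void sendParam(PendingCall call) { sent.add(call); }
    }

    private final Map<ConnKey, FakeConnection> connections = new ConcurrentHashMap<>();
    private final AtomicInteger counter = new AtomicInteger();

    // Reuse the connection for this key, creating it only on first use.
    FakeConnection getConnection(ConnKey key) {
        return connections.computeIfAbsent(key, k -> new FakeConnection());
    }

    // Analogous to Client.call: wrap the param in a Call with a fresh id, then send it.
    PendingCall call(String param, ConnKey key) {
        PendingCall call = new PendingCall(counter.getAndIncrement(), param);
        getConnection(key).sendParam(call);
        return call;
    }

    int connectionCount() { return connections.size(); }

    public static void main(String[] args) {
        ConnectionCache client = new ConnectionCache();
        ConnKey first = new ConnKey("host-a", 1234);
        client.call("a", first);
        client.call("b", first);                         // reuses the same connection
        client.call("c", new ConnKey("host-b", 1234));
        System.out.println(client.connectionCount());    // two distinct connections
    }
}
```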

In sendParam, the Invocation that packages the original call is serialized, preceded by the call id:

d = new DataOutputBuffer();
d.writeInt(call.id);
call.param.write(d);
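Putting 1.1 and 1.2 together, a request is the call id followed by the serialized Invocation. The sketch below builds such a request in memory and parses it back. The 4-byte length prefix written in front of the buffered bytes is an assumption of this sketch, added so a reader can tell where one request ends and the next begins; RequestFrame and the raw byte payload are hypothetical.

```java
import java.io.*;
import java.util.Arrays;

class RequestFrame {
    record Decoded(int callId, byte[] payload) {}

    // Encode as [length][call id][payload], where length covers id + payload.
    static byte[] encode(int callId, byte[] payload) throws IOException {
        // Buffer the id and payload first, analogous to the DataOutputBuffer d.
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        DataOutputStream d = new DataOutputStream(body);
        d.writeInt(callId);
        d.write(payload);
        d.flush();

        // Then write the length prefix followed by the buffered bytes.
        ByteArrayOutputStream frame = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(frame);
        out.writeInt(body.size());
        body.writeTo(out);
        out.flush();
        return frame.toByteArray();
    }

    static Decoded decode(byte[] frame) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
        int length = in.readInt();              // bytes that follow: id + payload
        int callId = in.readInt();
        byte[] payload = new byte[length - 4];  // the id itself takes 4 bytes
        in.readFully(payload);
        return new Decoded(callId, payload);
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = encode(42, new byte[] { 1, 2, 3 });
        Decoded decoded = decode(frame);
        System.out.println(frame.length + " bytes, id=" + decoded.callId()
                + ", payload=" + Arrays.toString(decoded.payload()));
    }
}
```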

1.3 Code analysis

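Before sendParam sends the call, a separate thread must already be listening for the results returned by the Server. Connection is itself a thread class, and it is started with start() inside getConnection().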
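Because several Calls share one connection, the listening thread has to hand each response to the caller waiting for it, and the call id is what makes that possible. A minimal sketch, assuming one CompletableFuture per pending call and an in-memory queue standing in for the socket's input stream; ResponseRouter and Response are hypothetical and do not reproduce Hadoop's own Call bookkeeping.

```java
import java.util.Map;
import java.util.concurrent.*;

class ResponseRouter {
    // A response as it would arrive from the server: the call id plus a value.
    record Response(int callId, String value) {}

    private final Map<Integer, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final BlockingQueue<Response> incoming = new LinkedBlockingQueue<>();

    // Register the call BEFORE sending it, so a fast response is never lost.
    CompletableFuture<String> register(int callId) {
        CompletableFuture<String> f = new CompletableFuture<>();
        pending.put(callId, f);
        return f;
    }

    // Stand-in for bytes arriving on the socket.
    void deliver(Response r) { incoming.add(r); }

    // Receiver loop, analogous to the Connection thread reading responses.
    Thread startReceiver() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Response r = incoming.take();
                    CompletableFuture<String> f = pending.remove(r.callId());
                    if (f != null) f.complete(r.value());   // wake the waiting caller
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();         // exit quietly on shutdown
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    static String demo() throws Exception {
        ResponseRouter router = new ResponseRouter();
        router.startReceiver();                 // listener runs before any send
        CompletableFuture<String> a = router.register(1);
        CompletableFuture<String> b = router.register(2);
        // Responses may come back out of order; the id routes each to its caller.
        router.deliver(new Response(2, "second"));
        router.deliver(new Response(1, "first"));
        return a.get(5, TimeUnit.SECONDS) + "," + b.get(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```

Starting the receiver and registering the call before anything is sent means a response that arrives quickly still finds a caller to deliver it to.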

2.1 Code analysis

Inside the Server, the deserialized Invocation is likewise wrapped in a Server.Call, which adds an id, the response, and other information. The Server.Call is then put into a call queue:

Writable param = ReflectionUtils.newInstance(paramClass, conf); // read param
param.readFields(dis);
Call call = new Call(id, param, this);
callQueue.put(call);              // queue the call; maybe blocked here
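The call queue separates the thread that reads and deserializes requests from the handler threads that execute them. In this sketch the queue is bounded, so put blocks when the handlers fall behind, which is one way the "maybe blocked here" comment can come about. CallQueueDemo, QueuedCall, the capacity of 4, and the handler count are hypothetical choices.

```java
import java.util.List;
import java.util.concurrent.*;

class CallQueueDemo {
    // Hypothetical stand-in for Server.Call: an id plus the deserialized parameter.
    record QueuedCall(int id, String param) {}

    static List<String> run(int calls, int handlers) throws InterruptedException {
        // Bounded queue: put() blocks when the handlers fall behind.
        BlockingQueue<QueuedCall> callQueue = new ArrayBlockingQueue<>(4);
        List<String> results = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(calls);

        // Handler threads: take a call, "execute" it, record the result.
        for (int h = 0; h < handlers; h++) {
            Thread handler = new Thread(() -> {
                try {
                    while (true) {
                        QueuedCall call = callQueue.take();
                        results.add(call.id() + ":" + call.param().toUpperCase());
                        done.countDown();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            handler.setDaemon(true);
            handler.start();
        }

        // Reader side: queue each incoming call (may block, like callQueue.put).
        for (int id = 0; id < calls; id++) {
            callQueue.put(new QueuedCall(id, "req" + id));
        }
        done.await(5, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(10, 3).size() + " calls handled");
    }
}
```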

2.2 & 2.3 Code analysis

In the call method of RPC.Server, the server's actual implementation method is invoked and its return value is wrapped:

Method method = protocol.getMethod(call.getMethodName(),
                                   call.getParameterClasses());
Object value = method.invoke(instance, call.getParameters());
return new ObjectWritable(method.getReturnType(), value);
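This dispatch is plain Java reflection: look up the method on the protocol interface by name and parameter types, invoke it on the server's implementation object, and return the declared return type together with the value (the pairing ObjectWritable provides). A self-contained sketch; CalcProtocol, CalcImpl, and Result are hypothetical.

```java
import java.lang.reflect.Method;

class ReflectiveDispatch {
    // Hypothetical protocol and server-side implementation.
    interface CalcProtocol {
        int add(int a, int b);
        String greet(String name);
    }

    static class CalcImpl implements CalcProtocol {
        public int add(int a, int b) { return a + b; }
        public String greet(String name) { return "hello " + name; }
    }

    // Pairs the declared return type with the value, like ObjectWritable.
    record Result(Class<?> type, Object value) {}

    static Result call(Class<?> protocol, Object instance, String methodName,
                       Class<?>[] parameterClasses, Object[] parameters) throws Exception {
        // Look the method up on the protocol interface, not the implementation,
        // so only methods declared by the protocol can be called remotely.
        Method method = protocol.getMethod(methodName, parameterClasses);
        Object value = method.invoke(instance, parameters);
        return new Result(method.getReturnType(), value);
    }

    public static void main(String[] args) throws Exception {
        Result r = call(CalcProtocol.class, new CalcImpl(), "add",
                new Class<?>[] { int.class, int.class }, new Object[] { 2, 3 });
        System.out.println(r.type() + " " + r.value());
    }
}
```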

2.4 Code analysis

Finally, the doRespond() method of Server.Responder puts the Call, which now carries the return value, into the response queue:

void doRespond(Call call) throws IOException {
  synchronized (call.connection.responseQueue) {
    call.connection.responseQueue.addLast(call);
    if (call.connection.responseQueue.size() == 1) {
      processResponse(call.connection.responseQueue, true);
    }
  }
}
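Read from the code above, the size() == 1 check means a handler tries to write immediately only when the connection's response queue was empty before its Call was added; responses that are already waiting are sent later, in FIFO order. The single-threaded sketch below models that behavior. ResponseQueueSketch and its socketWritable flag (which simulates a full socket buffer) are hypothetical, and its processResponse just appends to a list instead of writing to a channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ResponseQueueSketch {
    private final Deque<String> responseQueue = new ArrayDeque<>();
    final List<String> written = new ArrayList<>();   // stands in for the socket
    boolean socketWritable = true;                     // hypothetical back-pressure switch
    int immediateAttempts = 0;

    // Same shape as doRespond: enqueue, and write right away only if the
    // queue was empty before this response was added.
    void doRespond(String response) {
        synchronized (responseQueue) {
            responseQueue.addLast(response);
            if (responseQueue.size() == 1) {
                immediateAttempts++;
                processResponse();
            }
        }
    }

    // Stand-in for processResponse / the Responder: write queued responses
    // in FIFO order for as long as the simulated socket accepts data.
    void processResponse() {
        synchronized (responseQueue) {
            while (socketWritable && !responseQueue.isEmpty()) {
                written.add(responseQueue.removeFirst());
            }
        }
    }

    static String demo() {
        ResponseQueueSketch conn = new ResponseQueueSketch();
        conn.doRespond("r1");          // queue was empty: written immediately
        conn.socketWritable = false;   // simulate a full socket buffer
        conn.doRespond("r2");          // queue was empty: attempt made, nothing written
        conn.doRespond("r3");          // queue not empty: only enqueued
        conn.socketWritable = true;
        conn.processResponse();        // later drain keeps r2 before r3
        return conn.written + " attempts=" + conn.immediateAttempts;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```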

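The Client and Server analyzed above are really stubs. How does a remote call on the client side (for example, rpc.foo()) uniformly enter the Client and then get sent to the Server? The answer is a dynamic proxy whose InvocationHandler is RPC.Invoker. RPC creates the dynamic proxy in a single place: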

VersionedProtocol proxy =
    (VersionedProtocol) Proxy.newProxyInstance(
        protocol.getClassLoader(), new Class[] { protocol },
        new Invoker(protocol, addr, ticket, conf, factory, rpcTimeout));

For example, the client-datanode RPC is implemented as follows:

1. Define an RPC protocol containing all the remotely callable methods:

public interface ClientDatanodeProtocol extends VersionedProtocol

2. The client creates a dynamic proxy:

return (ClientDatanodeProtocol) RPC.getProxy(ClientDatanodeProtocol.class,
    ClientDatanodeProtocol.versionID, addr, conf,
    NetUtils.getDefaultSocketFactory(conf), socketTimeout);

3. The client calls methods through the proxy:

primary = createClientDatanodeProtocolProxy(primaryNode, conf,
    last.getBlock(), last.getBlockToken(), socketTimeout);
Block newBlock = primary.getBlockInfo(last.getBlock());

4. The server implements the protocol:

public class DataNode extends Configured
    implements InterDatanodeProtocol, ClientDatanodeProtocol
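The four steps can be imitated in a single process to show how they fit together. Everything in this sketch is hypothetical: BlockInfoProtocol stands in for ClientDatanodeProtocol, FakeDataNode for DataNode, and getProxy is a loopback replacement for RPC.getProxy whose handler calls the server object by reflection instead of serializing the call and sending it over a connection.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

class LoopbackRpc {
    // Step 1: the protocol, i.e. the interface of remotely callable methods.
    interface BlockInfoProtocol {
        long getBlockLength(long blockId);
    }

    // Step 4: the server-side implementation of the protocol.
    static class FakeDataNode implements BlockInfoProtocol {
        public long getBlockLength(long blockId) { return blockId * 1024; }
    }

    // Step 2: a loopback stand-in for RPC.getProxy. The handler forwards each
    // intercepted call to the server object by reflection; a real RPC layer
    // would serialize the call and send it over a connection instead.
    static <T> T getProxy(Class<T> protocol, T server) {
        Object proxy = Proxy.newProxyInstance(
                protocol.getClassLoader(),
                new Class<?>[] { protocol },
                (p, method, args) -> {
                    try {
                        return method.invoke(server, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause();   // rethrow the server-side exception
                    }
                });
        return protocol.cast(proxy);
    }

    public static void main(String[] args) {
        BlockInfoProtocol dn = getProxy(BlockInfoProtocol.class, new FakeDataNode());
        // Step 3: the client calls through the proxy as if it were local.
        System.out.println(dn.getBlockLength(3));
    }
}
```

Replacing the loopback handler with one that packages the call into an Invocation and passes it to a Client is, in outline, the job RPC.Invoker performs.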