YarnRPC Example: Analysis of the ResourceTracker Protocol

Source: Internet. Editor: 程序博客网. Time: 2024/06/06 19:35

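The communication protocol between ResourceManager and NodeManager is ResourceTracker.

Both a server-side and a client-side implementation exist, and the package structure and class names follow the conventions described earlier. ResourceTrackerPBServiceImpl implements the BlockingInterface of the PB service; in practice it delegates to the methods of ResourceTrackerService, the real implementation class.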

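Chapter 1

First we introduce how Hadoop wraps PB parameters and return values in Java. The client has to extract the PB proto from the Java class and then call the corresponding method on the proxy. The server does the opposite: it wraps the proto in the Java class and calls the real ResourceTrackerService implementation to do the work.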

Take the abstract class RegisterNodeManagerRequest as an example. Its corresponding PB message is:

    message RegisterNodeManagerRequestProto {
      optional NodeIdProto node_id = 1;
      optional int32 http_port = 3;
      optional ResourceProto resource = 4;
      optional string nm_version = 5;
      repeated NMContainerStatusProto container_statuses = 6;
      repeated ApplicationIdProto runningApplications = 7;
    }

The fields correspond one to one. The get/set methods of the abstract class are abstract; the class that actually wraps the PB message is its implementation, RegisterNodeManagerRequestPBImpl.

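Now let's analyze this implementation class. Besides the fields listed above, it has three important fields: proto, builder, and viaProto. viaProto is a boolean: true means field values are read from proto, false means they are read from builder. There are accordingly two constructors, one that starts from a builder and one that takes an existing proto.

To get a field, the wrapper class first checks the corresponding member variable and returns it if it is not null. Otherwise it checks whether proto or builder has the field: if not, it returns null; if so, it converts the proto value into the member field and returns that.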

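Next we analyze the getProto method, which converts between the Java POJO class and the proto. It first calls mergeLocalToProto. If viaProto is true, that method first calls maybeInitBuilder, which creates the builder if it is null, recreates it if it is non-null but viaProto is true, and finally sets viaProto to false. It then calls mergeLocalToBuilder, which converts the POJO's non-null member variables to proto form (by calling getProto on each member) and sets them on the builder. Finally it calls builder.build() to construct the proto, sets viaProto to true, and returns the proto.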

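Now the set methods for member variables. A setter first calls maybeInitBuilder: if viaProto is true or builder is null, a builder is created (the builder is reset so the proto can be rebuilt; viaProto being false means the proto is still being built and the new data lives on the builder side), and viaProto is set to false. If the setter's argument is null, the corresponding field is cleared in the builder; otherwise only the member variable is assigned. Values held in the builder are transferred into the proto only when getProto is called.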
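The get/set/getProto interplay described above can be tried out without protobuf. The following is a minimal sketch, not Hadoop's code: FakeProto and FakeBuilder are hypothetical stand-ins for a generated message and its builder, RequestSketch is a hypothetical wrapper, and only one field (nmVersion) is modeled.

```java
// Hypothetical wrapper mirroring the proto / builder / viaProto pattern.
final class RequestSketch {
    private FakeProto proto = new FakeProto(null); // plays the role of a default instance
    private FakeBuilder builder = null;
    private boolean viaProto = false;
    private String nmVersion = null;               // local cached member variable

    RequestSketch() { builder = new FakeBuilder(); }
    RequestSketch(FakeProto proto) { this.proto = proto; viaProto = true; }

    String getNmVersion() {
        if (nmVersion != null) return nmVersion;
        // Read from whichever side currently holds the data.
        boolean has = viaProto ? proto.hasNmVersion() : builder.hasNmVersion();
        if (!has) return null;
        nmVersion = viaProto ? proto.getNmVersion() : builder.getNmVersion();
        return nmVersion;
    }

    void setNmVersion(String value) {
        maybeInitBuilder();
        if (value == null) builder.clearNmVersion();
        nmVersion = value;                          // reaches the proto only in getProto()
    }

    FakeProto getProto() {
        mergeLocalToProto();
        return proto;
    }

    boolean isViaProto() { return viaProto; }

    private void mergeLocalToProto() {
        if (viaProto) maybeInitBuilder();
        mergeLocalToBuilder();
        proto = builder.build();
        viaProto = true;
    }

    private void mergeLocalToBuilder() {
        if (nmVersion != null) builder.setNmVersion(nmVersion);
    }

    private void maybeInitBuilder() {
        if (viaProto || builder == null) builder = new FakeBuilder(proto);
        viaProto = false;
    }

    public static void main(String[] args) {
        RequestSketch request = new RequestSketch();
        request.setNmVersion("2.7.0");
        System.out.println("viaProto after set: " + request.isViaProto());
        System.out.println("proto value: " + request.getProto().getNmVersion());
        System.out.println("viaProto after getProto: " + request.isViaProto());
    }
}

// Hypothetical stand-in for an immutable generated PB message.
final class FakeProto {
    private final String nmVersion; // null means "field not set"
    FakeProto(String nmVersion) { this.nmVersion = nmVersion; }
    boolean hasNmVersion() { return nmVersion != null; }
    String getNmVersion() { return nmVersion; }
}

// Hypothetical stand-in for the generated builder.
final class FakeBuilder {
    private String nmVersion;
    FakeBuilder() { }
    FakeBuilder(FakeProto from) { this.nmVersion = from.getNmVersion(); }
    boolean hasNmVersion() { return nmVersion != null; }
    String getNmVersion() { return nmVersion; }
    FakeBuilder setNmVersion(String value) { nmVersion = value; return this; }
    FakeBuilder clearNmVersion() { nmVersion = null; return this; }
    FakeProto build() { return new FakeProto(nmVersion); }
}
```

The point of the pattern is that setters stay cheap: they only touch the local member, and the immutable message is rebuilt once, on demand, when getProto is called.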

Chapter 2

Next we analyze the client-side classes.

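2.1 ResourceTrackerPBClientImpl is fairly simple. Its constructor registers the mapping between ResourceTrackerPB.class and ProtobufRpcEngine, then uses the RPC factory class to obtain a Proxy object. The rest is a handful of protocol methods: each takes the proto wrapped inside its Java parameter, calls the corresponding method on the proxy object, and wraps the returned proto in a Java bean.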
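The unwrap, call, rewrap sequence on the client and its mirror image on the server can be sketched with plain Java types. This is an illustration under stated assumptions, not Hadoop's code: every type below (RequestProto, ResponseProto, Request, Response, Tracker, TrackerPB and the three implementations) is a hypothetical stand-in named after the role it plays, and the client is wired straight to the server adapter with no network in between.

```java
final class PbDelegationSketch {
    // Hypothetical wire-level messages (role of the generated protos).
    static final class RequestProto {
        final String nodeId;
        RequestProto(String nodeId) { this.nodeId = nodeId; }
    }
    static final class ResponseProto {
        final String action;
        ResponseProto(String action) { this.action = action; }
    }

    // Hypothetical Java-side records wrapping a proto (role of the PBImpl classes).
    static final class Request {
        private final RequestProto proto;
        Request(String nodeId) { this.proto = new RequestProto(nodeId); }
        Request(RequestProto proto) { this.proto = proto; }
        RequestProto getProto() { return proto; }
        String getNodeId() { return proto.nodeId; }
    }
    static final class Response {
        private final ResponseProto proto;
        Response(String action) { this.proto = new ResponseProto(action); }
        Response(ResponseProto proto) { this.proto = proto; }
        ResponseProto getProto() { return proto; }
        String getAction() { return proto.action; }
    }

    // Java-level protocol (role of ResourceTracker).
    interface Tracker { Response register(Request request); }
    // Wire-level protocol (role of the PB BlockingInterface).
    interface TrackerPB { ResponseProto register(RequestProto proto); }

    // Real implementation (role of ResourceTrackerService).
    static final class TrackerService implements Tracker {
        @Override public Response register(Request request) {
            return new Response("NORMAL:" + request.getNodeId());
        }
    }

    // Server-side adapter (role of ResourceTrackerPBServiceImpl):
    // wrap the incoming proto, delegate, unwrap the result.
    static final class TrackerPBServiceImpl implements TrackerPB {
        private final Tracker real;
        TrackerPBServiceImpl(Tracker real) { this.real = real; }
        @Override public ResponseProto register(RequestProto proto) {
            return real.register(new Request(proto)).getProto();
        }
    }

    // Client-side adapter (role of ResourceTrackerPBClientImpl):
    // unwrap the Java parameter, call the proxy, wrap the returned proto.
    static final class TrackerPBClientImpl implements Tracker {
        private final TrackerPB proxy;
        TrackerPBClientImpl(TrackerPB proxy) { this.proxy = proxy; }
        @Override public Response register(Request request) {
            return new Response(proxy.register(request.getProto()));
        }
    }

    public static void main(String[] args) {
        Tracker client = new TrackerPBClientImpl(new TrackerPBServiceImpl(new TrackerService()));
        System.out.println(client.register(new Request("nm-1")).getAction());
    }
}
```

Callers on both ends only ever see the Java-level Tracker interface; the proto types appear solely inside the two adapters.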

On the client side the call chain starts at NodeManager, which holds a NodeStatusUpdater object; the NodeStatusUpdater class in turn holds a resourceTracker object. In its resyncWithRM method the NodeStatusUpdater calls rebootNodeStatusUpdaterAndRegisterWithRM, which calls registerNodeManager on the resourceTracker object.
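The other RPC method of ResourceTracker is invoked from NodeManager's service.start(). Because NodeManager extends CompositeService, it contains other services, NodeStatusUpdater among them. In service.init the NodeManager service adds the NodeStatusUpdater service to its child services, and in service.start() it calls the NodeStatusUpdater's service.start(). That method goes on to call startStatusUpdater, which starts a thread whose run method calls resourceTracker.nodeHeartbeat.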
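2.2 Next is the question of how the resourceTracker object of NodeStatusUpdater is created. It comes from the getRMClient method, which calls ServerRMProxy.createRMProxy(conf, ResourceTracker.class), which in turn calls an overload of the same name. That overload first creates a retryPolicy.

RetryPolicy is an interface whose method shouldRetry(Exception e, int retries, int failovers, boolean isIdempotentOrAtMostOnce) returns a RetryAction. Here retries is the number of retries, failovers is the number of failovers, and isIdempotentOrAtMostOnce says whether the method is idempotent. Implementations define the retry strategy for each kind of exception. A RetryAction is one of three decisions (fail, retry, or fail over and then retry) and carries a delay field; which one is returned depends on the Exception. The implementation used here is FailoverOnNetworkExceptionRetry, whose delay grows exponentially (doubling each time).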
    // retries: which retry attempt this is; failovers: number of failovers so far
    @Override
    public RetryAction shouldRetry(Exception e, int retries,
        int failovers, boolean isIdempotentOrAtMostOnce) throws Exception {
      // Fail once the failover count reaches its threshold
      if (failovers >= maxFailovers) {
        return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
            "failovers (" + failovers + ") exceeded maximum allowed ("
            + maxFailovers + ")");
      }
      // Fail once the retry count exceeds its threshold
      if (retries - failovers > maxRetries) {
        return new RetryAction(RetryAction.RetryDecision.FAIL, 0, "retries ("
            + retries + ") exceeded maximum allowed (" + maxRetries + ")");
      }
      // Cannot connect at all: fail over and retry
      if (e instanceof ConnectException ||
          e instanceof NoRouteToHostException ||
          e instanceof UnknownHostException ||
          e instanceof StandbyException ||
          e instanceof ConnectTimeoutException ||
          isWrappedStandbyException(e)) {
        return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
            getFailoverOrRetrySleepTime(failovers));
      // Explicitly retriable errors: retry
      } else if (e instanceof RetriableException
          || getWrappedRetriableException(e) != null) {
        // RetriableException or RetriableException wrapped
        return new RetryAction(RetryAction.RetryDecision.RETRY,
            getFailoverOrRetrySleepTime(retries));
      // Other socket exceptions or IOExceptions, except RemoteException
      // (a subclass of IOException): fail over and retry if the method is
      // idempotent, otherwise fail
      } else if (e instanceof SocketException
          || (e instanceof IOException && !(e instanceof RemoteException))) {
        if (isIdempotentOrAtMostOnce) {
          return RetryAction.FAILOVER_AND_RETRY;
        } else {
          return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
              "the invoked method is not idempotent, and unable to determine "
                  + "whether it was invoked");
        }
      // Any other Exception, or a server-side error (RemoteException), goes
      // to fallbackPolicy, which fails immediately
      } else {
        return fallbackPolicy.shouldRetry(e, retries, failovers,
            isIdempotentOrAtMostOnce);
      }
    }
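The doubling delay mentioned for FailoverOnNetworkExceptionRetry can be illustrated on its own. This is a simplified sketch: the class BackoffSketch, the method exponentialSleepMillis and its parameters are hypothetical names, the example values (500 ms base, 15000 ms cap) are made up, and any randomization the real policy applies to the delay is left out so the output is deterministic.

```java
final class BackoffSketch {
    // Delay for a given attempt: base * 2^attempt, capped at maxMillis.
    // attempt is 0 for the first retry or failover.
    static long exponentialSleepMillis(long baseMillis, int attempt, long maxMillis) {
        // Clamp the shift so a large attempt count cannot overflow the long
        // (assumes baseMillis itself is a modest value).
        int shift = Math.min(attempt, 30);
        long delay = baseMillis * (1L << shift);
        return Math.min(delay, maxMillis);
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + " -> "
                + exponentialSleepMillis(500, attempt, 15000) + " ms");
        }
    }
}
```

With these example values the delays run 500, 1000, 2000, 4000, 8000 and then stay at the 15000 ms cap.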
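If HA is supported, a ConfiguredRMFailoverProxyProvider (a proxy provider that supports failover and retry) is created. The most important method of this class is getProxy(), which obtains the real proxy. It ends up calling RMProxy.getProxy: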
    @Private
    static <T> T getProxy(final Configuration conf,
        final Class<T> protocol, final InetSocketAddress rmAddress)
        throws IOException {
      return UserGroupInformation.getCurrentUser().doAs(
        new PrivilegedAction<T>() {
          @Override
          public T run() {
            return (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
          }
        });
    }
This is exactly where the YarnRPC API gets called.
RetryProxy.create is then called, and the method that finally creates the dynamic proxy is:
    /**
     * Create a proxy for an interface of implementations of that interface using
     * the given {@link FailoverProxyProvider} and the same retry policy for each
     * method in the interface.
     *
     * @param iface the interface that the retry will implement
     * @param proxyProvider provides implementation instances whose methods should be retried
     * @param retryPolicy the policy for retrying or failing over method call failures
     * @return the retry proxy
     */
    public static <T> Object create(Class<T> iface,
        FailoverProxyProvider<T> proxyProvider, RetryPolicy retryPolicy) {
      // Dynamic proxy
      return Proxy.newProxyInstance(
          proxyProvider.getInterface().getClassLoader(),
          new Class<?>[] { iface },
          // proxyProvider here is a ConfiguredRMFailoverProxyProvider
          new RetryInvocationHandler<T>(proxyProvider, retryPolicy)
          );
    }
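The combination of Proxy.newProxyInstance with a retrying InvocationHandler can be reproduced with the JDK alone. The sketch below is a much reduced illustration, not Hadoop's RetryInvocationHandler: Tracker, FlakyTracker and RetryHandler are hypothetical, the handler only retries IOException up to a fixed count, and there is no failover or sleeping.

```java
import java.io.IOException;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

final class RetryProxySketch {
    // Hypothetical protocol interface standing in for ResourceTracker.
    interface Tracker {
        String heartbeat(String nodeId) throws IOException;
    }

    // Test double: fails a fixed number of calls, then succeeds.
    static final class FlakyTracker implements Tracker {
        private int remainingFailures;
        int calls = 0;
        FlakyTracker(int failures) { this.remainingFailures = failures; }
        @Override public String heartbeat(String nodeId) throws IOException {
            calls++;
            if (remainingFailures-- > 0) throw new IOException("simulated network error");
            return "ok:" + nodeId;
        }
    }

    // Retries IOExceptions thrown by the target up to maxRetries times.
    static final class RetryHandler implements InvocationHandler {
        private final Object target;
        private final int maxRetries;
        RetryHandler(Object target, int maxRetries) {
            this.target = target;
            this.maxRetries = maxRetries;
        }
        @Override public Object invoke(Object proxy, Method method, Object[] args)
                throws Throwable {
            int retries = 0;
            while (true) {
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    // Reflection wraps the real exception; unwrap it first.
                    Throwable cause = e.getCause();
                    if (!(cause instanceof IOException) || retries++ >= maxRetries) {
                        throw cause;
                    }
                }
            }
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, T target, int maxRetries) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] { iface },
            new RetryHandler(target, maxRetries));
    }

    public static void main(String[] args) throws IOException {
        FlakyTracker real = new FlakyTracker(2);
        Tracker tracker = create(Tracker.class, real, 3);
        System.out.println(tracker.heartbeat("nm-1") + " after " + real.calls + " calls");
    }
}
```

The caller holds an ordinary Tracker reference and never sees the two failed attempts, which is the effect RetryProxy.create gives the NodeManager's resourceTracker.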
   
2.3 Next, let's look at the constructor of RetryInvocationHandler:
    protected RetryInvocationHandler(FailoverProxyProvider<T> proxyProvider,
        RetryPolicy defaultPolicy,
        Map<String, RetryPolicy> methodNameToPolicyMap) {
      this.proxyProvider = proxyProvider;
      this.defaultPolicy = defaultPolicy;
      this.methodNameToPolicyMap = methodNameToPolicyMap;
      // Returns a proxyInfo that contains the real proxy
      this.currentProxy = proxyProvider.getProxy();
    }

And its invoke method:
    @Override
    public Object invoke(Object proxy, Method method, Object[] args)
        throws Throwable {
      // Look up the cached retry policy for this method
      RetryPolicy policy = methodNameToPolicyMap.get(method.getName());
      if (policy == null) {
        policy = defaultPolicy;
      }

      // The number of times this method invocation has been failed over.
      int invocationFailoverCount = 0;
      // Whether the proxy is a Proxy instance whose InvocationHandler is an
      // RpcInvocationHandler
      final boolean isRpc = isRpcInvocation(currentProxy.proxy);
      final int callId = isRpc ? Client.nextCallId() : RpcConstants.INVALID_CALL_ID;
      int retries = 0;
      // This loop covers all RPC retries
      while (true) {
        // The number of times this invocation handler has ever been failed over,
        // before this method invocation attempt. Used to prevent concurrent
        // failed method invocations from triggering multiple failover attempts.
        long invocationAttemptFailoverCount;
        synchronized (proxyProvider) {
          invocationAttemptFailoverCount = proxyProviderFailoverCount;
        }
        if (isRpc) {
          // Checks that neither argument is an invalid value and that no
          // callId was set before
          Client.setCallIdAndRetryCount(callId, retries);
        }
        try {
          // Run the method on the real proxy
          Object ret = invokeMethod(method, args);
          hasMadeASuccessfulCall = true;
          return ret;
        } catch (Exception e) {
          // On error, decide whether to retry
          if (Thread.currentThread().isInterrupted()) {
            // If interrupted, do not retry.
            throw e;
          }
          // Read the method's annotations to see whether it is idempotent
          // or at-most-once
          boolean isIdempotentOrAtMostOnce = proxyProvider.getInterface()
              .getMethod(method.getName(), method.getParameterTypes())
              .isAnnotationPresent(Idempotent.class);
          if (!isIdempotentOrAtMostOnce) {
            isIdempotentOrAtMostOnce = proxyProvider.getInterface()
                .getMethod(method.getName(), method.getParameterTypes())
                .isAnnotationPresent(AtMostOnce.class);
          }
          // Pass in the retry count and failover count; shouldRetry, analyzed
          // above, returns the RetryAction
          RetryAction action = policy.shouldRetry(e, retries++,
              invocationFailoverCount, isIdempotentOrAtMostOnce);
          if (action.action == RetryAction.RetryDecision.FAIL) {
            // Log the failure reason, then rethrow
            if (action.reason != null) {
              LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
                  + "." + method.getName() + " over " + currentProxy.proxyInfo
                  + ". Not retrying because " + action.reason, e);
            }
            throw e;
          } else { // retry or failover
            // avoid logging the failover if this is the first call on this
            // proxy object, and we successfully achieve the failover without
            // any flip-flopping
            // (no log for the first failed attempt on a fresh proxy)
            boolean worthLogging =
              !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);
            worthLogging |= LOG.isDebugEnabled();
            // Log differently depending on the action and on whether debug
            // is enabled
            if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&
                worthLogging) {
              String msg = "Exception while invoking " + method.getName()
                  + " of class " + currentProxy.proxy.getClass().getSimpleName()
                  + " over " + currentProxy.proxyInfo;
              if (invocationFailoverCount > 0) {
                msg += " after " + invocationFailoverCount + " fail over attempts";
              }
              msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);
              LOG.info(msg, e);
            // Otherwise (a RETRY, or a failover not worth logging) log only
            // when debug is enabled
            } else {
              if (LOG.isDebugEnabled()) {
                LOG.debug("Exception while invoking " + method.getName()
                    + " of class " + currentProxy.proxy.getClass().getSimpleName()
                    + " over " + currentProxy.proxyInfo + ". Retrying "
                    + formatSleepMessage(action.delayMillis), e);
              }
            }
            // Sleep for the delay chosen by the retry policy
            if (action.delayMillis > 0) {
              Thread.sleep(action.delayMillis);
            }

            if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
              // Make sure that concurrent failed method invocations only cause a
              // single actual fail over.
              synchronized (proxyProvider) {
                // Guard against another caller failing over at the same time
                if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
                  // Switch the ResourceManager id to the next id in the HA RM list
                  proxyProvider.performFailover(currentProxy.proxy);
                  proxyProviderFailoverCount++;
                } else {
                  LOG.warn("A failover has occurred since the start of this method"
                      + " invocation attempt.");
                }
                // Get the proxy for the RM address of the new id
                currentProxy = proxyProvider.getProxy();
              }
              invocationFailoverCount++;
            }
          }
        }
      }
    }
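The annotation lookup in the catch block can be tried in isolation. The sketch below declares its own Idempotent and AtMostOnce annotations and a hypothetical Tracker interface; these are local stand-ins, not Hadoop's types, and which method carries which annotation is chosen arbitrarily for the demonstration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

final class IdempotentCheckSketch {
    // Local stand-in annotations; RUNTIME retention is required for reflection to see them.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Idempotent { }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface AtMostOnce { }

    // Hypothetical protocol interface with arbitrarily assigned annotations.
    interface Tracker {
        @Idempotent String register(String nodeId);
        @AtMostOnce String heartbeat(String nodeId);
        String other(String nodeId);
    }

    // Same two-step check as in invoke(): look the method up on the interface,
    // then test for either annotation.
    static boolean isIdempotentOrAtMostOnce(Class<?> iface, String name,
            Class<?>... parameterTypes) {
        try {
            Method declared = iface.getMethod(name, parameterTypes);
            return declared.isAnnotationPresent(Idempotent.class)
                || declared.isAnnotationPresent(AtMostOnce.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] { "register", "heartbeat", "other" }) {
            System.out.println(name + " -> "
                + isIdempotentOrAtMostOnce(Tracker.class, name, String.class));
        }
    }
}
```

The result only feeds the policy's decision: for the socket and IOException branch of shouldRetry, a method without either annotation is failed rather than retried, because the caller cannot tell whether the server already executed it.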
Next, let's look at the proxyProvider.getProxy() method:

    final InetSocketAddress rmAddress = rmProxy.getRMAddress(conf, protocol);

which ultimately calls

    conf.getSocketAddr(
      YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);

This looks up the current RM ID and then reads the address configured for that ID from the configuration file.

Now let's look at the failover method (in ConfiguredRMFailoverProxyProvider):

    @Override
    public synchronized void performFailover(T currentProxy) {
      // Move to the next id index
      currentProxyIndex = (currentProxyIndex + 1) % rmServiceIds.length;
      // Set the id of the current ResourceManager
      conf.set(YarnConfiguration.RM_HA_ID, rmServiceIds[currentProxyIndex]);
      LOG.info("Failing over to " + rmServiceIds[currentProxyIndex]);
    }
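The index arithmetic in performFailover is a plain round-robin over the configured ids. A minimal sketch follows, with a hypothetical FailoverIndexSketch class and made-up ids "rm1" and "rm2"; the real provider additionally writes the chosen id back into the configuration.

```java
final class FailoverIndexSketch {
    private final String[] rmServiceIds;
    private int currentProxyIndex = 0;

    FailoverIndexSketch(String... rmServiceIds) {
        if (rmServiceIds.length == 0) {
            throw new IllegalArgumentException("at least one id is required");
        }
        this.rmServiceIds = rmServiceIds.clone();
    }

    synchronized String current() {
        return rmServiceIds[currentProxyIndex];
    }

    // Advance to the next id, wrapping around at the end of the list.
    synchronized String performFailover() {
        currentProxyIndex = (currentProxyIndex + 1) % rmServiceIds.length;
        return rmServiceIds[currentProxyIndex];
    }

    public static void main(String[] args) {
        FailoverIndexSketch ids = new FailoverIndexSketch("rm1", "rm2");
        System.out.println("start: " + ids.current());
        System.out.println("after 1st failover: " + ids.performFailover());
        System.out.println("after 2nd failover: " + ids.performFailover());
    }
}
```

Because the index wraps, repeated failovers cycle through the list until the policy's failover limit stops the retries.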

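Chapter 3: Server-side code

The server-side code lives in ResourceManager, which has a member variable of type ResourceTrackerService. This class is both the implementation of the protocol and the code that starts the server. resourceTrackerService is a child service of the ResourceManager composite service, so its init and start methods get called. The init method reads settings from the configuration; the start method is as follows: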

    @Override
    protected void serviceStart() throws Exception {
      super.serviceStart();
      // ResourceTrackerServer authenticates NodeManager via Kerberos if
      // security is enabled, so no secretManager.
      Configuration conf = getConfig();
      // Use the YarnRPC class
      YarnRPC rpc = YarnRPC.create(conf);
      this.server =
        rpc.getServer(ResourceTracker.class, this, resourceTrackerAddress,
            conf, null,
            conf.getInt(YarnConfiguration.RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT,
                YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT));

      // Enable service authorization?
      // If service authorization is enabled, load and refresh the
      // service ACL configuration
      if (conf.getBoolean(
          CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
          false)) {
        InputStream inputStream =
            this.rmContext.getConfigurationProvider()
                .getConfigurationInputStream(conf,
                    YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
        if (inputStream != null) {
          conf.addResource(inputStream);
        }
        refreshServiceAcls(conf, RMPolicyProvider.getInstance());
      }
      this.server.start();
      conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
          YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
          YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
          server.getListenerAddress());
    }

0 0