YarnRpc Example: Analysis of the ResourceTracker Protocol

Source: Internet    Time: 2024/06/06 19:35

   ResourceTracker is the communication protocol between the ResourceManager and the NodeManager.

   

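   Both the server side and the client side are implemented, and the package structure and class names follow the conventions described earlier. ResourceTrackerPBServiceImpl implements the BlockingInterface of the PB service; in practice it delegates to the methods of ResourceTrackerService, the real implementation class.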

Chapter 1

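   First we look at how Hadoop wraps PB parameters and return values in Java classes. The client has to extract the PB message from the Java class and then call the corresponding proxy method. The server does the opposite: it wraps the PB message in a Java class and calls the real implementation class, ResourceTrackerService, to do the work.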

   Take the abstract class RegisterNodeManagerRequest as an example. Its corresponding PB message is

message RegisterNodeManagerRequestProto {
  optional NodeIdProto node_id = 1;
  optional int32 http_port = 3;
  optional ResourceProto resource = 4;
  optional string nm_version = 5;
  repeated NMContainerStatusProto container_statuses = 6;
  repeated ApplicationIdProto runningApplications = 7;
}
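   The fields correspond one to one. The get/set methods of the abstract class are abstract; the class that actually wraps the PB message is its implementation, RegisterNodeManagerRequestPBImpl.

   Let us analyze this implementation class. Besides the fields above, it has three important members: proto, builder, and viaProto. viaProto is a boolean: true means field values are read from proto, false means they are read from builder. Accordingly there are two constructors.

   To get a field, the wrapper class first checks the corresponding member variable and returns it if it is not null. Otherwise it checks whether proto or builder has the field: if not, it returns null; if so, it converts the value from the proto message into the member variable and returns it.

   Next is getProto, which converts between the proto and the Java POJO. It first calls mergeLocalToProto. If viaProto is true, that method calls maybeInitBuilder first, which creates the builder when it is null, re-creates it when it is non-null but viaProto is true, and finally sets viaProto to false. mergeLocalToProto then calls mergeLocalToBuilder, which converts every non-null member variable of the POJO to its proto form (by calling the member's getProto method) and sets it on the builder. Finally it calls builder.build() to construct the proto, sets viaProto to true, and getProto returns this proto.

   Now the set methods. Each first calls maybeInitBuilder: if viaProto is true or builder is null, a builder is created (the builder is reset so that the proto can be rebuilt; viaProto == false means the proto is still under construction and the new data lives in the builder), and viaProto is set to false. If the argument of the set method is null, the corresponding field is cleared from the builder; otherwise only the member variable is set. Values held in the builder are transferred into the proto only when getProto is called.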
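The proto/builder/viaProto mechanism is easier to follow in a stripped-down form. The sketch below is not Hadoop code: NodeProto is a hypothetical hand-written stand-in for a generated protobuf message with a single field, and NodeRequestPBImpl is a hypothetical wrapper that mirrors the get, set, and getProto flow described above under those assumptions.

```java
// Hypothetical stand-in for a generated protobuf message: immutable, built via Builder.
final class NodeProto {
    private final String host; // null means "field not set"

    private NodeProto(String host) { this.host = host; }

    boolean hasHost() { return host != null; }
    String getHost() { return host; }

    static Builder newBuilder() { return new Builder(); }

    // Copy an existing message into a fresh builder so it can be modified.
    static Builder newBuilder(NodeProto p) {
        Builder b = new Builder();
        b.host = p.host;
        return b;
    }

    static final class Builder {
        private String host;
        boolean hasHost() { return host != null; }
        String getHost() { return host; }
        Builder setHost(String h) { host = h; return this; }
        Builder clearHost() { host = null; return this; }
        NodeProto build() { return new NodeProto(host); }
    }
}

// Hypothetical wrapper following the PBImpl pattern described in the text.
final class NodeRequestPBImpl {
    private NodeProto proto = NodeProto.newBuilder().build();
    private NodeProto.Builder builder;
    private boolean viaProto;
    private String host; // locally cached Java-side value

    NodeRequestPBImpl() { builder = NodeProto.newBuilder(); }

    NodeRequestPBImpl(NodeProto proto) { this.proto = proto; viaProto = true; }

    String getHost() {
        if (host != null) return host;            // 1. cached member wins
        boolean has = viaProto ? proto.hasHost() : builder.hasHost();
        if (!has) return null;                    // 2. field absent
        host = viaProto ? proto.getHost() : builder.getHost();
        return host;                              // 3. convert, cache, return
    }

    void setHost(String h) {
        maybeInitBuilder();
        if (h == null) builder.clearHost();       // null clears the builder field
        host = h;                                 // otherwise only the member changes
    }

    NodeProto getProto() {
        if (viaProto) maybeInitBuilder();
        if (host != null) builder.setHost(host);  // mergeLocalToBuilder
        proto = builder.build();
        viaProto = true;
        return proto;
    }

    private void maybeInitBuilder() {
        if (viaProto || builder == null) builder = NodeProto.newBuilder(proto);
        viaProto = false;
    }

    public static void main(String[] args) {
        NodeRequestPBImpl req = new NodeRequestPBImpl();
        req.setHost("nm1");
        System.out.println(req.getProto().getHost());
    }
}
```

The point of the indirection is that a proto, once built, is immutable: any set after getProto has to start a new builder seeded from the last proto, which is what maybeInitBuilder does.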
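Chapter 2

   Now we analyze the client-side classes.

   2.1 ResourceTrackerPBClientImpl is fairly simple. Its constructor registers the mapping between ResourceTrackerPB.class and ProtobufRpcEngine, then obtains the proxy object from the RPC factory class. The rest are the protocol methods: each takes the proto wrapped inside the Java argument, calls the corresponding method on the proxy object, and wraps the returned proto in a Java bean.

   On the client side the call chain starts at NodeManager, which holds a NodeStatusUpdater object; NodeStatusUpdater in turn holds a resourceTracker object. In resyncWithRM, the NodeStatusUpdater calls rebootNodeStatusUpdaterAndRegisterWithRM, which calls registerNodeManager on the resourceTracker object.

   The other RPC method of ResourceTracker is invoked from NodeManager's service.start(). Because NodeManager extends CompositeService, it contains other services, NodeStatusUpdater among them. In service.init, NodeManager adds the NodeStatusUpdater service to its child services, and service.start() then calls service.start() on NodeStatusUpdater. That method calls startStatusUpdater, which starts a thread whose run method calls resourceTracker.nodeHeartbeat.

   2.2 Next is how the resourceTracker object of NodeStatusUpdater is created. It comes from getRMClient, which calls ServerRMProxy.createRMProxy(conf, ResourceTracker.class), which in turn calls an overload of the same name. That method first creates a retryPolicy.

   RetryPolicy is an interface with one method, shouldRetry(Exception e, int retries, int failovers, boolean isIdempotentOrAtMostOnce), which returns a RetryAction. Here retries is the number of retries so far, failovers is the number of failovers so far, and isIdempotentOrAtMostOnce says whether the invoked method is idempotent or at-most-once. Implementations provide retry strategies for the different exceptions that can occur. A RetryAction is one of three decisions (fail, retry, or fail over and then retry) and also carries a delay field; which one is returned depends on the exception. The implementation used here is FailoverOnNetworkExceptionRetry, whose delay grows exponentially (doubling each time).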
  // retries: which retry this is; failovers: number of failovers so far
  @Override
  public RetryAction shouldRetry(Exception e, int retries,
      int failovers, boolean isIdempotentOrAtMostOnce) throws Exception {
    // Give up (FAIL) once the number of failovers reaches the threshold
    if (failovers >= maxFailovers) {
      return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
          "failovers (" + failovers + ") exceeded maximum allowed ("
          + maxFailovers + ")");
    }
    // Give up (FAIL) once the number of retries exceeds the threshold
    if (retries - failovers > maxRetries) {
      return new RetryAction(RetryAction.RetryDecision.FAIL, 0, "retries ("
          + retries + ") exceeded maximum allowed (" + maxRetries + ")");
    }
    // Cannot reach the server: fail over and retry
    if (e instanceof ConnectException ||
        e instanceof NoRouteToHostException ||
        e instanceof UnknownHostException ||
        e instanceof StandbyException ||
        e instanceof ConnectTimeoutException ||
        isWrappedStandbyException(e)) {
      return new RetryAction(RetryAction.RetryDecision.FAILOVER_AND_RETRY,
          getFailoverOrRetrySleepTime(failovers));
    // Explicitly retriable errors: retry
    } else if (e instanceof RetriableException
        || getWrappedRetriableException(e) != null) {
      // RetriableException or RetriableException wrapped
      return new RetryAction(RetryAction.RetryDecision.RETRY,
            getFailoverOrRetrySleepTime(retries));
    // Other SocketException or IOException, except RemoteException (an
    // IOException subclass): fail over and retry if the method is
    // idempotent, otherwise fail
    } else if (e instanceof SocketException
        || (e instanceof IOException && !(e instanceof RemoteException))) {
      if (isIdempotentOrAtMostOnce) {
        return RetryAction.FAILOVER_AND_RETRY;
      } else {
        return new RetryAction(RetryAction.RetryDecision.FAIL, 0,
            "the invoked method is not idempotent, and unable to determine "
                + "whether it was invoked");
      }
    // Any other exception or server-side error (RemoteException) goes to
    // fallbackPolicy, which fails immediately
    } else {
        return fallbackPolicy.shouldRetry(e, retries, failovers,
            isIdempotentOrAtMostOnce);
    }
  }
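The doubling delay mentioned above can be illustrated on its own. This is only a sketch of capped exponential backoff, not the body of getFailoverOrRetrySleepTime: the class name, the parameter names, and the cap are assumptions made for the example, and any randomization a real policy might add is left out.

```java
// Hypothetical helper: delay = baseMillis * 2^attempt, never more than maxMillis.
final class BackoffSketch {
    static long sleepMillis(long baseMillis, long maxMillis, int attempt) {
        if (baseMillis <= 0 || maxMillis <= 0 || attempt <= 0) {
            // First attempt (or degenerate input): no doubling applied.
            return Math.max(0L, Math.min(baseMillis, maxMillis));
        }
        int shift = Math.min(attempt, 62);
        // If doubling would exceed the cap, return the cap; this check also
        // guarantees that the shift below cannot overflow.
        if (baseMillis > (maxMillis >> shift)) {
            return maxMillis;
        }
        return baseMillis << shift;
    }

    public static void main(String[] args) {
        for (int attempt = 0; attempt < 7; attempt++) {
            System.out.println(attempt + " -> " + sleepMillis(500, 15_000, attempt) + " ms");
        }
    }
}
```

With a base of 500 ms and a cap of 15 s, the delay doubles for the first few attempts and then stays at the cap, so a long outage does not turn into arbitrarily long sleeps.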
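   If HA is enabled, a ConfiguredRMFailoverProxyProvider (a proxy provider that supports failover and retry) is created. Its most important method is getProxy(), which obtains the real proxy and ultimately calls RMProxy.getProxy: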
   
@Private
static <T> T getProxy(final Configuration conf,
    final Class<T> protocol, final InetSocketAddress rmAddress)
    throws IOException {
  return UserGroupInformation.getCurrentUser().doAs(
    new PrivilegedAction<T>() {
      @Override
      public T run() {
        return (T) YarnRPC.create(conf).getProxy(protocol, rmAddress, conf);
      }
    });
}
   This is exactly where the YarnRPC API is called.
   RetryProxy.create is called next; the method that finally creates the dynamic proxy is:
   
/**
 * Create a proxy for an interface of implementations of that interface using
 * the given {@link FailoverProxyProvider} and the same retry policy for each
 * method in the interface.
 *
 * @param iface the interface that the retry will implement
 * @param proxyProvider provides implementation instances whose methods should be retried
 * @param retryPolicy the policy for retrying or failing over method call failures
 * @return the retry proxy
 */
public static <T> Object create(Class<T> iface,
    FailoverProxyProvider<T> proxyProvider, RetryPolicy retryPolicy) {
  // Dynamic proxy
  return Proxy.newProxyInstance(
      proxyProvider.getInterface().getClassLoader(),
      new Class<?>[] { iface },
      // proxyProvider is the ConfiguredRMFailoverProxyProvider
      new RetryInvocationHandler<T>(proxyProvider, retryPolicy)
      );
}
   
   2.3 Next, the constructor of RetryInvocationHandler:
protected RetryInvocationHandler(FailoverProxyProvider<T> proxyProvider,
    RetryPolicy defaultPolicy,
    Map<String, RetryPolicy> methodNameToPolicyMap) {
  this.proxyProvider = proxyProvider;
  this.defaultPolicy = defaultPolicy;
  this.methodNameToPolicyMap = methodNameToPolicyMap;
  // Returns a proxyInfo that contains the real proxy
  this.currentProxy = proxyProvider.getProxy();
}

   And the invoke method:
@Override
public Object invoke(Object proxy, Method method, Object[] args)
  throws Throwable {
  // Look up the cached per-method retry policy
  RetryPolicy policy = methodNameToPolicyMap.get(method.getName());
  if (policy == null) {
    policy = defaultPolicy;
  }

  // The number of times this method invocation has been failed over.
  int invocationFailoverCount = 0;
  // Is the proxy an instance of Proxy whose InvocationHandler is an
  // RpcInvocationHandler?
  final boolean isRpc = isRpcInvocation(currentProxy.proxy);
  final int callId = isRpc? Client.nextCallId(): RpcConstants.INVALID_CALL_ID;
  int retries = 0;
  // The loop covers all RPC attempts
  while (true) {
    // The number of times this invocation handler has ever been failed over,
    // before this method invocation attempt. Used to prevent concurrent
    // failed method invocations from triggering multiple failover attempts.
    long invocationAttemptFailoverCount;
    synchronized (proxyProvider) {
      invocationAttemptFailoverCount = proxyProviderFailoverCount;
    }
    if (isRpc) {
      // Checks that both arguments are valid and that no callId is set yet
      Client.setCallIdAndRetryCount(callId, retries);
    }
    try {
      // Invoke the method on the real proxy
      Object ret = invokeMethod(method, args);
      hasMadeASuccessfulCall = true;
      return ret;
    } catch (Exception e) {
      // On failure, decide whether to retry
      if (Thread.currentThread().isInterrupted()) {
        // If interrupted, do not retry.
        throw e;
      }
      // Check the method's annotations: is it idempotent or at-most-once?
      boolean isIdempotentOrAtMostOnce = proxyProvider.getInterface()
          .getMethod(method.getName(), method.getParameterTypes())
          .isAnnotationPresent(Idempotent.class);
      if (!isIdempotentOrAtMostOnce) {
        isIdempotentOrAtMostOnce = proxyProvider.getInterface()
            .getMethod(method.getName(), method.getParameterTypes())
            .isAnnotationPresent(AtMostOnce.class);
      }
      // Pass in the retry and failover counts; shouldRetry, analyzed above,
      // returns the RetryAction
      RetryAction action = policy.shouldRetry(e, retries++,
          invocationFailoverCount, isIdempotentOrAtMostOnce);
      if (action.action == RetryAction.RetryDecision.FAIL) {
        // Log the reason for giving up, then rethrow
        if (action.reason != null) {
          LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
              + "." + method.getName() + " over " + currentProxy.proxyInfo
              + ". Not retrying because " + action.reason, e);
        }
        throw e;
      } else { // retry or failover
        // avoid logging the failover if this is the first call on this
        // proxy object, and we successfully achieve the failover without
        // any flip-flopping
        // (that is, the first failover before any successful call is not logged)
        boolean worthLogging =
          !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);
        worthLogging |= LOG.isDebugEnabled();
        // Log differently depending on the decision and on whether debug
        // logging is enabled
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&
            worthLogging) {
          String msg = "Exception while invoking " + method.getName()
              + " of class " + currentProxy.proxy.getClass().getSimpleName()
              + " over " + currentProxy.proxyInfo;
          if (invocationFailoverCount > 0) {
            msg += " after " + invocationFailoverCount + " fail over attempts";
          }
          msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);
          LOG.info(msg, e);
        // RETRY, or a failover that is not worth logging: log only when
        // debug logging is enabled
        } else {
          if(LOG.isDebugEnabled()) {
            LOG.debug("Exception while invoking " + method.getName()
                + " of class " + currentProxy.proxy.getClass().getSimpleName()
                + " over " + currentProxy.proxyInfo + ". Retrying "
                + formatSleepMessage(action.delayMillis), e);
          }
        }
        // Sleep for the interval chosen by the retry policy
        if (action.delayMillis > 0) {
          Thread.sleep(action.delayMillis);
        }

        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
          // Make sure that concurrent failed method invocations only cause a
          // single actual fail over.
          synchronized (proxyProvider) {
            // Guard against another caller failing over at the same time
            if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
              // Switch the ResourceManager id to the next one in the HA RM list
              proxyProvider.performFailover(currentProxy.proxy);
              proxyProviderFailoverCount++;
            } else {
              LOG.warn("A failover has occurred since the start of this method"
                  + " invocation attempt.");
            }
            // Get the proxy for the address of the new RM id
            currentProxy = proxyProvider.getProxy();
          }
          invocationFailoverCount++;
        }
      }
    }
  }
}
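To see the overall shape of a retrying dynamic proxy without the Hadoop machinery, here is a much smaller sketch. Everything in it is hypothetical: Greeter stands in for a protocol interface, FailoverHandler for RetryInvocationHandler, and the only policy is "on ConnectException, move to the next target, up to maxFailovers times". Unlike the real handler, it has no per-method policies, no sleep, and no synchronization.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

// Hypothetical protocol interface used only for this example.
interface Greeter {
    String greet(String name) throws Exception;
}

// Hypothetical, single-threaded retry handler: fails over to the next target
// when the current one throws ConnectException.
final class FailoverHandler<T> implements InvocationHandler {
    private final List<T> targets;
    private final int maxFailovers;
    private int current = 0;

    FailoverHandler(List<T> targets, int maxFailovers) {
        this.targets = targets;
        this.maxFailovers = maxFailovers;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        int failovers = 0;
        while (true) {
            try {
                // Delegate to the real object currently selected.
                return method.invoke(targets.get(current), args);
            } catch (InvocationTargetException wrapped) {
                Throwable cause = wrapped.getCause();
                boolean network = cause instanceof ConnectException;
                if (!network || failovers >= maxFailovers) {
                    throw cause; // not retriable, or out of failover attempts
                }
                current = (current + 1) % targets.size(); // round-robin failover
                failovers++;
            }
        }
    }

    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, List<T> targets, int maxFailovers) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(),
            new Class<?>[] { iface },
            new FailoverHandler<T>(targets, maxFailovers));
    }

    public static void main(String[] args) throws Exception {
        Greeter down = name -> { throw new ConnectException("rm1 is down"); };
        Greeter up = name -> "hello " + name;
        Greeter g = create(Greeter.class, Arrays.asList(down, up), 2);
        System.out.println(g.greet("nm"));
    }
}
```

The caller only ever sees the Greeter interface; the decision about which backend to call, and whether to try again, lives entirely in the invocation handler. That is the same division of labor as between RetryProxy.create and RetryInvocationHandler above.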
   Now let us look at the proxyProvider.getProxy() method:
      
final InetSocketAddress rmAddress = rmProxy.getRMAddress(conf, protocol);
which ultimately calls
conf.getSocketAddr(
  YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
  YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
  YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_PORT);

   This first obtains the current RM ID and then reads the address configured for that ID from the configuration.

   Now the failover method (in ConfiguredRMFailoverProxyProvider):
@Override
public synchronized void performFailover(T currentProxy) {
  // Advance to the next id index
  currentProxyIndex = (currentProxyIndex + 1) % rmServiceIds.length;
  // Set the id of the current ResourceManager
  conf.set(YarnConfiguration.RM_HA_ID, rmServiceIds[currentProxyIndex]);
  LOG.info("Failing over to " + rmServiceIds[currentProxyIndex]);
}
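The interplay between performFailover and the address lookup can be shown with a plain map standing in for the configuration. This is a sketch under assumed names: the keys "ha.id" and "tracker.address.<id>" are invented for the example and are not the real YARN property names, and RoundRobinAddressProvider is a hypothetical class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical provider: failover only changes the active id in the config;
// the address is always resolved through that id.
final class RoundRobinAddressProvider {
    private final Map<String, String> conf;
    private final String[] serviceIds;
    private int currentIndex = 0;

    RoundRobinAddressProvider(Map<String, String> conf, String... serviceIds) {
        if (serviceIds.length == 0) {
            throw new IllegalArgumentException("at least one service id is required");
        }
        this.conf = conf;
        this.serviceIds = serviceIds.clone();
        conf.put("ha.id", serviceIds[0]); // assumed key name
    }

    synchronized void performFailover() {
        // Move to the next id, wrapping around at the end of the list.
        currentIndex = (currentIndex + 1) % serviceIds.length;
        conf.put("ha.id", serviceIds[currentIndex]);
    }

    synchronized String currentAddress() {
        // Resolve the address through the currently active id.
        return conf.get("tracker.address." + conf.get("ha.id")); // assumed key name
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("tracker.address.rm1", "host1:8031");
        conf.put("tracker.address.rm2", "host2:8031");
        RoundRobinAddressProvider p = new RoundRobinAddressProvider(conf, "rm1", "rm2");
        System.out.println(p.currentAddress());
        p.performFailover();
        System.out.println(p.currentAddress());
    }
}
```

Because the lookup goes through the active id every time, the provider never has to cache addresses: after a failover the next getProxy simply resolves a different address.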
Chapter 3: Server-side code
   
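   The server-side code lives in ResourceManager, which has a member variable of type ResourceTrackerService. This class is both the implementation of the protocol and the code that starts the server. resourceTrackerService is a child service of the ResourceManager composite service, so its init and start methods get called. init reads the settings from the configuration; start is shown below: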
   
@Override
protected void serviceStart() throws Exception {
  super.serviceStart();
  // ResourceTrackerServer authenticates NodeManager via Kerberos if
  // security is enabled, so no secretManager.
  Configuration conf = getConfig();
  // Use the YarnRPC class
  YarnRPC rpc = YarnRPC.create(conf);
  this.server =
    rpc.getServer(ResourceTracker.class, this, resourceTrackerAddress,
        conf, null,
        conf.getInt(YarnConfiguration.RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT,
            YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_CLIENT_THREAD_COUNT));

  // Enable service authorization?
  // If authorization is enabled, load or refresh the security policy configuration.
  if (conf.getBoolean(
      CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHORIZATION,
      false)) {
    InputStream inputStream =
        this.rmContext.getConfigurationProvider()
            .getConfigurationInputStream(conf,
                YarnConfiguration.HADOOP_POLICY_CONFIGURATION_FILE);
    if (inputStream != null) {
      conf.addResource(inputStream);
    }
    refreshServiceAcls(conf, RMPolicyProvider.getInstance());
  }

  this.server.start();
  conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
      YarnConfiguration.RM_RESOURCE_TRACKER_ADDRESS,
      YarnConfiguration.DEFAULT_RM_RESOURCE_TRACKER_ADDRESS,
                         server.getListenerAddress());
}
    

