Hadoop2.6.0运行mapreduce之推断(speculative)执行(下)

来源:互联网 发布:暗黑黎明挂机软件 编辑:程序博客网 时间:2024/04/30 23:23

前言

在《Hadoop2.6.0运行mapreduce之推断(speculative)执行(上)》一文中对推断执行技术的背景进行了介绍,并且在Hadoop集群上做了一些测试以验证mapreduce框架的推断执行。最后还从源码分析的角度解读了DefaultSpeculator的初始化和启动过程,其中涉及DefaultSpeculator的实例化、LegacyTaskRuntimeEstimator的实例化及初始化、处理SpeculatorEvent事件的SpeculatorEventDispatcher、DefaultSpeculator收到类型为JOB_CREATE的SpeculatorEvent事件时的处理等内容。经过以上过程,实际只是完成了DefaultSpeculator的初始化,那么DefaultSpeculator是什么时候真正开始进行任务推断的呢?

任务实例启动与推断

当ResourceManager已经为作业分配并启动一个Container用于执行MRAppMaster后,MRAppMaster会获取当前job的所有task,并为每一个task创建一个TaskAttemptImpl对象,此对象代表task的一次运行尝试。当此次尝试已经分配了Container并且启动其task的时候,TaskAttemptImpl将收到TaskAttemptEventType.TA_CONTAINER_LAUNCHED类型的事件,进而对TaskAttemptImpl对象的状态进行迁移(有关状态机的实现,请参阅《Hadoop2.6.0中YARN底层状态机实现分析》一文),TaskAttemptImpl处理TaskAttemptEventType.TA_CONTAINER_LAUNCHED事件的相关迁移实现如下:

     // Transitions from the ASSIGNED state.     .addTransition(TaskAttemptStateInternal.ASSIGNED, TaskAttemptStateInternal.RUNNING,         TaskAttemptEventType.TA_CONTAINER_LAUNCHED,         new LaunchedContainerTransition())

根据《Hadoop2.6.0中YARN底层状态机实现分析》一文的内容,我们知道会最终调用LaunchedContainerTransition的transition方法,完成状态迁移,其实现逻辑中会

调用SpeculatorEvent的构造器,代码如下:

      taskAttempt.eventHandler.handle          (new SpeculatorEvent              (taskAttempt.attemptId, true, taskAttempt.clock.getTime()));

此处调用的SpeculatorEvent的构造器的实现如下:

  public SpeculatorEvent(TaskAttemptId attemptID, boolean flag, long timestamp) {    super(Speculator.EventType.ATTEMPT_START, timestamp);    this.reportedStatus = new TaskAttemptStatus();    this.reportedStatus.id = attemptID;    this.taskID = attemptID.getTaskId();  }

可以看到这里构造的SpeculatorEvent事件的类型是Speculator.EventType.ATTEMPT_START。根据《Hadoop2.6.0运行mapreduce之推断(speculative)执行(上)》一文中对于SpeculatorEventDispatcher的handle方法的介绍,SpeculatorEvent事件最终交由DefaultSpeculator的processSpeculatorEvent方法处理。Speculator.EventType.ATTEMPT_START类型的事件匹配的代码如下:

      case ATTEMPT_START:      {        LOG.info("ATTEMPT_START " + event.getTaskID());        estimator.enrollAttempt            (event.getReportedStatus(), event.getTimestamp());        break;      }
以默认的LegacyTaskRuntimeEstimator为例,这里实际调用了LegacyTaskRuntimeEstimator的父类StartEndTimesBase的enrollAttempt方法,代码如下:

  @Override  public void enrollAttempt(TaskAttemptStatus status, long timestamp) {    startTimes.put(status.id,timestamp);  }
其中startTimes用于缓存TaskAttemptId与TaskAttemptImpl实例启动的时间之间的映射,startTimes的类型定义如下:

  protected final Map<TaskAttemptId, Long> startTimes      = new ConcurrentHashMap<TaskAttemptId, Long>();

调用StartEndTimesBase的enrollAttempt方法的根本意义在于开启Estimator对此任务实例的监控。

任务实例更新与推断

每当任务实例在运行过程中向MRAppMaster汇报信息时,TaskAttemptImpl对象将会收到TaskAttemptEventType.TA_UPDATE类型的事件,此时TaskAttemptImpl的状态机相关的代码如下:

     // Transitions from RUNNING state.     .addTransition(TaskAttemptStateInternal.RUNNING, TaskAttemptStateInternal.RUNNING,         TaskAttemptEventType.TA_UPDATE, new StatusUpdater())     // 省略其它与TaskAttemptEventType.TA_UPDATE无关的状态迁移代码     // Transitions from COMMIT_PENDING state     .addTransition(TaskAttemptStateInternal.COMMIT_PENDING,         TaskAttemptStateInternal.COMMIT_PENDING, TaskAttemptEventType.TA_UPDATE,         new StatusUpdater())
进而实际执行StatusUpdater的transition方法,其transition方法中与任务推断执行有关的代码实现如下:

      // send event to speculator about the reported status      taskAttempt.eventHandler.handle          (new SpeculatorEvent              (taskAttempt.reportedStatus, taskAttempt.clock.getTime()));
这里的SpeculatorEvent的构造器如下:
  public SpeculatorEvent(TaskAttemptStatus reportedStatus, long timestamp) {    super(Speculator.EventType.ATTEMPT_STATUS_UPDATE, timestamp);    this.reportedStatus = reportedStatus;  }
可以看到这里构造的SpeculatorEvent事件的类型是Speculator.EventType.ATTEMPT_STATUS_UPDATE。根据《Hadoop2.6.0运行mapreduce之推断(speculative)执行(一)》一文中对于SpeculatorEventDispatcher的handle方法的介绍,SpeculatorEvent事件最终交由DefaultSpeculator的processSpeculatorEvent方法处理。Speculator.EventType.ATTEMPT_STATUS_UPDATE类型的事件匹配的代码如下:

      case ATTEMPT_STATUS_UPDATE:        statusUpdate(event.getReportedStatus(), event.getTimestamp());        break;

DefaultSpeculator的statusUpdate方法(见代码清单8)主要用于更新正在运行的任务(runningTasks缓存)、正在运行任务实例的历史统计信息(runningTaskAttemptStatistics缓存)并调用estimator的updateAttempt方法更新任务实例的状态信息。

代码清单8 更新任务实例的状态信息

  protected void statusUpdate(TaskAttemptStatus reportedStatus, long timestamp) {    String stateString = reportedStatus.taskState.toString();    TaskAttemptId attemptID = reportedStatus.id;    TaskId taskID = attemptID.getTaskId();    Job job = context.getJob(taskID.getJobId());    if (job == null) {      return;    }    Task task = job.getTask(taskID);    if (task == null) {      return;    }    estimator.updateAttempt(reportedStatus, timestamp);    if (stateString.equals(TaskAttemptState.RUNNING.name())) {      runningTasks.putIfAbsent(taskID, Boolean.TRUE);    } else {      runningTasks.remove(taskID, Boolean.TRUE);      if (!stateString.equals(TaskAttemptState.STARTING.name())) {        runningTaskAttemptStatistics.remove(attemptID);      }    }  }

以默认的estimator的实现LegacyTaskRuntimeEstimator为例,其updateAttempt方法的实现见代码清单9。

代码清单9 更新任务实例的状态信息

  @Override  public void updateAttempt(TaskAttemptStatus status, long timestamp) {    super.updateAttempt(status, timestamp);        TaskAttemptId attemptID = status.id;    TaskId taskID = attemptID.getTaskId();    JobId jobID = taskID.getJobId();    Job job = context.getJob(jobID);    if (job == null) {      return;    }    Task task = job.getTask(taskID);    if (task == null) {      return;    }    TaskAttempt taskAttempt = task.getAttempt(attemptID);    if (taskAttempt == null) {      return;    }    Long boxedStart = startTimes.get(attemptID);    long start = boxedStart == null ? Long.MIN_VALUE : boxedStart;    // We need to do two things.    //  1: If this is a completion, we accumulate statistics in the superclass    //  2: If this is not a completion, we learn more about it.    // This is not a completion, but we're cooking.    //    if (taskAttempt.getState() == TaskAttemptState.RUNNING) {      // See if this task is already in the registry      AtomicLong estimateContainer = attemptRuntimeEstimates.get(taskAttempt);      AtomicLong estimateVarianceContainer          = attemptRuntimeEstimateVariances.get(taskAttempt);      if (estimateContainer == null) {        if (attemptRuntimeEstimates.get(taskAttempt) == null) {          attemptRuntimeEstimates.put(taskAttempt, new AtomicLong());          estimateContainer = attemptRuntimeEstimates.get(taskAttempt);        }      }      if (estimateVarianceContainer == null) {        attemptRuntimeEstimateVariances.putIfAbsent(taskAttempt, new AtomicLong());        estimateVarianceContainer = attemptRuntimeEstimateVariances.get(taskAttempt);      }      long estimate = -1;      long varianceEstimate = -1;      // This code assumes that we'll never consider starting a third      //  speculative task attempt if two are already running for this task      if (start > 0 && timestamp > start) {        estimate = (long) ((timestamp - start) / Math.max(0.0001, status.progress));        varianceEstimate = (long) (estimate * status.progress / 10);      }      if (estimateContainer != null) {        estimateContainer.set(estimate);      }      if (estimateVarianceContainer != null) {        estimateVarianceContainer.set(varianceEstimate);      }    }  }
具体分析代码清单9前,先理解以下定义:
  • timestamp:本次状态更新的时间戳
  • start:TaskAttemptImpl实例启动即分配Container尝试运行Task的开始时间
  • status.progress:TaskAttemptImpl实例运行完成的进度值,是浮点数

因此从上面代码,我们不难看出任务实例运行需要的总时间的估值(estimate)和方差估值(varianceEstimate)的计算公式。

estimate = (timestamp - start)/status.progress

varianceEstimate = (timestamp - start)/10

estimateContainer和estimateVarianceContainer都是原子类型,分别用于保存估值(estimate)和方差估值(varianceEstimate)。


任务实例Container与推断

Container状态发生变化的场景有以下三种:

  • 当MRAppMaster调度任务实例,并为之将要请求Container时;
  • 为任务实例分配Container时;
  • 任务实例的Container分配完成时。
TaskAttemptImpl的状态机中涉及将要为任务实例请求Container的代码如下:

     .addTransition(TaskAttemptStateInternal.NEW, TaskAttemptStateInternal.UNASSIGNED,         TaskAttemptEventType.TA_SCHEDULE, new RequestContainerTransition(false))     .addTransition(TaskAttemptStateInternal.NEW, TaskAttemptStateInternal.UNASSIGNED,         TaskAttemptEventType.TA_RESCHEDULE, new RequestContainerTransition(true))
RequestContainerTransition的transition方法中涉及推断的代码如下:

      // Tell any speculator that we're requesting a container      taskAttempt.eventHandler.handle          (new SpeculatorEvent(taskAttempt.getID().getTaskId(), +1));
TaskAttemptImpl的状态机中涉及为任务实例分配Container的代码如下:
     .addTransition(TaskAttemptStateInternal.UNASSIGNED, TaskAttemptStateInternal.KILLED,         TaskAttemptEventType.TA_KILL, new DeallocateContainerTransition(             TaskAttemptStateInternal.KILLED, true))     .addTransition(TaskAttemptStateInternal.UNASSIGNED, TaskAttemptStateInternal.FAILED,         TaskAttemptEventType.TA_FAILMSG, new DeallocateContainerTransition(             TaskAttemptStateInternal.FAILED, true))     // 省略其它状态迁移代码     // Transitions from the ASSIGNED state.     // <span style="font-family: Arial, Helvetica, sans-serif;">省略其它状态迁移代码</span>     .addTransition(TaskAttemptStateInternal.ASSIGNED, TaskAttemptStateInternal.FAILED,         TaskAttemptEventType.TA_CONTAINER_LAUNCH_FAILED,         new DeallocateContainerTransition(TaskAttemptStateInternal.FAILED, false))
DeallocateContainerTransition的transition方法中涉及推断的代码如下:
      // send event to speculator that we withdraw our container needs, if      //  we're transitioning out of UNASSIGNED      if (withdrawsContainerRequest) {        taskAttempt.eventHandler.handle            (new SpeculatorEvent(taskAttempt.getID().getTaskId(), -1));      }
TaskAttemptImpl的状态机中涉及的任务实例的Container分配完成的代码如下:
     // Transitions from the UNASSIGNED state.     .addTransition(TaskAttemptStateInternal.UNASSIGNED,         TaskAttemptStateInternal.ASSIGNED, TaskAttemptEventType.TA_ASSIGNED,         new ContainerAssignedTransition())

ContainerAssignedTransition的transition方法中涉及推断的代码如下:

      // send event to speculator that our container needs are satisfied      taskAttempt.eventHandler.handle          (new SpeculatorEvent(taskAttempt.getID().getTaskId(), -1));
以上三种状态迁移中都使用了SpeculatorEvent的同一个构造器,代码如下:

  public SpeculatorEvent(TaskId taskID, int containersNeededChange) {    super(Speculator.EventType.TASK_CONTAINER_NEED_UPDATE);    this.taskID = taskID;    this.containersNeededChange = containersNeededChange;  }
可以看到这里构造的SpeculatorEvent事件的类型是Speculator.EventType.TASK_CONTAINER_NEED_UPDATE。根据《Hadoop2.6.0运行mapreduce之推断(speculative)执行(上)》一文中对于SpeculatorEventDispatcher的handle方法的介绍,SpeculatorEvent事件最终交由DefaultSpeculator的processSpeculatorEvent方法处理。Speculator.EventType.TASK_CONTAINER_NEED_UPDATE类型的事件匹配的代码如下:
      case TASK_CONTAINER_NEED_UPDATE:      {        AtomicInteger need = containerNeed(event.getTaskID());        need.addAndGet(event.containersNeededChange());        break;      }
containerNeed方法(见代码清单10)用于获取当前作业的所有map或者reduce任务需要的Container数量。并将当前任务实例需要的资源数量(+1表示需要,-1表示释放)更新到当前作业的所有map或者reduce任务需要的Container数量中。

代码清单10 获取当前作业的所有map或者reduce任务需要的Container数量
  private AtomicInteger containerNeed(TaskId taskID) {    JobId jobID = taskID.getJobId();    TaskType taskType = taskID.getTaskType();    ConcurrentMap<JobId, AtomicInteger> relevantMap        = taskType == TaskType.MAP ? mapContainerNeeds : reduceContainerNeeds;    AtomicInteger result = relevantMap.get(jobID);    if (result == null) {      relevantMap.putIfAbsent(jobID, new AtomicInteger(0));      result = relevantMap.get(jobID);    }    return result;  }

总结

从以上分析可以看出map任务的推断执行主要为:启动任务实例时开启对任务实例的监控;根据任务实例在运行过程中向MRAppMaster汇报信息计算运行总时长的估值和方差估值;当任务实例由于推断执行需要分配新的Container时对任务需要的Container数量进行更新。


后记:个人总结整理的《深入理解Spark:核心思想与源码分析》一书现在已经正式出版上市,目前京东、当当、天猫等网站均有销售,欢迎感兴趣的同学购买。


京东:(现有满150送50活动)http://item.jd.com/11846120.html 

当当:http://product.dangdang.com/23838168.html 



3 0
原创粉丝点击