Hadoop之TaskTraker分析

来源：互联网发布：为什么程序员都黑php 编辑：程序博客网时间：2024/05/19 16:23

TaskTracker的工作职责之前已经和大家提过，主要负责维护，申请和监控Task，通过heartbeat和JobTracker进行通信。

TaskTracker的init过程：

1.读取配置文件，解析参数

2.将TaskTraker上原有的用户local files删除并新建新的dir和file

3. Map<TaskAttemptID, TaskInProgress> tasks = new HashMap<TaskAttemptID, TaskInProgress>(); 清除map

4. this.runningTasks = new LinkedHashMap<TaskAttemptID, TaskInProgress>();记录task的链表
this.runningJobs = new TreeMap<JobID, RunningJob>();记录job的id信息

5.初始化JVMManager：

  mapJvmManager = new JvmManagerForType(tracker.getMaxCurrentMapTasks(),         true, tracker);    reduceJvmManager = new JvmManagerForType(tracker.getMaxCurrentReduceTasks(),        false, tracker);

6.初始化RPC，获取JobTracker client用于heartbeat通信；

7.new一个后台线程用于监听map完成的事件

      this.mapEventsFetcher = new MapEventsFetcherThread();    mapEventsFetcher.setDaemon(true);    mapEventsFetcher.setName(                             "Map-events fetcher for all reduce tasks " + "on " +                              taskTrackerName);    mapEventsFetcher.start();

后台线程的run方法如下：

 while (running) {        try {          List <FetchStatus> fList = null;          synchronized (runningJobs) {            while (((fList = reducesInShuffle()).size()) == 0) {              try {                runningJobs.wait();              } catch (InterruptedException e) {                LOG.info("Shutting down: " + this.getName());                return;              }            }          }          // now fetch all the map task events for all the reduce tasks          // possibly belonging to different jobs          boolean fetchAgain = false; //flag signifying whether we want to fetch                                      //immediately again.          for (FetchStatus f : fList) {            long currentTime = System.currentTimeMillis();            try {              //the method below will return true when we have not               //fetched all available events yet              if (f.fetchMapCompletionEvents(currentTime)) {                fetchAgain = true;              }            } catch (Exception e) {              LOG.warn(                       "Ignoring exception that fetch for map completion" +                       " events threw for " + f.jobId + " threw: " +                       StringUtils.stringifyException(e));             }            if (!running) {              break;            }          }          synchronized (waitingOn) {            try {              if (!fetchAgain) {                waitingOn.wait(heartbeatInterval);              }            } catch (InterruptedException ie) {              LOG.info("Shutting down: " + this.getName());              return;            }          }        } catch (Exception e) {          LOG.info("Ignoring exception "  + e.getMessage());        }      }    }

8.initializeMemoryManagement，初始化每个TrackTask的内存设置

9.new一个Map和Reducer的Launcher后台线程

   mapLauncher = new TaskLauncher(TaskType.MAP, maxMapSlots);    reduceLauncher = new TaskLauncher(TaskType.REDUCE, maxReduceSlots);    mapLauncher.start();    reduceLauncher.start();

用于后面创建子JVM来执行map、reduce task

看一下

TaskLauncher的run方法： //before preparing the job localize       //all the archives      TaskAttemptID taskid = t.getTaskID();      final LocalDirAllocator lDirAlloc = new LocalDirAllocator("mapred.local.dir");      //simply get the location of the workDir and pass it to the child. The      //child will do the actual dir creation      final File workDir =      new File(new Path(localdirs[rand.nextInt(localdirs.length)],           TaskTracker.getTaskWorkDir(t.getUser(), taskid.getJobID().toString(),           taskid.toString(),          t.isTaskCleanupTask())).toString());            String user = tip.getUGI().getUserName();            // Set up the child task's configuration. After this call, no localization      // of files should happen in the TaskTracker's process space. Any changes to      // the conf object after this will NOT be reflected to the child.      // setupChildTaskConfiguration(lDirAlloc);      if (!prepare()) {        return;      }            // Accumulates class paths for child.      List<String> classPaths = getClassPaths(conf, workDir,                                              taskDistributedCacheManager);      long logSize = TaskLog.getTaskLogLength(conf);            //  Build exec child JVM args.      Vector<String> vargs = getVMArgs(taskid, workDir, classPaths, logSize);            tracker.addToMemoryManager(t.getTaskID(), t.isMapTask(), conf);      // set memory limit using ulimit if feasible and necessary ...      String setup = getVMSetupCmd();      // Set up the redirection of the task's stdout and stderr streams      File[] logFiles = prepareLogFiles(taskid, t.isTaskCleanupTask());      File stdout = logFiles[0];      File stderr = logFiles[1];      tracker.getTaskTrackerInstrumentation().reportTaskLaunch(taskid, stdout,                 stderr);            Map<String, String> env = new HashMap<String, String>();      errorInfo = getVMEnvironment(errorInfo, user, workDir, conf, env, taskid,                                   logSize);            // flatten the env as a set of export commands      List <String> setupCmds = new ArrayList<String>();      for(Entry<String, String> entry : env.entrySet()) {        StringBuffer sb = new StringBuffer();        sb.append("export ");        sb.append(entry.getKey());        sb.append("=\"");        sb.append(entry.getValue());        sb.append("\"");        setupCmds.add(sb.toString());      }      setupCmds.add(setup);            launchJvmAndWait(setupCmds, vargs, stdout, stderr, logSize, workDir);      tracker.getTaskTrackerInstrumentation().reportTaskEnd(t.getTaskID());      if (exitCodeSet) {        if (!killed && exitCode != 0) {          if (exitCode == 65) {            tracker.getTaskTrackerInstrumentation().taskFailedPing(t.getTaskID());          }          throw new IOException("Task process exit with nonzero status of " +              exitCode + ".");        }      }    }

run方法为当前task new一个child JVM，为其设置文件路径，上下文环境，JVM启动参数和启动命令等信息，然后调用TaskControll方法启动新的JVM执行对应的Task工作。

各个类关系图如下所示：

最后以TaskController的launchTask截至

10.然后开始 startHealthMonitor(this.fConf);

再来看看TaskLauncher的run方法，就是不停的循环去获取TaskTracker中新的task，然后调用startNewTask方法

 if (this.taskStatus.getRunState() == TaskStatus.State.UNASSIGNED ||          this.taskStatus.getRunState() == TaskStatus.State.FAILED_UNCLEAN ||          this.taskStatus.getRunState() == TaskStatus.State.KILLED_UNCLEAN) {        localizeTask(task);        if (this.taskStatus.getRunState() == TaskStatus.State.UNASSIGNED) {          this.taskStatus.setRunState(TaskStatus.State.RUNNING);        }        setTaskRunner(task.createRunner(TaskTracker.this, this, rjob));        this.runner.start();        long now = System.currentTimeMillis();        this.taskStatus.setStartTime(now);        this.lastProgressReport = now;

TaskTracker的run方法：通过维护心跳和JobTracker通信，以获取、杀掉新的Task，重点看一下heartBeat通信过程：

 synchronized (this) {      askForNewTask =         ((status.countOccupiedMapSlots() < maxMapSlots ||           status.countOccupiedReduceSlots() < maxReduceSlots) &&          acceptNewTasks);       localMinSpaceStart = minSpaceStart;    }    if (askForNewTask) {      askForNewTask = enoughFreeSpace(localMinSpaceStart);      long freeDiskSpace = getFreeSpace();      long totVmem = getTotalVirtualMemoryOnTT();      long totPmem = getTotalPhysicalMemoryOnTT();      long availableVmem = getAvailableVirtualMemoryOnTT();      long availablePmem = getAvailablePhysicalMemoryOnTT();      long cumuCpuTime = getCumulativeCpuTimeOnTT();      long cpuFreq = getCpuFrequencyOnTT();      int numCpu = getNumProcessorsOnTT();      float cpuUsage = getCpuUsageOnTT();      status.getResourceStatus().setAvailableSpace(freeDiskSpace);      status.getResourceStatus().setTotalVirtualMemory(totVmem);      status.getResourceStatus().setTotalPhysicalMemory(totPmem);      status.getResourceStatus().setMapSlotMemorySizeOnTT(          mapSlotMemorySizeOnTT);      status.getResourceStatus().setReduceSlotMemorySizeOnTT(          reduceSlotSizeMemoryOnTT);      status.getResourceStatus().setAvailableVirtualMemory(availableVmem);       status.getResourceStatus().setAvailablePhysicalMemory(availablePmem);      status.getResourceStatus().setCumulativeCpuTime(cumuCpuTime);      status.getResourceStatus().setCpuFrequency(cpuFreq);      status.getResourceStatus().setNumProcessors(numCpu);      status.getResourceStatus().setCpuUsage(cpuUsage);    }    //add node health information        TaskTrackerHealthStatus healthStatus = status.getHealthStatus();    synchronized (this) {      if (healthChecker != null) {        healthChecker.setHealthStatus(healthStatus);      } else {        healthStatus.setNodeHealthy(true);        healthStatus.setLastReported(0L);        healthStatus.setHealthReport("");      }    }    //    // Xmit the heartbeat    //    HeartbeatResponse heartbeatResponse = jobClient.heartbeat(status,                                                               justStarted,                                                              justInited,                                                              askForNewTask,                                                               heartbeatResponseId);

该方法主要将TaskTracker上的各种性能参数信息反馈给JobTraker，调用其heartbeat方法然后解析返回的结果，下篇详细分析heartBeat机制