Writing Distributed Applications on Hadoop YARN

Source: Internet. Editor: 程序博客网. Time: 2024/05/17 20:09

Reposted from an expert's translation of the Hadoop 0.23 document "Hadoop MapReduce Next Generation - Writing YARN Applications". Many thanks to the translator!

Purpose

This document describes, at a high level, how to write a YARN application.

Concepts and Flow

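The first concept is the "Application Submission Client", which is responsible for submitting an "Application" to the YARN ResourceManager. The client communicates with the ResourceManager over the ClientRMProtocol. If needed, the client first obtains a new ApplicationId via ClientRMProtocol::getNewApplication and then submits the application to be run via ClientRMProtocol::submitApplication. As part of the ClientRMProtocol::submitApplication call, the client needs to give the ResourceManager enough information to launch the application's first container, i.e. the ApplicationMaster. You need to provide information such as the local files/jars your application needs at run time, the actual command to be executed (with the necessary command-line arguments), any Unix environment settings (optional), and so on. Effectively, you need to describe the Unix process to be launched for your ApplicationMaster.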

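The YARN ResourceManager then launches the ApplicationMaster on an allocated container. The ApplicationMaster then communicates with the ResourceManager over the AMRMProtocol. First, the ApplicationMaster needs to register itself with the ResourceManager. To complete the task assigned to it, the ApplicationMaster requests containers via AMRMProtocol::allocate. Once a container is allocated to it, the ApplicationMaster contacts the NodeManager via ContainerManager::startContainer to launch the container for its task. As part of launching this container, the ApplicationMaster has to specify a ContainerLaunchContext which, like the ApplicationSubmissionContext, carries the launch information such as the command line and the environment. Once the task is complete, the ApplicationMaster notifies the ResourceManager via AMRMProtocol::finishApplicationMaster.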

Meanwhile, the client can monitor the application's status by querying the ResourceManager or, if the ApplicationMaster supports it, by querying the ApplicationMaster directly. If needed, the client can kill the application via ClientRMProtocol::forceKillApplication.

Interfaces

The interfaces you will most likely be concerned with are:

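ClientRMProtocol - Client <--> ResourceManager
This is the protocol a client uses to communicate with the ResourceManager to launch a new application (i.e. an ApplicationMaster), check on the status of the application, or kill the application. For example, a job client would use this protocol.
AMRMProtocol - ApplicationMaster <--> ResourceManager
The ApplicationMaster uses this protocol to register and unregister itself with the ResourceManager, and to request resources from the Scheduler to complete its tasks.
ContainerManager - ApplicationMaster <--> NodeManager
The ApplicationMaster uses this protocol to talk to the NodeManager to start or stop containers and to get status updates on containers.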

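Writing a Simple YARN Application

Writing a Simple Client

The first step is for the client to connect to the ResourceManager or, more specifically, to the ApplicationsManager (ASM) interface of the ResourceManager.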

    ClientRMProtocol applicationsManager;
    YarnConfiguration yarnConf = new YarnConfiguration(conf);
    InetSocketAddress rmAddress =
        NetUtils.createSocketAddr(yarnConf.get(
            YarnConfiguration.RM_ADDRESS,
            YarnConfiguration.DEFAULT_RM_ADDRESS));
    LOG.info("Connecting to ResourceManager at " + rmAddress);
    Configuration appsManagerServerConf = new Configuration(conf);
    appsManagerServerConf.setClass(
        YarnConfiguration.YARN_SECURITY_INFO,
        ClientRMSecurityInfo.class, SecurityInfo.class);
    applicationsManager = ((ClientRMProtocol) rpc.getProxy(
        ClientRMProtocol.class, rmAddress, appsManagerServerConf));

Once a handle to the ApplicationsManager is obtained, the client needs to request an ApplicationId from the ResourceManager.

    GetNewApplicationRequest request =
        Records.newRecord(GetNewApplicationRequest.class);
    GetNewApplicationResponse response =
        applicationsManager.getNewApplication(request);
    LOG.info("Got new ApplicationId=" + response.getApplicationId());

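The response from the ApplicationsManager also contains information about the cluster, such as its minimum/maximum resource capabilities. This information is needed so that you can correctly set the specifications of the container in which the ApplicationMaster will be launched. Refer to GetNewApplicationResponse for more details.
The main task of a client is to set up the ApplicationSubmissionContext, which defines everything the ResourceManager needs to launch the ApplicationMaster. A client needs to set the following in the context: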

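Application info: id and name.
Queue and priority info: the queue to which the application will be submitted, and the priority to be assigned to the application.
User: the user submitting the application.
ContainerLaunchContext: the information defining the container in which the ApplicationMaster will be launched and run. As mentioned previously, the ContainerLaunchContext defines everything needed to run the ApplicationMaster, including local resources (binaries, jars, files, etc.), security tokens, environment settings (CLASSPATH, etc.), and the command to be executed.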

    // Create a new ApplicationSubmissionContext
    ApplicationSubmissionContext appContext =
        Records.newRecord(ApplicationSubmissionContext.class);
    // Set the ApplicationId
    appContext.setApplicationId(appId);
    // Set the application name
    appContext.setApplicationName(appName);

    // Create a new container launch context for the AM's container
    ContainerLaunchContext amContainer =
        Records.newRecord(ContainerLaunchContext.class);

    // Define the local resources required
    Map<String, LocalResource> localResources =
        new HashMap<String, LocalResource>();
    // Let's assume the jar we need for our ApplicationMaster is available in
    // HDFS at a certain known path to us and we want to make it available to
    // the ApplicationMaster in the launched container
    Path jarPath; // <- known path to jar file
    FileStatus jarStatus = fs.getFileStatus(jarPath);
    LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
    // Set the type of resource - file or archive
    // archives are untarred at the destination by the framework
    amJarRsrc.setType(LocalResourceType.FILE);
    // Set visibility of the resource
    // Setting to most private option i.e. this file will only
    // be visible to this instance of the running application
    amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
    // Set the location of resource to be copied over into the
    // working directory
    amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
    // Set timestamp and length of file so that the framework
    // can do basic sanity checks for the local resource
    // after it has been copied over to ensure it is the same
    // resource the client intended to use with the application
    amJarRsrc.setTimestamp(jarStatus.getModificationTime());
    amJarRsrc.setSize(jarStatus.getLen());
    // The framework will create a symlink called AppMaster.jar in the
    // working directory that will be linked back to the actual file.
    // The ApplicationMaster, if it needs to reference the jar file, would
    // need to use the symlink filename.
    localResources.put("AppMaster.jar", amJarRsrc);
    // Set the local resources into the launch context
    amContainer.setLocalResources(localResources);

    // Set up the environment needed for the launch context
    Map<String, String> env = new HashMap<String, String>();
    // For example, we could setup the classpath needed.
    // Assuming our classes or jars are available as local resources in the
    // working directory from which the command will be run, we need to append
    // "." to the path.
    // By default, all the hadoop specific classpaths will already be available
    // in $CLASSPATH, so we should be careful not to overwrite it.
    String classPathEnv = "$CLASSPATH:./*:";
    env.put("CLASSPATH", classPathEnv);
    amContainer.setEnvironment(env);

    // Construct the command to be executed on the launched container
    String command =
        "${JAVA_HOME}" + "/bin/java" +
        " MyAppMaster" +
        " arg1 arg2 arg3" +
        " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
        " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
    List<String> commands = new ArrayList<String>();
    commands.add(command);
    // add additional commands if needed

    // Set the command array into the container spec
    amContainer.setCommands(commands);

    // Define the resource requirements for the container
    // For now, YARN only supports memory so we set the memory
    // requirements.
    // If the process takes more than its allocated memory, it will
    // be killed by the framework.
    // Memory being requested for should be less than max capability
    // of the cluster and all asks should be a multiple of the min capability.
    Resource capability = Records.newRecord(Resource.class);
    capability.setMemory(amMemory);
    amContainer.setResource(capability);

    // Set the container launch context into the ApplicationSubmissionContext
    appContext.setAMContainerSpec(amContainer);
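The launch command above is plain string concatenation, which makes a missing quote or space easy to overlook. As a rough sketch only, the same string could be assembled by a small helper. LaunchCommandBuilder is a hypothetical name, not part of YARN, and the LOG_DIR constant is assumed to mirror the value of ApplicationConstants.LOG_DIR_EXPANSION_VAR.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the YARN API.
final class LaunchCommandBuilder {

    // Assumption: mirrors the value of ApplicationConstants.LOG_DIR_EXPANSION_VAR.
    static final String LOG_DIR = "<LOG_DIR>";

    // Joins the command parts with single spaces and redirects stdout and
    // stderr into files under the container's log directory placeholder.
    static String build(List<String> parts) {
        if (parts.isEmpty()) {
            throw new IllegalArgumentException("command must not be empty");
        }
        return String.join(" ", parts)
            + " 1>" + LOG_DIR + "/stdout"
            + " 2>" + LOG_DIR + "/stderr";
    }

    public static void main(String[] args) {
        System.out.println(build(Arrays.asList(
            "${JAVA_HOME}/bin/java", "MyAppMaster", "arg1", "arg2", "arg3")));
    }
}
```

In a real client the returned string would be added to the list passed to setCommands.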

Once the setup is complete, the client is ready to submit the application to the ApplicationsManager.

    // Create the request to send to the ApplicationsManager
    SubmitApplicationRequest appRequest =
        Records.newRecord(SubmitApplicationRequest.class);
    appRequest.setApplicationSubmissionContext(appContext);

    // Submit the application to the ApplicationsManager
    // Ignore the response as either a valid response object is returned on
    // success or an exception thrown to denote the failure
    applicationsManager.submitApplication(appRequest);

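At this point, the ResourceManager will have accepted the application and, in the background, will allocate a container with the required specifications and then launch the ApplicationMaster on that container.
There are multiple ways a client can track the progress of the actual task. The client can ask the ResourceManager for a report of the application via ClientRMProtocol::getApplicationReport.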

    GetApplicationReportRequest reportRequest =
        Records.newRecord(GetApplicationReportRequest.class);
    reportRequest.setApplicationId(appId);
    GetApplicationReportResponse reportResponse =
        applicationsManager.getApplicationReport(reportRequest);
    ApplicationReport report = reportResponse.getApplicationReport();

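The ApplicationReport received from the ResourceManager consists of the following:

General application information: the ApplicationId, the queue to which the application was submitted, the user who submitted the application, and the start time of the application.
ApplicationMaster details: the host on which the ApplicationMaster is running, the rpc port on which it listens for client requests, and a token that the client needs to communicate with the ApplicationMaster.
Application tracking information: if the application supports some form of progress tracking, it can set a tracking url, which is available via ApplicationReport::getTrackingUrl and which a client can use to monitor progress.
ApplicationStatus: the state of the application as seen by the ResourceManager is available via ApplicationReport::getYarnApplicationState. If the YarnApplicationState is FINISHED, the client should refer to ApplicationReport::getFinalApplicationStatus to check the actual success/failure of the application. In case of failure, ApplicationReport::getDiagnostics may shed more light on the cause.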
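A client typically checks the application state repeatedly until it reaches a terminal value. The following is a minimal sketch of such a polling loop in plain Java. ApplicationStatePoller and its stateSource parameter are hypothetical. In a real client the supplier would presumably wrap a getApplicationReport call and return the name of the YarnApplicationState.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper, not part of the YARN API.
final class ApplicationStatePoller {

    // Names of the terminal YarnApplicationState values.
    private static final Set<String> TERMINAL_STATES =
        new HashSet<>(Arrays.asList("FINISHED", "FAILED", "KILLED"));

    // Polls stateSource up to maxPolls times, sleeping between polls.
    // Returns the first terminal state seen, otherwise the last state
    // observed (null if maxPolls is not positive).
    static String waitForTerminalState(Supplier<String> stateSource,
                                       int maxPolls,
                                       long sleepMillis)
            throws InterruptedException {
        String state = null;
        for (int i = 0; i < maxPolls; i++) {
            state = stateSource.get();
            if (TERMINAL_STATES.contains(state)) {
                return state;
            }
            Thread.sleep(sleepMillis);
        }
        return state;
    }
}
```

A FINISHED result only says that the application ended. The final success or failure still has to be read from getFinalApplicationStatus.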

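If the ApplicationMaster supports it, a client can query the ApplicationMaster directly for progress updates via the host:rpcport information obtained from the ApplicationReport. It can also use the tracking url obtained from the report, if available.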

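In certain situations, if the application is taking too long or due to other factors, the client may wish to kill the application. ClientRMProtocol supports the forceKillApplication call, which lets a client send a kill signal to the application via the ResourceManager. An ApplicationMaster, if so designed, may also support an abort call over its rpc layer that a client can use.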

    KillApplicationRequest killRequest =
        Records.newRecord(KillApplicationRequest.class);
    killRequest.setApplicationId(appId);
    applicationsManager.forceKillApplication(killRequest);

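Writing an ApplicationMaster

The ApplicationMaster is the actual owner of the job. It is launched by the ResourceManager on behalf of the client, is given all the information and resources the job needs, and is responsible for overseeing the job's tasks and completing them.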

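In a multi-tenant environment, the ApplicationMaster may share a physical host with other containers, so it cannot assume that any pre-configured port is available for it to listen on.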
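One common way to avoid depending on a fixed port is to bind to port 0, let the operating system pick a free port, and then report the chosen port when registering with the ResourceManager. The following is a minimal sketch using only the JDK. EphemeralPortExample is a hypothetical name and not part of YARN.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper, not part of the YARN API.
final class EphemeralPortExample {

    // Binds to port 0 so that the OS assigns any free port, and returns
    // the open server socket. The caller is responsible for closing it.
    static ServerSocket openOnFreePort() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = openOnFreePort()) {
            // The actual port is only known after binding.
            int rpcPort = server.getLocalPort();
            System.out.println("Listening on port " + rpcPort);
        }
    }
}
```

The value returned by getLocalPort() is what an ApplicationMaster could pass along as its rpc port during registration.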

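When the ApplicationMaster starts up, several parameters are made available to it via the environment. These include the ContainerId of the ApplicationMaster's container, the application submission time, and details about the NodeManager host running the ApplicationMaster. Refer to ApplicationConstants for the parameter names.

All interactions with the ResourceManager require an ApplicationAttemptId (there can be multiple attempts per application in case of failures). The ApplicationAttemptId can be obtained from the ApplicationMaster's ContainerId. There are helper APIs to convert the string obtained from the environment into objects.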

    Map<String, String> envs = System.getenv();
    String containerIdString =
        envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
    if (containerIdString == null) {
      // container id should always be set in the env by the framework
      throw new IllegalArgumentException(
          "ContainerId not set in the environment");
    }
    ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
    ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
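To make the relationship between a ContainerId and its attempt more concrete, here is a rough stand-alone sketch of what such a conversion involves. It assumes the string has the form container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>. ContainerIdParts is a hypothetical class for illustration only. Real code should use ConverterUtils.toContainerId as shown above.

```java
// Hypothetical stand-in for illustration; real code should use ConverterUtils.
final class ContainerIdParts {
    final long clusterTimestamp;
    final int applicationId;
    final int attemptId;
    final int containerId;

    private ContainerIdParts(long clusterTimestamp, int applicationId,
                             int attemptId, int containerId) {
        this.clusterTimestamp = clusterTimestamp;
        this.applicationId = applicationId;
        this.attemptId = attemptId;
        this.containerId = containerId;
    }

    // Assumption: the input looks like
    // container_<clusterTimestamp>_<appId>_<attemptId>_<containerId>.
    // Throws IllegalArgumentException for any other shape.
    static ContainerIdParts parse(String s) {
        String[] p = s.split("_");
        if (p.length != 5 || !p[0].equals("container")) {
            throw new IllegalArgumentException("Invalid ContainerId: " + s);
        }
        return new ContainerIdParts(
            Long.parseLong(p[1]),
            Integer.parseInt(p[2]),
            Integer.parseInt(p[3]),
            Integer.parseInt(p[4]));
    }
}
```

The cluster timestamp, application id, and attempt number together identify the application attempt. The last field distinguishes containers within that attempt.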

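After the ApplicationMaster has initialized itself completely, it needs to register with the ResourceManager via AMRMProtocol::registerApplicationMaster. The ApplicationMaster always communicates via the Scheduler interface of the ResourceManager.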

    // Connect to the Scheduler of the ResourceManager.
    YarnConfiguration yarnConf = new YarnConfiguration(conf);
    InetSocketAddress rmAddress =
        NetUtils.createSocketAddr(yarnConf.get(
            YarnConfiguration.RM_SCHEDULER_ADDRESS,
            YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));
    LOG.info("Connecting to ResourceManager at " + rmAddress);
    AMRMProtocol resourceManager =
        (AMRMProtocol) rpc.getProxy(AMRMProtocol.class, rmAddress, conf);

    // Register the AM with the RM
    // Set the required info into the registration request:
    // ApplicationAttemptId,
    // host on which the app master is running
    // rpc port on which the app master accepts requests from the client
    // tracking url for the client to track app master progress
    RegisterApplicationMasterRequest appMasterRequest =
        Records.newRecord(RegisterApplicationMasterRequest.class);
    appMasterRequest.setApplicationAttemptId(appAttemptID);
    appMasterRequest.setHost(appMasterHostname);
    appMasterRequest.setRpcPort(appMasterRpcPort);
    appMasterRequest.setTrackingUrl(appMasterTrackingUrl);

    // The registration response is useful as it provides information about the
    // cluster.
    // Similar to the GetNewApplicationResponse in the client, it provides
    // information about the min/max resource capabilities of the cluster that
    // would be needed by the ApplicationMaster when requesting for containers.
    RegisterApplicationMasterResponse response =
        resourceManager.registerApplicationMaster(appMasterRequest);

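The ApplicationMaster has to emit heartbeats to the ResourceManager to inform it that the ApplicationMaster is alive and still running. The expiry interval on the ResourceManager side is defined by the configuration setting accessible via YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS, with the default given by YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS. AMRMProtocol::allocate calls to the ResourceManager count as heartbeats, and they also carry progress update information.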
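Since allocate calls double as heartbeats, one possible approach is to schedule them at a fixed rate that is comfortably shorter than the expiry interval. The sketch below uses only the JDK. HeartbeatScheduler is a hypothetical name, the allocateCall parameter stands for whatever code issues the allocate request, and heartbeating at one third of the expiry interval is an arbitrary assumption rather than a YARN recommendation.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the YARN API.
final class HeartbeatScheduler {

    // Assumption: heartbeat three times per expiry interval, so that a
    // single delayed call does not cause the ApplicationMaster to expire.
    static long heartbeatIntervalMs(long expiryIntervalMs) {
        if (expiryIntervalMs <= 0) {
            throw new IllegalArgumentException("expiry interval must be positive");
        }
        return Math.max(1L, expiryIntervalMs / 3);
    }

    // Runs allocateCall immediately and then repeatedly at the heartbeat rate.
    static ScheduledFuture<?> start(ScheduledExecutorService executor,
                                    Runnable allocateCall,
                                    long expiryIntervalMs) {
        long periodMs = heartbeatIntervalMs(expiryIntervalMs);
        return executor.scheduleAtFixedRate(
            allocateCall, 0L, periodMs, TimeUnit.MILLISECONDS);
    }
}
```

Note that scheduleAtFixedRate stops rescheduling if the task throws, so allocateCall should catch and handle its own exceptions.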
Based on the task requirements, the ApplicationMaster can ask for a set of containers to run its tasks on. The ApplicationMaster uses the ResourceRequest class to define the following container specifications:

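Hostname: set this if containers need to be hosted on a particular rack or a specific host. "*" is a special value meaning that any host will do.
Resource capability: currently, YARN only supports memory-based resource requirements, so the request only needs to define how much memory is needed. The value is given in MB, must be less than the max capability of the cluster, and must be an exact multiple of the min capability. Memory limits are enforced as limits on physical memory usage.
Priority: when asking for sets of containers, an ApplicationMaster may assign different priorities to each set. For example, the Map-Reduce ApplicationMaster may assign a higher priority to containers needed for map tasks and a lower priority to containers for reduce tasks.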
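The constraint on the memory value can be enforced before building the request. The following is a minimal sketch of rounding a desired amount up to a multiple of the min capability and capping it at the max capability. MemoryRequest is a hypothetical helper, and treating a request equal to the largest allowed multiple as acceptable is an assumption.

```java
// Hypothetical helper, not part of the YARN API.
final class MemoryRequest {

    // Rounds requestedMb up to the next multiple of minMb, then caps the
    // result at the largest multiple of minMb that does not exceed maxMb.
    static int normalize(int requestedMb, int minMb, int maxMb) {
        if (minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("invalid min/max capability");
        }
        int atLeastMin = Math.max(requestedMb, minMb);
        int roundedUp = ((atLeastMin + minMb - 1) / minMb) * minMb;
        int largestAllowed = (maxMb / minMb) * minMb;
        return Math.min(roundedUp, largestAllowed);
    }

    public static void main(String[] args) {
        // With min=1024 and max=8192, a 1500 MB wish becomes a 2048 MB ask.
        System.out.println(normalize(1500, 1024, 8192));
    }
}
```

The min and max values would come from the registration response (or, on the client side, from GetNewApplicationResponse).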

After defining the container requirements, the ApplicationMaster has to construct an AllocateRequest to send to the ResourceManager. The AllocateRequest consists of:
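Requested containers: the container specifications and the number of containers being requested by the ApplicationMaster from the ResourceManager.
Released containers: in some situations the ApplicationMaster may have requested more containers than it needs. It can release the unused ones back to the ResourceManager so that they can be re-allocated to other applications.
ResponseId: the response id that will be sent back in the response from the allocate call.
Progress update information: the ApplicationMaster can send its progress (a value between 0 and 1) to the ResourceManager.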
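The progress value has to stay within the range 0 to 1. As a small sketch, an ApplicationMaster that counts completed containers could derive it as follows. ProgressCalculator is a hypothetical helper, and measuring progress by completed containers is only one possible choice.

```java
// Hypothetical helper, not part of the YARN API.
final class ProgressCalculator {

    // Returns completed/total clamped to the range [0, 1].
    // Reports 0 when the total is not yet known (zero or negative).
    static float progress(int completed, int total) {
        if (total <= 0) {
            return 0.0f;
        }
        float ratio = (float) completed / total;
        return Math.max(0.0f, Math.min(1.0f, ratio));
    }
}
```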

    List<ResourceRequest> requestedContainers;
    List<ContainerId> releasedContainers;
    AllocateRequest req = Records.newRecord(AllocateRequest.class);

    // The response id set in the request will be sent back in
    // the response so that the ApplicationMaster can
    // match it to its original ask and act appropriately.
    req.setResponseId(rmRequestID);

    // Set ApplicationAttemptId
    req.setApplicationAttemptId(appAttemptID);

    // Add the list of containers being asked for
    req.addAllAsks(requestedContainers);

    // If the ApplicationMaster has no need for certain
    // containers due to over-allocation or for any other
    // reason, it can release them back to the ResourceManager
    req.addAllReleases(releasedContainers);

    // Assuming the ApplicationMaster can track its progress
    req.setProgress(currentProgress);

    AllocateResponse allocateResponse = resourceManager.allocate(req);

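The AllocateResponse returned by the ResourceManager provides the following information via the AMResponse object:
Reboot flag: for scenarios where the ApplicationMaster has gone out of sync with the ResourceManager and needs to reboot.
Allocated containers: the containers that have been allocated to the ApplicationMaster.
Headroom: the headroom for resources in the cluster. Based on this information and its own needs, an ApplicationMaster can make intelligent decisions, such as re-prioritizing sub-tasks to take advantage of the containers it already has, or bailing out faster if resources are not becoming available.
Completed containers: once an ApplicationMaster has launched an allocated container, it receives an update from the ResourceManager when that container completes. The ApplicationMaster can look at the status of the completed container and take appropriate action, such as retrying a particular sub-task in case of failure.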

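One thing to note is that containers will not necessarily be allocated to the ApplicationMaster immediately. This does not mean that the ApplicationMaster should keep asking for the containers it has not yet received. Once an allocate request has been sent, the ApplicationMaster will eventually be allocated the containers, subject to cluster capacity, priorities, and the scheduling policy in place. The ApplicationMaster should request containers again only if its estimate of the number of containers it needs has increased.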
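One way to avoid re-asking for pending containers is to track how many containers are needed and how many have already been requested, and to ask only for the difference. The following is a minimal sketch under that assumption. ContainerAskTracker is a hypothetical class and is not thread-safe.

```java
// Hypothetical helper, not part of the YARN API. Not thread-safe.
final class ContainerAskTracker {
    private int needed;     // total containers the job needs
    private int requested;  // containers already asked for

    ContainerAskTracker(int needed) {
        if (needed < 0) {
            throw new IllegalArgumentException("needed must not be negative");
        }
        this.needed = needed;
    }

    // Returns how many containers to ask for in the next allocate call
    // and records them as requested, so they are not asked for twice.
    int nextAsk() {
        int ask = Math.max(0, needed - requested);
        requested += ask;
        return ask;
    }

    // Call when a requested container was lost and must be replaced.
    void containerLost() {
        if (requested > 0) {
            requested--;
        }
    }

    // Call when the job's estimate of required containers grows.
    void addNeeded(int extra) {
        needed += extra;
    }
}
```

With this bookkeeping, a heartbeat-only allocate call simply carries an ask of zero containers.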

    // Get AMResponse from AllocateResponse
    AMResponse amResp = allocateResponse.getAMResponse();

    // Retrieve list of allocated containers from the response
    // and on each allocated container, let's assume we are launching
    // the same job.
    List<Container> allocatedContainers = amResp.getAllocatedContainers();
    for (Container allocatedContainer : allocatedContainers) {
      LOG.info("Launching shell command on a new container."
          + ", containerId=" + allocatedContainer.getId()
          + ", containerNode=" + allocatedContainer.getNodeId().getHost()
          + ":" + allocatedContainer.getNodeId().getPort()
          + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
          + ", containerState=" + allocatedContainer.getState()
          + ", containerResourceMemory="
          + allocatedContainer.getResource().getMemory());

      // Launch and start the container on a separate thread to keep the main
      // thread unblocked as all containers may not be allocated at one go.
      LaunchContainerRunnable runnableLaunchContainer =
          new LaunchContainerRunnable(allocatedContainer);
      Thread launchThread = new Thread(runnableLaunchContainer);
      launchThreads.add(launchThread);
      launchThread.start();
    }

    // Check what the current available resources in the cluster are
    Resource availableResources = amResp.getAvailableResources();
    // Based on this information, an ApplicationMaster can make appropriate
    // decisions

    // Check the completed containers
    // Let's assume we are keeping a count of total completed containers,
    // containers that failed and ones that completed successfully.
    List<ContainerStatus> completedContainers =
        amResp.getCompletedContainersStatuses();
    for (ContainerStatus containerStatus : completedContainers) {
      LOG.info("Got container status for containerID= "
          + containerStatus.getContainerId()
          + ", state=" + containerStatus.getState()
          + ", exitStatus=" + containerStatus.getExitStatus()
          + ", diagnostics=" + containerStatus.getDiagnostics());

      int exitStatus = containerStatus.getExitStatus();
      if (0 != exitStatus) {
        // container failed
        // -100 is a special case where the container
        // was aborted/pre-empted for some reason
        if (-100 != exitStatus) {
          // application job on container returned a non-zero exit code
          // counts as completed
          numCompletedContainers.incrementAndGet();
          numFailedContainers.incrementAndGet();
        }
        else {
          // something else bad happened
          // app job did not complete for some reason
          // we should re-try as the container was lost for some reason
          // decrementing the requested count so that we ask for an
          // additional one in the next allocate call.
          numRequestedContainers.decrementAndGet();
          // we do not need to release the container as that has already
          // been done by the ResourceManager/NodeManager.
        }
      }
      else {
        // nothing to do
        // container completed successfully
        numCompletedContainers.incrementAndGet();
        numSuccessfulContainers.incrementAndGet();
      }
    }
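The nested exit-status checks in the loop above boil down to a three-way decision. As a sketch, that decision could be isolated into a small function that is easy to test on its own. ContainerExitClassifier and its Outcome enum are hypothetical names. The value -100 is taken from the comments in the code above.

```java
// Hypothetical helper, not part of the YARN API.
final class ContainerExitClassifier {

    enum Outcome { SUCCEEDED, FAILED, RETRY }

    // Exit status used in the example above for aborted/pre-empted containers.
    static final int ABORTED_EXIT_STATUS = -100;

    // 0 means success, -100 means the container was lost and the task
    // should be retried, and any other value is a failure of the task itself.
    static Outcome classify(int exitStatus) {
        if (exitStatus == 0) {
            return Outcome.SUCCEEDED;
        }
        if (exitStatus == ABORTED_EXIT_STATUS) {
            return Outcome.RETRY;
        }
        return Outcome.FAILED;
    }
}
```

The counters from the loop would then be updated in a switch over the returned Outcome.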

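After a container has been allocated to the ApplicationMaster, it needs to follow a process similar to the one the client followed to set up the ContainerLaunchContext for the eventual task that will run on the allocated container. Once the ContainerLaunchContext is defined, the ApplicationMaster can communicate with the ContainerManager to start the allocated container.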

    // Assuming an allocated Container obtained from AMResponse
    Container container;
    // Connect to ContainerManager on the allocated container
    String cmIpPortStr = container.getNodeId().getHost() + ":"
        + container.getNodeId().getPort();
    InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);
    ContainerManager cm =
        (ContainerManager) rpc.getProxy(ContainerManager.class, cmAddress, conf);

    // Now we setup a ContainerLaunchContext
    ContainerLaunchContext ctx =
        Records.newRecord(ContainerLaunchContext.class);
    ctx.setContainerId(container.getId());
    ctx.setResource(container.getResource());
    try {
      ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
    } catch (IOException e) {
      LOG.info(
          "Getting current user failed when trying to launch the container: "
          + e.getMessage());
    }

    // Set the environment
    Map<String, String> unixEnv;
    // Setup the required env.
    // Please note that the launched container does not inherit
    // the environment of the ApplicationMaster so all the
    // necessary environment settings will need to be re-setup
    // for this allocated container.
    ctx.setEnvironment(unixEnv);

    // Set the local resources
    Map<String, LocalResource> localResources =
        new HashMap<String, LocalResource>();
    // Again, the local resources from the ApplicationMaster are not copied over
    // by default to the allocated container. Thus, it is the responsibility
    // of the ApplicationMaster to setup all the necessary local resources
    // needed by the job that will be executed on the allocated container.

    // Assume that we are executing a shell script on the allocated container
    // and the shell script's location in the filesystem is known to us.
    Path shellScriptPath;
    LocalResource shellRsrc = Records.newRecord(LocalResource.class);
    shellRsrc.setType(LocalResourceType.FILE);
    shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
    shellRsrc.setResource(
        ConverterUtils.getYarnUrlFromURI(shellScriptPath.toUri()));
    shellRsrc.setTimestamp(shellScriptPathTimestamp);
    shellRsrc.setSize(shellScriptPathLen);
    localResources.put("MyExecShell.sh", shellRsrc);
    ctx.setLocalResources(localResources);

    // Set the necessary command to execute on the allocated container
    String command = "/bin/sh ./MyExecShell.sh"
        + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
        + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
    List<String> commands = new ArrayList<String>();
    commands.add(command);
    ctx.setCommands(commands);

    // Send the start request to the ContainerManager
    StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
    startReq.setContainerLaunchContext(ctx);
    cm.startContainer(startReq);

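As mentioned earlier, the ApplicationMaster receives updates on completed containers in the response to the AMRMProtocol::allocate call. It can also monitor its launched containers proactively by querying the ContainerManager for their status.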

    GetContainerStatusRequest statusReq =
        Records.newRecord(GetContainerStatusRequest.class);
    statusReq.setContainerId(container.getId());
    GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
    LOG.info("Container Status"
        + ", id=" + container.getId()
        + ", status=" + statusResp.getStatus());

FAQ

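1. How can I distribute my application's jars to all of the nodes that need them?

You can use LocalResource to add resources to your application request. This causes YARN to distribute the resource to the ApplicationMaster node. If the resource is a tgz, zip, or jar, you can have YARN unzip it. Then all you need to do is add the unzipped folder to your classpath. For example, when creating your application request: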

    File packageFile = new File(packagePath);
    URL packageUrl = ConverterUtils.getYarnUrlFromPath(
        FileContext.getFileContext().makeQualified(new Path(packagePath)));

    packageResource.setResource(packageUrl);
    packageResource.setSize(packageFile.length());
    packageResource.setTimestamp(packageFile.lastModified());
    packageResource.setType(LocalResourceType.ARCHIVE);
    packageResource.setVisibility(LocalResourceVisibility.APPLICATION);

    resource.setMemory(memory);
    containerCtx.setResource(resource);
    containerCtx.setCommands(ImmutableList.of(
        "java -cp './package/*' some.class.to.Run "
        + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
        + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
    containerCtx.setLocalResources(
        Collections.singletonMap("package", packageResource));
    appCtx.setApplicationId(appId);
    appCtx.setUser(user.getShortUserName());
    appCtx.setAMContainerSpec(containerCtx);
    request.setApplicationSubmissionContext(appCtx);
    applicationsManager.submitApplication(request);

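As you can see, the setLocalResources call takes a map of names to resources. Each name becomes a symlink in your application's cwd, so you can refer to the artifacts inside simply by using ./package/*.
Note: Java's classpath (-cp) argument is very sensitive. Make sure you get the syntax exactly right.
Once your package is distributed to your ApplicationMaster, you need to follow the same process whenever your ApplicationMaster starts a new container (assuming you want the resources to be sent to that container). The code for this is the same. You just need to make sure that you give your ApplicationMaster the package path (either HDFS or local) so that it can send the resource URL along with the container ctx.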

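2. How do I get the ApplicationMaster's ApplicationAttemptId?

The ApplicationAttemptId is passed to the ApplicationMaster via the environment, and the value from the environment can be converted into an ApplicationAttemptId object via the ConverterUtils helper function.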

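3. My container is being killed by the NodeManager

This is likely due to high memory usage exceeding your requested container memory size. There are a number of possible causes. First, look at the process tree that the NodeManager dumps when it kills your container. The two things you are interested in are physical memory and virtual memory. If you have exceeded the physical memory limit, your application is using too much physical memory. If you are running a Java app, you can use -hprof to see what is taking up space in the heap. If you have exceeded the virtual memory limit, you may need to increase the cluster-wide configuration variable yarn.nodemanager.vmem-pmem-ratio.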
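To get a feel for how the ratio affects the limit, the following sketch computes a virtual memory limit from a container's physical memory limit and a ratio. VirtualMemoryLimit is a hypothetical helper, and the simple multiplication is an assumption about how the NodeManager derives the limit. The default ratio in stock Hadoop configurations is, to my knowledge, 2.1, but check your cluster's yarn-site.xml.

```java
// Hypothetical helper, not part of the YARN API.
final class VirtualMemoryLimit {

    // Assumption: the virtual memory limit is the physical memory limit
    // multiplied by the vmem-pmem ratio, truncated to a whole number.
    static long vmemLimit(long pmemLimit, double vmemPmemRatio) {
        if (pmemLimit < 0 || vmemPmemRatio <= 0) {
            throw new IllegalArgumentException("invalid limit or ratio");
        }
        return (long) (pmemLimit * vmemPmemRatio);
    }

    // True if the observed virtual memory usage is above the derived limit.
    static boolean exceedsVirtualLimit(long usedVmem, long pmemLimit,
                                       double vmemPmemRatio) {
        return usedVmem > vmemLimit(pmemLimit, vmemPmemRatio);
    }

    public static void main(String[] args) {
        long oneGiB = 1024L * 1024L * 1024L;
        System.out.println(vmemLimit(oneGiB, 2.1) + " bytes of virtual memory allowed");
    }
}
```

Under these assumptions, a container with a 1 GiB physical limit and a ratio of 2.1 may use a little over 2 GiB of virtual memory before it becomes a candidate for being killed.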

0 0