A Translation of the Official YARN Documentation

Source: Internet  Editor: 程序博客网  Date: 2024/06/11 21:48

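A caveat up front: this translation is rough, since I never even passed CET-4. I translated it purely to learn, so please go easy on it if it is not to your taste. Thanks.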


Apache Hadoop NextGen MapReduce (YARN)

Apache Hadoop next-generation MapReduce (the YARN system)

MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have, what we call, MapReduce 2.0 (MRv2) or YARN.
MapReduce was completely overhauled in hadoop-0.23, and the result is what is now called MapReduce 2.0 (MRv2), or YARN.

The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs.
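The basic idea of MRv2 is to split the JobTracker's two main functions, resource management and job scheduling/monitoring, into separate daemons. This gives one global ResourceManager (RM) plus one ApplicationMaster (AM) per application. An application is either a single Map-Reduce job in the traditional sense or a DAG of jobs.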

The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system.
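The ResourceManager and the NodeManager (NM) on each node make up the data-computation framework. The ResourceManager has the final say over how resources are divided among all the applications in the system.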

The per-application ApplicationMaster is, in effect, a framework specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
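Each application's ApplicationMaster is in effect a framework-specific library. It negotiates resources with the ResourceManager and works with the NodeManager(s) to execute and monitor the tasks.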
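The division of labor described above can be made concrete with a toy model. This is only a sketch under stated assumptions: the three classes below are hypothetical stand-ins written for illustration, not the real YARN classes or protocol, and memory is the only resource tracked.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Hypothetical per-node agent: owns the node's memory and runs tasks."""
    name: str
    free_mb: int
    running: list = field(default_factory=list)

    def launch(self, task: str, mb: int) -> None:
        self.running.append((task, mb))


class ResourceManager:
    """Hypothetical global arbiter: decides which node serves a request."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def allocate(self, mb: int):
        # First node with enough free memory wins; None means nothing is free.
        for node in self.nodes:
            if node.free_mb >= mb:
                node.free_mb -= mb
                return node
        return None


class ApplicationMaster:
    """Hypothetical per-application master: asks the RM, then uses the NMs."""

    def __init__(self, rm, tasks):
        self.rm = rm
        self.tasks = list(tasks)

    def run(self, mb_per_task: int):
        placed, pending = {}, []
        for task in self.tasks:
            node = self.rm.allocate(mb_per_task)   # negotiate with the RM
            if node is None:
                pending.append(task)               # must wait for resources
            else:
                node.launch(task, mb_per_task)     # work with the NM
                placed[task] = node.name
        return placed, pending


nodes = [NodeManager("nm1", 2048), NodeManager("nm2", 1024)]
am = ApplicationMaster(ResourceManager(nodes), ["map-0", "map-1", "map-2", "reduce-0"])
placed, pending = am.run(mb_per_task=1024)
print(placed, pending)
```

With 3072 MB in the cluster and 1024 MB per task, three tasks are placed and the fourth stays pending until memory is freed. The point of the sketch is only who talks to whom: the application never picks a node itself.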





The ResourceManager has two main components: Scheduler and ApplicationsManager.
The ResourceManager has two main components: the Scheduler and the ApplicationsManager.

The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees about restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a resource Container which incorporates elements such as memory, cpu, disk, network etc. In the first version, only memory is supported.
The Scheduler allocates resources to the running applications, subject to constraints such as capacities and queues. It is a pure scheduler: it does not monitor or track application status, and it gives no guarantee that tasks which fail because of application or hardware errors will be restarted. It schedules according to each application's resource requirements, expressed through the abstract resource Container, which covers memory, CPU, disk, network and so on. The first version supports memory only.
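To illustrate what "pure scheduler" means, here is a minimal sketch, assuming a memory-only Container and a fixed memory cap per queue. The `Scheduler` and `Container` classes, the queue names and the numbers are all made up for this example; the real YARN scheduler is far more elaborate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    """Hypothetical resource grant; memory is the only dimension modeled."""
    app_id: str
    memory_mb: int


class Scheduler:
    """Hands out containers under queue caps. It keeps no task status at all."""

    def __init__(self, cluster_mb, queue_caps_mb):
        self.free_mb = cluster_mb
        self.queue_caps_mb = dict(queue_caps_mb)
        self.queue_used_mb = {queue: 0 for queue in queue_caps_mb}

    def request(self, app_id, queue, memory_mb):
        used = self.queue_used_mb[queue]
        within_queue = used + memory_mb <= self.queue_caps_mb[queue]
        if within_queue and memory_mb <= self.free_mb:
            self.queue_used_mb[queue] = used + memory_mb
            self.free_mb -= memory_mb
            return Container(app_id, memory_mb)
        return None  # the caller has to ask again later

    def release(self, queue, container):
        self.queue_used_mb[queue] -= container.memory_mb
        self.free_mb += container.memory_mb


scheduler = Scheduler(cluster_mb=4096, queue_caps_mb={"prod": 3072, "dev": 1024})
first = scheduler.request("app-1", "prod", 2048)   # fits the prod cap
too_big = scheduler.request("app-2", "dev", 2048)  # exceeds the dev cap
second = scheduler.request("app-2", "dev", 1024)   # fits the dev cap
print(first, too_big, second, scheduler.free_mb)
```

Note what is missing: the scheduler never learns whether the work inside a container succeeded. Restarting failed tasks is left to the application's own ApplicationMaster.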

The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in.
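The Scheduler has a pluggable policy that partitions the cluster resources among the various queues, applications and so on. The current Map-Reduce schedulers, such as the CapacityScheduler and the FairScheduler, are examples of such plug-ins.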
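The plug-in idea can be sketched as a function that is swapped in without touching the allocation loop. The two policies below are deliberately simplistic illustrations with hypothetical names; they are not the algorithms used by the real CapacityScheduler or FairScheduler.

```python
from typing import Callable, Dict, List

# A policy picks which pending application receives the next container.
Policy = Callable[[List[str], Dict[str, int]], str]


def fifo_policy(pending: List[str], allocated: Dict[str, int]) -> str:
    # Serve the earliest-submitted application until it is satisfied.
    return pending[0]


def fair_policy(pending: List[str], allocated: Dict[str, int]) -> str:
    # Serve whichever application holds the fewest containers so far;
    # ties go to the earlier submission.
    return min(pending, key=lambda app: allocated[app])


def hand_out(containers: int, demand: Dict[str, int], policy: Policy) -> Dict[str, int]:
    """Distribute a fixed number of containers according to the given policy."""
    allocated = {app: 0 for app in demand}
    for _ in range(containers):
        pending = [app for app in demand if allocated[app] < demand[app]]
        if not pending:
            break
        allocated[policy(pending, allocated)] += 1
    return allocated


demand = {"app-a": 4, "app-b": 4}
fifo_result = hand_out(4, demand, fifo_policy)
fair_result = hand_out(4, demand, fair_policy)
print(fifo_result, fair_result)
```

With four containers and two applications that each want four, the FIFO policy gives everything to `app-a`, while the fair policy gives two to each. The loop in `hand_out` is identical in both cases, which is the property a pluggable policy is meant to provide.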

The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources.
The CapacityScheduler supports hierarchical queues, which makes the sharing of cluster resources more predictable.
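One way to see why a queue hierarchy makes sharing predictable: if each queue is given a percentage of its parent, every queue's share of the whole cluster follows by multiplying down the tree. The sketch below assumes that percent-of-parent convention; the queue names, the percentages and the `absolute_capacities` helper are invented for the example.

```python
def absolute_capacities(tree, parent_share=100.0, prefix="root"):
    """Map each queue path to its share of the whole cluster, in percent.

    `tree` maps a queue name to (percent_of_parent, children), where
    children is another tree of the same shape.
    """
    result = {}
    for name, (percent, children) in tree.items():
        path = f"{prefix}.{name}"
        share = parent_share * percent / 100.0
        result[path] = share
        result.update(absolute_capacities(children, share, path))
    return result


# Hypothetical layout: prod gets 70% of the cluster and splits it 60/40.
queues = {
    "prod": (70, {"etl": (60, {}), "reports": (40, {})}),
    "dev": (30, {}),
}
capacities = absolute_capacities(queues)
for path, share in capacities.items():
    print(f"{path}: {share:.1f}%")
```

Here `root.prod.etl` ends up with 60% of 70%, that is 42% of the cluster, regardless of what happens inside `root.dev`.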

The ApplicationsManager is responsible for accepting job submissions, negotiating the first container for executing the application-specific ApplicationMaster, and providing the service for restarting the ApplicationMaster container on failure.
The ApplicationsManager accepts job submissions, negotiates the first container, in which the application-specific ApplicationMaster runs, and provides the service that restarts the ApplicationMaster container if it fails.
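The restart-on-failure service can be pictured as a bounded retry loop around the ApplicationMaster launch. This is a sketch only: the `ApplicationsManager` class, the `max_attempts` parameter and the `flaky_am` function are hypothetical, and a failure is modeled simply as a raised `RuntimeError`.

```python
class ApplicationsManager:
    """Hypothetical manager that relaunches a failed ApplicationMaster."""

    def __init__(self, max_attempts=2):
        self.max_attempts = max_attempts  # assumed limit, chosen for the example
        self.log = []

    def submit(self, app_id, start_am):
        """Run start_am(attempt) until it succeeds or attempts run out."""
        for attempt in range(1, self.max_attempts + 1):
            try:
                result = start_am(attempt)
            except RuntimeError as err:
                # The AM container failed: record it and try a fresh one.
                self.log.append((app_id, attempt, f"failed: {err}"))
                continue
            self.log.append((app_id, attempt, "finished"))
            return result
        return None  # every attempt failed


def flaky_am(attempt):
    # Stand-in for an ApplicationMaster whose first container is lost.
    if attempt == 1:
        raise RuntimeError("container lost")
    return "SUCCEEDED"


manager = ApplicationsManager(max_attempts=2)
status = manager.submit("app-1", flaky_am)
print(status, manager.log)
```

Only the ApplicationMaster container is relaunched here. Restarting the individual tasks of the application remains the ApplicationMaster's job.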

The NodeManager is the per-machine framework agent that is responsible for containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the ResourceManager/Scheduler.
The NodeManager is the framework agent on each machine. It is responsible for containers: it monitors their resource usage (CPU, memory, disk, network) and reports it to the ResourceManager/Scheduler.
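The monitor-and-report duty might look roughly like the following. The report fields, the class names and the memory-only accounting are assumptions made for this sketch; they do not reproduce the real NodeManager heartbeat format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContainerUsage:
    """Hypothetical usage sample for one container (memory only)."""
    container_id: str
    limit_mb: int
    used_mb: int


class NodeManager:
    """Hypothetical per-node agent that summarizes its containers."""

    def __init__(self, node_id: str, total_mb: int):
        self.node_id = node_id
        self.total_mb = total_mb
        self.containers: List[ContainerUsage] = []

    def heartbeat(self) -> Dict[str, object]:
        """Build the report that would be sent to the ResourceManager."""
        reserved = sum(c.limit_mb for c in self.containers)
        return {
            "node": self.node_id,
            "used_mb": sum(c.used_mb for c in self.containers),
            "available_mb": self.total_mb - reserved,
            "over_limit": [c.container_id for c in self.containers
                           if c.used_mb > c.limit_mb],
        }


nm = NodeManager("nm1", total_mb=4096)
nm.containers.append(ContainerUsage("c1", limit_mb=1024, used_mb=900))
nm.containers.append(ContainerUsage("c2", limit_mb=1024, used_mb=1100))
report = nm.heartbeat()
print(report)
```

The report separates what is reserved (container limits) from what is actually used, so the receiver can both plan new allocations and spot a container that exceeds its grant, such as `c2` above.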

The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
Each application's ApplicationMaster negotiates suitable resource containers with the Scheduler, tracks their status and monitors progress.

MRv2 maintains API compatibility with the previous stable release (hadoop-0.20.205). This means that all Map-Reduce jobs should still run unchanged on top of MRv2 with just a recompile.
MRv2 keeps its API compatible with the previous stable release (hadoop-0.20.205), so existing Map-Reduce jobs should run on MRv2 unchanged after a recompile.

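Personal summary:
YARN divides the cluster into a ResourceManager and NodeManagers: the ResourceManager schedules resources, and the NodeManagers run the work. Each application has its own ApplicationMaster, which runs in a container on a NodeManager's node, negotiates resources with the ResourceManager on the application's behalf, and executes and monitors the application's tasks. A container is a bundle of resources such as CPU, disk, network and memory (memory only in the first version). Each NodeManager monitors the containers on its node and reports their usage to the ResourceManager.
The ResourceManager in turn has two main components: the Scheduler and the ApplicationsManager. The Scheduler only allocates cluster resources among the applications, following a pluggable policy such as the CapacityScheduler or the FairScheduler. The ApplicationsManager accepts submitted jobs and restarts an application's ApplicationMaster container if it fails.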