Cloudera Manager 5 Overview

Source: Internet | Posted: thumbnails java | Editor: 程序博客网 | Time: 2024/06/13 13:26

Cloudera Manager is an end-to-end application for managing CDH clusters. Cloudera Manager sets the standard for application deployment by delivering granular visibility into and control over every part of the CDH cluster, helping administrators improve efficiency, enhance quality of service, increase compliance, and reduce administrative costs. With Cloudera Manager, you can easily deploy and centrally manage the CDH stack and other managed services. The application automates the installation process, reducing deployment time from weeks to minutes; gives you a cluster-wide, real-time view of hosts and services running; provides a single, central console to enact configuration changes across your cluster; and includes a full range of reporting and diagnostic tools to help you optimize performance and utilization. This primer introduces the basic concepts, structure, and functions of Cloudera Manager.

Terminology

To use Cloudera Manager effectively, you should first understand its terminology. The relationship between the terms is illustrated below:

Some of these terms, such as cluster and service, are used without further explanation.

Others, such as role group, gateway, host template, and parcel, are explained in the sections that follow.

A common point of confusion is the overloading of the terms service and role for both types and instances; Cloudera Manager and this section sometimes use the same term for both the type and the instance.

For example, the Cloudera Manager Admin Console Home > Status tab and the Clusters > ClusterName menu display a list of service instances.

This is similar to the practice in programming languages, where for example the term "string" can denote either a type (java.lang.String) or an instance of that type ("hi there").

deployment

A configuration of Cloudera Manager and all the clusters it manages.

dynamic resource pool

In Cloudera Manager, a named configuration of resources and a policy for scheduling

the resources among YARN applications or Impala queries running in the pool.

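cluster

A set of computers or racks of computers that contains an HDFS file system and runs MapReduce and other processes. A pseudo-distributed cluster is a CDH installation on a single machine, useful for demonstrations and individual study.

In Cloudera Manager, a logical entity that contains a set of hosts, a single version of CDH installed on the hosts, and the service and role instances running on the hosts. A host can belong to only one cluster. Cloudera Manager can manage multiple CDH clusters; however, each cluster can be managed by only one Cloudera Manager Server or Cloudera Manager HA pair.

host

In Cloudera Manager, a physical or virtual machine that runs role instances. A host can belong to only one cluster.

service

A managed functionality in Cloudera Manager, which may or may not be distributed, running in a cluster.

Sometimes the term refers specifically to a service type. For example: MapReduce, HDFS, YARN, Spark, and Accumulo.

In a traditional environment, one host runs many services; in a distributed environment, one service runs on many hosts.

service instance

In Cloudera Manager, an instance of a service running on a cluster. For example: "HDFS-1" and "yarn".

A service instance spans many role instances; see the definitions that follow.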

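role

In Cloudera Manager, a category of functionality within a service. For example, the HDFS service has the following roles:

NameNode, SecondaryNameNode, DataNode, and Balancer.

role instance

In Cloudera Manager, an instance of a role running on a host.

A role instance typically corresponds to a Unix process. For example: "NameNode-h1" and "DataNode-h1".

role group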

In Cloudera Manager, a set of configuration properties for a set of role instances.

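host template

A set of role groups. When a template is applied to a host, a role instance from each role group is created and assigned to that host.

gateway

In Cloudera Manager, a role that designates a host that should receive a client configuration for a service when the host does not run any role instances of that service.

parcel

A binary distribution format that contains compiled code and meta-information such as a package description, version, and dependencies.

static service pool

In Cloudera Manager, a static partitioning of total cluster resources (CPU, memory, and I/O weight) across a set of services.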
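The containment relationships among these terms can be sketched as a small data model. This is only an illustration of the definitions above, not Cloudera Manager's actual schema; the class names and the `add_host` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RoleInstance:
    name: str       # e.g. "NameNode-h1"
    role_type: str  # e.g. "NameNode"
    host: str       # a role instance runs on one host


@dataclass
class ServiceInstance:
    name: str          # e.g. "HDFS-1"
    service_type: str  # e.g. "HDFS"
    role_instances: list = field(default_factory=list)


@dataclass
class Cluster:
    name: str
    hosts: set = field(default_factory=set)
    services: list = field(default_factory=list)


class Deployment:
    """One Cloudera Manager configuration plus all the clusters it manages."""

    def __init__(self):
        self.clusters = {}
        self._cluster_of_host = {}

    def add_host(self, cluster_name: str, host: str) -> None:
        # Enforce the rule that a host can belong to only one cluster.
        owner = self._cluster_of_host.get(host)
        if owner is not None and owner != cluster_name:
            raise ValueError(f"{host} already belongs to cluster {owner}")
        cluster = self.clusters.setdefault(cluster_name, Cluster(cluster_name))
        cluster.hosts.add(host)
        self._cluster_of_host[host] = cluster_name


deployment = Deployment()
deployment.add_host("Cluster 1", "h1")
hdfs = ServiceInstance("HDFS-1", "HDFS")
hdfs.role_instances.append(RoleInstance("NameNode-h1", "NameNode", "h1"))
hdfs.role_instances.append(RoleInstance("DataNode-h1", "DataNode", "h1"))
deployment.clusters["Cluster 1"].services.append(hdfs)
```

Here "HDFS" is a service type, "HDFS-1" is a service instance, and "NameNode-h1" is a role instance of the NameNode role, which mirrors the type/instance distinction described earlier.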

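Cluster Example

Consider a cluster, Cluster 1, with four hosts, as shown in the following listing from Cloudera Manager:

The host tcdn501-1 is the "master" host for the cluster, so it has many more role instances (21) than the other hosts, which have only 7 each.

In addition to the CDH "master" role instances, tcdn501-1 also hosts the Cloudera Management Service roles: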

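Architecture

As described below, the heart of Cloudera Manager is the Cloudera Manager Server.

The Server hosts the Admin Console Web Server and the application logic, and is responsible for installing software, configuring, starting, and stopping services, and managing the cluster on which the services run.

The Cloudera Manager Server works with several other components:

Agent - installed on every host. The Agent is responsible for starting and stopping processes, unpacking configurations, triggering installations, and monitoring the host.
Management Service - provides monitoring, alerting, reporting, and related functions.
Database - stores configuration and monitoring information. Typically, multiple logical databases run across one or more database servers. For example, the Cloudera Manager Server and the monitoring roles use different logical databases.
Cloudera Repository - repository of software for distribution by Cloudera Manager.
Clients - the interfaces for interacting with the server:
- Admin Console - Web-based UI with which administrators manage clusters and Cloudera Manager itself.
- API - API with which developers create custom Cloudera Manager applications.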

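Heartbeating

Heartbeats are the primary communication mechanism in Cloudera Manager.

By default, Agents send a heartbeat to the Cloudera Manager Server every 15 seconds.

However, to reduce user latency, the frequency is increased when state is changing.

During the heartbeat exchange, the Agent notifies the Cloudera Manager Server of its activities.

In turn, the Cloudera Manager Server responds with the actions the Agent should be performing.

Both the Agent and the Cloudera Manager Server end up doing some reconciliation. For example, if you start a service, the Agent attempts to start the relevant processes; if a process fails to start, the Cloudera Manager Server marks the start command as having failed.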
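The heartbeat exchange can be pictured with a minimal sketch. The 15-second default comes from the text above; the faster interval, the function names, and the set-based reconciliation are assumptions made for illustration and do not reflect the Agent's real implementation.

```python
DEFAULT_INTERVAL_S = 15.0  # default heartbeat period stated in the text
FAST_INTERVAL_S = 1.0      # assumed value; the text only says "more frequent"


def next_heartbeat_delay(state_changing: bool) -> float:
    """Heartbeat sooner while state is changing, to reduce user latency."""
    return FAST_INTERVAL_S if state_changing else DEFAULT_INTERVAL_S


def reconcile(desired: set, running: set) -> dict:
    """Compare what the server wants running with what the agent reports."""
    return {
        "start": sorted(desired - running),
        "stop": sorted(running - desired),
    }


# The agent reports one running process; the server wants two others as well.
actions = reconcile(
    desired={"hdfs-DATANODE", "yarn-NODEMANAGER"},
    running={"hdfs-DATANODE", "hbase-REGIONSERVER"},
)
```

In this sketch the server's reply would tell the agent to start `yarn-NODEMANAGER` and stop `hbase-REGIONSERVER`, and the next heartbeat would arrive sooner because state is changing.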

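State Management

The Cloudera Manager Server maintains the state of the cluster.

This state can be divided into two categories, "model" and "runtime", both of which are stored in the Cloudera Manager Server database.

Cloudera Manager models CDH and managed services: their roles, configurations, and inter-dependencies.

Model state captures what is supposed to run where, and with what configurations.

For example, model state captures the fact that a cluster contains 17 hosts, each of which is supposed to run a DataNode.

You interact with the model through the Cloudera Manager Admin Console configuration screens and through the API and operations such as "Add Service".

Runtime state is what processes are running where, and what commands (for example, rebalance HDFS, run a Backup/Disaster Recovery schedule, or perform a rolling restart or stop) are currently running.

The runtime state includes the exact configuration files needed to run a process. When you select "Start" in the Cloudera Manager Admin Console, the server gathers up all the configuration for the relevant services and roles, validates it, generates the configuration files, and stores them in the database.

When you update a configuration (for example, the Hue Server web port), you have updated the model state. However, if Hue is running while you do this, it is still using the old port.

When this kind of mismatch occurs, the role is marked as having an "outdated configuration". To resynchronize, you restart the role (which triggers the configuration re-generation and process restart).

While Cloudera Manager models all of the reasonable configurations, some cases inevitably require special handling. To allow you to work around, for example, a bug or to explore unsupported options, Cloudera Manager supports an "advanced configuration snippet" mechanism that lets you add properties directly to the configuration files.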
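The difference between model state and runtime state, and how an "outdated configuration" arises, can be illustrated with a small sketch. The `Role` class, the `http_port` key, and the port numbers are hypothetical; they only mimic the Hue example above.

```python
import copy


class Role:
    """Tracks model state (desired) and runtime state (what the process started with)."""

    def __init__(self, name: str, model_config: dict):
        self.name = name
        self.model_config = dict(model_config)
        self.runtime_config = None  # nothing is running yet

    def start(self) -> None:
        # On start, the current model configuration is snapshotted for the process.
        self.runtime_config = copy.deepcopy(self.model_config)

    def update(self, key: str, value) -> None:
        # Editing a setting changes only the model state.
        self.model_config[key] = value

    def has_outdated_config(self) -> bool:
        return (
            self.runtime_config is not None
            and self.runtime_config != self.model_config
        )


hue = Role("hue-server", {"http_port": 8888})
hue.start()
hue.update("http_port", 8889)   # the running process still uses 8888
stale_before_restart = hue.has_outdated_config()
hue.start()                     # a restart regenerates the configuration
stale_after_restart = hue.has_outdated_config()
```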

Configuration Management

Cloudera Manager defines configuration at several levels:

The service level may define configurations that apply to the entire service instance, such as an HDFS service's default replication factor (dfs.replication).
The role group level may define configurations that apply to the member roles, such as the DataNodes' handler count (dfs.datanode.handler.count). This can be set differently for different groups of DataNodes. For example, DataNodes running on more capable hardware may have more handlers.
The role instance level may override configurations that it inherits from its role group. This should be used sparingly, because it easily leads to configuration divergence within the role group. One example usage is to temporarily enable debug logging in a specific role instance to troubleshoot an issue.
Hosts have configurations related to monitoring, software management, and resource management.
Cloudera Manager itself has configurations related to its own administrative operations.
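The inheritance among the service, role group, and role instance levels behaves like a layered lookup in which the most specific level wins. The sketch below uses Python's `collections.ChainMap` to show the idea; the property values and the `log_level` key are illustrative assumptions, and this is not how Cloudera Manager itself resolves configuration.

```python
from collections import ChainMap


def effective_config(service: dict, role_group: dict, role_instance: dict) -> dict:
    """Most specific level first: instance overrides group, group overrides service."""
    return dict(ChainMap(role_instance, role_group, service))


service_level = {"dfs.replication": 3, "log_level": "INFO"}
default_group = {"dfs.datanode.handler.count": 10}
big_hardware_group = {"dfs.datanode.handler.count": 30}
debug_override = {"log_level": "DEBUG"}  # temporary, per-instance override

normal_datanode = effective_config(service_level, default_group, {})
debugged_datanode = effective_config(service_level, big_hardware_group, debug_override)
```

Keeping the instance-level dictionary empty in the common case reflects the advice above to use role instance overrides sparingly.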

Role Groups

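You can set configuration at the service instance level (for example, HDFS) or the role instance level (for example, the DataNode on host17).

An individual role inherits the configurations set at the service level. Configurations made at the role level override those inherited from the service level.

While this approach offers flexibility, configuring a set of role instances in the same way can be tedious.

Cloudera Manager supports role groups, a mechanism for assigning configurations to a group of role instances.

The members of those groups then inherit those configurations. For example, in a cluster with heterogeneous hardware, a DataNode role group can be created for each host type and the DataNodes running on those hosts can be assigned to their corresponding role group.

To set the configuration for all the DataNodes running on the same hardware, you only need to modify the configuration of one role group.

The HDFS service discussed earlier has the following role groups defined for the service's roles:

In addition to making it easy to manage the configuration of subsets of roles, role groups also make it possible to maintain different configurations for experimentation or for managing shared clusters for different users or workloads.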

Host Templates

In typical environments, sets of hosts have the same hardware and the same set of services running on them.

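A host template defines a set of role groups (at most one of each type) in a cluster and provides two main benefits:

Adding new hosts to clusters easily - multiple hosts can have roles from different services created, configured, and started in a single operation.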
Altering the configuration of roles from different services on a set of hosts easily - which is useful for quickly switching the configuration of an entire cluster to accommodate different workloads or users.
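Applying a host template can be thought of as creating one role instance per role group for each target host. The following sketch assumes a template is a mapping from role type to role group name (so each type appears at most once); the function name, the group names, and the host names are hypothetical.

```python
def apply_host_template(template: dict, host: str) -> list:
    """Create one role instance on `host` for every role group in the template."""
    return [
        {
            "name": f"{role_type}-{host}",
            "role_type": role_type,
            "role_group": group,
            "host": host,
        }
        for role_type, group in template.items()
    ]


worker_template = {
    "DataNode": "DataNode Default Group",
    "NodeManager": "NodeManager Default Group",
}

# Two new hosts receive roles from two different services in one operation.
new_roles = [
    role
    for host in ("h5", "h6")
    for role in apply_host_template(worker_template, host)
]
```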

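Server and Client Configuration

Administrators are sometimes surprised that modifying /etc/hadoop/conf and then restarting HDFS has no effect.

That is because service instances started by Cloudera Manager do not read configurations from the default locations.

To use HDFS as an example, when not managed by Cloudera Manager, there would usually be one HDFS configuration per host, located at /etc/hadoop/conf/hdfs-site.xml. Server-side daemons and clients running on the same host would all use that same configuration.

Cloudera Manager distinguishes between server and client configuration. In the case of HDFS, the file /etc/hadoop/conf/hdfs-site.xml contains only configuration relevant to an HDFS client.

That is, by default, if you run a program that needs to communicate with Hadoop, it will get the addresses of the NameNode and JobTracker, and other important configurations, from that directory.

A similar approach is taken for /etc/hbase/conf and /etc/hive/conf.

In contrast, the HDFS role instances (for example, NameNode and DataNode) obtain their configurations from a private per-process directory, under /var/run/cloudera-scm-agent/process/unique-process-name.

Giving each process its own private execution and configuration environment allows Cloudera Manager to control each process independently. For example, here are the contents of an example 879-hdfs-NAMENODE process directory: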

$ tree -a /var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
/var/run/cloudera-scm-agent/process/879-hdfs-NAMENODE/
├── cloudera_manager_agent_fencer.py
├── cloudera_manager_agent_fencer_secret_key.txt
├── cloudera-monitor.properties
├── core-site.xml
├── dfs_hosts_allow.txt
├── dfs_hosts_exclude.txt
├── event-filter-rules.json
├── hadoop-metrics2.properties
├── hdfs.keytab
├── hdfs-site.xml
├── log4j.properties
├── logs
│   ├── stderr.log
│   └── stdout.log
├── topology.map
└── topology.py

Distinguishing between server and client configuration provides several advantages:

Sensitive information in the server-side configuration, such as the password for the Hive Metastore RDBMS, is not exposed to the clients.
A service that depends on another service may deploy with customized configuration. For example, to get good HDFS read performance, Impala needs a specialized version of the HDFS client configuration, which may be harmful to a generic client. This is achieved by separating the HDFS configuration for the Impala daemons (stored in the per-process directory mentioned above) from that of the generic client (/etc/hadoop/conf).

Client configuration files are much smaller and more readable. This also avoids confusing non-administrator Hadoop users with irrelevant server-side properties.

Deploying Client Configurations and Gateways

A client configuration is a zip file that contains the relevant configuration files with the settings for a service.

Each zip file contains the set of configuration files needed by the service. For example, the MapReduce client configuration zip file contains copies of

core-site.xml, hadoop-env.sh, hdfs-site.xml, log4j.properties, and mapred-site.xml.

Cloudera Manager supports downloading the client configuration file, which enables distributing it to users outside the cluster.

Cloudera Manager can deploy client configurations within the cluster;

each applicable service has a Deploy Client Configuration action. This action does not necessarily deploy the client configuration to the entire cluster; it only deploys the client configuration to all the hosts that this service has been assigned to.

For example, suppose a cluster has 10 hosts, and a MapReduce service is running on hosts 1-9.

When you use Cloudera Manager to deploy the MapReduce client configuration, host 10 will not get a client configuration, because the MapReduce service has no role assigned to it.

This design is intentional to avoid deploying conflicting client configurations from multiple services.

To deploy a client configuration to a host that does not have a role assigned to it you use a gateway.

A gateway is a marker to convey that a service should be accessible from a particular host.

Unlike all other roles, it has no associated process. In the preceding example, to deploy the MapReduce client configuration to host 10, you assign a MapReduce gateway role to that host.
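The 10-host example can be expressed as a short sketch that computes which hosts receive a service's client configuration. The data layout, the role type names, and the `client_config_targets` function are assumptions for illustration only.

```python
def client_config_targets(assignments: dict, service: str) -> set:
    """Hosts that have at least one role (gateway included) of the given service."""
    return {
        host
        for host, roles in assignments.items()
        if any(svc == service for svc, _role_type in roles)
    }


# Hosts 1-9 run a MapReduce role; host 10 has no MapReduce role at all.
assignments = {f"host{i}": [("MapReduce", "TASKTRACKER")] for i in range(1, 10)}
assignments["host10"] = [("HDFS", "DATANODE")]

before_gateway = client_config_targets(assignments, "MapReduce")

# Assigning a gateway role marks host 10 as a client of the MapReduce service.
assignments["host10"].append(("MapReduce", "GATEWAY"))
after_gateway = client_config_targets(assignments, "MapReduce")
```

The gateway entry starts no process in this model; it only changes the set of hosts that the deploy action targets.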

Gateways can also be used to customize client configurations for some hosts. Gateways can be placed in role groups and those groups can be configured differently. However, unlike role instances, there is no way to override configurations for gateway instances.

In the cluster we discussed earlier, the three hosts (tcdn501-[2-5]) that do not have Hive role instances have Hive gateways:

Process Management

In a cluster not managed by Cloudera Manager, you most likely start a role instance process using an init script, for example, service hadoop-hdfs-datanode start. Cloudera Manager does not use init scripts for the daemons it manages; in a Cloudera Manager managed cluster, starting and stopping services using init scripts will not work.

In a Cloudera Manager managed cluster, you start or stop role instance processes using Cloudera Manager.

Cloudera Manager uses an open source process management tool called supervisord, which starts processes, takes care of redirecting log files, notifies of process failure, sets the effective user ID of the calling process to the right user, and so on.

Cloudera Manager supports automatically restarting a crashed process. It will also flag a role instance with a bad health flag if its process crashes shortly after being restarted.

Stopping the Cloudera Manager Server and the Cloudera Manager Agents will not bring down your services; any running role instances keep running.

The Agent is started by init.d at start-up. It, in turn, contacts the Cloudera Manager Server and determines which processes should be running. The Agent is monitored as part of Cloudera Manager's host monitoring: if the Agent stops heartbeating, the host is marked as having bad health.

One of the Agent's main responsibilities is to start and stop processes. When the Agent detects a new process from the Server heartbeat, the Agent creates a directory for it in /var/run/cloudera-scm-agent and unpacks the configuration. It then contacts supervisord, which starts the process.

These actions reinforce an important point: a Cloudera Manager process never travels alone. In other words, a process is more than just the arguments to exec(); it also includes configuration files, directories that need to be created, and other information.
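To make the "never travels alone" idea concrete, here is a minimal sketch of preparing a private per-process directory before a process is launched. It writes into a temporary directory rather than /var/run/cloudera-scm-agent, and the function name, arguments, and file contents are hypothetical; the real Agent's behavior is not shown in this article.

```python
import tempfile
from pathlib import Path


def prepare_process_dir(base: Path, process_id: int, service: str,
                        role: str, files: dict) -> Path:
    """Create <base>/<id>-<service>-<ROLE>/ with its config files and a logs/ folder."""
    proc_dir = base / f"{process_id}-{service}-{role}"
    (proc_dir / "logs").mkdir(parents=True)
    for name, content in files.items():
        (proc_dir / name).write_text(content)
    return proc_dir


with tempfile.TemporaryDirectory() as tmp:
    proc_dir = prepare_process_dir(
        Path(tmp), 879, "hdfs", "NAMENODE",
        {"hdfs-site.xml": "<configuration/>", "core-site.xml": "<configuration/>"},
    )
    created = sorted(p.name for p in proc_dir.iterdir())
```

Only after such a directory exists would a supervisor be asked to start the process with that directory as its configuration environment.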

Software Distribution Management

A major function of Cloudera Manager is to install CDH and managed service software.

Cloudera Manager installs software for new deployments and to upgrade existing deployments.

Cloudera Manager supports two software distribution formats: packages and parcels.

A package is a binary distribution format that contains compiled code and meta-information such as a package description, version, and dependencies.

Package management systems evaluate this meta-information to allow package searches,

perform upgrades to a newer version, and ensure that all dependencies of a package are fulfilled.

Cloudera Manager uses the native system package manager for each supported OS.

...

Host Management

Cloudera Manager provides several features to manage the hosts in your Hadoop clusters.

The first time you run the Cloudera Manager Admin Console you can search for hosts to add to the cluster and, once the hosts are selected, you can map the assignment of CDH roles to hosts.

Cloudera Manager automatically deploys all software required to participate as a managed host in a cluster: JDK, Cloudera Manager Agent, CDH, Impala, Solr, and so on to the hosts.

Once the services are deployed and running, the Hosts area within the Admin Console shows the overall status of the managed hosts in your cluster. The information provided includes the version of CDH running on the host, the cluster to which the host belongs, and the number of roles running on the host.

Cloudera Manager provides operations to manage the life cycle of the participating hosts and to add and delete hosts.

The Cloudera Management Service Host Monitor role performs health tests and collects host metrics to allow you to monitor the health and performance of the hosts.

Resource Management

Resource management helps ensure predictable behavior by defining the impact of different services on cluster resources.

The goals of resource management are to:

Guarantee completion in a reasonable time frame for critical workloads.
Support reasonable cluster scheduling between groups of users based on fair allocation of resources per group.
Prevent users from depriving other users of access to the cluster.

With Cloudera Manager 5, statically allocating resources using cgroups is configurable through a single static service pool wizard.

You allocate services as a percentage of total resources, and the wizard configures the cgroups.

Static service pools isolate the services in your cluster from one another, so that load on one service has a bounded impact on other services. Services are allocated a static percentage of total resources (CPU, memory, and I/O weight) which are not shared with other services. When you configure static service pools, Cloudera Manager computes recommended memory, CPU, and I/O configurations for the worker roles of the services that correspond to the percentage assigned to each service.

Static service pools are implemented per role group within a cluster, using Linux control groups (cgroups) and cooperative memory limits (for example, Java maximum heap sizes).

Static service pools can be used to control access to resources by HBase, HDFS, Impala, MapReduce, Solr, Spark, YARN, and add-on services.

Static service pools are not enabled by default.

For example, the following figure illustrates static pools for HBase, HDFS, Impala, and YARN services that are respectively assigned 20%, 30%, 20%, and 30% of cluster resources.

You can dynamically apportion resources that are statically allocated to YARN and Impala by using dynamic resource pools.

Depending on the version of CDH you are using, dynamic resource pools in Cloudera Manager support the following scenarios:

YARN (CDH 5) - YARN manages the virtual cores, memory, running applications, maximum resources for undeclared children (for parent pools), and scheduling policy for each pool.
In the preceding diagram, three dynamic resource pools (Dev, Product, and Mktg, with weights 3, 2, and 1 respectively) are defined for YARN.
If an application starts and is assigned to the Product pool, and other applications are using the Dev and Mktg pools,
the Product resource pool receives 30% x 2/6 (or 10%) of the total cluster resources.
If no applications are using the Dev and Mktg pools, the YARN Product pool is allocated 30% of the cluster resources.
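The arithmetic in this example can be checked with a few lines of code. The sketch assumes that a pool's share is the service's static percentage multiplied by the pool's weight over the sum of the weights of the pools currently in use, which is how the text computes 30% x 2/6; the function name is hypothetical.

```python
from fractions import Fraction


def pool_share(service_pct: int, weights: dict, active_pools: list, pool: str) -> Fraction:
    """Percentage of total cluster resources a dynamic pool receives."""
    total_active_weight = sum(weights[p] for p in active_pools)
    return Fraction(service_pct) * Fraction(weights[pool], total_active_weight)


weights = {"Dev": 3, "Product": 2, "Mktg": 1}
yarn_static_pct = 30  # YARN's static service pool share from the example

all_busy = pool_share(yarn_static_pct, weights, ["Dev", "Product", "Mktg"], "Product")
only_product = pool_share(yarn_static_pct, weights, ["Product"], "Product")
```

With all three pools in use the Product pool gets 10% of the cluster, and with only Product in use it gets the full 30% allocated to YARN, matching the figures above.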
