Hadoop (2.7.3) Secure Mode: The Official Hadoop Kerberos Configuration Guide Explained


Original document: http://hadoop.apache.org/docs/r2.7.3/hadoop-project-dist/hadoop-common/SecureMode.html

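Introduction

This document describes how to configure authentication for Hadoop in secure mode. When Hadoop is configured to run in secure mode, every Hadoop service and every user must be authenticated by Kerberos. Forward and reverse host lookups for all service hosts must be configured correctly so that services can authenticate with each other. Host lookups may be configured using either DNS or /etc/hosts files. A working knowledge of Kerberos and DNS is recommended before attempting to configure Hadoop in secure mode.

My blog has a separate, detailed introduction to Kerberos itself.

The security features of Hadoop consist of Authentication, Service Level Authorization, Authentication for Web Consoles, and Data Confidentiality.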

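Authentication

End User Accounts

When service level authentication is turned on, end users must authenticate themselves before interacting with Hadoop services. The simplest way is to authenticate interactively using the Kerberos kinit command. Programmatic authentication using Kerberos keytab files may be used when interactive login with kinit is not feasible.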
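Keytab-based login is often scripted around kinit. The sketch below is one way to do that from Python. It assumes an MIT Kerberos style kinit, where -k requests a keytab login and -t names the keytab file. The keytab path and principal are hypothetical placeholders, not values from this guide.

```python
import subprocess


def kinit_command(keytab_path, principal):
    """Build a kinit invocation that authenticates from a keytab file.

    Assumes MIT Kerberos kinit: -k requests keytab-based login and
    -t names the keytab file that holds the key.
    """
    return ["kinit", "-k", "-t", keytab_path, principal]


def login_from_keytab(keytab_path, principal):
    """Run kinit; raises CalledProcessError if authentication fails."""
    subprocess.run(kinit_command(keytab_path, principal), check=True)


if __name__ == "__main__":
    # Hypothetical keytab and principal, shown for illustration only.
    print(" ".join(kinit_command("/etc/security/keytab/app.keytab",
                                 "app@EXAMPLE.COM")))
```

Building the argument list separately from running it keeps the command easy to log and inspect before any credentials are touched.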

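User Accounts for Hadoop Daemons

Ensure that the HDFS and YARN daemons run as different Unix users, e.g. hdfs and yarn. Also ensure that the MapReduce JobHistory server runs as a different user, such as mapred.

It is recommended that they share one Unix group, e.g. hadoop. See "Mapping from user to group" for group management.

User:Group | Daemons
hdfs:hadoop | NameNode, Secondary NameNode, JournalNode, DataNode
yarn:hadoop | ResourceManager, NodeManager
mapred:hadoop | MapReduce JobHistory Server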

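Kerberos Principals for Hadoop Daemons

Each Hadoop service instance must be configured with its Kerberos principal and the location of its keytab file.

The general format of a service principal is ServiceName/_HOST@REALM.TLD, e.g. dn/_HOST@EXAMPLE.COM.

Hadoop simplifies the deployment of configuration files by allowing the hostname component of the service principal to be specified as the _HOST wildcard. Each service instance substitutes _HOST with its own fully qualified hostname at runtime. This allows administrators to deploy the same set of configuration files on all nodes. The keytab files, however, will be different on each node.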
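The _HOST substitution described above can be illustrated with a few lines of Python. This is a sketch of the idea only, not Hadoop's own code. The function name expand_host is hypothetical, and lowercasing the hostname is an assumption of the sketch.

```python
import socket


def expand_host(principal, hostname=None):
    """Replace the _HOST placeholder in a service principal.

    hostname defaults to this machine's fully qualified name.
    Lowercasing it is an assumption of this sketch: principals are
    case-sensitive and hostnames are conventionally lowercase.
    """
    service, slash, rest = principal.partition("/")
    if not slash:
        return principal  # no host component, e.g. "alice@EXAMPLE.COM"
    host, at, realm = rest.partition("@")
    if host != "_HOST":
        return principal  # an explicit hostname was configured
    fqdn = (hostname or socket.getfqdn()).lower()
    return f"{service}/{fqdn}{at}{realm}"
```

For example, expand_host("dn/_HOST@EXAMPLE.COM", "node1.example.com") yields dn/node1.example.com@EXAMPLE.COM, which is why the same configuration file works on every node while each node needs its own keytab.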

HDFS

The NameNode keytab file, on each NameNode host, should look like the following:

$ klist -e -k -t /etc/security/keytab/nn.service.keytab
Keytab name: FILE:/etc/security/keytab/nn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The Secondary NameNode keytab file, on that host, should look like the following:

$ klist -e -k -t /etc/security/keytab/sn.service.keytab
Keytab name: FILE:/etc/security/keytab/sn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The DataNode keytab file, on each host, should look like the following:

$ klist -e -k -t /etc/security/keytab/dn.service.keytab
Keytab name: FILE:/etc/security/keytab/dn.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

YARN

The ResourceManager keytab file, on the ResourceManager host, should look like the following:

$ klist -e -k -t /etc/security/keytab/rm.service.keytab
Keytab name: FILE:/etc/security/keytab/rm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

The NodeManager keytab file, on each host, should look like the following:

$ klist -e -k -t /etc/security/keytab/nm.service.keytab
Keytab name: FILE:/etc/security/keytab/nm.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)

MapReduce JobHistory Server

The MapReduce JobHistory Server keytab file, on that host, should look like the following:

$ klist -e -k -t /etc/security/keytab/jhs.service.keytab
Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
KVNO Timestamp         Principal
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
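Checking many keytabs by eye is error-prone, so a small script can confirm that each one holds the expected service principals. The sketch below parses klist -e -k -t output under the assumption that entry lines have the layout shown in the listings above (KVNO, date, time, principal, encryption type in parentheses). Other klist versions or locales may print a different layout. The function names are hypothetical.

```python
import re

# One entry line of `klist -e -k -t` output, for example:
#    4 07/18/11 21:08:09 nn/host@REALM.TLD (AES-256 CTS mode ...)
ENTRY = re.compile(
    r"^\s*(?P<kvno>\d+)\s+(?P<date>\S+)\s+(?P<time>\S+)\s+"
    r"(?P<principal>\S+)\s+\((?P<enctype>.+)\)\s*$"
)


def keytab_principals(klist_output):
    """Return the set of principals found in the klist output."""
    found = set()
    for line in klist_output.splitlines():
        match = ENTRY.match(line)
        if match:
            found.add(match.group("principal"))
    return found


def missing_services(klist_output, services):
    """List service names (e.g. "nn", "host") with no key in the keytab."""
    present = {p.split("/", 1)[0] for p in keytab_principals(klist_output)}
    return [s for s in services if s not in present]
```

Given the JobHistory Server listing above, missing_services(output, ["jhs", "host"]) would return an empty list.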

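Mapping from Kerberos principals to OS user accounts

Hadoop maps Kerberos principals to OS (system) user accounts using the rules specified by hadoop.security.auth_to_local. These rules work in the same way as auth_to_local in the Kerberos configuration file (krb5.conf). In addition, the Hadoop auth_to_local mapping supports the /L flag, which lowercases the returned name.

By default, the first component of the principal name is used as the system user name if the realm matches the default_realm (usually defined in /etc/krb5.conf). For example, the default rule maps the principal host/full.qualified.domain.name@REALM.TLD to the system user host. The default rule is likely to be inappropriate for most clusters.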

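In a typical cluster, the HDFS and YARN services are launched as the system users hdfs and yarn respectively. hadoop.security.auth_to_local can be configured as follows. The format $1/$2@$0 rebuilds each two-component principal as service/host@REALM, which is the string that the regular expression in parentheses is matched against. A format of $1@$0 would drop the slash and these expressions would never match.

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1/$2@$0](nn/.*@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1/$2@$0](jn/.*@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1/$2@$0](dn/.*@.*REALM.TLD)s/.*/hdfs/
    RULE:[2:$1/$2@$0](nm/.*@.*REALM.TLD)s/.*/yarn/
    RULE:[2:$1/$2@$0](rm/.*@.*REALM.TLD)s/.*/yarn/
    RULE:[2:$1/$2@$0](jhs/.*@.*REALM.TLD)s/.*/mapred/
    DEFAULT
  </value>
</property>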

Custom rules can be tested using the hadoop kerbname command. This command lets you specify a principal and applies Hadoop's current auth_to_local rule set to it.
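To make the rule evaluation concrete, here is a simplified Python model of how such rules are applied. It is a sketch for reasoning about rules offline, not Hadoop's implementation. The rules are written as tuples instead of being parsed from RULE strings. The format $1/$2@$0 is used so that the string handed to the regular expression contains the slash the expression expects. With $1@$0 the string would be nn@REALM.TLD, which cannot match nn/.*@.*REALM.TLD. The function name map_principal is hypothetical.

```python
import re

# Each rule: (component count, format, match regex, sed pattern, replacement).
# Models rules such as RULE:[2:$1/$2@$0](nn/.*@.*REALM.TLD)s/.*/hdfs/
RULES = [
    (2, "$1/$2@$0", r"nn/.*@.*REALM.TLD", r".*", "hdfs"),
    (2, "$1/$2@$0", r"jn/.*@.*REALM.TLD", r".*", "hdfs"),
    (2, "$1/$2@$0", r"dn/.*@.*REALM.TLD", r".*", "hdfs"),
    (2, "$1/$2@$0", r"nm/.*@.*REALM.TLD", r".*", "yarn"),
    (2, "$1/$2@$0", r"rm/.*@.*REALM.TLD", r".*", "yarn"),
    (2, "$1/$2@$0", r"jhs/.*@.*REALM.TLD", r".*", "mapred"),
]


def map_principal(principal, rules=RULES, default_realm="REALM.TLD"):
    """Map a Kerberos principal to a short OS user name.

    Returns None when no rule applies (this sketch does not raise).
    """
    name, _, realm = principal.partition("@")
    components = name.split("/")
    params = [realm] + components  # $0 is the realm, $1.. the components
    for count, fmt, regex, sed_from, sed_to in rules:
        if len(components) != count:
            continue
        base = re.sub(r"\$(\d)", lambda m: params[int(m.group(1))], fmt)
        if re.fullmatch(regex, base):
            # s/pattern/replacement/ without the g flag: first match only.
            return re.sub(sed_from, sed_to, base, count=1)
    # DEFAULT: first component, but only for the default realm.
    if realm == default_realm:
        return components[0]
    return None
```

With these rules, nn/host1.example.com@REALM.TLD maps to hdfs, while host/full.qualified.domain.name@REALM.TLD falls through to DEFAULT and maps to host.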

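Mapping from user to group

The mechanism for mapping system users to system groups can be configured via hadoop.security.group.mapping. See the HDFS Permissions Guide for details.

In practice, you need to manage an SSO (single sign-on) environment using Kerberos with LDAP for Hadoop in secure mode.

Proxy user

Some products, such as Apache Oozie, access Hadoop services on behalf of end users and need to be able to impersonate them. See the doc of proxy user for details.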

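Secure DataNode

Because the DataNode data transfer protocol does not use the Hadoop RPC framework, DataNodes must authenticate themselves by using the privileged ports specified by dfs.datanode.address and dfs.datanode.http.address. This authentication rests on the assumption that an attacker cannot obtain root privileges on DataNode hosts.

When you execute the hdfs datanode command as root, the server process first binds the privileged ports, then drops privileges and runs as the user account specified by HADOOP_SECURE_DN_USER. This startup process uses the jsvc program installed in JSVC_HOME. You must specify HADOOP_SECURE_DN_USER and JSVC_HOME as environment variables at startup (in hadoop-env.sh).

As of version 2.6.0, SASL can be used to authenticate the data transfer protocol. In this configuration, secured clusters no longer need to start the DataNode as root using jsvc and bind to privileged ports. To enable SASL on the data transfer protocol, set dfs.data.transfer.protection in hdfs-site.xml, set a non-privileged port for dfs.datanode.address, set dfs.http.policy to HTTPS_ONLY, and make sure the HADOOP_SECURE_DN_USER environment variable is not defined. Note that it is not possible to use SASL on the data transfer protocol if dfs.datanode.address is set to a privileged port. This is required for backwards-compatibility reasons.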
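The four conditions for SASL on the data transfer protocol are easy to get partly wrong, so a quick pre-flight check can help. The sketch below encodes only the conditions stated above. The function name and the idea of passing the settings as plain dictionaries are assumptions of the example, and port 10019 in the usage note is a hypothetical non-privileged port.

```python
def sasl_config_problems(hdfs_site, env):
    """Return a list of problems that would prevent SASL data transfer.

    hdfs_site: dict of hdfs-site.xml property names to string values.
    env: dict of environment variables seen by the DataNode process.
    """
    problems = []
    if not hdfs_site.get("dfs.data.transfer.protection"):
        problems.append("dfs.data.transfer.protection is not set")
    address = hdfs_site.get("dfs.datanode.address", "")
    port_text = address.rpartition(":")[2]
    if not port_text.isdigit():
        problems.append("dfs.datanode.address has no numeric port")
    elif 0 < int(port_text) < 1024:
        # Ports below 1024 are privileged on Unix systems.
        problems.append("dfs.datanode.address uses a privileged port")
    if hdfs_site.get("dfs.http.policy") != "HTTPS_ONLY":
        problems.append("dfs.http.policy is not HTTPS_ONLY")
    if env.get("HADOOP_SECURE_DN_USER"):
        problems.append("HADOOP_SECURE_DN_USER is defined")
    return problems
```

A configuration with dfs.data.transfer.protection set to integrity, dfs.datanode.address set to 0.0.0.0:10019 and dfs.http.policy set to HTTPS_ONLY returns an empty list, as long as HADOOP_SECURE_DN_USER is absent from the environment.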

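To migrate an existing cluster from root authentication to SASL, first ensure that version 2.6.0 or later has been deployed to all cluster nodes and to every external application that needs to connect to the cluster. Only HDFS clients of version 2.6.0 or later can connect to a DataNode that uses SASL to authenticate the data transfer protocol, so it is vital that all callers have the correct version before migrating. After version 2.6.0 or later has been deployed everywhere, update the configuration of all external applications to enable SASL. An HDFS client that is enabled for SASL can connect successfully to a DataNode regardless of whether that DataNode runs with root authentication or SASL authentication. Changing the configuration of all clients first guarantees that later configuration changes on the DataNodes will not disrupt the applications. Finally, each individual DataNode can be migrated by changing its configuration and restarting it.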

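Data Confidentiality

Data Encryption on RPC

The data transferred between Hadoop services and clients can be encrypted. Setting hadoop.rpc.protection to privacy in core-site.xml activates data encryption.

Data Encryption on Block Data Transfer

You need to set dfs.encrypt.data.transfer to true in hdfs-site.xml to activate data encryption for the DataNode data transfer protocol.

Optionally, you may set dfs.encrypt.data.transfer.algorithm to either 3des or rc4 to choose a specific encryption algorithm. If it is unspecified, the configured JCE default on the system is used, which is usually 3DES.

Setting dfs.encrypt.data.transfer.cipher.suites to AES/CTR/NoPadding activates AES encryption. By default this is unspecified, so AES is not used. When AES is used, the algorithm specified in dfs.encrypt.data.transfer.algorithm is still used during the initial key exchange. The AES key bit length can be configured by setting dfs.encrypt.data.transfer.cipher.key.bitlength to 128, 192 or 256. The default is 128.

AES offers the greatest cryptographic strength and the best performance. At this time, 3DES and RC4 have been used more often in Hadoop clusters.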
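How the block-transfer encryption settings interact can be summarised in a few lines of Python. The sketch below models only the rules stated in this section. The function name is hypothetical, and the returned labels are descriptive strings chosen for the example, not values that Hadoop itself reports.

```python
VALID_ALGORITHMS = {"3des", "rc4"}
VALID_KEY_BITS = {128, 192, 256}


def transfer_encryption_summary(hdfs_site):
    """Describe the block-transfer encryption a config dict would select."""
    if hdfs_site.get("dfs.encrypt.data.transfer", "false") != "true":
        return "disabled"
    algorithm = hdfs_site.get("dfs.encrypt.data.transfer.algorithm")
    if algorithm is not None and algorithm not in VALID_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    suites = hdfs_site.get("dfs.encrypt.data.transfer.cipher.suites")
    if suites == "AES/CTR/NoPadding":
        # AES is used for the data; the key length defaults to 128 bits.
        bits = int(hdfs_site.get(
            "dfs.encrypt.data.transfer.cipher.key.bitlength", "128"))
        if bits not in VALID_KEY_BITS:
            raise ValueError(f"unsupported AES key length: {bits}")
        return f"AES-{bits}"
    # No cipher suite configured: the chosen algorithm or the JCE default.
    return algorithm or "JCE default"
```

The sketch makes the precedence explicit: the cipher suite setting decides whether AES is used, while the algorithm setting only matters for the data itself when no cipher suite is configured.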

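Data Encryption on HTTP

Data transfer between the web console and clients is protected by SSL (HTTPS). SSL configuration is recommended but is not required in order to configure Hadoop security with Kerberos.

Configuration

Permissions for both HDFS and local filesystem paths

The following table lists various paths on HDFS and on the local filesystems (on all nodes) together with the recommended permissions: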

Filesystem | Path | User:Group | Permissions
local | dfs.namenode.name.dir | hdfs:hadoop | drwx------
local | dfs.datanode.data.dir | hdfs:hadoop | drwx------
local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x
local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x
local | container-executor | root:hadoop | --Sr-s--*
local | conf/container-executor.cfg | root:hadoop | r-------*
hdfs | / | hdfs:hadoop | drwxr-xr-x
hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt
hdfs | /user | hdfs:hadoop | drwxr-xr-x
hdfs | yarn.nodemanager.remote-app-log-dir | yarn:hadoop | drwxrwxrwxt
hdfs | mapreduce.jobhistory.intermediate-done-dir | mapred:hadoop | drwxrwxrwxt
hdfs | mapreduce.jobhistory.done-dir | mapred:hadoop | drwxr-x---

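Common Configurations

To turn on RPC authentication in Hadoop, set the value of the hadoop.security.authentication property to "kerberos", and set the security-related settings listed below appropriately.

The following properties should be in the core-site.xml of every node in the cluster.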

Parameter | Value | Notes
hadoop.security.authentication | kerberos | simple : No authentication. (default) kerberos : Enable authentication by Kerberos.
hadoop.security.authorization | true | Enable RPC service-level authorization.
hadoop.rpc.protection | authentication | authentication : authentication only (default); integrity : integrity check in addition to authentication; privacy : data encryption in addition to integrity
hadoop.security.auth_to_local | RULE:exp1 RULE:exp2 … DEFAULT | The value is a string containing new line characters. See Kerberos documentation for the format of exp.
hadoop.proxyuser.superuser.hosts |  | Comma-separated hosts from which superuser is allowed to impersonate. * means wildcard.
hadoop.proxyuser.superuser.groups |  | Comma-separated groups to which users impersonated by superuser belong. * means wildcard.
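Since the same core-site.xml has to reach every node, it can be convenient to generate it from one source of truth. The sketch below renders a dictionary of properties into the configuration/property/name/value layout that Hadoop site files use. The helper name build_site_xml is hypothetical and the three properties are taken from the table above.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Render a dict of property names to values as Hadoop site XML."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


core_site = build_site_xml({
    "hadoop.security.authentication": "kerberos",
    "hadoop.security.authorization": "true",
    "hadoop.rpc.protection": "authentication",
})
print(core_site)
```

The output is a fragment to merge into an existing core-site.xml, not a complete secure configuration on its own.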

NameNode

Parameter | Value | Notes
dfs.block.access.token.enable | true | Enable HDFS block access tokens for secure operations.
dfs.namenode.kerberos.principal | nn/_HOST@REALM.TLD | Kerberos principal name for the NameNode.
dfs.namenode.keytab.file | /etc/security/keytab/nn.service.keytab | Kerberos keytab file for the NameNode.
dfs.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the NameNode for web UI SPNEGO authentication. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal} i.e use the value of dfs.web.authentication.kerberos.principal.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/spnego.service.keytab | SPNEGO keytab file for the NameNode. In HA clusters this setting is shared with the Journal Nodes.

The following settings allow configuring SSL access to the NameNode web UI (optional).

Parameter | Value | Notes
dfs.http.policy | HTTP_ONLY or HTTPS_ONLY or HTTP_AND_HTTPS | HTTPS_ONLY turns off http access. This option takes precedence over the deprecated configuration dfs.https.enable and hadoop.ssl.enabled. If using SASL to authenticate data transfer protocol instead of running DataNode as root and using privileged ports, then this property must be set to HTTPS_ONLY to guarantee authentication of HTTP servers. (See dfs.data.transfer.protection.)
dfs.namenode.https-address | nn_host_fqdn:50470 |
dfs.https.port | 50470 |
dfs.https.enable | true | This value is deprecated. Use dfs.http.policy

Secondary NameNode

Parameter | Value | Notes
dfs.namenode.secondary.http-address | snn_host_fqdn:50090 |
dfs.secondary.namenode.keytab.file | /etc/security/keytab/sn.service.keytab | Kerberos keytab file for the Secondary NameNode.
dfs.secondary.namenode.kerberos.principal | sn/_HOST@REALM.TLD | Kerberos principal name for the Secondary NameNode.
dfs.secondary.namenode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the Secondary NameNode for web UI SPNEGO authentication. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal} i.e use the value of dfs.web.authentication.kerberos.principal.
dfs.namenode.secondary.https-port | 50470 |

JournalNode

Parameter | Value | Notes
dfs.journalnode.kerberos.principal | jn/_HOST@REALM.TLD | Kerberos principal name for the JournalNode.
dfs.journalnode.keytab.file | /etc/security/keytab/jn.service.keytab | Kerberos keytab file for the JournalNode.
dfs.journalnode.kerberos.internal.spnego.principal | HTTP/_HOST@REALM.TLD | The server principal used by the JournalNode for web UI SPNEGO authentication when Kerberos security is enabled. The SPNEGO server principal begins with the prefix HTTP/ by convention. If the value is '*', the web server will attempt to login with every principal specified in the keytab file dfs.web.authentication.kerberos.keytab. For most deployments this can be set to ${dfs.web.authentication.kerberos.principal} i.e use the value of dfs.web.authentication.kerberos.principal.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/spnego.service.keytab | SPNEGO keytab file for the JournalNode. In HA clusters this setting is shared with the Name Nodes.

DataNode

Parameter | Value | Notes
dfs.datanode.data.dir.perm | 700 |
dfs.datanode.address | 0.0.0.0:1004 | Secure DataNode must use privileged port in order to assure that the server was started securely. This means that the server must be started via jsvc. Alternatively, this must be set to a non-privileged port if using SASL to authenticate data transfer protocol. (See dfs.data.transfer.protection.)
dfs.datanode.http.address | 0.0.0.0:1006 | Secure DataNode must use privileged port in order to assure that the server was started securely. This means that the server must be started via jsvc.
dfs.datanode.https.address | 0.0.0.0:50470 |
dfs.datanode.kerberos.principal | dn/_HOST@REALM.TLD | Kerberos principal name for the DataNode.
dfs.datanode.keytab.file | /etc/security/keytab/dn.service.keytab | Kerberos keytab file for the DataNode.
dfs.encrypt.data.transfer | false | set to true when using data encryption
dfs.encrypt.data.transfer.algorithm |  | optionally set to 3des or rc4 when using data encryption to control encryption algorithm
dfs.encrypt.data.transfer.cipher.suites |  | optionally set to AES/CTR/NoPadding to activate AES encryption when using data encryption
dfs.encrypt.data.transfer.cipher.key.bitlength |  | optionally set to 128, 192 or 256 to control key bit length when using AES with data encryption
dfs.data.transfer.protection |  | authentication : authentication only; integrity : integrity check in addition to authentication; privacy : data encryption in addition to integrity. This property is unspecified by default. Setting this property enables SASL for authentication of data transfer protocol. If this is enabled, then dfs.datanode.address must use a non-privileged port, dfs.http.policy must be set to HTTPS_ONLY and the HADOOP_SECURE_DN_USER environment variable must be undefined when starting the DataNode process.

WebHDFS

Parameter | Value | Notes
dfs.web.authentication.kerberos.principal | http/_HOST@REALM.TLD | Kerberos principal name for the WebHDFS. In HA clusters this setting is commonly used by the JournalNodes for securing access to the JournalNode HTTP server with SPNEGO.
dfs.web.authentication.kerberos.keytab | /etc/security/keytab/http.service.keytab | Kerberos keytab file for WebHDFS. In HA clusters this setting is commonly used by the JournalNodes for securing access to the JournalNode HTTP server with SPNEGO.

ResourceManager

Parameter | Value | Notes
yarn.resourcemanager.principal | rm/_HOST@REALM.TLD | Kerberos principal name for the ResourceManager.
yarn.resourcemanager.keytab | /etc/security/keytab/rm.service.keytab | Kerberos keytab file for the ResourceManager.

NodeManager

Parameter | Value | Notes
yarn.nodemanager.principal | nm/_HOST@REALM.TLD | Kerberos principal name for the NodeManager.
yarn.nodemanager.keytab | /etc/security/keytab/nm.service.keytab | Kerberos keytab file for the NodeManager.
yarn.nodemanager.container-executor.class | org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor | Use LinuxContainerExecutor.
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager.
yarn.nodemanager.linux-container-executor.path | /path/to/bin/container-executor | The path to the executable of Linux container executor.

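Configuration for WebAppProxy

The WebAppProxy provides a proxy between the web applications exported by an application and an end user. If security is enabled, it warns users before they access a potentially unsafe web application. Authentication and authorization through the proxy are handled just like for any other privileged web application.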

Parameter | Value | Notes
yarn.web-proxy.address | WebAppProxy host:port for proxy to AM web apps. | host:port if this is the same as yarn.resourcemanager.webapp.address or it is not defined then the ResourceManager will run the proxy otherwise a standalone proxy server will need to be launched.
yarn.web-proxy.keytab | /etc/security/keytab/web-app.service.keytab | Kerberos keytab file for the WebAppProxy.
yarn.web-proxy.principal | wap/_HOST@REALM.TLD | Kerberos principal name for the WebAppProxy.

LinuxContainerExecutor

A ContainerExecutor used by the YARN framework defines how any container is launched and controlled.

The following are available in Hadoop YARN:

ContainerExecutor | Description
DefaultContainerExecutor | The default executor which YARN uses to manage container execution. The container process has the same Unix user as the NodeManager.
LinuxContainerExecutor | Supported only on GNU/Linux, this executor runs the containers as either the YARN user who submitted the application (when full security is enabled) or as a dedicated user (defaults to nobody) when full security is not enabled. When full security is enabled, this executor requires all user accounts to be created on the cluster nodes where the containers are launched. It uses a setuid executable that is included in the Hadoop distribution. The NodeManager uses this executable to launch and kill containers. The setuid executable switches to the user who has submitted the application and launches or kills the containers. For maximum security, this executor sets up restricted permissions and user/group ownership of local files and directories used by the containers such as the shared objects, jars, intermediate files, log files etc. Particularly note that, because of this, except the application owner and NodeManager, no other user can access any of the local files/directories including those localized as part of the distributed cache.

To build the LinuxContainerExecutor executable, run:

$ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/

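The executable must have specific permissions: 6050 or --Sr-s---, user-owned by root (super-user) and group-owned by a special group (e.g. hadoop) of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security is compromised. The name of this special group should be specified for the configuration property yarn.nodemanager.linux-container-executor.group in both conf/yarn-site.xml and conf/container-executor.cfg.

For example, suppose the NodeManager runs as user yarn, who is part of the groups users and hadoop, either of which may be the primary group. Suppose also that users has both yarn and another user, alice (an application submitter), as its members, and that alice does not belong to hadoop. Going by the description above, the setuid/setgid executable should be set to 6050 or --Sr-s--- with user-owner yarn and group-owner hadoop, which has yarn as its member (and not users, which has alice as a member besides yarn).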
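Octal modes such as 6050 are easier to check against the symbolic form with a short script. The sketch below uses Python's standard stat module. The helper name symbolic is hypothetical. Note that stat.filemode prepends a file-type character, so 6050 on a regular file is rendered with one extra leading "-" compared with the nine-character form used in the text.

```python
import stat


def symbolic(mode_bits, is_dir=False):
    """Render octal permission bits the way `ls -l` would show them."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode_bits)


print(symbolic(0o6050))              # ---Sr-s---  (container-executor)
print(symbolic(0o400))               # -r--------  (container-executor.cfg)
print(symbolic(0o755, is_dir=True))  # drwxr-xr-x  (NodeManager local dirs)
```

The capital S in the owner position means the setuid bit is set while the owner has no execute permission, and the lowercase s in the group position means setgid together with group execute.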

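The LinuxTaskController requires that the paths including and leading up to the directories specified in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs be set to 755 permissions, as described in the table of directory permissions above.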

  • conf/container-executor.cfg

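The executable requires a configuration file called container-executor.cfg to be present in the configuration directory passed to the mvn target mentioned above.

The configuration file must be owned by the user running the NodeManager (user yarn in the example above), may be group-owned by any group, and should have the permissions 0400 or r--------.

The executable requires the following configuration items to be present in the conf/container-executor.cfg file. The items should be specified as simple key=value pairs, one per line: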

Parameter | Value | Notes
yarn.nodemanager.linux-container-executor.group | hadoop | Unix group of the NodeManager. The group owner of the container-executor binary should be this group. Should be same as the value with which the NodeManager is configured. This configuration is required for validating the secure access of the container-executor binary.
banned.users | hdfs,yarn,mapred,bin | Banned users.
allowed.system.users | foo,bar | Allowed system users.
min.user.id | 1000 | Prevent other super-users.
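To show how these items fit together, the sketch below parses a container-executor.cfg built from the example values in the table and applies a simplified model of the user restrictions. The function names are hypothetical. The decision logic is an assumption about how the settings are meant to interact (banned users are rejected, and users below min.user.id are rejected unless listed in allowed.system.users). It is not the executable's actual code.

```python
SAMPLE_CFG = """\
yarn.nodemanager.linux-container-executor.group=hadoop
banned.users=hdfs,yarn,mapred,bin
allowed.system.users=foo,bar
min.user.id=1000
"""


def parse_cfg(text):
    """Parse simple key=value lines, one item per line."""
    items = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, _, value = line.partition("=")
        items[key.strip()] = value.strip()
    return items


def _names(cfg, key):
    """Split a comma-separated setting into a set of names."""
    return {n.strip() for n in cfg.get(key, "").split(",") if n.strip()}


def may_launch_as(user, uid, cfg):
    """Simplified model of whether containers may run as this user."""
    if user in _names(cfg, "banned.users"):
        return False
    min_uid = int(cfg.get("min.user.id", "0"))
    if uid < min_uid and user not in _names(cfg, "allowed.system.users"):
        return False
    return True
```

With the sample file, an ordinary user with uid 1001 is accepted, hdfs is rejected because it is banned, and a system account with uid 2 is rejected unless it appears in allowed.system.users.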

To recap, here are the local filesystem permissions required for the various paths related to the LinuxContainerExecutor:

Filesystem | Path | User:Group | Permissions
local | container-executor | root:hadoop | --Sr-s--*
local | conf/container-executor.cfg | root:hadoop | r-------*
local | yarn.nodemanager.local-dirs | yarn:hadoop | drwxr-xr-x
local | yarn.nodemanager.log-dirs | yarn:hadoop | drwxr-xr-x

MapReduce JobHistory Server

Parameter | Value | Notes
mapreduce.jobhistory.address | MapReduce JobHistory Server host:port | Default port is 10020.
mapreduce.jobhistory.keytab | /etc/security/keytab/jhs.service.keytab | Kerberos keytab file for the MapReduce JobHistory Server.
mapreduce.jobhistory.principal | jhs/_HOST@REALM.TLD | Kerberos principal name for the MapReduce JobHistory Server.

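Multihoming

Multihomed setups, where each host has multiple hostnames in DNS (e.g. different hostnames corresponding to public and private network interfaces), may require additional configuration to get Kerberos authentication working. See HDFS Support for Multihomed Networks.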


When reposting, please credit the source:
http://blog.csdn.net/m1213642578/article/details/52450639
