Integrating Kerberos with Hadoop


Kerberos installation

Install via yum; these packages make up the KDC:

yum install krb5-server krb5-libs krb5-workstation

KDC configuration and notes

  • /var/kerberos/krb5kdc/kdc.conf
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 HADOOP.COM = {
  # master_key_type = aes128-cts
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
  max_renewable_life = 7d
  supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
 }

Notes:

HADOOP.COM: the realm being defined. The name can be anything. Kerberos can support multiple realms, but that adds complexity and is not covered here. Realm names are case-sensitive and are conventionally written in upper case so they are easy to recognise. The realm name has no strong relationship with the hosts' names.

max_renewable_life = 7d: must be set if tickets are to be renewable.

master_key_type and supported_enctypes: both default to aes256-cts. Because Java needs additional policy jar files to use aes256-cts, it is recommended not to use it.

acl_file: declares the admin users' permissions; the file has to be created by hand.

admin_keytab: the keytab the KDC uses to authenticate kadmin requests; a sketch of how to create it follows these notes.

supported_enctypes: the supported encryption types. Note that aes256-cts has been removed from the list.
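
Creating the admin keytab referenced by admin_keytab is not shown elsewhere in this article; a commonly used sketch with kadmin.local (run it after the database has been initialised, see "Initialising the KDC" below; kadmin/admin and kadmin/changepw are the standard MIT admin principals):

kadmin.local -q "ktadd -k /var/kerberos/krb5kdc/kadm5.keytab kadmin/admin kadmin/changepw"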

  • /var/kerberos/krb5kdc/kadm5.acl
*/admin@HADOOP.COM      *

Notes:

The file format is
Kerberos_principal permissions [target_principal] [restrictions]
Wildcards are supported. The simplest form is the single line above:
*/admin@HADOOP.COM      *
It means that any principal matching */admin@HADOOP.COM is treated as an admin, and the permission field * grants all permissions.

  • /etc/krb5.conf
[logging]
 default = FILE:/var/log/krb5libs.log
 kdc = FILE:/var/log/krb5kdc.log
 admin_server = FILE:/var/log/kadmind.log

[libdefaults]
 dns_lookup_realm = false
 dns_lookup_kdc = false
 ticket_lifetime = 24h
 renew_lifetime = 7d
 forwardable = true
 rdns = false
 default_realm = HADOOP.COM
 default_ccache_name = KEYRING:persistent:%{uid}
 udp_preference_limit = 1

[realms]
 HADOOP.COM = {
  kdc = SZB-L0049038:88
  admin_server = SZB-L0049038:749
 }

[domain_realm]
 .hadoop.com = HADOOP.COM
 hadoop.com = HADOOP.COM

Notes:

[logging]: where the server-side logs are written.
[libdefaults]: default settings for every connection; the following are the important ones.
default_realm = HADOOP.COM: the default realm; it must match the realm configured above.
udp_preference_limit = 1: disables UDP, which avoids a known Hadoop problem.
kdc: the location of the KDC, in the form host:port.
admin_server: the location of the admin server, in the form host:port.
default_domain: the default domain name.

Initialising the KDC

Initialise the database by running the following command on hadoop1; -r specifies the realm.

kdb5_util create -r HADOOP.COM -s

If the command reports that the database already exists, delete the principal-related files under /var/kerberos/krb5kdc/ (see the sketch below). The default database name is principal; a different name can be chosen with -d (running multiple databases has not been tested here).
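
A minimal sketch of clearing an existing database and re-running the initialisation (assuming the default database name and location):

rm -f /var/kerberos/krb5kdc/principal*
kdb5_util create -r HADOOP.COM -s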

Starting Kerberos

The yum installation registers two services, krb5kdc and kadmin, with the operating system; the kadmin service depends on krb5kdc.
The test machines run Red Hat 7, so the services are started with:

systemctl start krb5kdc.service
systemctl start kadmin.service

Kerberos administration

Authentication
kinit admin/admin

Super administrator
Local login: kadmin.local
Remote login: kadmin admin/admin  # remote administration requires authenticating as the admin user first: kinit admin/admin

Show the current authenticated user: klist
Destroy the current credential cache: kdestroy
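
Note that the admin/admin principal used above has to exist before kinit can succeed; a minimal sketch of creating it on the KDC (the password is entered interactively):

kadmin.local -q "addprinc admin/admin"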

Integrating the Hadoop cluster with Kerberos

Keytab distribution

#!/bin/bash
for h in "namenode" "datanode1" "datanode2" "datanode3" "datanode4" "datanode5" "datanode6" ; do
  echo $h
  kadmin.local -q "delprinc -force hdfs/${h}@HADOOP.COM"
  kadmin.local -q "delprinc -force mapred/${h}@HADOOP.COM"
  kadmin.local -q "delprinc -force yarn/${h}@HADOOP.COM"
  kadmin.local -q "delprinc -force HTTP/${h}@HADOOP.COM"
  kadmin.local -q "addprinc -randkey hdfs/${h}@HADOOP.COM"
  kadmin.local -q "addprinc -randkey mapred/${h}@HADOOP.COM"
  kadmin.local -q "addprinc -randkey yarn/${h}@HADOOP.COM"
  kadmin.local -q "addprinc -randkey HTTP/${h}@HADOOP.COM"
  kadmin.local -q "xst -norandkey -k /root/keytabs/hdfs.keytab.${h} hdfs/${h}@HADOOP.COM HTTP/${h}@HADOOP.COM"
  kadmin.local -q "xst -norandkey -k /root/keytabs/mapred.keytab.${h} mapred/${h}@HADOOP.COM HTTP/${h}@HADOOP.COM"
  kadmin.local -q "xst -norandkey -k /root/keytabs/yarn.keytab.${h} yarn/${h}@HADOOP.COM HTTP/${h}@HADOOP.COM"
  ssh ${h} "rm -rf /etc/hadoop/conf/*.keytab"
  scp /root/keytabs/hdfs.keytab.${h}   ${h}:/etc/hadoop/conf/hdfs.keytab
  scp /root/keytabs/mapred.keytab.${h} ${h}:/etc/hadoop/conf/mapred.keytab
  scp /root/keytabs/yarn.keytab.${h}   ${h}:/etc/hadoop/conf/yarn.keytab
  ssh ${h} "sudo chown hdfs:hadoop /etc/hadoop/conf/hdfs.keytab"
  ssh ${h} "sudo chown mapred:hadoop /etc/hadoop/conf/mapred.keytab"
  ssh ${h} "sudo chown yarn:hadoop /etc/hadoop/conf/yarn.keytab"
  ssh ${h} "sudo chmod 400 /etc/hadoop/conf/*.keytab"
done

Verify the generated keytab on each Hadoop node:

klist -k -t -e yarn.keytab

-norandkey

Because HTTP/hostname@HADOOP.COM is exported into several keytabs, the -norandkey option has to be used when generating them; otherwise the Key Version Number (KVNO) keeps increasing and authentication fails. -norandkey is only available in kadmin.local and cannot be used remotely.

Do not randomize the keys. The keys and their version numbers stay unchanged. This option is only available in kadmin.local, and cannot be specified in combination with the -e option.
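
As a sanity check, the KVNO stored in a keytab can be compared with the one the KDC currently reports; a sketch (kvno needs a valid ticket, e.g. from kinit admin/admin):

klist -k -t -e /etc/hadoop/conf/hdfs.keytab
kvno HTTP/namenode@HADOOP.COM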

Enabling Kerberos for HDFS

Stop all Hadoop services first.

After modifying the configuration files, restart the HDFS services (a restart sketch appears at the end of this section).

core-site.xml

  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.yarn.groups</name>
    <value>*</value>
  </property>

hdfs-site.xml

  <!-- General HDFS security config -->
  <property>
    <name>dfs.block.access.token.enable</name>
    <value>true</value>
  </property>

  <!-- NameNode security config -->
  <property>
    <name>dfs.namenode.keytab.file</name>
    <value>/etc/hadoop/conf/hdfs.keytab</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>dfs.namenode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@HADOOP.COM</value>
  </property>

  <!-- Secondary NameNode security config -->
  <property>
    <name>dfs.secondary.namenode.keytab.file</name>
    <value>/etc/hadoop/conf/hdfs.keytab</value>
  </property>
  <property>
    <name>dfs.secondary.namenode.kerberos.principal</name>
    <value>hdfs/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@HADOOP.COM</value>
  </property>

  <!-- DataNode security config -->
  <property>
    <name>dfs.datanode.data.dir.perm</name>
    <value>700</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:1004</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:1006</value>
  </property>
  <property>
    <name>dfs.datanode.keytab.file</name>
    <value>/etc/hadoop/conf/hdfs.keytab</value>
  </property>
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>hdfs/_HOST@HADOOP.COM</value>
  </property>

  <!-- Web Authentication config -->
  <property>
    <name>dfs.web.authentication.kerberos.principal</name>
    <value>HTTP/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>dfs.journalnode.keytab.file</name>
    <value>/etc/hadoop/conf/hdfs.keytab</value>
  </property>
  <property>
    <name>dfs.journalnode.kerberos.principal</name>
    <value>hdfs/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>dfs.journalnode.kerberos.internal.spnego.principal</name>
    <value>HTTP/_HOST@HADOOP.COM</value>
  </property>

Note that the DataNode service ports have been changed to 1004 and 1006, i.e. privileged ports below 1024, which a secure DataNode requires when SASL data transfer protection is not used.

/etc/default/hadoop-hdfs-namenode
/etc/default/hadoop-hdfs-journalnode
/etc/default/hadoop-hdfs-zkfc
/etc/default/hadoop-hdfs-datanode

export HADOOP_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_LOG_DIR=/data/log/hadoop-hdfs
export HADOOP_NAMENODE_USER=hdfs
export HADOOP_SECONDARYNAMENODE_USER=hdfs
export HADOOP_DATANODE_USER=hdfs
export HADOOP_IDENT_STRING=hdfs
export HADOOP_PRIVILEGED_NFS_USER=hdfs
export HADOOP_PRIVILEGED_NFS_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_PRIVILEGED_NFS_LOG_DIR=/data/log/hadoop-hdfs
export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_SECURE_DN_LOG_DIR=/data/log/hadoop-hdfs
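
Once the configuration and environment files above are in place, restart the HDFS daemons; a sketch assuming the packaged init scripts that correspond to the /etc/default files listed above (run only the services actually deployed on each node):

service hadoop-hdfs-journalnode restart
service hadoop-hdfs-zkfc restart
service hadoop-hdfs-namenode restart
service hadoop-hdfs-datanode restart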

Enabling Kerberos for YARN

yarn-site.xml

  <!-- ResourceManager security configs -->
  <property>
    <name>yarn.resourcemanager.keytab</name>
    <value>/etc/hadoop/conf/yarn.keytab</value> <!-- path to the YARN keytab -->
  </property>
  <property>
    <name>yarn.resourcemanager.principal</name>
    <value>yarn/_HOST@HADOOP.COM</value>
  </property>

  <!-- NodeManager security configs -->
  <property>
    <name>yarn.nodemanager.keytab</name>
    <value>/etc/hadoop/conf/yarn.keytab</value> <!-- path to the YARN keytab -->
  </property>
  <property>
    <name>yarn.nodemanager.principal</name>
    <value>yarn/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
    <value>true</value>
  </property>

container-executor.cfg

yarn.nodemanager.local-dirs=/data/yarn/local
yarn.nodemanager.log-dirs=/data/yarn/logs
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin
min.user.id=300
allowed.system.users=hadoop,hive

This is the configuration file read by the LinuxContainerExecutor; its values should simply match the corresponding settings in yarn-site.xml. Pay attention to min.user.id: application users on Red Hat normally have UIDs above 500, but for some reason our application user (yarn) has a UID below 500, which caused errors, so the parameter is set to 300 for now. The UID can be checked with the sketch below.
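
A quick way to check the UID that will be compared against min.user.id (shown here for the yarn user; run it for whichever user actually submits the jobs):

id -u yarn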

/etc/default/hadoop-yarn-resourcemanager

export YARN_IDENT_STRING=yarn
export YARN_PID_DIR=/var/run/hadoop-yarn
export YARN_LOG_DIR=/data/log/hadoop-yarn
export YARN_CONF_DIR=/etc/hadoop/conf

Because I ran into problems setting up HTTPS, the HTTPS endpoints of YARN and of the MapReduce JobHistory server are disabled here: the following properties, which would switch them to HTTPS_ONLY, were left out.

yarn-site.xml

  <!-- To enable TLS/SSL -->
  <property>
    <name>yarn.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>

mapred-site.xml

  <property>
    <name>mapreduce.jobhistory.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>

Enabling Kerberos for Hive

kadmin.local -q "addprinc -randkey hive/namenode@HADOOP.COM"
kadmin.local -q "xst -norandkey -k /root/keytabs/hive.keytab hive/namenode@HADOOP.COM"
scp hive.keytab namenode:/etc/hive/conf/
chown hive:hive /etc/hive/conf/hive.keytab
chmod 400 /etc/hive/conf/hive.keytab
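
To confirm the keytab works, a quick check on the Hive node (the principal is the one exported above):

kinit -kt /etc/hive/conf/hive.keytab hive/namenode@HADOOP.COM
klist
kdestroy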

core-site.xml

  <property>
    <name>hadoop.proxyuser.hive.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hive.groups</name>
    <value>*</value>
  </property>

hive-site.xml

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.keytab.file</name>
    <value>/etc/hive/conf/hive.keytab</value>
  </property>
  <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/namenode@HADOOP.COM</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.principal</name>
    <value>hive/namenode@HADOOP.COM</value>
  </property>
  <property>
    <name>hive.server2.authentication.kerberos.keytab</name>
    <value>/etc/hive/conf/hive.keytab</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>true</value>
  </property>

Connecting to HiveServer2 with Beeline and via JDBC

Note: both Spark and Hive ship a beeline; make sure you know which one you are using.

/usr/lib/hive/bin/beeline -u 'jdbc:hive2://10.20.9.35:10000/default;principal=hive/namenode@HADOOP.COM' -n hadoop -p hadoop
!connect jdbc:hive2://10.20.9.35:10000/default;principal=hive/namenode@HADOOP.COM hadoop hadoop

Installing and configuring the Sentry service

Installing the Sentry server

yum install sentry* -y

The following three components are installed:

sentry: the base Sentry package
sentry-hdfs-plugin: the HDFS plugin
sentry-store: the Sentry store component

Generating the Kerberos principal and keytab

kadmin.local -q "addprinc -randkey sentry/namenode@HADOOP.COM"
kadmin.local -q "xst -norandkey -k /root/keytabs/sentry.keytab sentry/namenode@HADOOP.COM"
scp /root/keytabs/sentry.keytab namenode:/etc/sentry/conf/sentry.keytab
ssh namenode "sudo chown sentry:sentry /etc/sentry/conf/sentry.keytab"

Create the MySQL database sentry (a sketch follows below) and link the MySQL JDBC driver into the Sentry lib directory:

ln -s /usr/lib/hive/lib/mysql-connector-java-5.1.37.jar /usr/lib/sentry/lib/mysql-connector-java-5.1.37.jar
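
Creating the backing database is not spelled out in the original steps; a minimal sketch (the sentry account name and password are placeholders, adjust them and the allowed host to your environment):

mysql -u root -p <<'SQL'
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8;
GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry_password';
FLUSH PRIVILEGES;
SQL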

Configure sentry-site.xml (a full example appears in the hdfs-sentry Plugin section below).

Initialise the Sentry store schema:

sentry --command schema-tool --conffile /etc/sentry/conf/sentry-site.xml --dbType mysql --initSchema

For an upgrade instead:

sentry --command schema-tool --conffile /etc/sentry/conf/sentry-site.xml --dbType mysql --upgradeSchema

Start the service:

service sentry-store start

Monitoring page:
http://namenode:51000/


Integrating Sentry with Hive

Configure hive-site.xml

Configure sentry-site.xml

Restart hive-server2

Sentry permission configuration

  • Log in as the hive user, authenticate to Kerberos and grant privileges through Beeline (a sketch follows).
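
A minimal sketch of authenticating as hive and creating a role through Beeline (the role and group names are placeholders; the grant/revoke syntax matches the examples in the next section):

kinit -kt /etc/hive/conf/hive.keytab hive/namenode@HADOOP.COM
/usr/lib/hive/bin/beeline -u 'jdbc:hive2://10.20.9.35:10000/default;principal=hive/namenode@HADOOP.COM'
# then, inside beeline:
#   create role fact_role;
#   grant role fact_role to group fact_group;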


hdfs-sentry Plugin

hdfs-site.xml

  <property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.namenode.authorization.provider.class</name>
    <value>org.apache.sentry.hdfs.SentryAuthorizationProvider</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>true</value>
  </property>
  <!-- Comma-separated list of HDFS path prefixes where Sentry permissions should be enforced. -->
  <!-- Privilege synchronization will occur only for tables located in HDFS regions specified here. -->
  <property>
    <name>sentry.authorization-provider.hdfs-path-prefixes</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>sentry.hdfs.service.security.mode</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>sentry.hdfs.service.server.principal</name>
    <value>sentry/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>sentry.hdfs.service.client.server.rpc-port</name>
    <value>8038</value>
  </property>
  <property>
    <name>sentry.hdfs.service.client.server.rpc-address</name>
    <value>namenode</value>
  </property>

sudo -u hdfs kinit -kt /etc/hadoop/conf/hdfs.keytab hdfs/namenode@HADOOP.COM
sudo -u hdfs hdfs dfs -chmod -R 771 /user/hive/warehouse
sudo -u hdfs hdfs dfs -chown -R hive:hive /user/hive/warehouse

hive-site.xml

  <property>
    <name>hive.security.authorization.task.factory</name>
    <value>org.apache.sentry.binding.hive.SentryHiveAuthorizationTaskFactoryImpl</value>
  </property>
  <property>
    <name>hive.server2.session.hook</name>
    <value>org.apache.sentry.binding.hive.HiveAuthzBindingSessionHook</value>
  </property>
  <property>
    <name>hive.sentry.conf.url</name>
    <value>file:///etc/sentry/conf/sentry-site.xml</value>
  </property>
  <property>
    <name>hive.metastore.filter.hook</name>
    <value>org.apache.sentry.binding.metastore.SentryMetaStoreFilterHook</value>
  </property>
  <property>
    <name>hive.metastore.pre.event.listeners</name>
    <value>org.apache.sentry.binding.metastore.MetastoreAuthzBinding</value>
    <description>list of comma separated listeners for metastore events.</description>
  </property>
  <property>
    <name>hive.metastore.event.listeners</name>
    <value>org.apache.sentry.binding.metastore.SentryMetastorePostEventListener</value>
    <description>list of comma separated listeners for metastore, post events.</description>
  </property>
  <property>
    <name>sentry.metastore.plugins</name>
    <value>org.apache.sentry.hdfs.MetastorePlugin</value>
  </property>
  <property>
    <name>sentry.hdfs.service.client.server.rpc-port</name>
    <value>8038</value>
  </property>
  <property>
    <name>sentry.hdfs.service.client.server.rpc-address</name>
    <value>namenode</value>
  </property>
  <property>
    <name>sentry.hdfs.service.client.server.rpc-connection-timeout</name>
    <value>200000</value>
  </property>
  <property>
    <name>sentry.hdfs.service.security.mode</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>sentry.hdfs.service.server.principal</name>
    <value>sentry/_HOST@HADOOP.COM</value>
  </property>

sentry-site.xml

  <property>
    <name>sentry.service.processor.factories</name>
    <value>org.apache.sentry.provider.db.service.thrift.SentryPolicyStoreProcessorFactory,
    org.apache.sentry.hdfs.SentryHDFSServiceProcessorFactory</value>
  </property>
  <property>
    <name>sentry.policy.store.plugins</name>
    <value>org.apache.sentry.hdfs.SentryPlugin</value>
  </property>
  <!-- Enable the Sentry web server -->
  <property>
    <name>sentry.service.web.enable</name>
    <value>true</value>
  </property>
  <!-- Set Kerberos authentication properties -->
  <property>
    <name>sentry.service.web.authentication.type</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>sentry.service.web.authentication.kerberos.principal</name>
    <value>HTTP/_HOST@HADOOP.COM</value>
  </property>
  <property>
    <name>sentry.service.web.authentication.kerberos.keytab</name>
    <value>/etc/sentry/conf/sentry.keytab</value>
  </property>
  <!-- Define comma-separated list of users allowed to connect to the web server -->
  <property>
    <name>sentry.service.web.authentication.allow.connect.users</name>
    <value>hadoop,hive,hue,dsp</value>
  </property>

Assigning Sentry privileges

revoke all on table fact.ui_index_module_marking_hold_fund from role fact_role;
revoke all on database fact from role fact_role;
grant select on table fact.ui_index_module_marking_hold_fund to role fact_role;
show grant role fact_role;

Enabling Kerberos for HUE

kadmin.local -q "addprinc -randkey hue/namenode@HADOOP.COM"
kadmin.local -q "xst -norandkey -k /root/keytabs/hue.keytab hue/namenode@HADOOP.COM"
scp hue.keytab namenode:/etc/hue/conf/
ssh root@namenode 'chown hue:hue /etc/hue/conf/hue.keytab'
ssh root@namenode 'chmod 400 /etc/hue/conf/hue.keytab'

core-site.xml

  <property>
    <name>hue.kerberos.principal.shortname</name>
    <value>hue</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>hue.server.fully.qualified.domain.name</value>
  </property>

hue.ini

  [[kerberos]]
    # Path to Hue's Kerberos keytab file
    hue_keytab=/etc/hue/conf/hue.keytab
    # Kerberos principal name for Hue
    hue_principal=hue/datanode1@HADOOP.COM
    # Path to kinit
    kinit_path=/usr/bin/kinit
  • For historical reasons our HUE has to be restarted with /etc/init.d/hue restart; it cannot be managed with the service command.
  • The Hive configuration files must be synchronised to the HUE node.

Troubleshooting

Collected errors and fixes:

DataNode fails to start

Error log:

17/08/16 17:36:58 FATAL datanode.DataNode: Exception in secureMain
java.lang.RuntimeException: Cannot start secure DataNode without configuring either privileged resources or SASL RPC data transfer protection and SSL for HTTP. Using privileged resources in combination with SASL RPC data transfer protection is not supported.
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkSecureConfig(DataNode.java:1210)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1113)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:453)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2402)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2289)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2336)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2513)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2537)
17/08/16 17:36:58 INFO util.ExitUtil: Exiting with status 1
17/08/16 17:36:58 INFO datanode.DataNode: SHUTDOWN_MSG:

Solution:

Check these parameters:
dfs.datanode.address (port 1004)
dfs.datanode.http.address (port 1006)

Also set the following in the default configuration files /etc/default/hadoop-hdfs-namenode and /etc/default/hadoop-hdfs-datanode:

export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop-hdfs
export HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop-hdfs


Error log:

2017-08-16 19:04:51,644 INFO org.apache.hadoop.http.HttpServer2: HttpServer.start() threw a non Bind IOException
java.io.FileNotFoundException: /var/lib/hadoop-yarn/.keystore (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at org.mortbay.resource.FileResource.getInputStream(FileResource.java:275)
at org.mortbay.jetty.security.SslSocketConnector.createFactory(SslSocketConnector.java:242)
at org.mortbay.jetty.security.SslSocketConnector.newServerSocket(SslSocketConnector.java:476)
at org.apache.hadoop.security.ssl.SslSocketConnectorSecure.newServerSocket(SslSocketConnectorSecure.java:46)
at org.mortbay.jetty.bio.SocketConnector.open(SocketConnector.java:73)
at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:939)
at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:880)
at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:273)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:985)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1085)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1217)

Solution:
Do not use HTTPS for YARN or the JobHistory server.

The /etc/hadoop directory and /etc/hadoop/conf.empty/container-executor.cfg must both be owned by root (a sketch follows).
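
A minimal sketch of the ownership fix described above:

chown root:root /etc/hadoop
chown root:root /etc/hadoop/conf.empty/container-executor.cfg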


Error log:

2017-08-17 09:30:22,142 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_e73_1502885758813_0006_02_000002 startLocalizer is : 255
ExitCodeException exitCode=255: Failure to exec app initialization process - Permission denied

    at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.startLocalizer(LinuxContainerExecutor.java:253)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1132)

2017-08-17 09:30:22,143 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : command provided 0
2017-08-17 09:30:22,143 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : run as user is deployop
2017-08-17 09:30:22,143 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor: main : requested yarn user is deployop
2017-08-17 09:30:22,143 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Localizer failed


Problem:
Viewing container logs fails with an error like
Error getting logs at datanode2:34662

Solution:

Check the NodeManager log for the underlying cause. In my case the AppLogAggregatorImpl process had no read/write access to the log aggregation directory:
Permission denied: user=yarn, access=EXECUTE, inode="/tmp/logs/deployop":deployop:supergroup:drwxrwx---
Delete the old log aggregation directory and let YARN recreate it (a sketch follows).
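
A minimal sketch of removing the old aggregation directory as the HDFS superuser (the path comes from the error message above):

sudo -u hdfs kinit -kt /etc/hadoop/conf/hdfs.keytab hdfs/namenode@HADOOP.COM
sudo -u hdfs hdfs dfs -rm -r /tmp/logs/deployop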

Problem:
Container start fails with Permission denied.

Solution:

  • Debugging approach: on the NodeManager LogLevel page, set the log level of the org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor class to DEBUG to see the full command array being executed.
  • Set min.user.id=300 in container-executor.cfg.
  • Enable yarn.resourcemanager.proxy-user-privileges.enabled in yarn-site.xml.
  • With the NodeManager log level at DEBUG the executed command is printed in full; strip the container-executor wrapper and its arguments and run the remaining java command directly as the target user to debug the program.

Problem:
The NodeManager logs end up in the wrong location.

Solution:
The IDENT, LOG_DIR, PID_DIR and CONF_DIR settings are changed in the /etc/default/hadoop-yarn-nodemanager file.

Miscellaneous

Checking the OS version, and Red Hat 7 service management commands

yum install redhat-lsb-core
lsb_release -a

Start a service: systemctl start postfix.service
Stop a service: systemctl stop postfix.service
Restart a service: systemctl restart postfix.service
Show a service's status: systemctl status postfix.service
Enable a service at boot: systemctl enable postfix.service
Disable a service at boot: systemctl disable postfix.service
Check whether a service is enabled at boot: systemctl is-enabled postfix.service
List enabled services: systemctl list-unit-files | grep enabled
List failed services: systemctl --failed

Notes

The database password and the administrator password are both the standard company machine password.

Reference documents

[Java installation] http://www.oracle.com/technetwork/java/javase/downloads/index.html

Kerberos official documentation

Introduction to related terminology