Hadoop

Source: the Internet · Editor: 程序博客网 · Date: 2024/06/03 20:08

Configuration Files

Hadoop configuration is driven by two types of important configuration files:

  • Read-only default configuration - core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml.
  • Site-specific configuration - conf/core-site.xml, conf/hdfs-site.xml, conf/yarn-site.xml and conf/mapred-site.xml.


Additionally, you can control the Hadoop scripts found in the bin/ directory of the distribution by setting site-specific values via conf/hadoop-env.sh and yarn-env.sh.


Configuring Environment of Hadoop Daemons


Administrators should use the conf/hadoop-env.sh and conf/yarn-env.sh scripts to do site-specific customization of the Hadoop daemons' process environment.
At the very least, you should specify JAVA_HOME so that it is correctly defined on each remote node.
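As a minimal sketch, a site-specific conf/hadoop-env.sh customization might look like the following; the JDK path here is an assumption and must be pointed at the JDK actually installed on each node:

```shell
# Illustrative lines for conf/hadoop-env.sh; the JDK path is an assumption
# and must match the installation on each node.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

# Optional: daemon heap size in MB (Hadoop's default is 1000).
export HADOOP_HEAPSIZE=1000
```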


Configuring the Hadoop Daemons in Non-Secure Mode


  • conf/core-site.xml




<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:///home/aboutyun/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.aboutyun.hosts</name>
    <value>*</value>
    <description>The aboutyun user may proxy users connecting from any host.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.aboutyun.groups</name>
    <value>*</value>
    <description>The aboutyun user may proxy users belonging to any group.</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>


Hadoop's proxy-user mechanism:


Take the user peerslee submitting a job through the proxy user aboutyun as an example. When peerslee submits the job, aboutyun takes it over and is responsible for requesting and supervising the job's resources.
However, whenever the job needs to read an HDFS file, the permission check is performed as peerslee; after the job finishes, the job list also shows peerslee as the job's user. Everything else is handled by aboutyun, which is what makes it a "proxy".
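Assuming the proxy-user properties above and a running cluster, the mechanism can be exercised through WebHDFS, whose doas parameter asks the NameNode to act on behalf of another user; the hostname and path below follow the sample configuration:

```shell
# Real (super) user and the user being proxied, as in the example above.
REAL_USER=aboutyun
PROXY_USER=peerslee

# WebHDFS request: authenticate as aboutyun but act on behalf of peerslee,
# so the NameNode checks HDFS permissions against peerslee.
URL="http://master:50070/webhdfs/v1/user/${PROXY_USER}?op=LISTSTATUS&user.name=${REAL_USER}&doas=${PROXY_USER}"

# Requires a running NameNode at master:50070; shown here without executing.
echo "curl -s \"$URL\""
```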


Note: the tmp directory must be created.

  • conf/hdfs-site.xml





<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/aboutyun/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/aboutyun/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Note: create the namenode and datanode directories locally.
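The notes about creating the tmp, namenode and datanode directories can be carried out in one command. The base path below comes from the sample configuration values; override HADOOP_DATA if your layout differs:

```shell
# Base directory used in the sample core-site.xml/hdfs-site.xml values;
# override HADOOP_DATA if your layout differs.
HADOOP_DATA="${HADOOP_DATA:-$HOME/hadoop}"

# Create all three local directories in one step.
mkdir -p "$HADOOP_DATA/tmp" "$HADOOP_DATA/namenode" "$HADOOP_DATA/datanode"
```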

HDFS web UI: ip:50070 (the NameNode's HTTP address).

  • conf/yarn-site.xml




<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>

  • conf/mapred-site.xml




<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>

References:


Hadoop配置文件参数详解 (Hadoop configuration file parameters explained)

Hadoop Cluster Setup
