Hadoop 2.6 Installation (Building the Hadoop Platform)


The previous post covered installing CentOS 7; this one walks through installing Hadoop.

Download version 2.6 from hadoop.apache.org.

Install the JDK, ssh, rsync, and wget:

>yum -y install java-1.8.0-openjdk-devel

>yum -y install openssh

>yum -y install rsync

>yum -y install wget


>wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz

>tar xzvf hadoop-2.6.0.tar.gz

>mv hadoop-2.6.0 /usr/local/


Set up passwordless SSH login

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
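Because the VM is cloned later in this guide, the same key pair and authorized_keys end up on every node, so the master can also reach the slaves without a passphrase. If the slaves were built separately instead, the key could be pushed to them explicitly; a minimal sketch, assuming the daemons run as root and using the slave hostnames defined in /etc/hosts below:

  $ ssh-copy-id root@Slave1.Hadoop
  $ ssh-copy-id root@Slave2.Hadoop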


CentOS 7.0 uses firewalld as its default firewall; this guide switches to iptables, so firewalld is stopped and disabled first.
firewalld:
systemctl start firewalld.service      # start firewalld
systemctl stop firewalld.service       # stop firewalld
systemctl disable firewalld.service    # prevent firewalld from starting on boot

 

Set up hostname resolution (/etc/hosts)

>vi /etc/hosts

    192.168.1.147 Master.Hadoop

    192.168.1.148 Slave1.Hadoop

    192.168.1.149 Slave2.Hadoop

    192.168.1.150 Slave3.Hadoop


>mv /usr/local/hadoop-2.6.0 /usr/local/hadoop

>chown -R hadoop:hadoop /usr/local/hadoop
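The chown above assumes a dedicated hadoop user and group already exist. If they do not, a minimal sketch for creating them (the account name is just this guide's convention, not required by Hadoop):

>useradd -m hadoop        # on CentOS this also creates a matching hadoop group

>passwd hadoop            # set a password for the new account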

>cd /usr/local/hadoop

>vi  etc/hadoop/hadoop-env.sh

  # set to the root of your Java installation
  export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.45-30.b13.el7_1.x86_64/jre

  # Assuming your installation directory is /usr/local/hadoop
  export HADOOP_PREFIX=/usr/local/hadoop
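The OpenJDK path above is specific to one update release; a quick way to find the JRE directory on the machine at hand (a sketch, not part of the original guide):

>readlink -f $(which java) | sed 's:/bin/java::'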

>bin/hadoop

Running bin/hadoop with no arguments should print the script's usage message, confirming the unpacked distribution works.


Configure the cluster

etc/hadoop/core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/usr/hadoop/tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://Master.Hadoop:9000</value>
        </property>
        <property>
                <name>io.file.buffer.size</name>
                <value>4096</value>
        </property>
</configuration>




etc/hadoop/hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:///usr/hadoop/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:///usr/hadoop/dfs/data</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>2</value>
        </property>

    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master.Hadoop:50090</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
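The directories referenced in core-site.xml and hdfs-site.xml above do not exist yet. Hadoop can usually create them itself when permissions allow, but creating them up front avoids permission surprises; a sketch using the paths from the configs (ownership assumes the hadoop user created earlier):

>mkdir -p /usr/hadoop/tmp /usr/hadoop/dfs/name /usr/hadoop/dfs/data

>chown -R hadoop:hadoop /usr/hadoop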

If dfs.namenode.datanode.registration.ip-hostname-check is set to false, the NameNode will not perform the reverse-DNS check when DataNodes register, which can be useful if the cluster runs inside an AWS VPC and proper reverse DNS is not available.

Another use of this check is decommissioning a DataNode by listing it in the exclude (hosts.deny) file.
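If needed, the property is added to hdfs-site.xml like the entries above; a sketch (the value shown is an example, not part of this guide's configuration):

    <property>
        <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
        <value>false</value>
    </property>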

 


etc/hadoop/mapred-site.xml:
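A stock Hadoop 2.6 tarball ships only a template for this file; if mapred-site.xml is missing, it can be created from the template first (path assumed from the default tarball layout):

>cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml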

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
                <final>true</final>
        </property>

    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>Master.Hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master.Hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master.Hadoop:19888</value>
    </property>
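        <!-- Note: mapred.job.tracker below is a legacy MRv1 (JobTracker) property;
             it is ignored when mapreduce.framework.name is set to yarn. -->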
        <property>
                <name>mapred.job.tracker</name>
                <value>http://Master.Hadoop:9001</value>
        </property>
</configuration>

etc/hadoop/yarn-site.xml:

<?xml version="1.0"?>
<configuration>

        <!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>Master.Hadoop</value>
        </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master.Hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master.Hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master.Hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master.Hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master.Hadoop:8088</value>
    </property>
</configuration>


>vi /usr/local/hadoop/etc/hadoop/slaves

Slave1.Hadoop

Slave2.Hadoop


>shutdown now


Clone this VM three times, for Master.Hadoop, Slave1.Hadoop, and Slave2.Hadoop.

Use nmtui on each clone to change its IP address to match the /etc/hosts entries above.
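Each clone also needs its own hostname set to match /etc/hosts. A minimal sketch using CentOS 7's hostnamectl (run the matching command on each VM):

>hostnamectl set-hostname Master.Hadoop    # on the master; use Slave1.Hadoop / Slave2.Hadoop on the slaves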


Then log in to Master.Hadoop.

Set up the Hadoop NameNode:

>/usr/local/hadoop/bin/hdfs namenode -format

>sbin/hadoop-daemon.sh --config etc/hadoop --script hdfs start namenode

>sbin/yarn-daemon.sh --config etc/hadoop start resourcemanager

>sbin/yarn-daemon.sh start proxyserver --config etc/hadoop

>sbin/mr-jobhistory-daemon.sh start historyserver --config etc/hadoop/

Web UIs to confirm the master daemons are up:

http://master.hadoop:50070 (NameNode)

http://master.hadoop:8088 (ResourceManager)


Set up the DataNodes

Log in to each DataNode (Slave1.Hadoop and Slave2.Hadoop) and start the services:

>sbin/hadoop-daemon.sh --config etc/hadoop --script hdfs start datanode

>sbin/yarn-daemon.sh --config etc/hadoop start nodemanager
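To verify, running jps on each node should list NameNode and ResourceManager on the master, and DataNode and NodeManager on the slaves. Alternatively, with the passwordless SSH and slaves file configured above, the standard convenience scripts can start everything from the master in one step (not used in the steps above, but shipped with a stock Hadoop 2.6 install):

>sbin/start-dfs.sh

>sbin/start-yarn.sh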


