Setting up a Hadoop 2.6 cluster

Notes from installing JDK 8 on Ubuntu 14.04; hopefully they are useful to others.
Download the latest stable JDK from the Oracle website; here it is JDK 8 (1.8.0_40).


tar zvxf jdk-8u40-linux-x64.tar.gz
sudo mkdir -p /usr/local/jdk/
sudo mv jdk1.8.0_40 /usr/local/jdk/




Edit /etc/profile and add the following:
export JAVA_HOME=/usr/local/jdk/jdk1.8.0_40/
export JRE_HOME=/usr/local/jdk/jdk1.8.0_40/jre


export HADOOP_HOME=/home/work/hadoop/default
export SCALA_HOME=/home/work/scala/default
export SPARK_HOME=/home/work/spark/default


export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin


source /etc/profile




Verify that the JDK was installed successfully:
work@mjkj:~$ java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b26)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)


=====================
Hadoop installation
1. Make sure ssh and rsync are installed on every machine; if not, install them first, since the following steps depend on them.
  $ sudo apt-get install ssh 
  $ sudo apt-get install rsync
2. Set the hosts and hostname of each machine in the cluster according to the layout below
  (example /etc/hosts entries are sketched after the table):
  sudo vim /etc/hosts
  sudo vim /etc/hostname
  |---------------------|----------------|----------------|
  |VM OS                |Hostname        |IP address      |
  |Ubuntu 12.04.3 LTS   |hadoop2-master  |172.16.101.24   |
  |Ubuntu 12.04.3 LTS   |hadoop2-slave1  |172.16.101.22   |
  |Ubuntu 12.04.3 LTS   |hadoop2-slave2  |172.16.101.15   |
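  A minimal /etc/hosts sketch for all three nodes, using the hostnames and IP addresses from the table (adjust to your own network):
    172.16.101.24  hadoop2-master
    172.16.101.22  hadoop2-slave1
    172.16.101.15  hadoop2-slave2
  On each machine, /etc/hostname should contain only that machine's own name.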
3. Generate key pairs so the three machines can SSH to each other without a password (a sketch follows this step).
  Run ssh-keygen -t rsa on each of the three machines, accept the prompts, then append the generated
  ~/.ssh/id_rsa.pub to ~/.ssh/authorized_keys on the other two machines.
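  One way to do this is with ssh-copy-id (a sketch, assuming the work account exists on every node and OpenSSH's ssh-copy-id is installed):
    ssh-keygen -t rsa
    ssh-copy-id work@hadoop2-master
    ssh-copy-id work@hadoop2-slave1
    ssh-copy-id work@hadoop2-slave2
    ssh hadoop2-slave1 hostname   # should log in without asking for a password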
  
4. Download Hadoop 2.6:
  wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz


5. Unpack it with tar zxvf hadoop-2.6.0.tar.gz and, for convenience, create a symlink pointing at the unpacked directory
  (the HADOOP_HOME set in /etc/profile should resolve to this directory):
  ln -s hadoop-2.6.0 hadoop




6. Edit the Hadoop configuration files so the daemons run as a cluster.
  Configuring a Hadoop cluster means setting both the environment the Hadoop daemons run in and the parameters they run with.
  In Hadoop 2 the daemons are the NameNode/DataNode (HDFS) and the ResourceManager/NodeManager (YARN).
  
  hadoop-env.sh :
    cd hadoop/ ; vim etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/usr/local/jdk/jdk1.8.0_40/
    export HADOOP_PREFIX=/home/work/hadoop/

  Create a directory for the daemon PID files and point HADOOP_PID_DIR at it:
    sudo mkdir -p /home/work/hadoop/pid_file/
    export HADOOP_PID_DIR=/home/work/hadoop/pid_file/
  core-site.xml :
    sudo mkdir -p /data/hadoop
    sudo chown -R work:work /data/hadoop
    vim etc/hadoop/core-site.xml
===============================================================================
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/</value>
        <description>A base for other temporary directories.</description>
    </property>

    <!-- file system properties -->
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop2-master:9000</value>
    </property>

    <property>
        <name>hadoop.security.authorization</name>
        <value>false</value>
        <description>
            Enable authorization for different protocols.
        </description>
    </property>

    <!-- add like old config by ricky -->
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx2048m</value>
    </property>

    <property>
        <name>hadoop.security.authentication</name>
        <value>simple</value>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>3000</value>
        <description>Number of minutes after which the checkpoint gets deleted.
            If zero, the trash feature is disabled.
        </description>
    </property>
</configuration>
    ==================================================================================
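 A quick sanity check of the core config (run from the hadoop directory; hdfs getconf reads the value back from core-site.xml):
    bin/hdfs getconf -confKey fs.default.name
    (expected output: hdfs://hadoop2-master:9000)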
 hdfs-site.xml :
==================================================================================
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/data/hadoop/dfs/name</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.data.dir</name>
        <value>/data/hadoop/dfs/data</value>
        <final>true</final>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>

    <!--
    <property>
        <name>dfs.datanode.failed.volumes.tolerated</name>
        <value>1</value>
    </property>
    -->

    <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>65536</value>
    </property>

    <property>
        <name>dfs.datanode.handler.count</name>
        <value>10</value>
    </property>

    <property>
        <name>dfs.datanode.socket.write.timeout</name>
        <value>0</value>
    </property>

    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>

    <property>
        <name>dfs.http.address</name>
        <value>hadoop2-master:50070</value>
    </property>

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop2-slave1:50090</value>
    </property>
</configuration>
    ==================================================================================
 mapred-site.xml :
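 In the Hadoop 2.6 distribution this file is shipped only as a template, so create it from the template first and then fill in the block below:
    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    vim etc/hadoop/mapred-site.xml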
==================================================================================
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>768</value>
    </property>
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>640</value>
    </property>
    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx384m</value>
    </property>
    <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx512m</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx200m</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop2-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop2-master:19888</value>
    </property>

    <!-- add from old config -->
    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>8</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>8</value>
    </property>
    <property>
        <name>dfs.namenode.handler.count</name>
        <value>16</value>
    </property>
    <property>
        <name>tasktracker.http.threads</name>
        <value>64</value>
    </property>
    <property>
        <name>mapred.job.shuffle.input.buffer.percent</name>
        <value>0.7</value>
    </property>
    <property>
        <name>mapred.job.shuffle.merge.percent</name>
        <value>0.7</value>
    </property>
    <property>
        <name>io.sort.mb</name>
        <value>64</value>
    </property>
    <property>
        <name>io.sort.factor</name>
        <value>16</value>
    </property>
    <property>
        <name>mapred.task.timeout</name>
        <value>1200000</value>
    </property>
</configuration>
==================================================================================
 yarn-site.xml :
==================================================================================
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
    </property>

    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop2-master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop2-master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop2-master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hadoop2-master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hadoop2-master:50030</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>86400</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>
</configuration>
==================================================================================
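 For start-dfs.sh / start-yarn.sh in step 7 to bring up the DataNode and NodeManager daemons on the slave machines, the same Hadoop tree and configuration need to be present on every node, and etc/hadoop/slaves should list the worker hostnames. A rough sketch from the master, assuming the paths, user, and hostnames used above (also create /data/hadoop owned by the work user on each slave, as was done on the master):
    vim etc/hadoop/slaves        # put hadoop2-slave1 and hadoop2-slave2 on separate lines
    rsync -a /home/work/hadoop/ work@hadoop2-slave1:/home/work/hadoop/
    rsync -a /home/work/hadoop/ work@hadoop2-slave2:/home/work/hadoop/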

7. Trying out the cluster
  1) Format the NameNode: bin/hdfs namenode -format
  2) Start HDFS: sbin/start-dfs.sh
  3) Stop HDFS: sbin/stop-dfs.sh
  4) Start YARN: sbin/start-yarn.sh
  5) Stop YARN: sbin/stop-yarn.sh
  6) Check the HDFS cluster status: bin/hdfs dfsadmin -report
     HDFS web UI: http://172.16.101.24:50070
     ResourceManager (RM) web UI: http://172.16.101.24:50030
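  As a quick end-to-end check once HDFS and YARN are running, the bundled example job can be submitted from the master (a sketch; the examples jar name must match the actual release, here 2.6.0):
    bin/hdfs dfs -mkdir -p /user/work
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 4 100
  The job should appear in the RM web UI and finish by printing an estimate of Pi.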




=====================
Spark installation
wget http://mirrors.cnnic.cn/apache/spark/spark-1.3.0/spark-1.3.0.tgz
