
Hadoop 2.7.1 Cluster Setup


================
Environment and software
================
One laptop running two Ubuntu virtual machines.


Virtualization: VMware Workstation 12 Pro
OS version: Ubuntu 12 en x64
Two systems: master 10.11.12.45, user feng
             slave  10.11.12.47, user feng
Hadoop: hadoop-2.7.1.tar.gz
JDK: java version "1.7.0_05"


Start the first virtual machine and perform the following steps on it:


================
I. Install JDK 1.7
================
1. Extract jdk-7u5-linux-x64.tar.gz under /opt
cd /opt
sudo tar -zvxf jdk-7u5-linux-x64.tar.gz
Change ownership of the extracted JDK directory to the current user feng:
sudo chown -R feng:root jdk1.7.0_05/


2. Edit /etc/profile
sudo vi /etc/profile


Append at the end of the file:
export JAVA_HOME=/opt/jdk1.7.0_05
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar


3. Run source /etc/profile to apply the changes.
Verify: java -version
Expected output:
java version "1.7.0_05"
Java(TM) SE Runtime Environment (build 1.7.0_05-b06)
Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)
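
A quick sanity check (a minimal sketch; adjust the paths if your JDK landed somewhere else):
echo $JAVA_HOME    # should print /opt/jdk1.7.0_05
which java         # should print /opt/jdk1.7.0_05/bin/java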


================
II. Extract hadoop-2.7.1.tar.gz and set environment variables
================
1. Extract hadoop-2.7.1.tar.gz under /opt
cd /opt
sudo tar -zvxf hadoop-2.7.1.tar.gz
sudo chown -R feng:root hadoop-2.7.1/


2. Edit /etc/profile
sudo vi /etc/profile
Append at the end of the file:
export HADOOP_PREFIX=/opt/hadoop-2.7.1
export PATH=$HADOOP_PREFIX/bin:$PATH


3. Run source /etc/profile to apply the changes.


4. Verify: hadoop version
Expected output:
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar


================
III. Raise the ulimit open-files and nproc limits
================


Add three lines like the following to /etc/security/limits.conf:


--------------------------
feng  -       nofile  32768
feng soft nproc 32000
feng hard nproc 32000
--------------------------
Here feng is the current user.


Add this line to /etc/pam.d/common-session:


session required  pam_limits.so
otherwise the settings in /etc/security/limits.conf will not take effect.


Finally, log out and log back in for these settings to take effect.
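
To confirm the new limits after logging back in (a minimal check; the numbers should match what you put in limits.conf):
ulimit -n    # open files, expected 32768
ulimit -u    # max user processes, expected 32000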




================
IV. Clone the virtual machine
================
1. Shut down the virtual machine and copy its files.
The goal is to end up with two virtual machines: one master and one slave.


================
V. Change the hostname and /etc/hosts
================
1. Edit /etc/hostname


sudo vi /etc/hostname


Change ubuntu to master or slave.
After editing, check the result with the hostname command.


2. Set the hostnames on the two machines:
the master machine's hostname is master,
the slave machine's hostname is slave.


3. Edit /etc/hosts on both machines:
sudo vi /etc/hosts


Both the master and slave virtual machines should end up with:
127.0.0.1 localhost
# 127.0.1.1 ubuntu
10.11.12.45      master
10.11.12.47      slave


Verify with ping master and ping slave.
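
For example, on master (assuming the /etc/hosts entries above):
ping -c 3 slave
and on slave:
ping -c 3 master
Both should resolve to the 10.11.12.x addresses and get replies.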


================
VI. Set up passwordless SSH login
================
1. Both machines need passwordless ssh localhost.
Run the following two commands:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Then ssh localhost should log you in without asking for a password.
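
As an alternative (if ssh-copy-id is available on your Ubuntu image), the append step can be done with a single command:
ssh-copy-id -i ~/.ssh/id_dsa.pub feng@localhost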


2. master and slave also need passwordless SSH login to each other.
a. master logs in to slave without a password
1) On master, copy the generated ~/.ssh/id_dsa.pub to slave with scp:
  scp ~/.ssh/id_dsa.pub feng@slave:/home/feng/Downloads
2) On slave, append id_dsa.pub to authorized_keys:
  cat ~/Downloads/id_dsa.pub >> ~/.ssh/authorized_keys
3) On slave, check the contents of ~/.ssh/authorized_keys:
  more ~/.ssh/authorized_keys
4) From master, ssh to slave to confirm that no password is required:
  ssh slave


b. slave logs in to master without a password
1) On slave, copy the generated ~/.ssh/id_dsa.pub to master with scp:
  scp ~/.ssh/id_dsa.pub feng@master:/home/feng/Downloads
2) On master, append id_dsa.pub to authorized_keys:
  cat ~/Downloads/id_dsa.pub >> ~/.ssh/authorized_keys
3) On master, check the contents of ~/.ssh/authorized_keys:
  more ~/.ssh/authorized_keys
4) From slave, ssh to master to confirm that no password is required:
  ssh master


This completes passwordless SSH login between master and slave in both directions.
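
A quick non-interactive check (assuming the keys were exchanged as above) is that each of these prints the remote hostname without a password prompt:
on master:  ssh slave hostname
on slave:   ssh master hostname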


================
VII. Edit the Hadoop configuration files on master and slave
================
The Hadoop configuration files are modified in the same way on both master and slave (a copy-over shortcut is sketched at the end of this section).


1. Back up the configuration directory first, then edit it:
1) cd /opt/hadoop-2.7.1/etc
2) cp -R hadoop/ hadoop#bak




2. Edit etc/hadoop/core-site.xml
Add the following properties inside the <configuration> element:
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/fengwork/hadoop/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>


Create the fengwork directory under /opt and set its owner and group:
sudo mkdir /opt/fengwork
sudo chown -R feng:root /opt/fengwork


3. Edit etc/hadoop/hdfs-site.xml
Add the following properties inside the <configuration> element:
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/fengwork/hadoop/datalog</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/fengwork/hadoop/data</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>


4. Edit etc/hadoop/yarn-site.xml
Add the following properties inside the <configuration> element:
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>


5. Edit etc/hadoop/mapred-site.xml


cp mapred-site.xml.template mapred-site.xml


Add the following properties inside the <configuration> element of mapred-site.xml (see the note after the snippet about the JobHistory server):
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
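
Note: the two jobhistory addresses above are only served while the MapReduce JobHistory server is running, and start-all.sh does not start it. If you want it, start it separately on master (a sketch using the stock Hadoop 2.7 script):
$HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh start historyserver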


6. Edit etc/hadoop/slaves
Add:
master
slave


7. Edit etc/hadoop/hadoop-env.sh
Above the line export JAVA_HOME=${JAVA_HOME}, add:
JAVA_HOME=/opt/jdk1.7.0_05
so that it reads:
JAVA_HOME=/opt/jdk1.7.0_05
export JAVA_HOME=${JAVA_HOME}
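
Since both nodes need identical configuration, one way to avoid editing everything twice (a sketch, assuming the same /opt/hadoop-2.7.1 path and the feng user on both machines) is to edit on master and copy the whole directory over:
scp -r /opt/hadoop-2.7.1/etc/hadoop/* feng@slave:/opt/hadoop-2.7.1/etc/hadoop/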


================
VIII. Hadoop commands
================


Before the first start, the NameNode must be formatted once:


$HADOOP_PREFIX/bin/hdfs namenode -format <cluster_name>
for example:
hdfs namenode -format hadoop_fengwork

/opt/hadoop-2.7.1/bin/hdfs namenode -format hadoop_fengwork


Start everything (run on master):
$HADOOP_PREFIX/sbin/start-all.sh
Stop everything:
$HADOOP_PREFIX/sbin/stop-all.sh
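
After start-all.sh, jps on each node shows which daemons came up. With the configuration above you would typically expect something like:
master: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager, Jps
slave:  DataNode, NodeManager, Jps
(master also runs a DataNode and NodeManager here because it is listed in etc/hadoop/slaves.)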


# Leave HDFS safe mode
$HADOOP_PREFIX/bin/hadoop dfsadmin -safemode leave
# Enter HDFS safe mode
$HADOOP_PREFIX/bin/hadoop dfsadmin -safemode enter
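
To just query the current state, the hdfs form of the same admin command works as well:
$HADOOP_PREFIX/bin/hdfs dfsadmin -safemode get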




Command to verify the Hadoop installation (native library check):
$HADOOP_PREFIX/bin/hadoop checknative -a


----------
Basic file system commands
----------
Create a directory:
$HADOOP_PREFIX/bin/hadoop fs -mkdir /usr
Upload a file:
$HADOOP_PREFIX/bin/hadoop fs -put ~/jdk-8u25-linux-x64.tar.gz /usr/feng
Download a file:
$HADOOP_PREFIX/bin/hadoop fs -get /usr/feng/jdk-8u25-linux-x64.tar.gz ~/Downloads/


***********
You can verify that the file was not corrupted in transit:
feng@master:~$ md5sum ~/Downloads/jdk-8u25-linux-x64.tar.gz 
e145c03a7edc845215092786bcfba77e  /home/feng/Downloads/jdk-8u25-linux-x64.tar.gz
feng@master:~$ md5sum ~/jdk-8u25-linux-x64.tar.gz 
e145c03a7edc845215092786bcfba77e  /home/feng/jdk-8u25-linux-x64.tar.gz
The two md5sums are identical, so the file is intact.
***********


List files:
$HADOOP_PREFIX/bin/hadoop fs -ls /
List files recursively:
$HADOOP_PREFIX/bin/hadoop fs -ls -R /
Show the tail of a file:
$HADOOP_PREFIX/bin/hadoop fs -tail /feng/tmp/1462950193038
Show the full contents of a file (fine for small test files, but avoid it on large ones):
$HADOOP_PREFIX/bin/hadoop fs -cat /feng/tmp/1462950193038
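
A couple of other everyday commands, for reference (the paths below are only placeholders):
Show sizes:
$HADOOP_PREFIX/bin/hadoop fs -du -h /usr/feng
Delete a directory recursively:
$HADOOP_PREFIX/bin/hadoop fs -rm -r /usr/feng/old-dir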


See the official documentation for more commands:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html


================
The hdfs fsck and hdfs classpath commands
================
1. Show the classpath:
$HADOOP_PREFIX/bin/hdfs classpath
Output:
/opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/*:/opt/hadoop-2.7.1/share/hadoop/common/*:/opt/hadoop-2.7.1/share/hadoop/hdfs:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/*:/opt/hadoop-2.7.1/share/hadoop/hdfs/*:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/*:/opt/hadoop-2.7.1/share/hadoop/yarn/*:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/*:/opt/hadoop-2.7.1/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar




fsck


Usage:


   hdfs fsck <path>
          [-list-corruptfileblocks |
          [-move | -delete | -openforwrite]
          [-files [-blocks [-locations | -racks]]]
          [-includeSnapshots]
          [-storagepolicies] [-blockId <blk_Id>]


COMMAND_OPTION            Description
path                      Start checking from this path.
-delete                   Delete corrupted files.
-files                    Print out files being checked.
-files -blocks            Print out the block report.
-files -blocks -locations Print out locations for every block.
-files -blocks -racks     Print out network topology for data-node locations.
-includeSnapshots         Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks   Print out list of missing blocks and files they belong to.
-move                     Move corrupted files to /lost+found.
-openforwrite             Print out files opened for write.
-storagepolicies          Print out storage policy summary for the blocks.
-blockId                  Print out information about the block.


For example, run:
hdfs fsck / -files -blocks
Connecting to namenode via http://master:50070/fsck?ugi=feng&files=1&blocks=1&path=%2F
FSCK started by feng (auth:SIMPLE) from /10.11.12.45 for path / at Mon May 16 14:00:56 CST 2016
/ <dir>
/usr <dir>
/usr/feng <dir>
/usr/feng/jdk-8u25-linux-x64.tar.gz 160872482 bytes, 2 block(s):  OK
0. BP-85890032-10.11.12.45-1463366422938:blk_1073741825_1001 len=134217728 repl=1
1. BP-85890032-10.11.12.45-1463366422938:blk_1073741826_1002 len=26654754 repl=1


Status: HEALTHY
 Total size: 160872482 B
 Total dirs: 3
 Total files: 1
 Total symlinks: 0
 Total blocks (validated): 2 (avg. block size 80436241 B)
 Minimally replicated blocks: 2 (100.0 %)
 Over-replicated blocks: 0 (0.0 %)
 Under-replicated blocks: 0 (0.0 %)
 Mis-replicated blocks: 0 (0.0 %)
 Default replication factor: 1
 Average block replication: 1.0
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 1
 Number of racks: 1
FSCK ended at Mon May 16 14:00:56 CST 2016 in 4 milliseconds




The filesystem under path '/' is HEALTHY


================
Test MapReduce
================
1. Get the WordCount code:


Source: http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html


package com.feng.test.mr.example;


import java.io.IOException;
import java.util.StringTokenizer;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;


public class WordCount {


  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{


    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();


    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }


  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();


    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }


  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}


2. Go into the bin directory of the Eclipse Java project and run the jar packaging command (a command-line alternative is sketched below):
jar cf WordCount.jar com/feng/test/mr/example/WordCount*.class
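
If you would rather skip Eclipse, the same class can be compiled and packaged on master from the command line (a sketch, assuming WordCount.java sits under ~/src in its package directory layout):
cd ~/src
javac -cp $($HADOOP_PREFIX/bin/hadoop classpath) -d . com/feng/test/mr/example/WordCount.java
jar cf WordCount.jar com/feng/test/mr/example/WordCount*.class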


3. Copy WordCount.jar to master:
scp WordCount.jar feng@master:/home/feng/Downloads


4. Run the MapReduce job:
hadoop jar ~/Downloads/WordCount.jar com.feng.test.mr.example.WordCount /usr/feng/news.txt /user/feng/output
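
When the job finishes, the counts are written to the output directory. To inspect them (part-r-00000 is the usual file name for a single reducer):
hadoop fs -ls /user/feng/output
hadoop fs -cat /user/feng/output/part-r-00000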


================


To use the Eclipse Hadoop plugin:

On the Linux client, add the following to /etc/hosts:

10.11.12.45      master
10.11.12.47      slave


=============================


Job-related hadoop command-line tools:
1. List jobs:
hadoop job -list
2. Kill a job:
hadoop job -kill job_id
3. View the aggregated history logs under the given output directory:
hadoop job -history output-dir
4. More details about the job:
hadoop job -history all output-dir
5. Print the map and reduce completion percentages and all counters:
hadoop job -status job_id
6. Kill a task. Killed tasks do NOT count against failed attempts:
hadoop job -kill-task <task-id>
7. Fail a task. Failed tasks DO count against failed attempts:
hadoop job -fail-task <task-id>
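
Note: in Hadoop 2.x the same operations are also available (with fewer deprecation warnings) as mapred job, for example:
mapred job -list
mapred job -kill job_id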


=============================
Package:
jar cf testsyf.jar com/feng/test/mr/easy/Test*.class
Upload:
scp testsyf.jar feng@master:/home/feng/Downloads
Run:
/opt/hadoop-2.7.1/bin/hadoop jar ~/Downloads/testsyf.jar com.feng.test.mr.easy.TestCount /feng/mr/ /user/fengtemp1


=============================
Web administration pages:

http://master:8088   (YARN ResourceManager)

http://master:19888  (MapReduce JobHistory server)

http://master:50070  (HDFS NameNode web UI)













