CentOS 6.5 + Hadoop 1.2.1 Installation, Configuration, and Test Notes

Kept for the record only. Everything here is done as the root user, so this setup is not recommended for real use; it is fine for experiments.


1. Prepare four machines, all running CentOS 6.5, with roles assigned as follows:

ip                hostname    role
192.168.81.151    hdp01       namenode, secondarynamenode, jobtracker
192.168.81.152    hdp02       datanode, tasktracker
192.168.81.153    hdp03       datanode, tasktracker
192.168.81.154    hdp04       datanode, tasktracker
2. PuTTY into each remote machine and change its hostname (HOSTNAME=hdp01, etc.) in /etc/sysconfig/network, then add all four nodes to /etc/hosts:
vim /etc/hosts
127.0.0.1    localhost
192.168.81.151    hdp01
192.168.81.152    hdp02
192.168.81.153    hdp03
192.168.81.154    hdp04
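The step above only names the files to edit; a minimal sketch of the hostname change on CentOS 6 (shown for hdp01, the other nodes are analogous):

sed -i 's/^HOSTNAME=.*/HOSTNAME=hdp01/' /etc/sysconfig/network   # persist across reboots
hostname hdp01                                                   # apply for the current session
hostname                                                         # verify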
3. Fixing "Device eth0 does not seem to be present" (typically seen after cloning the VM):
 2) vi /etc/udev/rules.d/70-persistent-net.rules
    Note the MAC address udev recorded for eth1, e.g. 00:0c:29:4b:c7:a4
    # vi /etc/sysconfig/network-scripts/ifcfg-eth0
    Change DEVICE="eth0" to DEVICE="eth1"
    Change HWADDR="00:0c:29:c8:a2:a4" to the MAC address noted above, HWADDR="00:0c:29:4b:c7:a4"
 3) "00:0c:29:d8:13:c1"
 4) "00:0c:29:db:d1:70"
    (the MAC addresses recorded on the other nodes; apply the same fix there)
Restart the network.
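On CentOS 6 this is normally done with the network init script; a small sketch, assuming ifcfg-eth0 was edited as above:

service network restart      # re-read ifcfg-eth0 with the corrected DEVICE/HWADDR
ifconfig eth1                # confirm the interface is up and shows the expected MAC
# Alternative after cloning: delete the cached udev rules and reboot, letting udev regenerate them
# rm -f /etc/udev/rules.d/70-persistent-net.rules && reboot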
4. On each machine, create the working directory:
sudo mkdir /usr/local/bg
sudo chmod 777 -R /usr/local/bg
5. Install Java:
Copy the JDK to each node with scp:
scp java root@192:sdk
If the SSH connection complains about the host key, remove the stale known_hosts entry:
$ ssh-keygen -f /home/test/.ssh/known_hosts -R 192.168.81.151
Append the Java environment settings to /etc/profile:
# set java environment
export JAVA_HOME=/usr/local/bg/jdk1.8.0_40
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
Apply immediately: source /etc/profile
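A quick check that the JDK is picked up on each node (a sketch, assuming the JDK was unpacked to /usr/local/bg/jdk1.8.0_40 as above):

source /etc/profile
echo $JAVA_HOME        # should print /usr/local/bg/jdk1.8.0_40
java -version          # should report version 1.8.0_40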
6. Install Hadoop
Download hadoop-1.2.1.tar.gz
Extract it into the bg directory created above
Add the Hadoop environment variables
sudo vim /etc/profile
# set hadoop environment
export HADOOP_HOME=/usr/local/bg/hadoop-1.2.1
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
Edit hadoop-env.sh under Hadoop's conf directory:
export JAVA_HOME=/usr/local/bg/jdk1.8.0_40
Create the data storage directory
sudo mkdir -p /usr/local/bg/storage/hadoop
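Putting the extraction and directory creation together, a minimal sketch of this step on one node (run the same commands on every node):

cd /usr/local/bg
tar -xzf hadoop-1.2.1.tar.gz                              # unpacks to /usr/local/bg/hadoop-1.2.1
mkdir -p /usr/local/bg/storage/hadoop/{name,data,temp}    # matches dfs.name.dir, dfs.data.dir and hadoop.tmp.dir below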
Edit hdfs-site.xml under conf (after changing the XML, sync it to all four servers, otherwise the daemons will not start properly):
[root@hdp01 conf]# cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
      <property>
        <name>dfs.name.dir</name>  
        <value>/usr/local/bg/storage/hadoop/name</value>  
        <final>true</final>
    <description>  
          Determines where on the local filesystem the DFS name node should store the name table(fsimage).  If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.  
        </description>  
      </property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/bg/storage/hadoop/data</value>
        <final>true</final>
  <description>
    Determines where on the local filesystem a DFS data node should store its blocks.  If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.  Directories that do not exist are ignored.
  </description>
</property>
<property>
  <name>dfs.http.address</name>
  <value>hdp01:50070</value>
        <final>true</final>
  <description>
    The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
  </description>
</property>
<property>
  <name>dfs.replication</name>
  <value>3</value>
        <final>true</final>
  <description>
    Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
  </description>
</property>
</configuration>
Edit core-site.xml under conf:
[root@hdp01 conf]# cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/bg/storage/hadoop/temp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://hdp01:9000</value>
  <final>true</final>
  <description>
    The name of the default file system.  A URI whose scheme and authority determine the FileSystem implementation.  The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class.  The uri's authority is used to determine the host, port, etc. for a filesystem.
  </description>
</property>
</configuration>

Edit mapred-site.xml under conf:
[root@hdp01 conf]# cat mapred-site.xml
<?xml version="1.0"?>  
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>  
<!-- Put site-specific property overrides in this file. -->  
<configuration>  
  <property>  
    <name>mapred.job.tracker</name>  
    <value>hdp01:9001</value>    
    <description>  
      The host and port that the MapReduce job tracker runs  at.  If "local", then jobs are run in-process as a single map and reduce task.  
    </description>  
  </property>   
</configuration>

Change the masters file under conf to: hdp01 (this actually determines where the secondarynamenode runs)
Change the slaves file under conf to:
hdp02
hdp03
hdp04
This determines which nodes run the datanode and tasktracker daemons.
Note: the configuration must be identical on all four machines.
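Since the configuration must be identical everywhere, a sketch of pushing the conf files from hdp01 to the other three nodes (scp will ask for the root password until the passwordless SSH below is set up):

for h in hdp02 hdp03 hdp04; do
  scp /usr/local/bg/hadoop-1.2.1/conf/{core-site.xml,hdfs-site.xml,mapred-site.xml,hadoop-env.sh,masters,slaves} \
      root@$h:/usr/local/bg/hadoop-1.2.1/conf/
done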
 1. $ cd ~/.ssh
 2. $ ssh-keygen -t rsa    # press Enter at every prompt; the key pair is saved as .ssh/id_rsa and .ssh/id_rsa.pub
 3. $ cp id_rsa.pub authorized_keys
    -- or append rather than overwrite: $ cat id_rsa.pub >> authorized_keys
 4. $ scp authorized_keys root@hdp02:/root/.ssh/    (repeat for hdp03 and hdp04)
If ssh hdp01 still asks for a password: turn off iptables and SELinux; after each pair of nodes has connected once, passwordless login works.
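A sketch of the firewall/SELinux part plus a passwordless-login check (CentOS 6 commands, run on every node):

service iptables stop      # stop the firewall now
chkconfig iptables off     # keep it off after reboots
setenforce 0               # switch SELinux to permissive for this session
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config   # disable it permanently

# from hdp01, each of these should print the hostname without asking for a password:
for h in hdp01 hdp02 hdp03 hdp04; do ssh $h hostname; done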
7.
hadoop namenode -format
Note: format only once. Formatting again will wipe all data and cause a series of problems, such as datanodes failing to start.
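Because a second format destroys the metadata, a defensive sketch that only formats when the configured name directory is still empty:

if [ ! -d /usr/local/bg/storage/hadoop/name/current ]; then
    hadoop namenode -format
else
    echo "namenode already formatted, skipping"
fi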
8.
Start Hadoop
This step is also performed on the master node hdp01
start-all.sh

jps
You should see the NameNode, SecondaryNameNode, and JobTracker processes (plus Jps itself).
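The DataNode and TaskTracker processes run on the slaves, so a sketch of checking them from the master over SSH (sourcing /etc/profile so jps is on the PATH in the non-interactive shell):

jps                                  # on hdp01: NameNode, SecondaryNameNode, JobTracker
for h in hdp02 hdp03 hdp04; do
  echo "== $h =="
  ssh $h 'source /etc/profile; jps'  # should list DataNode and TaskTracker
done
# the web UIs should also respond: http://hdp01:50070 (NameNode) and http://hdp01:50030 (JobTracker)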
9.
Test:
[root@hdp01 ~]# mkdir test
[root@hdp01 ~]# cd test
[root@hdp01 test]# echo "hello world" > test1.txt
[root@hdp01 test]# echo "hello hadoop" > test2.txt
[root@hdp01 test]# ll
total 8
-rw-r--r-- 1 root root 12 Sep 15 01:42 test1.txt
-rw-r--r-- 1 root root 13 Sep 15 01:43 test2.txt
[root@hdp01 test]# hadoop dfs -put ./ in
[root@hdp01 test]# hadoop dfs -ls ./in
Found 2 items
-rw-r--r--   3 root supergroup         12 2015-09-15 01:43 /user/root/in/test1.txt
-rw-r--r--   3 root supergroup         13 2015-09-15 01:43 /user/root/in/test2.txt
[root@hdp01 test]# hadoop dfs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2015-09-15 01:43 /user/root/in
[root@hdp01 ~]# hadoop dfs -get ./in/* ./
[root@hdp01 ~]# ls
anaconda-ks.cfg  install.log  install.log.syslog  test  test1.txt  test2.txt
[root@hdp01 hadoop-1.2.1]# hadoop jar ./hadoop-examples-1.2.1.jar wordcount in out
15/09/15 01:49:40 INFO input.FileInputFormat: Total input paths to process : 2
15/09/15 01:49:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/09/15 01:49:40 WARN snappy.LoadSnappy: Snappy native library not loaded
15/09/15 01:49:41 INFO mapred.JobClient: Running job: job_201509150117_0001
15/09/15 01:49:42 INFO mapred.JobClient:  map 0% reduce 0%
15/09/15 01:49:51 INFO mapred.JobClient:  map 50% reduce 0%
15/09/15 01:49:52 INFO mapred.JobClient:  map 100% reduce 0%
15/09/15 01:50:00 INFO mapred.JobClient:  map 100% reduce 33%
15/09/15 01:50:02 INFO mapred.JobClient:  map 100% reduce 100%
15/09/15 01:50:03 INFO mapred.JobClient: Job complete: job_201509150117_0001
15/09/15 01:50:03 INFO mapred.JobClient: Counters: 29
15/09/15 01:50:03 INFO mapred.JobClient:   Map-Reduce Framework
15/09/15 01:50:03 INFO mapred.JobClient:     Spilled Records=8
15/09/15 01:50:03 INFO mapred.JobClient:     Map output materialized bytes=61
15/09/15 01:50:03 INFO mapred.JobClient:     Reduce input records=4
15/09/15 01:50:03 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=5798174720
15/09/15 01:50:03 INFO mapred.JobClient:     Map input records=2
15/09/15 01:50:03 INFO mapred.JobClient:     SPLIT_RAW_BYTES=210
15/09/15 01:50:03 INFO mapred.JobClient:     Map output bytes=41
15/09/15 01:50:03 INFO mapred.JobClient:     Reduce shuffle bytes=61
15/09/15 01:50:03 INFO mapred.JobClient:     Physical memory (bytes) snapshot=420593664
15/09/15 01:50:03 INFO mapred.JobClient:     Reduce input groups=3
15/09/15 01:50:03 INFO mapred.JobClient:     Combine output records=4
15/09/15 01:50:03 INFO mapred.JobClient:     Reduce output records=3
15/09/15 01:50:03 INFO mapred.JobClient:     Map output records=4
15/09/15 01:50:03 INFO mapred.JobClient:     Combine input records=4
15/09/15 01:50:03 INFO mapred.JobClient:     CPU time spent (ms)=2180
15/09/15 01:50:03 INFO mapred.JobClient:     Total committed heap usage (bytes)=337780736
15/09/15 01:50:03 INFO mapred.JobClient:   File Input Format Counters
15/09/15 01:50:03 INFO mapred.JobClient:     Bytes Read=25
15/09/15 01:50:03 INFO mapred.JobClient:   FileSystemCounters
15/09/15 01:50:03 INFO mapred.JobClient:     HDFS_BYTES_READ=235
15/09/15 01:50:03 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=178147
15/09/15 01:50:03 INFO mapred.JobClient:     FILE_BYTES_READ=55
15/09/15 01:50:03 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
15/09/15 01:50:03 INFO mapred.JobClient:   Job Counters
15/09/15 01:50:03 INFO mapred.JobClient:     Launched map tasks=2
15/09/15 01:50:03 INFO mapred.JobClient:     Launched reduce tasks=1
15/09/15 01:50:03 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10296
15/09/15 01:50:03 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/09/15 01:50:03 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=13515
15/09/15 01:50:03 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/09/15 01:50:03 INFO mapred.JobClient:     Data-local map tasks=2
15/09/15 01:50:03 INFO mapred.JobClient:   File Output Format Counters
15/09/15 01:50:03 INFO mapred.JobClient:     Bytes Written=25
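To see the actual word counts, the output can be read back from HDFS; a short sketch (part-r-00000 is the usual reducer output name for this example):

hadoop dfs -ls out                   # _SUCCESS, _logs and the reducer output file
hadoop dfs -cat out/part-r-00000     # expected: hadoop 1, hello 2, world 1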

$ ls /usr/local/bg/storage/hadoop/data
In HDFS the data blocks are stored on the datanodes, so they have to be inspected from the slave nodes. Seen from the Linux filesystem, an HDFS file consists of metadata plus raw block files, and only together do they form a complete file; looking at the block contents directly on the Linux side therefore just shows meaningless binary data.
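A sketch of what this looks like on one of the datanodes; the block file names below are only illustrative:

# on hdp02/03/04: dfs.data.dir holds raw block files plus their checksum .meta files
ls /usr/local/bg/storage/hadoop/data/current/
# e.g.  VERSION  blk_-1234567890123456789  blk_-1234567890123456789_1001.meta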

[root@hdp01 bg]# hadoop dfsadmin -report
Configured Capacity: 12707119104 (11.83 GB)
Present Capacity: 4093890560 (3.81 GB)
DFS Remaining: 4093448192 (3.81 GB)
DFS Used: 442368 (432 KB)
DFS Used%: 0.01%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Name: 192.168.81.153:50010
Decommission Status : Normal
Configured Capacity: 4235706368 (3.94 GB)
DFS Used: 147456 (144 KB)
Non DFS Used: 2871005184 (2.67 GB)
DFS Remaining: 1364553728(1.27 GB)
DFS Used%: 0%
DFS Remaining%: 32.22%
Last contact: Tue Sep 15 02:00:59 CST 2015


Name: 192.168.81.154:50010
Decommission Status : Normal
Configured Capacity: 4235706368 (3.94 GB)
DFS Used: 147456 (144 KB)
Non DFS Used: 2871156736 (2.67 GB)
DFS Remaining: 1364402176(1.27 GB)
DFS Used%: 0%
DFS Remaining%: 32.21%
Last contact: Tue Sep 15 02:00:58 CST 2015


Name: 192.168.81.152:50010
Decommission Status : Normal
Configured Capacity: 4235706368 (3.94 GB)
DFS Used: 147456 (144 KB)
Non DFS Used: 2871066624 (2.67 GB)
DFS Remaining: 1364492288(1.27 GB)
DFS Used%: 0%
DFS Remaining%: 32.21%
Last contact: Tue Sep 15 02:00:59 CST 2015

Entering and leaving Hadoop safe mode:
[root@hdp01 bg]#  hadoop dfsadmin -safemode enter
[root@hdp01 bg]#  hadoop dfsadmin -safemode leave
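Two related dfsadmin subcommands that are useful around safe mode in Hadoop 1.x:

hadoop dfsadmin -safemode get      # report whether safe mode is currently ON or OFF
hadoop dfsadmin -safemode wait     # block until the namenode leaves safe mode on its own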
