Hadoop single-node installation and test

Everyone says Hadoop is amazing, so I figured I would play with it, even if some of it still goes right over my head.

Environment:

Linux OS: Red Hat Enterprise Linux 4

JDK package: jdk-6u27-linux-i586.bin

Hadoop package: hadoop-2.2.0.tar.gz

Both packages are easy to find online.

Installation:

First, install the JDK.
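The .bin package is Sun's self-extracting installer. A minimal install sequence as root, assuming you want it under /usr/java to match the listing below:

mkdir -p /usr/java
cd /usr/java    -- put jdk-6u27-linux-i586.bin here first
chmod +x jdk-6u27-linux-i586.bin
./jdk-6u27-linux-i586.bin    -- accept the license; it unpacks into /usr/java/jdk1.6.0_27

Afterwards you can check the result: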

[root@redhat jdk1.6.0_27]# ls
bin        demo     jre  LICENSE  README.html    register_ja.html     sample   THIRDPARTYLICENSEREADME.txt
COPYRIGHT  include  lib  man      register.html  register_zh_CN.html  src.zip
[root@redhat jdk1.6.0_27]# pwd
/usr/java/jdk1.6.0_27
[root@redhat jdk1.6.0_27]# cd bin
[root@redhat bin]# ls
appletviewer  HtmlConverter  java     javap         jcontrol  jmap        jstack     keytool       policytool   schemagen   unpack200
apt           idlj           javac    java-rmi.cgi  jdb       jps         jstat      native2ascii  rmic         serialver   wsgen
ControlPanel  jar            javadoc  javaws        jhat      jrunscript  jstatd     orbd          rmid         servertool  wsimport
extcheck      jarsigner      javah    jconsole      jinfo     jsadebugd   jvisualvm  pack200       rmiregistry  tnameserv   xjc
[root@redhat bin]# pwd
/usr/java/jdk1.6.0_27/bin
[root@redhat bin]# ./java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode, sharing)
[root@redhat bin]# 

Add environment variables:

vi /etc/bashrc

Append the following at the end of the file:

## 2014/01/19 add
export JAVA_HOME=/usr/java/jdk1.6.0_27
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
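To pick up the new variables in the current shell (logging in again works too):

source /etc/bashrc
echo $JAVA_HOME    -- should print /usr/java/jdk1.6.0_27
java -version      -- now works without spelling out the full path as above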

Create the hadoop user (and its group) and set a password:

[root@redhat bin]# useradd hadoop -d /u03

[root@redhat bin]# passwd hadoop  -- set the password when prompted
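A quick sanity check on the new account (the uid/gid will differ on your machine):

id hadoop    -- confirms the hadoop user and its group exist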

Edit the /etc/hosts file:

[root@redhat bin]# vi /etc/hosts

[root@redhat bin]# cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost       localhost
#192.168.2.200  localhost.localdomain   localhost
192.168.2.200   RedHat.RedHatDomain     RedHat
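A quick check that the entries resolve, using the hostnames from the file above:

ping -c 1 RedHat       -- should answer from 192.168.2.200
ping -c 1 localhost    -- should answer from 127.0.0.1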

Switch to the hadoop user and set up passwordless ssh login to localhost:

[hadoop@redhat ~]$ pwd
/u03/hadoop
[hadoop@redhat ~]$ ssh-keygen -d  -- generates a DSA key pair; just press Enter at each prompt

[hadoop@redhat .ssh]$ pwd
/u03/hadoop/.ssh
[hadoop@redhat .ssh]$ cat id_dsa.pub >>authorized_keys 

[hadoop@redhat .ssh]$ chmod 644 authorized_keys

[hadoop@redhat .ssh]$ ssh localhost -- test whether the login now works
Last login: Sat Jan 18 20:54:09 2014 from localhost   -- no password prompt means it is set up correctly
[hadoop@redhat ~]$ who
root     pts/0        Jan 18 18:51 (192.168.2.11)
root     pts/1        Jan 18 20:31 (192.168.2.11)
hadoop   pts/2        Jan 18 20:54 (localhost)  -- this is the new passwordless session
root     pts/3        Jan 18 21:11 (192.168.2.11)
hadoop   pts/4        Jan 18 21:21 (localhost)
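If ssh localhost still asks for a password, the usual culprit is permissions that sshd considers too loose; a common fix (an extra precaution, not something required above):

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys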

Upload hadoop-2.2.0.tar.gz to the hadoop user's home directory. I like to create a soft directory under it and move the archive there:

mv hadoop-2.2.0.tar.gz  soft

Go into soft and extract the archive; this produces a hadoop-2.2.0 directory, which I then move up one level with mv hadoop-2.2.0 .. (commands below).

At this point the hadoop-2.2.0 directory sits at /u03/hadoop/hadoop-2.2.0.
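The extraction step itself, with the paths used above:

cd /u03/hadoop/soft
tar -zxvf hadoop-2.2.0.tar.gz
mv hadoop-2.2.0 ..    -- ends up as /u03/hadoop/hadoop-2.2.0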

Set environment variables for the hadoop user:

[hadoop@redhat ~]$ cat .bash_profile 
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin:$HOME/hadoop-2.2.0/bin

export PATH
unset USERNAME
export LANG=C
## 2014/01/19 add
export JAVA_HOME=/usr/java/jdk1.6.0_27
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export HADOOP_HOME=/u03/hadoop/hadoop-2.2.0
set -o vi
export HADOOP_ROOT_LOGGER=DEBUG,console   ## turns on Hadoop's DEBUG logging on the console, handy for checking whether later hadoop commands run correctly
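Reload the profile and check that the hadoop command is found:

source ~/.bash_profile
which hadoop      -- should point to /u03/hadoop/hadoop-2.2.0/bin/hadoop
hadoop version    -- should report Hadoop 2.2.0 (possibly with extra DEBUG output because of the logger setting above)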

Now it is time to configure Hadoop.

[hadoop@redhat hadoop]$ pwd  
/u03/hadoop/hadoop-2.2.0/etc/hadoop   -- the .xml configuration files are edited in this directory

Edit core-site.xml -- I wasn't sure what this file was for at first; it sets the default file system (the HDFS URI) that clients and daemons use:
[hadoop@redhat hadoop]$ vi core-site.xml


<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
    <description>The name of the default file system. Either the literal string "local" or a host:port for NDFS.
    </description>
    <final>true</final>
  </property>
</configuration>
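A side note: fs.default.name still works in 2.2.0 but is the deprecated name of this property; the current name is fs.defaultFS, so an equivalent modern form would be:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>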

Edit hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/u03/hadoop/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
    </description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/u03/hadoop/dfs/data</value>
    <description>Determines where on the local filesystem an DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored.
    </description>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
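The name and data directories do not strictly have to exist beforehand (the format step and the datanode create them), but pre-creating them as the hadoop user surfaces permission problems early; an optional step:

mkdir -p /u03/hadoop/dfs/name /u03/hadoop/dfs/data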

Edit mapred-site.xml -- if the file does not exist, create it (see the note after the listing):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/u03/hadoop/mapred/system</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/u03/hadoop/mapred/local</value>
    <final>true</final>
  </property>
</configuration>
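The stock hadoop-2.2.0 tarball ships only mapred-site.xml.template, so the usual way to create the file is to copy the template first:

cd /u03/hadoop/hadoop-2.2.0/etc/hadoop
cp mapred-site.xml.template mapred-site.xml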

Edit hadoop-env.sh:

Append at the end:

export JAVA_HOME=/usr/java/jdk1.6.0_27

Edit yarn-site.xml:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

 

That was a pile of configuration files, but the edits are finally done.

Now it is time to format the NameNode:

[hadoop@redhat ~]$ hadoop namenode -format   -- this prints a long stream of log and status messages on the terminal; after a moment the format finishes
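hadoop namenode -format still works in 2.x but prints a deprecation warning; the equivalent newer form, if you prefer it, is:

hdfs namenode -format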

Now comes the exciting part: starting Hadoop, the moment we have been waiting for.

[hadoop@redhat sbin]$ pwd
/u03/hadoop/hadoop-2.2.0/sbin
[hadoop@redhat sbin]$ ./start-all.sh   -- starts everything in one go; the lazy way, haha
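start-all.sh itself is marked deprecated in 2.x; it simply chains the two scripts below, which can also be run one at a time:

./start-dfs.sh     -- NameNode, SecondaryNameNode, DataNode
./start-yarn.sh    -- ResourceManager, NodeManager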

Verify that everything started properly:

[hadoop@redhat sbin]$ jps
30911 Jps
24159 ResourceManager
24022 SecondaryNameNode
23741 NameNode
23863 DataNode
24266 NodeManager
[hadoop@redhat sbin]$   

One thing worth pointing out: a lot of posts online show seven processes in jps, but here I only ever get six, and I wondered whether I had mis-configured something. The seventh process in those posts is usually the MapReduce JobHistoryServer, which start-all.sh does not launch; the command below starts it if you want it.
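The history server has its own startup script in the same sbin directory (assuming the default configuration):

./mr-jobhistory-daemon.sh start historyserver
jps    -- should now also list JobHistoryServer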

As an installation check, open http://192.168.2.200:8088 (the YARN ResourceManager web UI) and http://192.168.2.200:50070 (the NameNode web UI); if both pages come up, things look normal.

For example, the NameNode page shows information such as: NameNode 'localhost:8020' (active), and so on. There were screenshots originally, but I could not work out how to add images in this editor.

If all of the above looks normal, the Hadoop installation has succeeded.

Now for a small application-level test:

Create a test directory with a few small input files:

[hadoop@redhat ~]$ pwd
/u03/hadoop
[hadoop@redhat ~]$ mkdir tmp
[hadoop@redhat ~]$ cd tmp
[hadoop@redhat tmp]$ ls
file1  file2  file3  file4
[hadoop@redhat tmp]$ cat file1
helo work
[hadoop@redhat tmp]$ cat file2
hello hadoop
[hadoop@redhat tmp]$ cat file3
hello hadoop
[hadoop@redhat tmp]$ cat file4
hello hadoop
[hadoop@redhat tmp]$ hadoop dfs -copyFromLocal /u03/hadoop/tmp/ /in 
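Before running the job it is worth confirming that the upload landed where you expect (the exact listing will vary):

hadoop fs -ls -R /in    -- should show file1..file4 under /in (the job below reads /in/tmp)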

[hadoop@redhat mapreduce]$ pwd
/u03/hadoop/hadoop-2.2.0/share/hadoop/mapreduce
[hadoop@redhat mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.2.0.jar wordcount /in/tmp /out3

        File System Counters
                FILE: Number of bytes read=103
                FILE: Number of bytes written=394274
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=445
                HDFS: Number of bytes written=31
                HDFS: Number of read operations=15
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=4
                Launched reduce tasks=1
                Data-local map tasks=4
                Total time spent by all maps in occupied slots (ms)=33511
                Total time spent by all reduces in occupied slots (ms)=3229
        Map-Reduce Framework
                Map input records=4
                Map output records=8
                Map output bytes=81
                Map output materialized bytes=121
                Input split bytes=396
                Combine input records=8
                Combine output records=8
                Reduce input groups=4
                Reduce shuffle bytes=121
                Reduce input records=8
                Reduce output records=4
                Spilled Records=16
                Shuffled Maps =4
                Failed Shuffles=0
                Merged Map outputs=4
                GC time elapsed (ms)=346
                CPU time spent (ms)=1840
                Physical memory (bytes) snapshot=700514304
                Virtual memory (bytes) snapshot=1933430784
                Total committed heap usage (bytes)=499990528
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=49
        File Output Format Counters
                Bytes Written=31
[hadoop@redhat mapreduce]$ hadoop fs -cat /out3/part-r-00000

14/01/18 21:53:04 DEBUG hdfs.DFSClient: Connecting to datanode 127.0.0.1:50010
14/01/18 21:53:04 DEBUG ipc.Client: IPC Client (33081055) connection to localhost/127.0.0.1:8020 from hadoop sending #2
14/01/18 21:53:04 DEBUG ipc.Client: IPC Client (33081055) connection to localhost/127.0.0.1:8020 from hadoop got value #2
14/01/18 21:53:04 DEBUG ipc.ProtobufRpcEngine: Call: getServerDefaults took 10ms
hadoop  3
hello   3
helo    1
work    1
14/01/18 21:53:04 DEBUG ipc.Client: Stopping client
14/01/18 21:53:04 DEBUG ipc.Client: IPC Client (33081055) connection to localhost/127.0.0.1:8020 from hadoop: closed
14/01/18 21:53:04 DEBUG ipc.Client: IPC Client (33081055) connection to localhost/127.0.0.1:8020 from hadoop: stopped, remaining connections 0

[hadoop@redhat mapreduce]$ cd -
/u03/hadoop/tmp
[hadoop@redhat tmp]$ pwd
/u03/hadoop/tmp
[hadoop@redhat tmp]$ ls
file1  file2  file3  file4
[hadoop@redhat tmp]$ grep hadoop *
file2:hello hadoop
file3:hello hadoop
file4:hello hadoop
[hadoop@redhat tmp]$   -- sure enough, the word "hadoop" appears 3 times, matching the count above

That completes the single-node setup and test. Digging any deeper is going to take real effort.
