Hadoop Lab Notes


Experiment 1: Standalone Mode

1. Experimental Environment

1) One PC running Windows
2) VMware virtualization: VMware Workstation 7
3) Linux OS: ubuntu-9.10-desktop-i386.iso
4) Hadoop package: hadoop-0.20.2.tar.gz
5) Java package: jdk-6u21-linux-i586.bin

2. Experiment Setup

1) Under Windows, create a shared folder named share and copy the Java and Hadoop installation packages into it. (It appears inside the Linux VM as /mnt/hgfs/share.)

2) Install VMware Workstation.
3) Install the Linux virtual machine.
4) Inside the Linux VM:

(1) Install Java
$ cd /usr
$ mkdir java
$ cd java
$ /mnt/hgfs/share/jdk-6u21-linux-i586.bin
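If the installer will not run directly from the hgfs mount (shared folders are sometimes mounted non-executable), copying it locally first works; it is also worth exporting the Java environment variables and verifying the install. A minimal sketch, matching the paths used above:

$ cp /mnt/hgfs/share/jdk-6u21-linux-i586.bin .
$ chmod +x jdk-6u21-linux-i586.bin
$ ./jdk-6u21-linux-i586.bin
$ export JAVA_HOME=/usr/java/jdk1.6.0_21     # the directory the installer unpacks into
$ export PATH=$JAVA_HOME/bin:$PATH
$ java -version                              # should report java version "1.6.0_21"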

(2) Unpack the Hadoop package
$ cd /root     # extract where the later steps expect it: /root/hadoop-0.20.2
$ tar -zxvf /mnt/hgfs/share/hadoop-0.20.2.tar.gz

(3) Edit hadoop-0.20.2/conf/hadoop-env.sh and set: export JAVA_HOME=/usr/java/jdk1.6.0_21
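A quick check that the JAVA_HOME edit took effect: hadoop version is a standard subcommand and fails immediately if the JVM cannot be found.

$ cd /root/hadoop-0.20.2
$ bin/hadoop version     # first line should read: Hadoop 0.20.2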

3. Experiment Steps

$ cd /root/hadoop-0.20.2/
$ mkdir input
$ cd input
$ echo "hello world" > test1.txt
$ echo "hello hadoop" > test2.txt
$ cd ..
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output

 

10/09/22 23:40:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/22 23:40:13 INFO input.FileInputFormat: Total input paths to process : 2
10/09/22 23:40:13 INFO mapred.JobClient: Running job: job_local_0001
10/09/22 23:40:13 INFO input.FileInputFormat: Total input paths to process : 2
10/09/22 23:40:14 INFO mapred.MapTask: io.sort.mb = 100
10/09/22 23:40:14 INFO mapred.JobClient:  map 0% reduce 0%
10/09/22 23:40:23 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/22 23:40:23 INFO mapred.MapTask: record buffer = 262144/327680
10/09/22 23:40:23 INFO mapred.MapTask: Starting flush of map output
10/09/22 23:40:24 INFO mapred.MapTask: Finished spill 0
10/09/22 23:40:24 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
10/09/22 23:40:24 INFO mapred.LocalJobRunner:
10/09/22 23:40:24 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/09/22 23:40:25 INFO mapred.MapTask: io.sort.mb = 100
10/09/22 23:40:25 INFO mapred.JobClient:  map 100% reduce 0%
10/09/22 23:40:26 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/22 23:40:26 INFO mapred.MapTask: record buffer = 262144/327680
10/09/22 23:40:26 INFO mapred.MapTask: Starting flush of map output
10/09/22 23:40:26 INFO mapred.MapTask: Finished spill 0
10/09/22 23:40:26 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.Merger: Merging 2 sorted segments
10/09/22 23:40:26 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 53 bytes
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/09/22 23:40:27 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
10/09/22 23:40:27 INFO mapred.LocalJobRunner: reduce > reduce
10/09/22 23:40:27 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/09/22 23:40:27 INFO mapred.JobClient:  map 100% reduce 100%
10/09/22 23:40:27 INFO mapred.JobClient: Job complete: job_local_0001
10/09/22 23:40:27 INFO mapred.JobClient: Counters: 12
10/09/22 23:40:27 INFO mapred.JobClient:   FileSystemCounters
10/09/22 23:40:27 INFO mapred.JobClient:     FILE_BYTES_READ=467497
10/09/22 23:40:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=512494
10/09/22 23:40:27 INFO mapred.JobClient:   Map-Reduce Framework
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce input groups=3
10/09/22 23:40:27 INFO mapred.JobClient:     Combine output records=4
10/09/22 23:40:27 INFO mapred.JobClient:     Map input records=2
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce output records=3
10/09/22 23:40:27 INFO mapred.JobClient:     Spilled Records=8
10/09/22 23:40:27 INFO mapred.JobClient:     Map output bytes=41
10/09/22 23:40:27 INFO mapred.JobClient:     Combine input records=4
10/09/22 23:40:27 INFO mapred.JobClient:     Map output records=4
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce input records=4

 

$ cat output/*

hadoop    1
hello     2
world     1
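One caveat before re-running: Hadoop refuses to start a job whose output directory already exists, so clear the previous result first (in standalone mode the output lives on the local filesystem):

$ rm -rf output
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output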

 

Experiment 2: Pseudo-Distributed Mode

1. Experimental Environment

1) One PC running Windows
2) VMware virtualization: VMware Workstation 7
3) Linux OS: ubuntu-9.10-desktop-i386.iso
4) Hadoop package: hadoop-0.20.2.tar.gz
5) Java package: jdk-6u21-linux-i586.bin

2. Experiment Setup

1) Under Windows, create a shared folder named share and copy the Java and Hadoop installation packages into it. (It appears inside the Linux VM as /mnt/hgfs/share.)
2) Install VMware Workstation.
3) Install the Linux virtual machine.
4) Inside the Linux VM:

(1) Install Java
$ cd /usr
$ mkdir java
$ cd java
$ /mnt/hgfs/share/jdk-6u21-linux-i586.bin

(2) Unpack the Hadoop package
$ cd /root
$ tar -zxvf /mnt/hgfs/share/hadoop-0.20.2.tar.gz

(3) Edit hadoop-0.20.2/conf/hadoop-env.sh and set: export JAVA_HOME=/usr/java/jdk1.6.0_21

(4) Configure hadoop-0.20.2/conf/core-site.xml, hdfs-site.xml, and mapred-site.xml

--------- core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>

--------- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

--------- mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
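A malformed edit to any of these files makes every subsequent hadoop command fail with a parse error, so it can be worth checking well-formedness up front. A sketch using xmllint, which is not installed by default (it comes with libxml2-utils):

$ sudo apt-get install libxml2-utils
$ xmllint --noout conf/core-site.xml conf/hdfs-site.xml conf/mapred-site.xml    # silence means all three parse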

(5) Install SSH
$ sudo apt-get install ssh

(6) Set up passwordless SSH
root@ubuntu:~/hadoop-0.20.2# ssh-keygen -t rsa
Press Enter at every prompt.
$ cd /root/.ssh
$ cp id_rsa.pub authorized_keys
$ ssh localhost

Linux ubuntu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686

To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/

316 packages can be updated.
148 updates are security updates.

Last login: Thu Sep 23 00:05:56 2010 from localhost
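An equivalent non-interactive variant of the key setup (a sketch: -P '' sets an empty passphrase, and appending rather than copying preserves any previously authorized keys):

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost     # should now log in without a password prompt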

3. Experiment Steps

Format HDFS:
$ bin/hadoop namenode -format

10/09/23 00:10:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/09/23 00:10:49 INFO namenode.FSNamesystem: fsOwner=root,root
10/09/23 00:10:49 INFO namenode.FSNamesystem: supergroup=supergroup
10/09/23 00:10:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/09/23 00:10:49 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/09/23 00:10:49 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
10/09/23 00:10:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.0.1
************************************************************/

 

Start Hadoop:
$ bin/start-all.sh

 

root@ubuntu:~/hadoop-0.20.2/bin# start-all.sh
starting namenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-ubuntu.out
starting jobtracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-ubuntu.out

root@ubuntu:~/hadoop-0.20.2/bin# jps
13609 NameNode
13971 JobTracker
13749 DataNode
13900 SecondaryNameNode
14112 TaskTracker
14150 Jps
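With all five daemons up, the built-in web interfaces give another quick health check (50070 and 50030 are the 0.20 defaults):

http://localhost:50070/     # NameNode status and HDFS browser
http://localhost:50030/     # JobTracker status and running/completed jobs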

 

Copy the input directory into HDFS under the name in:

$ bin/hadoop fs -put input in     (equivalently: $ bin/hadoop dfs -copyFromLocal /root/input in)
(verify with: $ bin/hadoop dfs -ls in)
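Two more fs subcommands that help while iterating (both standard in 0.20; the file name assumes the test1.txt/test2.txt created in Experiment 1 are still in input):

$ bin/hadoop fs -cat in/test1.txt     # print an uploaded file
$ bin/hadoop fs -rmr out              # a job fails if its output directory already exists; remove it before re-running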

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out

10/09/23 09:38:42 INFO input.FileInputFormat: Total input paths to process : 2
10/09/23 09:38:43 INFO mapred.JobClient: Running job: job_201009230935_0002
10/09/23 09:38:44 INFO mapred.JobClient:  map 0% reduce 0%
10/09/23 09:39:01 INFO mapred.JobClient:  map 100% reduce 0%
10/09/23 09:39:19 INFO mapred.JobClient:  map 100% reduce 100%
10/09/23 09:39:21 INFO mapred.JobClient: Job complete: job_201009230935_0002
10/09/23 09:39:21 INFO mapred.JobClient: Counters: 17
10/09/23 09:39:21 INFO mapred.JobClient:   Job Counters
10/09/23 09:39:21 INFO mapred.JobClient:     Launched reduce tasks=1
10/09/23 09:39:21 INFO mapred.JobClient:     Launched map tasks=2
10/09/23 09:39:21 INFO mapred.JobClient:     Data-local map tasks=2
10/09/23 09:39:21 INFO mapred.JobClient:   FileSystemCounters
10/09/23 09:39:21 INFO mapred.JobClient:     FILE_BYTES_READ=55
10/09/23 09:39:21 INFO mapred.JobClient:     HDFS_BYTES_READ=25
10/09/23 09:39:21 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180
10/09/23 09:39:21 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
10/09/23 09:39:21 INFO mapred.JobClient:   Map-Reduce Framework
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce input groups=3
10/09/23 09:39:21 INFO mapred.JobClient:     Combine output records=4
10/09/23 09:39:21 INFO mapred.JobClient:     Map input records=2
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce shuffle bytes=61
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce output records=3
10/09/23 09:39:21 INFO mapred.JobClient:     Spilled Records=8
10/09/23 09:39:21 INFO mapred.JobClient:     Map output bytes=41
10/09/23 09:39:21 INFO mapred.JobClient:     Combine input records=4
10/09/23 09:39:21 INFO mapred.JobClient:     Map output records=4
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce input records=4

 

View the results:

$ bin/hadoop fs -cat out/*
hadoop    1
hello     2
world     1
cat: Source must be a file.
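The trailing error is harmless: the job writes a _logs subdirectory under out, and cat refuses directories. Restricting the glob to the part files avoids it:

$ bin/hadoop fs -cat out/part-*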

Copy the results to the local filesystem to view them:

$ bin/hadoop fs -get out output
$ cat output/*
cat: output/_logs: Is a directory
hadoop    1
hello     2
world     1

 

Stop the Hadoop daemons:
$ bin/stop-all.sh

 

Experiment 3: Fully Distributed Mode

1. Experimental Environment

1) Two PCs running Windows
2) VMware virtualization: VMware Workstation 7
3) Linux OS: ubuntu-9.10-desktop-i386.iso
4) Hadoop package: hadoop-0.20.2.tar.gz
5) Java package: jdk-6u21-linux-i586.bin

2. Experiment Setup

1) Under Windows, create a shared folder named share and copy the Java and Hadoop installation packages into it. (It appears inside the Linux VM as /mnt/hgfs/share.)
2) Install VMware Workstation.
3) Install the Linux virtual machine.
4) Inside the Linux VM:

(1) Install Java
$ cd /usr
$ mkdir java
$ cd java
$ /mnt/hgfs/share/jdk-6u21-linux-i586.bin

(2) Unpack the Hadoop package
$ cd /root
$ tar -zxvf /mnt/hgfs/share/hadoop-0.20.2.tar.gz

(3) Edit hadoop-0.20.2/conf/hadoop-env.sh and set: export JAVA_HOME=/usr/java/jdk1.6.0_21

(4) Edit /etc/hostname, setting this machine's hostname to master.
Add each machine's IP address and hostname to /etc/hosts, as in the sketch below.
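A concrete /etc/hosts layout, identical on both machines (the addresses are placeholders; substitute the VMs' actual IPs):

192.168.1.101   master
192.168.1.102   slave1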

(5) Configure hadoop-0.20.2/conf/core-site.xml, hdfs-site.xml, and mapred-site.xml

--------- core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop/hadoop-${user.name}</value>
  </property>
</configuration>

--------- hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

--------- mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>

(6) Install SSH
$ sudo apt-get install ssh

(7) Set up passwordless SSH
root@ubuntu:~/hadoop-0.20.2# ssh-keygen -t rsa
Press Enter at every prompt.
$ cd /root/.ssh
$ cp id_rsa.pub authorized_keys

Copy the virtual machine files to the other computer, which already has VMware installed.

(8) Start the slave host and edit its /etc/hosts and /etc/hostname (set the hostname to slave1).

(9) On the master host, in /root/.ssh:
$ scp authorized_keys slave1:/root/.ssh

(10) On the slave host, in /root/.ssh:
$ chmod 644 authorized_keys
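One step these notes leave implicit: start-all.sh consults conf/masters and conf/slaves on the master to decide where to launch each daemon, so for slave1 to run the DataNode and TaskTracker they should look roughly as follows (a sketch, consistent with the jps output below in which the master runs no worker daemons):

# on master, in /root/hadoop-0.20.2
$ echo master > conf/masters     # host for the SecondaryNameNode
$ echo slave1 > conf/slaves      # hosts that run DataNode/TaskTracker, one per line
$ ssh slave1 hostname            # confirm passwordless login works before starting the cluster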

3. Experiment Steps

Format HDFS:
$ bin/hadoop namenode -format

10/09/23 00:10:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/09/23 00:10:49 INFO namenode.FSNamesystem: fsOwner=root,root
10/09/23 00:10:49 INFO namenode.FSNamesystem: supergroup=supergroup
10/09/23 00:10:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/09/23 00:10:49 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/09/23 00:10:49 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
10/09/23 00:10:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.0.1
************************************************************/

 

Start Hadoop:
$ bin/start-all.sh

 

root@ubuntu:~/hadoop-0.20.2/bin# start-all.sh
starting namenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-ubuntu.out
starting jobtracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-ubuntu.out

root@ubuntu:~/hadoop-0.20.2/bin# jps
13609 NameNode
13971 JobTracker
13900 SecondaryNameNode
14150 Jps
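The DataNode and TaskTracker no longer appear in the master's jps listing because they run on slave1; a quick remote check (the jps path assumes the slave used the same JDK install location):

$ ssh slave1 /usr/java/jdk1.6.0_21/bin/jps     # expect DataNode and TaskTracker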

 

Copy the input directory into HDFS under the name in:

$ bin/hadoop fs -put input in

$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out

10/09/24 19:21:42 INFO input.FileInputFormat: Total input paths to process : 3
10/09/24 19:21:43 INFO mapred.JobClient: Running job: job_201009241902_0003
10/09/24 19:21:44 INFO mapred.JobClient:  map 0% reduce 0%
10/09/24 19:21:59 INFO mapred.JobClient:  map 66% reduce 0%
10/09/24 19:22:02 INFO mapred.JobClient:  map 100% reduce 0%
10/09/24 19:22:11 INFO mapred.JobClient:  map 100% reduce 100%
10/09/24 19:22:13 INFO mapred.JobClient: Job complete: job_201009241902_0003
10/09/24 19:22:13 INFO mapred.JobClient: Counters: 17
10/09/24 19:22:13 INFO mapred.JobClient:   Job Counters
10/09/24 19:22:13 INFO mapred.JobClient:     Launched reduce tasks=1
10/09/24 19:22:13 INFO mapred.JobClient:     Launched map tasks=3
10/09/24 19:22:13 INFO mapred.JobClient:     Data-local map tasks=3
10/09/24 19:22:13 INFO mapred.JobClient:   FileSystemCounters
10/09/24 19:22:13 INFO mapred.JobClient:     FILE_BYTES_READ=112
10/09/24 19:22:13 INFO mapred.JobClient:     HDFS_BYTES_READ=70
10/09/24 19:22:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=332
10/09/24 19:22:13 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=54
10/09/24 19:22:13 INFO mapred.JobClient:   Map-Reduce Framework
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce input groups=7
10/09/24 19:22:13 INFO mapred.JobClient:     Combine output records=9
10/09/24 19:22:13 INFO mapred.JobClient:     Map input records=3
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce shuffle bytes=124
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce output records=7
10/09/24 19:22:13 INFO mapred.JobClient:     Spilled Records=18
10/09/24 19:22:13 INFO mapred.JobClient:     Map output bytes=118
10/09/24 19:22:13 INFO mapred.JobClient:     Combine input records=12
10/09/24 19:22:13 INFO mapred.JobClient:     Map output records=12
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce input records=9

View the results:

$ bin/hadoop dfs -cat out/*
HDFS      1
hadoop    1
hello     5
linux     1
my        1
ubuntu    1
world     2
cat: Source must be a file.
(As in Experiment 2, the error comes from the _logs subdirectory; use out/part-* to avoid it.)

Stop the Hadoop daemons:
$ bin/stop-all.sh

 
