Hadoop Experiment Notes
Experiment 1: Standalone Mode
1. Experiment environment
1) One PC running Windows
2) VMware virtual machine: VMware Workstation 7
3) Linux OS image: ubuntu-9.10-desktop-i386.iso
4) Hadoop package: hadoop-0.20.2.tar.gz
5) Java package: jdk-6u21-linux-i586.bin
2. Preparation
1) On Windows, create a shared folder named share and copy the Java and Hadoop packages into it (it appears inside the Linux VM as /mnt/hgfs/share).
2) Install the VMware virtual machine
3) Install the Linux virtual machine
4) Inside the Linux VM:
(1) Install Java
$ cd /usr
$ mkdir java
$ cd java
$ /mnt/hgfs/share/jdk-6u21-linux-i586.bin
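If the installer runs cleanly, the JDK unpacks into /usr/java/jdk1.6.0_21. A quick sanity check (the path assumes the default unpack directory):
$ /usr/java/jdk1.6.0_21/bin/java -version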
(2) Unpack the Hadoop tarball into /root (the later steps assume /root/hadoop-0.20.2):
$ cd /root
$ tar -zxvf /mnt/hgfs/share/hadoop-0.20.2.tar.gz
(3) Edit hadoop-0.20.2/conf/hadoop-env.sh and set: export JAVA_HOME=/usr/java/jdk1.6.0_21
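One way to apply the JAVA_HOME edit without opening an editor is to append the line; this sketch works because hadoop-env.sh is sourced top to bottom, so the last assignment wins:
$ echo 'export JAVA_HOME=/usr/java/jdk1.6.0_21' >> hadoop-0.20.2/conf/hadoop-env.sh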
3. Procedure
$ cd /root/hadoop-0.20.2/
$ mkdir input
$ cd input
$ echo "hello world" > test1.txt
$ echo "hello hadoop" > test2.txt
$ cd ..
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
10/09/22 23:40:12 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
10/09/22 23:40:13 INFO input.FileInputFormat: Total input paths to process : 2
10/09/22 23:40:13 INFO mapred.JobClient: Running job: job_local_0001
10/09/22 23:40:13 INFO input.FileInputFormat: Total input paths to process : 2
10/09/22 23:40:14 INFO mapred.MapTask: io.sort.mb = 100
10/09/22 23:40:14 INFO mapred.JobClient: map 0% reduce 0%
10/09/22 23:40:23 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/22 23:40:23 INFO mapred.MapTask: record buffer = 262144/327680
10/09/22 23:40:23 INFO mapred.MapTask: Starting flush of map output
10/09/22 23:40:24 INFO mapred.MapTask: Finished spill 0
10/09/22 23:40:24 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
10/09/22 23:40:24 INFO mapred.LocalJobRunner:
10/09/22 23:40:24 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
10/09/22 23:40:25 INFO mapred.MapTask: io.sort.mb = 100
10/09/22 23:40:25 INFO mapred.JobClient: map 100% reduce 0%
10/09/22 23:40:26 INFO mapred.MapTask: data buffer = 79691776/99614720
10/09/22 23:40:26 INFO mapred.MapTask: record buffer = 262144/327680
10/09/22 23:40:26 INFO mapred.MapTask: Starting flush of map output
10/09/22 23:40:26 INFO mapred.MapTask: Finished spill 0
10/09/22 23:40:26 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.Merger: Merging 2 sorted segments
10/09/22 23:40:26 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 53 bytes
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
10/09/22 23:40:26 INFO mapred.LocalJobRunner:
10/09/22 23:40:26 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
10/09/22 23:40:27 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to output
10/09/22 23:40:27 INFO mapred.LocalJobRunner: reduce > reduce
10/09/22 23:40:27 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
10/09/22 23:40:27 INFO mapred.JobClient: map 100% reduce 100%
10/09/22 23:40:27 INFO mapred.JobClient: Job complete: job_local_0001
10/09/22 23:40:27 INFO mapred.JobClient: Counters: 12
10/09/22 23:40:27 INFO mapred.JobClient:   FileSystemCounters
10/09/22 23:40:27 INFO mapred.JobClient:     FILE_BYTES_READ=467497
10/09/22 23:40:27 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=512494
10/09/22 23:40:27 INFO mapred.JobClient:   Map-Reduce Framework
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce input groups=3
10/09/22 23:40:27 INFO mapred.JobClient:     Combine output records=4
10/09/22 23:40:27 INFO mapred.JobClient:     Map input records=2
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce shuffle bytes=0
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce output records=3
10/09/22 23:40:27 INFO mapred.JobClient:     Spilled Records=8
10/09/22 23:40:27 INFO mapred.JobClient:     Map output bytes=41
10/09/22 23:40:27 INFO mapred.JobClient:     Combine input records=4
10/09/22 23:40:27 INFO mapred.JobClient:     Map output records=4
10/09/22 23:40:27 INFO mapred.JobClient:     Reduce input records=4
$ cat output/*
hadoop 1
hello 2
world 1
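Note that Hadoop refuses to run a job whose output directory already exists, so to repeat the experiment, remove it first (in standalone mode the output lives on the local filesystem):
$ rm -rf output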
Experiment 2: Pseudo-Distributed Mode
1. Experiment environment
1) One PC running Windows
2) VMware virtual machine: VMware Workstation 7
3) Linux OS image: ubuntu-9.10-desktop-i386.iso
4) Hadoop package: hadoop-0.20.2.tar.gz
5) Java package: jdk-6u21-linux-i586.bin
2. Preparation
1) On Windows, create a shared folder named share and copy the Java and Hadoop packages into it (it appears inside the Linux VM as /mnt/hgfs/share).
2) Install the VMware virtual machine
3) Install the Linux virtual machine
4) Inside the Linux VM:
(1) Install Java (same as in Experiment 1)
$ cd /usr
$ mkdir java
$ cd java
$ /mnt/hgfs/share/jdk-6u21-linux-i586.bin
(2) Unpack the Hadoop tarball
$ cd /root
$ tar -zxvf /mnt/hgfs/share/hadoop-0.20.2.tar.gz
(3) Edit hadoop-0.20.2/conf/hadoop-env.sh and set: export JAVA_HOME=/usr/java/jdk1.6.0_21
(4) Configure hadoop/conf/core-site.xml, hdfs-site.xml, and mapred-site.xml
--------- core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop/hadoop-${user.name}</value>
</property>
</configuration>
-----------hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
-----------mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
(5) Install ssh
$ sudo apt-get install ssh
(6) Set up passwordless ssh
root@ubuntu:~/hadoop-0.20.2# ssh-keygen -t rsa
Press Enter at every prompt.
$ cd /root/.ssh
$ cp id_rsa.pub authorized_keys
$ ssh localhost
Linux ubuntu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686
To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
316 packages can be updated.
148 updates are security updates.
Last login: Thu Sep 23 00:05:56 2010 from localhost
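For reference, the key setup can also be done non-interactively; this sketch is equivalent to pressing Enter at each prompt (-P '' supplies an empty passphrase):
$ ssh-keygen -t rsa -P '' -f /root/.ssh/id_rsa
$ cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys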
3. Procedure
Format HDFS:
$ bin/hadoop namenode -format
10/09/23 00:10:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/09/23 00:10:49 INFO namenode.FSNamesystem: fsOwner=root,root
10/09/23 00:10:49 INFO namenode.FSNamesystem: supergroup=supergroup
10/09/23 00:10:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/09/23 00:10:49 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/09/23 00:10:49 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
10/09/23 00:10:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.0.1
************************************************************/
Start Hadoop:
$ bin/start-all.sh
root@ubuntu:~/hadoop-0.20.2/bin# start-all.sh
starting namenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-ubuntu.out
starting jobtracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-ubuntu.out
root@ubuntu:~/hadoop-0.20.2/bin# jps
13609 NameNode
13971 JobTracker
13749 DataNode
13900 SecondaryNameNode
14112 TaskTracker
14150 Jps
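Besides jps, the built-in web interfaces give a quick health check (ports are the 0.20 defaults):
$ firefox http://localhost:50070 (NameNode status)
$ firefox http://localhost:50030 (JobTracker status)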
Copy the input directory into HDFS under the name in:
$ bin/hadoop fs -put input in (equivalently: $ bin/hadoop dfs -copyFromLocal /root/input in)
(verify with: $ bin/hadoop dfs -ls in)
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
10/09/23 09:38:42 INFO input.FileInputFormat: Total input paths to process : 2
10/09/23 09:38:43 INFO mapred.JobClient: Running job: job_201009230935_0002
10/09/23 09:38:44 INFO mapred.JobClient: map 0% reduce 0%
10/09/23 09:39:01 INFO mapred.JobClient: map 100% reduce 0%
10/09/23 09:39:19 INFO mapred.JobClient: map 100% reduce 100%
10/09/23 09:39:21 INFO mapred.JobClient: Job complete: job_201009230935_0002
10/09/23 09:39:21 INFO mapred.JobClient: Counters: 17
10/09/23 09:39:21 INFO mapred.JobClient:   Job Counters
10/09/23 09:39:21 INFO mapred.JobClient:     Launched reduce tasks=1
10/09/23 09:39:21 INFO mapred.JobClient:     Launched map tasks=2
10/09/23 09:39:21 INFO mapred.JobClient:     Data-local map tasks=2
10/09/23 09:39:21 INFO mapred.JobClient:   FileSystemCounters
10/09/23 09:39:21 INFO mapred.JobClient:     FILE_BYTES_READ=55
10/09/23 09:39:21 INFO mapred.JobClient:     HDFS_BYTES_READ=25
10/09/23 09:39:21 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=180
10/09/23 09:39:21 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
10/09/23 09:39:21 INFO mapred.JobClient:   Map-Reduce Framework
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce input groups=3
10/09/23 09:39:21 INFO mapred.JobClient:     Combine output records=4
10/09/23 09:39:21 INFO mapred.JobClient:     Map input records=2
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce shuffle bytes=61
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce output records=3
10/09/23 09:39:21 INFO mapred.JobClient:     Spilled Records=8
10/09/23 09:39:21 INFO mapred.JobClient:     Map output bytes=41
10/09/23 09:39:21 INFO mapred.JobClient:     Combine input records=4
10/09/23 09:39:21 INFO mapred.JobClient:     Map output records=4
10/09/23 09:39:21 INFO mapred.JobClient:     Reduce input records=4
View the results:
$ bin/hadoop fs -cat out/*
hadoop 1
hello 2
world 1
cat: Source must be a file.
(The trailing error is harmless: out also contains a _logs subdirectory, which -cat cannot print.)
Copy the output to the local filesystem and view it there:
$ bin/hadoop fs -get out output
$ cat output/*
cat: output/_logs: Is a directory
hadoop 1
hello 2
world 1
Stop the Hadoop daemons:
$ bin/stop-all.sh
Experiment 3: Fully Distributed Mode
1. Experiment environment
1) Two PCs running Windows
2) VMware virtual machine: VMware Workstation 7
3) Linux OS image: ubuntu-9.10-desktop-i386.iso
4) Hadoop package: hadoop-0.20.2.tar.gz
5) Java package: jdk-6u21-linux-i586.bin
2. Preparation
1) On Windows, create a shared folder named share and copy the Java and Hadoop packages into it (it appears inside the Linux VM as /mnt/hgfs/share).
2) Install the VMware virtual machine
3) Install the Linux virtual machine
4) Inside the Linux VM:
(1) Install Java (same as in Experiment 1)
$ cd /usr
$ mkdir java
$ cd java
$ /mnt/hgfs/share/jdk-6u21-linux-i586.bin
(2) Unpack the Hadoop tarball
$ cd /root
$ tar -zxvf /mnt/hgfs/share/hadoop-0.20.2.tar.gz
(3) Edit hadoop-0.20.2/conf/hadoop-env.sh and set: export JAVA_HOME=/usr/java/jdk1.6.0_21
(4) Edit /etc/hostname, setting the hostname to master.
Add each node's IP address and hostname to /etc/hosts (example below).
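For example, with placeholder addresses (substitute the VMs' real IPs), /etc/hosts on every node would gain:
192.168.1.100 master
192.168.1.101 slave1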
(5) Configure hadoop/conf/core-site.xml, hdfs-site.xml, and mapred-site.xml
--------- core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop/hadoop-${user.name}</value>
</property>
</configuration>
-----------hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
-----------mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
(6) Install ssh
$ sudo apt-get install ssh
(7) Set up passwordless ssh
$ ssh-keygen -t rsa
Press Enter at every prompt.
$ cd /root/.ssh
$ cp id_rsa.pub authorized_keys
Copy the virtual machine files to the second computer, on which VMware is already installed.
(8) Boot the slave VM and edit its /etc/hosts and /etc/hostname (setting the hostname to slave1)
(9) On master, go to /root/.ssh
$ scp authorized_keys slave1:/root/.ssh
(10) On slave1, go to /root/.ssh
$ chmod 644 authorized_keys
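One step worth making explicit: on master, conf/slaves should list the worker hosts (conf/masters names the secondary-namenode host) so that start-all.sh can reach them, and the passwordless login should be confirmed before starting any daemons. A sketch, run from /root/hadoop-0.20.2 on master:
$ echo slave1 > conf/slaves
$ ssh slave1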
3. Procedure
Format HDFS:
$ bin/hadoop namenode -format
10/09/23 00:10:48 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.0.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/09/23 00:10:49 INFO namenode.FSNamesystem: fsOwner=root,root
10/09/23 00:10:49 INFO namenode.FSNamesystem: supergroup=supergroup
10/09/23 00:10:49 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/09/23 00:10:49 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/09/23 00:10:49 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
10/09/23 00:10:49 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.0.1
************************************************************/
Start Hadoop:
$ bin/start-all.sh
root@ubuntu:~/hadoop-0.20.2/bin# start-all.sh
starting namenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-secondarynamenode-ubuntu.out
starting jobtracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /root/hadoop-0.20.2/bin/../logs/hadoop-root-tasktracker-ubuntu.out
root@ubuntu:~/hadoop-0.20.2/bin# jps
13609 NameNode
13971 JobTracker
13900 SecondaryNameNode
14150 Jps
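No DataNode or TaskTracker shows up on master; in a fully distributed run those daemons live on the slave, so verify them there:
$ ssh slave1
$ jps
(expect DataNode and TaskTracker in the list)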
Copy the input directory into HDFS under the name in (for this run the input evidently contains a third test file; the counts below reflect it):
$ bin/hadoop fs -put input in
$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in out
10/09/24 19:21:42 INFO input.FileInputFormat: Total input paths to process : 3
10/09/24 19:21:43 INFO mapred.JobClient: Running job: job_201009241902_0003
10/09/24 19:21:44 INFO mapred.JobClient: map 0% reduce 0%
10/09/24 19:21:59 INFO mapred.JobClient: map 66% reduce 0%
10/09/24 19:22:02 INFO mapred.JobClient: map 100% reduce 0%
10/09/24 19:22:11 INFO mapred.JobClient: map 100% reduce 100%
10/09/24 19:22:13 INFO mapred.JobClient: Job complete: job_201009241902_0003
10/09/24 19:22:13 INFO mapred.JobClient: Counters: 17
10/09/24 19:22:13 INFO mapred.JobClient:   Job Counters
10/09/24 19:22:13 INFO mapred.JobClient:     Launched reduce tasks=1
10/09/24 19:22:13 INFO mapred.JobClient:     Launched map tasks=3
10/09/24 19:22:13 INFO mapred.JobClient:     Data-local map tasks=3
10/09/24 19:22:13 INFO mapred.JobClient:   FileSystemCounters
10/09/24 19:22:13 INFO mapred.JobClient:     FILE_BYTES_READ=112
10/09/24 19:22:13 INFO mapred.JobClient:     HDFS_BYTES_READ=70
10/09/24 19:22:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=332
10/09/24 19:22:13 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=54
10/09/24 19:22:13 INFO mapred.JobClient:   Map-Reduce Framework
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce input groups=7
10/09/24 19:22:13 INFO mapred.JobClient:     Combine output records=9
10/09/24 19:22:13 INFO mapred.JobClient:     Map input records=3
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce shuffle bytes=124
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce output records=7
10/09/24 19:22:13 INFO mapred.JobClient:     Spilled Records=18
10/09/24 19:22:13 INFO mapred.JobClient:     Map output bytes=118
10/09/24 19:22:13 INFO mapred.JobClient:     Combine input records=12
10/09/24 19:22:13 INFO mapred.JobClient:     Map output records=12
10/09/24 19:22:13 INFO mapred.JobClient:     Reduce input records=9
View the results:
$ bin/hadoop dfs -cat out/*
HDFS 1
hadoop 1
hello 5
linux 1
my 1
ubuntu 1
world 2
cat: Source must be a file.
(As before, the error just means out/_logs is a directory.)
Stop the Hadoop daemons:
$ bin/stop-all.sh