Installing hadoop-1.2.1 on CentOS 6.4 and running WordCount
After nearly a month of studying Hadoop, I have a rough idea of what it does and how it works, so it feels like time to try building a pseudo-distributed Hadoop setup and running WordCount, the MapReduce equivalent of a language's Hello World program.
Let's go!
Software environment:
1. CentOS 6.4 virtual machine (VirtualBox)
2. Java 1.7
3. Hadoop 1.2.1
Steps:
1. Install the virtual machine. A minimal install, with 8 GB of disk space and 1 GB of RAM.
2. After installing the VM, configure its network settings.
2.1 Log in as root.
2.2 Edit the network configuration with vi /etc/sysconfig/network-scripts/ifcfg-eth0 as follows:
[root@caixen-1 usr]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
HWADDR=08:00:27:50:23:5B
TYPE=Ethernet
UUID=2688ca88-3ed9-4187-802c-2f84729c56ed
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
IPADDR=192.168.2.100
NETMASK=255.255.255.0
GATEWAY=192.168.2.1
Save and quit.
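A typo in any of these keys leaves the interface unconfigured, so it can be worth grepping the file before restarting the network. `check_ifcfg` below is an illustrative helper of my own, not a stock tool:

```shell
# Illustrative helper: verify an ifcfg file defines the keys a static IPv4
# setup needs, printing any that are missing.
check_ifcfg() {
    # $1 = path to an ifcfg-style file; returns non-zero if a key is missing
    f=$1; missing=0
    for key in DEVICE ONBOOT IPADDR NETMASK GATEWAY; do
        if ! grep -q "^${key}=" "$f"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return $missing
}
```

`check_ifcfg /etc/sysconfig/network-scripts/ifcfg-eth0` prints nothing and returns 0 when the file above is complete.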
2.3 Add a nameserver so domain names can be resolved:
[root@caixen-1 usr]# vi /etc/resolv.conf
nameserver 192.168.2.1
Save and quit.
2.4 Restart the network service so the configuration takes effect:
[root@caixen-1 usr]# service network restart
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]
2.5 Check network connectivity:
[root@caixen-1 usr]# ping www.baidu.com
PING www.a.shifen.com (61.135.169.121) 56(84) bytes of data.
64 bytes from 61.135.169.121: icmp_seq=1 ttl=55 time=36.9 ms
64 bytes from 61.135.169.121: icmp_seq=2 ttl=54 time=41.2 ms
64 bytes from 61.135.169.121: icmp_seq=3 ttl=54 time=36.4 ms
64 bytes from 61.135.169.121: icmp_seq=4 ttl=55 time=34.8 ms
64 bytes from 61.135.169.121: icmp_seq=5 ttl=55 time=35.7 ms
64 bytes from 61.135.169.121: icmp_seq=6 ttl=54 time=34.9 ms
64 bytes from 61.135.169.121: icmp_seq=7 ttl=54 time=30.4 ms
^C
--- www.a.shifen.com ping statistics ---
7 packets transmitted, 7 received, 0% packet loss, time 6552ms
rtt min/avg/max/mdev = 30.485/35.812/41.270/2.968 ms
The network is configured!
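In a scripted setup you would parse ping's summary line rather than eyeball it. `packet_loss` is a hypothetical helper that extracts the loss percentage:

```shell
# Illustrative helper: pull the packet-loss percentage out of ping's
# "... X% packet loss ..." statistics line so a script can test it.
packet_loss() {
    # $1 = ping's statistics line
    echo "$1" | sed -n 's/.* \([0-9][0-9]*\)% packet loss.*/\1/p'
}
```

For example: `loss=$(packet_loss "$(ping -c 4 www.baidu.com | grep 'packet loss')")` followed by `[ "$loss" -eq 0 ]`.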
3. Install JDK 1.7
3.1 Create a java directory under /usr/ to hold the Java files:
[root@caixen-1 usr]# mkdir java
3.2 Upload the JDK into the java directory and unpack it:
[root@caixen-1 usr]# cd java
[root@caixen-1 java]# ls -al
total 140052
drwxr-xr-x. 2 root root 4096 Jan 20 23:21 .
drwxr-xr-x. 14 root root 4096 Jan 20 23:05 ..
-rw-r--r--. 1 root root 143398235 Jan 20 23:21 jdk-7u71-linux-i586.tar.gz
[root@caixen-1 java]# tar -xzf jdk-7u71-linux-i586.tar.gz
[root@caixen-1 java]# ls -al
total 140056
drwxr-xr-x. 3 root root 4096 Jan 20 23:23 .
drwxr-xr-x. 14 root root 4096 Jan 20 23:05 ..
drwxr-xr-x. 8 uucp 143 4096 Sep 27 08:30 jdk1.7.0_71
-rw-r--r--. 1 root root 143398235 Jan 20 23:21 jdk-7u71-linux-i586.tar.gz
3.3 Configure JAVA_HOME and PATH:
[root@caixen-1 ~]# vi /etc/profile
# java and hadoop environment
export JAVA_HOME=/usr/java/jdk1.7.0_71
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
Save and quit.
3.4 Verify that Java runs:
[root@caixen-1 ~]# java -version
java version "1.7.0_71"
Java(TM) SE Runtime Environment (build 1.7.0_71-b14)
Java HotSpot(TM) Client VM (build 24.71-b01, mixed mode)
Configuration successful!
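Before writing JAVA_HOME into /etc/profile, it is worth confirming the directory really holds a JDK (a JRE unpacks with java but no javac). `looks_like_jdk` is an illustrative helper, not a standard command:

```shell
# Illustrative helper: check that a candidate JAVA_HOME contains both the
# java launcher and the javac compiler before exporting it system-wide.
looks_like_jdk() {
    # $1 = candidate JAVA_HOME directory
    [ -x "$1/bin/java" ] && [ -x "$1/bin/javac" ]
}
```

Here, `looks_like_jdk /usr/java/jdk1.7.0_71 && echo ok` should print ok.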
4. Install Hadoop 1.2.1
4.1 Create a hadoop directory under /usr/ to hold the Hadoop files:
[root@caixen-1 usr]# mkdir hadoop
4.2 Upload Hadoop into the hadoop directory and unpack it:
[root@caixen-1 usr]# cd hadoop/
[root@caixen-1 hadoop]# ls -al
total 62364
drwxr-xr-x. 2 root root 4096 Jan 21 21:36 .
drwxr-xr-x. 14 root root 4096 Jan 20 23:05 ..
-rw-r--r--. 1 root root 63851630 Jan 21 21:37 hadoop-1.2.1.tar.gz
[root@caixen-1 hadoop]# tar -xzf hadoop-1.2.1.tar.gz
[root@caixen-1 hadoop]# ls -al
total 62368
drwxr-xr-x. 3 root root 4096 Jan 21 21:38 .
drwxr-xr-x. 14 root root 4096 Jan 20 23:05 ..
drwxr-xr-x. 15 root root 4096 Jul 23 2013 hadoop-1.2.1
-rw-r--r--. 1 root root 63851630 Jan 21 21:37 hadoop-1.2.1.tar.gz
4.3 Configure HADOOP_PREFIX:
[root@caixen-1 hadoop]# vi /etc/profile
# java and hadoop environment
export JAVA_HOME=/usr/java/jdk1.7.0_71
export HADOOP_PREFIX=/usr/hadoop/hadoop-1.2.1
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$PATH"
export HADOOP_PREFIX PATH CLASSPATH
Save and quit.
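Chained PATH exports like this easily accumulate duplicate entries across edits. The sketch below (`dedupe_path` is an illustrative helper) removes duplicates while keeping first-seen order:

```shell
# Illustrative helper: remove duplicate entries from a colon-separated
# PATH-style string, keeping the first occurrence of each entry in order.
dedupe_path() {
    # split on ":", print each entry only the first time it is seen,
    # then strip the trailing separator awk leaves behind
    printf '%s' "$1" | awk -v RS=: -v ORS=: '!seen[$0]++' | sed 's/:$//'
}
```

For example, `PATH=$(dedupe_path "$PATH")` cleans up the live environment.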
4.4 Verify that hadoop runs normally:
[root@caixen-1 hadoop]# hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
namenode -format format the DFS filesystem
secondarynamenode run the DFS secondary namenode
namenode run the DFS namenode
datanode run a DFS datanode
dfsadmin run a DFS admin client
mradmin run a Map-Reduce admin client
fsck run a DFS filesystem checking utility
fs run a generic filesystem user client
balancer run a cluster balancing utility
oiv apply the offline fsimage viewer to an fsimage
fetchdt fetch a delegation token from the NameNode
jobtracker run the MapReduce job Tracker node
pipes run a Pipes job
tasktracker run a MapReduce task Tracker node
historyserver run job history servers as a standalone daemon
job manipulate MapReduce jobs
queue get information regarding JobQueues
version print the version
jar <jar> run a jar file
distcp <srcurl> <desturl> copy file or directories recursively
distcp2 <srcurl> <desturl> DistCp version 2
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
classpath prints the class path needed to get the
Hadoop jar and the required libraries
daemonlog get/set the log level for each daemon
or
CLASSNAME run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
Running normally!
5. Configure Hadoop
5.1 Configure core-site.xml
[root@caixen-1 hadoop]# vi hadoop-1.2.1/conf/core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/caixen/hadooptmpdir</value>
<description>A base for other temporary directories.</description>
</property>
</configuration>
5.2 Configure hdfs-site.xml
[root@caixen-1 hadoop]# vi hadoop-1.2.1/conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
5.3 Configure mapred-site.xml
[root@caixen-1 hadoop]# vi hadoop-1.2.1/conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
5.4 Add the JAVA_HOME path to hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.7.0_71
Hadoop configuration complete!
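A quick grep over the three files catches the most common mistake here: editing the wrong file. `check_conf` below is a sketch that assumes the layout used in this walkthrough:

```shell
# Illustrative sanity check: each config file must define its key property;
# a miss usually means the edit landed in the wrong file.
check_conf() {
    # $1 = Hadoop conf directory
    grep -q 'fs.default.name'    "$1/core-site.xml" &&
    grep -q 'dfs.replication'    "$1/hdfs-site.xml" &&
    grep -q 'mapred.job.tracker' "$1/mapred-site.xml"
}
```

Here, `check_conf /usr/hadoop/hadoop-1.2.1/conf && echo "conf ok"` should print conf ok.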
6. Stop iptables and edit hosts
[root@caixen-1 hadoop]# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
[root@caixen-1 hadoop]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.100 caixen-1
7. Install the openSSH server and client, and configure passwordless SSH login
[root@caixen-1 hadoop]# yum install -y openssh-server.i686 openssh-clients.i686
After installation, set up passwordless login:
[root@caixen-1 hadoop]# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/root/.ssh'.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
42:c1:bc:5f:45:8e:95:d5:72:59:8d:12:59:b6:a8:40 root@caixen-1
The key's randomart image is:
+--[ DSA 1024]----+
| o. E .+*+o=|
| oo +=ooo+|
| ... ..o..o |
| .. ... |
| ..S.. |
| .. |
| |
| |
| |
+-----------------+
[root@caixen-1 hadoop]# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Test with ssh localhost:
[root@caixen-1 hadoop]# ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is 8b:69:86:bf:3c:78:e3:f8:9e:25:a1:09:ce:25:2a:46.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Last login: Wed Jan 21 21:19:57 2015 from 192.168.2.102
Success! Run exit to log out.
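If passwordless login still prompts for a password, the usual culprit is permissions: sshd ignores authorized_keys when ~/.ssh is group- or world-writable, expecting 700 on the directory and 600 on the file. A sketch of tightening the modes (`fix_ssh_perms` is an illustrative helper; point it at $HOME/.ssh):

```shell
# Illustrative helper: tighten the permissions sshd expects on a
# .ssh-style directory and its authorized_keys file.
fix_ssh_perms() {
    # $1 = path to a .ssh-style directory
    chmod 700 "$1"
    if [ -f "$1/authorized_keys" ]; then
        chmod 600 "$1/authorized_keys"
    fi
}
```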
8. Format the namenode
[root@caixen-1 /]# hadoop namenode -format
15/01/21 22:20:16 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = caixen-1/220.250.64.225
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_71
************************************************************/
15/01/21 22:20:16 INFO util.GSet: Computing capacity for map BlocksMap
15/01/21 22:20:16 INFO util.GSet: VM type = 32-bit
15/01/21 22:20:16 INFO util.GSet: 2.0% max memory = 1013645312
15/01/21 22:20:16 INFO util.GSet: capacity = 2^22 = 4194304 entries
15/01/21 22:20:16 INFO util.GSet: recommended=4194304, actual=4194304
15/01/21 22:20:17 INFO namenode.FSNamesystem: fsOwner=root
15/01/21 22:20:17 INFO namenode.FSNamesystem: supergroup=supergroup
15/01/21 22:20:17 INFO namenode.FSNamesystem: isPermissionEnabled=true
15/01/21 22:20:17 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
15/01/21 22:20:17 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
15/01/21 22:20:17 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
15/01/21 22:20:17 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/01/21 22:20:17 INFO common.Storage: Image file /home/caixen/hadooptmpdir/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
15/01/21 22:20:17 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/home/caixen/hadooptmpdir/dfs/name/current/edits
15/01/21 22:20:17 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/home/caixen/hadooptmpdir/dfs/name/current/edits
15/01/21 22:20:17 INFO common.Storage: Storage directory /home/caixen/hadooptmpdir/dfs/name has been successfully formatted.
15/01/21 22:20:17 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at caixen-1/220.250.64.225
************************************************************/
Format successful!
9. Start Hadoop
[root@caixen-1 /]# start-all.sh
starting namenode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-caixen-1.out
localhost: starting datanode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-caixen-1.out
localhost: starting secondarynamenode, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-caixen-1.out
starting jobtracker, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-caixen-1.out
localhost: starting tasktracker, logging to /usr/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-caixen-1.out
10. Check daemon status with jps
[root@caixen-1 /]# jps
2746 DataNode
2646 NameNode
2847 SecondaryNameNode
3189 Jps
3031 TaskTracker
2927 JobTracker
Hadoop is running!
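A script can perform the same check by scanning the jps output for the five expected daemons. `missing_daemons` below is an illustrative helper that names any that are absent; feed it "$(jps)":

```shell
# Illustrative helper: name any of the five pseudo-distributed daemons
# missing from a jps listing; prints nothing when all are up.
missing_daemons() {
    # $1 = output of jps
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # -w matches whole words, so NameNode does not match SecondaryNameNode
        echo "$1" | grep -qw "$d" || echo "$d"
    done
}
```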
11. Run the WordCount example
11.1 Write a.txt and b.txt into a local hadoop_data directory:
[root@caixen-1 /]# mkdir /usr/hadoop_data
[root@caixen-1 /]# cd /usr/hadoop_data/
[root@caixen-1 hadoop_data]# echo "hello hadoop" > a.txt
[root@caixen-1 hadoop_data]# echo "hello bala bala bala" > b.txt
[root@caixen-1 hadoop_data]# ls -al
total 16
drwxr-xr-x. 2 root root 4096 Jan 21 22:30 .
drwxr-xr-x. 15 root root 4096 Jan 21 22:29 ..
-rw-r--r--. 1 root root 13 Jan 21 22:29 a.txt
-rw-r--r--. 1 root root 24 Jan 21 22:30 b.txt
11.2 Copy the txt files into HDFS:
[root@caixen-1 hadoop_data]# hadoop dfs -mkdir /input
[root@caixen-1 hadoop_data]# hadoop dfs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2015-01-21 22:21 /home
drwxr-xr-x - root supergroup 0 2015-01-21 22:32 /input
[root@caixen-1 hadoop_data]# hadoop dfs -put /usr/hadoop_data/* /input
[root@caixen-1 hadoop_data]# hadoop dfs -ls /input/
Found 2 items
-rw-r--r-- 1 root supergroup 13 2015-01-21 22:34 /input/a.txt
-rw-r--r-- 1 root supergroup 24 2015-01-21 22:34 /input/b.txt
Copy successful!
11.3 Run the WordCount example:
[root@caixen-1 hadoop_data]# hadoop jar /usr/hadoop/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount /input /output
15/01/21 22:44:52 INFO input.FileInputFormat: Total input paths to process : 2
15/01/21 22:44:52 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/01/21 22:44:52 WARN snappy.LoadSnappy: Snappy native library not loaded
15/01/21 22:44:53 INFO mapred.JobClient: Running job: job_201501212242_0001
15/01/21 22:44:54 INFO mapred.JobClient: map 0% reduce 0%
15/01/21 22:45:11 INFO mapred.JobClient: map 100% reduce 0%
15/01/21 22:45:22 INFO mapred.JobClient: map 100% reduce 33%
15/01/21 22:45:24 INFO mapred.JobClient: map 100% reduce 100%
15/01/21 22:45:28 INFO mapred.JobClient: Job complete: job_201501212242_0001
15/01/21 22:45:28 INFO mapred.JobClient: Counters: 29
15/01/21 22:45:28 INFO mapred.JobClient: Job Counters
15/01/21 22:45:28 INFO mapred.JobClient: Launched reduce tasks=1
15/01/21 22:45:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=29854
15/01/21 22:45:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
15/01/21 22:45:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
15/01/21 22:45:28 INFO mapred.JobClient: Launched map tasks=2
15/01/21 22:45:28 INFO mapred.JobClient: Data-local map tasks=2
15/01/21 22:45:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11933
15/01/21 22:45:28 INFO mapred.JobClient: File Output Format Counters
15/01/21 22:45:28 INFO mapred.JobClient: Bytes Written=24
15/01/21 22:45:28 INFO mapred.JobClient: FileSystemCounters
15/01/21 22:45:28 INFO mapred.JobClient: FILE_BYTES_READ=54
15/01/21 22:45:28 INFO mapred.JobClient: HDFS_BYTES_READ=233
15/01/21 22:45:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=171617
15/01/21 22:45:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=24
15/01/21 22:45:28 INFO mapred.JobClient: File Input Format Counters
15/01/21 22:45:28 INFO mapred.JobClient: Bytes Read=37
15/01/21 22:45:28 INFO mapred.JobClient: Map-Reduce Framework
15/01/21 22:45:28 INFO mapred.JobClient: Map output materialized bytes=60
15/01/21 22:45:28 INFO mapred.JobClient: Map input records=2
15/01/21 22:45:28 INFO mapred.JobClient: Reduce shuffle bytes=60
15/01/21 22:45:28 INFO mapred.JobClient: Spilled Records=8
15/01/21 22:45:28 INFO mapred.JobClient: Map output bytes=58
15/01/21 22:45:28 INFO mapred.JobClient: Total committed heap usage (bytes)=247341056
15/01/21 22:45:28 INFO mapred.JobClient: CPU time spent (ms)=2940
15/01/21 22:45:28 INFO mapred.JobClient: Combine input records=6
15/01/21 22:45:28 INFO mapred.JobClient: SPLIT_RAW_BYTES=196
15/01/21 22:45:28 INFO mapred.JobClient: Reduce input records=4
15/01/21 22:45:28 INFO mapred.JobClient: Reduce input groups=3
15/01/21 22:45:28 INFO mapred.JobClient: Combine output records=4
15/01/21 22:45:28 INFO mapred.JobClient: Physical memory (bytes) snapshot=323555328
15/01/21 22:45:28 INFO mapred.JobClient: Reduce output records=3
15/01/21 22:45:28 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1042567168
15/01/21 22:45:28 INFO mapred.JobClient: Map output records=6
Job succeeded!
11.4 View the result:
[root@caixen-1 hadoop_data]# hadoop dfs -ls /output
Found 3 items
-rw-r--r-- 1 root supergroup 0 2015-01-21 22:45 /output/_SUCCESS
drwxr-xr-x - root supergroup 0 2015-01-21 22:44 /output/_logs
-rw-r--r-- 1 root supergroup 24 2015-01-21 22:45 /output/part-r-00000
[root@caixen-1 hadoop_data]# hadoop dfs -cat /output/part-r-00000
bala 3
hadoop 1
hello 2
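The same counts can be reproduced with plain shell on the local copies, as a cross-check of what the MapReduce job computed. A sketch, assuming whitespace-separated words:

```shell
# Cross-check WordCount locally: split on whitespace, count each word.
local_wordcount() {
    # reads text on stdin, prints "word<TAB>count" sorted by word
    tr -s ' \t' '\n' | grep -v '^$' | sort | uniq -c | awk '{print $2 "\t" $1}'
}
```

Here, `cat /usr/hadoop_data/*.txt | local_wordcount` reproduces the bala 3 / hadoop 1 / hello 2 counts above.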
At this point, the pseudo-distributed Hadoop test environment is complete.
If deleting a file from HDFS fails with a safe mode message, safe mode has to be turned off first:
[root@caixen-1 hadoop_data]# hadoop dfs -rmr /output
rmr: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /output. Name node is in safe mode.
[root@caixen-1 hadoop_data]# hadoop dfsadmin -safemode leave
Safe mode is OFF
Now files in HDFS can be deleted:
[root@caixen-1 hadoop_data]# hadoop dfs -rmr /output
Deleted hdfs://localhost:9000/output