http://blog.csdn.net/yczws1/article/category/1780343
This series of posts is quite good.
Hadoop Learning (1): Pseudo-distributed configuration of hadoop-1.2.1 and the problems encountered
A simplified, workable install: if the first startup has problems, delete everything and do it again; after three attempts, by the fourth you will know exactly where the problem lies!
1. JDK installation. Download from: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Download the latest JDK; the 32-bit build is used here: jdk-7u45-linux-i586.tar.gz
tar -zxvf jdk-7u45-linux-i586.tar.gz
No installer is needed; just extract it. The extracted directory is jdk1.7.0_45. Remember this path.
Configure the environment variables:
Add the following to your /etc/profile file (e.g. sudo gedit /etc/profile):
export JAVA_HOME=/home/zhangzhen/software/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
Save and close the file, then run: source /etc/profile
Check the Java version:
java -version
If the expected version is printed, the JDK is ready.
If the system already has a JDK installed, or you installed one and no longer know where it is:
Run whereis java. This only locates the java binary; what we actually need is the JDK installation directory, i.e. wherever you extracted the JDK archive. If Java was installed automatically online (e.g. via a package manager), whereis java can help you trace the install location (see the sketch below). Downloading and extracting the JDK yourself, as above, is the recommended route.
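If the JDK did come from a package manager, the following sketch (standard coreutils, not part of the original post) resolves the symlink chain to the real installation directory:
readlink -f "$(which java)"
# Typical output: /usr/lib/jvm/java-7-openjdk-i386/jre/bin/java
# JAVA_HOME is then the directory above jre/bin (or bin/) in that path.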
2. Install Hadoop 1.2.1 (pseudo-distributed):
(1) Assuming hadoop-1.2.1.tar.gz is on the desktop, copy it to the installation directory /usr/local/:
sudo cp hadoop-1.2.1.tar.gz /usr/local/
(2) Extract hadoop-1.2.1.tar.gz:
cd /usr/local
sudo tar -zxvf hadoop-1.2.1.tar.gz
(3) Rename the extracted folder to hadoop:
sudo mv hadoop-1.2.1 hadoop
(4) Set the owner of the hadoop folder to the hadoop user, as sketched below.
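The original step does not show the command; a minimal sketch, assuming the folder was extracted to /usr/local/hadoop, is:
sudo chown -R hadoop:hadoop /usr/local/hadoop
(If you are not using a dedicated hadoop account, substitute your own user, e.g. zhangzhen, which is the account used elsewhere in this post.)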
(5) Open the hadoop/conf/hadoop-env.sh file:
sudo gedit hadoop/conf/hadoop-env.sh
(6) Configure conf/hadoop-env.sh (find the line #export JAVA_HOME=..., remove the #, and set it to this machine's JDK path):
export JAVA_HOME=/usr/lib/jvm/java-6-jdk
(If you installed the JDK as in step 1, point this at that path instead, e.g. /home/zhangzhen/software/jdk1.7.0_45.)
(7) Open the conf/core-site.xml file:
sudo gedit hadoop/conf/core-site.xml
Edit it as follows:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
(8) Open the conf/mapred-site.xml file:
sudo gedit hadoop/conf/mapred-site.xml
Edit it as follows:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>
(9) Open the conf/hdfs-site.xml file:
sudo gedit hadoop/conf/hdfs-site.xml
Edit it as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/zhangzhen/software/hadoop-1.2.1/</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/home/zhangzhen/software/hadoop-1.2.1/data</value>
</property>
<!--
<property>
<name>dfs.name.dir</name>
<value>/home/zhangzhen/software/hadoop-1.2.1/name</value>
</property>
-->
</configuration>
Create the data directory yourself at the path configured above; the name directory does not need to be specified and is left at its default (see the sketch below).
Then run:
./bin/hadoop namenode -format
./bin/start-all.sh
jps
(I created the data directory myself; data is where HDFS will store its blocks. The name directory stays at its default and needs no configuration.)
The data directory should have permission 755. If you created it yourself with mkdir data, its permission is already 755 by default and there is nothing more to do. If the data and name directories were generated automatically, fix the permission with chmod 755 data.
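A minimal sketch of creating that directory, assuming the paths configured in hdfs-site.xml above:
mkdir -p /home/zhangzhen/software/hadoop-1.2.1/data
chmod 755 /home/zhangzhen/software/hadoop-1.2.1/data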
3. Install the SSH service
SSH provides remote login and remote management; see other references for the details.
(1) Install openssh-server:
sudo apt-get install ssh openssh-server
(2) Configure passwordless SSH login to the local machine.
Create an SSH key; here we use the DSA type (the command is sketched below).
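The key-generation command itself is not shown in the original post; a typical invocation (an assumption, matching the DSA choice above) is:
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# The empty passphrase (-P '') is what lets Hadoop's scripts log in without prompting.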
After pressing Enter, two files are generated under /home/zhangzhen/.ssh/: id_dsa and id_dsa.pub. They come as a pair; list the directory to check:
# cd /home/zhangzhen/.ssh
# ls -a
. .. authorized_keys id_dsa id_dsa.pub known_hosts
Go into the ~/.ssh/ directory and append id_dsa.pub to the authorized_keys file (there is no authorized_keys file at first):
cd /home/zhangzhen/.ssh
cat id_dsa.pub >> authorized_keys
To verify that SSH is installed, check the version with ssh -V.
Log in to localhost and then log out again (see the sketch below). On a single machine, note that the SSH service is normally already running.
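The login/logout pair is not spelled out in the original post; the usual commands are:
ssh localhost
exit
The first ssh localhost may ask you to confirm the host key; after that it should log you in without a password if the key setup above worked.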
4. Running Hadoop on a single machine
1. Go into the hadoop directory and format the HDFS filesystem. This step is required the first time you run Hadoop:
cd /home/zhangzhen/hadoop-1.2.1/
bin/hadoop namenode -format
2. When the command output reports that the filesystem has been formatted successfully (the original post shows a screenshot of this output here), the HDFS format step is done.
3. Start Hadoop: bin/start-all.sh
4. Check whether Hadoop started successfully with jps. The output should look like this:
6755 Jps
5432 TaskTracker
4866 DataNode
4638 NameNode
5109 SecondaryNameNode
5201 JobTracker
If all six lines appear, the setup succeeded; if even one is missing, something went wrong.
If they are all listed, you can view the MapReduce web page in Firefox (a command-line check is sketched after the URLs) at:
http://localhost:50030/
and the HDFS web page at:
http://localhost:50070/
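If you prefer a command-line check (curl is assumed to be installed; this is not part of the original post), each URL should return HTTP 200 once the JobTracker and NameNode web UIs are up:
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/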
5. Testing Hadoop
zhangzhen@ubuntu:~$ mkdir input
zhangzhen@ubuntu:~$ cd input/
zhangzhen@ubuntu:~/input$ ls
zhangzhen@ubuntu:~/input$ echo "hello world" >test1.txe
zhangzhen@ubuntu:~/input$ echo "hello world" >test1.txx
zhangzhen@ubuntu:~/input$ ls
test1.txe test1.txx
zhangzhen@ubuntu:~/input$ rm test1.txe
zhangzhen@ubuntu:~/input$ rm test1.txx
zhangzhen@ubuntu:~/input$ echo "hello world" >test1.txt
zhangzhen@ubuntu:~/input$ echo "hello hadoop">test2.txt
zhangzhen@ubuntu:~/input$ cat test1.txt
hello world
zhangzhen@ubuntu:~/input$ cat test2.txt
hello hadoop
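Before the files can be listed under ./in in HDFS below, the local input directory has to be uploaded; the original post omits this command. A typical invocation, run from the hadoop-1.2.1 directory (an assumption about the exact working directory), is:
bin/hadoop dfs -put ~/input in
This copies the local ~/input directory to /user/zhangzhen/in in HDFS.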
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ ./hadoop dfs -ls ./in/*
-rw-r--r-- 1 zhangzhen supergroup 12 2014-01-14 04:49 /user/zhangzhen/in/test1.txt
-rw-r--r-- 1 zhangzhen supergroup 13 2014-01-14 04:49 /user/zhangzhen/in/test2.txt
Check where the Hadoop example jar files are:
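The listing below was presumably produced by something like the following (an assumption; the original does not show the command):
ls -l ~/software/hadoop-1.2.1/hadoop-*.jar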
-rw-rw-r-- 1 zhangzhen zhangzhen 6842 Jul 22 18:26 hadoop-ant-1.2.1.jar
-rw-rw-r-- 1 zhangzhen zhangzhen 414 Jul 22 18:26 hadoop-client-1.2.1.jar
-rw-rw-r-- 1 zhangzhen zhangzhen 4203147 Jul 22 18:26 hadoop-core-1.2.1.jar
-rw-rw-r-- 1 zhangzhen zhangzhen 142726 Jul 22 18:26 hadoop-examples-1.2.1.jar
-rw-rw-r-- 1 zhangzhen zhangzhen 417 Jul 22 18:26 hadoop-minicluster-1.2.1.jar
-rw-rw-r-- 1 zhangzhen zhangzhen 3126576 Jul 22 18:26 hadoop-test-1.2.1.jar
-rw-rw-r-- 1 zhangzhen zhangzhen 385634 Jul 22 18:26 hadoop-tools-1.2.1.jar
zhangzhen@ubuntu:~/software/hadoop-1.2.1$ cp hadoop-examples-1.2.1.jar /home/zhangzhen/software/hadoop-1.2.1/bin/
zhangzhen@ubuntu:~/software/hadoop-1.2.1$ ./hadoop jar hadoop-examples-1.2.1.jar wordcount in out
-bash: ./hadoop: No such file or directory
zhangzhen@ubuntu:~/software/hadoop-1.2.1$ cd bin/
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ ./hadoop jar hadoop-examples-1.2.1.jar wordcount in out
14/01/14 05:47:19 INFO input.FileInputFormat: Total input paths to process : 2
14/01/14 05:47:19 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/01/14 05:47:19 WARN snappy.LoadSnappy: Snappy native library not loaded
14/01/14 05:47:20 INFO mapred.JobClient: Running job: job_201401140428_0001
14/01/14 05:47:21 INFO mapred.JobClient: map 0% reduce 0%
14/01/14 05:47:33 INFO mapred.JobClient: map 50% reduce 0%
14/01/14 05:47:34 INFO mapred.JobClient: map 100% reduce 0%
14/01/14 05:47:42 INFO mapred.JobClient: map 100% reduce 33%
14/01/14 05:47:43 INFO mapred.JobClient: map 100% reduce 100%
14/01/14 05:47:45 INFO mapred.JobClient: Job complete: job_201401140428_0001
14/01/14 05:47:45 INFO mapred.JobClient: Counters: 29
14/01/14 05:47:45 INFO mapred.JobClient: Job Counters
14/01/14 05:47:45 INFO mapred.JobClient: Launched reduce tasks=1
14/01/14 05:47:45 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=21137
14/01/14 05:47:45 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/01/14 05:47:45 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/01/14 05:47:45 INFO mapred.JobClient: Launched map tasks=2
14/01/14 05:47:45 INFO mapred.JobClient: Data-local map tasks=2
14/01/14 05:47:45 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10426
14/01/14 05:47:45 INFO mapred.JobClient: File Output Format Counters
14/01/14 05:47:45 INFO mapred.JobClient: Bytes Written=25
14/01/14 05:47:45 INFO mapred.JobClient: FileSystemCounters
14/01/14 05:47:45 INFO mapred.JobClient: FILE_BYTES_READ=55
14/01/14 05:47:45 INFO mapred.JobClient: HDFS_BYTES_READ=253
14/01/14 05:47:45 INFO mapred.JobClient: FILE_BYTES_WRITTEN=172644
14/01/14 05:47:45 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
14/01/14 05:47:45 INFO mapred.JobClient: File Input Format Counters
14/01/14 05:47:45 INFO mapred.JobClient: Bytes Read=25
14/01/14 05:47:45 INFO mapred.JobClient: Map-Reduce Framework
14/01/14 05:47:45 INFO mapred.JobClient: Map output materialized bytes=61
14/01/14 05:47:45 INFO mapred.JobClient: Map input records=2
14/01/14 05:47:45 INFO mapred.JobClient: Reduce shuffle bytes=61
14/01/14 05:47:45 INFO mapred.JobClient: Spilled Records=8
14/01/14 05:47:45 INFO mapred.JobClient: Map output bytes=41
14/01/14 05:47:45 INFO mapred.JobClient: Total committed heap usage (bytes)=248127488
14/01/14 05:47:45 INFO mapred.JobClient: CPU time spent (ms)=3820
14/01/14 05:47:45 INFO mapred.JobClient: Combine input records=4
14/01/14 05:47:45 INFO mapred.JobClient: SPLIT_RAW_BYTES=228
14/01/14 05:47:45 INFO mapred.JobClient: Reduce input records=4
14/01/14 05:47:45 INFO mapred.JobClient: Reduce input groups=3
14/01/14 05:47:45 INFO mapred.JobClient: Combine output records=4
14/01/14 05:47:45 INFO mapred.JobClient: Physical memory (bytes) snapshot=322818048
14/01/14 05:47:45 INFO mapred.JobClient: Reduce output records=3
14/01/14 05:47:45 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1040166912
14/01/14 05:47:45 INFO mapred.JobClient: Map output records=4
zhangzhen@ubuntu:~/software/hadoop-1.2.1$ bin/hadoop dfs -ls
Found 2 items
drwxr-xr-x - zhangzhen supergroup 0 2014-01-14 04:49 /user/zhangzhen/in
drwxr-xr-x - zhangzhen supergroup 0 2014-01-14 05:47 /user/zhangzhen/out
zhangzhen@ubuntu:~/software/hadoop-1.2.1$ bin/hadoop dfs -ls ./out
Found 3 items
-rw-r--r-- 1 zhangzhen supergroup 0 2014-01-14 05:47 /user/zhangzhen/out/_SUCCESS
drwxr-xr-x - zhangzhen supergroup 0 2014-01-14 05:47 /user/zhangzhen/out/_logs
-rw-r--r-- 1 zhangzhen supergroup 25 2014-01-14 05:47 /user/zhangzhen/out/part-r-00000
zhangzhen@ubuntu:~/software/hadoop-1.2.1$ bin/hadoop dfs -cat ./out/*
hadoop 1
hello 2
world 1
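To pull the result back out of HDFS, something along these lines works (not part of the original session; the local target path is made up for illustration):
bin/hadoop dfs -get out ~/wordcount-out
cat ~/wordcount-out/part-r-00000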
Notes:
1. Even when you configure pseudo-distributed Hadoop exactly as above, you may still hit problems: when checking the running daemons with jps, NameNode or DataNode may be missing. For any daemon that did not come up, use its log file to track down the cause.
Problem: the NameNode does not start. (Note in the log below that the format command was typed as namenode format, without the leading dash, so the NameNode only printed its usage message and the filesystem was never actually formatted.)
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ ./hadoop namenode format
14/01/14 04:15:53 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_45
************************************************************/
Usage: java NameNode [-format [-force ] [-nonInteractive]] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint] | [-recover [ -force ] ]
14/01/14 04:15:53 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ ./start-all.sh
starting namenode, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-namenode-ubuntu.out
localhost: starting datanode, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-secondarynamenode-ubuntu.out
starting jobtracker, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-tasktracker-ubuntu.out
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ jps
12020 DataNode
12249 SecondaryNameNode
12331 JobTracker
12571 TaskTracker
12639 Jps
Solution:
conf/hdfs-site.xml originally configured a path for the name directory (dfs.name.dir). Comment that property out and fall back to the default; at the configured data path, create the data directory yourself with mkdir data and leave it empty (see the note below).
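If any daemons from the earlier attempt are still running, it is safer to stop them before re-formatting; stop-all.sh ships with Hadoop 1.2.1 in the same bin directory, although this step is not shown in the original log:
./stop-all.sh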
Re-format and start Hadoop again:
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ ./hadoop namenode -format
14/01/14 04:27:42 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_45
************************************************************/
Re-format filesystem in /home/zhangzhen/software/hadoop-1.2.1/dfs/name ? (Y or N) y
Format aborted in /home/zhangzhen/software/hadoop-1.2.1/dfs/name
14/01/14 04:27:47 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ ./start-all.sh
starting namenode, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-namenode-ubuntu.out
localhost: starting datanode, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-secondarynamenode-ubuntu.out
starting jobtracker, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /home/zhangzhen/software/hadoop-1.2.1/libexec/../logs/hadoop-zhangzhen-tasktracker-ubuntu.out
zhangzhen@ubuntu:~/software/hadoop-1.2.1/bin$ jps
17529 TaskTracker
17193 SecondaryNameNode
16713 NameNode
17594 Jps
17286 JobTracker
16957 DataNode
It starts! (Note that in the log above the re-format was actually aborted: Hadoop 1.x only accepts an uppercase Y at the prompt, and the lowercase y cancelled the format. The NameNode came up anyway because a previously formatted name directory already existed at the default path.)
2. If problem 1 above occurs, some daemons may be failing to start because of the firewall. Turn it off by running, in order:
sudo apt-get install ufw
sudo ufw status    (check the current state)
sudo ufw disable
Then restart Hadoop.
3. A common configuration problem: after writing files into the data directory and then rebooting the machine, you start the Hadoop services again, run jps, and one daemon is missing. The usual workaround is to re-format the namenode: in the bin directory run ./hadoop namenode -format, then restart with ./start-all.sh.
However, having to format the namenode after every reboot means your HDFS data is wiped each time. We ran into exactly this during installation; it is generally a namenode or datanode problem. Check the hadoop-zhangzhen-datanode-ubuntu.log or hadoop-zhangzhen-namenode-ubuntu.log files to see where the problem lies.
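A quick way to inspect those logs (tail is a standard utility; the filenames follow this post's user and host names and the logs directory shown in the start-all.sh output above):
tail -n 50 ~/software/hadoop-1.2.1/logs/hadoop-zhangzhen-namenode-ubuntu.log
tail -n 50 ~/software/hadoop-1.2.1/logs/hadoop-zhangzhen-datanode-ubuntu.log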
4. For a categorized list of common errors, see: http://blog.csdn.net/yonghutwo/article/details/9206059