Installing Hadoop in a CentOS virtual machine (pseudo-distributed mode)
This article is based on http://zhans52.iteye.com/blog/1102649, 忘情游天下's "CentOS 安装 hadoop(伪分布模式)". My first Hadoop test environment was built by following that post; along the way I have filled in a few SSH-setup details the original covers only briefly. Thanks to 忘情游天下 for a good write-up.
The machine is a CentOS 6 virtual machine.
Software: JDK 1.6u29
Hadoop: hadoop-0.20.205.tar.gz
The Java installer and the Hadoop tarball can both be downloaded to a Windows machine first; share that directory from Windows, then mount it in the CentOS VM with a command like:
mount -t cifs -o username=******,password=****** //server/share /local/dir
Step 1: check the SSH configuration (adding the authorized_keys permission fix, and using a non-root user)
1) Switch to the user that will run Hadoop later; in this article that user is dev.
[root@localhost ~]# su dev
2) Run the following and press Enter at every prompt:
[dev@localhost lib]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/dev/.ssh/id_rsa):    <press Enter>
Enter passphrase (empty for no passphrase):    <press Enter>
Enter same passphrase again:    <press Enter>
Your identification has been saved in /home/dev/.ssh/id_rsa.
Your public key has been saved in /home/dev/.ssh/id_rsa.pub.
The key fingerprint is:
21:f8:47:01:e0:44:7c:d2:1d:4f:73:87:c4:c8:76:3f dev@localhost.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
| o+o.oooo+o..    |
| oo.o .++o+.     |
| oo. o...  .     |
|  . o .    E     |
|   . S .         |
|    .            |
|                 |
|                 |
|                 |
+-----------------+
3) Go to the user's .ssh directory under /home:
[dev@localhost lib]$ cd /home/dev/.ssh
4) Generate the authorized_keys file:
[dev@localhost .ssh]$ cat id_rsa.pub > authorized_keys
5) Change the permissions on authorized_keys (this was the step that took the longest to track down online):
[dev@localhost .ssh]$ chmod 600 authorized_keys
6) Verify: run ssh localhost; if no password prompt appears, the setup worked.
[dev@localhost .ssh]$ ssh localhost
Last login: Tue Dec 6 17:23:13 2011 from localhost.localdomain
[dev@localhost ~]$ exit
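The SSH steps above can be collapsed into one script. A minimal sketch, assuming a non-root user; `SSH_DIR` is a hypothetical override so it can be dry-run somewhere other than `~/.ssh`:

```shell
#!/bin/sh
# Steps 2)-5) above in one go: generate a passphrase-less RSA key,
# authorize it for logins to localhost, and set the permissions sshd demands.
SSH_DIR="${SSH_DIR:-$HOME/.ssh}"
mkdir -p "$SSH_DIR"
chmod 700 "$SSH_DIR"                  # sshd ignores a group/world-writable .ssh
if [ ! -f "$SSH_DIR/id_rsa" ]; then
    # -N "" (empty passphrase) and -f (key file) suppress all the prompts
    ssh-keygen -q -t rsa -N "" -f "$SSH_DIR/id_rsa"
fi
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"  # without this, key login silently fails
```

After running it, `ssh localhost` should log in without a password, as in the transcript above.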
Step 2: install the JDK. Two things to note:
1) Pick the Java build that matches your OS; on a 64-bit system, choose the x64 .bin:
jdk-6u29-linux-i586.bin
jdk-6u29-linux-x64.bin
2) Choose an installation directory for Java, then run commands like the following (the file name depends on the version you downloaded):
[root@localhost java]# chmod +x jdk-6u26-linux-i586.bin
[root@localhost java]# ./jdk-6u26-linux-i586.bin
......
......
......
For more information on what data Registration collects and how it is managed and used, see:
http://java.sun.com/javase/registration/JDKRegistrationPrivacy.html
Press Enter to continue.....
Done.
After installation the directory jdk1.6.0_26 is created.
Step 3: configure environment variables (the paths below depend on the version and install directory)
[root@localhost java]# vi /etc/profile
# add the following
# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_29
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
export HADOOP_HOME=/usr/local/hadoop/hadoop-0.20.205
export PATH=$PATH:$HADOOP_HOME/bin
[root@localhost java]# source /etc/profile
[root@localhost java]# java -version
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)
[root@localhost java]#
Step 4: edit the hosts file
- [root@localhost conf]# vi /etc/hosts
- # Do not remove the following line, or various programs
- # that require network functionality will fail.
- 127.0.0.1 localhost.localdomain localhost
- ::1 localhost6.localdomain6 localhost6
- 127.0.0.1 namenode datanode01
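The same hosts entry can be appended idempotently from a script. A sketch; `HOSTS_FILE` is a hypothetical variable that defaults to a local copy so the script can be tried without root, but on the VM it would point at /etc/hosts:

```shell
#!/bin/sh
# Append the pseudo-distributed host mapping only if it is not already there,
# so re-running the script never duplicates the line.
HOSTS_FILE="${HOSTS_FILE:-hosts.local}"
touch "$HOSTS_FILE"
grep -q 'namenode datanode01' "$HOSTS_FILE" || \
    echo '127.0.0.1 namenode datanode01' >> "$HOSTS_FILE"
```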
Step 5: unpack and install Hadoop
- [root@localhost hadoop]# tar zxvf hadoop-0.20.203.tar.gz
- ......
- ......
- ......
- hadoop-0.20.203.0/src/contrib/ec2/bin/image/create-hadoop-image-remote
- hadoop-0.20.203.0/src/contrib/ec2/bin/image/ec2-run-user-data
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-cluster
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-master
- hadoop-0.20.203.0/src/contrib/ec2/bin/launch-hadoop-slaves
- hadoop-0.20.203.0/src/contrib/ec2/bin/list-hadoop-clusters
- hadoop-0.20.203.0/src/contrib/ec2/bin/terminate-hadoop-cluster
- [root@localhost hadoop]#
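Rather than untarring blindly, the tarball can first be checked against the checksum the Apache mirror publishes for each release. A sketch, not from the original article; `verify_and_extract` and the placeholder hash are illustrative:

```shell
#!/bin/sh
# Verify a tarball's SHA-1 before extracting it into a destination directory.
verify_and_extract() {
    tarball=$1; expected=$2; dest=$3
    actual=$(sha1sum "$tarball" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $tarball" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}
# On the VM it would be called like this (hash is a placeholder):
# verify_and_extract hadoop-0.20.203.tar.gz <sha1-from-mirror> /usr/local/hadoop
```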
Step 6: edit the configuration files in Hadoop's conf directory (the directories set below depend on where Hadoop is installed)
- ####################################
- [root@localhost conf]# vi hadoop-env.sh
- # add the following
- # set java environment
- export JAVA_HOME=/usr/java/jdk1.6.0_26
- #####################################
- [root@localhost conf]# vi core-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>fs.default.name</name>
- <value>hdfs://namenode:9000/</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/usr/local/hadoop/hadooptmp</value>
- </property>
- </configuration>
- #######################################
- [root@localhost conf]# vi hdfs-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>dfs.name.dir</name>
- <value>/usr/local/hadoop/hdfs/name</value>
- </property>
- <property>
- <name>dfs.data.dir</name>
- <value>/usr/local/hadoop/hdfs/data</value>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
- #########################################
- [root@localhost conf]# vi mapred-site.xml
- <?xml version="1.0"?>
- <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
- <!-- Put site-specific property overrides in this file. -->
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>namenode:9001</value>
- </property>
- <property>
- <name>mapred.local.dir</name>
- <value>/usr/local/hadoop/mapred/local</value>
- </property>
- <property>
- <name>mapred.system.dir</name>
- <value>/tmp/hadoop/mapred/system</value>
- </property>
- </configuration>
- #########################################
- [root@localhost conf]# vi masters
- #localhost
- namenode
- #########################################
- [root@localhost conf]# vi slaves
- #localhost
- datanode01
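The hand edits above can also be generated from a script with heredocs; this sketch writes only core-site.xml, with the same values as above. `CONF_DIR` is a hypothetical variable defaulting to a local directory for a dry run; on the VM it would be the Hadoop conf directory:

```shell
#!/bin/sh
# Write core-site.xml with the values configured above.
CONF_DIR="${CONF_DIR:-conf.local}"
mkdir -p "$CONF_DIR"
# quoted 'EOF' keeps the shell from expanding anything inside the heredoc
cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadooptmp</value>
  </property>
</configuration>
EOF
echo "wrote $CONF_DIR/core-site.xml"
```

The same pattern extends to hdfs-site.xml and mapred-site.xml.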
Step 7: start Hadoop
- [root@localhost bin]# hadoop namenode -format
- 11/06/23 00:43:54 INFO namenode.NameNode: STARTUP_MSG:
- /************************************************************
- STARTUP_MSG: Starting NameNode
- STARTUP_MSG: host = localhost.localdomain/127.0.0.1
- STARTUP_MSG: args = [-format]
- STARTUP_MSG: version = 0.20.203.0
- STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203 -r 1099333; compiled by 'oom' on Wed May 4 07:57:50 PDT 2011
- ************************************************************/
- 11/06/23 00:43:55 INFO util.GSet: VM type = 32-bit
- 11/06/23 00:43:55 INFO util.GSet: 2% max memory = 19.33375 MB
- 11/06/23 00:43:55 INFO util.GSet: capacity = 2^22 = 4194304 entries
- 11/06/23 00:43:55 INFO util.GSet: recommended=4194304, actual=4194304
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: fsOwner=root
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: supergroup=supergroup
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
- 11/06/23 00:43:56 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
- 11/06/23 00:43:56 INFO namenode.NameNode: Caching file names occuring more than 10 times
- 11/06/23 00:43:57 INFO common.Storage: Image file of size 110 saved in 0 seconds.
- 11/06/23 00:43:57 INFO common.Storage: Storage directory /usr/local/hadoop/hdfs/name has been successfully formatted.
- 11/06/23 00:43:57 INFO namenode.NameNode: SHUTDOWN_MSG:
- /************************************************************
- SHUTDOWN_MSG: Shutting down NameNode at localhost.localdomain/127.0.0.1
- ************************************************************/
- [root@localhost bin]#
- ###########################################
- [root@localhost bin]# ./start-all.sh
- starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
- datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
- namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
- starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
- datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
- [root@localhost bin]# jps
- 11971 TaskTracker
- 11807 SecondaryNameNode
- 11599 NameNode
- 12022 Jps
- 11710 DataNode
- 11877 JobTracker
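The jps output above can be turned into a quick health check. A sketch; `check_daemons` is an illustrative helper, and jps ships with the JDK:

```shell
#!/bin/sh
# Report, for each of the five daemons listed above, whether jps can see it.
check_daemons() {
    for d in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        if jps 2>/dev/null | grep -q "$d"; then
            echo "$d: up"
        else
            echo "$d: MISSING"
        fi
    done
}
check_daemons
```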
Check the cluster status:
- [root@localhost bin]# hadoop dfsadmin -report
- Configured Capacity: 4055396352 (3.78 GB)
- Present Capacity: 464142351 (442.64 MB)
- DFS Remaining: 464089088 (442.59 MB)
- DFS Used: 53263 (52.01 KB)
- DFS Used%: 0.01%
- Under replicated blocks: 0
- Blocks with corrupt replicas: 0
- Missing blocks: 0
- -------------------------------------------------
- Datanodes available: 1 (1 total, 0 dead)
- Name: 127.0.0.1:50010
- Decommission Status : Normal
- Configured Capacity: 4055396352 (3.78 GB)
- DFS Used: 53263 (52.01 KB)
- Non DFS Used: 3591254001 (3.34 GB)
- DFS Remaining: 464089088(442.59 MB)
- DFS Used%: 0%
- DFS Remaining%: 11.44%
- Last contact: Thu Jun 23 01:11:15 PDT 2011
- [root@localhost bin]#
Other issues:
1) Error on startup:
- #################### error on startup ##########
- [root@localhost bin]# ./start-all.sh
- starting namenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-namenode-localhost.localdomain.out
- The authenticity of host 'datanode01 (127.0.0.1)' can't be established.
- RSA key fingerprint is 41:c8:d4:e4:60:71:6f:6a:33:6a:25:27:62:9b:e3:90.
- Are you sure you want to continue connecting (yes/no)? y
- Please type 'yes' or 'no': yes
- datanode01: Warning: Permanently added 'datanode01' (RSA) to the list of known hosts.
- datanode01: starting datanode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-datanode-localhost.localdomain.out
- datanode01: Unrecognized option: -jvm
- datanode01: Could not create the Java virtual machine.
- namenode: starting secondarynamenode, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-secondarynamenode-localhost.localdomain.out
- starting jobtracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-jobtracker-localhost.localdomain.out
- datanode01: starting tasktracker, logging to /usr/local/hadoop/hadoop-0.20.203/bin/../logs/hadoop-root-tasktracker-localhost.localdomain.out
- [root@localhost bin]# jps
- 10442 JobTracker
- 10533 TaskTracker
- 10386 SecondaryNameNode
- 10201 NameNode
- 10658 Jps
- ################################################
- [root@localhost bin]# vi hadoop
- elif [ "$COMMAND" = "datanode" ] ; then
- CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
- if [[ $EUID -eq 0 ]]; then
- HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
- else
- HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
- fi
- #http://javoft.net/2011/06/hadoop-unrecognized-option-jvm-could-not-create-the-java-virtual-machine/
- # change to
- elif [ "$COMMAND" = "datanode" ] ; then
- CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
- # if [[ $EUID -eq 0 ]]; then
- # HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"
- # else
- HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"
- # fi
- # (or simply start Hadoop as a non-root user instead)
- # after this change, startup succeeds
2) Turn off the firewall before starting (on CentOS 6: service iptables stop; chkconfig iptables off keeps it off across reboots).
Check the running status in a browser:
http://localhost:50070
- NameNode 'localhost.localdomain:9000'
- Started: Thu Jun 23 01:07:18 PDT 2011
- Version: 0.20.203.0, r1099333
- Compiled: Wed May 4 07:57:50 PDT 2011 by oom
- Upgrades: There are no upgrades in progress.
- Browse the filesystem
- Namenode Logs
- Cluster Summary
- 6 files and directories, 1 blocks = 7 total. Heap Size is 31.38 MB / 966.69 MB (3%)
- Configured Capacity : 3.78 GB
- DFS Used : 52.01 KB
- Non DFS Used : 3.34 GB
- DFS Remaining : 442.38 MB
- DFS Used% : 0 %
- DFS Remaining% : 11.44 %
- Live Nodes : 1
- Dead Nodes : 0
- Decommissioning Nodes : 0
- Number of Under-Replicated Blocks : 0
- NameNode Storage:
- Storage Directory Type State
- /usr/local/hadoop/hdfs/name IMAGE_AND_EDITS Active
http://localhost:50030
- namenode Hadoop Map/Reduce Administration
- Quick Links
- * Scheduling Info
- * Running Jobs
- * Retired Jobs
- * Local Logs
- State: RUNNING
- Started: Thu Jun 23 01:07:30 PDT 2011
- Version: 0.20.203.0, r1099333
- Compiled: Wed May 4 07:57:50 PDT 2011 by oom
- Identifier: 201106230107
- Cluster Summary (Heap Size is 15.31 MB/966.69 MB)
- Running Map Tasks: 0
- Running Reduce Tasks: 0
- Total Submissions: 0
- Nodes: 1
- Occupied Map Slots: 0
- Occupied Reduce Slots: 0
- Reserved Map Slots: 0
- Reserved Reduce Slots: 0
- Map Task Capacity: 2
- Reduce Task Capacity: 2
- Avg. Tasks/Node: 4.00
- Blacklisted Nodes: 0
- Graylisted Nodes: 0
- Excluded Nodes: 0
- Scheduling Information
- Queue Name State Scheduling Information
- default running N/A
- Running Jobs
- none
- Retired Jobs
- none
- Local Logs
- Log Directory, Job Tracker History
- This is Apache Hadoop release 0.20.203.0
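The two status pages can also be probed from the shell rather than a browser. A sketch; `probe_ui` is an illustrative helper, and HTTP 000 means nothing answered on the port (i.e. the daemon is not listening):

```shell
#!/bin/sh
# Probe the NameNode (50070) and JobTracker (50030) web UIs on localhost.
probe_ui() {
    for port in 50070 50030; do
        # -w prints the status code; --max-time keeps a dead port from hanging
        code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 \
            "http://localhost:$port/" 2>/dev/null) || true
        echo "port $port: HTTP ${code:-000}"
    done
}
probe_ui
```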
Test:
- ########## create a directory ##########
- [root@localhost bin]# hadoop fs -mkdir testFolder
- ############### copy a file into it ###############
- [root@localhost local]# ls
- bin etc games hadoop include lib libexec sbin share src SSH_key_file
- [root@localhost local]# hadoop fs -copyFromLocal SSH_key_file testFolder
- The uploaded file can then be seen in the web UI.
Reference: http://bxyzzy.blog.51cto.com/854497/352692