Hadoop Cluster Setup

Finally a free weekend, so I spent it learning a bit of Hadoop and wrote up these notes.


Test environment:

Debian 6.0

VirtualBox 4.1.4

Hadoop 0.21.0

Linux configuration:

/etc/hosts

127.0.0.1     localhost
192.168.1.11  node1
192.168.1.12  node2
192.168.1.13  node3
192.168.1.14  node4

/etc/resolv.conf

nameserver 8.8.8.8
nameserver 8.8.4.4


/etc/hostname

node1  # node1 ~ node4


/etc/network/interfaces

auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.11  # 11 ~ 14
    netmask 255.255.255.0
    network 192.168.1.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
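
To apply the new hostname and network settings without a full reboot, something like the following should work on Debian 6 (a hedged sketch; adjust to your init scripts):

hostname -F /etc/hostname        # load the hostname from the file just edited
/etc/init.d/networking restart   # re-read /etc/network/interfaces
ip addr show eth0                # confirm the static address took effect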


Hadoop environment:

NameNode: node1

JobTracker: node2

DataNode & TaskTracker: node3 & node4

# conf/slaves
192.168.1.13
192.168.1.14

On all four virtual machines:

apt-get install ssh
apt-get install rsync
ssh-keygen  # press Enter at every prompt
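
If you would rather not press Enter through the prompts, ssh-keygen can also be run non-interactively (same result, assuming the default key location):

ssh-keygen -t rsa -N "" -f /root/.ssh/id_rsa   # RSA key with an empty passphrase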

On the host machine (slightly more convenient this way):

scp root@192.168.1.11:/root/.ssh/id_rsa.pub node1_pub
scp root@192.168.1.12:/root/.ssh/id_rsa.pub node2_pub
cat node1_pub node2_pub >> authorized_keys
scp authorized_keys root@192.168.1.13:/root/.ssh/
scp authorized_keys root@192.168.1.14:/root/.ssh/
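
A quick sanity check from node1 and node2: with the keys in place on the slaves, these should print the remote hostname without asking for a password:

ssh root@192.168.1.13 hostname   # should print node3, no password prompt
ssh root@192.168.1.14 hostname   # should print node4, no password prompt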

Still on the host machine: download and extract hadoop-0.21.0.tar.gz, then edit the configuration files.


conf/hadoop-env.sh: only JAVA_HOME needs to be changed (adjust the path to your own setup)

export JAVA_HOME=/root/jdk1.6.0_29
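
A quick check that the JDK path used above actually exists on each node:

/root/jdk1.6.0_29/bin/java -version   # should print the 1.6.0_29 version banner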


conf/core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.11:5161/</value>
  </property>
</configuration>


conf/hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop-0.21.0/var/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop-0.21.0/var/data</value>
  </property>
  <property>
    <name>dfs.block.size</name>
    <value>134217728</value>
  </property>
</configuration>
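
Hadoop normally creates these storage directories itself (the name directory during namenode -format, the data directories when the DataNodes start), so this is only an optional safety step on the relevant nodes:

mkdir -p /root/hadoop-0.21.0/var/name   # on node1 (NameNode)
mkdir -p /root/hadoop-0.21.0/var/data   # on node3 and node4 (DataNodes)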

conf/mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapreduce.jobtracker.address</name>
    <value>192.168.1.12:5162</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.system.dir</name>
    <value>/root/hadoop-0.21.0/var/system</value>
  </property>
  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value>/root/hadoop-0.21.0/var/local</value>
  </property>
</configuration>

I am also attaching my /root/.bashrc. Pay attention to HADOOP_HOME in particular; without it, startup may fail with errors.

PATH=$PATH:/root/jdk1.6.0_29/bin:/root/jdk1.6.0_29/jre/bin
JAVA_HOME=/root/jdk1.6.0_29
JRE_HOME=/root/jdk1.6.0_29/jre
CLASSPATH=.:/root/jdk1.6.0_29/lib/tools.jar:/root/jdk1.6.0_29/lib/dt.jar
HADOOP_HOME=/root/hadoop-0.21.0
export PATH
export JRE_HOME
export JAVA_HOME
export CLASSPATH
export HADOOP_HOME
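
To make sure the variables are actually picked up in the current shell before starting anything:

source /root/.bashrc
echo $JAVA_HOME $HADOOP_HOME   # both paths should be printed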

Copy the configured Hadoop to the virtual machines:

scp -r hadoop-0.21.0 root@192.168.1.11:/root/
scp -r hadoop-0.21.0 root@192.168.1.12:/root/
scp -r hadoop-0.21.0 root@192.168.1.13:/root/
scp -r hadoop-0.21.0 root@192.168.1.14:/root/
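
A quick way to confirm the copy landed on every node (run from the host; it will prompt for each VM's root password unless the host's key is also authorized):

for ip in 192.168.1.11 192.168.1.12 192.168.1.13 192.168.1.14; do
  ssh root@$ip ls /root/hadoop-0.21.0/bin/hadoop
done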

Starting the services

Start HDFS on node1:

bin/hadoop namenode -format  # first run only
bin/start-dfs.sh
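
Once start-dfs.sh returns, a couple of ways to confirm HDFS is really up (jps ships with the JDK; dfsadmin and fs are standard hadoop subcommands):

jps                           # node1 should list NameNode; node3/node4 should list DataNode
bin/hadoop dfsadmin -report   # should report two live datanodes
bin/hadoop fs -ls /           # trivial read against the new filesystem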

Start MapReduce on node2:

bin/start-mapred.sh
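
Similarly, after start-mapred.sh, jps should show a JobTracker on node2 and TaskTrackers on node3/node4. The web UIs are another quick check, assuming the default ports were not changed:

jps   # node2: JobTracker; node3/node4: TaskTracker
# NameNode web UI:   http://192.168.1.11:50070
# JobTracker web UI: http://192.168.1.12:50030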

Lessons learned

1. Startup hung. The logs showed a hostname-resolution exception (I forget the exact class name); the cause was that I had forgotten to add the hostname mappings to /etc/hosts (see the quick check after this list).

2. "Hadoop common not found": this is mostly caused by the HADOOP_HOME environment variable not being set.

3. There is actually no need to authorize the slaves on the master (i.e., add the slaves' public keys to the master's authorized_keys); it is basically never needed, since the services are started directly on the master.
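
Regarding point 1, a quick way to verify the name resolution on any node before starting Hadoop:

getent hosts node1 node2 node3 node4   # each name should resolve to its 192.168.1.x address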