Hadoop + Spark + ZooKeeper


Hostname   IP address      Installed software              Running processes
Node10     192.168.1.23    jdk, hadoop, spark              NameNode, ResourceManager, zkfc, Master
Node20     192.168.1.230   jdk, hadoop, spark              NameNode, ResourceManager, zkfc, Master
Node30     192.168.1.248   jdk, hadoop, zookeeper, spark   DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
Node40     192.168.1.246   jdk, hadoop, zookeeper, spark   DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
Node50     192.168.1.232   jdk, hadoop, zookeeper, spark   DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker

 

 

1. Disable SELinux and iptables

vi /etc/sysconfig/selinux

SELINUX=disabled

service iptables stop

chkconfig iptables off
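Editing /etc/sysconfig/selinux only takes effect after a reboot. To turn SELinux off immediately and confirm both settings, a quick check like the following (not part of the original steps) can be run:

setenforce 0

getenforce

service iptables status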

 

2. Configure /etc/hosts (on every node)

vi /etc/hosts

192.168.1.23     node10

192.168.1.230    node20

192.168.1.248    node30

192.168.1.246    node40

192.168.1.232    node50
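A quick loop verifies that every hostname resolves (an illustrative check, not in the original):

for h in node10 node20 node30 node40 node50; do ping -c 1 $h; done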

 

3. Create a user and configure passwordless SSH login (run on every node)

[root@localhost ~]# useradd heren

[root@localhost ~]# passwd heren

[root@localhost ~]# su - heren

[heren@localhost ~]$ ssh-keygen -t rsa

[heren@localhost ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

[heren@localhost ~]$ chmod 600 .ssh/authorized_keys

 

The following operations only need to be executed on node10:

[heren@node10 ~]$  ssh node20 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$  ssh node30 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$  ssh node40 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$  ssh node50 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node20:/home/heren/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node30:/home/heren/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node40:/home/heren/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node50:/home/heren/.ssh/authorized_keys
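Passwordless login can now be verified from node10 with a loop like this (a sanity check, assuming the hostnames above):

for h in node20 node30 node40 node50; do ssh $h hostname; done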

 

 

4. Install JDK 1.7 and configure environment variables

Remove any preinstalled OpenJDK packages first:

rpm -qa | grep java | xargs rpm -e --nodeps

mkdir -p /usr/java

[root@localhost ~]# cp /root/jdk-7u80-linux-x64.tar.gz /usr/java/

[root@localhost java]# tar xf jdk-7u80-linux-x64.tar.gz

[root@localhost ~]# vi /etc/profile

 

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JRE_HOME=/usr/java/jdk1.7.0_80/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

 

export ZOOKEEPER_HOME=/usr/local/software/zookeeper

export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib

export PATH=$PATH:$ZOOKEEPER_HOME/bin

 

export HADOOP_HOME=/usr/local/software/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

export SCALA_HOME=/usr/scala/scala-2.11.7

export PATH=$SCALA_HOME/bin:$PATH

 

export SPARK_HOME=/usr/local/software/spark

export PATH=$SPARK_HOME/bin:$PATH

                                               

[root@localhost ~]# source /etc/profile

[root@localhost ~]# java -version

The reported version should be 1.7.0_80.

 

5. Create directories (repeat on every node)

Create the application directory:

[root@node10 ~]# mkdir -p /usr/local/software

Make it owned by the heren user:

[root@node10 ~]# chown -R heren:heren /usr/local/software

Create the data directories:

mkdir -p /data/hadoop/

mkdir -p /data/spark/

mkdir -p /data/zookeeper/

chown -R heren:heren /data/

 

6. Install the ZooKeeper cluster (node30, node40, node50)

[root@node30 software]# su - heren

[heren@node30 ~]$ cd /usr/local/software/

[heren@node30 software]$ tar xf zookeeper-3.4.6.tar.gz

[heren@node30 software]$ mv zookeeper-3.4.6 zookeeper

Edit the configuration file:

[heren@node30 software]$ cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg

[heren@node30 software]$ vi zookeeper/conf/zoo.cfg

dataLogDir=/data/zookeeper/logs

dataDir=/data/zookeeper/data

server.1=node30:2888:3888

server.2=node40:2888:3888

server.3=node50:2888:3888

Create the data and log directories and the id file; each node's myid value must match its server.N entry in zoo.cfg:

mkdir -p /data/zookeeper/data/ /data/zookeeper/logs/

[heren@node30 ~]$ echo '1' > /data/zookeeper/data/myid

[heren@node40 ~]$ echo '2' > /data/zookeeper/data/myid

[heren@node50 ~]$ echo '3' > /data/zookeeper/data/myid

 

Start ZooKeeper on each node (node30, node40, node50):

[heren@node30 zookeeper]$ ./bin/zkServer.sh start

JMX enabled by default

Using config: /usr/local/software/zookeeper/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

[heren@node30 ~]$ /usr/local/software/zookeeper/bin/zkServer.sh status

JMX enabled by default

Using config: /usr/local/software/zookeeper/bin/../conf/zoo.cfg

Mode: leader

 

Start zookeeper:  sh /usr/local/software/zookeeper/bin/zkServer.sh start

Check whether zookeeper started successfully:  sh /usr/local/software/zookeeper/bin/zkServer.sh status

Stop zookeeper:  sh /usr/local/software/zookeeper/bin/zkServer.sh stop
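As an extra check, the ensemble can be queried with the bundled CLI (illustrative; any ensemble member works as the target):

/usr/local/software/zookeeper/bin/zkCli.sh -server node30:2181

At the zkCli prompt, ls / should list the root znodes without errors.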

 

7. Install and configure the Hadoop cluster (node10 through node50)

[root@node10 ~]# su - heren

[heren@node10 ~]$ cd /usr/local/software/

[heren@node10 software]$ tar xf hadoop-2.6.0.tar.gz

[heren@node10 software]$ mv hadoop-2.6.0 hadoop

Edit hadoop-env.sh (the configuration files live under /usr/local/software/hadoop/etc/hadoop):

[heren@node10 software]$ cd hadoop/etc/hadoop

[heren@node10 hadoop]$ vi hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80

Edit core-site.xml:

[heren@node10 hadoop]$ vi core-site.xml

<configuration>

<!-- Set the HDFS nameservice to masters -->

<property>

<name>fs.defaultFS</name>

<value>hdfs://masters</value>

</property>

<!-- Hadoop temporary directory -->

<property>

<name>hadoop.tmp.dir</name>

<value>/usr/local/software/hadoop/tmp</value>

</property>

<!-- ZooKeeper quorum used for HA coordination -->

<property>

<name>ha.zookeeper.quorum</name>

<value>node30:2181,node40:2181,node50:2181</value>

</property>

</configuration>

Edit hdfs-site.xml:

[heren@node10 hadoop]$ vi hdfs-site.xml

<configuration>

<!-- The HDFS nameservice, masters; must match core-site.xml -->

<property>

<name>dfs.nameservices</name>

<value>masters</value>

</property>

<!-- The masters nameservice has two NameNodes: node10 and node20 -->

<property>

<name>dfs.ha.namenodes.masters</name>

<value>node10,node20</value>

</property>

<!-- RPC address of NameNode node10 -->

<property>

<name>dfs.namenode.rpc-address.masters.node10</name>

<value>node10:9000</value>

</property>

<!-- HTTP address of NameNode node10 -->

<property>

<name>dfs.namenode.http-address.masters.node10</name>

<value>node10:50070</value>

</property>

<!-- RPC address of NameNode node20 -->

        <property>

<name>dfs.namenode.rpc-address.masters.node20</name>

<value>node20:9000</value>

        </property>

<!-- HTTP address of NameNode node20 -->

        <property>

<name>dfs.namenode.http-address.masters.node20</name>

<value>node20:50070</value>

       </property>

<!-- Where the NameNodes' shared edit log is stored on the JournalNodes -->

        <property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://node30:8485;node40:8485;node50:8485/masters</value>

       </property>

<!-- Local directory where each JournalNode keeps its data -->

        <property>

<name>dfs.journalnode.edits.dir</name>

<value>/usr/local/software/hadoop/journal</value>

       </property>

<!-- Enable automatic NameNode failover -->

        <property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

       </property>

<!-- Failover proxy provider used by HDFS clients -->

        <property>

<name>dfs.client.failover.proxy.provider.masters</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

       </property>

<!-- Fencing methods; multiple methods are separated by newlines, one per line -->

        <property>

<name>dfs.ha.fencing.methods</name>

<value>

   sshfence

   shell(/bin/true)

</value>

       </property>

<!-- sshfence needs passwordless SSH; point it at the private key -->

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/heren/.ssh/id_rsa</value>

</property>

<!-- sshfence connection timeout in milliseconds -->

<property>

<name>dfs.ha.fencing.ssh.connect-timeout</name>

<value>30000</value>

</property>

</configuration>

Edit mapred-site.xml (in Hadoop 2.6.0 it may first need to be created from mapred-site.xml.template):

[heren@node10 hadoop]$ vi  mapred-site.xml

<configuration>

<!-- Run MapReduce on YARN -->

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

</configuration>

Edit yarn-site.xml:

[heren@node10 hadoop]$ vi yarn-site.xml

<configuration>

 

<!-- Site specific YARN configuration properties -->

<!-- Enable ResourceManager HA -->

<property>

<name>yarn.resourcemanager.ha.enabled</name>

<value>true</value>

</property>

<!-- Cluster id shared by the RM pair -->

<property>

<name>yarn.resourcemanager.cluster-id</name>

<value>RM_HA_ID</value>

</property>

<!-- Logical ids of the two RMs -->

<property>

<name>yarn.resourcemanager.ha.rm-ids</name>

<value>rm1,rm2</value>

</property>

<!-- Hostname of each RM -->

<property>

<name>yarn.resourcemanager.hostname.rm1</name>

<value>node10</value>

</property>

<property>

<name>yarn.resourcemanager.hostname.rm2</name>

<value>node20</value>

</property>

<property>

<name>yarn.resourcemanager.recovery.enabled</name>

<value>true</value>

</property>

<property>

<name>yarn.resourcemanager.store.class</name>

<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>

</property>

<!-- ZooKeeper quorum for RM state storage -->

<property>

<name>yarn.resourcemanager.zk-address</name>

<value>node30:2181,node40:2181,node50:2181</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

</configuration>

 

Edit the slaves file:

[heren@node10 hadoop]$ vi slaves

node30

node40

node50

Copy the finished configuration directory to the other nodes (this assumes the hadoop tarball has already been unpacked on each of them):

[heren@node10 etc]$ scp -r hadoop heren@192.168.1.230:/usr/local/software/hadoop/etc/

[heren@node10 etc]$ scp -r hadoop heren@192.168.1.246:/usr/local/software/hadoop/etc/

[heren@node10 etc]$ scp -r hadoop heren@192.168.1.248:/usr/local/software/hadoop/etc/

[heren@node10 etc]$ scp -r hadoop heren@192.168.1.232:/usr/local/software/hadoop/etc/

Format and start the Hadoop cluster

Start the JournalNodes (run on node30, node40, and node50):

[heren@node30 software]$  hadoop-daemon.sh start journalnode

starting journalnode, logging to /usr/local/software/hadoop/logs/hadoop-heren-journalnode-node30.out

 

Format HDFS (on node10), then copy the resulting metadata to the standby NameNode:

[heren@node10 software]$ hdfs namenode -format

[heren@node10 hadoop]$ scp -r tmp/ heren@node20:/usr/local/software/hadoop/
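Copying tmp/ seeds node20 with the freshly formatted metadata. Alternatively, the standard HA tooling can initialize the standby directly on node20:

[heren@node20 ~]$ hdfs namenode -bootstrapStandby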

 

Format the ZKFC state in ZooKeeper (run on node10):

[heren@node10 hadoop]$ hdfs zkfc -formatZK

Start HDFS:

[heren@node10 hadoop]$ start-dfs.sh

 

Start YARN:

[heren@node10 hadoop]$ start-yarn.sh

The standby ResourceManager on node20 has to be started manually:

[heren@node20 hadoop]$  yarn-daemon.sh start resourcemanager
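Each node should now show the processes listed in the table at the top; jps and the RM admin tool give a quick check (example commands, exact output varies):

[heren@node10 ~]$ jps

[heren@node10 ~]$ yarn rmadmin -getServiceState rm1

[heren@node10 ~]$ yarn rmadmin -getServiceState rm2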

 

View the cluster status via the web UIs

NameNode:

http://node10:50070/

http://node20:50070/

ResourceManager:

http://node10:8088/

http://node20:8088/

View the cluster status with the hdfs command:

[heren@node10 hadoop]$  hdfs dfsadmin -report

Hadoop 2.6.0 ships 32-bit native libraries, which trigger an "unable to load native-hadoop library" warning on 64-bit systems. After downloading the 64-bit native package, extract it into hadoop's lib/native directory, overwriting the existing files:

[heren@node10 lib]$tar xf hadoop-native-64-2.6.0.tar

[heren@node10 lib]$ mv libh* native

 

 

Verify HDFS HA

First upload a file to HDFS:

[heren@node10 hadoop]$ hadoop fs -put /etc/profile /profile

[heren@node10 hadoop]$ hadoop fs -ls /
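To see which NameNode is currently active before killing it (using the NameNode ids defined in hdfs-site.xml):

[heren@node10 hadoop]$ hdfs haadmin -getServiceState node10

[heren@node10 hadoop]$ hdfs haadmin -getServiceState node20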

Then kill the active NameNode (10020 here is its PID; find yours with jps):

[heren@node10 hadoop]$ kill -9 10020

The file is still there after failover:

[heren@node10 hadoop]$ hadoop fs -ls /

Manually restart the NameNode that was killed:

[heren@node10 hadoop]$ sbin/hadoop-daemon.sh start namenode

 

Verify YARN
Run the WordCount example that ships with Hadoop:

[heren@node10 mapreduce]$ hadoop jar /usr/local/software/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out
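When the job finishes, the word counts land in /out; reducer output files follow the standard part-r-NNNNN naming:

[heren@node10 mapreduce]$ hadoop fs -ls /out

[heren@node10 mapreduce]$ hadoop fs -cat /out/part-r-00000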

 

Commands to start and stop the HDFS cluster:

/usr/local/software/hadoop/sbin/start-dfs.sh

/usr/local/software/hadoop/sbin/stop-dfs.sh

 

 

8. Install the Spark cluster

Install Scala first:

[root@node10 ~]# mkdir -p /usr/scala

[root@node10 scala]# tar xf scala-2.11.7.tgz

 

[heren@node10 software]$ tar xf spark-1.5.2-bin-hadoop2.6.tgz

[heren@node10 software]$ mv spark-1.5.2-bin-hadoop2.6 spark

 

[heren@node10 conf]$ cp spark-env.sh.template spark-env.sh

[heren@node10 conf]$ vi spark-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80

export SCALA_HOME=/usr/scala/scala-2.11.7

export SPARK_WORKER_MEMORY=1g

export HADOOP_CONF_DIR=/usr/local/software/hadoop/etc/hadoop

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node30:2181,node40:2181,node50:2181 -Dspark.deploy.zookeeper.dir=/spark"



[heren@node10 conf]$ vi spark-defaults.conf

spark.master            spark://node10:7077,node20:7077

spark.serializer        org.apache.spark.serializer.KryoSerializer

spark.eventLog.enabled  true

spark.eventLog.dir      hdfs://masters/sparklogs
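The event-log directory has to exist in HDFS before Spark starts (assuming the masters nameservice configured earlier):

[heren@node10 conf]$ hadoop fs -mkdir -p /sparklogs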



[heren@node10 conf]$ vi slaves

node20

node30

node40

node50
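The guide does not show it, but the spark directory (and scala, plus the /etc/profile changes) must also be present on every node. A distribution step mirroring the earlier Hadoop one might look like this (illustrative; same paths assumed):

[heren@node10 software]$ for h in node20 node30 node40 node50; do scp -r spark heren@$h:/usr/local/software/; done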

 

Start and stop the Spark cluster:

/usr/local/software/spark/sbin/start-all.sh

/usr/local/software/spark/sbin/stop-all.sh
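With ZooKeeper recovery mode, start-all.sh only launches the Master on the node it is run from; the standby Master on node20 (see the table at the top) is started by hand:

[heren@node20 ~]$ /usr/local/software/spark/sbin/start-master.sh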

 
