Building a cross-host Spark cluster with Docker and configuring a notebook (Part 1)


1. Building the Docker image

My knowledge is still limited and I am not familiar with the Dockerfile approach, so instead I build a Docker image by hand and use it as the basis for starting the cluster.
The cluster has 12 nodes, i.e. 12 containers, split evenly across two host machines, named master, node1, node2, ..., node11.

(1) Install Docker on the host machines

The installation method differs by operating system; follow the instructions for your distribution.
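As one example (just a sketch, assuming an Ubuntu 16.04 host and the distribution's own docker.io package; the official Docker packages work just as well):
#apt-get update
#apt-get install docker.io
#service docker start
#docker version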

(2) Pull a Docker image

I chose Ubuntu 16.04:
#docker pull ubuntu:16.04

(3) Start a container

#docker run -v /home/docker/software/:/software -it ubuntu:16.04
The -v flag maps the host directory /home/docker/software to /software inside the container.

(4) Copy the software to be installed into /home/docker/software on the host

1. JDK (1.8)
2. ZooKeeper (3.4.5)
3. Hadoop (2.7.3)
4. Spark (2.1.0)
5. Scala (2.10.5)
6. Anaconda2 (4.3.0)
These packages are then visible under /software inside the container.
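To confirm the mount worked, list the directory inside the container; you should see the six packages above:
#ls /software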

(5) Install SSH in the container

#apt-get install ssh
If downloads are slow, consider switching to a faster APT mirror first.
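For example (Aliyun's mirror is only one option; any nearby mirror works), you can rewrite the default archive addresses in place and refresh the package index:
#sed -i 's|archive.ubuntu.com|mirrors.aliyun.com|g' /etc/apt/sources.list
#sed -i 's|security.ubuntu.com|mirrors.aliyun.com|g' /etc/apt/sources.list
#apt-get update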
Configure the SSH service to start automatically:
#vim ~/.bashrc
Add the line /usr/sbin/sshd
#vim /etc/rc.local
Add the line /usr/sbin/sshd (before the final exit 0)
Generate an SSH key pair:
#cd ~/
#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#cd .ssh
#cat id_rsa.pub >> authorized_keys
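To verify passwordless login before baking the image, a quick check (the /var/run/sshd directory may need to be created by hand the first time sshd is started this way):
#mkdir -p /var/run/sshd
#/usr/sbin/sshd
#ssh -o StrictHostKeyChecking=no localhost echo ok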

(6) Install the JDK

Extract the JDK package from /software, move the extracted folder into a newly created /usr/java directory, and rename it to jdk.
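The exact commands depend on your JDK package name; as a sketch (jdk-8u121-linux-x64.tar.gz and the resulting jdk1.8.0_121 directory are only examples, substitute your actual package):
#mkdir -p /usr/java
#tar -zxvf /software/jdk-8u121-linux-x64.tar.gz -C /usr/java
#mv /usr/java/jdk1.8.0_121 /usr/java/jdk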
Configure the environment variables:
#vim ~/.bashrc
Append at the end:
export JAVA_HOME=/usr/java/jdk
export PATH=$PATH:$JAVA_HOME/bin
Check that the installation succeeded:
#java -version

(7) Install ZooKeeper

Extract the ZooKeeper package into /root and rename it to zookeeper:
#mv /software/zookeeper-3.4.5.tar.gz ~
#tar -zxvf ~/zookeeper-3.4.5.tar.gz -C ~
#mv ~/zookeeper-3.4.5 ~/zookeeper
Configure ZooKeeper:
#cd ~/zookeeper/conf/
#cp zoo_sample.cfg zoo.cfg
#vim zoo.cfg
Change dataDir to /root/zookeeper/tmp
Add at the end:
server.1=node3:2888:3888
server.2=node4:2888:3888
server.3=node5:2888:3888
Save and exit, then create a tmp folder:
#mkdir ~/zookeeper/tmp
Then create an empty file:
#touch ~/zookeeper/tmp/myid
Write the ID into it:
#echo 1 > ~/zookeeper/tmp/myid
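Note that every container shares this one image, while zoo.cfg maps server.1/2/3 to node3/node4/node5. So after the cluster is up, the myid on node4 and node5 has to be adjusted to match:
#echo 2 > ~/zookeeper/tmp/myid     (run on node4)
#echo 3 > ~/zookeeper/tmp/myid     (run on node5)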

(8) Install Hadoop

Extract the Hadoop package from /software into /root and rename it to hadoop:

#mv /software/hadoop-2.7.3-64bit.tar.gz ~
#tar -zxvf ~/hadoop-2.7.3-64bit.tar.gz -C ~
#mv ~/hadoop-2.7.3 ~/hadoop
Configure Hadoop:
#cd ~/hadoop/etc/hadoop
#vim hadoop-env.sh
Add the Java environment variable:
export JAVA_HOME=/usr/java/jdk
#vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>node3:2181,node4:2181,node5:2181</value>
  </property>
</configuration>
#vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>master:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>master:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>node1:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>node1:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node3:8485;node4:8485;node5:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/root/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
#mv mapred-site.xml.template mapred-site.xml
#vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
#vim yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node2</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
#vim slaves
master
node1
node2
node3
node4
node5
node6
node7
node8
node9
node10
node11
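As a quick sanity check that the JAVA_HOME set in hadoop-env.sh is correct (no cluster needs to be running for this):
#~/hadoop/bin/hadoop version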

(9) Install Spark

Extract the Scala package from /software into /root, rename it to scala, and edit ~/.bashrc to add the environment variables:
#mv /software/scala-2.10.5.tgz ~
#tar -zxvf ~/scala-2.10.5.tgz -C ~
#mv ~/scala-2.10.5 ~/scala
#vim ~/.bashrc
export JAVA_HOME=/usr/java/jdk
export HADOOP_HOME=/root/hadoop
export SCALA_HOME=/root/scala      
export SPARK_HOME=/root/spark      
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin
Save and exit, then extract the Spark package into /root and rename it to spark:
#mv /software/spark-2.1.0-bin-hadoop2.7.tgz ~
#tar -zxvf ~/spark-2.1.0-bin-hadoop2.7.tgz -C ~
#mv ~/spark-2.1.0-bin-hadoop2.7 ~/spark
Edit Spark's slaves file:
#vim ~/spark/conf/slaves
master
node1
node2
node3
node4
node5
node6
node7
node8
node9
node10
node11
#mv ~/spark/conf/spark-env.sh.template ~/spark/conf/spark-env.sh
#vim ~/spark/conf/spark-env.sh
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=1024m 
export JAVA_HOME=/usr/java/jdk 
export SCALA_HOME=/root/scala 
export SPARK_HOME=/root/spark 
export HADOOP_CONF_DIR=/root/hadoop/etc/hadoop 
export SPARK_LIBRARY_PATH=$SPARK_HOME/lib 
export SCALA_LIBRARY_PATH=$SPARK_LIBRARY_PATH 
export SPARK_WORKER_CORES=1 
export SPARK_WORKER_INSTANCES=1 
export SPARK_MASTER_PORT=7077
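Similarly, a quick check that Spark can find Java and its own libraries (this does not need the cluster to be up):
#~/spark/bin/spark-submit --version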

(10) Install Anaconda2

#bash /software/Anaconda2-4.3.0-Linux-x86_64.sh
You may need to install bzip2 first (#apt-get install bzip2)
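Assuming you accept the installer's default location (which for root is /root/anaconda2), you can verify the install afterwards:
#/root/anaconda2/bin/python --version
#/root/anaconda2/bin/conda --version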

(11) Write a script to update the hosts files

Since each container's address needs to be planned in advance (to avoid editing them one by one after startup), we write a script first; once the containers are running, executing this script applies the planned addresses to all of them.
#vim ~/change_hosts.sh
#!/bin/bash
ssh root@172.16.0.2 "exec /root/run.sh"
ssh root@172.16.0.3 "exec /root/run.sh"
ssh root@172.16.0.4 "exec /root/run.sh"
ssh root@172.16.0.5 "exec /root/run.sh"
ssh root@172.16.0.6 "exec /root/run.sh"
ssh root@172.16.0.7 "exec /root/run.sh"
ssh root@172.16.0.8 "exec /root/run.sh"
ssh root@172.16.0.9 "exec /root/run.sh"
ssh root@172.16.0.10 "exec /root/run.sh"
ssh root@172.16.0.11 "exec /root/run.sh"
ssh root@172.16.0.12 "exec /root/run.sh"
ssh root@172.16.0.13 "exec /root/run.sh"
/etc/init.d/ssh start -D
#vim run.sh
#!/bin/bash
echo "127.0.0.1 localhost" > /etc/hosts
echo "172.16.0.2 master" >> /etc/hosts
echo "172.16.0.3 node1" >> /etc/hosts
echo "172.16.0.4 node2" >> /etc/hosts
echo "172.16.0.5 node3" >> /etc/hosts
echo "172.16.0.6 node4" >> /etc/hosts
echo "172.16.0.7 node5" >> /etc/hosts
echo "172.16.0.8 node6" >> /etc/hosts
echo "172.16.0.9 node7" >> /etc/hosts
echo "172.16.0.10 node8" >> /etc/hosts
echo "172.16.0.11 node9" >> /etc/hosts
echo "172.16.0.12 node10" >> /etc/hosts
echo "172.16.0.13 node11" >> /etc/hosts
/etc/init.d/ssh start -D
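Since change_hosts.sh runs run.sh on every node via ssh "exec ...", both scripts need the executable bit set before the image is committed:
#chmod +x ~/change_hosts.sh ~/run.sh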

(12) Commit the container as a new image

On the host, look up the container's ID:
#docker ps -a
Then, still on the host, commit the container:
#docker commit {containerId}
This returns an image ID:
#docker tag {ID}  orientsoft/spark:1.0
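You can confirm the new image exists under its tag:
#docker images
The list should now contain orientsoft/spark with tag 1.0.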
At this point the image is finished; the next post will deploy the Spark cluster based on this image.
