Setting up Spark on a Docker-based Hadoop Cluster


Install the Hadoop Cluster

1.      Run a Docker container within the boot2docker VM

docker@default:~$ docker run -it -v /c/Users/liming.zhu/boot2dockerShareFolder:/hostFolder --name hadoopYarn ubuntu:14.04
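A quick check inside the container (assuming the mount path from the -v option above) confirms that the shared folder and the installation archives are visible:

ls /hostFolder    # the JDK, Hadoop and Spark archives used in the steps below should be listed here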

2.      Install Java within the Docker container

mkdir -p /usr/local/JDK

tar zxvf /hostFolder/jdk-8u20-linux-x64.tar.gz -C /usr/local/JDK

3.      Export Java environment variables

vi /root/.bashrc

export JAVA_HOME=/usr/local/JDK/jdk1.8.0_20/

PATH=$PATH:$HOME/bin:$JAVA_HOME/bin
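Re-source the profile and confirm the JDK is picked up (a small sanity check, assuming the paths above):

source /root/.bashrc
java -version    # should report java version "1.8.0_20"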

4.      Install Hadoop

mkdir -p /usr/local/hadoop

tar zxvf /hostFolder/hadoop-2.7.1.tar.gz -C /usr/local/hadoop

5.      Install ssh

apt-get update && apt-get install -y ssh

mkdir -p ~/.ssh

ssh-keygen -t rsa -P '' -f ~/.ssh/id_dsa

cd ~/.ssh/

cat id_dsa.pub >> authorized_keys
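To verify that key-based login works, start the daemon once by hand (step 6 below makes it start automatically) and connect to localhost; a minimal sketch:

mkdir -p /var/run/sshd    # sshd refuses to start if this directory is missing
/usr/sbin/sshd
ssh -o StrictHostKeyChecking=no localhost 'echo ssh ok'    # should print "ssh ok" without asking for a password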

6.      Auto start sshd

vi ~/.bashrc

#autorun

/usr/sbin/sshd

7.      Export Hadoop environment variables

vi /root/.bashrc

export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.1

export HADOOP_CONFIG_HOME=$HADOOP_HOME/etc/hadoop

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin
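Re-source the profile and check that the Hadoop binaries are on the PATH:

source /root/.bashrc
hadoop version    # should print Hadoop 2.7.1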

8.      Create the Hadoop work directories, as shown below
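A minimal sketch, using the paths referenced by core-site.xml, hdfs-site.xml and yarn-site.xml in the following steps:

mkdir -p /usr/local/hadoop/work/tmp
mkdir -p /usr/local/hadoop/work/namenode
mkdir -p /usr/local/hadoop/work/datanode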

9.      Create core-site.xml under $HADOOP_CONFIG_HOME

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/work/tmp</value>
    <description>A base for other temporary directories.</description>
</property>

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master/</value>
    <final>true</final>
</property>
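The properties above go inside the <configuration> element of $HADOOP_CONFIG_HOME/core-site.xml; one way to create the file (a sketch assuming the variable from step 7; the same pattern applies to hdfs-site.xml and yarn-site.xml below):

cat > $HADOOP_CONFIG_HOME/core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
    <!-- paste the two properties shown above here -->
</configuration>
EOF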

10.  Create hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>2</value>
    <final>true</final>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified at create time.
    </description>
</property>

<property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/work/namenode</value>
</property>

<property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/work/datanode</value>
</property>

11.  Edit yarn-site.xml

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
</property>

<property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/usr/local/hadoop/work/datanode</value>
</property>

12.  Format the NameNode

hadoop namenode -format
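If the format succeeds, the metadata directory from hdfs-site.xml is populated; a quick check:

ls /usr/local/hadoop/work/namenode/current    # should contain fsimage and VERSION files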

13.  Commit the container

docker commit -m "hadoop install" 1cecd529ddc5 ubuntu:hadoop2
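The new image should now appear in the local image list:

docker images    # look for the ubuntu repository with the hadoop2 tag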

14.  Run 3 containers

docker run -it -h master --name master3 -v /c/Users/liming.zhu/boot2dockerShareFolder/:/hostFolder -p 8088:8088 ubuntu:hadoop2

docker run -it -h slave3 --name slave3 -v /c/Users/liming.zhu/boot2dockerShareFolder/:/hostFolder ubuntu:hadoop2

docker run -it -h slave4 --name slave4 -v /c/Users/liming.zhu/boot2dockerShareFolder/:/hostFolder ubuntu:hadoop2
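The container IPs needed for the hosts file in step 16 can be read back from Docker (run from the boot2docker VM):

docker inspect -f '{{.NetworkSettings.IPAddress}}' master3
docker inspect -f '{{.NetworkSettings.IPAddress}}' slave3
docker inspect -f '{{.NetworkSettings.IPAddress}}' slave4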

15.  Modify the slaves file

cd /usr/local/hadoop/hadoop-2.7.1/etc/hadoop

vi slaves

slave3

slave4

16.  Change /etc/hosts in all the containers so every node can resolve master, slave3 and slave4 (use the IPs reported by docker inspect above), e.g.:

172.17.0.19 slave4

172.17.0.18 master
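With the configuration in place, HDFS and YARN can be started from the master container before moving on to Spark (a minimal sketch using the stock start scripts, which are on the PATH thanks to step 7):

start-dfs.sh
start-yarn.sh
jps    # the master should now list NameNode, SecondaryNameNode and ResourceManager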

Install Spark

1.      Copy the Spark tar to the user home folder

mkdir -p ~/spark

cp /hostFolder/spark-1.4.1-bin-without-hadoop.tgz ~/spark

cd ~/spark

tar zxvf spark-1.4.1-bin-without-hadoop.tgz

mv spark-1.4.1-bin-without-hadoop spark1.4.1

cd spark1.4.1

2.      Modify yarn-site.xml to increase the container memory headroom (a higher yarn.nodemanager.vmem-pmem-ratio stops YARN from killing Spark containers for virtual-memory overuse)

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4.2</value>
</property>

3.      Configure the Spark-on-YARN environment variables

vi conf/spark-env.sh

export SPARK_DIST_CLASSPATH=$(hadoop classpath)

export YARN_CONF_DIR=/usr/local/hadoop/hadoop-2.7.1/etc/hadoop

4.      Launch spark-shell

bin/spark-shell --master yarn-client
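A quick smoke test that the shell really runs jobs on YARN (a minimal sketch; the job itself is arbitrary):

echo 'sc.parallelize(1 to 1000).count()' | bin/spark-shell --master yarn-client
# the output should include res0: Long = 1000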

5.      Check the YARN web UI (port 8088 was published from the master container in step 14; 192.168.59.103 is the boot2docker VM's IP)

http://192.168.59.103:8088/
