Spark runs on Docker
Source: Internet · Editor: 程序博客网 · Date: 2024/05/01 03:40
I have been reading a fair amount of Docker documentation lately and running related experiments, and I wondered whether Spark could run on Docker. A quick Google search turned up:
1. https://registry.hub.docker.com/u/amplab/spark-master/ and https://github.com/amplab/docker-scripts ; these two appear to be the same project, though the former ships without a guide.
2. The official Spark source tree includes a docker subproject.
This article uses Spark's own docker subproject to build a Spark cluster.
Installing Docker itself is not covered here; see the official documentation.
Here is the directory layout of the docker subproject:
docker
├── build
├── readme.md
└── spark-test
    ├── build
    ├── readme.md
    ├── base
    │   └── Dockerfile
    ├── master
    │   ├── Dockerfile
    │   └── default_cmd
    └── worker
        ├── Dockerfile
        └── default_cmd
The layout shows that three images are built: base, master, and worker.
docker/build: this script only performs one check of its own, verifying that docker commands can be executed without sudo, and then calls the build script under spark-test:
docker images > /dev/null || { echo Please install docker in non-sudo mode. ; exit; }
./spark-test/build
If the check fails, fix it as follows:
1. Add a docker group if one does not exist yet:
sudo groupadd docker
2. Add your user to that group, then log out and back in for it to take effect:
sudo gpasswd -a ${USER} docker
3. Restart docker:
sudo service docker restart
Done!
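The check in docker/build is just the shell's `cmd || { ...; exit; }` guard. A generic sketch of the pattern, with `true`/`false` standing in for the real `docker images` probe so it runs anywhere:

```shell
# Guard pattern from docker/build: probe a precondition, print a message
# and fail if the probe fails. `true`/`false` stand in for `docker images`.
check() {
  "$@" > /dev/null 2>&1 || { echo "precondition failed: $*"; return 1; }
  echo "precondition ok: $*"
}

check true         # prints: precondition ok: true
check false || :   # prints: precondition failed: false
```

The real script uses `exit` instead of `return 1` because it guards the whole build, not a single function call.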
docker/spark-test/build: this script invokes the Dockerfiles one level down, building each image in turn:
docker build -t spark-test-base spark-test/base/
docker build -t spark-test-master spark-test/master/
docker build -t spark-test-worker spark-test/worker/
docker/spark-test/readme.md
Spark Docker files usable for testing and development purposes.

These images are intended to be run like so:

docker run -v $SPARK_HOME:/opt/spark spark-test-master
docker run -v $SPARK_HOME:/opt/spark spark-test-worker spark://<master_ip>:7077

Using this configuration, the containers will have their Spark directories mounted to your actual `SPARK_HOME`, allowing you to modify and recompile your Spark source and have them immediately usable in the docker images (without rebuilding them).
A few notes on the above. On the host machine:
1. Install Scala 2.10.x.
2. Install Spark 1.1.0.
3. sudo vi /etc/profile, add settings for the $SPARK_HOME, $SCALA_HOME, $SPARK_BIN and $SCALA_BIN environment variables, then run source /etc/profile.
Open a shell and start the master; the master prints its IP address to its console, and you will need that IP when starting the worker.
Likewise, open another shell and start the worker. Both master and worker print some log output, but neither shell window is interactive. If you need interactivity, either modify the default_cmd script below so the process starts in the background, or log in to the master and worker over ssh.
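Since the worker needs the IP the master prints, one option is to redirect the master's console to a file and pull the address out mechanically. The master's startup script prints a line of the form CONTAINER_IP=<ip>, so a sketch might look like this (the master.log name and the sample address are made up for the demo):

```shell
# Extract the master's IP from saved console output. The CONTAINER_IP=<ip>
# line format comes from default_cmd; the log file content here is canned.
printf 'CONTAINER_IP=172.17.0.2\nStarting Spark master...\n' > master.log
MASTER_IP=$(sed -n 's/^CONTAINER_IP=//p' master.log | head -n 1)
echo "$MASTER_IP"   # prints: 172.17.0.2
rm -f master.log
```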
Once both are up, you can launch spark-shell on the host for a basic test (note that the MASTER variable goes in front of the command):
MASTER=spark://<master_ip>:7077 $SPARK_HOME/bin/spark-shell
Because the containers reuse the Spark binaries compiled on the host, Spark must be installed on the host.
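If reading the IP off the master's console is inconvenient, Docker itself can report a container's address via docker inspect. A sketch of a small helper (the get_master_url function and its container-id argument are hypothetical, not part of the Spark docker scripts):

```shell
# Query Docker for a container's IP instead of scraping its console.
# get_master_url is a hypothetical helper for this sketch.
get_master_url() {
  command -v docker > /dev/null 2>&1 || { echo "docker not installed" >&2; return 1; }
  [ -n "$1" ] || { echo "usage: get_master_url <container-id>" >&2; return 1; }
  ip=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' "$1") || return 1
  echo "spark://$ip:7077"
}

# Usage (assumes a running master container and SPARK_HOME set on the host):
#   MASTER=$(get_master_url <container-id>) $SPARK_HOME/bin/spark-shell
```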
docker/spark-test/base/Dockerfile
FROM ubuntu:precise
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
RUN echo "deb http://cz.archive.ubuntu.com/ubuntu precise main" >> /etc/apt/sources.list
RUN echo "deb http://security.ubuntu.com/ubuntu precise-security main universe" >> /etc/apt/sources.list
# export proxy
ENV http_proxy http://www-proxy.xxxx.se:8080
ENV https_proxy http://www-proxy.xxxxx.se:8080
# Upgrade package index
RUN apt-get update
# install a few other useful packages plus Open Jdk 7
RUN apt-get install -y less openjdk-7-jre-headless net-tools vim-tiny sudo openssh-server
ENV SCALA_VERSION 2.10.4
ENV CDH_VERSION cdh4
ENV SCALA_HOME /opt/scala-$SCALA_VERSION
ENV SPARK_HOME /opt/spark
ENV PATH $SPARK_HOME:$SCALA_HOME/bin:$PATH
# Install Scala
ADD http://www.scala-lang.org/files/archive/scala-$SCALA_VERSION.tgz /
RUN (cd / && gunzip < scala-$SCALA_VERSION.tgz)|(cd /opt && tar -xvf -)
RUN rm /scala-$SCALA_VERSION.tgz
If you need to go through a proxy, set it as shown above. During the build I found that some packages failed to install from the default source, so I added two new apt sources; the last two of the three source lines above are the ones I added.
docker/spark-test/master/Dockerfile
FROM spark-test-base
ADD default_cmd /root/
CMD ["/root/default_cmd"]
default_cmd
IP=$(ip -o -4 addr list eth0 | perl -n -e 'if (m{inet\s([\d\.]+)\/\d+\s}xms) { print $1 }')
echo "CONTAINER_IP=$IP"
export SPARK_LOCAL_IP=$IP
export SPARK_PUBLIC_DNS=$IP
# Avoid the default Docker behavior of mapping our IP address to an unreachable host name
umount /etc/hosts
/opt/spark/bin/spark-class org.apache.spark.deploy.master.Master -i $IP
Looking at these files, you can see that once base is built, master and worker differ essentially only in the command executed at startup; the images themselves differ very little from base.
docker/spark-test/worker/Dockerfile
FROM spark-test-base
ENV SPARK_WORKER_PORT 8888
ADD default_cmd /root/
ENTRYPOINT ["/root/default_cmd"]
default_cmd
IP=$(ip -o -4 addr list eth0 | perl -n -e 'if (m{inet\s([\d\.]+)\/\d+\s}xms) { print $1 }')
echo "CONTAINER_IP=$IP"
export SPARK_LOCAL_IP=$IP
export SPARK_PUBLIC_DNS=$IP
# Avoid the default Docker behavior of mapping our IP address to an unreachable host name
umount /etc/hosts
/opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker $1