Running Spark on Mesos [Installation and Usage]


Installing Scala

Extract the archive: tar -zxvf scala-2.9.2.tgz

Add the following lines to ~/.bashrc or ~/.profile:
export SCALA_HOME="/opt/scala"
export PATH="${SCALA_HOME}/bin:${JAVA_HOME}/bin:${PATH}"

Then run: $ source ~/.bashrc

Test whether Scala was installed successfully:

$ scala


Installing Spark 0.6.1

Spark requires Scala 2.9.2. You will need to have Scala’s bin directory in your PATH, or you will need to set the SCALA_HOME environment variable to point to where you’ve installed Scala. Scala must also be accessible through one of these methods on slave nodes on your cluster.

Spark uses Simple Build Tool, which is bundled with it. To compile the code, go into the top-level Spark directory and run

sbt/sbt package

Testing the Build

Spark comes with a number of sample programs in the examples directory. To run one of the samples, use ./run <class> <params> in the top-level Spark directory (the run script sets up the appropriate paths and launches that program). For example, ./run spark.examples.SparkPi will run a sample program that estimates Pi. Each of the examples prints usage help if no params are given.

Note that all of the sample programs take a <master> parameter specifying the cluster URL to connect to. This can be a URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing.

Finally, Spark can be used interactively from a modified version of the Scala interpreter that you can start through ./spark-shell. This is a great way to learn Spark.

Running Spark on Mesos

Spark can run on private clusters managed by the Apache Mesos resource manager. Follow the steps below to install Mesos and Spark:

  1. Download and build Spark using the instructions above.
  2. Download Mesos 0.9.0-incubating from a mirror.
  3. Configure Mesos using the configure script, passing the location of your JAVA_HOME using --with-java-home. Mesos comes with “template” configure scripts for different platforms, such as configure.macosx, that you can run. See the README file in Mesos for other options. Note: If you want to run Mesos without installing it into the default paths on your system (e.g. if you don’t have administrative privileges to install it), you should also pass the --prefix option to configure to tell it where to install. For example, pass --prefix=/home/user/mesos. By default the prefix is /usr/local.
  4. Build Mesos using make, and then install it using make install.
  5. Create a file called spark-env.sh in Spark’s conf directory, by copying conf/spark-env.sh.template, and add the following lines in it:
    • export MESOS_NATIVE_LIBRARY=<path to libmesos.so>. This path is usually <prefix>/lib/libmesos.so (where the prefix is /usr/local by default). Also, on Mac OS X, the library is called libmesos.dylib instead of .so.
    • export SCALA_HOME=<path to Scala directory>.
  6. Copy Spark and Mesos to the same paths on all the nodes in the cluster (or, for Mesos, make install on every node).
  7. Configure Mesos for deployment:
    • On your master node, edit <prefix>/var/mesos/deploy/masters to list your master and <prefix>/var/mesos/deploy/slaves to list the slaves, where <prefix> is the prefix where you installed Mesos (/usr/local by default).
    • On all nodes, edit <prefix>/var/mesos/conf/mesos.conf and add the line master=HOST:5050, where HOST is your master node.
    • Run <prefix>/sbin/mesos-start-cluster.sh on your master to start Mesos. If all goes well, you should see Mesos’s web UI on port 8080 of the master machine.
    • See Mesos’s README file for more information on deploying it.
  8. To run a Spark job against the cluster, when you create your SparkContext, pass the string mesos://HOST:5050 as the first parameter, where HOST is the machine running your Mesos master. In addition, pass the location of Spark on your nodes as the third parameter, and a list of JAR files containing your job’s code as the fourth (these will automatically get copied to the workers). For example:
new SparkContext("mesos://HOST:5050", "My Job Name", "/home/user/spark", List("my-job.jar"))
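
To make step 8 concrete, here is a minimal sketch of a complete job built around that constructor call. It is not from the original guide: the object name, master host, HDFS path, Spark home, and JAR name are placeholders you would replace with your own values.

import spark.SparkContext
import SparkContext._

// Hypothetical minimal job: count the lines of an HDFS file on a Mesos cluster.
// HOST, the Spark home, the JAR name, and the input path are placeholders.
object LineCount {
  def main(args: Array[String]) {
    val sc = new SparkContext("mesos://HOST:5050", "Line Count",
                              "/home/user/spark", List("line-count.jar"))
    val lines = sc.textFile("hdfs://master:9000/user/liu/testdata/kmeansdata.txt")
    println("Line count: " + lines.count())
  }
}

Such a program can be compiled with scalac and run with the Spark assembly JAR on the CLASSPATH, as the note below shows for SparkKMeans.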

Running the SparkKMeans Example on Mesos

Start the Mesos service on every node and check in the web UI that all the slaves have registered. Start Hadoop alongside Mesos and upload kmeansdata.txt to HDFS. Then, on the master, change into the Spark directory and run the KMeans example:

./run spark.examples.SparkKMeans 192.168.1.130:5050 hdfs://master:9000/user/liu/testdata/kmeansdata.txt 8 2.0

Remember to set the following environment variables:

export JAVA_HOME=$HOME/jdk1.7.0_05
export HADOOP_VERSION=1.0.4
export HADOOP_HOME=$HOME/hadoop-$HADOOP_VERSION
export SCALA_HOME=$HOME/scala-2.9.2
export MESOS_HOME=$HOME/mesos-0.9.0
export MESOS_NATIVE_LIBRARY=$MESOS_HOME/src/.libs/libmesos.so
export SPARK_HOME=$HOME/spark-0.6.1
export LD_LIBRARY_PATH=$MESOS_HOME/src/.libs
export CLASSPATH=/home/hadoop/spark-0.6.1/core/target/spark-core-assembly-0.6.1.jar:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$SCALA_HOME/bin

Note:

Step 8 above is not described in much detail, so here we take the SparkKMeans.scala example that ships with Spark and show how to compile and run a program. The following steps only need to be performed on the master node. See also the Spark programming guide.

First, build the Spark assembly JAR together with its dependencies (core/target/spark-core-assembly-0.6.1.jar):

sbt/sbt assembly

Add this JAR to your CLASSPATH:

export CLASSPATH=/home/hadoop/spark-0.6.1/core/target/spark-core-assembly-0.6.1.jar:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Add the following import statements to your Scala program file:

import spark.SparkContext
import SparkContext._
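
The second import matters because SparkContext._ brings in the implicit conversions that enable extra operations, such as reduceByKey on key-value RDDs. As a hedged illustration of the imports in use (the object name, master URL, paths, and JAR name below are made up, not part of the original example):

import spark.SparkContext
import SparkContext._ // implicit conversions, e.g. pair-RDD operations like reduceByKey

// Hypothetical word-count-style sketch showing why SparkContext._ is needed.
object WordCountSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext("mesos://192.168.1.130:5050", "Word Count Sketch",
                              "/home/hadoop/spark-0.6.1", List("word-count-sketch.jar"))
    val counts = sc.textFile("hdfs://master:9000/user/liu/testdata/kmeansdata.txt")
                   .flatMap(_.split(" "))
                   .map(word => (word, 1))
                   .reduceByKey(_ + _)
    counts.take(10).foreach(println)
  }
}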

Compile the Scala program:

scalac SparkKMeans.scala

Run the compiled SparkKMeans program:

scala spark.examples.SparkKMeans mesos://192.168.1.130:5050 hdfs://192.168.1.130:9000/dataset/Square-10m.txt 8 2.0

How to Write a Spark Program

The first thing a Spark program must do is to create a SparkContext object, which tells Spark how to access a cluster. This is done through the following constructor:

new SparkContext(master, jobName, [sparkHome], [jars])

The master parameter is a string specifying a Mesos cluster to connect to, or a special “local” string to run in local mode, as described below. jobName is a name for your job, which will be shown in the Mesos web UI when running on a cluster. Finally, the last two parameters are needed to deploy your code to a cluster if running in distributed mode, as described later.
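
As a brief sketch of lines you might place inside a program (the two-argument form for local mode is assumed from the bracketed optional arguments above; the host, paths, and JAR name are placeholders):

import spark.SparkContext

// Local mode: only the master string and the job name are needed.
val localSc = new SparkContext("local[4]", "My Local Job")

// Distributed mode on Mesos: also pass the Spark location on the nodes and
// the JARs containing the job's code (copied automatically to the workers).
val clusterSc = new SparkContext("mesos://HOST:5050", "My Cluster Job",
                                 "/home/user/spark", List("my-job.jar"))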

In the Spark shell, a special interpreter-aware SparkContext is already created for you, in the variable called sc. Making your own SparkContext will not work. You can set which master the context connects to using the MASTER environment variable. For example, to run on four cores, use

$ MASTER=local[4] ./spark-shell
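
Once the shell starts, sc can be used directly at the prompt. A small illustrative session (the data is generated on the spot, purely for demonstration):

scala> val nums = sc.parallelize(1 to 10000)
scala> nums.filter(_ % 3 == 0).count()   // counts the 3333 multiples of 3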