Installing Spark on Alibaba Cloud ECS


This continues from the previous post on HBase.


Download Spark 2.1.1 and Scala 2.11.11
The Spark documentation describes the version requirements as follows:
Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API, Spark 2.1.1 uses Scala 2.11. You will need to use a compatible Scala version (2.11.x).
Note that support for Java 7 and Python 2.6 are deprecated as of Spark 2.0.0, and support for Scala 2.10 and versions of Hadoop before 2.6 are deprecated as of Spark 2.1.0, and may be removed in Spark 2.2.0.
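Given these requirements, it can be worth verifying the JDK on each node before installing anything. A minimal sketch, assuming an Oracle/OpenJDK-style "java -version" output:

```shell
# Hedged sanity check: confirm the installed JDK meets Spark 2.1.1's
# Java 7+ requirement. Assumes 'java -version' prints a line like
#   java version "1.8.0_131"
if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH; install a JDK (7+) first" >&2
else
    java_ver=$(java -version 2>&1 | awk -F '"' '/version/ {print $2}')
    # Pre-Java-9 releases report "1.x.y"; Java 9+ reports "x.y.z".
    major=$(echo "$java_ver" | awk -F '.' '{ if ($1 == 1) print $2; else print $1 }')
    if [ "$major" -ge 7 ]; then
        echo "Java $java_ver satisfies Spark 2.1.1's Java 7+ requirement"
    else
        echo "Java $java_ver is too old for Spark 2.1.1 (needs Java 7+)" >&2
    fi
fi
```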


Download the packages

wget https://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz
wget https://downloads.lightbend.com/scala/2.11.11/scala-2.11.11.tgz

Create the installation directories

mkdir -p /opt/scala
mkdir -p /opt/spark

Unpack the archives

tar -zxvf scala-2.11.11.tgz -C /opt/scala
tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz -C /opt/spark

Create an environment variable file for each under /etc/profile.d (these apply system-wide at login)

/etc/profile.d/scala.sh:

export SCALA_HOME=/opt/scala/current

/etc/profile.d/spark.sh:

export SPARK_HOME=/opt/spark/current
export PATH=$PATH:${SPARK_HOME}/bin


Edit the Spark configuration files

cp ./conf/spark-env.sh.template ./conf/spark-env.sh
Edit spark-env.sh:

export SCALA_HOME=${SCALA_HOME}
export JAVA_HOME=${JAVA_HOME}
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=500m
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

cp slaves.template slaves
Edit slaves:

master
slave01
slave02
Set ownership

chown -R hadoop:hadoop /opt/scala
chown -R hadoop:hadoop /opt/spark
Then scp the corresponding directories to the other machines.
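One way to sketch that copy step, assuming the worker hostnames slave01 and slave02 from the slaves file, passwordless SSH, and write access to /opt and /etc/profile.d on each target:

```shell
# Distribute the unpacked Scala and Spark trees plus the environment
# files to every worker listed in conf/slaves.
# Assumes passwordless SSH is already configured for these accounts.
for host in slave01 slave02; do
    scp -r /opt/scala /opt/spark "hadoop@${host}:/opt/"
    scp /etc/profile.d/scala.sh /etc/profile.d/spark.sh "root@${host}:/etc/profile.d/"
done
```

Remember to repeat the chown step on each worker afterwards so the hadoop user owns the copied trees.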

Create symlinks as the hadoop user

cd /opt/scala && ln -s /opt/scala/scala-2.11.11 ./current
cd /opt/spark && ln -s /opt/spark/spark-2.1.1-bin-hadoop2.7 ./current
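To actually use the standalone cluster configured above, it also has to be started from the master node. A minimal sketch, assuming SPARK_HOME is set as in the profile.d files:

```shell
# start-all.sh launches the standalone Master on this node and, over SSH,
# a Worker on every host listed in conf/slaves.
if [ -x "${SPARK_HOME}/sbin/start-all.sh" ]; then
    "${SPARK_HOME}/sbin/start-all.sh"
else
    echo "Spark not found at SPARK_HOME=${SPARK_HOME}" >&2
fi
```

Afterwards, jps on the master should show a Master process and each slave a Worker; the standalone web UI listens on port 8080 of the master by default.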

Test

spark-shell
val file = sc.textFile("hdfs://iZuf68ho3sfplkorf9r8akZ:9000/stella/input/wordcount.txt")
val rdd = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
rdd.collect()
rdd.foreach(println)


For a more detailed walkthrough, see (reposted from): http://www.cnblogs.com/purstar/p/6293605.html











