Spark Cluster Installation and Streaming Debugging


Prerequisites

1. Install the Oracle Java Development Kit (not OpenJDK), JDK 1.7 or later. Download: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html?ssSourceSiteId=ocomen

  Check: java -version

2. Install Python 2.7 or later and configure the environment variables and PATH. A 64-bit build can be made from https://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2

  Check: python --version

3. Install Scala 2.10 or later on every machine in the cluster and configure the environment variables and PATH. Download: http://downloads.typesafe.com/scala/2.10.5/scala-2.10.5.tgz

  Check: the scala command works

4. Install the rsync package.

  Check: the rsync command works (scp is provided by OpenSSH, not rsync)

5. Install SSH and set up passwordless SSH between all cluster nodes for the e3base user.

6. Keep the cluster clocks synchronized with NTP.

        Check: service ntpd status; restart with service ntpd restart

7. Permanently disable the firewall and SELinux on every host.

        Check: service iptables status; stop with service iptables stop (use chkconfig iptables off to make the change permanent)

8. Add the hostname/IP mappings of all cluster hosts to /etc/hosts on every host.
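The checks above can be scripted as a quick pre-flight pass on each host. This is a minimal sketch, not part of the original post; the list of commands follows the prerequisites above, and version numbers still need to be verified by hand:

```shell
#!/bin/sh
# Pre-flight check for the prerequisites listed above.
# check_cmd: returns 0 if the given command exists on PATH, 1 otherwise.
check_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Commands this installation relies on (JDK, Python, Scala, SSH/scp, rsync).
for c in java python scala ssh scp rsync; do
  if check_cmd "$c"; then
    echo "OK:      $c"
  else
    echo "MISSING: $c"
  fi
done

# Versions must still be inspected manually:
#   java -version      (needs Oracle JDK 1.7+)
#   python --version   (needs 2.7+)
#   scala -version     (needs 2.10+)
```

Run it on every node before starting the installation; any MISSING line means the corresponding prerequisite is not met.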

Scala Installation

Unpack the archive

tar -xzvf scala-2.10.6.tgz

mv scala-2.10.6 scala

Configure environment variables (in ~/.bash_profile)

export SCALA_HOME=/home/hadoop/cdh5.5.0/scala

export PATH=$SCALA_HOME/bin:$PATH

Apply the environment variables

source ~/.bash_profile

Verify the installation

Run:

scala -version

If version information appears (for this release, a line like Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL), the installation succeeded.

Spark Cluster Deployment

Installation

Unpack the archive

tar -xzvf spark-1.5.0-cdh5.5.0.tar.gz

mv spark-1.5.0-cdh5.5.0 spark

Configure environment variables

vi ~/.bash_profile

export SPARK_HOME=/home/hadoop/cdh5.5.0/spark

export PATH=$SPARK_HOME/bin:$PATH

Apply the environment variables

source ~/.bash_profile

Configuration

spark-env.sh

cp spark-env.sh.template spark-env.sh

Edit spark-env.sh and add the settings for your cluster:

vi spark-env.sh
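The original post shows the spark-env.sh contents only as a screenshot, which is lost. A typical minimal configuration for a standalone cluster on this layout might look like the following; the JDK path, Hadoop config path, hostname, and worker sizes are assumptions to adjust for your hosts:

```shell
# spark-env.sh -- standalone-mode settings (values here are examples)
export JAVA_HOME=/usr/java/jdk1.7.0_79                          # assumed JDK path
export SCALA_HOME=/home/hadoop/cdh5.5.0/scala
export HADOOP_CONF_DIR=/home/hadoop/cdh5.5.0/hadoop/etc/hadoop  # assumed Hadoop config dir
export SPARK_MASTER_IP=master                                   # hostname of the Master
export SPARK_WORKER_CORES=2                                     # cores used by each Worker
export SPARK_WORKER_MEMORY=2g                                   # memory used by each Worker
```

SPARK_MASTER_IP is the variable name used by Spark 1.x standalone mode; every Worker reads it to find the Master.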

slaves

cp slaves.template slaves

Add the Worker hostnames to slaves, one per line.
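For example, with three workers named worker1 through worker3 (the hostnames used elsewhere in this document), slaves would contain:

```
worker1
worker2
worker3
```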

Startup

Copy the configuration to the Worker hosts (e.g. with scp).

cd $SPARK_HOME/sbin

sh start-all.sh

Note: make sure Hadoop is already running before starting Spark; otherwise the Master will fail to start. Once started, jps should show a Master process on the master host and a Worker process on each worker.

Master HA Configuration

While a Master fails over from standby to active, submission of new applications is affected; applications that are already running are not. Add the following to spark-env.sh on every Master node, then start an additional standby Master on another machine with $SPARK_HOME/sbin/start-master.sh.

 

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=worker1:2181,worker2:2181,worker3:2181 -Dspark.deploy.zookeeper.dir=/spark"

Parameters:

spark.deploy.recoveryMode=ZOOKEEPER

spark.deploy.zookeeper.url=worker1:2181,worker2:2181,worker3:2181

spark.deploy.zookeeper.dir=/spark

 

spark.deploy.recoveryMode

The recovery mode used when the Master restarts. Three options: ZOOKEEPER, FILESYSTEM, NONE.

 

spark.deploy.zookeeper.url

The ZooKeeper server addresses.

 

spark.deploy.zookeeper.dir

The ZooKeeper directory (/spark here) in which the cluster metadata is stored, including Workers, Drivers, and Applications.

Troubleshooting

Startup failure

Symptom

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger

       at java.lang.Class.getDeclaredMethods0(Native Method)

       at java.lang.Class.privateGetDeclaredMethods(Class.java:2625)

       at java.lang.Class.getMethod0(Class.java:2866)

       at java.lang.Class.getMethod(Class.java:1676)

       at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)

       at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)

Caused by:java.lang.ClassNotFoundException: org.slf4j.Logger

       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

       at java.security.AccessController.doPrivileged(Native Method)

       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

Resolution

Add the slf4j-related jars to the classpath in spark-env.sh, as follows:

for f in $HADOOP_HOME/share/hadoop/common/lib/*.jar; do
  if [ "$SPARK_CLASSPATH" ]; then
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$f
  else
    export SPARK_CLASSPATH=$f
  fi
done

 

for f in $HADOOP_HOME/share/hadoop/common/*.jar; do
  if [ "$SPARK_CLASSPATH" ]; then
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$f
  else
    export SPARK_CLASSPATH=$f
  fi
done
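The two loops above can also be written without the if/else by using POSIX parameter expansion, which adds the ':' separator only when SPARK_CLASSPATH is already non-empty. A sketch (assumes $HADOOP_HOME is set; quoting added in case paths contain spaces):

```shell
# Append all Hadoop common jars to SPARK_CLASSPATH in one pass.
for f in "$HADOOP_HOME"/share/hadoop/common/lib/*.jar \
         "$HADOOP_HOME"/share/hadoop/common/*.jar; do
  # ${VAR:+word} expands to 'word' only if VAR is set and non-empty,
  # so the first jar is added without a leading ':'.
  SPARK_CLASSPATH="${SPARK_CLASSPATH:+$SPARK_CLASSPATH:}$f"
done
export SPARK_CLASSPATH
```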

Master shuts down automatically shortly after a successful start

Symptom

17/04/06 23:11:59 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkMaster]

java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module

       at java.lang.Class.forName0(Native Method)

       at java.lang.Class.forName(Class.java:278)

       at org.apache.spark.util.Utils$.classForName(Utils.scala:173)

       at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:190)

       at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:186)

       at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)

       at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)

       at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)

       at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)

       at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)

       at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:186)

       at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:100)

       at org.apache.spark.deploy.master.Master.onStart(Master.scala:152)

       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$preStart$1.apply$mcV$sp(AkkaRpcEnv.scala:100)

       at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)

       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.preStart(AkkaRpcEnv.scala:99)

       at akka.actor.ActorCell.create(ActorCell.scala:562)

       at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:425)

       at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)

       at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)

       at akka.dispatch.Mailbox.run(Mailbox.scala:218)

       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)

       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Caused by:java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.Module

       at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

       at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

       at java.security.AccessController.doPrivileged(Native Method)

       at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

       at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

       at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

       ... 26 more

17/04/06 23:11:59 ERROR ErrorMonitor: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkMaster]

java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module

       (same stack trace as above, repeated by ErrorMonitor)

17/04/06 23:12:00 WARN MetricsSystem:Stopping a MetricsSystem that is not running

17/04/06 23:12:00 ERROR AkkaRpcEnv: Ignore error: null

java.lang.NullPointerException

       at org.apache.spark.deploy.master.Master.onStop(Master.scala:198)

       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$postStop$1.apply$mcV$sp(AkkaRpcEnv.scala:143)

       atorg.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)

       at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.postStop(AkkaRpcEnv.scala:142)

       at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)

       at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)

       at akka.actor.ActorCell.terminate(ActorCell.scala:338)

       at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)

       at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)

       at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)

       at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:240)

       at akka.dispatch.Mailbox.run(Mailbox.scala:219)

       at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)

       at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)

       at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)

       at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)

       at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

17/04/06 23:12:00 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

17/04/06 23:12:00 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.

17/04/06 23:12:01 INFO Remoting: Remoting shut down

17/04/06 23:12:01 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.

Resolution

The jackson jars are missing from the classpath. Add them in spark-env.sh:

 

for f in $HADOOP_HOME/share/hadoop/mapreduce*/lib/*.jar; do
  if [ "$SPARK_CLASSPATH" ]; then
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$f
  else
    export SPARK_CLASSPATH=$f
  fi
done

 

Spark Streaming Debugging

Test with the streaming example that ships with the Spark examples: count the words in text received over a TCP socket from a data server.

1. Start netcat as the server

nc -lk 9999

2. Start the streaming example

run-example streaming.NetworkWordCount 172.21.3.60 9999

3. Type text into the netcat terminal; the word counts appear in the example's output once per batch.
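What NetworkWordCount computes for each batch can be illustrated locally with plain shell tools, no Spark involved: split one received line into words and count the occurrences of each.

```shell
# Count word occurrences in one line of input, as the streaming
# example does for each batch it receives from the socket.
echo "hello world hello" | tr -s ' ' '\n' | sort | uniq -c | sort -rn
# "hello" appears with a count of 2, "world" with a count of 1.
```

If the streaming job is running and connected, typing the same line into the netcat terminal should produce equivalent (word, count) pairs in the job's console output.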