Spark Cluster Installation and Streaming Debugging
Prerequisites
1. Install the Oracle Java Development Kit (not OpenJDK), version 1.7 or later. Download: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html?ssSourceSiteId=ocomen
Check: java -version
2. Install Python 2.7 or later and add it to PATH. Download the 64-bit build from https://www.python.org/ftp/python/2.7.3/Python-2.7.3.tar.bz2
Check: python --version
3. Install Scala 2.10 or later on every machine in the cluster and configure SCALA_HOME and PATH. Download: http://downloads.typesafe.com/scala/2.10.5/scala-2.10.5.tgz
Check: the scala command is available
4. Install the rsync package.
Check: the scp command is available
5. Install ssh and configure passwordless login between all cluster machines under the e3base user.
6. Keep the clocks of all machines synchronized with NTP.
Check: service ntpd status, service ntpd restart
7. Permanently disable the firewall and SELinux on every host.
Check: service iptables status, service iptables stop
8. Add the hostname-to-IP mapping of every cluster host to /etc/hosts on each machine.
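The checks listed above can be run in one pass on each node; a minimal sketch (service names and the worker hostname are examples, adjust to your distribution):

```shell
# Verify toolchain versions on this node.
java -version          # expect Oracle JDK 1.7 or later
python --version       # expect Python 2.7 or later
scala -version         # expect Scala 2.10 or later
which scp rsync        # scp/rsync must be on PATH

# Verify passwordless ssh to a worker (hostname is an example).
ssh worker1 hostname

# Verify NTP, firewall, and SELinux state.
service ntpd status
service iptables status
getenforce             # expect Disabled (or Permissive)
```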
Scala Installation
Unpack the archive:
tar -xzvf scala-2.10.5.tgz
mv scala-2.10.5 scala
Configure the environment variables (append to ~/.bash_profile):
export SCALA_HOME=/home/hadoop/cdh5.5.0/scala
export PATH=$SCALA_HOME/bin:$PATH
Apply the changes:
source ~/.bash_profile
Verify the installation by running:
scala -version
If the Scala version information is printed, the installation succeeded.
Spark Cluster Deployment
Installation
Unpack the archive:
tar -xzvf spark-1.5.0-cdh5.5.0.tar.gz
mv spark-1.5.0-cdh5.5.0 spark
Configure the environment variables:
vi ~/.bash_profile
export SPARK_HOME=/home/hadoop/cdh5.5.0/spark
export PATH=$SPARK_HOME/bin:$PATH
Apply the changes:
source ~/.bash_profile
Configuration
spark-env.sh
In $SPARK_HOME/conf, create the file from its template:
cp spark-env.sh.template spark-env.sh
Then edit it and add the required settings:
vi spark-env.sh
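The concrete settings were shown as a screenshot in the original article; a typical minimal spark-env.sh for the layout used in this guide would look like the following (all paths, hostnames, and resource sizes are assumptions, adjust to your environment):

```shell
# Assumed locations, matching the install paths used earlier in this guide.
export JAVA_HOME=/usr/java/jdk1.7.0_79
export SCALA_HOME=/home/hadoop/cdh5.5.0/scala
export HADOOP_HOME=/home/hadoop/cdh5.5.0/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# Master host and per-worker resources (example values).
export SPARK_MASTER_IP=master1
export SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_CORES=2
```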
slaves
cp slaves.template slaves
Write the worker hostnames into slaves, one per line.
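For example, with three workers the slaves file contains only the hostnames, one per line (names are examples):

```
worker1
worker2
worker3
```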
Startup
Copy the configuration to every worker host, then on the Master run:
cd $SPARK_HOME/sbin
sh start-all.sh
(Note: make sure Hadoop is already running before starting Spark, otherwise the Master will fail to start.)
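Distributing the configuration to the workers, as mentioned above, can be done with scp; a minimal sketch (the hostnames are examples, and $SPARK_HOME is assumed to be the same path on every node):

```shell
# Push the edited config files to every worker host.
for host in worker1 worker2 worker3; do
  scp $SPARK_HOME/conf/spark-env.sh $SPARK_HOME/conf/slaves "$host:$SPARK_HOME/conf/"
done
```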
Master HA Configuration
While a Master fails over from standby to active, the submission of new applications is affected; applications that are already running are not.
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=worker1:2181,worker2:2181,worker3:2181 -Dspark.deploy.zookeeper.dir=/spark"
Parameters:
spark.deploy.recoveryMode=ZOOKEEPER
spark.deploy.zookeeper.url=worker1:2181,worker2:2181,worker3:2181
spark.deploy.zookeeper.dir=/spark
spark.deploy.recoveryMode
The recovery mode used when the Master restarts; one of: 1. ZOOKEEPER, 2. FILESYSTEM, 3. NONE.
spark.deploy.zookeeper.url
The addresses of the ZooKeeper servers.
spark.deploy.zookeeper.dir
The ZooKeeper directory (here /spark) in which the cluster metadata is stored, covering Workers, Drivers, and Applications.
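With this setting in spark-env.sh on every master candidate, start the cluster normally and then bring up a standby Master on a second host; a minimal sketch (the standby hostname is an assumption):

```shell
# On the primary master host: start the whole cluster.
$SPARK_HOME/sbin/start-all.sh

# On the standby host (e.g. master2): start only a Master process;
# it registers with ZooKeeper and waits in standby state.
$SPARK_HOME/sbin/start-master.sh
```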
Troubleshooting
Startup fails
Symptom
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/Logger
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2625)
at java.lang.Class.getMethod0(Class.java:2866)
at java.lang.Class.getMethod(Class.java:1676)
at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: org.slf4j.Logger
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Fix
Add the slf4j jars to the classpath in spark-env.sh, as follows:
for f in $HADOOP_HOME/share/hadoop/common/lib/*.jar; do
  if [ "$SPARK_CLASSPATH" ]; then
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$f
  else
    export SPARK_CLASSPATH=$f
  fi
done
for f in $HADOOP_HOME/share/hadoop/common/*.jar; do
  if [ "$SPARK_CLASSPATH" ]; then
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$f
  else
    export SPARK_CLASSPATH=$f
  fi
done
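The loop above simply joins jar paths with colons, appending to the variable once it is non-empty. The same pattern can be tried standalone against a throwaway directory instead of $HADOOP_HOME:

```shell
# Demonstrate the colon-joined classpath pattern with temporary files.
dir=$(mktemp -d)
touch "$dir/a.jar" "$dir/b.jar"

CP=""
for f in "$dir"/*.jar; do
  if [ "$CP" ]; then
    CP="$CP:$f"      # subsequent jars: append with a colon separator
  else
    CP="$f"          # first jar: start the list
  fi
done
echo "$CP"           # prints the two jar paths joined by a colon
rm -rf "$dir"
```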
Master shuts down by itself shortly after starting
Symptom
17/04/06 23:11:59 ERROR ActorSystemImpl: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkMaster]
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:278)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:173)
    at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:190)
    at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:186)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:186)
    at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:100)
    at org.apache.spark.deploy.master.Master.onStart(Master.scala:152)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$preStart$1.apply$mcV$sp(AkkaRpcEnv.scala:100)
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.preStart(AkkaRpcEnv.scala:99)
    at akka.actor.ActorCell.create(ActorCell.scala:562)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:425)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.run(Mailbox.scala:218)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.databind.Module
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 26 more
17/04/06 23:11:59 ERROR ErrorMonitor: Uncaught fatal error from thread [sparkMaster-akka.actor.default-dispatcher-4] shutting down ActorSystem [sparkMaster]
java.lang.NoClassDefFoundError: com/fasterxml/jackson/databind/Module
    (same stack trace as above)
17/04/06 23:12:00 WARN MetricsSystem: Stopping a MetricsSystem that is not running
17/04/06 23:12:00 ERROR AkkaRpcEnv: Ignore error: null
java.lang.NullPointerException
    at org.apache.spark.deploy.master.Master.onStop(Master.scala:198)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$postStop$1.apply$mcV$sp(AkkaRpcEnv.scala:143)
    at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197)
    at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.postStop(AkkaRpcEnv.scala:142)
    at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:201)
    at akka.actor.dungeon.FaultHandling$class.terminate(FaultHandling.scala:163)
    at akka.actor.ActorCell.terminate(ActorCell.scala:338)
    at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:431)
    at akka.actor.ActorCell.systemInvoke(ActorCell.scala:447)
    at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:262)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:240)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
17/04/06 23:12:00 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17/04/06 23:12:00 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17/04/06 23:12:01 INFO Remoting: Remoting shut down
17/04/06 23:12:01 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
Fix
The jackson jars are missing from the classpath; add them in spark-env.sh as well:
for f in $HADOOP_HOME/share/hadoop/mapreduce*/lib/*.jar; do
  if [ "$SPARK_CLASSPATH" ]; then
    export SPARK_CLASSPATH=$SPARK_CLASSPATH:$f
  else
    export SPARK_CLASSPATH=$f
  fi
done
Spark Streaming Debugging
Use the streaming example that ships with Spark for a quick test: it counts the words in text received over a TCP socket from a data server.
1. Start netcat as the server:
nc -lk 9999
2. Start the streaming example (the IP is the host where netcat is listening):
run-example streaming.NetworkWordCount 172.21.3.60 9999
3. Type text into the netcat terminal to send data; the word counts appear in the example's output.
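What the example computes per micro-batch is an ordinary word count. The same aggregation for a single line of input can be sketched in plain shell, which is handy for sanity-checking the expected output:

```shell
# Count words in one line of input: split into one word per line,
# sort, count duplicates, then order by count descending.
echo "hello world hello" | tr ' ' '\n' | sort | uniq -c | sort -rn
# prints:  2 hello  /  1 world
```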