Spark in Practice
Source: Internet · Editor: 程序博客网 · Time: 2024/05/17 07:43
1 Pitfalls Encountered
0. When triggering a Spark job from a Quartz scheduler, the driver class must have a main function; otherwise the worker nodes cannot locate the entry point.
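A minimal sketch of this pattern (class and method names are illustrative, not from the original project): keep the driver logic reachable from a static main method, and have the Quartz job's execute() delegate to the same code path.

```java
// Hypothetical example: the Spark driver class exposes a main entry point
// so worker nodes can resolve it; a Quartz Job would simply delegate here.
class SparkBatchJob {

    // Entry point required on the driver class: without it, worker
    // nodes reportedly fail to find where the job starts.
    public static void main(String[] args) {
        System.out.println(run());
    }

    // Shared body that a Quartz Job's execute() can also call.
    static String run() {
        // SparkContext setup and the actual job logic would live here.
        return "spark job started";
    }
}
```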
1. Maven did not include hadoop-client, which produced the error: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. Adding the dependency fixed it:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
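If the winutils.exe error persists when running locally on Windows, a commonly used workaround (not from the original post; paths are illustrative) is to install a winutils.exe matching your Hadoop version and point HADOOP_HOME at it before launching:

```shell
rem Windows-only workaround (illustrative paths): place winutils.exe for
rem your Hadoop version under %HADOOP_HOME%\bin, then set the variables
rem before starting the application.
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin
```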
2. Remote job submission initially failed with the following exception:
16/07/28 09:52:01 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[appclient-registration-retry-thread,5,main]
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@6ed152eb rejected from java.util.concurrent.ThreadPoolExecutor@709afb23[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2048)
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:821)
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1372)
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:96)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1.apply(AppClient.scala:95)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.tryRegisterAllMasters(AppClient.scala:95)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.org$apache$spark$deploy$client$AppClient$ClientEndpoint$$registerWithMaster(AppClient.scala:121)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:132)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1119)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:124)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
The cause was a version mismatch: the project referenced spark-core 1.5.2; switching to 1.6.1 made remote job submission work.
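In Maven terms, the change above amounts to bumping the spark-core coordinate (the artifact id below assumes the Scala 2.10 build that Spark 1.x targeted):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.1</version>
</dependency>
```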
3. java.lang.VerifyError
The classpath pulled in three conflicting Netty versions:

<groupId>io.netty</groupId>
<artifactId>netty</artifactId>
<packaging>bundle</packaging>
<version>3.8.0.Final</version>

<groupId>io.netty</groupId>
<artifactId>netty-parent</artifactId>
<version>4.0.29.Final</version>

<groupId>org.jboss.netty</groupId>
<artifactId>netty</artifactId>
<packaging>bundle</packaging>
<version>3.2.7.Final</version>

Removing the 3.2.7 dependency (org.jboss.netty) resolved the error.
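One way to keep the old org.jboss.netty artifact off the classpath, when it arrives transitively, is a Maven exclusion on whichever dependency drags it in (hadoop-client below is only an example of such a parent, not necessarily the culprit in this project):

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.jboss.netty</groupId>
            <artifactId>netty</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```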
4. Error: ERROR executor.CoarseGrainedExecutorBackend
This error is opaque: the message alone does not reveal the cause, but it ultimately comes down to memory. There are two ways to resolve it. First, as above, increase executor-memory and reduce executor-cores. Second, increase the executor memory overhead, though this does not address the root cause. If the cluster has the resources, prefer the first approach.
The same error also appears with partitionBy(new HashPartitioner(partition-num)): when partition-num is too large or too small, it fails in this way. That, too, is fundamentally a memory problem, but in that case increasing memory or overhead does not help; partition-num itself has to be tuned.
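The first approach above maps to spark-submit flags roughly as follows. The values and the main class/jar names are placeholders to tune for your cluster, not recommendations; the flag names are from Spark 1.x:

```shell
# Increase per-executor memory, reduce cores per executor, and (second
# approach) raise the off-heap overhead. com.example.MyJob and my-job.jar
# are placeholders.
spark-submit \
  --class com.example.MyJob \
  --executor-memory 6g \
  --executor-cores 2 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  my-job.jar
```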
5. Spark version mismatch: the local build bundled spark-1.6.2 jars while the production cluster ran spark-1.6.1, producing the error below. Changing the local jars to spark-1.6.1 fixed it.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 572.0 failed 4 times, most recent failure: Lost task 4.3 in stage 572.0 (TID 1711, ip-10-8-3-18.ap-southeast-1.compute.internal): java.lang.ClassNotFoundException: org.apache.spark.rdd.PairRDDFunctions$$anonfun$reduceByKey$1$$anonfun$apply$16
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:278)