Notes on testing a Spark cluster with the built-in examples
Source: Internet  Published by: 石金鑫知乎  Editor: 程序博客网  Time: 2024/05/29 07:29
Background: today I set out to test the cluster with Spark's built-in SparkPi example, mainly to understand the cluster startup process and the load on each machine. I ran into more problems than expected; thanks to the group members for their help, especially hali.
There were two main problems:
1. Differences between running on the Spark cluster versus local mode, and how to handle IP addresses versus hostnames when testing on the cluster
2. The Spark cluster failing to restart
1. Differences between cluster and local run modes
1.1 Local launch
Launching with ./run-example org.apache.spark.examples.SparkPi 2 spark://10.7.12.117:7077 actually runs in local mode, as you can see by reading the run-example script; note that ./run-example org.apache.spark.examples.SparkPi 2 local does not work either. Because this is a local launch, the job does not show up in the web UI at http://10.7.12.117:8080/.
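Why the first command silently falls back to local mode can be checked in bin/run-example itself. In Spark 1.x the script reads the MASTER environment variable and defaults to local[*]; trailing arguments that look like a master URL are just passed to the example as program arguments. A quick way to confirm this on your own install (the exact variable name may differ between versions, so treat this as a sketch):

```shell
# Look at how run-example picks its master; in Spark 1.x it is
# roughly: EXAMPLE_MASTER=${MASTER:-"local[*]"}
grep -n "MASTER" ./bin/run-example

# So to run the example against the standalone master instead of
# local mode, export MASTER first (hostname form, see section 1.2):
MASTER=spark://jt-host-kvm-17:7077 ./bin/run-example org.apache.spark.examples.SparkPi 2
```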
1.2 Cluster launch
./bin/spark-submit --master spark://jt-host-kvm-17:7077 --class org.apache.spark.examples.SparkPi --executor-memory 300m ./lib/spark-examples-1.1.0-hadoop2.4.0.jar 1
Using the master's IP address here causes a problem; the error is as follows:
15/02/10 13:45:53 INFO storage.BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/10 13:45:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
15/02/10 13:45:53 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/02/10 13:46:08 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/02/10 13:46:13 INFO client.AppClient$ClientActor: Connecting to master spark://10.7.12.117:7077...
15/02/10 13:46:23 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/02/10 13:46:33 INFO client.AppClient$ClientActor: Connecting to master spark://10.7.12.117:7077...
15/02/10 13:46:38 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/02/10 13:46:53 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/02/10 13:46:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/02/10 13:46:53 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
15/02/10 13:46:53 INFO scheduler.DAGScheduler: Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
Additional reference shared by other group members:
http://www.datastax.com/dev/blog/common-spark-troubleshooting
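What worked here was submitting with the master's hostname rather than its IP: the standalone master advertises itself under the exact spark:// URL it was started with, and the driver's --master string must match it. A sketch of the setup, assuming 10.7.12.117 is jt-host-kvm-17 (names and IP are from this test environment):

```shell
# Every node (driver, master, workers) should resolve the hostname
# the master was started with, e.g. this line in /etc/hosts:
#   10.7.12.117  jt-host-kvm-17

# Check which URL the master actually advertises (shown on the
# web UI at http://jt-host-kvm-17:8080/), then submit with that
# same hostname form, not the raw IP:
./bin/spark-submit \
  --master spark://jt-host-kvm-17:7077 \
  --class org.apache.spark.examples.SparkPi \
  --executor-memory 300m \
  ./lib/spark-examples-1.1.0-hadoop2.4.0.jar 1
```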
2. Fixing the Spark cluster restart problem
After running stop-all.sh to stop the Spark cluster, it prints the following:
jt-host-kvm-17: no org.apache.spark.deploy.worker.Worker to stop
jt-host-kvm-19: no org.apache.spark.deploy.worker.Worker to stop
jt-host-kvm-18: no org.apache.spark.deploy.worker.Worker to stop
no org.apache.spark.deploy.master.Master to stop
Preliminary analysis: worker.pid and master.pid are written to /tmp by default and were probably deleted, because on RHEL6 the system automatically cleans /tmp on a default 30-day schedule.
The fix is to set the SPARK_PID_DIR environment variable to a directory that is not periodically cleaned.
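A minimal sketch of the fix, assuming the standard conf/spark-env.sh layout; the directory path is just an example:

```shell
# In conf/spark-env.sh on every node: keep pid files out of /tmp,
# which RHEL6 cleans on a 30-day schedule by default.
export SPARK_PID_DIR=/var/run/spark   # example path; any persistent dir works

# Create the directory on each node, then restart the cluster so
# stop-all.sh can find the Worker/Master pid files next time:
mkdir -p /var/run/spark
./sbin/stop-all.sh    # will still warn "no ... to stop" this one time
./sbin/start-all.sh
```

Note that if the old daemons are still running with their pid files already lost, you need to kill them by hand once (find them with jps, then kill) before restarting, or the new daemons will fail to bind their ports.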