Solving the Java heap space error when running Spark
Problem description:
While running a Spark job, I need to read about 2,000,000 records into the driver as a cache. When broadcasting this data with .broadcast, the job fails with Exception in thread "main" java.lang.OutOfMemoryError: Java heap space.
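The original Spark_task.java is not shown in this post, so the following is only a minimal sketch of the pattern that triggers the error; the class structure, argument handling, and the loadCacheRecords helper are assumptions, not the real code:

import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

public class Spark_task {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("Spark_task"));

        // Load roughly 2,000,000 records into a driver-side list
        // (hypothetical helper standing in for the real loading logic).
        List<String> cacheRecords = loadCacheRecords(args[0]);

        // sc.broadcast() serializes the whole list inside the driver JVM; with the
        // default 512M driver heap, this serialization step is where the
        // OutOfMemoryError is thrown.
        Broadcast<List<String>> cache = sc.broadcast(cacheRecords);

        // ... transformations that read cache.value() would follow here ...
    }

    private static List<String> loadCacheRecords(String path) {
        // Placeholder: the actual loading logic is not shown in the original post.
        throw new UnsupportedOperationException("not shown in the original post");
    }
}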
The error output is as follows:
15/09/15 05:26:09 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-10-136.ec2.internal:34472 in memory (size: 2.0 KB, free: 397.3 MB)
15/09/15 05:26:09 INFO spark.ContextCleaner: Cleaned broadcast 3
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.io.ObjectOutputStream$HandleTable.growEntries(ObjectOutputStream.java:2351)
    at java.io.ObjectOutputStream$HandleTable.assign(ObjectOutputStream.java:2276)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1428)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at java.util.ArrayList.writeObject(ArrayList.java:762)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1496)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:44)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:202)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:101)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:84)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:29)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1051)
    at org.apache.spark.api.java.JavaSparkContext.broadcast(JavaSparkContext.scala:648)
    at com.myspark.spark.task.Spark_task.main(Spark_task.java:77)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)

Looking further back, the log lines just before the error are:
15/09/15 05:26:09 INFO storage.MemoryStore: Block broadcast_3 of size 3488 dropped from memory (free 280236528)
15/09/15 05:26:09 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-10-135.ec2.internal:51942 in memory (size: 2.0 KB, free: 398.1 MB)
15/09/15 05:26:09 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-10-136.ec2.internal:34472 in memory (size: 2.0 KB, free: 397.3 MB)
15/09/15 05:26:09 INFO spark.ContextCleaner: Cleaned broadcast 3

Blocks are being dropped from memory, which shows the driver has run out of heap.
Solution:
The driver's memory cannot be increased with a plain java -Xms32m -Xmx800m className invocation: spark-submit does not accept JVM heap flags in that form, and nothing like it appears in ./bin/spark-submit --help either. So the fix has to come from Spark's own options.
Checking ./bin/spark-submit --help, I found:
--driver-memory MEM    Memory for driver (e.g. 1000M, 2G) (Default: 512M).

So I changed the submit command to the following, and the job ran successfully:
./bin/spark-submit --class com.myspark.spark.task.Spark_task --master yarn-client --driver-memory 1g /home/hadoop/myspark/spark-example-test-0.0.1-SNAPSHOT.jar s3://********** s3://*********** /test/myspark/spark35
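If you do not want to pass --driver-memory on every submit, the same setting can usually be placed in conf/spark-defaults.conf, the defaults file that the --properties-file entry in the help output below refers to. A minimal sketch, assuming the default properties file is the one your spark-submit actually reads:

# conf/spark-defaults.conf
spark.driver.memory    1g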
As for executor-memory: since I am running Spark on YARN, this value may be controlled by YARN itself; setting it in the submit command appeared to have no effect for me. It can probably be set in local mode. The details remain to be verified by experiment (an example command is sketched after the help line below).
--executor-memory MEM Memory per executor (e.g. 1000M, 2G) (Default: 1G)
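For reference, a submit command that also requests executor memory would look like the sketch below. The 2g value, --num-executors 2, and the placeholder application arguments are arbitrary choices for illustration; whether YARN actually grants the requested memory depends on the container limits configured in yarn-site.xml (e.g. yarn.scheduler.maximum-allocation-mb):

./bin/spark-submit --class com.myspark.spark.task.Spark_task \
  --master yarn-client \
  --driver-memory 1g \
  --executor-memory 2g \
  --num-executors 2 \
  /home/hadoop/myspark/spark-example-test-0.0.1-SNAPSHOT.jar <application arguments>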
The full output of ./bin/spark-submit --help is as follows:
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --packages                  Comma-separated list of maven coordinates of jars to include
                              on the driver and executor classpaths. Will search the local
                              maven repo, then maven central and any additional remote
                              repositories given by --repositories. The format for the
                              coordinates should be groupId:artifactId:version.
  --repositories              Comma-separated list of additional remote repositories to
                              search for the maven coordinates given with --packages.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --proxy-user NAME           User to impersonate when submitting the application.
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output
  --version,                  Print the version of current Spark

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.
  --kill SUBMISSION_ID        If given, kills the driver specified.
  --status SUBMISSION_ID      If given, requests the status of the driver specified.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --driver-cores NUM          Number of cores used by the driver, only in cluster mode
                              (Default: 1).
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.