Spark History Server runs out of memory and the service dies


Version: Spark 1.5.2 built for Hadoop 2.4.0

Today the Spark history server died on its own. The log shows:

16/05/13 14:12:30 WARN DFSClient: Failed to connect to /192.168.2.77:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException

java.nio.channels.ClosedByInterruptException
16/05/13 14:12:30 WARN BlockReaderFactory: I/O error constructing remote block reader.
java.nio.channels.ClosedByInterruptException
16/05/13 14:12:30 WARN DFSClient: Failed to connect to /192.168.2.45:50010 for block, add to deadNodes and continue. java.nio.channels.ClosedByInterruptException
java.nio.channels.ClosedByInterruptException
    at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
16/05/13 14:12:30 INFO DFSClient: Could not obtain BP-334845286-192.168.2.4-1418890858930:blk_1166565322_93633541 from any node: java.io.IOException: No live nodes contain current block No live nodes contain current block Block locations: 192.168.2.70:50010 192.168.2.77:50010 192.168.2.45:50010 Dead nodes:  192.168.2.45:50010 192.168.2.70:50010 192.168.2.77:50010. Will get new block locations from namenode and retry...
16/05/13 14:12:30 WARN DFSClient: DFS chooseDataNode: got # 1 IOException, will wait for 1999.7114150519237 msec.
16/05/13 14:12:30 WARN QueuedThreadPool: 1 threads could not be stopped
16/05/13 14:12:30 INFO ShutdownHookManager: Shutdown hook called
16/05/13 14:12:30 INFO DFSClient: Successfully connected to /192.168.2.70:50010 for BP-334845286-192.168.2.4-1418890858930:blk_1166565322_93633541
16/05/13 14:12:30 WARN ServletHandler:
javax.servlet.ServletException: java.util.concurrent.ExecutionException: java.io.IOException: Filesystem closed


Could the heap be too small, causing the crash? Check the history server's JVM memory:


#/usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 93962 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   0.00 100.00  99.97  99.68   1118   19.415  4701 3516.056 3535.471
  0.00   0.00 100.00  99.97  99.68   1118   19.415  4702 3517.133 3536.548

The key column is O (old generation): it is pinned above 99%, and FGC climbs by one every second, so the JVM is spending almost all its time in back-to-back full GCs (which also explains the 712% CPU in the top output below).
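Reading `jstat -gcutil` by eye gets tedious; the check above can be scripted. A minimal sketch, using the captured sample line from the broken server (the 99% threshold is an arbitrary choice for illustration):

```shell
#!/bin/sh
# Parse one captured `jstat -gcutil` data row and flag old-gen pressure.
# Column order: S0 S1 E O P YGC YGCT FGC FGCT GCT
sample='  0.00   0.00 100.00  99.97  99.68   1118   19.415  4701 3516.056 3535.471'
echo "$sample" | awk '{
  # $4 is the O (old generation) column, in percent; $8 is the full GC count
  if ($4 + 0 >= 99.0)
    printf "old gen at %s%% after %s full GCs - heap exhausted\n", $4, $8
}'
```

In practice you would pipe live `jstat -gcutil <pid> 1000` output through the same awk filter instead of a captured line.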

#top -p 93962
top - 16:33:17 up 277 days,  6:45,  1 user,  load average: 34.29, 27.25, 19.73
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 96.9%us,  1.9%sy,  0.0%ni,  1.1%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  132103952k total, 123735940k used,  8368012k free,   364688k buffers
Swap:  8191992k total,    28440k used,  8163552k free, 61424260k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                             
 93962 root      20   0 6771m 1.2g  22m S 712.4  1.0 925:02.61 java  

# jps -v |grep 93962
93962 HistoryServer -Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms1g -Xmx1g -XX:MaxPermSize=256m


Edit spark-env.sh, adding -Xms4096m -Xmx4096m to SPARK_HISTORY_OPTS.

After the change (excerpt):
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Xms4096m -Xmx4096m -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history"


Restart:

#/usr/local/spark/sbin/stop-history-server.sh
#/usr/local/spark/sbin/start-history-server.sh


#/usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 235969 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   6.25   2.01   99.07  98.55    279    2.039     0    0.000    2.039
 12.50   0.00  10.06   99.07  98.55    280    2.043     0    0.000    2.043

#jps -v |grep 235969
235969 HistoryServer -Dspark.history.ui.port=18080 -Xms4g -Xmx4g -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms1g -Xmx1g -XX:MaxPermSize=256m


#top -p 235969
top - 16:48:38 up 277 days,  7:00,  1 user,  load average: 27.02, 31.59, 29.16
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 98.0%us,  1.9%sy,  0.0%ni,  0.1%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  132103952k total, 123061968k used,  9041984k free,   367248k buffers
Swap:  8191992k total,    28440k used,  8163552k free, 61682036k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                             
235969 root      20   0 5015m 478m  22m S 94.2  0.4   4:50.71 java                                 


The setting did not take effect: both -Xms4g -Xmx4g and -Xms1g -Xmx1g are present on the command line, and since HotSpot honors the last occurrence of a duplicated flag, the heap is still capped at 1g.

Check the official docs for the relevant configuration options:
Environment Variable    Meaning
SPARK_DAEMON_MEMORY     Memory to allocate to the history server (default: 1g).
SPARK_DAEMON_JAVA_OPTS     JVM options for the history server (default: none).
SPARK_PUBLIC_DNS     The public address for the history server. If this is not set, links to application history may use the internal address of the server, resulting in broken links (default: none).
SPARK_HISTORY_OPTS     spark.history.* configuration options for the history server (default: none).

So the right knob is SPARK_DAEMON_MEMORY.
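Why SPARK_DAEMON_MEMORY works where -Xmx in SPARK_HISTORY_OPTS does not: the daemon launcher derives the heap flags from SPARK_DAEMON_MEMORY (default 1g) and places them after the user's options. A simplified sketch of that behavior (an assumption for illustration, not the actual bin/spark-class code):

```shell
#!/bin/sh
# Sketch: how the launcher assembles the HistoryServer command line.
# Heap flags derived from SPARK_DAEMON_MEMORY come AFTER SPARK_HISTORY_OPTS,
# so any -Xmx placed in SPARK_HISTORY_OPTS loses to the later flag.
build_cmd() {
  mem="${SPARK_DAEMON_MEMORY:-1g}"   # daemon heap, default 1g
  echo "java $SPARK_HISTORY_OPTS -Xms$mem -Xmx$mem org.apache.spark.deploy.history.HistoryServer"
}

# Failed attempt: -Xmx in SPARK_HISTORY_OPTS, later -Xmx1g wins
SPARK_HISTORY_OPTS="-Xms4096m -Xmx4096m"
build_cmd

# Correct attempt: set SPARK_DAEMON_MEMORY, the trailing flags are now 4g
SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080"
SPARK_DAEMON_MEMORY=4096m
build_cmd
```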

Add the setting (2048m to start with; the final config below ended up at 4096m):

SPARK_DAEMON_MEMORY=2048m


After the change:

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://geocloudcluster:8020/spark-history"
export SPARK_DAEMON_MEMORY=4096m



Other settings, not used here (untested; note that SPARK_DAEMON_JAVA_OPTS expects JVM flags rather than a bare size, so the first line as written would not be valid):
SPARK_DAEMON_JAVA_OPTS=8128m
SPARK_WORKER_MEMORY=2048m
SPARK_REPL_OPTS=-XX:MaxPermSize=2048m


Repeat the restart steps above. Things look much healthier now, and the heap can be increased further if needed:
[root@dn12 conf]# /usr/jdk64/jdk1.7.0_67/bin/jstat -gcutil 104645 1000
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  6.25   0.00  27.74  67.34  99.48    414    3.196     0    0.000    3.196
  6.25   0.00  27.74  67.34  99.48    414    3.196     0    0.000    3.196



# jps -v |grep 104645
104645 HistoryServer -Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=20 -Dspark.history.fs.logDirectory=hdfs://cluster:8020/spark-history -Xms4096m -Xmx4096m -XX:MaxPermSize=256m
The parameters have taken effect.
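The "did it take effect" check can itself be automated: after a restart there should be exactly one -Xmx flag on the command line. A small sketch over the captured `jps -v` lines from before and after the fix:

```shell
#!/bin/sh
# Count -Xmx occurrences in a `jps -v` line; 1 means the heap setting is unambiguous.
count_xmx() { printf '%s\n' "$1" | tr ' ' '\n' | grep -c '^-Xmx'; }

before='235969 HistoryServer -Xms4g -Xmx4g -Dspark.history.ui.port=18080 -Xms1g -Xmx1g -XX:MaxPermSize=256m'
after='104645 HistoryServer -Dspark.history.ui.port=18080 -Xms4096m -Xmx4096m -XX:MaxPermSize=256m'

echo "before: $(count_xmx "$before") -Xmx flags"   # two flags, the last one wins
echo "after:  $(count_xmx "$after") -Xmx flags"    # one flag, setting is in effect
```

On a live box the input would come from `jps -v | grep HistoryServer` instead of pasted lines.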