Hadoop Stories, Part 1


I. Background

1. Last week several new nodes were added to the production cluster. After running start-balancer, the rebalance progressed extremely slowly and had not finished after several days.

2. To make matters worse, the power supply line was cut by construction work on Saturday; the machine-room UPS held out for a few hours, and then the whole cluster went down.

3. After power was restored on Monday, the cluster was started again.

II. Problems

1. Symptoms

(1) Hadoop/HDFS: after startup, uploading files worked normally and no anomalies appeared in the logs.

(2) HBase: it started, but many tables' regions failed to load, and running hbase hbck reported a large number of inconsistencies (a sample check is shown below). Once HBase was up, uploading files to HDFS began to fail. HBase tables remained accessible, but access was abnormally slow.
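For reference, this is roughly how such a consistency check is run; the -details flag only adds per-region output, and nothing here is specific to this cluster:

# Report inconsistencies across all tables
hbase hbck
# Same check with a full per-region report
hbase hbck -details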

2. Resolution

(1) Ruled out hardware and server failures.

(2) Found during inspection that the clocks on some servers were not synchronized with the time server. We synchronized them manually once, then checked and re-configured the scheduled sync job (a sketch follows after this list).

(3) The key step: based on the obvious errors reported in the DataNode logs, we adjusted a parameter in hdfs-site.xml (see the config snippet in section IV). After restarting HDFS and HBase, everything returned to normal.
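A minimal sketch of the manual sync and the scheduled job, assuming ntpdate is installed and ntp1.example.com is a placeholder for the actual time server:

# One-off manual sync against the internal time server (hostname is a placeholder)
/usr/sbin/ntpdate ntp1.example.com

# List the current schedule, then make sure an hourly entry like this exists (via crontab -e):
crontab -l
# 0 * * * * /usr/sbin/ntpdate ntp1.example.com >/dev/null 2>&1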

III. Logs

1. Errors reported when uploading files to HDFS:
17/03/29 13:30:31 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:31 INFO hdfs.DFSClient: Abandoning BP-903121414-10.141.17.33-1461912427616:blk_1076230712_2574868
17/03/29 13:30:31 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[10.141.17.47:50010,DS-9cf11117-1b97-400e-87f7-0dd4aad6c266,DISK]
17/03/29 13:30:31 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 192.168.17.46:50010
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1363)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:31 INFO hdfs.DFSClient: Abandoning BP-903121414-10.141.17.33-1461912427616:blk_1076230713_2574869
17/03/29 13:30:31 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.17.46:50010,DS-91025977-762a-46b7-bdd9-be49b4873cb5,DISK]
17/03/29 13:30:32 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Got error, status message , ack with firstBadLink as 192.168.17.37:50010
    at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1363)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:32 INFO hdfs.DFSClient: Abandoning BP-903121414-10.141.17.33-1461912427616:blk_1076230715_2574871
17/03/29 13:30:32 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.17.37:50010,DS-ab386599-037c-44e1-929d-ca955d21dcf3,DISK]
17/03/29 13:30:32 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:32 INFO hdfs.DFSClient: Abandoning BP-903121414-10.141.17.33-1461912427616:blk_1076230716_2574872
17/03/29 13:30:32 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.17.42:50010,DS-8f01a8c7-beed-4609-98bb-529501104d90,DISK]
17/03/29 13:30:32 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:32 INFO hdfs.DFSClient: Abandoning BP-903121414-10.141.17.33-1461912427616:blk_1076230717_2574873
17/03/29 13:30:32 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.17.34:50010,DS-cce3cb5f-30d4-4e8c-97b6-b0104be9589e,DISK]
17/03/29 13:30:32 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.EOFException: Premature EOF: no length prefix available
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2282)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1346)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1266)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:32 INFO hdfs.DFSClient: Abandoning BP-903121414-10.141.17.33-1461912427616:blk_1076230718_2574874
17/03/29 13:30:32 INFO hdfs.DFSClient: Excluding datanode DatanodeInfoWithStorage[192.168.17.48:50010,DS-3ef57df8-b025-4d1e-8a10-271353590385,DISK]
17/03/29 13:30:32 WARN hdfs.DFSClient: DataStreamer Exception
java.io.IOException: Unable to create new block.
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1279)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
17/03/29 13:30:32 WARN hdfs.DFSClient: Could not get block locations. Source file "/user/hadoop/jhm/2016-01-01pmsln.txt._COPYING_" - Aborting...
put: Premature EOF: no length prefix available

2. DataNode warnings:
2017-03-30 08:58:21,378 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode35:50010:DataXceiverServer:
java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
        at java.lang.Thread.run(Thread.java:744)
2017-03-30 08:58:21,484 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode35:50010:DataXceiverServer:
java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
        at java.lang.Thread.run(Thread.java:744)
2017-03-30 08:58:21,494 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode35:50010:DataXceiverServer:
java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
        at java.lang.Thread.run(Thread.java:744)
2017-03-30 08:58:22,302 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode35:50010:DataXceiverServer:
java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:140)
        at java.lang.Thread.run(Thread.java:744)
2017-03-30 08:58:22,304 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: datanode35:50010:DataXceiverServer:
java.io.IOException: Xceiver count 8193 exceeds the limit of concurrent xcievers: 8192

IV. Lessons Learned

1. Troubleshooting cluster problems requires thorough log analysis, not only of the NameNode logs but also of the DataNode logs.

2. We raised dfs.datanode.max.transfer.threads (known as dfs.datanode.max.xcievers in older releases) in hdfs-site.xml from 8192 to 32768. If this limit is set too low, it directly throttles HBase's requests to the DataNodes. How high should it be? At roughly 1 MB of thread stack per xceiver, 8192 threads can consume about 8 GB of node memory and 32768 about 32 GB; my nodes have 384 GB of RAM and ample CPU, so the change has no adverse impact.
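A sketch of the corresponding hdfs-site.xml entry on every DataNode (on older clusters the deprecated key dfs.datanode.max.xcievers would be set the same way):

<!-- hdfs-site.xml: raise the limit on concurrent xceiver (block transfer) threads -->
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>32768</value>
</property>

This limit counts every concurrent block read or write stream on a DataNode; because HBase keeps many store files and WALs open, region-heavy clusters typically need values far above the default, and the change only takes effect after the DataNodes are restarted.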
