HDFS block写报错

来源:互联网 发布:六级英语听力软件 编辑:程序博客网 时间:2024/06/02 04:12

客户端日志

17/08/08 00:26:35 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps17/08/08 00:29:13 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787849489java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/26.6.0.35:49436 remote=/26.6.0.35:50010]    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)    at java.io.FilterInputStream.read(FilterInputStream.java:83)    at java.io.FilterInputStream.read(FilterInputStream.java:83)    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2239)    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:235)

错误分析:

block 开始写入时间 00:27:58,在29:13 时,报75秒写入超时;
pipeline中后面两个datanode已经BR=BA,且已经向namenode ack,第一个datanode没有进行BA,后续block replica重新调整的时候,会删除第一个replica。

namenode 日志

2017-08-08 00:27:39,319 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar. BP-1824149417-26.6.0.28-1450834291847 blk_1861569948_787849489{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-959c2b78-f49e-44af-9b6f-9889082ad46d:NORMAL:26.6.0.35:50010|RBW], ReplicaUnderConstruction[[DISK]DS-e879f580-e428-40e6-906b-9d8cd00bf9c7:NORMAL:26.6.0.74:50010|RBW], ReplicaUnderConstruction[[DISK]DS-65f2839f-e294-4f56-a1b4-f3b2c17fa9ff:NORMAL:26.6.0.73:50010|RBW]]}2017-08-08 00:27:58,039 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.73:50010 is added to blk_1861569948_787849489{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-959c2b78-f49e-44af-9b6f-9889082ad46d:NORMAL:26.6.0.35:50010|RBW], ReplicaUnderConstruction[[DISK]DS-ce28452d-1ab5-437e-bb81-10639a186144:NORMAL:26.6.0.74:50010|RBW], ReplicaUnderConstruction[[DISK]DS-c4d0815e-baa7-4565-86b2-e313107ad046:NORMAL:26.6.0.73:50010|RBW]]} size 02017-08-08 00:27:58,039 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.74:50010 is added to blk_1861569948_787849489{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-959c2b78-f49e-44af-9b6f-9889082ad46d:NORMAL:26.6.0.35:50010|RBW], ReplicaUnderConstruction[[DISK]DS-ce28452d-1ab5-437e-bb81-10639a186144:NORMAL:26.6.0.74:50010|RBW], ReplicaUnderConstruction[[DISK]DS-c4d0815e-baa7-4565-86b2-e313107ad046:NORMAL:26.6.0.73:50010|RBW]]} size 02017-08-08 00:29:13,185 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787849489, newGenerationStamp=787854121, newLength=2200892, newNodes=[26.6.0.74:50010, 26.6.0.73:50010], clientName=DFSClient_NONMAPREDUCE_-1205074322_1)-- 更新block id及block副本2017-08-08 00:29:13,185 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787849489) successfully to BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_7878541212017-08-08 00:29:15,658 INFO BlockStateChange: BLOCK* ask 26.6.0.73:50010 to replicate blk_1861569948_787854121 to datanode(s) 26.6.0.85:50010 26.6.0.89:50010 26.6.0.44:50010 26.6.0.49:50010 26.6.0.55:50010 26.6.0.70:50010 26.6.0.58:50010 26.6.0.86:500102017-08-08 00:29:16,552 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.85:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,556 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.89:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,557 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.44:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,559 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.49:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,560 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.55:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,560 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.58:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,560 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.70:50010 is added to blk_1861569948_787854121 size 22008922017-08-08 00:29:16,561 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: 26.6.0.86:50010 is added to blk_1861569948_787854121 size 2200892-- 删除失效的block副本2017-08-08 00:29:23,526 INFO BlockStateChange: BLOCK* invalidateBlock: blk_1861569948_787849489(stored=blk_1861569948_787854121) on 26.6.0.35:500102017-08-08 00:29:23,526 INFO BlockStateChange: BLOCK* InvalidateBlocks: add blk_1861569948_787849489 to 26.6.0.35:50010

DataNode 日志

2017-08-08 00:27:43,949 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787849489 src: /26.6.0.35:49436 dest: /26.6.0.35:50010收到终止写入的命令2017-08-08 00:29:19,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /26.6.0.35:49436, dest: /26.6.0.35:50010, bytes: 2200892, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-1205074322_1, offset: 0, srvID: 270c1a98-75e6-472d-b8a9-82e2cc465a6a, blockid: BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787849489, duration: 110038372017-08-08 00:29:19,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787849489, type=HAS_DOWNSTREAM_IN_PIPELINE terminating-- 异步磁盘管理服务删除失效block2017-08-08 00:29:36,955 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService: Deleted BP-1824149417-26.6.0.28-1450834291847 blk_1861569948_787849489 file /data/data10/dn/current/BP-1824149417-26.6.0.28-1450834291847/current/finalized/subdir245/subdir77/blk_1861569948

其他

异常指标数据:

dfs.datanode.HeartbeatsAvgTime

http://26.6.0.24/ganglia/graph.php?r=day&z=xlarge&c=pasc&h=cnsz033402&jr=&js=&event=hide&ts=0&v=1.1666666666666665&m=dfs.datanode.HeartbeatsAvgTime

查看最新的文件,block分布

sudo su - hdfs -c 'hadoop fsck /user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar  -files -blocks -locations -racks'FSCK started by hdfs (auth:SIMPLE) from /26.6.0.28 for path /user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar at Tue Aug 08 11:42:53 CST 2017/user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar 2200892 bytes, 1 block(s):  OK0. BP-1824149417-26.6.0.28-1450834291847:blk_1861569948_787854121 len=2200892 Live_repl=10 [/GL1-16A/26.6.0.74:50010, /GL1-16A/26.6.0.73:50010, /GL1-21A/26.6.0.85:50010, /GL1-21A/26.6.0.89:50010, /GL1-18A/26.6.0.44:50010, /GL1-19A/26.6.0.49:50010, /GL1-19A/26.6.0.55:50010, /GL1-20A/26.6.0.58:50010, /GL1-20A/26.6.0.70:50010, /GL1-22A/26.6.0.86:50010]2017-08-08 00:27:39,319 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: /user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar. BP-1824149417-26.6.0.28-1450834291847 blk_1861569948_787849489{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-959c2b78-f49e-44af-9b6f-9889082ad46d:NORMAL:26.6.0.35:50010|RBW], ReplicaUnderConstruction[[DISK]DS-e879f580-e428-40e6-906b-9d8cd00bf9c7:NORMAL:26.6.0.74:50010|RBW], ReplicaUnderConstruction[[DISK]DS-65f2839f-e294-4f56-a1b4-f3b2c17fa9ff:NORMAL:26.6.0.73:50010|RBW]]}2017-08-08 00:29:13,246 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar is closed by DFSClient_NONMAPREDUCE_-1205074322_12017-08-08 00:29:13,247 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Increasing replication from 3 to 10 for /user/dsp/.staging/job_1498626555117_1919461/libjars/kite-data-core.jar