Hadoop Ops Notes: FlumeNG Times Out Writing Data to Hadoop


Recently, in our test environment, we moved the NameNode to a different machine and added some DataNodes.

While FlumeNG was writing data to the Hadoop cluster, some nodes reported errors similar to the following:

2013-04-24 13:25:50,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 0 for block blk_7250572864082551945_1009 terminating
2013-04-24 13:25:50,180 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_7250572864082551945_1009 received exception java.io.EOFException: while trying to read 65557 bytes
2013-04-24 13:25:50,180 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.64:50010, storageID=DS-618726863-192.168.1.64-50010-1366713546117, infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException: while trying to read 65557 bytes
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:268)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
        at java.lang.Thread.run(Thread.java:662)
2013-04-24 13:32:38,796 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting asynchronous block report scan
2013-04-24 13:32:38,797 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished asynchronous block report scan in 0ms

At the same time, the Hadoop client (the FlumeNG HDFS sink) reported the following error:

24 Apr 2013 16:49:44,238 INFO  [hdfs-hdfs_sink1-call-runner-0] (org.apache.flume.sink.hdfs.BucketWriter.doOpen:208)  - Creating hdfs://wxlab41:9100/bet/13-04-24/1640/00/FlumeData.1366793384056.tmp
24 Apr 2013 16:50:56,401 WARN  [ResponseProcessor for block blk_3082549089740838379_1022] (org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run:2631)  - DFSOutputStream ResponseProcessor exception  for block blk_3082549089740838379_1022java.net.SocketTimeoutException: 69000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.31:21709 remote=/192.168.1.58:50010]
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.DataInputStream.readFully(DataInputStream.java:178)
        at java.io.DataInputStream.readLong(DataInputStream.java:399)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:124)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2587)
The DataNode-side EOFException is essentially the mirror image of this client-side timeout: once the client's ResponseProcessor gives up waiting for a pipeline ack, it closes the connection, and the DataNode hits end-of-stream while reading the next packet. A detailed analysis of this problem can be found at
http://blog.csdn.net/zhaokunwu/article/details/7336892

The DFS client's socket is timing out, so the client-side dfs.socket.timeout parameter needs to be raised. (The 69000 ms in the stack trace is the 60-second default for dfs.socket.timeout plus a small per-DataNode extension for the three DataNodes in the write pipeline.)
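For a client whose code you control, raising these timeouts is just a matter of setting them on the Configuration before opening the FileSystem. Below is a minimal, self-contained sketch against the Hadoop 1.x API (the class name, target path, and one-hour timeout values are illustrative, and fs.default.name is assumed to come from core-site.xml on the classpath):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsTimeoutDemo {
    public static void main(String[] args) throws Exception {
      Configuration config = new Configuration();
      // Read timeout the DFSClient uses while waiting for pipeline acks
      // from DataNodes (default 60s plus a per-DataNode extension, which
      // is where the 69000 ms in the log above comes from).
      config.set("dfs.socket.timeout", "3600000");
      // Write timeout used when streaming packets to DataNodes
      // (default 8 minutes).
      config.set("dfs.datanode.socket.write.timeout", "3600000");

      FileSystem fs = FileSystem.get(config);
      FSDataOutputStream out = fs.create(new Path("/tmp/timeout-demo"));
      out.writeBytes("hello hdfs\n");
      out.close();
      fs.close();
    }
  }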

It turns out that FlumeNG 1.3.1 does not yet support setting client-side HDFS Configuration parameters from the sink configuration, so we solved the problem by modifying the source code.
 
Modify the doOpen() method of the org.apache.flume.sink.hdfs.BucketWriter class:

  private void doOpen() throws IOException {
    if ((filePath == null) || (writer == null) || (formatter == null)) {
      throw new IOException("Invalid file settings");
    }

    Configuration config = new Configuration();
    // disable FileSystem JVM shutdown hook
    config.setBoolean("fs.automatic.close", false);

    // add the following two lines: raise the HDFS client read and
    // write timeouts to one hour
    config.set("dfs.socket.timeout", "3600000");
    config.set("dfs.datanode.socket.write.timeout", "3600000");

    // ... the rest of doOpen() is unchanged
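One deployment note: since this is a source change, the flume-hdfs-sink module has to be rebuilt and the resulting jar swapped in under the agent's lib directory (assuming the standard Flume 1.3.1 binary layout), and the agent restarted, before the longer timeouts take effect.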