Flume agent re-sending events repeatedly (sudden, inconsistent data growth)

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 00:20

I have been using Flume for data collection for quite a while without ever seeing data discrepancies, but today, while importing data, the counts suddenly exploded. The original data file contains only a few hundred thousand records, yet several million events were collected, and the number kept growing. Very strange. At first I suspected a plugin update, but after switching the Collector's sink back to the plain console, the problem persisted. Clearing the configuration data and restarting did not solve it either.

Comparing the collected data against the original data, I found events with extremely large bodies, caused by a bug on the business side. So oversized events were the likely culprit. I then looked at the source code and noticed something:

public class SyslogUdpSource extends EventSource.Base {
  static final Logger LOG = LoggerFactory.getLogger(SyslogUdpSource.class);
  final public static int SYSLOG_UDP_PORT = 514;
  int port = SYSLOG_UDP_PORT; // default udp syslog port
  int maxsize = 1 << 16; // 64k is max allowable in RFC 5426
  long rejects = 0;
  DatagramSocket sock;

  public SyslogUdpSource() {
  }

  public SyslogUdpSource(int port) {
    this.port = port;
  }

  @Override
  public void close() throws IOException {
    LOG.info("closing SyslogUdpSource on port " + port);
    if (sock == null) {
      LOG.warn("double close of SyslogUdpSocket on udp:" + port
          + " , (this is ok but odd)");
      return;
    }
    sock.close();
  }

  @Override
  public Event next() throws IOException {
    byte[] buf = new byte[maxsize];
    DatagramPacket pkt = new DatagramPacket(buf, maxsize);
    Event e = null;
    do { // loop until we get a valid packet, drop bad ones.
      sock.receive(pkt);
      ByteBuffer bb = ByteBuffer.wrap(buf, 0, pkt.getLength());
      ByteBufferInputStream bbis = new ByteBufferInputStream(bb);
      DataInputStream in = new DataInputStream(bbis);
      try {
        e = SyslogWireExtractor.extractEvent(in);
      } catch (EventExtractException ex) {
        rejects++;
        LOG.warn(rejects + " rejected packets. packet: " + pkt, ex);
        LOG.debug("raw bytes " + Arrays.toString(pkt.getData()));
        // TODO (jon) maybe have a hook here to do something with rejects
      }
      // need a sane way to fall out of this loop.
    } while (e == null);
    updateEventProcessingStats(e);
    return e;
  }

  @Override
  public void open() throws IOException {
    sock = new DatagramSocket(port);
  }

  public static SourceBuilder builder() {
    return new SourceBuilder() {
      @Override
      public EventSource build(Context ctx, String... argv) {
        int port = SYSLOG_UDP_PORT; // default udp port, need root permissions for this.
        if (argv.length > 1) {
          throw new IllegalArgumentException("usage: syslogUdp([port no]) ");
        }
        if (argv.length == 1) {
          port = Integer.parseInt(argv[0]);
        }
        return new SyslogUdpSource(port);
      }
    };
  }
}
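As a side note on why the 64 KiB buffer matters: this is standard `java.net` behavior, not Flume code, but `DatagramSocket.receive` silently truncates any datagram larger than the receive buffer, so an oversized event can never arrive intact through a fixed-size buffer like `maxsize` above. A minimal sketch (with a deliberately tiny 10-byte buffer standing in for `maxsize`):

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class UdpTruncationDemo {
    // Send a 100-byte datagram to ourselves and receive it into a 10-byte
    // buffer; returns the number of bytes actually delivered.
    static int receivedLength() throws Exception {
        try (DatagramSocket receiver = new DatagramSocket(0);
             DatagramSocket sender = new DatagramSocket()) {
            byte[] payload = new byte[100]; // pretend this is an oversized event
            sender.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getLoopbackAddress(), receiver.getLocalPort()));

            byte[] buf = new byte[10]; // small buffer standing in for maxsize
            DatagramPacket pkt = new DatagramPacket(buf, buf.length);
            receiver.receive(pkt);
            return pkt.getLength(); // bytes beyond the buffer are dropped
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receivedLength());
    }
}
```

The receiver gets only 10 of the 100 bytes; the rest is discarded without any error, which is exactly why a too-large event cannot survive this path.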

"64k is max allowable in RFC 5426" (the comment in the source marking the RFC 5426 limit on event size)
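For reference, the `maxsize` expression in the source works out to exactly 64 KiB:

```java
public class Rfc5426Limit {
    public static void main(String[] args) {
        int maxsize = 1 << 16; // the same expression used in SyslogUdpSource
        System.out.println(maxsize);        // bytes
        System.out.println(maxsize / 1024); // KiB
    }
}
```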

 

After digging through various resources online, it turns out that when an event exceeds 64 KB, the Flume agent re-sends that event repeatedly, which causes the data explosion (the detailed root cause remains to be investigated when I have time). So the fix is to keep event sizes under control on the business side.
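One way to control event size on the producing side is to cap every payload before it is handed to Flume. The class and method names below (`EventSizeGuard`, `capSize`) are hypothetical, and note that naive byte truncation can split a multi-byte UTF-8 character at the cut point; splitting one logical record into several events, or rejecting oversized records outright, are the safer alternatives:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical business-side guard: never let a payload exceed the
// 64 KiB syslog/UDP limit seen in SyslogUdpSource.
public class EventSizeGuard {
    static final int MAX_EVENT_BYTES = 1 << 16; // mirrors maxsize in SyslogUdpSource

    // Truncate the payload if it is over the limit; return it unchanged otherwise.
    public static byte[] capSize(byte[] payload) {
        if (payload.length <= MAX_EVENT_BYTES) {
            return payload;
        }
        byte[] capped = new byte[MAX_EVENT_BYTES];
        System.arraycopy(payload, 0, capped, 0, MAX_EVENT_BYTES);
        return capped;
    }

    public static void main(String[] args) {
        byte[] small = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] big = new byte[100_000]; // simulates the oversized events from the bug
        System.out.println(capSize(small).length); // 5
        System.out.println(capSize(big).length);   // 65536
    }
}
```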
