Fixing the NullPointerException caused by concurrent use of Storm's OutputCollector

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 10:40

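I have been working with Storm recently, using the Apache 0.9.1 release. After we handed it to a business team, they reported an NPE with the following stack trace: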

java.lang.RuntimeException: java.lang.NullPointerException
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:84)
    at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:55)
    at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:56)
    at backtype.storm.disruptor$consume_loop_STAR_$fn__1596.invoke(disruptor.clj:67)
    at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
    at clojure.lang.AFn.run(AFn.java:24)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
    at backtype.storm.serialization.KryoTupleSerializer.serialize(KryoTupleSerializer.java:24)
    at backtype.storm.daemon.worker$mk_transfer_fn$fn__4126$fn__4130.invoke(worker.clj:99)
    at backtype.storm.util$fast_list_map.invoke(util.clj:771)
    at backtype.storm.daemon.worker$mk_transfer_fn$fn__4126.invoke(worker.clj:99)
    at backtype.storm.daemon.executor$start_batch_transfer__GT_worker_handler_BANG_$fn__3904.invoke(executor.clj:205)
    at backtype.storm.disruptor$clojure_handler$reify__1584.onEvent(disruptor.clj:43)
    at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:81)
    ... 6 more

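This is a bug deep inside Storm. A Google search roughly pinned it on concurrent use of OutputCollector. I then looked closely at the Bolt the business team had written, and it did indeed start several threads of its own to emit/ack data, even though the official documentation claims that a Bolt's OutputCollector is thread-safe: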
Its perfectly fine to launch new threads in bolts that do processing asynchronously. OutputCollector is thread-safe and can be called at any time.
(http://storm.incubator.apache.org/documentation/Concepts.html)

The fix is simple: synchronize the emit/ack/fail calls on any OutputCollector that is used from multiple threads, as follows:

synchronized (collector) {
    collector.ack(tuple);
}
// and
synchronized (collector) {
    // emit takes the anchor tuple plus the output values (Values, not Tuple)
    collector.emit(tuple, new Values(..));
}


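In testing, the NPE no longer appeared. Perhaps the thread safety the official documentation claims only covers Storm's own executor threads; multithreaded processing written by users still has problems, at least in 0.9.1.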
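
Rather than repeating a synchronized block at every call site, the same idea can be packaged as a wrapper that takes one lock around every call. The following is only a minimal sketch: the Collector interface, RecordingCollector and SynchronizedCollector are hypothetical stand-ins so that the example compiles without Storm on the classpath, and they are not Storm's API. In a real Bolt the delegate would be the OutputCollector passed to prepare(), and the wrapper only helps if every thread goes through it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the emit/ack/fail subset of a collector.
// It is NOT Storm's API; it only lets the sketch compile without Storm.
interface Collector {
    void emit(String anchor, List<Object> values);
    void ack(String tuple);
    void fail(String tuple);
}

// Deliberately not thread-safe: calls are recorded in a plain ArrayList.
class RecordingCollector implements Collector {
    final List<String> calls = new ArrayList<>();
    public void emit(String anchor, List<Object> values) { calls.add("emit:" + anchor); }
    public void ack(String tuple) { calls.add("ack:" + tuple); }
    public void fail(String tuple) { calls.add("fail:" + tuple); }
}

// Wrapper that funnels every call through one lock (the wrapper's monitor).
class SynchronizedCollector implements Collector {
    private final Collector delegate;
    SynchronizedCollector(Collector delegate) { this.delegate = delegate; }
    public synchronized void emit(String anchor, List<Object> values) { delegate.emit(anchor, values); }
    public synchronized void ack(String tuple) { delegate.ack(tuple); }
    public synchronized void fail(String tuple) { delegate.fail(tuple); }
}

class SynchronizedCollectorDemo {
    // Starts `threads` workers that each emit and ack `perThread` tuples,
    // then returns how many calls reached the underlying collector.
    static int run(int threads, int perThread) {
        RecordingCollector raw = new RecordingCollector();
        Collector safe = new SynchronizedCollector(raw);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    String tuple = "t" + id + "-" + i;
                    safe.emit(tuple, Arrays.asList((Object) i));
                    safe.ack(tuple);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        return raw.calls.size();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000));
    }
}
```

With 4 threads each doing 1000 emit/ack pairs, all 8000 calls reach the underlying collector one at a time. The sketch does not reproduce the Storm NPE itself; it only illustrates the locking pattern described above.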


