Reading the Spout Source Code in Storm Trident


    • (I) Overview
      • 1. Introduction
      • 2. Key classes
        • (1) Creating a Spout
        • (2) The spout's message flow
      • 3. Overall flow of spout invocation
      • 4. TSC and TSE
      • 5. How the spout is loaded into the topology
    • (II) Creating a Spout
      • 1. ITridentSpout
      • 2. BatchCoordinator
      • 3. Emitter
      • 4. An example
    • (III) The spout's actual message flow
      • 1. MasterBatchCoordinator
      • 2. TridentSpoutCoordinator
      • 3. TridentSpoutExecutor
    • (IV) Setting up the Spout in TridentTopologyBuilder
      • 1. TridentTopologyBuilder
      • 2. TridentTopology

(I) Overview

1. Introduction

Trident is a higher-level abstraction on top of Storm. Compared with raw Storm, it offers three main benefits:
(1) A higher-level API: common operations such as count and sum are packaged as methods that can be called directly, with no need to implement them yourself (see the sketch after this list).
(2) Batches instead of individual tuples: data is processed one batch at a time.
(3) Transaction support, guaranteeing that every piece of data is processed exactly once.
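
To make benefit (1) concrete, here is a minimal word-count sketch that uses Trident's built-in Count aggregation. It assumes the FixedBatchSpout and MemoryMapState test helpers that ship with Trident; the stream name and field names are illustrative, not from this article:

import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;

public class WordCountSketch {
    public static void main(String[] args) {
        // A test spout that cycles through a few canned single-field batches.
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 3,
                new Values("storm"), new Values("trident"), new Values("storm"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        topology.newStream("wordcount", spout)
                .groupBy(new Fields("word"))
                // Count is one of the prepackaged aggregations; no user implementation needed.
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
        // topology.build() would then be submitted via StormSubmitter or LocalCluster.
    }
}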

This article describes how a spout is created and invoked in a Trident topology. It first explains how a user creates a Spout and the principles behind it, then walks through the Spout's actual data flow, and finally shows how a Spout is wired in when the topology is built.

2. Key classes

MasterBatchCoordinator —————> ITridentSpout.BatchCoordinator#isReady
|
|
v
TridentSpoutCoordinator —————> ITridentSpout.BatchCoordinator#[initializeTransaction, success, close]
|
|
v
TridentSpoutExecutor —————> ITridentSpout.Emitter#[emitBatch, success, close]

Two groups of classes are involved. The first group defines how a user creates a Spout; that user code is then invoked by the second group. The second group defines how the actual data flow is initiated and propagated.

(1) Creating a Spout

Three types are involved: ITridentSpout, BatchCoordinator, and Emitter; the latter two are inner interfaces of the first.
To create a Spout, a user implements these three interfaces. For example, the Spout in storm-kafka implements these three interfaces (or sub-interfaces of them).

(2) The spout's message flow

This also involves three classes: MasterBatchCoordinator, TridentSpoutCoordinator, and TridentSpoutExecutor. Besides their own fixed logic, they call into the user code, i.e. the Spout classes introduced above.
Their declarations are:

MasterBatchCoordinator extends BaseRichSpout
TridentSpoutCoordinator implements IBasicBolt
TridentSpoutExecutor implements ITridentBatchBolt

As you can see, MasterBatchCoordinator is the only real spout; the other two are bolts.
MasterBatchCoordinator calls the isReady() method of the user-defined BatchCoordinator; if it returns true, it emits a tuple on the stream with id $batch. When TridentSpoutCoordinator receives the MBC's $batch stream, it calls BatchCoordinator#initializeTransaction() to initialize a transaction and re-emits on the $batch stream. When TridentSpoutExecutor receives the $batch stream, it calls the user's Emitter#emitBatch() method, which starts emitting the actual business data.
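
For reference, the three coordination stream ids used throughout this article are declared as constants in MasterBatchCoordinator:

// From storm.trident.topology.MasterBatchCoordinator:
public static final String BATCH_STREAM_ID = "$batch";
public static final String COMMIT_STREAM_ID = "$commit";
public static final String SUCCESS_STREAM_ID = "$success";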

3. Overall flow of spout invocation

(1) MasterBatchCoordinator is the real Spout in Trident; it can drive multiple TridentSpoutCoordinator nodes. The MBC emits a stream with id $batch, the starting point of the whole data flow.

if(!_activeTx.containsKey(curr) && isReady(curr)) {
    ..........
    _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
    ..........
}

(2) When the whole batch has been processed successfully, the MBC's ack() method is called; it changes the transaction's status from PROCESSING to PROCESSED:

if(status.status == AttemptStatus.PROCESSING) {
    status.status = AttemptStatus.PROCESSED;
}

If the batch fails instead, the fail() method is called.
When sync() sees the transaction status PROCESSED, it changes it to COMMITTING and emits a stream with id $commit.

if(maybeCommit != null && maybeCommit.status == AttemptStatus.PROCESSED) {
    maybeCommit.status = AttemptStatus.COMMITTING;
    _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
}

(3) When the $commit stream has been fully processed, the MBC's ack method is called again, and the $success stream is emitted:

else if(status.status == AttemptStatus.COMMITTING) {
    // If the current status is COMMITTING, remove the transaction from
    // _activeTx and _attemptIds and emit the $success stream.
    _activeTx.remove(tx.getTransactionId());
    _attemptIds.remove(tx.getTransactionId());
    _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
    _currTransaction = nextTransactionId(tx.getTransactionId());
    for(TransactionalState state: _states) {
        state.setData(CURRENT_TX, _currTransaction);
    }
}

4. TSC and TSE

From the analysis above, the MBC emits the $batch, $commit, and $success streams in turn.
TSC (TridentSpoutCoordinator) handles only the $batch and $success streams, while TSE (TridentSpoutExecutor) handles all three.

TSC handling the $success stream:

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
    _state.cleanupBefore(attempt.getTransactionId());
    _coord.success(attempt.getTransactionId());
}

This mainly calls the success method the user defined in the coordinator.

TSE handling the $commit and $success streams:

if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
    if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
        ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
        _activeBatches.remove(attempt.getTransactionId());
    } else {
        throw new FailedException("Received commit for different transaction attempt");
    }
} else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
    // valid to delete before what's been committed since
    // those batches will never be accessed again
    _activeBatches.headMap(attempt.getTransactionId()).clear();
    _emitter.success(attempt);
}

In summary, the message flow starts at MasterBatchCoordinator, which is the one real spout, while TridentSpoutCoordinator and TridentSpoutExecutor are both bolts. MasterBatchCoordinator initiates the coordination messages, and the end result is that TridentSpoutExecutor emits the business messages. Both the coordination and the business messages are produced by calling the code the user wrote in the Spout's BatchCoordinator and Emitter.

See also the flow chart on page 458 of the book Storm源码分析 (Storm Source Code Analysis).

5. How the spout is loaded into the topology

(1) TridentTopologyBuilder's buildTopology method sets up the topology-level wiring.
(2) TridentTopology's newStream method adds the spout node to the topology.

(II) Creating a Spout

1. ITridentSpout

In Trident, a user-defined Spout implements the ITridentSpout interface. Let's look at its definition first:

package storm.trident.spout;

import backtype.storm.task.TopologyContext;
import storm.trident.topology.TransactionAttempt;
import backtype.storm.tuple.Fields;
import java.io.Serializable;
import java.util.Map;
import storm.trident.operation.TridentCollector;

public interface ITridentSpout<T> extends Serializable {
    public interface BatchCoordinator<X> {
        X initializeTransaction(long txid, X prevMetadata, X currMetadata);
        void success(long txid);
        boolean isReady(long txid);
        void close();
    }

    public interface Emitter<X> {
        void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
        void success(TransactionAttempt tx);
        void close();
    }

    BatchCoordinator<T> getCoordinator(String txStateId, Map conf, TopologyContext context);
    Emitter<T> getEmitter(String txStateId, Map conf, TopologyContext context);
    Map getComponentConfiguration();
    Fields getOutputFields();
}

It has two inner interfaces, BatchCoordinator and Emitter: the interface for the coordinating side and the interface for the message-emitting side, respectively. The main work in implementing a Spout is implementing these two interfaces, i.e. creating the Coordinator and Emitter that do the actual work. The Spout provides two getter methods for specifying which Coordinator and Emitter classes to use; those classes are defined by the user. We will analyze the Coordinator and Emitter shortly.
In addition, getComponentConfiguration returns the component configuration and getOutputFields returns the output fields.

Now let's look at the code of the two inner interfaces.

2. BatchCoordinator

public interface BatchCoordinator<X> {
    X initializeTransaction(long txid, X prevMetadata, X currMetadata);
    void success(long txid);
    boolean isReady(long txid);
    void close();
}

(1) The initializeTransaction method returns user-defined transaction metadata. X is a user-chosen type for data associated with the transaction; the returned value is stored in ZooKeeper.
Here txid is the transaction sequence number. prevMetadata is the metadata of the previous transaction, or null if the current transaction is the first one. currMetadata is the metadata of the current transaction: null on the transaction's first attempt, otherwise the metadata produced by the previous attempt of the same transaction.
(2) The isReady method checks whether the data for a transaction is ready; true means a new transaction can start. Its argument is the current transaction id.
The methods of a BatchCoordinator run on different nodes: isReady executes inside the real Spout (MasterBatchCoordinator), while the other methods execute inside TridentSpoutCoordinator. A sketch of a coordinator that puts this metadata to use follows.
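
As an illustration of how the metadata parameters can be used, here is a hypothetical coordinator whose metadata is a start offset into some external log; the class name, the BATCH_SIZE constant, and the offset scheme are all invented for this sketch:

import java.io.Serializable;
import storm.trident.spout.ITridentSpout.BatchCoordinator;

// Hypothetical sketch: each transaction covers a fixed-size offset range.
public class OffsetRangeCoordinator implements BatchCoordinator<Long>, Serializable {
    private static final long BATCH_SIZE = 1000; // invented constant, for illustration only

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        // On a replay (currMetadata != null) reuse the stored metadata so the batch
        // covers exactly the same range; otherwise continue after the previous range.
        if (currMetadata != null) {
            return currMetadata;
        }
        return (prevMetadata == null) ? 0L : prevMetadata + BATCH_SIZE;
    }

    @Override
    public boolean isReady(long txid) {
        return true;
    }

    @Override
    public void success(long txid) {
    }

    @Override
    public void close() {
    }
}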

3. Emitter

public interface Emitter<X> {
    void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
    void success(TransactionAttempt tx);
    void close();
}

The message-emitting node receives the coordinating spout's $batch and $success streams.
(1) On receiving a $batch message, the node calls emitBatch to emit its messages.
(2) On receiving a $success message, it calls success to post-process the transaction.

4. An example

As an example, consider DiagnosisEventSpout.

(1) The Spout code

package com.packtpub.storm.trident.spout;

import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;
import storm.trident.spout.ITridentSpout;
import java.util.Map;

@SuppressWarnings("rawtypes")
public class DiagnosisEventSpout implements ITridentSpout<Long> {
    private static final long serialVersionUID = 1L;
    BatchCoordinator<Long> coordinator = new DefaultCoordinator();
    Emitter<Long> emitter = new DiagnosisEventEmitter();

    @Override
    public BatchCoordinator<Long> getCoordinator(String txStateId, Map conf, TopologyContext context) {
        return coordinator;
    }

    @Override
    public Emitter<Long> getEmitter(String txStateId, Map conf, TopologyContext context) {
        return emitter;
    }

    @Override
    public Map getComponentConfiguration() {
        return null;
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("event");
    }
}

(2) The BatchCoordinator code

package com.packtpub.storm.trident.spout;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import storm.trident.spout.ITridentSpout.BatchCoordinator;
import java.io.Serializable;

public class DefaultCoordinator implements BatchCoordinator<Long>, Serializable {
    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(DefaultCoordinator.class);

    @Override
    public boolean isReady(long txid) {
        return true;
    }

    @Override
    public void close() {
    }

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        LOG.info("Initializing Transaction [" + txid + "]");
        return null;
    }

    @Override
    public void success(long txid) {
        LOG.info("Successful Transaction [" + txid + "]");
    }
}

(3) The Emitter code

package com.packtpub.storm.trident.spout;

import com.packtpub.storm.trident.model.DiagnosisEvent;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.ITridentSpout.Emitter;
import storm.trident.topology.TransactionAttempt;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DiagnosisEventEmitter implements Emitter<Long>, Serializable {
    private static final long serialVersionUID = 1L;
    AtomicInteger successfulTransactions = new AtomicInteger(0);

    @Override
    public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
        for (int i = 0; i < 10000; i++) {
            List<Object> events = new ArrayList<Object>();
            double lat = new Double(-30 + (int) (Math.random() * 75));
            double lng = new Double(-120 + (int) (Math.random() * 70));
            long time = System.currentTimeMillis();
            String diag = new Integer(320 + (int) (Math.random() * 7)).toString();
            DiagnosisEvent event = new DiagnosisEvent(lat, lng, time, diag);
            events.add(event);
            collector.emit(events);
        }
    }

    @Override
    public void success(TransactionAttempt tx) {
        successfulTransactions.incrementAndGet();
    }

    @Override
    public void close() {
    }
}
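
One caveat worth noting in this example: DefaultCoordinator#initializeTransaction returns null and DiagnosisEventEmitter generates random data, so a replayed batch will not contain the same tuples as the original attempt. That is fine for a demo, but a spout that needs Trident's exactly-once guarantee should derive each batch deterministically from the coordinator metadata, as in the OffsetRangeCoordinator sketch above.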

(4) Finally, pass the spout in when creating the topology:

TridentTopology topology = new TridentTopology();
DiagnosisEventSpout spout = new DiagnosisEventSpout();
Stream inputStream = topology.newStream("event", spout);
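
Note that the first argument to newStream ("event" here) is the txId/txStateId later passed to getCoordinator and getEmitter; it identifies the node under which this spout's transactional state is kept in ZooKeeper, so it should stay stable across redeployments of the topology.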

(III) The spout's actual message flow

The sections above show how to create a Spout in user code and the principles behind it. But after a Spout is created, how is it wired into the topology's real spout? Let's continue into the Trident implementation.

1. MasterBatchCoordinator

Overall, MasterBatchCoordinator is the true starting point of the data flow:
* First, open performs initialization: reading the transaction id the previous run of the topology had reached, the maximum number of tuples in flight, the attempt count of each transaction, and so on.
* Then nextTuple advances transaction states, or creates new transactions and emits the $batch stream.
* Finally, ack either emits the $commit stream, depending on the transaction's state, or calls sync again to start creating new transactions.

In short, MasterBatchCoordinator, as the true origin of the topology's data flow, keeps the data moving by emitting coordination messages in a loop. Its real role is to originate coordination messages; all of its maps, such as _activeTx and _attemptIds, merely track what is currently in flight.

(1) MasterBatchCoordinator is a real spout

  public class MasterBatchCoordinator extends BaseRichSpout 

The real logic of a Trident topology starts in MasterBatchCoordinator: open is called first for initialization, and then nextTuple emits the $batch and $commit streams.
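
The per-transaction state that open, nextTuple, and ack manipulate is a small enum inside MasterBatchCoordinator; the comments below (added here, not in the source) summarize the transitions described in the rest of this section:

// From MasterBatchCoordinator; lifecycle comments added for this walkthrough.
private static enum AttemptStatus {
    PROCESSING, // $batch emitted, waiting for the whole batch to be acked
    PROCESSED,  // batch fully acked; sync() promotes it once it becomes _currTransaction
    COMMITTING  // $commit emitted; the next ack removes the tx and emits $success
}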

(2) The open method:

@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    _throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
    for(String spoutId: _managedSpoutIds) {
        // Each MasterBatchCoordinator can manage several ITridentSpouts; the metadata
        // for each spout goes into the _states list. We'll see later what is put there.
        _states.add(TransactionalState.newCoordinatorState(conf, spoutId));
    }
    // Fetch the current transaction id from ZooKeeper. When the topology restarts it must
    // recover its previous state; i.e. ZooKeeper stores the next transaction id to commit,
    // not the last committed one.
    _currTransaction = getStoredCurrTransaction();
    _collector = collector;
    // The maximum number of tuples a spout task may have in flight at any moment,
    // i.e. emitted but not yet acked.
    Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
    if(active==null) {
        _maxTransactionActive = 1;
    } else {
        _maxTransactionActive = active.intValue();
    }
    // The current attempt number of each active transaction id.
    _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);
    for(int i=0; i<_spouts.size(); i++) {
        // Store each Spout's Coordinator in the _coordinators list.
        String txId = _managedSpoutIds.get(i);
        _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
    }
}

(3) Next, the nextTuple() method. It only calls sync(), which mainly does the following:
* If a transaction's status is PROCESSED, change it to COMMITTING and emit the $commit stream. The node receiving the $commit stream calls finishBatch to commit and post-process the transaction.
* If _activeTx.size() is below _maxTransactionActive, create new transactions, put them in _activeTx, and emit the $batch stream for the Coordinator to process. (When ack is called, the transaction is removed from _activeTx again.)
Note: the transactions active at any moment are those with ids in [_currTransaction, _currTransaction + _maxTransactionActive - 1]; for example, with _currTransaction = 5 and _maxTransactionActive = 3, only transactions 5, 6, and 7 can be in flight.

private void sync() {
    // note that sometimes the tuples active may be less than max_spout_pending, e.g.
    // max_spout_pending = 3
    // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
    // and there won't be a batch for tx 4 because there's max_spout_pending tx active

    // Check whether the current transaction _currTransaction is in the PROCESSED state;
    // if so, change it to COMMITTING and emit the $commit stream. The node receiving the
    // $commit stream calls finishBatch to commit and post-process the transaction.
    TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
    if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
        maybeCommit.status = AttemptStatus.COMMITTING;
        _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
    }

    // Create new transactions. At most _maxTransactionActive transactions run concurrently,
    // with active ids in [_currTransaction, _currTransaction + _maxTransactionActive - 1].
    // A new transaction is only initialized after the current one finishes, so the number
    // actually active may be less than _maxTransactionActive.
    if(_active) {
        if(_activeTx.size() < _maxTransactionActive) {
            Long curr = _currTransaction;
            // Try to create up to _maxTransactionActive transactions.
            for(int i=0; i<_maxTransactionActive; i++) {
                // If the transaction id is not in _activeTx, create a new transaction and
                // emit the $batch stream. When ack is called, the id is removed again
                // (see the ack method below).
                if(!_activeTx.containsKey(curr) && isReady(curr)) {
                    // by using a monotonically increasing attempt id, downstream tasks
                    // can be memory efficient by clearing out state for old attempts
                    // as soon as they see a higher attempt id for a transaction
                    Integer attemptId = _attemptIds.get(curr);
                    if(attemptId==null) {
                        attemptId = 0;
                    } else {
                        attemptId++;
                    }
                    // _activeTx maps transaction ids to transaction statuses, while
                    // _attemptIds maps transaction ids to attempt counts.
                    _attemptIds.put(curr, attemptId);
                    for(TransactionalState state: _states) {
                        state.setData(CURRENT_ATTEMPTS, _attemptIds);
                    }
                    // A TransactionAttempt holds a transaction id plus an attempt id,
                    // identifying one concrete attempt of a transaction.
                    TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
                    _activeTx.put(curr, new TransactionStatus(attempt));
                    _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
                    _throttler.markEvent();
                }
                // If the id is already in _activeTx, advance curr and check the next one.
                curr = nextTransactionId(curr);
            }
        }
    }
}

See the end of the article for the complete code.

(4) Moving on, the ack method:

@Override
public void ack(Object msgId) {
    // Fetch the status of the acked transaction.
    TransactionAttempt tx = (TransactionAttempt) msgId;
    TransactionStatus status = _activeTx.get(tx.getTransactionId());
    if(status!=null && tx.equals(status.attempt)) {
        // If the current status is PROCESSING, change it to PROCESSED.
        if(status.status==AttemptStatus.PROCESSING) {
            status.status = AttemptStatus.PROCESSED;
        } else if(status.status==AttemptStatus.COMMITTING) {
            // If the current status is COMMITTING, remove the transaction from
            // _activeTx and _attemptIds and emit the $success stream.
            _activeTx.remove(tx.getTransactionId());
            _attemptIds.remove(tx.getTransactionId());
            _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
            _currTransaction = nextTransactionId(tx.getTransactionId());
            for(TransactionalState state: _states) {
                state.setData(CURRENT_TX, _currTransaction);
            }
        }
        // Some transaction states have changed, so call sync() again to continue
        // processing or to emit new tuples.
        sync();
    }
}

(5) There are also the fail and declareOutputFields methods. Note that fail not only removes the failed transaction but clears all later in-flight transactions (the tailMap), so they will be recreated and retried by sync().

@Override
public void fail(Object msgId) {
    TransactionAttempt tx = (TransactionAttempt) msgId;
    TransactionStatus stored = _activeTx.remove(tx.getTransactionId());
    if(stored!=null && tx.equals(stored.attempt)) {
        _activeTx.tailMap(tx.getTransactionId()).clear();
        sync();
    }
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // in partitioned example, in case an emitter task receives a later transaction than it's emitted so far,
    // when it sees the earlier txid it should know to emit nothing
    declarer.declareStream(BATCH_STREAM_ID, new Fields("tx"));
    declarer.declareStream(COMMIT_STREAM_ID, new Fields("tx"));
    declarer.declareStream(SUCCESS_STREAM_ID, new Fields("tx"));
}

2. TridentSpoutCoordinator

TridentSpoutCoordinator receives the $success and $batch streams from MasterBatchCoordinator and implements the real logic by calling user code. It also emits a $batch stream to TridentSpoutExecutor, triggering the latter to start emitting the actual business data.

(1) TridentSpoutCoordinator is a bolt

 public class TridentSpoutCoordinator implements IBasicBolt

(2) Constructing a TridentSpoutCoordinator requires an ITridentSpout object:

public TridentSpoutCoordinator(String id, ITridentSpout spout) {
    _spout = spout;
    _id = id;
}

This object is then used to obtain the user-defined Coordinator:

_coord = _spout.getCoordinator(_id, conf, context);

(3) _state and _underlyingState hold the metadata kept in ZooKeeper:

_underlyingState = TransactionalState.newCoordinatorState(conf, _id);
_state = new RotatingTransactionalState(_underlyingState, META_DIR);

(4) In the execute method, TridentSpoutCoordinator receives the $success and $batch streams. First, the $success stream:

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
    _state.cleanupBefore(attempt.getTransactionId());
    _coord.success(attempt.getTransactionId());
}

That is, when the $success stream arrives, the success method of the user-defined Coordinator is called, and the data in ZooKeeper is cleaned up as well.
(5) Then the $batch stream:

else {
    long txid = attempt.getTransactionId();
    Object prevMeta = _state.getPreviousState(txid);
    Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
    _state.overrideState(txid, meta);
    collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
}

When the $batch stream arrives, a transaction is initialized and emitted downstream. Because messages in Trident may be replayed, prevMeta is needed. Note that Trident initializes a transaction inside a bolt.
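
To see why this metadata matters for replays, consider a hypothetical emitter paired with the OffsetRangeCoordinator sketched in section (II): because the coordinator metadata is stored in ZooKeeper and handed back unchanged on a replay, the emitter can regenerate exactly the same batch. All names here are invented for illustration, and readRecordAt stands in for a real data source:

import backtype.storm.tuple.Values;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.ITridentSpout.Emitter;
import storm.trident.topology.TransactionAttempt;
import java.io.Serializable;

// Hypothetical sketch: the batch content is a pure function of the metadata,
// so replaying a transaction emits exactly the same tuples.
public class OffsetRangeEmitter implements Emitter<Long>, Serializable {
    private static final long BATCH_SIZE = 1000; // must match the coordinator's constant

    @Override
    public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
        long start = (coordinatorMeta == null) ? 0L : coordinatorMeta;
        for (long offset = start; offset < start + BATCH_SIZE; offset++) {
            collector.emit(new Values(readRecordAt(offset)));
        }
    }

    private Object readRecordAt(long offset) {
        return "record-" + offset; // placeholder for a real lookup by offset
    }

    @Override
    public void success(TransactionAttempt tx) {
    }

    @Override
    public void close() {
    }
}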

3. TridentSpoutExecutor

TridentSpoutExecutor handles three streams: $batch, received from TridentSpoutCoordinator, plus $commit and $success, wired directly from MasterBatchCoordinator (as the buildTopology code in section (IV) shows). $commit and $success trigger the emitter's commit and success methods respectively, while $batch triggers the emitter's emitBatch method, which starts emitting the business data.

(1) TridentSpoutExecutor is a bolt

public class TridentSpoutExecutor implements ITridentBatchBolt

(2) The core execute method:

@Override
public void execute(BatchInfo info, Tuple input) {
    // there won't be a BatchInfo for the success stream
    TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
    if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
        if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
            ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
            _activeBatches.remove(attempt.getTransactionId());
        } else {
            throw new FailedException("Received commit for different transaction attempt");
        }
    } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        // valid to delete before what's been committed since
        // those batches will never be accessed again
        _activeBatches.headMap(attempt.getTransactionId()).clear();
        _emitter.success(attempt);
    } else {
        _collector.setBatch(info.batchId);
        // Emit the business messages.
        _emitter.emitBatch(attempt, input.getValue(1), _collector);
        _activeBatches.put(attempt.getTransactionId(), attempt);
    }
}

(IV) Setting up the Spout in TridentTopologyBuilder

With the analysis above the Spout itself is ready, but how does it get loaded into the topology so the real data flow starts?
(1) TridentTopologyBuilder's buildTopology method sets up the topology-level wiring.
(2) TridentTopology's newStream method adds the spout node to the topology.

1. TridentTopologyBuilder

The first half of buildTopology in TridentTopologyBuilder sets up the Spout-related components, and the second half sets up the bolts. Here we only look at the spout-related part:

TopologyBuilder builder = new TopologyBuilder();

Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false);
Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true);

Map<String, List<String>> batchesToCommitIds = new HashMap<String, List<String>>();
Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<String, List<ITridentSpout>>();

for(String id: _spouts.keySet()) {
    TransactionalSpoutComponent c = _spouts.get(id);
    if(c.spout instanceof IRichSpout) {
        //TODO: wrap this to set the stream name
        builder.setSpout(id, (IRichSpout) c.spout, c.parallelism);
    } else {
        String batchGroup = c.batchGroupId;
        if(!batchesToCommitIds.containsKey(batchGroup)) {
            batchesToCommitIds.put(batchGroup, new ArrayList<String>());
        }
        batchesToCommitIds.get(batchGroup).add(c.commitStateId);

        if(!batchesToSpouts.containsKey(batchGroup)) {
            batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>());
        }
        batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout);

        BoltDeclarer scd =
              builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout))
                .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID)
                .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID);

        for(Map m: c.componentConfs) {
            scd.addConfigurations(m);
        }

        Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap();
        specs.put(c.batchGroupId, new CoordSpec());
        BoltDeclarer bd = builder.setBolt(id,
                new TridentBoltExecutor(
                  new TridentSpoutExecutor(
                    c.commitStateId,
                    c.streamName,
                    ((ITridentSpout) c.spout)),
                    batchIdsForSpouts,
                    specs),
                c.parallelism);
        bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID);
        bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID);
        if(c.spout instanceof ICommitterTridentSpout) {
            bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID);
        }
        for(Map m: c.componentConfs) {
            bd.addConfigurations(m);
        }
    }
}

for(String id: _batchPerTupleSpouts.keySet()) {
    SpoutComponent c = _batchPerTupleSpouts.get(id);
    SpoutDeclarer d = builder.setSpout(id, new RichSpoutBatchTriggerer((IRichSpout) c.spout, c.streamName, c.batchGroupId), c.parallelism);
    for(Map conf: c.componentConfs) {
        d.addConfigurations(conf);
    }
}

for(String batch: batchesToCommitIds.keySet()) {
    List<String> commitIds = batchesToCommitIds.get(batch);
    builder.setSpout(masterCoordinator(batch), new MasterBatchCoordinator(commitIds, batchesToSpouts.get(batch)));
}
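
Putting this wiring together, for one transactional spout the builder creates the following components (same notation as the diagram in section (I); the groupings are those set in the code above):

masterCoordinator(batchGroup) [MasterBatchCoordinator, the only real spout]
|
| $batch, $success (globalGrouping)
v
spoutCoordinator(id) [TridentSpoutCoordinator, bolt]
|
| $batch (allGrouping)
v
id [TridentBoltExecutor wrapping TridentSpoutExecutor, bolt]

In addition, the executor bolt receives $success directly from masterCoordinator(batchGroup) via allGrouping, plus $commit when the spout is an ICommitterTridentSpout.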

2. TridentTopology

newStream creates a spout node and adds it to the topology:

public Stream newStream(String txId, ITridentSpout spout) {
    Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
    return addNode(n);
}