Reading the Spout Source Code in Storm Trident


    • (I) Overview
      • 1. Introduction
      • 2. Key classes
        • (1) Creating a Spout
        • (2) The spout's message flow
      • 3. Overall flow of spout invocation
      • 4. TSC and TSE
      • 5. How the spout is loaded into the topology
    • (II) Creating a Spout
      • 1. ITridentSpout
      • 2. BatchCoordinator
      • 3. Emitter
      • 4. An example
    • (III) The spout's actual message flow
      • 1. MasterBatchCoordinator
      • 2. TridentSpoutCoordinator
      • 3. TridentSpoutExecutor
    • (IV) Setting up the Spout in TridentTopologyBuilder
      • 1. TridentTopologyBuilder
      • 2. TridentTopology

(I) Overview

1. Introduction

Trident is a higher-level abstraction on top of Storm. Compared with raw Storm, it offers three main benefits:
(1) A higher-level API: common operations such as count and sum are packaged as methods that can be called directly, with no need to implement them yourself (see the sketch after this list).
(2) Batches instead of individual tuples: data is processed one batch at a time.
(3) Transaction support, guaranteeing that every piece of data is processed exactly once.
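
To make benefit (1) concrete, here is a minimal word-count sketch that uses Trident's built-in Count aggregation. It assumes the FixedBatchSpout and MemoryMapState test helpers that ship with Trident; the stream name and field names are illustrative, not from this article:

import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;

public class WordCountSketch {
    public static void main(String[] args) {
        // A test spout that cycles through a few canned single-field batches.
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("word"), 3,
                new Values("storm"), new Values("trident"), new Values("storm"));
        spout.setCycle(true);

        TridentTopology topology = new TridentTopology();
        topology.newStream("wordcount", spout)
                .groupBy(new Fields("word"))
                // Count is one of the prepackaged aggregations; no user implementation needed.
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"));
        // topology.build() would then be submitted via StormSubmitter or LocalCluster.
    }
}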

This article describes how a spout is created and invoked in a Trident topology. It first explains how a user creates a Spout and the principles behind it, then walks through the Spout's actual data flow, and finally shows how a Spout is wired in when the topology is built.

2. Key classes

MasterBatchCoordinator —————> ITridentSpout.BatchCoordinator#isReady
|
|
v
TridentSpoutCoordinator —————> ITridentSpout.BatchCoordinator#[initializeTransaction, success, close]
|
|
v
TridentSpoutExecutor —————> ITridentSpout.Emitter#[emitBatch, success, close]

Two groups of classes are involved. The first group defines how a user creates a Spout; that user code is then invoked by the second group. The second group defines how the actual data flow is initiated and propagated.

(1) Creating a Spout

Three types are involved: ITridentSpout, BatchCoordinator, and Emitter; the latter two are inner interfaces of the first.
To create a Spout, a user implements these three interfaces. For example, the Spout in storm-kafka implements these three interfaces (or sub-interfaces of them).

(2) The spout's message flow

This also involves three classes: MasterBatchCoordinator, TridentSpoutCoordinator, and TridentSpoutExecutor. Besides their own fixed logic, they call into the user code, i.e. the Spout classes introduced above.
Their declarations are:

MasterBatchCoordinator extends BaseRichSpout
TridentSpoutCoordinator implements IBasicBolt
TridentSpoutExecutor implements ITridentBatchBolt

As you can see, MasterBatchCoordinator is the only real spout; the other two are bolts.
MasterBatchCoordinator calls the isReady() method of the user-defined BatchCoordinator; if it returns true, it emits a tuple on the stream with id $batch. When TridentSpoutCoordinator receives the MBC's $batch stream, it calls BatchCoordinator#initializeTransaction() to initialize a transaction and re-emits on the $batch stream. When TridentSpoutExecutor receives the $batch stream, it calls the user's Emitter#emitBatch() method, which starts emitting the actual business data.
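
For reference, the three coordination stream ids used throughout this article are declared as constants in MasterBatchCoordinator:

// From storm.trident.topology.MasterBatchCoordinator:
public static final String BATCH_STREAM_ID = "$batch";
public static final String COMMIT_STREAM_ID = "$commit";
public static final String SUCCESS_STREAM_ID = "$success";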

3. Overall flow of spout invocation

(1) MasterBatchCoordinator is the real Spout in Trident; it can drive multiple TridentSpoutCoordinator nodes. The MBC emits a stream with id $batch, the starting point of the whole data flow.

if(!_activeTx.containsKey(curr) && isReady(curr)) {
    ..........
    _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
    ..........
}

(2) When the whole batch has been processed successfully, the MBC's ack() method is called; it changes the transaction's status from PROCESSING to PROCESSED:

if(status.status == AttemptStatus.PROCESSING) {
    status.status = AttemptStatus.PROCESSED;
}

If the batch fails instead, the fail() method is called.
When sync() sees the transaction status PROCESSED, it changes it to COMMITTING and emits a stream with id $commit.

if(maybeCommit != null && maybeCommit.status == AttemptStatus.PROCESSED) {
    maybeCommit.status = AttemptStatus.COMMITTING;
    _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
}

(3) When the $commit stream has been fully processed, the MBC's ack method is called again, and the $success stream is emitted:

else if(status.status == AttemptStatus.COMMITTING) {
    // If the current status is COMMITTING, remove the transaction from
    // _activeTx and _attemptIds and emit the $success stream.
    _activeTx.remove(tx.getTransactionId());
    _attemptIds.remove(tx.getTransactionId());
    _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
    _currTransaction = nextTransactionId(tx.getTransactionId());
    for(TransactionalState state: _states) {
        state.setData(CURRENT_TX, _currTransaction);
    }
}

4. TSC and TSE

From the analysis above, the MBC emits the $batch, $commit, and $success streams in turn.
TSC (TridentSpoutCoordinator) handles only the $batch and $success streams, while TSE (TridentSpoutExecutor) handles all three.

TSC handling the $success stream:

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
    _state.cleanupBefore(attempt.getTransactionId());
    _coord.success(attempt.getTransactionId());
}

This mainly calls the success method the user defined in the coordinator.

TSE handling the $commit and $success streams:

if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
    if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
        ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
        _activeBatches.remove(attempt.getTransactionId());
    } else {
        throw new FailedException("Received commit for different transaction attempt");
    }
} else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
    // valid to delete before what's been committed since
    // those batches will never be accessed again
    _activeBatches.headMap(attempt.getTransactionId()).clear();
    _emitter.success(attempt);
}

In summary, the message flow starts at MasterBatchCoordinator, which is the one real spout, while TridentSpoutCoordinator and TridentSpoutExecutor are both bolts. MasterBatchCoordinator initiates the coordination messages, and the end result is that TridentSpoutExecutor emits the business messages. Both the coordination and the business messages are produced by calling the code the user wrote in the Spout's BatchCoordinator and Emitter.

See also the flow chart on page 458 of the book Storm源码分析 (Storm Source Code Analysis).

5. How the spout is loaded into the topology

(1) TridentTopologyBuilder's buildTopology method sets up the topology-level wiring.
(2) TridentTopology's newStream method adds the spout node to the topology.

(II) Creating a Spout

1. ITridentSpout

In Trident, a user-defined Spout implements the ITridentSpout interface. Let's look at its definition first:

package storm.trident.spout;

import backtype.storm.task.TopologyContext;
import storm.trident.topology.TransactionAttempt;
import backtype.storm.tuple.Fields;
import java.io.Serializable;
import java.util.Map;
import storm.trident.operation.TridentCollector;

public interface ITridentSpout<T> extends Serializable {
    public interface BatchCoordinator<X> {
        X initializeTransaction(long txid, X prevMetadata, X currMetadata);
        void success(long txid);
        boolean isReady(long txid);
        void close();
    }

    public interface Emitter<X> {
        void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
        void success(TransactionAttempt tx);
        void close();
    }

    BatchCoordinator<T> getCoordinator(String txStateId, Map conf, TopologyContext context);
    Emitter<T> getEmitter(String txStateId, Map conf, TopologyContext context);
    Map getComponentConfiguration();
    Fields getOutputFields();
}

It has two inner interfaces, BatchCoordinator and Emitter: the interface for the coordinating side and the interface for the message-emitting side, respectively. The main work in implementing a Spout is implementing these two interfaces, i.e. creating the Coordinator and Emitter that do the actual work. The Spout provides two getter methods for specifying which Coordinator and Emitter classes to use; those classes are defined by the user. We will analyze the Coordinator and Emitter shortly.
In addition, getComponentConfiguration returns the component configuration and getOutputFields returns the output fields.

Now let's look at the code of the two inner interfaces.

2. BatchCoordinator

public interface BatchCoordinator<X> {
    X initializeTransaction(long txid, X prevMetadata, X currMetadata);
    void success(long txid);
    boolean isReady(long txid);
    void close();
}

(1) The initializeTransaction method returns user-defined transaction metadata. X is a user-chosen type for data associated with the transaction; the returned value is stored in ZooKeeper.
Here txid is the transaction sequence number. prevMetadata is the metadata of the previous transaction, or null if the current transaction is the first one. currMetadata is the metadata of the current transaction: null on the transaction's first attempt, otherwise the metadata produced by the previous attempt of the same transaction.
(2) The isReady method checks whether the data for a transaction is ready; true means a new transaction can start. Its argument is the current transaction id.
The methods of a BatchCoordinator run on different nodes: isReady executes inside the real Spout (MasterBatchCoordinator), while the other methods execute inside TridentSpoutCoordinator. A sketch of a coordinator that puts this metadata to use follows.
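
As an illustration of how the metadata parameters can be used, here is a hypothetical coordinator whose metadata is a start offset into some external log; the class name, the BATCH_SIZE constant, and the offset scheme are all invented for this sketch:

import java.io.Serializable;
import storm.trident.spout.ITridentSpout.BatchCoordinator;

// Hypothetical sketch: each transaction covers a fixed-size offset range.
public class OffsetRangeCoordinator implements BatchCoordinator<Long>, Serializable {
    private static final long BATCH_SIZE = 1000; // invented constant, for illustration only

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        // On a replay (currMetadata != null) reuse the stored metadata so the batch
        // covers exactly the same range; otherwise continue after the previous range.
        if (currMetadata != null) {
            return currMetadata;
        }
        return (prevMetadata == null) ? 0L : prevMetadata + BATCH_SIZE;
    }

    @Override
    public boolean isReady(long txid) {
        return true;
    }

    @Override
    public void success(long txid) {
    }

    @Override
    public void close() {
    }
}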

3. Emitter

public interface Emitter<X> {
    void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
    void success(TransactionAttempt tx);
    void close();
}

The message-emitting node receives the coordinating spout's $batch and $success streams.
(1) On receiving a $batch message, the node calls emitBatch to emit its messages.
(2) On receiving a $success message, it calls success to post-process the transaction.

4. An example

As an example, consider DiagnosisEventSpout.

(1) The Spout code

package com.packtpub.storm.trident.spout;

import backtype.storm.task.TopologyContext;
import backtype.storm.tuple.Fields;
import storm.trident.spout.ITridentSpout;
import java.util.Map;

@SuppressWarnings("rawtypes")
public class DiagnosisEventSpout implements ITridentSpout<Long> {
    private static final long serialVersionUID = 1L;
    BatchCoordinator<Long> coordinator = new DefaultCoordinator();
    Emitter<Long> emitter = new DiagnosisEventEmitter();

    @Override
    public BatchCoordinator<Long> getCoordinator(String txStateId, Map conf, TopologyContext context) {
        return coordinator;
    }

    @Override
    public Emitter<Long> getEmitter(String txStateId, Map conf, TopologyContext context) {
        return emitter;
    }

    @Override
    public Map getComponentConfiguration() {
        return null;
    }

    @Override
    public Fields getOutputFields() {
        return new Fields("event");
    }
}

(2) The BatchCoordinator code

package com.packtpub.storm.trident.spout;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import storm.trident.spout.ITridentSpout.BatchCoordinator;
import java.io.Serializable;

public class DefaultCoordinator implements BatchCoordinator<Long>, Serializable {
    private static final long serialVersionUID = 1L;
    private static final Logger LOG = LoggerFactory.getLogger(DefaultCoordinator.class);

    @Override
    public boolean isReady(long txid) {
        return true;
    }

    @Override
    public void close() {
    }

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        LOG.info("Initializing Transaction [" + txid + "]");
        return null;
    }

    @Override
    public void success(long txid) {
        LOG.info("Successful Transaction [" + txid + "]");
    }
}

(3) The Emitter code

package com.packtpub.storm.trident.spout;

import com.packtpub.storm.trident.model.DiagnosisEvent;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.ITridentSpout.Emitter;
import storm.trident.topology.TransactionAttempt;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class DiagnosisEventEmitter implements Emitter<Long>, Serializable {
    private static final long serialVersionUID = 1L;
    AtomicInteger successfulTransactions = new AtomicInteger(0);

    @Override
    public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
        for (int i = 0; i < 10000; i++) {
            List<Object> events = new ArrayList<Object>();
            double lat = new Double(-30 + (int) (Math.random() * 75));
            double lng = new Double(-120 + (int) (Math.random() * 70));
            long time = System.currentTimeMillis();
            String diag = new Integer(320 + (int) (Math.random() * 7)).toString();
            DiagnosisEvent event = new DiagnosisEvent(lat, lng, time, diag);
            events.add(event);
            collector.emit(events);
        }
    }

    @Override
    public void success(TransactionAttempt tx) {
        successfulTransactions.incrementAndGet();
    }

    @Override
    public void close() {
    }
}
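
One caveat worth noting in this example: DefaultCoordinator#initializeTransaction returns null and DiagnosisEventEmitter generates random data, so a replayed batch will not contain the same tuples as the original attempt. That is fine for a demo, but a spout that needs Trident's exactly-once guarantee should derive each batch deterministically from the coordinator metadata, as in the OffsetRangeCoordinator sketch above.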

(4) Finally, pass the spout in when creating the topology:

TridentTopology topology = new TridentTopology();
DiagnosisEventSpout spout = new DiagnosisEventSpout();
Stream inputStream = topology.newStream("event", spout);
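
Note that the first argument to newStream ("event" here) is the txId/txStateId later passed to getCoordinator and getEmitter; it identifies the node under which this spout's transactional state is kept in ZooKeeper, so it should stay stable across redeployments of the topology.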

(III) The spout's actual message flow

The sections above show how to create a Spout in user code and the principles behind it. But after a Spout is created, how is it wired into the topology's real spout? Let's continue into the Trident implementation.

1. MasterBatchCoordinator

Overall, MasterBatchCoordinator is the true starting point of the data flow:
* First, open performs initialization: reading the transaction id the previous run of the topology had reached, the maximum number of tuples in flight, the attempt count of each transaction, and so on.
* Then nextTuple advances transaction states, or creates new transactions and emits the $batch stream.
* Finally, ack either emits the $commit stream, depending on the transaction's state, or calls sync again to start creating new transactions.

In short, MasterBatchCoordinator, as the true origin of the topology's data flow, keeps the data moving by emitting coordination messages in a loop. Its real role is to originate coordination messages; all of its maps, such as _activeTx and _attemptIds, merely track what is currently in flight.

(1) MasterBatchCoordinator is a real spout

  public class MasterBatchCoordinator extends BaseRichSpout 

The real logic of a Trident topology starts in MasterBatchCoordinator: open is called first for initialization, and then nextTuple emits the $batch and $commit streams.
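
The per-transaction state that open, nextTuple, and ack manipulate is a small enum inside MasterBatchCoordinator; the comments below (added here, not in the source) summarize the transitions described in the rest of this section:

// From MasterBatchCoordinator; lifecycle comments added for this walkthrough.
private static enum AttemptStatus {
    PROCESSING, // $batch emitted, waiting for the whole batch to be acked
    PROCESSED,  // batch fully acked; sync() promotes it once it becomes _currTransaction
    COMMITTING  // $commit emitted; the next ack removes the tx and emits $success
}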

(2) The open method:

@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
    _throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
    for(String spoutId: _managedSpoutIds) {
        // Each MasterBatchCoordinator can manage several ITridentSpouts; the metadata
        // for each spout goes into the _states list. We'll see later what is put there.
        _states.add(TransactionalState.newCoordinatorState(conf, spoutId));
    }
    // Fetch the current transaction id from ZooKeeper. When the topology restarts it must
    // recover its previous state; i.e. ZooKeeper stores the next transaction id to commit,
    // not the last committed one.
    _currTransaction = getStoredCurrTransaction();
    _collector = collector;
    // The maximum number of tuples a spout task may have in flight at any moment,
    // i.e. emitted but not yet acked.
    Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
    if(active==null) {
        _maxTransactionActive = 1;
    } else {
        _maxTransactionActive = active.intValue();
    }
    // The current attempt number of each active transaction id.
    _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);
    for(int i=0; i<_spouts.size(); i++) {
        // Store each Spout's Coordinator in the _coordinators list.
        String txId = _managedSpoutIds.get(i);
        _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
    }
}

(3) Next, the nextTuple() method. It only calls sync(), which mainly does the following:
* If a transaction's status is PROCESSED, change it to COMMITTING and emit the $commit stream. The node receiving the $commit stream calls finishBatch to commit and post-process the transaction.
* If _activeTx.size() is below _maxTransactionActive, create new transactions, put them in _activeTx, and emit the $batch stream for the Coordinator to process. (When ack is called, the transaction is removed from _activeTx again.)
Note: the transactions active at any moment are those with ids in [_currTransaction, _currTransaction + _maxTransactionActive - 1]; for example, with _currTransaction = 5 and _maxTransactionActive = 3, only transactions 5, 6, and 7 can be in flight.

private void sync() {
    // note that sometimes the tuples active may be less than max_spout_pending, e.g.
    // max_spout_pending = 3
    // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
    // and there won't be a batch for tx 4 because there's max_spout_pending tx active

    // Check whether the current transaction _currTransaction is in the PROCESSED state;
    // if so, change it to COMMITTING and emit the $commit stream. The node receiving the
    // $commit stream calls finishBatch to commit and post-process the transaction.
    TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
    if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
        maybeCommit.status = AttemptStatus.COMMITTING;
        _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
    }

    // Create new transactions. At most _maxTransactionActive transactions run concurrently,
    // with active ids in [_currTransaction, _currTransaction + _maxTransactionActive - 1].
    // A new transaction is only initialized after the current one finishes, so the number
    // actually active may be less than _maxTransactionActive.
    if(_active) {
        if(_activeTx.size() < _maxTransactionActive) {
            Long curr = _currTransaction;
            // Try to create up to _maxTransactionActive transactions.
            for(int i=0; i<_maxTransactionActive; i++) {
                // If the transaction id is not in _activeTx, create a new transaction and
                // emit the $batch stream. When ack is called, the id is removed again
                // (see the ack method below).
                if(!_activeTx.containsKey(curr) && isReady(curr)) {
                    // by using a monotonically increasing attempt id, downstream tasks
                    // can be memory efficient by clearing out state for old attempts
                    // as soon as they see a higher attempt id for a transaction
                    Integer attemptId = _attemptIds.get(curr);
                    if(attemptId==null) {
                        attemptId = 0;
                    } else {
                        attemptId++;
                    }
                    // _activeTx maps transaction ids to transaction statuses, while
                    // _attemptIds maps transaction ids to attempt counts.
                    _attemptIds.put(curr, attemptId);
                    for(TransactionalState state: _states) {
                        state.setData(CURRENT_ATTEMPTS, _attemptIds);
                    }
                    // A TransactionAttempt holds a transaction id plus an attempt id,
                    // identifying one concrete attempt of a transaction.
                    TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
                    _activeTx.put(curr, new TransactionStatus(attempt));
                    _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
                    _throttler.markEvent();
                }
                // If the id is already in _activeTx, advance curr and check the next one.
                curr = nextTransactionId(curr);
            }
        }
    }
}

See the end of the article for the complete code.

(4) Moving on, the ack method:

@Override
public void ack(Object msgId) {
    // Fetch the status of the acked transaction.
    TransactionAttempt tx = (TransactionAttempt) msgId;
    TransactionStatus status = _activeTx.get(tx.getTransactionId());
    if(status!=null && tx.equals(status.attempt)) {
        // If the current status is PROCESSING, change it to PROCESSED.
        if(status.status==AttemptStatus.PROCESSING) {
            status.status = AttemptStatus.PROCESSED;
        } else if(status.status==AttemptStatus.COMMITTING) {
            // If the current status is COMMITTING, remove the transaction from
            // _activeTx and _attemptIds and emit the $success stream.
            _activeTx.remove(tx.getTransactionId());
            _attemptIds.remove(tx.getTransactionId());
            _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
            _currTransaction = nextTransactionId(tx.getTransactionId());
            for(TransactionalState state: _states) {
                state.setData(CURRENT_TX, _currTransaction);
            }
        }
        // Some transaction states have changed, so call sync() again to continue
        // processing or to emit new tuples.
        sync();
    }
}

(5) There are also the fail and declareOutputFields methods. Note that fail not only removes the failed transaction but clears all later in-flight transactions (the tailMap), so they will be recreated and retried by sync().

@Override
public void fail(Object msgId) {
    TransactionAttempt tx = (TransactionAttempt) msgId;
    TransactionStatus stored = _activeTx.remove(tx.getTransactionId());
    if(stored!=null && tx.equals(stored.attempt)) {
        _activeTx.tailMap(tx.getTransactionId()).clear();
        sync();
    }
}

@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    // in partitioned example, in case an emitter task receives a later transaction than it's emitted so far,
    // when it sees the earlier txid it should know to emit nothing
    declarer.declareStream(BATCH_STREAM_ID, new Fields("tx"));
    declarer.declareStream(COMMIT_STREAM_ID, new Fields("tx"));
    declarer.declareStream(SUCCESS_STREAM_ID, new Fields("tx"));
}

2. TridentSpoutCoordinator

TridentSpoutCoordinator receives the $success and $batch streams from MasterBatchCoordinator and implements the real logic by calling user code. It also emits a $batch stream to TridentSpoutExecutor, triggering the latter to start emitting the actual business data.

(1) TridentSpoutCoordinator is a bolt

 public class TridentSpoutCoordinator implements IBasicBolt

(2) Constructing a TridentSpoutCoordinator requires an ITridentSpout object:

public TridentSpoutCoordinator(String id, ITridentSpout spout) {
    _spout = spout;
    _id = id;
}

This object is then used to obtain the user-defined Coordinator:

_coord = _spout.getCoordinator(_id, conf, context);

(3) _state and _underlyingState hold the metadata kept in ZooKeeper:

_underlyingState = TransactionalState.newCoordinatorState(conf, _id);
_state = new RotatingTransactionalState(_underlyingState, META_DIR);

(4) In the execute method, TridentSpoutCoordinator receives the $success and $batch streams. First, the $success stream:

if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
    _state.cleanupBefore(attempt.getTransactionId());
    _coord.success(attempt.getTransactionId());
}

That is, when the $success stream arrives, the success method of the user-defined Coordinator is called, and the data in ZooKeeper is cleaned up as well.
(5) Then the $batch stream:

else {
    long txid = attempt.getTransactionId();
    Object prevMeta = _state.getPreviousState(txid);
    Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
    _state.overrideState(txid, meta);
    collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
}

When the $batch stream arrives, a transaction is initialized and emitted downstream. Because messages in Trident may be replayed, prevMeta is needed. Note that Trident initializes a transaction inside a bolt.
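
To see why this metadata matters for replays, consider a hypothetical emitter paired with the OffsetRangeCoordinator sketched in section (II): because the coordinator metadata is stored in ZooKeeper and handed back unchanged on a replay, the emitter can regenerate exactly the same batch. All names here are invented for illustration, and readRecordAt stands in for a real data source:

import backtype.storm.tuple.Values;
import storm.trident.operation.TridentCollector;
import storm.trident.spout.ITridentSpout.Emitter;
import storm.trident.topology.TransactionAttempt;
import java.io.Serializable;

// Hypothetical sketch: the batch content is a pure function of the metadata,
// so replaying a transaction emits exactly the same tuples.
public class OffsetRangeEmitter implements Emitter<Long>, Serializable {
    private static final long BATCH_SIZE = 1000; // must match the coordinator's constant

    @Override
    public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
        long start = (coordinatorMeta == null) ? 0L : coordinatorMeta;
        for (long offset = start; offset < start + BATCH_SIZE; offset++) {
            collector.emit(new Values(readRecordAt(offset)));
        }
    }

    private Object readRecordAt(long offset) {
        return "record-" + offset; // placeholder for a real lookup by offset
    }

    @Override
    public void success(TransactionAttempt tx) {
    }

    @Override
    public void close() {
    }
}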

3. TridentSpoutExecutor

TridentSpoutExecutor handles three streams: $batch, received from TridentSpoutCoordinator, plus $commit and $success, wired directly from MasterBatchCoordinator (as the buildTopology code in section (IV) shows). $commit and $success trigger the emitter's commit and success methods respectively, while $batch triggers the emitter's emitBatch method, which starts emitting the business data.

(1) TridentSpoutExecutor is a bolt

public class TridentSpoutExecutor implements ITridentBatchBolt

(2) The core execute method:

@Override
public void execute(BatchInfo info, Tuple input) {
    // there won't be a BatchInfo for the success stream
    TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
    if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
        if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
            ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
            _activeBatches.remove(attempt.getTransactionId());
        } else {
            throw new FailedException("Received commit for different transaction attempt");
        }
    } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        // valid to delete before what's been committed since
        // those batches will never be accessed again
        _activeBatches.headMap(attempt.getTransactionId()).clear();
        _emitter.success(attempt);
    } else {
        _collector.setBatch(info.batchId);
        // Emit the business messages.
        _emitter.emitBatch(attempt, input.getValue(1), _collector);
        _activeBatches.put(attempt.getTransactionId(), attempt);
    }
}

(IV) Setting up the Spout in TridentTopologyBuilder

With the analysis above the Spout itself is ready, but how does it get loaded into the topology so the real data flow starts?
(1) TridentTopologyBuilder's buildTopology method sets up the topology-level wiring.
(2) TridentTopology's newStream method adds the spout node to the topology.

1. TridentTopologyBuilder

The first half of buildTopology in TridentTopologyBuilder sets up the Spout-related components, and the second half sets up the bolts. Here we only look at the spout-related part:

TopologyBuilder builder = new TopologyBuilder();

Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false);
Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true);

Map<String, List<String>> batchesToCommitIds = new HashMap<String, List<String>>();
Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<String, List<ITridentSpout>>();

for(String id: _spouts.keySet()) {
    TransactionalSpoutComponent c = _spouts.get(id);
    if(c.spout instanceof IRichSpout) {
        //TODO: wrap this to set the stream name
        builder.setSpout(id, (IRichSpout) c.spout, c.parallelism);
    } else {
        String batchGroup = c.batchGroupId;
        if(!batchesToCommitIds.containsKey(batchGroup)) {
            batchesToCommitIds.put(batchGroup, new ArrayList<String>());
        }
        batchesToCommitIds.get(batchGroup).add(c.commitStateId);

        if(!batchesToSpouts.containsKey(batchGroup)) {
            batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>());
        }
        batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout);

        BoltDeclarer scd =
              builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout))
                .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID)
                .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID);

        for(Map m: c.componentConfs) {
            scd.addConfigurations(m);
        }

        Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap();
        specs.put(c.batchGroupId, new CoordSpec());
        BoltDeclarer bd = builder.setBolt(id,
                new TridentBoltExecutor(
                  new TridentSpoutExecutor(
                    c.commitStateId,
                    c.streamName,
                    ((ITridentSpout) c.spout)),
                    batchIdsForSpouts,
                    specs),
                c.parallelism);
        bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID);
        bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID);
        if(c.spout instanceof ICommitterTridentSpout) {
            bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID);
        }
        for(Map m: c.componentConfs) {
            bd.addConfigurations(m);
        }
    }
}

for(String id: _batchPerTupleSpouts.keySet()) {
    SpoutComponent c = _batchPerTupleSpouts.get(id);
    SpoutDeclarer d = builder.setSpout(id, new RichSpoutBatchTriggerer((IRichSpout) c.spout, c.streamName, c.batchGroupId), c.parallelism);
    for(Map conf: c.componentConfs) {
        d.addConfigurations(conf);
    }
}

for(String batch: batchesToCommitIds.keySet()) {
    List<String> commitIds = batchesToCommitIds.get(batch);
    builder.setSpout(masterCoordinator(batch), new MasterBatchCoordinator(commitIds, batchesToSpouts.get(batch)));
}
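
Putting this wiring together, for one transactional spout the builder creates the following components (same notation as the diagram in section (I); the groupings are those set in the code above):

masterCoordinator(batchGroup) [MasterBatchCoordinator, the only real spout]
|
| $batch, $success (globalGrouping)
v
spoutCoordinator(id) [TridentSpoutCoordinator, bolt]
|
| $batch (allGrouping)
v
id [TridentBoltExecutor wrapping TridentSpoutExecutor, bolt]

In addition, the executor bolt receives $success directly from masterCoordinator(batchGroup) via allGrouping, plus $commit when the spout is an ICommitterTridentSpout.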

2. TridentTopology

newStream creates a spout node and adds it to the topology:

public Stream newStream(String txId, ITridentSpout spout) {
    Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
    return addNode(n);
}