A Walkthrough of the Spout Source Code in Storm Trident
Source: Internet | Editor: 程序博客网 | Date: 2024/05/07 18:35
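- I. Overview
- 1. Introduction
- 2. Key classes
- 1. Creating a spout
- 2. The spout message flow
- 3. Overall flow of a spout invocation
- 4. TSC and TSE
- 5. How a spout is loaded into the topology
- II. Creating a spout
- 1. ITridentSpout
- 2. BatchCoordinator
- 3. Emitter
- 4. An example
- III. The actual message flow of a spout
- 1. MasterBatchCoordinator
- 2. TridentSpoutCoordinator
- 3. TridentSpoutExecutor
- IV. Setting up the spout in TridentTopologyBuilder
- 1. TridentTopologyBuilder
- 2. TridentTopology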
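(I) Overview
1. Introduction
Trident is a higher-level abstraction on top of Storm. Compared with plain Storm, it offers three main benefits:
(1) A higher level of abstraction: common operations such as count and sum are packaged as methods that can be called directly, so you do not have to implement them yourself.
(2) Batches replace single tuples: data is processed one batch at a time.
(3) Transaction support: it can guarantee that every piece of data is processed, and processed exactly once.
This article describes how a spout is created and invoked in a Trident topology. It first explains how a user creates a spout and the basic principle behind it, then walks through the spout's actual data flow, and finally explains how a spout is set up when the topology is built.
2. Key classes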
MasterBatchCoordinator  ----->  ITridentSpout.BatchCoordinator#isReady
|
|
v
TridentSpoutCoordinator ----->  ITridentSpout.BatchCoordinator#[initializeTransaction, success, close]
|
|
v
TridentSpoutExecutor    ----->  ITridentSpout.Emitter#[emitBatch, success, close]
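Two groups of classes are involved in a spout. The first group defines how a user creates a spout; this user code is called by the second group. The second group defines how the actual data flow is started and delivered.
(1) Creating a spout
Three interfaces are involved: ITridentSpout, BatchCoordinator and Emitter. The latter two are nested interfaces of the first.
To create a spout, the user implements these three interfaces. For example, the spout in storm-kafka implements these three interfaces or their sub-interfaces.
(2) The spout message flow
Three classes are involved here as well: MasterBatchCoordinator, TridentSpoutCoordinator and TridentSpoutExecutor. Besides their own fixed logic, they call user code, that is, the spout code described above.
They are declared as follows:
    MasterBatchCoordinator extends BaseRichSpout
    TridentSpoutCoordinator implements IBasicBolt
    TridentSpoutExecutor implements ITridentBatchBolt
As the declarations show, MasterBatchCoordinator is the real spout, and the other two are bolts.
MasterBatchCoordinator calls the isReady() method of the user-defined BatchCoordinator; if it returns true, it emits a tuple on the stream whose id is $batch.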
3. Overall flow of a spout invocation
(1) MasterBatchCoordinator (MBC) is the real spout in Trident, and it can serve several TridentSpoutCoordinator nodes. MBC emits the stream with id $batch, which is the starting point of the whole data flow.
    if(!_activeTx.containsKey(curr) && isReady(curr)) {
        ..........
        _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
        ..........
    }
(2) Once the whole message has been processed successfully, MBC's ack() method is called. ack() changes the transaction status from PROCESSING to PROCESSED:
    if(status.status==AttemptStatus.PROCESSING) {
        status.status = AttemptStatus.PROCESSED;
    }
If processing failed, fail() is called instead.
When sync() finds a transaction in the PROCESSED state, it changes the state to COMMITTING and emits the stream with id $commit.
    if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
        maybeCommit.status = AttemptStatus.COMMITTING;
        _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
    }
(3) After the $commit stream has been processed, MBC's ack() is called again, and this time it emits the $success stream:
    else if(status.status==AttemptStatus.COMMITTING) {
        // If the status is COMMITTING, remove the transaction from _activeTx
        // and _attemptIds, then emit the $success stream.
        _activeTx.remove(tx.getTransactionId());
        _attemptIds.remove(tx.getTransactionId());
        _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
        _currTransaction = nextTransactionId(tx.getTransactionId());
        for(TransactionalState state: _states) {
            state.setData(CURRENT_TX, _currTransaction);
        }
    }
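The three steps above amount to a small state machine for a single transaction. The following is a minimal sketch of that lifecycle in plain Java, without any Storm dependency. The class `TxLifecycleSketch` and its methods are hypothetical stand-ins, not Storm classes; only the status names and stream ids are taken from the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the status handling in MasterBatchCoordinator.
// It models one transaction only and records which stream ids would be emitted.
class TxLifecycleSketch {
    enum AttemptStatus { PROCESSING, PROCESSED, COMMITTING }

    AttemptStatus status;                          // null means "no active transaction"
    final List<String> emitted = new ArrayList<>();

    // Starting a transaction corresponds to emitting on $batch.
    void start() {
        status = AttemptStatus.PROCESSING;
        emitted.add("$batch");
    }

    // Same shape as the check in sync(): PROCESSED -> COMMITTING, emit $commit.
    void sync() {
        if (status == AttemptStatus.PROCESSED) {
            status = AttemptStatus.COMMITTING;
            emitted.add("$commit");
        }
    }

    // Same shape as ack(): the first ack marks PROCESSED, the second ends the transaction.
    void ack() {
        if (status == AttemptStatus.PROCESSING) {
            status = AttemptStatus.PROCESSED;
        } else if (status == AttemptStatus.COMMITTING) {
            status = null;
            emitted.add("$success");
        }
        sync();
    }

    public static void main(String[] args) {
        TxLifecycleSketch tx = new TxLifecycleSketch();
        tx.start();
        tx.ack();   // the $batch tuple tree has been acked
        tx.ack();   // the $commit tuple tree has been acked
        System.out.println(tx.emitted);   // prints [$batch, $commit, $success]
    }
}
```

The sketch ignores retries, throttling and multiple in-flight transactions; it only shows why ack() is called twice per transaction.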
4. TSC and TSE
From the analysis above, MBC emits the $batch, $commit and $success streams in that order.
TridentSpoutCoordinator (TSC) handles only two of them, $batch and $success, while TridentSpoutExecutor (TSE) handles all three.
TSC handling the $success stream:
    if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        _state.cleanupBefore(attempt.getTransactionId());
        _coord.success(attempt.getTransactionId());
    }
This mainly calls the success method that the user defined in the coordinator.
TSE handling the $commit and $success streams:
    if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
        if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
            ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
            _activeBatches.remove(attempt.getTransactionId());
        } else {
            throw new FailedException("Received commit for different transaction attempt");
        }
    } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        // valid to delete before what's been committed since
        // those batches will never be accessed again
        _activeBatches.headMap(attempt.getTransactionId()).clear();
        _emitter.success(attempt);
    }
To summarize: messages start at MasterBatchCoordinator, which is the real spout, while TridentSpoutCoordinator and TridentSpoutExecutor are both bolts. MasterBatchCoordinator initiates the coordination messages, and the end result is that TridentSpoutExecutor emits the business messages. Both the coordination messages and the business messages are produced by calling the code that the user defined in the spout's BatchCoordinator and Emitter.
See also the flow chart on p. 458 of 《storm源码分析》 (Storm Source Code Analysis).
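5. How a spout is loaded into the topology
(1) The buildTopology method of TridentTopologyBuilder sets up the topology-related information.
(2) TridentTopology calls the newStream method to add the spout node to the topology.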
(II) Creating a spout
1. ITridentSpout
In Trident, a user-defined spout must implement the ITridentSpout interface. Let us look at its definition first:
    package storm.trident.spout;

    import backtype.storm.task.TopologyContext;
    import storm.trident.topology.TransactionAttempt;
    import backtype.storm.tuple.Fields;
    import java.io.Serializable;
    import java.util.Map;
    import storm.trident.operation.TridentCollector;

    public interface ITridentSpout<T> extends Serializable {
        public interface BatchCoordinator<X> {
            X initializeTransaction(long txid, X prevMetadata, X currMetadata);
            void success(long txid);
            boolean isReady(long txid);
            void close();
        }

        public interface Emitter<X> {
            void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
            void success(TransactionAttempt tx);
            void close();
        }

        BatchCoordinator<T> getCoordinator(String txStateId, Map conf, TopologyContext context);
        Emitter<T> getEmitter(String txStateId, Map conf, TopologyContext context);
        Map getComponentConfiguration();
        Fields getOutputFields();
    }
It has two nested interfaces, BatchCoordinator and Emitter: the former is used for coordination and the latter for emitting messages. Most of the work of implementing a spout lies in implementing these two interfaces, that is, in creating the Coordinator and Emitter that do the actual work. The spout provides two get methods that specify which Coordinator and which Emitter class to use; these classes are defined by the user. We analyze the Coordinator and the Emitter shortly.
In addition, getComponentConfiguration returns the configuration and getOutputFields returns the output fields.
Now let us look at the code of the two nested interfaces.
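2. BatchCoordinator
    public interface BatchCoordinator<X> {
        X initializeTransaction(long txid, X prevMetadata, X currMetadata);
        void success(long txid);
        boolean isReady(long txid);
        void close();
    }
(1) initializeTransaction returns user-defined transaction metadata. X is the user-defined type of the transaction-related data, and the returned value is stored in ZooKeeper.
Here txid is the transaction sequence number. prevMetadata is the metadata of the previous transaction; it is null if the current transaction is the first one. currMetadata is the metadata of the current transaction; it is null on the first attempt of the transaction, otherwise it is the metadata produced by the previous attempt of the same transaction.
(2) isReady tells whether the data for a transaction is ready. When it returns true, a new transaction can be started. Its argument is the current transaction id.
The methods implemented in BatchCoordinator end up running on several nodes: isReady runs inside the real spout (MasterBatchCoordinator), and the remaining methods run inside TridentSpoutCoordinator.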
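To make the roles of prevMetadata and currMetadata concrete, here is a small sketch of a coordinator whose metadata is the first offset of a batch. It is only an illustration under stated assumptions: `BatchCoordinatorSketch` is a local copy of the interface shape quoted above so that the code compiles without Storm, and `OffsetCoordinator` and its `BATCH_SIZE` are hypothetical.

```java
// Local copy of the interface shape quoted above, so the sketch compiles without Storm.
interface BatchCoordinatorSketch<X> {
    X initializeTransaction(long txid, X prevMetadata, X currMetadata);
    void success(long txid);
    boolean isReady(long txid);
    void close();
}

// Hypothetical coordinator: the metadata of a transaction is the first offset of its batch.
class OffsetCoordinator implements BatchCoordinatorSketch<Long> {
    static final long BATCH_SIZE = 100;   // assumed batch size, not taken from the article

    @Override
    public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
        if (currMetadata != null) {
            return currMetadata;           // retry: reuse the metadata of the earlier attempt
        }
        if (prevMetadata == null) {
            return 0L;                     // very first transaction
        }
        return prevMetadata + BATCH_SIZE;  // continue right after the previous batch
    }

    @Override
    public void success(long txid) { }

    @Override
    public boolean isReady(long txid) { return true; }

    @Override
    public void close() { }
}
```

Returning the stored metadata unchanged on a retry is what lets a replayed transaction describe the same batch as its first attempt.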
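3. Emitter
    public interface Emitter<X> {
        void emitBatch(TransactionAttempt tx, X coordinatorMeta, TridentCollector collector);
        void success(TransactionAttempt tx);
        void close();
    }
The emitting node receives the $batch and $success streams that originate from the coordinating side.
(1) When it receives a $batch message, the node calls emitBatch to emit messages.
(2) When it receives a $success message, it calls success to do the post-processing for the transaction.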
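As an illustration of the two callbacks, the sketch below derives the content of a batch purely from the coordinator metadata, so that a retried attempt emits the same values as the first one. Everything here is hypothetical: `RangeEmitterSketch` is not a Storm class, a plain `List` plays the role of the TridentCollector, and `BATCH_SIZE` is an assumed constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical emitter: the batch is a function of the coordinator metadata only,
// so emitting again for a retry of the same transaction yields the same values.
class RangeEmitterSketch {
    static final int BATCH_SIZE = 3;   // assumed, kept tiny for illustration
    int successCount = 0;

    // Stand-in for emitBatch(tx, coordinatorMeta, collector); 'out' plays the collector role.
    void emitBatch(long txid, long startOffset, List<Long> out) {
        for (long i = startOffset; i < startOffset + BATCH_SIZE; i++) {
            out.add(i);
        }
    }

    // Stand-in for success(tx): post-processing once the transaction has finished.
    void success(long txid) {
        successCount++;
    }
}
```

For example, calling `emitBatch(2, 6, out)` twice with two empty lists fills both with the offsets 6, 7 and 8.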
4. An example
See DiagnosisEventSpout.
(1) The spout code
    package com.packtpub.storm.trident.spout;

    import backtype.storm.task.TopologyContext;
    import backtype.storm.tuple.Fields;
    import storm.trident.spout.ITridentSpout;
    import java.util.Map;

    @SuppressWarnings("rawtypes")
    public class DiagnosisEventSpout implements ITridentSpout<Long> {
        private static final long serialVersionUID = 1L;
        BatchCoordinator<Long> coordinator = new DefaultCoordinator();
        Emitter<Long> emitter = new DiagnosisEventEmitter();

        @Override
        public BatchCoordinator<Long> getCoordinator(String txStateId, Map conf, TopologyContext context) {
            return coordinator;
        }

        @Override
        public Emitter<Long> getEmitter(String txStateId, Map conf, TopologyContext context) {
            return emitter;
        }

        @Override
        public Map getComponentConfiguration() {
            return null;
        }

        @Override
        public Fields getOutputFields() {
            return new Fields("event");
        }
    }
(2) The BatchCoordinator code
    package com.packtpub.storm.trident.spout;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import storm.trident.spout.ITridentSpout.BatchCoordinator;
    import java.io.Serializable;

    public class DefaultCoordinator implements BatchCoordinator<Long>, Serializable {
        private static final long serialVersionUID = 1L;
        private static final Logger LOG = LoggerFactory.getLogger(DefaultCoordinator.class);

        @Override
        public boolean isReady(long txid) {
            // Always ready: a new batch may start at any time.
            return true;
        }

        @Override
        public void close() {
        }

        @Override
        public Long initializeTransaction(long txid, Long prevMetadata, Long currMetadata) {
            LOG.info("Initializing Transaction [" + txid + "]");
            // This example keeps no per-transaction metadata.
            return null;
        }

        @Override
        public void success(long txid) {
            LOG.info("Successful Transaction [" + txid + "]");
        }
    }
(3) The Emitter code
    package com.packtpub.storm.trident.spout;

    import com.packtpub.storm.trident.model.DiagnosisEvent;
    import storm.trident.operation.TridentCollector;
    import storm.trident.spout.ITridentSpout.Emitter;
    import storm.trident.topology.TransactionAttempt;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class DiagnosisEventEmitter implements Emitter<Long>, Serializable {
        private static final long serialVersionUID = 1L;
        AtomicInteger successfulTransactions = new AtomicInteger(0);

        @Override
        public void emitBatch(TransactionAttempt tx, Long coordinatorMeta, TridentCollector collector) {
            // Each batch consists of 10000 randomly generated events.
            for (int i = 0; i < 10000; i++) {
                List<Object> events = new ArrayList<Object>();
                double lat = -30 + (int) (Math.random() * 75);
                double lng = -120 + (int) (Math.random() * 70);
                long time = System.currentTimeMillis();
                String diag = Integer.toString(320 + (int) (Math.random() * 7));
                DiagnosisEvent event = new DiagnosisEvent(lat, lng, time, diag);
                events.add(event);
                collector.emit(events);
            }
        }

        @Override
        public void success(TransactionAttempt tx) {
            successfulTransactions.incrementAndGet();
        }

        @Override
        public void close() {
        }
    }
(4) Finally, specify the spout when creating the topology
    TridentTopology topology = new TridentTopology();
    DiagnosisEventSpout spout = new DiagnosisEventSpout();
    Stream inputStream = topology.newStream("event", spout);
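(III) The actual message flow of a spout
The sections above explain how to create a spout in user code, and the basic principle behind it. But once the spout has been created, how is it hooked into the real spout of the topology? Let us continue with Trident's implementation.
1. MasterBatchCoordinator
Overall, MasterBatchCoordinator is the real starting point of a data flow:
* First, open is called to initialize it. This includes reading the transaction sequence number that the topology had reached previously, the maximum number of transactions processed at the same time, the attempt number of each transaction, and so on.
* Then nextTuple changes the state of transactions, or creates transactions and emits the $batch stream.
* Finally, ack emits the $commit stream depending on the transaction state, or calls sync again to start creating new transactions.
In short, MasterBatchCoordinator is the real starting point of the topology's data flow, and it keeps the data flowing by emitting coordination messages in a loop. Its real role is to be the origin of the coordination messages; all the maps inside it, such as _activeTx and _attemptIds, exist only to record what is currently being processed.
(1) MasterBatchCoordinator is a real spout
    public class MasterBatchCoordinator extends BaseRichSpout
The real logic of a Trident topology starts at MasterBatchCoordinator: open is called first for initialization, and then nextTuple emits the $batch and $commit streams.
(2) The open method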
    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _throttler = new WindowedTimeThrottler((Number)conf.get(Config.TOPOLOGY_TRIDENT_BATCH_EMIT_INTERVAL_MILLIS), 1);
        for(String spoutId: _managedSpoutIds) {
            // Each MasterBatchCoordinator can manage several ITridentSpouts. The metadata
            // state of every spout is added to _states; we look at its content later.
            _states.add(TransactionalState.newCoordinatorState(conf, spoutId));
        }
        // Read the current transaction id from ZooKeeper. When the topology is restarted,
        // the previous state has to be restored from ZooKeeper. In other words, ZooKeeper
        // stores the id of the next transaction to commit, not the one already committed.
        _currTransaction = getStoredCurrTransaction();

        _collector = collector;
        // The maximum number of tuples a spout task may have pending at any time,
        // that is, tuples that have been emitted but not yet acked.
        Number active = (Number) conf.get(Config.TOPOLOGY_MAX_SPOUT_PENDING);
        if(active==null) {
            _maxTransactionActive = 1;
        } else {
            _maxTransactionActive = active.intValue();
        }
        // The current attempt id of each transaction, starting from _currTransaction.
        _attemptIds = getStoredCurrAttempts(_currTransaction, _maxTransactionActive);

        for(int i=0; i<_spouts.size(); i++) {
            // Keep the Coordinator of every spout in the _coordinators list.
            String txId = _managedSpoutIds.get(i);
            _coordinators.add(_spouts.get(i).getCoordinator(txId, conf, context));
        }
    }
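(3) The nextTuple() method. It only calls sync(), which does the following:
* If the transaction state is PROCESSED, change it to COMMITTING and emit the $commit stream.
* If _activeTx.size() is smaller than _maxTransactionActive, create new transactions, put them into _activeTx, and emit the $batch stream so that the Coordinator can handle them. (When ack is called for the commit, the transaction is removed from _activeTx.)
Note: the transactions that are active at any moment should have sequence numbers in the range [_currTransaction, _currTransaction+_maxTransactionActive-1].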
    private void sync() {
        // note that sometimes the tuples active may be less than max_spout_pending, e.g.
        // max_spout_pending = 3
        // tx 1, 2, 3 active, tx 2 is acked. there won't be a commit for tx 2 (because tx 1 isn't committed yet),
        // and there won't be a batch for tx 4 because there's max_spout_pending tx active

        // Check whether the current transaction _currTransaction is PROCESSED. If so, change it to
        // COMMITTING and emit the $commit stream. Nodes that receive $commit call finishBatch to
        // commit the transaction and do the post-processing.
        TransactionStatus maybeCommit = _activeTx.get(_currTransaction);
        if(maybeCommit!=null && maybeCommit.status == AttemptStatus.PROCESSED) {
            maybeCommit.status = AttemptStatus.COMMITTING;
            _collector.emit(COMMIT_STREAM_ID, new Values(maybeCommit.attempt), maybeCommit.attempt);
        }

        // Create new transactions. At most _maxTransactionActive transactions run at the same time,
        // with ids in [_currTransaction, _currTransaction+_maxTransactionActive-1]. New transactions
        // are only initialized after the current one finishes, so the number of transactions that
        // are actually active may be smaller than _maxTransactionActive.
        if(_active) {
            if(_activeTx.size() < _maxTransactionActive) {
                Long curr = _currTransaction;
                // Try up to _maxTransactionActive transaction ids.
                for(int i=0; i<_maxTransactionActive; i++) {
                    // If the id is not in _activeTx yet, create a new transaction and emit $batch.
                    // The id is removed again when ack is called; see the ack method.
                    if(!_activeTx.containsKey(curr) && isReady(curr)) {
                        // by using a monotonically increasing attempt id, downstream tasks
                        // can be memory efficient by clearing out state for old attempts
                        // as soon as they see a higher attempt id for a transaction
                        Integer attemptId = _attemptIds.get(curr);
                        if(attemptId==null) {
                            attemptId = 0;
                        } else {
                            attemptId++;
                        }
                        // _activeTx maps transaction id to transaction status, whereas
                        // _attemptIds maps transaction id to attempt count.
                        _attemptIds.put(curr, attemptId);
                        for(TransactionalState state: _states) {
                            state.setData(CURRENT_ATTEMPTS, _attemptIds);
                        }
                        // A TransactionAttempt holds a transaction id and an attempt id,
                        // and identifies one concrete attempt of a transaction.
                        TransactionAttempt attempt = new TransactionAttempt(curr, attemptId);
                        _activeTx.put(curr, new TransactionStatus(attempt));
                        _collector.emit(BATCH_STREAM_ID, new Values(attempt), attempt);
                        _throttler.markEvent();
                    }
                    // Move on to the next transaction id and check it in the next iteration.
                    curr = nextTransactionId(curr);
                }
            }
        }
    }
The complete code is given at the end.
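The English comment at the top of sync() describes a case that is easy to misread: with max_spout_pending = 3 and transactions 1, 2 and 3 active, acking tx 2 triggers neither a commit for tx 2 nor a batch for tx 4. The sketch below replays that scenario in plain Java. It is a simplified model, not the Storm code: `SyncWindowSketch` is a hypothetical class, isReady and the throttler are left out, and the next transaction id is assumed to be txid + 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Simplified, hypothetical model of the window logic in sync() and ack().
class SyncWindowSketch {
    enum Status { PROCESSING, PROCESSED, COMMITTING }

    final int maxActive;
    long currTx = 1;
    final TreeMap<Long, Status> activeTx = new TreeMap<>();
    final List<String> emitted = new ArrayList<>();

    SyncWindowSketch(int maxActive) { this.maxActive = maxActive; }

    void sync() {
        // Only the oldest transaction (currTx) may move to COMMITTING.
        if (activeTx.get(currTx) == Status.PROCESSED) {
            activeTx.put(currTx, Status.COMMITTING);
            emitted.add("$commit:" + currTx);
        }
        // Fill the window [currTx, currTx + maxActive - 1] with new transactions.
        if (activeTx.size() < maxActive) {
            long curr = currTx;
            for (int i = 0; i < maxActive; i++) {
                if (!activeTx.containsKey(curr)) {
                    activeTx.put(curr, Status.PROCESSING);
                    emitted.add("$batch:" + curr);
                }
                curr++;   // assumption: the next transaction id is the current one plus 1
            }
        }
    }

    void ack(long txid) {
        Status s = activeTx.get(txid);
        if (s == Status.PROCESSING) {
            activeTx.put(txid, Status.PROCESSED);
        } else if (s == Status.COMMITTING) {
            activeTx.remove(txid);
            emitted.add("$success:" + txid);
            currTx = txid + 1;
        }
        sync();
    }

    public static void main(String[] args) {
        SyncWindowSketch mbc = new SyncWindowSketch(3);
        mbc.sync();    // opens tx 1, 2, 3
        mbc.ack(2);    // tx 2 is processed, but tx 1 is not committed yet: nothing is emitted
        mbc.ack(1);    // tx 1 processed -> $commit:1
        mbc.ack(1);    // tx 1 committed -> $success:1, then $commit:2 and $batch:4
        System.out.println(mbc.emitted);
    }
}
```

Commits are therefore strictly ordered by transaction id, even though several batches may be processed concurrently.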
(4) Next, the ack method.
    @Override
    public void ack(Object msgId) {
        // Look up the status of the transaction.
        TransactionAttempt tx = (TransactionAttempt) msgId;
        TransactionStatus status = _activeTx.get(tx.getTransactionId());
        if(status!=null && tx.equals(status.attempt)) {
            // If the status is PROCESSING, change it to PROCESSED.
            if(status.status==AttemptStatus.PROCESSING) {
                status.status = AttemptStatus.PROCESSED;
            } else if(status.status==AttemptStatus.COMMITTING) {
                // If the status is COMMITTING, remove the transaction from _activeTx
                // and _attemptIds, then emit the $success stream.
                _activeTx.remove(tx.getTransactionId());
                _attemptIds.remove(tx.getTransactionId());
                _collector.emit(SUCCESS_STREAM_ID, new Values(tx));
                _currTransaction = nextTransactionId(tx.getTransactionId());
                for(TransactionalState state: _states) {
                    state.setData(CURRENT_TX, _currTransaction);
                }
            }
            // Some transaction states have changed, so call sync() again to continue
            // with the next step or to emit new tuples.
            sync();
        }
    }
(5) There are also the fail and declareOutputFields methods.
    @Override
    public void fail(Object msgId) {
        TransactionAttempt tx = (TransactionAttempt) msgId;
        TransactionStatus stored = _activeTx.remove(tx.getTransactionId());
        if(stored!=null && tx.equals(stored.attempt)) {
            _activeTx.tailMap(tx.getTransactionId()).clear();
            sync();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // in partitioned example, in case an emitter task receives a later transaction than it's emitted so far,
        // when it sees the earlier txid it should know to emit nothing
        declarer.declareStream(BATCH_STREAM_ID, new Fields("tx"));
        declarer.declareStream(COMMIT_STREAM_ID, new Fields("tx"));
        declarer.declareStream(SUCCESS_STREAM_ID, new Fields("tx"));
    }
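The effect of `_activeTx.tailMap(...).clear()` in fail() is that a failed transaction takes every later active transaction down with it, so that all of them are re-created by the following sync(). A minimal sketch of just that map operation, using a hypothetical `FailSketch` class in which the map value is the attempt id rather than a TransactionStatus:

```java
import java.util.TreeMap;

// Hypothetical model of the bookkeeping in fail(): txid -> attempt id of the active attempt.
class FailSketch {
    final TreeMap<Long, Integer> activeTx = new TreeMap<>();

    // Same shape as fail(): the failed transaction is removed first, and if the failure
    // belongs to the attempt that was active, every later transaction is dropped as well.
    void fail(long txid, int attemptId) {
        Integer stored = activeTx.remove(txid);
        if (stored != null && stored == attemptId) {
            // tailMap is a live view of all keys >= txid; clearing it clears the backing map.
            activeTx.tailMap(txid).clear();
        }
    }
}
```

With transactions 1 to 4 active, `fail(2, 0)` leaves only transaction 1 in the map.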
2. TridentSpoutCoordinator
TridentSpoutCoordinator receives the $success and $batch streams from MasterBatchCoordinator and implements the real logic by calling user code. It also sends a $batch stream to TridentSpoutExecutor, which triggers the latter to start emitting the actual business data.
(1) TridentSpoutCoordinator is a bolt
    public class TridentSpoutCoordinator implements IBasicBolt
(2) An ITridentSpout object must be passed in when a TridentSpoutCoordinator is created:
    public TridentSpoutCoordinator(String id, ITridentSpout spout) {
        _spout = spout;
        _id = id;
    }
This object is then used to obtain the user-defined Coordinator:
    _coord = _spout.getCoordinator(_id, conf, context);
(3) _state and _underlyingState hold the metadata stored in ZooKeeper
    _underlyingState = TransactionalState.newCoordinatorState(conf, _id);
    _state = new RotatingTransactionalState(_underlyingState, META_DIR);
(4) In execute, TridentSpoutCoordinator receives the $success and $batch streams. First the $success stream:
    if(tuple.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
        _state.cleanupBefore(attempt.getTransactionId());
        _coord.success(attempt.getTransactionId());
    }
That is, on receiving the $success stream it calls the success method of the user-defined Coordinator, and it also cleans up the data in ZooKeeper.
(5) Now the $batch stream:
    else {
        long txid = attempt.getTransactionId();
        Object prevMeta = _state.getPreviousState(txid);
        Object meta = _coord.initializeTransaction(txid, prevMeta, _state.getState(txid));
        _state.overrideState(txid, meta);
        collector.emit(MasterBatchCoordinator.BATCH_STREAM_ID, new Values(attempt, meta));
    }
On receiving the $batch stream, it initializes a transaction and sends it on. Because messages in Trident may be replayed, prevMeta is needed. Note that Trident initializes a transaction inside a bolt.
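The four `_state` calls used above (getPreviousState, getState, overrideState, cleanupBefore) can be pictured as operations on a map sorted by transaction id. The sketch below is an in-memory stand-in under that assumption; `RotatingStateSketch` is hypothetical and does not talk to ZooKeeper, and the exact semantics of the real RotatingTransactionalState should be checked against the Storm source.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the per-transaction metadata store.
class RotatingStateSketch {
    private final TreeMap<Long, Object> metaByTx = new TreeMap<>();

    // Metadata stored for exactly this transaction, or null on its first attempt.
    Object getState(long txid) {
        return metaByTx.get(txid);
    }

    // Assumption: "previous state" is the metadata of the closest earlier transaction.
    Object getPreviousState(long txid) {
        Map.Entry<Long, Object> prev = metaByTx.lowerEntry(txid);
        return prev == null ? null : prev.getValue();
    }

    void overrideState(long txid, Object meta) {
        metaByTx.put(txid, meta);
    }

    // Drop the metadata of all transactions with an id lower than txid.
    void cleanupBefore(long txid) {
        metaByTx.headMap(txid).clear();
    }

    int size() {
        return metaByTx.size();
    }
}
```

In this model a replayed transaction sees its own earlier metadata through getState, and the first attempt of a new transaction sees only the previous transaction's metadata through getPreviousState.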
3. TridentSpoutExecutor
TridentSpoutExecutor handles three streams: $commit and $success, which come from MasterBatchCoordinator, and $batch, which comes from TridentSpoutCoordinator. The first two call the emitter's commit and success methods respectively, while $batch calls the emitter's emitBatch method and starts emitting business data.
(1) TridentSpoutExecutor is a bolt
    public class TridentSpoutExecutor implements ITridentBatchBolt
(2) The core execute method
    @Override
    public void execute(BatchInfo info, Tuple input) {
        // there won't be a BatchInfo for the success stream
        TransactionAttempt attempt = (TransactionAttempt) input.getValue(0);
        if(input.getSourceStreamId().equals(MasterBatchCoordinator.COMMIT_STREAM_ID)) {
            if(attempt.equals(_activeBatches.get(attempt.getTransactionId()))) {
                ((ICommitterTridentSpout.Emitter) _emitter).commit(attempt);
                _activeBatches.remove(attempt.getTransactionId());
            } else {
                throw new FailedException("Received commit for different transaction attempt");
            }
        } else if(input.getSourceStreamId().equals(MasterBatchCoordinator.SUCCESS_STREAM_ID)) {
            // valid to delete before what's been committed since
            // those batches will never be accessed again
            _activeBatches.headMap(attempt.getTransactionId()).clear();
            _emitter.success(attempt);
        } else {
            _collector.setBatch(info.batchId);
            // Emit the business messages.
            _emitter.emitBatch(attempt, input.getValue(1), _collector);
            _activeBatches.put(attempt.getTransactionId(), attempt);
        }
    }
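The `_activeBatches` map is what lets the executor reject a commit that belongs to an outdated attempt. The following sketch isolates that bookkeeping. `ActiveBatchesSketch` is hypothetical: it stores the attempt id as an `Integer` instead of a TransactionAttempt, and it throws `IllegalStateException` where the real code throws FailedException.

```java
import java.util.TreeMap;

// Hypothetical model of the _activeBatches bookkeeping: txid -> attempt id last emitted.
class ActiveBatchesSketch {
    final TreeMap<Long, Integer> activeBatches = new TreeMap<>();

    // $batch: remember which attempt was emitted for this transaction (a retry overwrites it).
    void onBatch(long txid, int attemptId) {
        activeBatches.put(txid, attemptId);
    }

    // $commit: only the attempt that was emitted last may be committed.
    void onCommit(long txid, int attemptId) {
        Integer active = activeBatches.get(txid);
        if (active != null && active == attemptId) {
            activeBatches.remove(txid);
        } else {
            throw new IllegalStateException("Received commit for different transaction attempt");
        }
    }

    // $success: entries of strictly earlier transactions will never be needed again.
    void onSuccess(long txid) {
        activeBatches.headMap(txid).clear();
    }
}
```

Note that headMap excludes txid itself, so in this model the entry of the succeeding transaction is only dropped by a commit or by a later $success.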
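(IV) Setting up the spout in TridentTopologyBuilder
After the analysis above the spout is ready, but how is it loaded into the topology so that the real data flow starts?
(1) The buildTopology method of TridentTopologyBuilder sets up the topology-related information.
(2) TridentTopology calls the newStream method to add the spout node to the topology.
1. TridentTopologyBuilder
The first half of buildTopology in TridentTopologyBuilder sets up the spout-related information, and the second half sets up the bolts. Here we only look at the spout-related part: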
    TopologyBuilder builder = new TopologyBuilder();
    Map<GlobalStreamId, String> batchIdsForSpouts = fleshOutStreamBatchIds(false);
    Map<GlobalStreamId, String> batchIdsForBolts = fleshOutStreamBatchIds(true);

    Map<String, List<String>> batchesToCommitIds = new HashMap<String, List<String>>();
    Map<String, List<ITridentSpout>> batchesToSpouts = new HashMap<String, List<ITridentSpout>>();

    for(String id: _spouts.keySet()) {
        TransactionalSpoutComponent c = _spouts.get(id);
        if(c.spout instanceof IRichSpout) {
            //TODO: wrap this to set the stream name
            builder.setSpout(id, (IRichSpout) c.spout, c.parallelism);
        } else {
            String batchGroup = c.batchGroupId;
            if(!batchesToCommitIds.containsKey(batchGroup)) {
                batchesToCommitIds.put(batchGroup, new ArrayList<String>());
            }
            batchesToCommitIds.get(batchGroup).add(c.commitStateId);

            if(!batchesToSpouts.containsKey(batchGroup)) {
                batchesToSpouts.put(batchGroup, new ArrayList<ITridentSpout>());
            }
            batchesToSpouts.get(batchGroup).add((ITridentSpout) c.spout);

            BoltDeclarer scd =
                builder.setBolt(spoutCoordinator(id), new TridentSpoutCoordinator(c.commitStateId, (ITridentSpout) c.spout))
                    .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.BATCH_STREAM_ID)
                    .globalGrouping(masterCoordinator(c.batchGroupId), MasterBatchCoordinator.SUCCESS_STREAM_ID);

            for(Map m: c.componentConfs) {
                scd.addConfigurations(m);
            }

            Map<String, TridentBoltExecutor.CoordSpec> specs = new HashMap();
            specs.put(c.batchGroupId, new CoordSpec());
            BoltDeclarer bd = builder.setBolt(id,
                    new TridentBoltExecutor(
                        new TridentSpoutExecutor(
                            c.commitStateId,
                            c.streamName,
                            ((ITridentSpout) c.spout)),
                        batchIdsForSpouts,
                        specs),
                    c.parallelism);
            bd.allGrouping(spoutCoordinator(id), MasterBatchCoordinator.BATCH_STREAM_ID);
            bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.SUCCESS_STREAM_ID);
            if(c.spout instanceof ICommitterTridentSpout) {
                bd.allGrouping(masterCoordinator(batchGroup), MasterBatchCoordinator.COMMIT_STREAM_ID);
            }
            for(Map m: c.componentConfs) {
                bd.addConfigurations(m);
            }
        }
    }

    for(String id: _batchPerTupleSpouts.keySet()) {
        SpoutComponent c = _batchPerTupleSpouts.get(id);
        SpoutDeclarer d = builder.setSpout(id, new RichSpoutBatchTriggerer((IRichSpout) c.spout, c.streamName, c.batchGroupId), c.parallelism);

        for(Map conf: c.componentConfs) {
            d.addConfigurations(conf);
        }
    }

    for(String batch: batchesToCommitIds.keySet()) {
        List<String> commitIds = batchesToCommitIds.get(batch);
        builder.setSpout(masterCoordinator(batch), new MasterBatchCoordinator(commitIds, batchesToSpouts.get(batch)));
    }
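One detail of the code above is worth spelling out: spouts are collected per batch group, and the last loop creates exactly one MasterBatchCoordinator per batch group, not one per spout. The grouping step can be sketched as follows; `BatchGroupSketch` and the ids used in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the grouping done with batchesToCommitIds / batchesToSpouts.
class BatchGroupSketch {
    // Input: spout id -> batch group id.
    // Output: batch group id -> spout ids that share one master coordinator.
    static Map<String, List<String>> groupSpouts(Map<String, String> spoutToGroup) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : spoutToGroup.entrySet()) {
            groups.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        return groups;
    }
}
```

For example, if spoutA and spoutB belong to batch group bg0 and spoutC belongs to bg1, the result has two entries, so two master coordinators would be created while each spout still gets its own coordinator bolt and executor bolt.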
2. TridentTopology
This creates a spout node and adds it to the topology.
    public Stream newStream(String txId, ITridentSpout spout) {
        Node n = new SpoutNode(getUniqueStreamId(), spout.getOutputFields(), txId, spout, SpoutNode.SpoutType.BATCH);
        return addNode(n);
    }