Storm: Debugging a WordCount Program Locally in Eclipse


Usually, for job-style programs such as Hadoop or Storm topologies, we want to debug them locally first and only then submit them to run on the cluster.

Like Hadoop, Storm has a local mode and a cluster mode. Compared with Hadoop, Storm's local mode is simpler: you do not need to install any Storm software or tools on your local (Windows) machine at all; the only thing required is to pull in the Storm jar through Maven. This article shows how to debug a simple Storm program on Windows.
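In local mode, Storm simply runs an in-process LocalCluster. The examples below all follow the same pattern, which looks roughly like this (a minimal sketch; the topology name "demo" is a placeholder and the builder contents are filled in later):

    // Minimal sketch of local-mode execution
    TopologyBuilder builder = new TopologyBuilder();
    // ... register spouts and bolts on the builder here ...
    Config conf = new Config();

    LocalCluster cluster = new LocalCluster();                      // in-process "cluster"
    cluster.submitTopology("demo", conf, builder.createTopology()); // start the topology
    Utils.sleep(3000);                                              // let it run for a while
    cluster.killTopology("demo");                                   // stop the topology
    cluster.shutdown();                                             // tear down the local cluster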

1. A simple WordCount program:

1) Create a Maven project and add the Storm dependency to pom.xml:

    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.10.0</version>
        <scope>provided</scope>
    </dependency>

2) RandomSentenceSpout (the data producer):

package cn.edu.nuc.StormTest.wordcount;

import java.util.Map;
import java.util.Random;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.Utils;

public class RandomSentenceSpout extends BaseRichSpout {
    private static final long serialVersionUID = 1L;

    SpoutOutputCollector _collector;
    Random _rand;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        _collector = collector;
        _rand = new Random();
    }

    @Override
    public void nextTuple() {
        // Sleep for a while before emitting the next tuple
        Utils.sleep(100);
        // Candidate sentences
        String[] sentences = new String[] { "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature" };
        // Pick a sentence at random
        String sentence = sentences[_rand.nextInt(sentences.length)];
        // Emit the sentence to the downstream bolt
        _collector.emit(new Values(sentence));
    }

    // Called when a tuple is acknowledged
    @Override
    public void ack(Object id) {
    }

    // Called when a tuple fails
    @Override
    public void fail(Object id) {
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare a single output field named "word"
        declarer.declare(new Fields("word"));
    }
}
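Note that this spout emits tuples without a message id, so Storm does not track them and the ack/fail callbacks above are never invoked. If you want reliable processing (as the WordReader spout in the second example does), pass a message id as the second argument to emit. A minimal sketch, using the sentence itself as the id:

    // Reliable emit: the second argument is a message id that Storm uses to call
    // back ack(id) or fail(id) once the tuple tree completes or times out.
    _collector.emit(new Values(sentence), sentence);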

3) SplitSentenceBolt (this bolt plays the role of the map function in MapReduce):

package cn.edu.nuc.StormTest.wordcount;

import java.util.StringTokenizer;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SplitSentenceBolt extends BaseBasicBolt {
    private static final long serialVersionUID = 1L;

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Receive a sentence
        String sentence = tuple.getString(0);
        // Split the sentence into words
        StringTokenizer iter = new StringTokenizer(sentence);
        // Emit each word
        while (iter.hasMoreElements()) {
            collector.emit(new Values(iter.nextToken()));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare a single output field named "word"
        declarer.declare(new Fields("word"));
    }
}

4) WordCountBolt (this bolt plays the role of the reduce function in MapReduce; it keeps an in-memory count per word):

package cn.edu.nuc.StormTest.wordcount;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordCountBolt extends BaseBasicBolt {
    private static final long serialVersionUID = 1L;

    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        // Receive a word
        String word = tuple.getString(0);
        // Look up the current count for this word
        Integer count = counts.get(word);
        if (count == null)
            count = 0;
        // Increment the count
        count++;
        // Store the word and its new count back into the map
        counts.put(word, count);
        System.out.println("hello word!");
        System.out.println(word + "  " + count);
        // Emit the word and its count (fields "word" and "count")
        collector.emit(new Values(word, count));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare two output fields, "word" and "count"
        declarer.declare(new Fields("word", "count"));
    }
}


5) TopoMain (the entry class that submits the topology; it supports both cluster mode and local mode, and for debugging on your own machine you use local mode):

package cn.edu.nuc.StormTest.wordcount;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;

public class TopoMain {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomSentenceSpout());
        builder.setBolt("split", new SplitSentenceBolt()).shuffleGrouping("spout");
        builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(false);

        if (args != null && args.length > 0) {
            // Cluster mode: submit the topology under the name given on the command line
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Local mode: run in-process for a few seconds, then shut down
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("wordcount", conf, builder.createTopology());
            Utils.sleep(3000);
            cluster.killTopology("wordcount");
            cluster.shutdown();
        }
    }
}
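The two groupings used here matter: shuffleGrouping distributes sentences randomly across the split tasks, while fieldsGrouping on "word" guarantees that every occurrence of the same word is routed to the same count task, which is what makes the per-task HashMap in WordCountBolt correct. The same two lines from above, annotated:

    // shuffleGrouping: sentences are spread randomly across the "split" tasks
    builder.setBolt("split", new SplitSentenceBolt()).shuffleGrouping("spout");
    // fieldsGrouping on "word": the same word always goes to the same "count" task,
    // so each task's in-memory map holds a consistent count for its words
    builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));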

Once the code is finished, click Run in Eclipse and you should see output like the following:

7013 [Thread-34-count] INFO  b.s.d.executor - Prepared bolt count:(10)
7021 [Thread-14-count] INFO  b.s.d.executor - Preparing bolt count:(2)
7021 [Thread-36-split] INFO  b.s.d.executor - Preparing bolt split:(14)
7022 [Thread-36-split] INFO  b.s.d.executor - Prepared bolt split:(14)
7022 [Thread-14-count] INFO  b.s.d.executor - Prepared bolt count:(2)
hello word!
the  1
hello word!
cow  1
hello word!
jumped  1
hello word!
the  2
hello word!
over  1
hello word!
moon  1

2. The example above uses the inheritance style (extending BaseRichSpout/BaseBasicBolt); the version below implements the raw interfaces (IRichSpout/IRichBolt) instead:
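The practical difference shows up mainly in acking: BaseBasicBolt automatically anchors whatever you emit to the input tuple and acks the input for you, whereas with the raw IRichBolt interface you receive an OutputCollector in prepare() and must anchor and ack each tuple yourself. A minimal sketch of the two execute styles (identifiers are illustrative):

    // Base-class style (BaseBasicBolt): emitted tuples are automatically anchored
    // to `tuple`, and `tuple` is automatically acked once execute() returns.
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        collector.emit(new Values(tuple.getString(0)));
    }

    // Interface style (IRichBolt): `collector` is the OutputCollector saved in prepare();
    // you must anchor emitted tuples and ack the input yourself.
    public void execute(Tuple input) {
        collector.emit(input, new Values(input.getString(0))); // anchor to the input tuple
        collector.ack(input);                                  // acknowledge successful processing
    }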

1) WordReader (spout):

package cn.edu.nuc.StormTest.wordcount1;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Map;

import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichSpout;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class WordReader implements IRichSpout {
    private static final long serialVersionUID = 1L;

    private SpoutOutputCollector collector;
    private FileReader fileReader;
    private boolean completed = false;

    public boolean isDistributed() {
        return false;
    }

    /**
     * This is the first method called. It receives three parameters: the topology
     * configuration, the topology context, and the collector used to emit tuples to the bolts.
     */
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        try {
            // Open the file whose path was set in the topology configuration
            this.fileReader = new FileReader(conf.get("wordsFile").toString());
        } catch (FileNotFoundException e) {
            throw new RuntimeException("Error reading file ["
                    + conf.get("wordsFile") + "]");
        }
        // Keep a reference to the collector
        this.collector = collector;
    }

    /**
     * This is the spout's main method: read the text file and emit each line to the bolts.
     * It is called repeatedly, so once the file has been consumed we sleep a little
     * to avoid wasting CPU.
     */
    public void nextTuple() {
        if (completed) {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                // Do nothing
            }
            return;
        }
        String str;
        // Open the reader
        BufferedReader reader = new BufferedReader(fileReader);
        try {
            // Read all lines
            while ((str = reader.readLine()) != null) {
                // Emit each line; Values is an ArrayList implementation.
                // The second argument is the message id, which enables ack/fail tracking.
                this.collector.emit(new Values(str), str);
            }
        } catch (Exception e) {
            throw new RuntimeException("Error reading tuple", e);
        } finally {
            completed = true;
        }
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("line"));
    }

    public void close() {
    }

    public void activate() {
    }

    public void deactivate() {
    }

    public void ack(Object msgId) {
        System.out.println("OK:" + msgId);
    }

    public void fail(Object msgId) {
        System.out.println("FAIL:" + msgId);
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
2) WordNormalizer (bolt, analogous to map):

package cn.edu.nuc.StormTest.wordcount1;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WordNormalizer implements IRichBolt {
    private static final long serialVersionUID = 1L;

    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    /**
     * This is the bolt's most important method; it is called for every incoming tuple.
     * It splits each line of the text file into words and emits them to the next bolt.
     */
    public void execute(Tuple input) {
        String sentence = input.getString(0);
        String[] words = sentence.split(" ");
        for (String word : words) {
            word = word.trim();
            if (!word.isEmpty()) {
                word = word.toLowerCase();
                // Emit the word, anchored to the input tuple
                List<Tuple> anchors = new ArrayList<Tuple>();
                anchors.add(input);
                collector.emit(anchors, new Values(word));
            }
        }
        // Acknowledge that the tuple has been processed successfully
        collector.ack(input);
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }

    public void cleanup() {
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
3) WordCounter (bolt, analogous to reduce):

package cn.edu.nuc.StormTest.wordcount1;

import java.util.HashMap;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Tuple;

public class WordCounter implements IRichBolt {
    private static final long serialVersionUID = 1L;

    Integer id;
    String name;
    Map<String, Integer> counters;
    private OutputCollector collector;

    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.counters = new HashMap<String, Integer>();
        this.collector = collector;
        this.name = context.getThisComponentId();
        this.id = context.getThisTaskId();
    }

    public void execute(Tuple input) {
        String str = input.getString(0);
        if (!counters.containsKey(str)) {
            counters.put(str, 1);
        } else {
            Integer c = counters.get(str) + 1;
            counters.put(str, c);
        }
        // Acknowledge that the tuple has been processed successfully
        collector.ack(input);
    }

    /**
     * cleanup() is where a topology does its shutdown work, e.g. closing connections
     * and releasing resources. Since this is only a demo, we use it to print the counters.
     */
    public void cleanup() {
        System.out.println("-- Word Counter [" + name + "-" + id + "] --");
        for (Map.Entry<String, Integer> entry : counters.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
        counters.clear();
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // This bolt emits nothing downstream, so no output fields are declared
    }

    public Map<String, Object> getComponentConfiguration() {
        return null;
    }
}
4) The main class:

package cn.edu.nuc.StormTest.wordcount1;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;
import backtype.storm.utils.Utils;

public class WordCountTopologyMain {
    public static void main(String[] args) throws InterruptedException {
        // Define the topology
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("word-reader", new WordReader(), 1);
        builder.setBolt("word-normalizer", new WordNormalizer()).shuffleGrouping("word-reader");
        builder.setBolt("word-counter", new WordCounter(), 2).fieldsGrouping("word-normalizer", new Fields("word"));

        // Configuration
        Config conf = new Config();
        conf.put("wordsFile", "d:/test.txt");
        conf.setDebug(false);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

        // Create a local-mode cluster and submit the topology
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("Getting-Started-Toplogie", conf, builder.createTopology());
        Utils.sleep(3000);
        cluster.killTopology("Getting-Started-Toplogie");
        cluster.shutdown();
    }
}
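Note that this topology reads its input from d:/test.txt (the "wordsFile" config entry), so the file must exist before you run the main class. Any plain text file works; for example, you could create one with a throwaway helper like the following (this class and its sample sentences are just a hypothetical illustration, not part of the original article):

    package cn.edu.nuc.StormTest.wordcount1;

    import java.io.FileWriter;

    // One-off helper that writes the sample input file the topology expects
    // at the "wordsFile" path configured above.
    public class CreateTestFile {
        public static void main(String[] args) throws Exception {
            FileWriter writer = new FileWriter("d:/test.txt");
            writer.write("the cow jumped over the moon\n");
            writer.write("an apple a day keeps the doctor away\n");
            writer.write("four score and seven years ago\n");
            writer.close();
        }
    }

When the local cluster is shut down after the three-second sleep, each WordCounter task's cleanup() runs and prints its word counts to the console.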

