Storm: Debugging a WordCount Program Locally in Eclipse
Source: Internet  Published: brtn北京网络电视台  Editor: 程序博客网  Date: 2024/05/19 16:21
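With job-style frameworks such as Hadoop or Storm, we usually want to debug a program locally once before submitting it to a cluster.
Like Hadoop, Storm has a local mode and a cluster mode. Storm's local mode is simpler than Hadoop's: nothing from Storm has to be installed on the local (Windows) machine, and pulling in the Storm jar through Maven is enough. This article shows how to debug a simple Storm program on Windows.
1. A simple wordcount program:
1) Create a Maven project; pom.xml: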
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.10.0</version>
        <scope>provided</scope>
    </dependency>
2) RandomSentenceSpout: (acts as the data producer)

    package cn.edu.nuc.StormTest.wordcount;

    import java.util.Map;
    import java.util.Random;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;

    public class RandomSentenceSpout extends BaseRichSpout {

        private static final long serialVersionUID = 1L;

        SpoutOutputCollector _collector;
        Random _rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            _collector = collector;
            _rand = new Random();
        }

        @Override
        public void nextTuple() {
            // Sleep for a while before producing the next tuple
            Utils.sleep(100);
            // Candidate sentences
            String[] sentences = new String[] {
                "the cow jumped over the moon",
                "an apple a day keeps the doctor away",
                "four score and seven years ago",
                "snow white and the seven dwarfs",
                "i am at two with nature" };
            // Pick one sentence at random
            String sentence = sentences[_rand.nextInt(sentences.length)];
            // Emit the sentence to the bolt
            _collector.emit(new Values(sentence));
        }

        // Called when a tuple has been acknowledged
        @Override
        public void ack(Object id) {
        }

        // Called when processing of a tuple has failed
        @Override
        public void fail(Object id) {
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Declare one output field named "word" (it carries a whole sentence)
            declarer.declare(new Fields("word"));
        }
    }
3) SplitSentenceBolt: (this bolt plays the role of the map function in MapReduce)

    package cn.edu.nuc.StormTest.wordcount;

    import java.util.StringTokenizer;

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class SplitSentenceBolt extends BaseBasicBolt {

        private static final long serialVersionUID = 1L;

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // Receive one sentence
            String sentence = tuple.getString(0);
            // Split the sentence into words
            StringTokenizer iter = new StringTokenizer(sentence);
            // Emit each word
            while (iter.hasMoreTokens()) {
                collector.emit(new Values(iter.nextToken()));
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Declare one output field
            declarer.declare(new Fields("word"));
        }
    }
4) WordCountBolt: (this bolt plays the role of the reduce function in MapReduce)

    package cn.edu.nuc.StormTest.wordcount;

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.topology.BasicOutputCollector;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseBasicBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class WordCountBolt extends BaseBasicBolt {

        private static final long serialVersionUID = 1L;

        // Running count per word, kept in memory by each task
        Map<String, Integer> counts = new HashMap<String, Integer>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            // Receive one word
            String word = tuple.getString(0);
            // Look up the current count for this word
            Integer count = counts.get(word);
            if (count == null)
                count = 0;
            // Increment the count
            count++;
            // Store the word and its updated count in the map
            counts.put(word, count);
            System.out.println("hello word!");
            System.out.println(word + " " + count);
            // Emit the word and its count (fields "word" and "count")
            collector.emit(new Values(word, count));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Declare two output fields: word and count
            declarer.declare(new Fields("word", "count"));
        }
    }
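Since the two bolts are compared to map and reduce, their core logic can also be checked without starting Storm at all. The following is only a minimal sketch under that assumption: `WordCountLogicSketch`, `split` and `count` are hypothetical names that do not exist in the original project, and the code uses nothing but the JDK.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the original project: the same
// split-then-count logic as the two bolts, with no Storm dependency.
final class WordCountLogicSketch {

    // Mirrors SplitSentenceBolt.execute: one sentence in, many words out.
    static List<String> split(String sentence) {
        List<String> words = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(sentence);
        while (tokens.hasMoreTokens()) {
            words.add(tokens.nextToken());
        }
        return words;
    }

    // Mirrors WordCountBolt.execute: bump the running count for one word
    // and return the new value.
    static int count(Map<String, Integer> counts, String word) {
        Integer current = counts.get(word);
        int updated = (current == null) ? 1 : current + 1;
        counts.put(word, updated);
        return updated;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Print each word with its running count, as the bolt does.
        for (String word : split("the cow jumped over the moon")) {
            System.out.println(word + " " + count(counts, word));
        }
    }
}
```

Keeping this logic in plain methods makes it easy to test the splitting and counting rules on their own before wiring them into bolts.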
5) TopoMain: (main class; builds the topology and submits it)

    package cn.edu.nuc.StormTest.wordcount;

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.StormSubmitter;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;
    import backtype.storm.utils.Utils;

    public class TopoMain {

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("spout", new RandomSentenceSpout());
            builder.setBolt("split", new SplitSentenceBolt()).shuffleGrouping("spout");
            // Use the WordCountBolt defined above; group by "word" so that
            // the same word always reaches the same task
            builder.setBolt("count", new WordCountBolt(), 12).fieldsGrouping("split", new Fields("word"));

            Config conf = new Config();
            conf.setDebug(false);

            if (args != null && args.length > 0) {
                // Cluster mode: the first argument is the topology name
                conf.setNumWorkers(3);
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                // Local mode: run inside an in-process cluster
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("wordcount", conf, builder.createTopology());
                // Let the topology run for 3 seconds, then stop it
                Utils.sleep(3000);
                cluster.killTopology("wordcount");
                cluster.shutdown();
            }
        }
    }
Once the code is written, click Run in Eclipse; the console shows output like this:
    7013 [Thread-34-count] INFO b.s.d.executor - Prepared bolt count:(10)
    7021 [Thread-14-count] INFO b.s.d.executor - Preparing bolt count:(2)
    7021 [Thread-36-split] INFO b.s.d.executor - Preparing bolt split:(14)
    7022 [Thread-36-split] INFO b.s.d.executor - Prepared bolt split:(14)
    7022 [Thread-14-count] INFO b.s.d.executor - Prepared bolt count:(2)
    hello word!
    the 1
    hello word!
    cow 1
    hello word!
    jumped 1
    hello word!
    the 2
    hello word!
    over 1
    hello word!
    moon 1
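The running counts in this output (`the 1`, later `the 2`) depend on `fieldsGrouping("split", new Fields("word"))`: each `count` task keeps its own `HashMap`, so every occurrence of a word has to reach the same task. The sketch below illustrates that idea with a simple hash-modulo partitioner. This is an assumption made for illustration; `FieldsGroupingSketch` and `taskFor` are hypothetical names, and Storm's actual routing code may differ in detail.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not Storm's own implementation.
final class FieldsGroupingSketch {

    // Assumption: choose a task index from the hash of the grouping
    // field values, so equal values always map to the same index.
    static int taskFor(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 12; // same parallelism as the "count" bolt in TopoMain
        for (String word : "the cow jumped over the moon".split(" ")) {
            int task = taskFor(Arrays.<Object>asList(word), numTasks);
            System.out.println(word + " -> task " + task);
        }
    }
}
```

Both occurrences of `the` print the same task index, which is why a per-task map is enough to produce correct totals. With `shuffleGrouping` instead, the same word could land on different tasks and the counts would be split between them.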
2. The example above extends Storm's base classes; the version below implements the interfaces directly:
1) WordReader: (spout)

    package cn.edu.nuc.StormTest.wordcount1;

    import java.io.BufferedReader;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.util.Map;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichSpout;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class WordReader implements IRichSpout {

        private static final long serialVersionUID = 1L;
        private SpoutOutputCollector collector;
        private FileReader fileReader;
        private boolean completed = false;

        public boolean isDistributed() {
            return false;
        }

        /**
         * The first method to be called. It receives three arguments: the
         * configuration supplied when the topology was created, the topology
         * context, and the collector used to emit the spout's data to bolts.
         */
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            try {
                // Path of the file to read, as set when the topology was created
                this.fileReader = new FileReader(conf.get("wordsFile").toString());
            } catch (FileNotFoundException e) {
                throw new RuntimeException("Error reading file [" + conf.get("wordsFile") + "]");
            }
            // Keep the collector for later use
            this.collector = collector;
        }

        /**
         * The spout's main method: it reads the text file and emits each line
         * to the bolt. It is called repeatedly, so once the file has been
         * read it sleeps for a while to reduce CPU usage.
         */
        public void nextTuple() {
            if (completed) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // Do nothing
                }
                return;
            }
            String str;
            // Open the reader
            BufferedReader reader = new BufferedReader(fileReader);
            try {
                // Read all lines
                while ((str = reader.readLine()) != null) {
                    // Emit each line; Values is an ArrayList subclass, and the
                    // line itself is passed as the message id
                    this.collector.emit(new Values(str), str);
                }
            } catch (Exception e) {
                throw new RuntimeException("Error reading tuple", e);
            } finally {
                completed = true;
            }
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("line"));
        }

        public void close() {
            // TODO Auto-generated method stub
        }

        public void activate() {
            // TODO Auto-generated method stub
        }

        public void deactivate() {
            // TODO Auto-generated method stub
        }

        public void ack(Object msgId) {
            System.out.println("OK:" + msgId);
        }

        public void fail(Object msgId) {
            System.out.println("FAIL:" + msgId);
        }

        public Map<String, Object> getComponentConfiguration() {
            // TODO Auto-generated method stub
            return null;
        }
    }

2) WordNormalizer: (bolt, equivalent to map)

    package cn.edu.nuc.StormTest.wordcount1;

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class WordNormalizer implements IRichBolt {

        private static final long serialVersionUID = 1L;
        private OutputCollector collector;

        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
        }

        /**
         * The most important method of a bolt: it is called once for every
         * tuple received. Here it splits each line of the text file into
         * words and emits them for the next bolt to process.
         */
        public void execute(Tuple input) {
            String sentence = input.getString(0);
            String[] words = sentence.split(" ");
            for (String word : words) {
                word = word.trim();
                if (!word.isEmpty()) {
                    word = word.toLowerCase();
                    // Emit the word, anchored to the input tuple
                    List<Tuple> a = new ArrayList<Tuple>();
                    a.add(input);
                    collector.emit(a, new Values(word));
                }
            }
            // Acknowledge that the tuple was processed successfully
            collector.ack(input);
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }

        public void cleanup() {
            // TODO Auto-generated method stub
        }

        public Map<String, Object> getComponentConfiguration() {
            // TODO Auto-generated method stub
            return null;
        }
    }

3) WordCounter: (bolt, equivalent to reduce)

    package cn.edu.nuc.StormTest.wordcount1;

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.IRichBolt;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.tuple.Tuple;

    public class WordCounter implements IRichBolt {

        private static final long serialVersionUID = 1L;
        Integer id;
        String name;
        Map<String, Integer> counters;
        private OutputCollector collector;

        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.counters = new HashMap<String, Integer>();
            this.collector = collector;
            this.name = context.getThisComponentId();
            this.id = context.getThisTaskId();
        }

        public void execute(Tuple input) {
            String str = input.getString(0);
            if (!counters.containsKey(str)) {
                counters.put(str, 1);
            } else {
                Integer c = counters.get(str) + 1;
                counters.put(str, c);
            }
            // Acknowledge that the tuple was processed successfully
            collector.ack(input);
        }

        /**
         * Clean-up work when the topology finishes, such as closing
         * connections and releasing resources, belongs here. Since this is
         * only a demo, we use it to print our counters.
         */
        public void cleanup() {
            System.out.println("-- Word Counter [" + name + "-" + id + "] --");
            for (Map.Entry<String, Integer> entry : counters.entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
            counters.clear();
        }

        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // TODO Auto-generated method stub
        }

        public Map<String, Object> getComponentConfiguration() {
            // TODO Auto-generated method stub
            return null;
        }
    }

4) Main class:

    package cn.edu.nuc.StormTest.wordcount1;

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;
    import backtype.storm.utils.Utils;

    public class WordCountTopologyMain {

        public static void main(String[] args) throws InterruptedException {
            // Define the topology
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("word-reader", new WordReader(), 1);
            builder.setBolt("word-normalizer", new WordNormalizer()).shuffleGrouping("word-reader");
            builder.setBolt("word-counter", new WordCounter(), 2).fieldsGrouping("word-normalizer", new Fields("word"));

            // Configuration
            Config conf = new Config();
            conf.put("wordsFile", "d:/test.txt");
            conf.setDebug(false);
            // Allow at most one pending (un-acked) tuple per spout task
            conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1);

            // Create a local-mode cluster and submit the topology
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("Getting-Started-Toplogie", conf, builder.createTopology());
            Utils.sleep(3000);
            cluster.killTopology("Getting-Started-Toplogie");
            cluster.shutdown();
        }
    }
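To check what the `WordCounter` tasks print in `cleanup()`, it can help to compute the expected totals for the same input file outside Storm. The sketch below applies the same rules as `WordNormalizer` (split on spaces, trim, skip empty tokens, lowercase) and `WordCounter`. `ReferenceWordCount` and `countWords` are hypothetical names, not part of the original project, and reading the file as UTF-8 is an assumption (the spout uses `FileReader` with the platform default charset).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original project.
final class ReferenceWordCount {

    // Counts words with the same rules as WordNormalizer and WordCounter.
    // Assumption: the file is UTF-8 encoded.
    static Map<String, Integer> countWords(Path file) throws IOException {
        Map<String, Integer> counters = new TreeMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.split(" ")) {
                    word = word.trim();
                    if (!word.isEmpty()) {
                        counters.merge(word.toLowerCase(), 1, Integer::sum);
                    }
                }
            }
        }
        return counters;
    }

    public static void main(String[] args) throws IOException {
        Path file;
        if (args.length > 0) {
            // Path given on the command line, e.g. the "wordsFile" value
            file = Paths.get(args[0]);
        } else {
            // No path given: fall back to a small made-up sample file
            file = Files.createTempFile("words", ".txt");
            Files.write(file, Arrays.asList("Storm storm  test", "test storm"),
                    StandardCharsets.UTF_8);
        }
        for (Map.Entry<String, Integer> entry : countWords(file).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Because `word-counter` runs with two tasks and `fieldsGrouping` assigns each word to exactly one of them, the two maps printed by `cleanup()` taken together should match this single map.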