Trident WordCount Code Example

Source: Internet. Editor: 程序博客网. Date: 2024/05/22 00:12

Complete code

package com.test;

import backtype.storm.Config;
import backtype.storm.LocalDRPC;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.DRPCExecutionException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.DRPCClient;
import org.apache.thrift7.TException;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.FilterNull;
import storm.trident.operation.builtin.MapGet;
import storm.trident.operation.builtin.Sum;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.testing.Split;

public class WordCount {

    private static StormTopology buildTopology(LocalDRPC drpc) {
        /* Create the spout: batches of up to 3 sentences, cycled indefinitely */
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
                new Values("the cow jumped over the moon"),
                new Values("the man went to the store and bought some candy"),
                new Values("four score and seven years ago"),
                new Values("how many apples can you eat"));
        spout.setCycle(true);

        /* Create the topology */
        TridentTopology topology = new TridentTopology();

        /* Create the stream "spout1": split sentences into words and count each word */
        TridentState wordCounts =
                topology.newStream("spout1", spout)
                        .each(new Fields("sentence"), new Split(), new Fields("word"))
                        .groupBy(new Fields("word"))
                        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                        .parallelismHint(6);

        /* Create the DRPC stream for the function named "words": split the argument
           into words, look up the count of each word, then sum the counts */
        topology.newDRPCStream("words", drpc)
                .each(new Fields("args"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
                .each(new Fields("count"), new FilterNull())
                .aggregate(new Fields("count"), new Sum(), new Fields("sum"));

        return topology.build();
    }

    public static void main(String[] args) {
        Config conf = new Config();
        conf.setMaxSpoutPending(20);
        try {
            /* Submit to the cluster; no LocalDRPC is needed there, so null is passed */
            StormSubmitter.submitTopology("WordCount", conf, buildTopology(null));

            /* Host and port of the DRPC server */
            DRPCClient client = new DRPCClient("wonderwoman", 1234);
            for (int i = 0; i < 100; i++) {
                try {
                    System.out.println("DRPC Result: " + client.execute("words", "cat the dog jumped"));
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    System.out.println(e.getMessage());
                }
            }
        } catch (AlreadyAliveException e) {
            e.printStackTrace();
        } catch (InvalidTopologyException e) {
            e.printStackTrace();
        } catch (TException e) {
            e.printStackTrace();
        } catch (DRPCExecutionException e) {
            e.printStackTrace();
        }
    }
}

POM file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>wordCount</groupId>
    <artifactId>wordCount</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>storm</groupId>
            <artifactId>storm</artifactId>
            <version>0.8.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>
</project>

Compile and package

mvn clean install
mvn package

Run and inspect

./bin/storm jar wordCount-1.0-SNAPSHOT.jar com.test.WordCount
./bin/storm list

Code walkthrough

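  1. Create the spout, which cycles through a fixed set of sentences.
  2. Create the topology.
  3. Create the stream spout1 with the spout as its input. It splits each sentence into words and counts them, storing the results in an in-memory map.
    The returned TridentState records the Trident state.
  4. Create the stream words, fed by the DRPC function words, and split the argument into words. Using the Trident state, look up the count of each input word, then sum the counts.
  5. The main method submits the topology, then calls the DRPC function words and prints the result.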
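
To make steps 3 and 4 concrete, here is a minimal plain-Java sketch of the same computation. It does not use Storm at all: the class WordCountSketch and its methods countWords and query are hypothetical names introduced only for illustration, and a HashMap stands in for the in-memory Trident state. It assumes words are separated by single spaces, as in the sample sentences.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {

    /** Step 3: split every sentence into words and count each word in an in-memory map. */
    static Map<String, Long> countWords(String... sentences) {
        Map<String, Long> counts = new HashMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.split(" ")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    /** Step 4: split the argument, look up each word's count, skip unknown words, sum the rest. */
    static long query(Map<String, Long> counts, String args) {
        long sum = 0;
        for (String word : args.split(" ")) {
            Long count = counts.get(word); // null when the word was never seen
            if (count != null) {
                sum += count;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // One pass over the four sentences used by the spout.
        Map<String, Long> counts = countWords(
                "the cow jumped over the moon",
                "the man went to the store and bought some candy",
                "four score and seven years ago",
                "how many apples can you eat");
        System.out.println(query(counts, "cat the dog jumped"));
    }
}
```

After a single pass over the four sentences this prints 5: "the" occurs 4 times, "jumped" once, and "cat" and "dog" are unknown and therefore skipped. In the real topology the spout cycles, so the counts and the DRPC result keep growing over time.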

Postscript

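As a beginner, I spent a long time reading online before I finally got this working. What is out there is too messy and scattered, which is a sad paradox.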
