A First Storm Example


A simple example of Storm: this article does not go into the finer details; it is just a small Storm program meant for getting started quickly.

Example Overview

A data source continuously picks one of the strings aa, bb, cc, dd, ee, ff at random and sends it to a program for processing. That program prints the string to the console and writes it to the log, then passes it on to the next program, which saves the string to a local file.

Design

This is a streaming process. Recall Storm's topology diagram:

(Figure: spout-flow topology diagram)

From it we can see the shape of the solution: a Spout produces the data, and there are two Bolts, one that prints the string to the console and writes it to the log, and another that writes the string to a local file.
So we only need to follow this design.

Topology: Introduction and Code

The Topology controls the direction of the data flow: after leaving the Spout, which Bolt the data enters, and which Bolt that Bolt feeds next.
It is somewhat like the main method in Hadoop. Below are the settings we need.
First, set the configuration parameters:

    // Set the configuration parameters
    Config cfg = new Config(); // import backtype.storm.Config;
    cfg.setNumWorkers(2);      // use 2 workers; no need to study this setting closely yet
    cfg.setDebug(true);

Next, set the route of the data flow and the grouping.
From the code below: the data source is the new PWSpout() object, which emits under the name spout;
the new PrintBolt() object then consumes spout, processes each tuple, and emits under the name print-bolt;
the new WriteBolt() object consumes print-bolt, processes it, and emits as write-bolt; no Bolt follows it, so the data is not processed any further.
I use shuffle grouping (shuffleGrouping) here: it distributes the tuples in the stream randomly, so that each bolt task receives an equal number of tuples.

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new PWSpout());
    builder.setBolt("print-bolt", new PrintBolt()).shuffleGrouping("spout");
    builder.setBolt("write-bolt", new WriteBolt()).shuffleGrouping("print-bolt");
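
Shuffle grouping is only one of Storm's stream groupings. As an aside of mine, not part of the original example: if you wanted every tuple carrying the same value of the print field to land on the same PrintBolt task, you could use a fields grouping instead. A minimal sketch:

    // Hypothetical alternative: route tuples by the value of the "print" field,
    // so identical words always go to the same bolt task.
    builder.setBolt("print-bolt", new PrintBolt()).fieldsGrouping("spout", new Fields("print"));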

Set how Storm runs.
Since this example runs locally, configure it as follows:
create a LocalCluster, name the topology top1, and shut the program down after 10 seconds.

    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("top1", cfg, builder.createTopology());
    Thread.sleep(10000);
    cluster.killTopology("top1");
    cluster.shutdown();

If you are deploying on a cluster instead, submit the topology like this:

StormSubmitter.submitTopology("top1", cfg, builder.createTopology());

The complete Topology code is below.

public class PWTopology1 {
    public static void main(String[] args) throws Exception {
        // Set the configuration parameters
        Config cfg = new Config();
        cfg.setNumWorkers(2);
        cfg.setDebug(true);

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new PWSpout());
        builder.setBolt("print-bolt", new PrintBolt()).shuffleGrouping("spout");
        builder.setBolt("write-bolt", new WriteBolt()).shuffleGrouping("print-bolt");

        // 1. Local mode
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("top1", cfg, builder.createTopology());
        Thread.sleep(10000);
        cluster.killTopology("top1");
        cluster.shutdown();

        // 2. Cluster mode
        // StormSubmitter.submitTopology("top1", cfg, builder.createTopology());
    }
}
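
A common refinement, sketched here as a suggestion of mine rather than something in the original, is to let a command-line argument pick between the two modes, in the style of the storm-starter examples:

    public static void main(String[] args) throws Exception {
        Config cfg = new Config();
        cfg.setNumWorkers(2);

        // Build the topology exactly as above.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new PWSpout());
        builder.setBolt("print-bolt", new PrintBolt()).shuffleGrouping("spout");
        builder.setBolt("write-bolt", new WriteBolt()).shuffleGrouping("print-bolt");

        if (args.length > 0) {
            // Any argument means "run on the cluster"; it doubles as the topology name.
            StormSubmitter.submitTopology(args[0], cfg, builder.createTopology());
        } else {
            // No argument: run locally for 10 seconds, then shut down.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("top1", cfg, builder.createTopology());
            Thread.sleep(10000);
            cluster.killTopology("top1");
            cluster.shutdown();
        }
    }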

Spout: Introduction and Code

The code above references a Spout, which produces the data.
A Spout extends BaseRichSpout and overrides some of its methods:
- open: declared in the ISpout interface. As the source comment below shows, it hands the spout its execution context:
the first argument is the conf passed in from the topology above, the second is the Topology's Context, and the third is the collector used to emit tuples.

    /**
     * Called when a task for this component is initialized within a worker on the cluster.
     * It provides the spout with the environment in which the spout executes.
     *
     * @param conf The Storm configuration for this spout. This is the configuration provided to
     *             the topology merged in with cluster configuration on this machine.
     * @param context This object can be used to get information about this task's place within
     *                the topology, including the task id and component id of this task, input
     *                and output information, etc.
     * @param collector The collector is used to emit tuples from this spout. Tuples can be emitted
     *                  at any time, including the open and close methods. The collector is
     *                  thread-safe and should be saved as an instance variable of this spout object.
     */
    void open(Map conf, TopologyContext context, SpoutOutputCollector collector);

We need a collector to send data to the next node, so we assign it to the spout's collector field:

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // Initialize the spout
        this.collector = collector;
        //System.out.println(this.collector);
    }
- nextTuple: produces the data, called over and over.

    /**
     * When this method is called, Storm is requesting that the Spout emit tuples to the
     * output collector. This method should be non-blocking, so if the Spout has no tuples
     * to emit, this method should return. nextTuple, ack, and fail are all called in a tight
     * loop in a single thread in the spout task. When there are no tuples to emit, it is
     * courteous to have nextTuple sleep for a short amount of time (like a single millisecond)
     * so as not to waste too much CPU.
     */
    void nextTuple();

We send a random word every 0.5 seconds to the bolts below for processing:

    @Override
    public void nextTuple() {
        // Emit a random word
        final Random r = new Random();
        int num = r.nextInt(6);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        this.collector.emit(new Values(lists.get(num)));
    }
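
A small aside of mine, not from the original: allocating a new Random on every call is wasteful, and backtype.storm.utils.Utils.sleep handles the InterruptedException for you, so the method could be tightened to something like:

    private final Random r = new Random(); // created once per spout instance

    @Override
    public void nextTuple() {
        Utils.sleep(500); // import backtype.storm.utils.Utils
        this.collector.emit(new Values(lists.get(r.nextInt(lists.size()))));
    }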
- declareOutputFields: declares the field name of the emitted data; the Bolts downstream can fetch the value by that field name.

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the output field
        declarer.declare(new Fields("print"));
    }

The complete Spout code is below.

public class PWSpout extends BaseRichSpout {

    private static final long serialVersionUID = 1L;
    private SpoutOutputCollector collector;
    private static List<String> lists = null;

    static {
        lists = Arrays.asList("aa", "bb", "cc", "dd", "ee", "ff");
    }

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        // Initialize the spout
        this.collector = collector;
        //System.out.println(this.collector);
    }

    /**
     * Polls for tuples; called by Storm in a loop.
     * @see backtype.storm.spout.ISpout#nextTuple()
     */
    @Override
    public void nextTuple() {
        // Emit a random word
        final Random r = new Random();
        int num = r.nextInt(6);
        try {
            Thread.sleep(500);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        this.collector.emit(new Values(lists.get(num)));
    }

    /**
     * Declares the field name of the emitted data.
     * @see backtype.storm.topology.IComponent#declareOutputFields(backtype.storm.topology.OutputFieldsDeclarer)
     */
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Declare the output field
        declarer.declare(new Fields("print"));
    }
}
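
One optional extension worth knowing about, again my own note rather than something this example does: if you emit each tuple with a message id, Storm will call the spout's ack or fail method once the tuple is fully processed or times out, which is the basis of Storm's reliability guarantees. A minimal sketch:

    // Hypothetical reliable emit: the second argument is an opaque message id.
    this.collector.emit(new Values(lists.get(num)), UUID.randomUUID().toString());
    // Then override ack(Object msgId) / fail(Object msgId) from BaseRichSpout to react.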

Bolt: Introduction and Code

A Bolt usually extends BaseBasicBolt (implementing the interfaces directly also works) and has the following methods; a comparison with the richer base class follows this list.
- prepare: similar to setup in Hadoop, runs once when the bolt is first initialized.

    void prepare(Map stormConf, TopologyContext context);

- execute: processes each tuple of the data stream.

    void execute(Tuple input, BasicOutputCollector collector);

- declareOutputFields: declares the field name of the emitted data; the Bolts downstream can fetch the value by that field name.

    void declareOutputFields(OutputFieldsDeclarer declarer);
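
For comparison, a sketch of mine (not part of the original): BaseBasicBolt acks each tuple automatically, while the lower-level BaseRichBolt leaves anchoring and acking to you. A hypothetical PrintRichBolt equivalent to the PrintBolt below:

    public class PrintRichBolt extends BaseRichBolt {
        private OutputCollector collector;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector; // saved for use in execute
        }

        @Override
        public void execute(Tuple input) {
            // Anchor the new tuple to the input, then ack the input by hand.
            collector.emit(input, new Values(input.getStringByField("print")));
            collector.ack(input); // BaseBasicBolt would do this for us
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("write"));
        }
    }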

The topology above wires up two bolts, each with a different function.

PrintBolt

This bolt prints the string to the console and writes it to the log.
The complete code is below.

public class PrintBolt extends BaseBasicBolt {

    private static final Log log = LogFactory.getLog(PrintBolt.class);
    private static final long serialVersionUID = 1L;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Get the field declared by the upstream component
        String print = input.getStringByField("print");
        log.info("【print】: " + print);
        System.out.println("print: " + print);
        collector.emit(new Values(print));
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("write"));
    }
}

WriteBolt

This bolt writes each string to a local file. The complete code is below.

public class WriteBolt extends BaseBasicBolt {

    private static final long serialVersionUID = 1L;
    private static final Log log = LogFactory.getLog(WriteBolt.class);
    private FileWriter writer;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        // Get the field declared by the upstream component
        String text = input.getStringByField("write");
        try {
            if (writer == null) {
                // Pick the output directory by operating system
                // (any Windows version uses the same path)
                String os = System.getProperty("os.name");
                if (os.startsWith("Windows")) {
                    writer = new FileWriter("F:\\testdir\\" + this);
                } else if (os.equals("Linux")) {
                    System.out.println("----:" + os);
                    writer = new FileWriter("/usr/local/temp/" + this);
                }
            }
            log.info("【write】: writing to file");
            writer.write(text);
            writer.write("\n");
            writer.flush();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Final bolt in the chain: nothing is emitted, so nothing to declare
    }
}
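
Because the file name is built from this (the object's default toString), each bolt instance writes to a file with an unreadable name. A hedged improvement of mine, not in the original: override prepare from IBasicBolt and name the file after the task id instead:

    @Override
    public void prepare(Map stormConf, TopologyContext context) {
        // Hypothetical: one file per bolt task, named by its task id.
        String dir = System.getProperty("os.name").startsWith("Windows")
                ? "F:\\testdir\\" : "/usr/local/temp/";
        try {
            writer = new FileWriter(dir + "write-bolt-" + context.getThisTaskId() + ".txt");
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }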

The program needs the Storm jars; the Maven pom.xml is shown below.
The key dependency to pull in is:

    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.9.2-incubating</version>
    </dependency>
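
One caveat worth adding (my note, not the original's): when you submit to a real cluster with StormSubmitter, the cluster already provides storm-core, so the dependency is normally marked provided to keep it out of your topology jar. For local-mode runs like this example, leave the scope at its default:

    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.9.2-incubating</version>
        <scope>provided</scope> <!-- only when deploying to a cluster -->
    </dependency>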

The complete pom is as follows.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>storm01</groupId>
  <artifactId>storm01</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>storm01</name>
  <url>http://maven.apache.org</url>

  <repositories>
    <repository>
      <id>central</id>
      <name>Central Repository</name>
      <url>http://maven.aliyun.com/nexus/content/repositories/central</url>
      <layout>default</layout>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <repository>
      <id>central</id>
      <name>Maven Repository Switchboard</name>
      <layout>default</layout>
      <url>http://repo2.maven.org/maven2</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
    </repository>
    <!-- Repository where the storm dependencies can be found -->
    <repository>
      <id>clojars.org</id>
      <url>http://clojars.org/repo</url>
    </repository>
  </repositories>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.apache.storm</groupId>
      <artifactId>storm-core</artifactId>
      <version>0.9.2-incubating</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
  </dependencies>

  <build>
    <finalName>storm01</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-war-plugin</artifactId>
        <version>2.4</version>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.5</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <!-- Unit tests -->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <configuration>
          <skip>true</skip>
          <includes>
            <include>**/*Test*.java</include>
          </includes>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-source-plugin</artifactId>
        <version>2.1.2</version>
        <executions>
          <!-- Bind maven-source-plugin to the package phase, running the jar-no-fork goal -->
          <execution>
            <phase>package</phase>
            <goals>
              <goal>jar-no-fork</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

The project layout is shown below.

(Figure: storm01 project structure)

After running it locally,
random strings are printed to the console, and files under F:\testdir\ (or /usr/local/temp/ on Linux) record the same strings that appeared on the console.