Storm Integration with Kafka


This article describes how to integrate Kafka into a Storm program.

  I. The model

    Data flow:

    1. A Kafka producer publishes messages to the topic1 topic.

    2. A Storm topology contains three components: KafkaSpout, SenqueceBolt, and KafkaBolt. KafkaSpout subscribes to topic1 and passes each message to SenqueceBolt for processing; KafkaBolt then publishes the processed result to the topic2 topic.

    3. A Kafka consumer consumes the messages on topic2.

    In short: Kafka producer → topic1 → KafkaSpout → SenqueceBolt → KafkaBolt → topic2 → Kafka consumer.

    


  II. Implementing the topology

    1. Create a Maven project and configure pom.xml

      The project depends on three artifacts: storm-core, kafka_2.10, and storm-kafka. With the assembly plugin configured below, `mvn clean package` produces the runnable jar-with-dependencies used to submit the topology.

<dependencies>
  <dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.9.2-incubating</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1.1</version>
    <exclusions>
      <exclusion>
        <groupId>org.apache.zookeeper</groupId>
        <artifactId>zookeeper</artifactId>
      </exclusion>
      <exclusion>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
      </exclusion>
    </exclusions>
  </dependency>

  <dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-kafka</artifactId>
    <version>0.9.2-incubating</version>
  </dependency>
</dependencies>
 
<build>
  <plugins>
    <plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.4</version>
      <configuration>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
      <executions>
        <execution>
          <id>make-assembly</id>
          <phase>package</phase>
          <goals>
            <goal>single</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>

 

    2. KafkaSpout

      KafkaSpout ships with Storm; its source lives at https://github.com/apache/incubator-storm/tree/master/external

      To use KafkaSpout you implement the Scheme interface yourself; it is responsible for deserializing the raw message bytes into the fields your topology needs.

import java.io.UnsupportedEncodingException;
import java.util.List;

import backtype.storm.spout.Scheme;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

public class MessageScheme implements Scheme {

    /* Deserialize the raw Kafka message bytes into a one-field tuple. */
    public List<Object> deserialize(byte[] ser) {
        try {
            String msg = new String(ser, "UTF-8");
            return new Values(msg);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
        }
        return null;
    }

    /* Name the single field emitted by this scheme. */
    public Fields getOutputFields() {
        return new Fields("msg");
    }
}
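The heart of the scheme is plain byte-to-string decoding, so it can be sanity-checked without Storm on the classpath. A minimal sketch (the `DecodeDemo` class and its `decode` helper are hypothetical, mirroring the body of `deserialize`):

```java
import java.io.UnsupportedEncodingException;

public class DecodeDemo {
    // Same decoding MessageScheme.deserialize performs (hypothetical helper)
    static String decode(byte[] ser) {
        try {
            return new String(ser, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("hello".getBytes())); // prints "hello"
    }
}
```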

    3. SenqueceBolt

      SenqueceBolt is trivial: it prefixes each message received from the spout with "I'm".

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class SenqueceBolt extends BaseBasicBolt {

    /* Wrap the incoming word in "I'm ...!" and emit it. */
    public void execute(Tuple input, BasicOutputCollector collector) {
        String word = (String) input.getValue(0);
        String out = "I'm " + word + "!";
        System.out.println("out=" + out);
        collector.emit(new Values(out));
    }

    /* KafkaBolt picks up the field named "message" by default. */
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("message"));
    }
}
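The bolt's transformation is pure string manipulation and can be exercised in isolation. A quick sketch (the `DecorateDemo` class and its `decorate` helper are hypothetical, mirroring the `execute` body):

```java
public class DecorateDemo {
    // Mirrors what SenqueceBolt.execute does to each word (hypothetical helper)
    static String decorate(String word) {
        return "I'm " + word + "!";
    }

    public static void main(String[] args) {
        System.out.println(decorate("storm")); // prints "I'm storm!"
    }
}
```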

    4. KafkaBolt

      KafkaBolt also ships with Storm (in the storm-kafka module) and publishes tuples to a Kafka topic. By default it looks for a tuple field named "message" (and optionally "key"), which is why SenqueceBolt declares its output field as "message".

    5. Topology

import java.util.HashMap;
import java.util.Map;

import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.ZkHosts;
import storm.kafka.bolt.KafkaBolt;
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class StormKafkaTopo {
    public static void main(String[] args) throws Exception {
        // ZooKeeper ensemble used by the Kafka cluster
        BrokerHosts brokerHosts = new ZkHosts("node04:2181,node05:2181,node06:2181");
        // Topic to subscribe to, plus the ZooKeeper root path and id used for offset storage
        SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, "topic1", "/zkkafkaspout", "kafkaspout");

        // kafka.broker.properties consumed by KafkaBolt
        Config conf = new Config();
        Map<String, String> map = new HashMap<String, String>();
        // Kafka broker address
        map.put("metadata.broker.list", "node04:9092");
        // serializer.class is the message serializer
        map.put("serializer.class", "kafka.serializer.StringEncoder");
        conf.put("kafka.broker.properties", map);
        // Topic that KafkaBolt publishes to
        conf.put("topic", "topic2");

        spoutConfig.scheme = new SchemeAsMultiScheme(new MessageScheme());
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new KafkaSpout(spoutConfig));
        builder.setBolt("bolt", new SenqueceBolt()).shuffleGrouping("spout");
        builder.setBolt("kafkabolt", new KafkaBolt<String, Integer>()).shuffleGrouping("bolt");

        if (args != null && args.length > 0) {
            // A topology name was given: submit to the cluster
            conf.setNumWorkers(3);
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        } else {
            // Otherwise run in local mode for a while, then shut down
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("Topo", conf, builder.createTopology());
            Utils.sleep(100000);
            cluster.killTopology("Topo");
            cluster.shutdown();
        }
    }
}

 


  III. Testing and verification

    1. Use the Kafka console client as the producer, publishing to topic1:

      bin/kafka-console-producer.sh --broker-list node04:9092 --topic topic1

    2. Use the Kafka console client as the consumer, subscribing to topic2:

      bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topic2 --from-beginning

    3. Run the Storm topology:

      bin/storm jar storm-kafka-0.0.1-SNAPSHOT-jar-with-dependencies.jar StormKafkaTopo KafkaStorm

    4. Results

      Typing a word such as hello into the producer console should make the consumer print I'm hello!.

Reposted from: http://www.tuicool.com/articles/f6RVvq

http://www.sxt.cn/u/756/blog/4584
