Integrating Storm with Kafka


The project needs to do recommendations, and recommendations are latency-sensitive. Using Spark would introduce some delay (Spark Streaming's batch window is on the order of 1 s), so we decided to use Storm for real-time processing. The rough architecture is as follows:

For data cleansing you can use Spark SQL (offline data); Kafka serves as the message queue (the data source); Storm does the processing; and the recommendation model is item-based collaborative filtering (ItemBase). Each user's recommendation results can be stored in Redis as key-value pairs, with the user ID as the key and JSON as the value, so that the backend can read the data from Redis.
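For illustration only, here is a minimal sketch of the Redis layout described above, using the Jedis client (not part of the original post); the user ID and item list are made-up values:

    import redis.clients.jedis.Jedis;

    public class RecommendationStore {
        public static void main(String[] args) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                String userId = "user_10086";                                    // hypothetical user ID used as the key
                String recommendations = "[\"item_1\",\"item_7\",\"item_42\"]";  // hypothetical item list serialized as JSON
                jedis.set(userId, recommendations);                              // write side: done by the Storm bolt
                System.out.println(jedis.get(userId));                           // read side: done by the backend service
            }
        }
    }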


* storm-kafka uses Kafka's low-level API and writes the offsets into ZooKeeper. My topic has 5 partitions, which appear in ZK as partition_0 through partition_4. To inspect the data of one partition:

 get /zkkafkaspout/kafkaspout/partition_3

The data looks like this:

{"topology":{"id":"f4df2f64-5207-4713-800f-9a87633aa37e","name":"Topo"},"offset":101235,"partition":3,"broker":{"host":"localhost","port":9092},"topic":"testPartion"}


1. Integrating Storm with Kafka

    BrokerHosts brokerHosts = new ZkHosts("localhost:2181");
    // SpoutConfig extends KafkaConfig and is serializable; by default the offset is written to ZK every 60 seconds (see the ZkHosts source)
    SpoutConfig spoutConfig = new SpoutConfig(brokerHosts, "testPartion", "/zkkafkaspout", "kafkaspout");
    spoutConfig.zkPort = 2181;
    List<String> zkServers = new ArrayList<String>();
    zkServers.add("localhost");
    spoutConfig.zkServers = zkServers;

    Config config = new Config();
    Map<String, String> map = new HashMap<String, String>();
    map.put("metadata.broker.list", "localhost:9092");
    map.put("serializer.class", "kafka.serializer.StringEncoder");
    config.put("kafka.broker.properties", map);

    // SchemeAsMultiScheme implements the MultiScheme interface; its constructor takes a Scheme (from storm-core).
    // The MessageScheme class below is the implementation of Scheme used here.
    spoutConfig.scheme = new SchemeAsMultiScheme(new MessageScheme());
    TopologyBuilder builder = new TopologyBuilder();

MessageScheme

    // Implements the Scheme interface; it does nothing beyond reading the raw bytes from Kafka and passing them on to the next bolt
    public class MessageScheme implements Scheme {
        @Override
        public List<Object> deserialize(byte[] bytes) {
            try {
                String msg = new String(bytes, "utf-8");
                return new Values(msg);
            } catch (UnsupportedEncodingException e) {
                e.printStackTrace();
            }
            return null;
        }

        @Override
        public Fields getOutputFields() {
            return new Fields("msg");
        }
    }

SenqueceBolt

    public class SenqueceBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
            String word = tuple.getStringByField("msg");
            String out = "Message From Kafka:" + word + "!" + "\r\n";
            // This block writes the data from Kafka into a file; it could just as well write to Redis, HBase or Elasticsearch.
            // The idea is that Kafka can carry JSON, and this bolt can then parse the JSON, etc.
            try {
                FileOutputStream fileOutputStream = new FileOutputStream(new File("/Users/stormKafka.txt"), true);
                fileOutputStream.write(out.getBytes());
                fileOutputStream.close();
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            System.out.println("out=" + out);
            basicOutputCollector.emit(new Values(out));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
            outputFieldsDeclarer.declare(new Fields("message"));
        }
    }
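The original code stops after creating the TopologyBuilder, so here is a minimal sketch of how the spout and the bolt above could be wired together and run locally. This is my own continuation, assuming the bolt class is named SenqueceBolt as labelled above; the exact package prefixes (backtype.storm vs org.apache.storm) depend on your Storm version.

    // Continues from the configuration code in section 1 (spoutConfig, config and builder already created)
    builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
    builder.setBolt("file-bolt", new SenqueceBolt(), 1).shuffleGrouping("kafka-spout");

    // Run inside a local cluster for testing; use StormSubmitter.submitTopology for a real cluster
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("Topo", config, builder.createTopology());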

