Useful Storm material collected from the web, part 2
Source: Internet · Edited by: 程序博客网 · Date: 2024/05/17 04:20
We are thinking of evaluating the following 4 tools:
Kafka (LinkedIn): Looks promising, but it's written in Scala. We have heard mixed reports about Scala, so we are a bit concerned about its future.
Flume: Will be replaced by Flume NG, which is NOT yet production-ready, and it's not clear when it will be.
Scribe (FB): Not under active development. Will be replaced by Calligraphus, with no word on when.
Storm (Twitter): Looks promising, but it's not clear whether it was designed with log processing in mind, although I can't see why it couldn't be used for that purpose.
Storm + Kafka is a very effective log processing solution. A number of Storm users run this combination, including us at Twitter in a few instances. Kafka gives you a high-throughput, reliable way to persist and replay log messages, and Storm gives you the ability to process those messages in arbitrarily complex ways.
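To make the split of responsibilities above concrete, here is a minimal in-process sketch of the "persist/replay" half. It does not use the Kafka or Storm APIs; `LogTopic` and `OffsetConsumer` are hypothetical toy classes that only model an append-only log addressed by offsets, plus a reader (the role a Storm spout would play) that can rewind and replay.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LogTopic:
    """Toy, in-memory stand-in for a single Kafka topic partition.

    Hypothetical class: real Kafka persists to disk and replicates across
    brokers; this only models the append-only, offset-addressed log.
    """
    messages: list[str] = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append a message and return its offset (its position in the log)."""
        self.messages.append(message)
        return len(self.messages) - 1

    def read_from(self, offset: int) -> list[tuple[int, str]]:
        """Return (offset, message) pairs from `offset` on; reading deletes nothing."""
        return [(i, self.messages[i]) for i in range(offset, len(self.messages))]


class OffsetConsumer:
    """Hypothetical consumer that tracks its own read position, roughly the
    job a Storm spout does when it pulls log lines out of Kafka."""

    def __init__(self, topic: LogTopic) -> None:
        self.topic = topic
        self.position = 0  # next offset to read

    def poll(self) -> list[tuple[int, str]]:
        """Return everything after the current position and advance past it."""
        batch = self.topic.read_from(self.position)
        if batch:
            self.position = batch[-1][0] + 1
        return batch

    def seek(self, offset: int) -> None:
        """Rewind (or skip ahead) so the next poll replays from `offset`."""
        self.position = offset


topic = LogTopic()
for line in ["INFO start", "ERROR disk full", "INFO done"]:
    topic.append(line)

consumer = OffsetConsumer(topic)
first_pass = consumer.poll()   # all three messages
nothing_new = consumer.poll()  # [] until something else is appended
consumer.seek(0)
replayed = consumer.poll()     # the same three messages again
```

Because reading never removes anything from the log, a failed downstream computation can simply seek back and reprocess; that durability is what the log layer adds on top of the stream processor.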
We've been developing logging and reporting solutions on top of Storm that archive and stream logging information. Storm's ability to add a separate stream for exceptional cases has also been key to making our logging infrastructure useful. I'd highly recommend it, whether you use Kafka, AMQP, or even point syslog traffic directly at a spout. A custom Log4j appender is easy to write.
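The "separate stream for exceptional cases" idea above can be sketched without Storm itself. In Storm a bolt can declare more than one named output stream and emit to a chosen stream id; the toy below only mimics that shape in plain Python. `route_log_line`, `StreamCollector`, the `"exceptions"` stream name and the "LEVEL message" line format are all hypothetical, and sending every line to `"default"` while duplicating errors to `"exceptions"` is our design choice (archive everything, alert on the rest).

```python
from collections import defaultdict
from typing import Callable

# emit(stream_id, record): loosely mirrors emitting a tuple on a named stream
Emit = Callable[[str, dict], None]


def route_log_line(line: str, emit: Emit) -> None:
    """Bolt-like step: every parsed line goes to the "default" stream (for
    archiving); ERROR lines and lines mentioning an exception are also
    emitted on a separate "exceptions" stream (hypothetical stream name)."""
    level, _, message = line.partition(" ")
    record = {"level": level, "message": message}
    emit("default", record)
    if level == "ERROR" or "Exception" in message:
        emit("exceptions", record)


class StreamCollector:
    """Toy collector that just records emitted records per stream id."""

    def __init__(self) -> None:
        self.streams = defaultdict(list)

    def emit(self, stream_id: str, record: dict) -> None:
        self.streams[stream_id].append(record)


collector = StreamCollector()
for line in ["INFO user login", "ERROR NullPointerException in handler", "WARN slow query"]:
    route_log_line(line, collector.emit)
```

Downstream consumers can then subscribe only to the stream they care about: an archiver reads `"default"`, while an alerting component reads only `"exceptions"`.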
It sounds like Storm by itself is not enough to do the log processing, and a tool such as Kafka is needed for persistence. So I guess Storm would then be used as a 'consumer'?
Pardon my naive question, but what functionality does Storm provide that isn't built into Kafka? It sounds to me like we would have to maintain a cluster of machines for Kafka plus a cluster of machines for Storm (plus our existing Hadoop cluster). I'm trying to figure out whether so many layers are really needed.
Storm is then used as it is marketed: a distributed stream processor. It does whatever you need to actually process the logs (conditional filtering, text extraction, and so on) in a distributed manner. Log processing is a really good use case for Storm, since there are typically a LOT of logs; it is truly a real-time big data problem. So instead of centralizing the logs and churning over the data with MapReduce, you do that work as streams within a Storm cluster, and your output is what you would normally produce from your M/R algorithms.
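As a rough illustration of the processing described above (extract, conditionally filter, then aggregate in parallel), the sketch below simulates several counting tasks in a single process. It is not Storm code: the "LEVEL [service] message" format, the function names and the use of CRC32 for routing are assumptions. The routing step imitates the idea behind Storm's fields grouping, where tuples with the same key always reach the same task, so each task's partial count for a key is complete and the merge is a simple sum.

```python
import re
import zlib
from collections import Counter

# Hypothetical log format: "LEVEL [service] message"
LINE_RE = re.compile(r"^(?P<level>[A-Z]+) \[(?P<service>[\w-]+)\] (?P<msg>.*)$")


def parse(line: str):
    """Text extraction: return (level, service), or None if the line doesn't match."""
    m = LINE_RE.match(line)
    return (m["level"], m["service"]) if m else None


def fields_grouping(key: str, num_tasks: int) -> int:
    """Route by key so equal keys always reach the same task, in the spirit of
    Storm's fields grouping (CRC32 is our choice, not Storm's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def count_errors_by_service(lines, num_tasks: int = 3):
    """Filter to ERROR lines and count them per service across `num_tasks`
    simulated counting tasks; returns (per-task counters, merged totals)."""
    tasks = [Counter() for _ in range(num_tasks)]
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue  # drop lines that don't match the assumed format
        level, service = parsed
        if level != "ERROR":
            continue  # conditional filter: keep only errors
        tasks[fields_grouping(service, num_tasks)][service] += 1
    merged = Counter()
    for partial in tasks:
        merged.update(partial)  # each service lives in exactly one task
    return tasks, merged


lines = [
    "ERROR [billing] card declined",
    "INFO [billing] invoice sent",
    "ERROR [auth] bad token",
    "ERROR [billing] timeout",
    "garbage line",
]
tasks, totals = count_errors_by_service(lines)
```

In a real topology each task would run on a different worker and keep its counter as state, emitting updates continuously instead of returning once; the per-key routing is what keeps those partial results from overlapping.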