Storm: Rationale
Source: Internet. Editor: 程序博客网. Time: 2024/05/21 04:20
- Rationale
The past decade has seen a revolution in data processing. MapReduce, Hadoop, and related technologies have made it possible to store and process data at scales previously unthinkable. Unfortunately, these data processing technologies are not realtime systems, nor are they meant to be. There’s no hack that will turn Hadoop into a realtime system; realtime data processing has a fundamentally different set of requirements than batch processing.
However, realtime data processing at massive scale is becoming more and more of a requirement for businesses. The lack of a “Hadoop of realtime” has become the biggest hole in the data processing ecosystem.
Storm fills that hole.
Before Storm, you would typically have to manually build a network of queues and workers to do realtime processing. Workers would process messages off a queue, update databases, and send new messages to other queues for further processing. Unfortunately, this approach has serious limitations:
- Tedious: You spend most of your development time configuring where to send messages, deploying workers, and deploying intermediate queues. The realtime processing logic that you care about corresponds to a relatively small percentage of your codebase.
- Brittle: There’s little fault-tolerance. You’re responsible for keeping each worker and queue up.
- Painful to scale: When the message throughput gets too high for a single worker or queue, you need to partition how the data is spread around. You need to reconfigure the other workers so they know the new locations to send messages to. This introduces moving parts and new pieces that can fail.
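To make these limitations concrete, here is a minimal sketch of the hand-built approach: a word-count pipeline wired together from queues and workers. Everything in it is an assumption for illustration: the names `partition`, `split_worker`, `count_worker`, and `run_pipeline` are hypothetical, and in-process threads with `queue.Queue` stand in for the separate machines and message brokers a real deployment would use. Note how much of the code is routing and shutdown plumbing rather than the counting logic itself.

```python
import queue
import threading
import zlib
from collections import Counter

STOP = None  # sentinel that tells a worker to shut down


def partition(word, num_queues):
    """Pick a queue index so the same word always reaches the same worker."""
    return zlib.crc32(word.encode("utf-8")) % num_queues


def split_worker(inbox, outboxes):
    """Read sentences, split them into words, and route each word by hand."""
    while True:
        sentence = inbox.get()
        if sentence is STOP:
            # Shutdown has to be forwarded manually to every downstream queue.
            for outbox in outboxes:
                outbox.put(STOP)
            return
        for word in sentence.split():
            outboxes[partition(word, len(outboxes))].put(word)


def count_worker(inbox, counts):
    """Count the words that were routed to this worker's queue."""
    while True:
        word = inbox.get()
        if word is STOP:
            return
        counts[word] += 1


def run_pipeline(sentences, num_counters=2):
    """Wire up the queues and workers, feed the input, and merge the results."""
    sentence_queue = queue.Queue()
    word_queues = [queue.Queue() for _ in range(num_counters)]
    partial_counts = [Counter() for _ in range(num_counters)]

    threads = [threading.Thread(target=split_worker,
                                args=(sentence_queue, word_queues))]
    threads += [threading.Thread(target=count_worker, args=(q, c))
                for q, c in zip(word_queues, partial_counts)]
    for t in threads:
        t.start()

    for sentence in sentences:
        sentence_queue.put(sentence)
    sentence_queue.put(STOP)
    for t in threads:
        t.join()

    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


if __name__ == "__main__":
    print(run_pipeline(["storm fills that hole", "storm scales"]))
```

Changing `num_counters` changes where every word is routed, which is the repartitioning problem described in the "Painful to scale" item. If any worker thread dies, nothing restarts it, which is the "Brittle" item.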
Although the queues and workers paradigm breaks down for large numbers of messages, message processing is clearly the fundamental paradigm for realtime computation. The question is: how do you do it in a way that doesn’t lose data, scales to huge volumes of messages, and is dead-simple to use and operate?
Storm satisfies these goals.
Why Storm is important
Storm exposes a set of primitives for doing realtime computation. Just as MapReduce greatly eases the writing of parallel batch processing, Storm’s primitives greatly ease the writing of parallel realtime computation.
The key properties of Storm are:
- Extremely broad set of use cases: Storm can be used for processing messages and updating databases (stream processing), doing a continuous query on data streams and streaming the results into clients (continuous computation), parallelizing an intense query like a search query on the fly (distributed RPC), and more. Storm’s small set of primitives satisfies a stunning number of use cases.
- Scalable: Storm scales to massive numbers of messages per second. To scale a topology, all you have to do is add machines and increase the parallelism settings of the topology. As an example of Storm’s scale, one of Storm’s initial applications processed 1,000,000 messages per second on a 10-node cluster, including hundreds of database calls per second as part of the topology. Storm’s usage of Zookeeper for cluster coordination makes it scale to much larger cluster sizes.
- Guarantees no data loss: A realtime system must have strong guarantees about data being successfully processed. A system that drops data has a very limited set of use cases. Storm guarantees that every message will be processed, and this is in direct contrast with other systems like S4.
- Extremely robust: Unlike systems like Hadoop, which are notorious for being difficult to manage, Storm clusters just work. It is an explicit goal of the Storm project to make the user experience of managing Storm clusters as painless as possible.
- Fault-tolerant: If there are faults during execution of your computation, Storm will reassign tasks as necessary. Storm makes sure that a computation can run forever (or until you kill the computation).
- Programming language agnostic: Robust and scalable realtime processing shouldn’t be limited to a single platform. Storm topologies and processing components can be defined in any language, making Storm accessible to nearly anyone.
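The "Guarantees no data loss" property above can be illustrated with a small acknowledge-and-replay sketch. This is not Storm’s implementation or API. It is a simplified, single-process model in which the class `ReplayingSource` and the function `process_all` are hypothetical names. It assumes a source that keeps every message until processing is acknowledged and re-queues any message whose processing failed.

```python
from collections import deque


class ReplayingSource:
    """Hypothetical source that keeps each message until it is acknowledged."""

    def __init__(self, messages):
        self._ready = deque(enumerate(messages))  # (id, payload) waiting to be sent
        self._pending = {}                        # sent but not yet acknowledged

    def next_message(self):
        """Emit the next message and remember it as pending, or return None."""
        if not self._ready:
            return None
        msg_id, payload = self._ready.popleft()
        self._pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id):
        """Processing succeeded: the message can be forgotten."""
        self._pending.pop(msg_id, None)

    def fail(self, msg_id):
        """Processing failed: put the message back so it is replayed later."""
        payload = self._pending.pop(msg_id)
        self._ready.append((msg_id, payload))

    def done(self):
        return not self._ready and not self._pending


def process_all(source, handler):
    """Run handler over every message, replaying any message that fails."""
    results = []
    while True:
        item = source.next_message()
        if item is None:
            break
        msg_id, payload = item
        try:
            results.append(handler(payload))
            source.ack(msg_id)
        except Exception:
            # A transient failure does not drop the message; it is retried.
            source.fail(msg_id)
    return results
```

In this model a message is dropped only after it has been processed successfully, so a transient failure delays a result without losing it. A handler that always fails on some message would be retried forever, so a practical version would add a retry limit or timeout. That is a design choice outside this sketch.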