A First Look at Apache Storm
Source: Internet  Editor: 程序博客网  Date: 2024/05/21 22:49
Apache Storm
Why use Storm?
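Apache Storm is a free and open source distributed realtime computation system. It makes it easy to reliably process unbounded streams of data. Storm can be used with many programming languages and has a rich feature set. Its use cases include realtime analytics, machine learning, continuous computation, distributed RPC, ETL, and more.
Apache Storm is fast: each node can process one million tuples per second.
Apache Storm integrates with commonly used queueing and database technologies, for example: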
- Kestrel
- RabbitMQ / AMQP
- Kafka
- JMS
- Amazon Kinesis
【Tuples】:
When programming on Storm, you manipulate and transform streams of tuples, and a tuple is a named list of values. Tuples can contain objects of any type; if you want to use a type Storm doesn't know about, it's very easy to register a serializer for that type.
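To make the "named list of values" idea concrete, here is a minimal plain-Python sketch. It is not Storm's API: the `StormTuple` class and its `get_by_field` method are hypothetical names that only model declared field names paired with positional values.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class StormTuple:
    """Toy model of a tuple: declared field names plus positional values."""
    fields: List[str]
    values: List[Any]

    def __post_init__(self):
        # Every declared field must have exactly one value.
        if len(self.fields) != len(self.values):
            raise ValueError("each field needs exactly one value")

    def get_by_field(self, name: str) -> Any:
        # Look a value up by its declared field name.
        return self.values[self.fields.index(name)]


t = StormTuple(fields=["word", "count"], values=["storm", 3])
print(t.get_by_field("word"), t.get_by_field("count"))
```

The values can be of any type, which mirrors the point above; in real Storm, unknown types additionally need a registered serializer so they can cross process boundaries.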
【Three abstractions】: spouts, bolts, and topologies
spout
A spout is a source of streams in a computation. Typically a spout reads from a queueing broker such as Kestrel, RabbitMQ, or Kafka, but a spout can also generate its own stream or read from somewhere like the Twitter streaming API. Spout implementations already exist for most queueing systems.
bolts
A bolt processes any number of input streams and produces any number of new output streams. Most of the logic of a computation goes into bolts, such as functions, filters, streaming joins, streaming aggregations, talking to databases, and so on.
topologies
A topology is a network of spouts and bolts, with each edge in the network representing a bolt subscribing to the output stream of some other spout or bolt. A topology is an arbitrarily complex multi-stage stream computation. Topologies run indefinitely when deployed.
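The three abstractions can be modeled in a few lines of plain Python. This is only an in-process sketch under stated assumptions: the function names (`sentence_spout`, `split_bolt`, `count_bolt`, `run_topology`) are hypothetical, generators stand in for streams, and the stream is finite so the example terminates, whereas a deployed topology runs indefinitely.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[str]:
    """Source of the stream. A real spout would read from a queue and never end."""
    for sentence in ["the cow jumped", "the moon jumped"]:
        yield sentence


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """Consumes a stream of sentences and emits a stream of words."""
    for sentence in sentences:
        for word in sentence.split():
            yield word


def count_bolt(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Streaming aggregation: emits (word, running count) for every input word."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def run_topology() -> Dict[str, int]:
    # Wiring: spout -> split bolt -> count bolt. Each edge is a subscription.
    final: Dict[str, int] = {}
    for word, count in count_bolt(split_bolt(sentence_spout())):
        final[word] = count
    return final


print(run_topology())
```

Each function call in `run_topology` corresponds to one edge of the network: a bolt subscribing to the output stream of a spout or of another bolt.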
【Development and debugging】 Storm has a "local mode" where a Storm cluster is simulated in-process. This is useful for development and testing. The "storm" command line client is used when ready to submit a topology for execution on an actual cluster.
【Getting started】 The storm-starter project contains example topologies for learning the basics of Storm. Learn more about how to use Storm by reading the tutorial and the documentation.
Scalable
Storm topologies are inherently parallel and run across a cluster of machines. Different parts of the topology can be scaled individually by tweaking their parallelism. The "rebalance" command of the "storm" command line client can adjust the parallelism of running topologies on the fly.
Storm's inherent parallelism means it can process very high throughputs of messages with very low latency. Storm was benchmarked at processing one million 100-byte messages per second per node on hardware with the following specs:
- Processor: 2x Intel E5645 @ 2.4 GHz
- Memory: 24 GB
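As a rough sanity check on what that benchmark figure means in bandwidth terms, the arithmetic can be written out as below. It counts message payload only, using the two numbers quoted above; serialization and network framing overhead are ignored, which is an assumption of this sketch.

```python
MESSAGES_PER_SECOND = 1_000_000   # from the benchmark quoted above
BYTES_PER_MESSAGE = 100           # from the benchmark quoted above

bytes_per_second = MESSAGES_PER_SECOND * BYTES_PER_MESSAGE
megabytes_per_second = bytes_per_second / 1_000_000      # decimal megabytes
megabits_per_second = bytes_per_second * 8 / 1_000_000   # decimal megabits

print(megabytes_per_second, megabits_per_second)
```

So the quoted rate corresponds to about 100 MB/s (800 Mbit/s) of payload per node.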
【Fault tolerant】
Storm is fault-tolerant: when workers die, Storm will automatically restart them. If a node dies, the worker will be restarted on another node.
The Storm daemons, Nimbus and the Supervisors, are designed to be stateless and fail-fast. So if they die, they will restart like nothing happened. This means you can kill -9 the Storm daemons without affecting the health of the cluster or your topologies.
Read more about Storm's fault tolerance in the manual.
Guarantees data processing
Storm guarantees every tuple will be fully processed. One of Storm's core mechanisms is the ability to track the lineage of a tuple as it makes its way through the topology in an extremely efficient way. Read more about how this works in the Storm documentation on guaranteeing message processing.
Storm's basic abstractions provide an at-least-once processing guarantee, the same guarantee you get when using a queueing system. Messages are only replayed when there are failures.
Using Trident, a higher level abstraction over Storm's basic abstractions, you can achieve exactly-once processing semantics.
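The at-least-once guarantee described above can be illustrated with a small simulation. This is a toy model, not Storm's acking implementation: the function `run_at_least_once` and its `fail_once` parameter are hypothetical, and a single in-memory queue stands in for the spout's replay logic.

```python
from collections import deque
from typing import Deque, List, Set


def run_at_least_once(messages: List[str], fail_once: Set[str]) -> List[str]:
    """Process every message until it is acked; failed messages are replayed."""
    pending: Deque[str] = deque(messages)
    already_failed: Set[str] = set()
    side_effects: List[str] = []

    while pending:
        msg = pending.popleft()
        side_effects.append(msg)          # the work happens first
        if msg in fail_once and msg not in already_failed:
            already_failed.add(msg)       # simulated crash before the ack
            pending.append(msg)           # no ack arrived, so replay later
            continue
        # Reaching this point models a successful ack: msg is not re-queued.

    return side_effects


log = run_at_least_once(["a", "b", "c"], fail_once={"b"})
print(log)
```

Every message shows up in the log at least once, and the one that failed shows up twice. That duplicate is exactly what exactly-once semantics has to eliminate, for example by making state updates idempotent.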
Use with any language
Storm was designed from the ground up to be usable with any programming language. At the core of Storm is a Thrift definition for defining and submitting topologies. Since Thrift (Apache Thrift) can be used in any language, topologies can be defined and submitted from any language.
Similarly, spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate with Storm through a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, JavaScript, and Perl.
storm-starter has an example topology that implements one of the bolts in Python.
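The following sketch shows roughly what a JSON-over-stdin/stdout exchange looks like from the side of a non-JVM bolt. It rests on assumptions that should be checked against the multilang protocol documentation: that each JSON message is terminated by a line containing only `end`, and that the `emit`/`ack` commands use the field names shown. The initial handshake is omitted, `io.StringIO` objects stand in for the real stdin and stdout, and the helper names `read_messages` and `write_message` are hypothetical.

```python
import io
import json
from typing import Iterator, TextIO


def read_messages(stream: TextIO) -> Iterator[dict]:
    """Yield JSON messages, each terminated by a line containing only 'end'."""
    buffer = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            yield json.loads("\n".join(buffer))
            buffer = []
        else:
            buffer.append(line)


def write_message(stream: TextIO, message: dict) -> None:
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(message) + "\nend\n")


# Simulated input: one tuple as a hypothetical shell bolt might receive it.
incoming = io.StringIO()
write_message(incoming, {"id": "1", "tuple": ["hello storm"]})
incoming.seek(0)

# A word-splitting bolt: emit one tuple per word, then ack the input tuple.
outgoing = io.StringIO()
for msg in read_messages(incoming):
    for word in msg["tuple"][0].split():
        write_message(
            outgoing,
            {"command": "emit", "anchors": [msg["id"]], "tuple": [word]},
        )
    write_message(outgoing, {"command": "ack", "id": msg["id"]})

print(outgoing.getvalue())
```

In practice the existing language adapters hide this framing, so a bolt author only writes the processing logic.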
Easy to deploy and operate
Storm clusters are easy to deploy, requiring a minimum of setup and configuration to get up and running. Storm's out-of-the-box configurations are suitable for production. Read more about how to deploy a Storm cluster in the Storm documentation.
Additionally, Storm is easy to operate once deployed. Storm has been designed to be extremely robust – the cluster will just keep on running, month after month.
Free and open source
Apache Storm is a free and open source project licensed under the Apache License, Version 2.0.
Storm has a large and growing ecosystem of libraries and tools to use in conjunction with Storm, including:
- Spouts: These spouts integrate with queueing systems such as JMS, Kafka, Redis pub/sub, and more.
- storm-state: storm-state makes it easy to manage large amounts of in-memory state in your computations in a reliable way by using a distributed filesystem for persistence.
- Database integrations: There are helper bolts for integrating with various databases, such as MongoDB, RDBMSs, Cassandra (a distributed key-value database), and more.
- Other miscellaneous utilities
The Storm documentation has links to notable Storm-related projects hosted outside of Apache.