Spark vs. Storm: Similarities and Differences
Source: Internet. Editor: 程序博客网. Time: 2024/05/16 12:23
http://xinhstechblog.blogspot.com/2014/06/storm-vs-spark-streaming-side-by-side.html
Storm vs. Spark Streaming: Side-by-side comparison
Overview
Processing Model, Latency
Fault Tolerance, Data Guarantees
Summary
In short, Storm is a good choice if you need sub-second latency and no data loss. Spark Streaming is better if you need stateful computation, with the guarantee that each event is processed exactly once. Spark Streaming programming logic may also be easier because it is similar to batch programming, in that you are working with batches (albeit very small ones).
Implementation, Programming API
Implementation
Storm is primarily implemented in Clojure, while Spark Streaming is implemented in Scala. This is something to keep in mind if you want to look into the code to see how each system works or to make your own customizations. Storm was developed at BackType and Twitter; Spark Streaming was developed at UC Berkeley.
Programming API
Storm comes with a Java API, as well as support for other languages. Spark Streaming can be programmed in Scala as well as Java.
Batch Framework Integration
One nice feature of Spark Streaming is that it runs on Spark. Thus, you can use the same (or very similar) code that you write for batch processing and/or interactive queries in Spark, on Spark Streaming. This reduces the need to write separate code to process streaming data and historical data.
Storm vs. Spark Streaming: implementation and programming API.
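The idea of sharing code between batch and streaming jobs can be illustrated with a minimal sketch in plain Python. It does not use the Spark API; `count_words` and `micro_batches` are hypothetical names, and batching by a fixed record count (instead of a time interval) is an assumption made only to keep the example deterministic.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Business logic written once: count the words in a collection of lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def micro_batches(stream: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Hypothetical helper: group a stream into small batches of records."""
    batch: List[str] = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield batch


historical = ["storm spark", "spark streaming", "storm"]
live = iter(["spark", "storm storm", "spark streaming"])

# Batch job over historical data.
batch_result = count_words(historical)

# Streaming job: the same function applied to every micro-batch.
stream_result: Counter = Counter()
for batch in micro_batches(live, batch_size=2):
    stream_result += count_words(batch)
```

Only the driver loop differs between the two jobs; the counting logic is written once and reused.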
Summary
Two advantages of Spark Streaming are that (1) it is not implemented in Clojure :) and (2) it is well integrated with the Spark batch computation framework.
Production, Support
Production Use
Hadoop Distribution, Support
Cluster Manager Integration
Summary
Storm has run in production much longer than Spark Streaming. However, Spark Streaming has the advantages that (1) it has a company dedicated to supporting it (Databricks), and (2) it is compatible with YARN.
Further Reading
For an overview of Storm, see these slides. For a good overview of Spark Streaming, see the slides to a Strata Conference talk. A more detailed description can be found in this research paper.
http://stackoverflow.com/questions/24119897/apache-spark-vs-apache-storm
Apache Spark is an in-memory distributed data analysis platform, primarily targeted at speeding up batch analysis jobs, iterative machine learning jobs, interactive queries, and graph processing. One of Spark's primary distinctions is its use of RDDs, or Resilient Distributed Datasets. RDDs are well suited to pipelining parallel operators for computation and are, by definition, immutable, which gives Spark a distinctive form of fault tolerance based on lineage information. If you are interested in, for example, executing a Hadoop MapReduce job much faster, Spark is a great option (although memory requirements must be considered).
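Lineage-based fault tolerance can be sketched roughly as follows. This is a simplified, single-machine illustration in plain Python, not Spark's implementation; the `LineageDataset` class and its `lose_data` method are hypothetical names introduced only for this example.

```python
from typing import Callable, List, Optional, Tuple


class LineageDataset:
    """Hypothetical stand-in for an RDD: immutable, and it remembers how it was derived."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["LineageDataset"] = None,
                 transform: Optional[Callable[[List[int]], List[int]]] = None):
        self._source = tuple(source) if source is not None else ()
        self._parent = parent          # lineage: the dataset this one was derived from
        self._transform = transform    # lineage: the operation that derives it
        self._cache: Optional[Tuple[int, ...]] = None

    def map(self, fn: Callable[[int], int]) -> "LineageDataset":
        # Transformations never modify data; they return a new dataset.
        return LineageDataset(parent=self, transform=lambda rows: [fn(r) for r in rows])

    def filter(self, pred: Callable[[int], bool]) -> "LineageDataset":
        return LineageDataset(parent=self, transform=lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[int]:
        # Compute on demand; recompute from the parent if the data is missing.
        if self._cache is None:
            if self._parent is None:
                self._cache = self._source
            else:
                self._cache = tuple(self._transform(self._parent.collect()))
        return list(self._cache)

    def lose_data(self) -> None:
        """Simulate losing the computed data, e.g. because a node failed."""
        self._cache = None


base = LineageDataset(source=[1, 2, 3, 4, 5])
squares_of_evens = base.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

first = squares_of_evens.collect()
squares_of_evens.lose_data()            # the computed result is gone
recovered = squares_of_evens.collect()  # rebuilt by replaying the lineage
```

Because each dataset is immutable, replaying the recorded transformations yields the same result as the original computation, so no replica of the derived data is needed for recovery.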
Apache Storm is focused on stream processing or what some call complex event processing. Storm implements a fault tolerant method for performing a computation or pipelining multiple computations on an event as it flows into a system. One might use Storm to transform unstructured data as it flows into a system into a desired format.
Storm and Spark are focused on fairly different use cases. The more "apples-to-apples" comparison would be between Storm and Spark Streaming. Since Spark's RDDs are inherently immutable, Spark Streaming implements a method for "batching" incoming updates in user-defined time intervals; each batch is transformed into its own RDD. Spark's parallel operators can then perform computations on these RDDs. This is different from Storm, which deals with each event individually.
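The difference between the two processing models can be shown with a small simulation. This is a plain-Python sketch, not code for either system; the function names, the event format (arrival time in seconds plus a payload), and the one-second interval are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List, Tuple

Event = Tuple[float, str]  # (arrival time in seconds, payload)


def process_per_event(events: List[Event],
                      handler: Callable[[List[str]], str]) -> List[str]:
    """Storm-like model: invoke the handler once for each event as it arrives."""
    return [handler([payload]) for _, payload in events]


def process_micro_batches(events: List[Event],
                          interval: float,
                          handler: Callable[[List[str]], str]) -> List[str]:
    """Spark-Streaming-like model: group events by time interval, then handle each group."""
    batches: Dict[int, List[str]] = {}
    for arrival, payload in events:
        batches.setdefault(int(arrival // interval), []).append(payload)
    return [handler(batches[key]) for key in sorted(batches)]


def summarize(payloads: List[str]) -> str:
    """Toy computation: join the payloads that are processed together."""
    return "+".join(payloads)


events = [(0.1, "a"), (0.4, "b"), (1.2, "c"), (1.9, "d"), (2.5, "e")]

per_event = process_per_event(events, summarize)
batched = process_micro_batches(events, interval=1.0, handler=summarize)
```

In the per-event model each event is handled on arrival, which is what allows sub-second latency. In the micro-batch model an event waits until its interval closes, so latency is bounded below by the batch interval.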
One key difference between these two technologies is that Spark performs data-parallel computations, while Storm performs task-parallel computations. Either design makes tradeoffs that are worth knowing. I would suggest checking out these links.
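The two styles of parallelism can be contrasted with a short sketch using only the Python standard library. It is an illustration of the general concepts under simplifying assumptions (threads on one machine, hypothetical function names), not a model of how Spark or Storm schedule work.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def data_parallel(partitions: List[List[int]], fn: Callable[[int], int]) -> List[List[int]]:
    """Data parallelism: the same operation runs concurrently on each partition."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(lambda part: [fn(x) for x in part], partitions))


def task_parallel(events: List[int], stages: List[Callable[[int], int]]) -> List[int]:
    """Task parallelism: different operations run concurrently, one worker per stage,
    and each event flows through the stages one at a time."""
    sentinel = object()  # marks the end of the stream
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(fn: Callable[[int], int], inbox: queue.Queue, outbox: queue.Queue) -> None:
        while True:
            item = inbox.get()
            if item is sentinel:
                outbox.put(sentinel)  # pass the end marker downstream
                return
            outbox.put(fn(item))

    workers = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    for event in events:
        queues[0].put(event)
    queues[0].put(sentinel)
    for worker in workers:
        worker.join()

    results: List[int] = []
    while True:
        item = queues[-1].get()
        if item is sentinel:
            return results
        results.append(item)


doubled = data_parallel([[1, 2], [3, 4]], lambda x: x * 2)
piped = task_parallel([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])
```

In `data_parallel` the unit of distribution is a slice of the data; in `task_parallel` it is a step of the computation, and an event can be in the second stage while the next event is still in the first.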