Spark Streaming vs. Storm
Source: Internet · Editor: 程序博客网 · Date: 2024/05/16 08:29
| | Spark Streaming | Storm |
| --- | --- | --- |
| Data sources | HDFS, HBase, Cassandra, Kafka | HDFS, HBase, Cassandra, Kafka |
| Resource manager | YARN, Mesos | YARN, Mesos |
| Latency | Few seconds | < 1 second |
| Fault tolerance (every record processed) | Exactly once | At least once |
| Reliability | Improved reliability (Spark + YARN) | Guarantees on data loss (Storm + Kafka) |
Key differences:
1. Latency
2. Fault tolerance
3. Reliability
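The fault-tolerance difference can be illustrated without either framework: under at-least-once delivery a replayed record is counted twice, while deduplicating by record id gives an exactly-once effect. A minimal Python sketch (the record ids and the replayed record are made up for illustration):

```python
def process(records, dedupe):
    """Count words, optionally deduplicating replayed records by id."""
    seen, counts = set(), {}
    for rec_id, word in records:
        if dedupe and rec_id in seen:
            continue  # already processed once: exactly-once effect
        seen.add(rec_id)
        counts[word] = counts.get(word, 0) + 1
    return counts

# Record 1 is delivered twice, e.g. replayed after a worker failure.
stream = [(1, "spark"), (2, "storm"), (1, "spark")]

at_least_once = process(stream, dedupe=False)  # 'spark' counted twice
exactly_once = process(stream, dedupe=True)    # duplicate is dropped
```

This is why Storm on its own is listed as "at least once": a replayed tuple is processed again, and making that harmless is left to the application (idempotent writes or dedup as above).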
Storm architecture:
- Supervisor: worker node (a physical machine)
- Worker: a process on a Supervisor
- Executor: a thread inside a Worker
Spark Streaming architecture:
- Worker: physical node
- Executor: process
------------------------------
Data Source:
- SQL
- NoSQL
- Log Data
- Streaming Data
Ingestion:
- Flume
- Sqoop
- NFS
- Kafka
Processing:
- MapReduce
- Spark
- Storm
- Drill
- Mahout
- Oozie
- Hive
- Pig
- HBase
- Elasticsearch
- Solr
Visualization:
- WebTier
- Banana
- Kibana
- Data Warehouse
ELK:
- E: Elasticsearch
- L: Logstash
- K: Kibana
Alluxio (Tachyon)<br>
Real time Processing:
Storm/Trident/Spark Streaming/Samza/Flink<br>
Latency:<br>
Spark: Few seconds<br>
Storm: < 1 second<br>
Fault tolerance:<br>
Spark: Exactly once<br>
Storm: At least once / at most once<br>
Trident: Exactly once<br>
Reliability:<br>
Spark: Improved reliability (cache)<br>
Storm: Guarantees no data loss<br>
Storm:<br>
- Nimbus
- Zookeeper
- Supervisor

Worker: process<br>
Executor: thread<br>
Task: Bolt / Spout<br>
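The Worker/Executor/Task split above is a deployment detail; logically a Storm topology is spouts emitting tuples into a chain of bolts. A hedged, framework-free Python sketch of that dataflow (the sentence data and single-process generator wiring are illustrative, not Storm's API):

```python
def sentence_spout(sentences):
    """Spout: emits one tuple (here, a sentence) at a time."""
    for s in sentences:
        yield s

def split_bolt(stream):
    """Bolt: splits each incoming sentence tuple into word tuples."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    """Bolt: keeps a running count per word."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

# Wire spout -> split bolt -> count bolt, like edges in a topology.
counts = count_bolt(split_bolt(sentence_spout(["storm storm", "spark"])))
```

In real Storm, each of these stages would run as Tasks spread across Executors and Workers, with tuples flowing over the network rather than through generators.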
Spark Streaming:<br>
- Cluster Manager: Mesos, YARN
- Executor: Task, cache
Spark Streaming's core abstraction: DStream (discretized stream)<br>
```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// local[2]: at least one thread for the receiver, one for processing
val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1)) // 1-second batch interval
val lines = ssc.socketTextStream("spark001", 9999)                 // stream from a TCP socket
val lines2 = ssc.textFileStream("hdfs://spark:9000/wordcount_dir") // stream of new files in an HDFS dir
```
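A DStream is just a sequence of RDDs, one per batch interval: the `Seconds(1)` above means incoming data is sliced into 1-second micro-batches. A small Python sketch of that slicing (the timestamped events are made up for illustration):

```python
def micro_batches(events, interval=1.0):
    """Group (timestamp, line) events into fixed-interval batches,
    mimicking how a DStream slices a stream into per-interval RDDs."""
    batches = {}
    for t, line in events:
        batches.setdefault(int(t // interval), []).append(line)
    return [batches[k] for k in sorted(batches)]

events = [(0.1, "a"), (0.7, "b"), (1.2, "c"), (2.5, "d")]
# With interval=1.0: "a" and "b" land in batch 0, "c" in batch 1, "d" in batch 2.
```

Each resulting batch is then processed as an ordinary RDD job, which is also why Spark Streaming's latency is "a few seconds" rather than sub-second.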
Streaming window:<br>
reduceByKeyAndWindow(f(), Durations.seconds(60), Durations.seconds(10))<br>
Every 10 seconds, compute over the RDDs of the previous 60 seconds.<br>
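The window semantics can be simulated directly: every slide interval, aggregate the batches that fall inside the window length. A hedged Python sketch (assuming, for illustration, one batch per slide so a 60 s window over 10 s slides becomes 6 batches sliding by 1):

```python
def sliding_window_counts(batches, window=6, slide=1):
    """Every `slide` batches, count keys over the last `window` batches,
    mirroring reduceByKeyAndWindow with count-style aggregation."""
    results = []
    for end in range(slide, len(batches) + 1, slide):
        counts = {}
        for batch in batches[max(0, end - window):end]:
            for key in batch:
                counts[key] = counts.get(key, 0) + 1
        results.append(counts)
    return results

# A window of 2 batches sliding by 1: each result covers the last 2 batches.
batches = [["a"], ["a", "b"], ["b"]]
windows = sliding_window_counts(batches, window=2, slide=1)
```

Note how consecutive windows overlap: each batch contributes to several window results, which is what distinguishes a sliding window from plain per-batch aggregation.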
Kafka + Spark Streaming + HBase<br>
```scala
// Receiver-based approach: offsets are tracked in ZooKeeper
val topicThreadMap = Map("topic1" -> 1) // topic -> number of receiver threads
val lines = KafkaUtils.createStream(
  ssc,
  "192.168.80.201:2181,192.168.80.202:2181,192.168.80.203:2181", // ZooKeeper quorum
  "wordcountGroup",                                              // consumer group id
  topicThreadMap
)
```
```scala
// Direct approach: reads from the Kafka brokers directly, no receiver
val topics = Set("topic1")
val kafkaParams = Map(
  "metadata.broker.list" ->
    "192.168.80.201:9092,192.168.80.202:9092,192.168.80.203:9092" // broker list
)
// Type parameters: key type, value type, key decoder, value decoder
val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc,
  kafkaParams,
  topics
)
```
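The key difference between the two APIs: the direct stream tracks Kafka offsets per partition itself instead of relying on a receiver and ZooKeeper, so after a restart it resumes exactly where the last committed offsets left off. A hedged Python sketch of that offset bookkeeping (the partition contents and saved offsets are made up for illustration):

```python
def consume(partitions, committed):
    """Read each partition from its committed offset, return the
    unprocessed records, and advance the offsets."""
    out = []
    for p, records in partitions.items():
        start = committed.get(p, 0)
        out.extend(records[start:])  # only records not yet processed
        committed[p] = len(records)  # commit the new offset for this partition
    return out, committed

partitions = {0: ["a", "b", "c"], 1: ["x", "y"]}
offsets = {0: 1, 1: 0}  # offsets saved before a restart: "a" was already done
out, offsets = consume(partitions, offsets)
# out resumes at "b" on partition 0 and "x" on partition 1; nothing is re-read.
```

If the offsets are committed atomically together with the processing results, replays after a failure neither skip nor duplicate records, which is how the direct approach supports exactly-once semantics.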
When submitting the jar, the --master flag passed to spark-submit overrides .setMaster(...) set in the code.<br>
hdfs dfs -copyFromLocal spark.txt /wordcount_dir<br>