Getting started with Storm: a collection of learning resources

Source: Internet. Published by: 大数据时代.pdf. Edited by: 程序博客网. Time: 2024/06/05 21:56

Official website

Read these first

Rationale
Tutorial
Setting up development environment
Creating a new Storm project

Documentation

Documentation Index: the official documentation list (read this index first; it gives you a rough idea of which functional modules Storm has)
Manual
Javadoc
FAQ

Trident

Trident Tutorial -- basic concepts and walkthrough
Trident API Overview -- operations for transforming and orchestrating data
Trident State -- exactly-once processing and fast, persistent aggregation
Trident spouts -- transactional and non-transactional data intake

Integration with other tools

Kafka
HDFS
HBase
Hive
JDBC
Redis
Solr

storm-0.10 official API documentation

A few important items in the official documentation list should be read first:

1. Read the material under "Basics of Storm"

a. Tutorial

b. Understand Storm's worker processes, executors and tasks: "What makes a running topology: worker processes, executors and tasks"

2. Trident-related material

3. Understand guaranteed message processing (transaction-related); a Chinese translation is available

Chinese translations of the official documentation

The Chinese edition of the Apache Storm documentation, for example "Apache Storm Official Documentation: Guaranteeing Message Processing" and "Trident Tutorial"

On CSDN: "Translation: Storm reliability and transactional design: Acker and Trident State"

Other

"Getting Started With Storm" and its Chinese translation: a very good introduction, recommended as a first read

Storm入门 ("Getting Started With Storm")

Source code analysis: "Storm source code analysis: the Nimbus startup process"

Concurrency in Storm

Summary

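Storm is a real-time stream processing framework. Trident is a higher-level abstraction built on top of Storm, and its defining feature is that it processes a stream as a sequence of batches.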
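The following plain-Python sketch makes "processing a stream as batches" concrete. It does not use the Storm API. The helper `batches` and the fixed batch size are assumptions made for illustration; in Trident, the spout decides how tuples are batched. The increasing ids only stand in for the transaction ids that Trident attaches to batches.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def batches(stream: Iterable[tuple], batch_size: int) -> Iterator[Tuple[int, List[tuple]]]:
    """Hypothetical helper: cut a stream of tuples into fixed-size batches.

    Each batch is paired with an increasing id (starting at 1 here, an
    arbitrary choice) that plays the role of Trident's transaction id.
    """
    it = iter(stream)
    txid = 1
    while True:
        batch = list(islice(it, batch_size))
        if not batch:  # stream exhausted
            return
        yield txid, batch
        txid += 1


sentences = [("the cow jumped",), ("over the moon",), ("four score",)]
for txid, batch in batches(sentences, 2):
    print(txid, batch)
# 1 [('the cow jumped',), ('over the moon',)]
# 2 [('four score',)]
```

Every operation described below then works on one such batch at a time rather than on individual tuples.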
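The most basic operations are Filter and Function. A Filter decides for each tuple whether to keep it or drop it. A Function transforms tuples: for each input tuple it emits zero or more tuples, and the fields it emits are appended to the end of the input tuple.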
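The Filter and Function semantics above can be simulated on a single batch in plain Python. This is a sketch and not Trident's Java API. The names `each_filter`, `each_function` and `split` are hypothetical, and tuples are modeled as Python tuples whose positions stand in for named fields. Notice that a Function which emits nothing for an input effectively drops that input, while every value it does emit is appended after the input fields.

```python
from typing import Callable, Iterable, List


def each_filter(batch: List[tuple], keep: Callable[[tuple], bool]) -> List[tuple]:
    """Filter: keep only the tuples for which keep(t) returns True."""
    return [t for t in batch if keep(t)]


def each_function(batch: List[tuple], fn: Callable[[tuple], Iterable[tuple]]) -> List[tuple]:
    """Function: fn(t) emits zero or more tuples of new fields.

    Each emitted tuple is appended after the input tuple's fields; if fn
    emits nothing, the input tuple does not appear in the output.
    """
    out = []
    for t in batch:
        for new_fields in fn(t):
            out.append(t + tuple(new_fields))
    return out


def split(t: tuple) -> List[tuple]:
    """Hypothetical Function: emit one new field per word of field 0."""
    return [(word,) for word in t[0].split()]


batch = [("the cow",), ("",), ("moon",)]
words = each_function(batch, split)
print(words)       # [('the cow', 'the'), ('the cow', 'cow'), ('moon', 'moon')]
long_words = each_filter(words, lambda t: len(t[1]) > 3)
print(long_words)  # [('moon', 'moon')]
```

In real Trident code the same steps would be written with `Stream.each(...)` and classes extending `BaseFilter` or `BaseFunction`. The simulation only mirrors how data flows through them.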
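Aggregation comes in two forms, partitionAggregate and aggregate. Aggregators are written against three interfaces: CombinerAggregator, ReducerAggregator and Aggregator. partitionAggregate aggregates the tuples within each partition of the batch and is not a repartitioning operation. aggregate is a repartitioning operation: it redirects the stream into a single partition and performs the aggregation there.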
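The difference between partitionAggregate and aggregate can be sketched in plain Python. `CombinerAgg` and `ReducerAgg` are hypothetical stand-ins. They only mirror the method shapes of Trident's `CombinerAggregator` (init/combine/zero) and `ReducerAggregator` (init/reduce); the general `Aggregator` interface is not modeled. A batch is represented as a list of partitions. Following the Trident API overview, the sketch assumes that `aggregate` with a combiner first computes partial results in each partition, so only those partial results have to be moved.

```python
from functools import reduce
from typing import Callable, List, NamedTuple


class CombinerAgg(NamedTuple):
    """Hypothetical stand-in shaped like CombinerAggregator: init/combine/zero."""
    init: Callable      # tuple -> partial value
    combine: Callable   # (value, value) -> value
    zero: Callable      # () -> identity value


class ReducerAgg(NamedTuple):
    """Hypothetical stand-in shaped like ReducerAggregator: init/reduce."""
    init: Callable      # () -> initial state
    reduce: Callable    # (state, tuple) -> state


def run(agg, tuples: List[tuple]):
    """Aggregate one list of tuples with either kind of aggregator."""
    if isinstance(agg, CombinerAgg):
        return reduce(agg.combine, (agg.init(t) for t in tuples), agg.zero())
    return reduce(agg.reduce, tuples, agg.init())


def partition_aggregate(partitions: List[List[tuple]], agg) -> list:
    """Local: one result per partition; no tuple leaves its partition."""
    return [run(agg, p) for p in partitions]


def aggregate(partitions: List[List[tuple]], agg):
    """Repartitioning: the batch is aggregated in a single partition."""
    if isinstance(agg, CombinerAgg):
        # Assumed combiner optimization: partial results are computed locally
        # and only those partial results are sent to the single partition.
        partials = partition_aggregate(partitions, agg)
        return reduce(agg.combine, partials, agg.zero())
    # A reducer needs every tuple, so all tuples are sent to one partition.
    everything = [t for p in partitions for t in p]
    return run(agg, everything)


count = CombinerAgg(init=lambda t: 1, combine=lambda a, b: a + b, zero=lambda: 0)
total = ReducerAgg(init=lambda: 0, reduce=lambda acc, t: acc + t[0])

batch = [[(3,), (4,)], [(5,)]]  # one batch spread over two partitions
print(partition_aggregate(batch, count))  # [2, 1]
print(aggregate(batch, count))            # 3
print(aggregate(batch, total))            # 12
```

Under that assumption, a CombinerAggregator is the cheaper choice for a global aggregation: the per-partition work stays local, and only one value per partition has to cross the network.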
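Repartitioning operations change where tuples flow but not what they contain. Because they move tuples between partitions, they involve network transfer and can cost some efficiency. Filter, Function and partitionAggregate, by contrast, are local operations that need no network transfer.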
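One way to see that repartitioning moves tuples without changing them is the hypothetical `partition_by` below. It routes each tuple to `hash(key) % n` and counts how many tuples leave their current partition, which stands in for network transfers. Python's built-in `hash` is used only for illustration (integer keys keep the result deterministic); Storm uses its own partitioning logic.

```python
from collections import Counter
from typing import List, Tuple


def partition_by(partitions: List[List[tuple]], key_index: int,
                 num_partitions: int) -> Tuple[List[List[tuple]], int]:
    """Hypothetical partitionBy: send each tuple to hash(key) % num_partitions.

    Returns the new partitions and the number of tuples that had to leave
    their original partition (a stand-in for network transfers).
    """
    new_parts: List[List[tuple]] = [[] for _ in range(num_partitions)]
    moved = 0
    for src, part in enumerate(partitions):
        for t in part:
            dst = hash(t[key_index]) % num_partitions
            if dst != src:
                moved += 1
            new_parts[dst].append(t)
    return new_parts, moved


parts = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
new_parts, moved = partition_by(parts, 0, 2)
print(new_parts)  # [[(2, 'b'), (4, 'd')], [(1, 'a'), (3, 'c')]]
print(moved)      # 2
# Same tuples before and after, only their placement differs:
print(Counter(t for p in parts for t in p) == Counter(t for p in new_parts for t in p))  # True
```

The multiset of tuples is identical before and after; only their placement changes. A Filter or Function, by contrast, would leave every surviving tuple in the partition where it already is.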
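groupBy splits the stream into grouped streams according to the specified fields. It first repartitions the stream by those fields, then gathers tuples with equal values within each partition. If you run an aggregation on a grouped stream, the aggregation runs within each group rather than over the whole batch. If groupBy is followed by an aggregator, the result is an aggregation; if it is followed by partitionAggregate, it is not an aggregation.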
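The effect of groupBy followed by an aggregation can be sketched the same way. `group_by` and `aggregate_count` are hypothetical helpers. `group_by` first repartitions by the key and then groups equal keys inside each partition; `aggregate_count` counts the tuples in each group. Because every key lands in exactly one partition, the per-group counts are counts for the whole batch. An ungrouped count over the same batch would instead yield a single number.

```python
from collections import defaultdict
from typing import Dict, List


def group_by(partitions: List[List[tuple]], key_index: int,
             num_partitions: int) -> List[Dict[object, List[tuple]]]:
    """Hypothetical groupBy: repartition by the key, then group equal keys
    inside each partition."""
    new_parts: List[List[tuple]] = [[] for _ in range(num_partitions)]
    for part in partitions:
        for t in part:
            new_parts[hash(t[key_index]) % num_partitions].append(t)
    grouped = []
    for part in new_parts:
        groups: Dict[object, List[tuple]] = defaultdict(list)
        for t in part:
            groups[t[key_index]].append(t)
        grouped.append(dict(groups))
    return grouped


def aggregate_count(grouped: List[Dict[object, List[tuple]]]) -> Dict[object, int]:
    """Count tuples per group. After group_by each key lives in exactly one
    partition, so these are per-batch counts for each key."""
    return {key: len(ts) for part in grouped for key, ts in part.items()}


batch = [[("cow",), ("moon",)], [("cow",), ("the",)]]
print(sorted(aggregate_count(group_by(batch, 0, 2)).items()))
# [('cow', 2), ('moon', 1), ('the', 1)]
# An ungrouped count over the same batch gives a single number:
print(sum(len(p) for p in batch))  # 4
```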

0 0