6 - Druid Design

Source: the Internet. Editor: 程序博客网. Date: 2024/05/16 18:50

Original document:

http://druid.io/docs/0.10.1/design/design.html

I. Architecture

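The node types:

Historical nodes
The workhorses of the cluster. They handle storage and querying of "historical" (non-realtime) data: they load segments from deep storage, respond to queries from broker nodes, and return the results to the brokers. They announce themselves and the segments they serve in Zookeeper, and they also watch Zookeeper for signals to load or drop segments.
Coordinator nodes
Monitor the grouping of historical nodes to keep data available, replicated, and optimally configured. They read segment metadata from the metadata storage to decide which segments should be loaded in the cluster, use Zookeeper to determine which Historical nodes exist, and use it to tell Historical nodes to load or drop segments.
Broker nodes
Receive queries from external clients and forward them to Realtime and Historical nodes. Once a Broker has the results, it merges them and returns them to the caller. Brokers use Zookeeper to learn the topology, that is, which Realtime and Historical nodes exist.
Indexing Service nodes
Workers organized into a cluster that loads batch and real-time data into the system.
Realtime nodes
Also load real-time data into the system. They are simpler to set up than the indexing service, but have several limitations for production use.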

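Each node type does one job. Separating Historical from Realtime nodes isolates the memory concerns of listening to a real-time stream and processing it for entry into the system. Separating the Coordinator from the Broker isolates the work of controlling how data is distributed across the cluster from the query load on the cluster.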

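In addition to these nodes, there are 3 external dependencies:

ZooKeeper: cluster management and management of the data topology (how segments are distributed across historical nodes)
Metadata storage instance: holds the metadata about segments
"deep storage" LOB store/file system: holds the segments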

Segments and Data Storage

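When data enters the system it is analyzed, indexed, and compressed:

Converted to a columnar format
Indexed with bitmap indexes
Compressed with several algorithms
- LZ4 for all columns
- Dictionary encoding w/ id storage for String columns
- Bitmap compression for bitmap indexes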

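The output of the indexing process is a segment. Segments are the fundamental data structure: each one contains the dimensions and metrics of a data set, stored in columnar form together with the indexes for those columns.

Segments are stored in the "deep storage" LOB store/file system (see Deep Storage for information about potential options). Data is then loaded by Historical nodes, which first download it to local disk and then memory-map it before serving queries.

If a Historical node dies, it can no longer serve its segments, but the same data in "deep storage" can still be loaded and served by other nodes. This means you can take every Historical node out of the cluster and bring them back after an update. Conversely, if "deep storage" goes down, Historical nodes keep serving the data they have already loaded.

For a segment to exist in the cluster and be served, an entry must be added to a table in the metadata storage instance. The entry describes the segment's metadata, such as its schema, its size, and its location in deep storage. The Coordinator uses these entries to know which data should be available in the cluster.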

Fault Tolerance

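Historical: if a Historical node dies, it can no longer serve its segments, but the same data in "deep storage" can still be loaded and served by other nodes.
Coordinator: can be configured for hot fail-over. If no coordinator is running, changes to the data topology stop (no new data and no data balancing decisions), but the system keeps running.
Broker: can be run in parallel or in hot fail-over mode.
Indexing Service: workers run with replicated ingestion tasks; the coordination piece has hot fail-over.
Realtime: depends on the semantics of the delivery stream. The stream can be processed in parallel. Nodes periodically checkpoint to disk and eventually push to deep storage. A node can recover after its process restarts, but data is lost if the disk becomes inaccessible.
"deep storage" file system: if it is unavailable, new data cannot enter the cluster.
Metadata storage: if it is down, the Coordinator cannot find out about new segments and can only keep serving based on its current view of the existing segments.
ZooKeeper: if it is down, changes to the data topology stop, and Brokers keep serving queries based on their current view of the data topology.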

Query processing

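A query first enters the Broker. The Broker matches the query against the segments it knows about, picks the machines that serve those segments, and sends them the query for those segments. The Historical/Realtime nodes process it (possibly using indexes to filter down to the matching data) and return results. The Broker then merges the results and returns them to the caller.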
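The scatter/gather flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SEGMENT_LOCATIONS` map, the node names, the sample counts, and `query_node` are hypothetical stand-ins, not Druid APIs. A real Broker learns the topology from Zookeeper and talks to the nodes over the network.

```python
from collections import Counter, defaultdict

# Hypothetical topology: segment id -> node that serves it.
SEGMENT_LOCATIONS = {
    "seg_a": "historical-1",
    "seg_b": "historical-2",
    "seg_c": "realtime-1",
}

# Hypothetical per-segment data: page -> edit count.
SEGMENT_DATA = {
    "seg_a": {"Druid": 3, "Kafka": 1},
    "seg_b": {"Druid": 2},
    "seg_c": {"Kafka": 4},
}


def query_node(node, segment_ids):
    """Stand-in for a node answering a query over the segments it serves."""
    partial = Counter()
    for seg in segment_ids:
        partial.update(SEGMENT_DATA[seg])
    return partial


def broker_query(segment_ids):
    """Scatter the query to the nodes serving the segments, then merge."""
    by_node = defaultdict(list)
    for seg in segment_ids:
        by_node[SEGMENT_LOCATIONS[seg]].append(seg)
    merged = Counter()
    for node, segs in by_node.items():
        merged.update(query_node(node, segs))
    return dict(merged)


result = broker_query(["seg_a", "seg_b", "seg_c"])
```

Each node returns a partial aggregate for its own segments, and the Broker's only job is to sum the partials.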

II. Segments

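Segment files store the index. Segments are partitioned by time: one segment is created per time interval, and the interval is set by the segmentGranularity parameter of the granularitySpec. Segment files should ideally be 300 MB to 700 MB. If yours are larger than that, try changing the granularity of the time interval, or partition the data and adjust targetPartitionSize in the partitioningSpec (a good starting point for this parameter is 5 million rows).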

A segment file's core data structures

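A segment contains 3 kinds of columns: the timestamp column, dimension columns, and metric columns.

The timestamp and metric columns are simple: each is an array of integer or floating-point values compressed with LZ4. Once a query knows which rows it needs, it decompresses them and applies the aggregation operator; columns the query does not need are skipped.

Dimension columns are different because they support filter and group-by operations, so each one needs the following 3 data structures:

1. A dictionary that maps values (usually strings) to integer IDs
2. A list of the column's values, encoded using the dictionary in 1
3. For each distinct value, a bitmap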
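As a minimal sketch of these three structures, the Python below encodes one dimension column. The sample values are made up for illustration, and plain Python integers stand in for bitmaps; that is an assumption of this sketch, since the notes above say Druid compresses its bitmap indexes.

```python
def encode_dimension(values):
    """Build the three structures for one dimension column."""
    # 1. Dictionary: distinct values (sorted) -> integer IDs.
    dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
    # 2. Column data: each row's value replaced by its dictionary ID.
    encoded = [dictionary[v] for v in values]
    # 3. One bitmap per distinct value; bit r is set when row r holds it.
    bitmaps = {v: 0 for v in dictionary}
    for row, v in enumerate(values):
        bitmaps[v] |= 1 << row
    return dictionary, encoded, bitmaps


def rows_matching(bitmap):
    """Return the row numbers whose bit is set in the bitmap."""
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]


# Made-up sample column with four rows.
page = ["Justin Bieber", "Justin Bieber", "Ke$ha", "Ke$ha"]
dictionary, encoded, bitmaps = encode_dimension(page)

# A filter such as page = 'Justin Bieber' OR page = 'Ke$ha' is a bitwise OR.
either = bitmaps["Justin Bieber"] | bitmaps["Ke$ha"]
```

Filters become bitwise operations on the bitmaps, while group-by can work on the small integer IDs in the encoded list instead of comparing strings.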

Naming Convention

segment identifier : datasource_intervalStart_intervalEnd_version_partitionNum

Segment Components

A segment is made up of the following files:

version.bin
4 bytes representing the version. E.g., for v9 segments, the version is 0x0, 0x0, 0x0, 0x9
meta.smoosh
A file with metadata (filenames and offsets) about the contents of the other smoosh files
XXXXX.smoosh
There are some number of these files, which are concatenated binary data
The smoosh files represent multiple files "smooshed" together in order to minimize the number of file descriptors that must be open to house the data. They are files of up to 2GB in size (to match the limit of a memory mapped ByteBuffer in Java). The smoosh files house individual files for each of the columns in the data as well as an index.drd file with extra metadata about the segment.
There is also a special column called __time that refers to the time column of the segment. This will hopefully become less and less special as the code evolves, but for now it’s as special as my Mommy always told me I am.
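The 4-byte layout of version.bin described above can be reproduced with Python's `struct` module. This is a sketch, not Druid code: the helper names are hypothetical, and the big-endian byte order is inferred from the example bytes 0x0, 0x0, 0x0, 0x9.

```python
import struct


def encode_version(version):
    """Pack a segment version as a 4-byte big-endian integer."""
    return struct.pack(">i", version)


def decode_version(raw):
    """Read the version back from the 4 bytes of version.bin."""
    (version,) = struct.unpack(">i", raw)
    return version


raw = encode_version(9)
```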

Format of a column

Each column is stored as two parts:

A Jackson-serialized ColumnDescriptor
The rest of the binary for the column

A ColumnDescriptor is essentially an object that allows us to use Jackson's polymorphic deserialization to add new and interesting methods of serialization with minimal impact to the code. It consists of some metadata about the column (what type it is, whether it is multi-value, etc.) and then a list of serde logic that can deserialize the rest of the binary.

Sharding Data to Create Segments

Sharding

Data for the same time interval may be split into several shards:

sampleData_2011-01-01T02:00:00:00Z_2011-01-01T03:00:00:00Z_v1_0

sampleData_2011-01-01T02:00:00:00Z_2011-01-01T03:00:00:00Z_v1_1

sampleData_2011-01-01T02:00:00:00Z_2011-01-01T03:00:00:00Z_v1_2

All 3 segments must be loaded before a query for the interval 2011-01-01T02:00:00:00Z_2011-01-01T03:00:00:00Z completes.
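The naming convention and the sharded identifiers above can be reproduced with a small helper. This is a sketch: `segment_id` and `group_by_interval` are hypothetical names, and treating the partition number as optional is an assumption of this example.

```python
from collections import defaultdict


def segment_id(datasource, start, end, version, partition_num=None):
    """Join the parts with underscores; the partition number is optional."""
    parts = [datasource, start, end, version]
    if partition_num is not None:
        parts.append(str(partition_num))
    return "_".join(parts)


def group_by_interval(segments):
    """Group (datasource, start, end, version, partition) tuples by interval."""
    groups = defaultdict(list)
    for ds, start, end, version, partition in segments:
        groups[(ds, start, end)].append(
            segment_id(ds, start, end, version, partition)
        )
    return dict(groups)


# Timestamps copied as written in the identifiers above.
START, END = "2011-01-01T02:00:00:00Z", "2011-01-01T03:00:00:00Z"
shards = [("sampleData", START, END, "v1", n) for n in range(3)]
ids = group_by_interval(shards)[("sampleData", START, END)]
```

Grouping by (datasource, start, end) gives the set of shards that together cover one interval, which is the set a query over that interval needs.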

III. Historical Node

For configuration, see Historical Configuration.

Historical nodes load historical segments and serve queries on them.

Running

 io.druid.cli.Main server historical

Loading and Serving Segments

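Each historical node keeps a constant connection to Zookeeper, from which it gets segment information. Historical nodes do not connect directly to each other or to the coordinator.

The Coordinator node is responsible for assigning new segments to historical nodes.

When a historical node learns that it has to load a new segment, it first checks its local metadata for that segment (the metadata includes where the segment is located in deep storage and how to decompress and process it). If nothing is found locally, it requests the metadata from Zookeeper.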

Loading and Serving Segments From Cache

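A local cache speeds this up: if the data is already available locally, the node does not need to ask Zookeeper again. Cache entries can of course be updated or invalidated.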
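The cache-then-Zookeeper lookup can be modeled with a toy class. Everything here is hypothetical and only illustrates the control flow: a dict stands in for Zookeeper, another dict for the local disk cache, and the metadata fields and the `s3://example-bucket/...` location are invented.

```python
class HistoricalNode:
    """Toy model of the cache-then-Zookeeper lookup; not Druid code."""

    def __init__(self, zookeeper_metadata):
        self.zookeeper_metadata = zookeeper_metadata  # stand-in for Zookeeper
        self.local_cache = {}   # stand-in for the local disk cache
        self.zk_requests = 0    # counts how often Zookeeper was asked

    def segment_metadata(self, segment_id):
        """Return metadata from the local cache, asking Zookeeper on a miss."""
        if segment_id not in self.local_cache:
            self.zk_requests += 1
            self.local_cache[segment_id] = self.zookeeper_metadata[segment_id]
        return self.local_cache[segment_id]

    def invalidate(self, segment_id):
        """Drop a cached entry so the next lookup goes back to Zookeeper."""
        self.local_cache.pop(segment_id, None)


# Invented metadata for one segment.
zk = {"seg_a": {"location": "s3://example-bucket/seg_a", "compression": "lz4"}}
node = HistoricalNode(zk)
node.segment_metadata("seg_a")  # miss: fetched from the Zookeeper stand-in
node.segment_metadata("seg_a")  # hit: served from the local cache
```

After the two lookups `node.zk_requests` is 1; calling `node.invalidate("seg_a")` forces the next lookup to go back to the Zookeeper stand-in.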

IV. Broker

For configuration, see Broker Configuration.

The Broker routes queries.

Running

 io.druid.cli.Main server broker

Forwarding Queries

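Each query contains an interval that restricts the time range. The Broker routes the query according to the node information it finds in Zookeeper.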
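Routing by interval boils down to an overlap test between the query interval and each segment's interval. The sketch below assumes a hypothetical in-memory `TIMELINE` in place of the topology a Broker would read from Zookeeper, and it uses standard ISO-8601 timestamps rather than the format shown in the segment identifiers.

```python
from datetime import datetime


def parse(ts):
    """Parse an ISO-8601 UTC timestamp such as 2011-01-01T02:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


# Hypothetical view of the topology: (segment start, segment end, node).
TIMELINE = [
    ("2011-01-01T01:00:00Z", "2011-01-01T02:00:00Z", "historical-1"),
    ("2011-01-01T02:00:00Z", "2011-01-01T03:00:00Z", "historical-2"),
    ("2011-01-01T03:00:00Z", "2011-01-01T04:00:00Z", "realtime-1"),
]


def nodes_for_interval(query_start, query_end):
    """Return the nodes whose segments overlap [query_start, query_end)."""
    q_start, q_end = parse(query_start), parse(query_end)
    return [
        node
        for seg_start, seg_end, node in TIMELINE
        if parse(seg_start) < q_end and q_start < parse(seg_end)
    ]


targets = nodes_for_interval("2011-01-01T01:30:00Z", "2011-01-01T02:30:00Z")
```

Intervals are treated as half-open, so a query ending exactly at a segment's start does not touch that segment.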

Caching

The Broker keeps a cache so that results it already holds can be reused and merged into the final result.

To be added:

- Coordinator
- Indexing Service
- Realtime
Dependencies
- Deep Storage
- Metadata Storage
- ZooKeeper
