Distributed Database Architecture---Clustrix Query Optimizer
来源:互联网 发布:c stl源码 编辑:程序博客网 时间:2024/05/16 11:25
原文: http://docs.clustrix.com/display/CLXDOC/Query+Optimizer
See also:
- Understanding the Clustrix Explain OutputUnderstanding Explain Output
- A New Approach: Clustrix Sierra Database Engine
How the Clustrix Query Optimizer is Different
At the core of Clustrix Query Optimizer is the ability to execute onWhat is a query optimizer?
SQL is a declarative language, i.e a language that describes what is to be computed but not how. The job of the query optimizer is to determine how to do this computation, which ends up being critical to the performance of the entire system. For example, you might say in SQL that you want to join 3 tables and compute an aggregate operation. This leaves the following questions for the Query Optimizer:
- In what order should the tables be joined? This can be the difference between your query executing in 1ms or 10 minutes. Imagine if a predicate on on
e of the tables causes it to returns no rows – starting the read from that table is likely optimal and fast. - Which indexes should be used? Not using a proper index on a join constraint could be catastrophic, causing broadcast messages and full reads of the second table for each row of the first.
- Should we provide a non-blocking sort/aggregate? Should we do the sort/aggregate in stages, i.e first on separate nodes and then re-aggregate/re-sort later?
The set of query plans that these permutations create is known as the Search Space. The job of the Query Optimizer is to explore the Search Space and determine which plan uses the least amount of database resources. Typically, this is done by assigning costs to each plan, then choosing the cheapest on
The ClustrixDB Query Optimizer is also known as Sierra.
An Example
SQL Representation – What | Sierra Output – How |
---|---|
(display ((2 . "a") (0 . "expr0")) | (display ((2 . "a") (0 . "expr0")) |
Diagram 1 |
First, some explanations for these plans. We'll go from most indented to least indented. These descriptions should be read top to bottom for equally indented statements.
This SQL statement is performing the following:
- Read table Foo and join to Bar
- Evaluate the join constraint
- Compute the aggregate
- Display the output to the user
The Query Optimizer is doing the following:
- Read table Bar, preserving the sort order via stream_merge
- Evaluate the join constraint while reading Foo and still preserving sort order via msjoin this time.
- Compute the aggregate in a non-blocking way with stream_aggregate. This can be done because we preserved order.
How did the Query Optimizer find this plan? How did it make sure it even worked? Lets find out:
The ClustrixDB Query Optimizer
Sierra is modeled off of the Cascades Query optimization framework, which was chosen primarily because it provides the following:
- Cost-driven
- Extensible via a rule based mechanism
- Top-down approach
- General separation of logical vs. physical operators and properties
- Branch-and-bound pruning
Cascades has been proven to satisfy the requirements and demands of many commercial database systems, most notably: Tandem's NonStop SQL Product and Microsoft's SQL Server Product. A TPC-D Model of the Cascades framework can be found on
Modern query optimizers are often split into two parts, the Model and Search Engine.
The Model lists the equivalence transformations (rules), which are used by the search engine to expand the search space.
The Search Engine defines the interfaces between the search engine and the model, and provides the co
Terminology
Logical vs. Physical Model
In the Query Optimizer, the Logical model describes what is to be computed, and the Physical model describes how it is to be computed. In diagram 1 above, the SQL representation shows logically what to do and the Sierra output shows physically how to do it.
Operators and Expressions
An expr
maps to the physical expr
These operators have arguments which describe their Namespace, Object Id, and Columns. Here, the table_scan has no inputs and index_scan has an input that represents the join constraint. A list of operators can be found in the section on understanding explain output.
Physical Properties
Physical properties are related to intermediate results, or sub-plans. They describe things like how the da
- What Physical properties can and can't I provide to my parent?
- What Physical properties do I require of my input?
Here are some considerations Sierra takes while optimizing our query:
Physical Property Flow | |
---|---|
Invalid Plan | Valid Plan |
Digram 2 |
The invalid plan fails because the stream_combine operator is unable to preserve any kind of ordering that its inputs provide. However, in the valid plan, stream_merge is used, which can preserve the sort order of its child, and the index_scan itself does have sort order. In effect, plans which may or may not be valid are explored and the physical properties are used in order to validate whether they are possible. If any operator in the chain of provide and require fails, the whole thing is invalidated as a possible output plan.
Groups
Groups correspond to intermediate tables, or equivalently subplans of the query. Groups are logical and contain the following:
- All the logically equivalent expressions that describe that intermediate table
- All the physical implementations of those logical expressions
- Winners: A physical expr
ession that had the best cost given a set of physical properties - Logical properties: Which columns its required to produce as well as statistics about some of those columns.
Groups are the fundamental da
The Group for "Read Table Bar" |
---|
Diagram 3 |
The Memo
In the process of optimization, Sierra will keep track the intermediate tables that could be used in computing the final result table. Each of these corresponds to a group, and the set of all groups for a plan defines the memo. In Sierra, the memo is designed to represent all logical query trees and physical plans in the search space for a given initial query. The memo is a set of groups, with on
Rules (The Model)
The model's rule set can be thought of as defining the logical and physical search space of the optimizer. The memo is expanded to encompass the full logical and physical search space through the application of rules. The application of rules is a multi-step process of finding a binding in the memo, evaluating a rule condition (if the rule has on
Swap Join Order Rule | |
---|---|
Diagram 4 |
Tasks (The Search Engine)
Sierra's search engine is a series of tasks that are waiting to be computed. At any point in time during optimization, there are tasks waiting on a stack to be executed. Each task will likely push more tasks on
The Search Engine |
---|
Diagram 5 |
The Cost Model
Sierra costs plans using a combination of I/O, CPU usage, and latency. Remember that ClustrixDB is distributed so total CPU usage and latency are not proportional. Every operator describes a function in order to compute its costs given its inputs. For example an index_scan uses the row estimation framework to compute how many rows it expects to read from the btree and then its computes its cost as such:
The operator above the index_scan would then use this cost and row estimate to estimate its own cost. More information about this process can be found in the section on Understanding Explain Output.
Row Estimation
The way Sierra chooses the optimal plan for a query is by finding the plan with the cheapest cost. Cost is strongly dependent on how many rows the optimizer thinks are going to be flowing through the system. The job of the row estimation subsystem is to take statistical information from our Probability Distributions and compute an estimated number of rows that will come out of a given expr
The Explain
For the query above, we can get a succinct description of the plan, the row estimates, and the cost by prepending the query with 'explain':
For more information about explain see Understanding Explain Output.
Summary
To summarize how Sierra got the output in diagram 1, the following steps were performed:
- Sierra created many groups and expressions corresponding to the SQL Representation of diagram 1.
- In an on-demand fashion via the Task mechanism Sierra then:
- Fired rules in order to expand the logical and physical search space of the query, creating additional groups and expressions, some of which might not correspond to any valid plan.
- Shrank the search space to the plans that were valid, with the use of physical properties.
- Used the cost model in order to determine the cheapest plans from a given set of physical properties for each group.
- After all tasks had been processed, Sierra extracted the query plan corresponding to the winner of the top group, which is the output of diagram 1.
Distributed considerations
We've talked a lot about how the query optimizers finds the best plan, but so far the concepts are not unique to ClustrixDB. On
- Read table Foo which likely has slices on multiple nodes.
- Forward all those rows to on
e node - Insert all those rows into a hash container, computing the aggregate operation if necessary
- Read the container and output to the user
Here's what it would like:
But really we could distribute the aggregate and do this instead:
- Compute the sub-plan (under the aggregate), which likely has result rows on multiple nodes
- Locally insert those rows into a local hash container, computing the aggregate operation if necessary
- Read the hash container on each node and forward to a single node
- If necessary:
- Insert all those rows into a new final hash container, computing the aggregate operation if necessary
- Read that hash container
- Output rows to the user
Here's what this would look like:
The question for Sierra becomes which on
References
- On the optimal nesting order for computing N-relational joins, [Ibaraki 84]
- A TPC-D Model for Database Query Optimization in Cascades,http://web.cecs.pdx.edu/~kgb/t/ts.shtml123123
The Cascades Framework for Query Optimization, Graefe
- Efficiency in the Columbia Database, Yongwen Xu
- A New Approach: Clustrix Sierra Database Engine
- Distributed Database Architecture---Clustrix Query Optimizer
- DRDA(Distributed Relational Database Architecture )分布式关系数据库体系结构
- Distributed Database and Distributed Transaction
- Distributed Key-Value Database
- Elastic Distributed Database
- Performance Tuning-The Query Optimizer
- 8.9 Controlling the Query Optimizer
- 分布式数据库(distributed database)
- Expert Oracle Database Architecture
- 3.Database Architecture
- Oracle Database HA Architecture
- Shard (database architecture)
- Oracle Database Architecture
- gym database architecture
- Mysql Query optimizer mysql查询优化笔记
- Distributed Systems Architecture: A Middleware Approach
- Scalable Web Architecture and Distributed Systems
- Scalable Web Architecture and Distributed Systems
- 如何阅读《数据库查询优化器的艺术:原理解析与SQL性能优化》
- 《数据库查询优化器的艺术:原理解析与SQL性能的艺术》---感谢朋友和社区的支持
- MySQL是怎么保证产品质量的?
- 在Simulink中建模,生成C++代码,在vs平台下进行功能验证
- CodeForces
- Distributed Database Architecture---Clustrix Query Optimizer
- 招聘---MySQL全球开发团队招聘Windows性能架构师
- GoLang几种读文件方式的比较
- PostgreSQL的新书介绍
- Reading and Writing Files
- Java复习(一个月)
- MySQL Enterprise Monitor 3.0.7 has been released
- 进程创建时安全计算处理
- 相关子查询与不相关子查询的优化(一)