Clustered Indexes and Non-Clustered Indexes

Source: Internet · Editor: 程序博客网 · Time: 2024/06/14 23:31

MySQL Performance Tuning: Understanding Clustered and Non-Clustered Indexes

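A clustered index organizes the actual data on disk so that rows are sorted by the values of one or more specified columns. Its defining feature is that the storage order of the data matches the order of the index. In general the primary key becomes the clustered index by default, and a table can have only one clustered index.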

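The textbook 《数据库原理》 (Database Principles) explains the difference between the two this way: the leaf nodes of a clustered index are the data nodes themselves, whereas the leaf nodes of a non-clustered index are still index nodes, which hold pointers to the corresponding data blocks.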

Clustered Indexes and Secondary Indexes

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

· If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.

· If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key, and InnoDB uses it as the clustered index.

· If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
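The three rules above can be sketched as a small decision function. This is only an illustrative model, not InnoDB's actual code: the `Table` and `Index` classes, their fields, and the function name `choose_clustered_key` are all hypothetical names made up for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Index:
    """Hypothetical description of one index on a table."""
    name: str
    columns: List[str]
    unique: bool = False


@dataclass
class Table:
    """Hypothetical table definition; indexes are kept in definition order."""
    not_null: Set[str]                       # columns declared NOT NULL
    primary_key: Optional[List[str]] = None  # PRIMARY KEY columns, if any
    indexes: List[Index] = field(default_factory=list)


def choose_clustered_key(table: Table) -> Tuple[str, List[str]]:
    """Apply the three selection rules in order and report which one matched."""
    # Rule 1: an explicit PRIMARY KEY always wins.
    if table.primary_key:
        return ("PRIMARY KEY", table.primary_key)
    # Rule 2: the first UNIQUE index whose columns are all NOT NULL.
    for idx in table.indexes:
        if idx.unique and all(col in table.not_null for col in idx.columns):
            return ("UNIQUE NOT NULL index " + idx.name, idx.columns)
    # Rule 3: fall back to the hidden 6-byte row ID.
    return ("hidden row ID", ["<6-byte row ID>"])


# A table without a PRIMARY KEY: uk_nick is UNIQUE but nullable, so it is skipped.
demo = Table(
    not_null={"email"},
    indexes=[
        Index("uk_nick", ["nick"], unique=True),
        Index("uk_email", ["email"], unique=True),
    ],
)
print(choose_clustered_key(demo))
```

In this made-up table the nullable `uk_nick` index does not qualify, so the second rule selects `uk_email`.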

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record, since no extra read is needed to get from the index entry to the data page. (For example, MyISAM uses one file for data rows and another for index records.)

How Secondary Indexes Relate to the Clustered Index

All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.

If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
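A rough back-of-the-envelope calculation makes the effect of primary key length concrete. The sketch below counts only the key bytes carried by each secondary index record; it deliberately ignores record headers, page fill factor, and non-leaf pages, so the numbers are a lower bound for illustration rather than real on-disk sizes. The function name, the row count, and the column choices (a 4-byte INT secondary column, an 8-byte BIGINT primary key, and a 36-byte primary key such as a CHAR(36) value in a single-byte character set) are assumptions made for this example.

```python
def secondary_index_key_bytes(rows: int, secondary_key_bytes: int,
                              primary_key_bytes: int) -> int:
    """Key bytes stored in one secondary index: each record carries the
    indexed columns plus a copy of the primary key columns."""
    return rows * (secondary_key_bytes + primary_key_bytes)


ROWS = 10_000_000      # assumed table size
SECONDARY_BYTES = 4    # assumed: index on one INT column

short_pk = secondary_index_key_bytes(ROWS, SECONDARY_BYTES, 8)    # BIGINT primary key
long_pk = secondary_index_key_bytes(ROWS, SECONDARY_BYTES, 36)    # 36-byte primary key

print(short_pk)  # 120000000
print(long_pk)   # 400000000
```

Under these assumptions the longer primary key more than triples the key bytes of the secondary index, and the same cost is paid again by every additional secondary index on the table.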

Summary:

1. What is a clustered index? What is a primary key index? What is a secondary index?

Clustered index: an index whose entries lead directly to the data pages; the data pages themselves are physically organized in index-key order.

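Primary key index: an index built on the primary key. In InnoDB, when a table has a primary key, that key is used as the clustered index by default, so in InnoDB the primary key index is generally the clustered index.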

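Secondary index: every index other than the clustered index. In InnoDB, finding a row through a secondary index means first finding the corresponding primary key value and then using that value to look up the row.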

2. How is the clustered index chosen?

In InnoDB, the clustered index is chosen by MySQL itself, in the following order:

(1) If a primary key is defined, the primary key is chosen;

(2) If there is no primary key, the first UNIQUE index whose columns are all NOT NULL is chosen;

(3) If neither exists, InnoDB generates a hidden 6-byte, monotonically increasing row ID column and uses it as the clustered index column.

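Many references claim that the auto-generated IDs of different tables are all stored in one shared table managed by MySQL itself. According to that claim, a query first has to find the generated ID in this shared table and only then look up the corresponding data; because every self-generated ID lives in the same place, that table becomes a bottleneck, which makes InnoDB's self-generated key less efficient than an explicitly defined primary key.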

I have not verified this yet.

3. A secondary index implicitly contains the primary key columns.

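4. Each leaf node of the clustered index stores the index key together with the row data, which saves the I/O otherwise needed to get from an index entry to the data page.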

In MyISAM, both the primary index and the secondary indexes point to the physical row, as explained below.

In InnoDB, the row data is stored under the primary key, and a secondary index holds a reference to the primary key.

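The MyISAM index storage diagram shows that under both id and cat_id the index stores a value pointing to a physical address. Whether a query goes through the primary key index or a secondary index, it first finds the physical location and then reads the data at that location.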

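The InnoDB index storage diagram shows that the primary key index stores the row data directly beneath it, while a secondary index stores the primary key id. A lookup by primary key therefore reaches the data quickly, but a lookup through a secondary index must first find the corresponding primary key id and only then can it locate the data.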

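In InnoDB, the primary index stores the row data directly and is called the clustered index; a secondary index holds a reference to the primary key.
In MyISAM, both the primary index and the secondary indexes point to the physical row (its position on disk).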
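The two lookup paths can be contrasted with a toy in-memory model. This is only a sketch: plain dictionaries and a list stand in for the B-tree indexes and the data file, and the sample rows and all variable and function names are made up for illustration. The column names id and cat_id follow the description above.

```python
# Hypothetical sample rows: (id, cat_id, name).
ROWS = [(1, 10, "apple"), (2, 20, "banana"), (3, 10, "cherry")]

# MyISAM-style: rows live in a separate data file (modelled as a list),
# and every index, primary or secondary, maps a key to a row position.
myisam_data = list(ROWS)
myisam_primary = {row[0]: pos for pos, row in enumerate(myisam_data)}
myisam_secondary = {}
for pos, row in enumerate(myisam_data):
    myisam_secondary.setdefault(row[1], []).append(pos)

# InnoDB-style: the clustered index maps the primary key to the row itself,
# and the secondary index maps a key to primary key values, not positions.
innodb_clustered = {row[0]: row for row in ROWS}
innodb_secondary = {}
for row in ROWS:
    innodb_secondary.setdefault(row[1], []).append(row[0])


def myisam_find_by_cat(cat_id):
    """Secondary index -> physical position -> row in the data file."""
    return [myisam_data[pos] for pos in myisam_secondary.get(cat_id, [])]


def innodb_find_by_cat(cat_id):
    """Secondary index -> primary key -> second lookup in the clustered index."""
    return [innodb_clustered[pk] for pk in innodb_secondary.get(cat_id, [])]


def innodb_find_by_id(row_id):
    """Primary key lookup: the clustered index entry already holds the row."""
    return innodb_clustered.get(row_id)


print(myisam_secondary[10])    # row positions: [0, 2]
print(innodb_secondary[10])    # primary key values: [1, 3]
print(innodb_find_by_cat(10))
```

Both engines return the same rows for cat_id 10; what differs is the value the secondary index stores, a row position in the MyISAM-style model and a primary key value in the InnoDB-style model, which is why the InnoDB path needs the extra clustered index lookup.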

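Note, for InnoDB:
1: The primary key index stores the index values and also stores the row data in its leaves.
2: If there is no primary key, a UNIQUE key (the first one whose columns are all NOT NULL) is used as the primary key.
3: If there is no such UNIQUE key either, the system generates an internal row ID to serve as the primary key.
4: A structure like InnoDB's primary key index, which stores both the primary key values and the row data, is called a "clustered index".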
