<Chapter 2>Fast Index Creation in the InnoDB Storage Engine

来源:互联网 发布:2017年最火的网络手游 编辑:程序博客网 时间:2024/06/09 22:05

Chapter 2. Fast Index Creation in the InnoDB Storage Engine


第二章:在InnoDB存储引擎下快速创建索引

Table of Contents

目录

 

2.1. Overview of Fast Index Creation

2.1.关于快速创建索引

 

2.2. Examples

2.2.实例

 

2.3. Implementation

2.3.应用与实现

 

2.4. Concurrency Considerations

2.4.并发性的考虑

 

2.5. Crash Recovery

2.5.崩溃后的恢复

 

2.6. Limitations

2.6.局限性

2.1.Overview of Fast Index Creation


2.1.关于快速创建索引

In MySQL versions upto 5.0, adding or dropping an index on a table with existing data can be veryslow if the table has many rows. The CREATE INDEX and DROP INDEX commands workby creating a new, empty table defined with the requested set of indexes. Itthen copies the existing rows to the new table one-by-one, updating the indexesas it goes. Inserting entries into the indexes in this fashion, where the keyvalues are not sorted, requires random access to the index nodes, and is farfrom optimal. After all rows from the original table are copied, the old tableis dropped and the copy is renamed with the name of the original table.

MySQL 5.0以及更早的版本中,在现有的数据库中添加或删除一个索引非常耗时,特别是对于行数很多的表格。MySQL执行CREATE INDEX DROP INDEX命令的流程是:首先创建一个新的、空的临时表,其中包含要创建的索引;然后将原表中的数据一行一行地拷贝到临时表中,更新索引;在这种方式下,由于插入的时候数据不是按索引键排序的,导致随机访问索引节点,创建和删除索引的性能不佳;拷贝完所有数据后,删除临时表并将新表重命名为原来的表名。

 


Beginningwith version 5.1, MySQL allows a storage engine to create or drop indexeswithout copying the contents of the entire table. The standard built-in InnoDBin MySQL version 5.1, however, does not take advantage of this capability. Withthe InnoDB Plugin, however, users can in most cases add and drop indexes muchmore efficiently than with prior releases.

MySQL 5.1版本开始,MySQL允许存储引擎在不拷贝表数据的情况下创建或删除索引。MySQL 5.1内置的InnoDB并不带有这种性能,但是,新发布的InnoDB Plugin比以往版本更能实现高效的创建和删除索引。

 


InInnoDB, the rows of a table are stored in a clustered (or primary key) index,forming what some database systems call an “index-organized table”. Changingthe clustered index requires copying the data, even with the InnoDB Plugin.However, adding or dropping a secondary index with the InnoDB Plugin is muchfaster, since it does not involve copying the data.

InnoDB中,表中的行存储在聚集索引或主键索引,这样的表在某些数据库系统上被称为“索引组织表”。不管是built-in InnoDB还是InnoDB Plugin,更改聚集索引都需要拷贝数据。但是,由于不涉及数据的复制,在InnoDB Plugin添加或者删除非聚集索引会快得多。

 


Thisnew mechanism also means that you can generally speed the overall process ofcreating and loading an indexed table by creating the table with only theclustered index, and adding the secondary indexes after the data is loaded.

这种新特性意味着,通过创建只含聚集索引的表并在数据加载完毕后新增非聚集索引,整个创建及加载索引表的总体进程速率将得到显著提高。

 

Althoughno syntax changes are required in the CREATE INDEX or DROP INDEX commands, somefactors affect the performance, space usage, and semantics of this operation(seeSection 2.6, “Limitations”).

尽管CREATE INDEXDROP INDEX命令没有语句语法上的变化,仍然存在某些因素影响着数据库性能、空间使用率以及操作中涉及的语义(详情参考2.6章节“局限性”)。

 

Becausethe ability to create and drop indexes does not require use of a new on-diskfile format, it is possible to temporarily use the InnoDB Plugin to create ordrop an index, and then fall back to using the standard built-in InnoDB inMySQL for normal operations if you wish. See Chapter 11,Downgradingfrom the InnoDB Plugin for more information.

由于创建和删除索引并不需要使用新的磁盘文件格式,你可以暂时使用InnoDB Plugin创建和删除一条索引,然后使用标准built-inInnoDB进行其他普通的操作。详情参考第11章“InnoDB Plugin的降级”。


2.2.Examples



2.2.实例

Itis possible to create multiple indexes on a table with one ALTER TABLE command.This is relatively efficient, because the clustered index of the table needs tobe scanned only once (although the data is sorted separately for each newindex). For example:

CREATETABLE T1(A INT PRIMARY KEY, B INT, C CHAR(1)) ENGINE=InnoDB;

INSERTINTO T1 VALUES

(1,2,'a'),(2,3,'b'), (3,2,'c'), (4,3,'d'), (5,2,'e');

COMMIT;

ALTERTABLE T1 ADD INDEX (B), ADD UNIQUE INDEX (C);

使用一条ALTER TABLE命令创建多个索引是完全可能的并且相对有效,因为表中的聚集索引只需扫描一遍(即使每个新索引的数据都各自排好序)。例如:

CREATE TABLE T1(A INT PRIMARY KEY, B INT, C CHAR(1)) ENGINE=InnoDB;

INSERT INTO T1 VALUES

(1,2,'a'), (2,3,'b'), (3,2,'c'), (4,3,'d'), (5,2,'e');

COMMIT;

ALTER TABLE T1 ADD INDEX (B), ADD UNIQUE INDEX (C);

 

Theabove commands will create table T1 with the clustered index (primary key) oncolumn A, insert several rows, and then build two new indexes on columns B andC. If there were many rows inserted into T1 before the ALTER TABLE command,this approach would be much more efficient than creating the table with all itsindexes before loading the data.

上述的命令将创建表T1并在列A创建聚集索引(主键),插入几行后在列B和列C创建两个新的索引。如果在ALTER TABLE命令之前插入过多的行到表T1,此方法比加载完数据之后才创建带有全部索引的表会更有效。

 

Youmay also create the indexes one at a time, but then the clustered index of thetable is scanned (as well as sorted) once for each CREATE INDEX command. Thus,the following commands are not as efficient as the ALTER TABLE command above,even though neither requires recreating the clustered index for table T1.

CREATEINDEX B ON T1 (B);CREATE UNIQUE INDEX C ON T1 (C);

你也可以一次只创建一个索引,但是,对于CREATE INDEX命令而言,表中的聚集索引只扫描(并排序)一次。因此,下面所列的命令并不如上述的ALTER TABLE命令有效,尽管两者在表T1中都不必重建聚集索引。

 

Droppingindexes in the InnoDB Plugin does not require any copying of table data. Thus,you can equally quickly drop multiple indexes with a single ALTER TABLE commandor multiple DROP INDEX commands:

ALTERTABLE T1 DROP INDEX B, DROP INDEX C;

or

DROPINDEX B ON T1;DROP INDEX C ON T1;

InnoDB Plugin删除索引不需要拷贝表中的数据,因此,你可以同样快速地用单个ALTER TABLE命令或多个DROP INDEX命令来删除多个索引:

ALTER TABLE T1 DROP INDEX B, DROP INDEX C;

或者

DROP INDEX B ON T1;DROP INDEX C ON T1;

 

Restructuringthe clustered index in InnoDB always requires copying the data in the table.For example, if you create a table without a primary key, InnoDB chooses onefor you, which may be the first UNIQUE key defined on NOT NULL columns, or asystem-generated key. Defining a PRIMARY KEY later causes the data to becopied, as in the following example:

CREATETABLE T2 (A INT, B INT) ENGINE=InnoDB;INSERT INTO T2 VALUES (NULL, 1);

ALTERTABLE T2 ADD PRIMARY KEY (B);

InnoDB重组聚集索引需要拷贝表中的数据。例如,如果你创建的表没有主键,InnoDB将自动设置主键,它可能是列NOT NULL中的第一个UNIQUE键,也可能是一个系统生成的键。假如在这之后定义一个主键,会导致数据的拷贝,举例如下:

CREATE TABLE T2 (A INT, B INT) ENGINE=InnoDB;INSERT INTO T2 VALUES(NULL, 1);

ALTER TABLE T2 ADD PRIMARY KEY (B);

 

Notethat when you create a UNIQUE or PRIMARY KEY index, InnoDB must do some extrawork. For UNIQUE indexes, InnoDB checks that the table contains no duplicatevalues for the key. For a PRIMARY KEY index, InnoDB also checks that none ofthe PRIMARY KEY columns contains a NULL. It is best to define the primary keywhen you create a table, so you need not rebuild the table later.

需要注意的是,创建唯一索引或者主键索引时,InnoDB还需要做一些额外的工作。对于唯一索引而言,InnoDB需要检查并确保表中的key不存在重复的值;对于主键索引而言,InnoDB需要检查并确保所有的列PRIMARY KEY不包含NULL值。因此,最好在创建表的同时定义主键,这样在稍后的操作中就不必重建表了。



2.3. Implementation


2.3.应用与实现

InnoDB has two typesof indexes: the clustered index and secondary indexes. Since the clusteredindex contains the data values in its B-tree nodes, adding or dropping aclustered index does involve copying the data, and creating a new copy of thetable. A secondary index, however, contains only the index key and the value ofthe primary key. This type of index may be created or dropped without copyingthe data in the clustered index. Furthermore, because the secondary indexcontains the values of the primary key (used to access the clustered index whenneeded), when you change the definition of the primary key, thus recreating theclustered index, all secondary indexes are recreated as well.

InnoDB有两种类型的索引:聚集索引和非聚集索引。由于聚集索引的B-tree节点包含数据值,添加或删除聚集索引需要拷贝数据并创建一份新的表副本。但是,非聚集索引只包含索引键以及主键的值。创建或者删除非聚集索引时并不需要拷贝聚集索引的数据。此外,由于非聚集索引包含主键的值(必要时用于访问聚集索引),如果更改了主键的定义,重新创建聚集索引的同时,非聚集索引也将被重建。

 

Droppinga secondary index is simple. Only the internal InnoDB system tables and theMySQL data dictionary tables need to be updated to reflect the fact that theindex no longer exists. InnoDB returns the storage used for the index to thetablespace that contained it, so that new indexes or additional table rows mayuse the space.

删除非聚集索引是非常简单的。只有在InnoDB内部系统的表和MySQL数据字典表被更新时,才反应了非聚集索引已不再存在的事实。InnoDB会回收表空间里索引使用的空间,这样一来,新的索引或新增的表记录就可以使用这些空间了。

 

Toadd a secondary index to an existing table, InnoDB scans the table, and sortsthe rows using memory buffers and temporary files in order by the value(s) ofthe secondary index key column(s). The B-tree is then built in key-value order,which is more efficient than inserting rows into an index in random order withrespect to the key values. Because the B-tree nodes are split when they fill,building the index in this way results in a higher fill-factor for the index,making it more efficient for subsequent access.

如果在已知的表中新增非聚集索引,InnoDB将扫描此表,并且把内存缓冲区和临时文件中的记录按非聚集索引的键值排序。随后便建立起键值有序排列的B-tree,这比插入记录到键值随机排序的索引效率更高。因为B-tree节点在满了之后会分裂,用这种方式建立的索引拥有更高的填充因子,从而使后续的访问进行得更有效率。



2.4.Concurrency Considerations


2.4.并发性的考虑

Whilea secondary index is being created or dropped, the table is locked in sharedmode. That is, any writes to the table are blocked, but the data in the tablemay be read. When you alter the clustered index of a table, however, the tableis locked in exclusive mode, because the data must be copied. Thus, during thecreation of a new clustered index, all operations on the table are blocked.

在创建或者删除非聚集索引的同时,表被锁定为共享模式。也就是说,不允许修改表,但表中的数据是可读的。更改表中的聚集索引则需要拷贝数据,导致表被锁定为独占模式。因此,在创建新的聚集索引的过程中,所有针对表的操作都是不被允许的。

 

Beforeit can start executing, a CREATE INDEX or ALTER TABLE command must always waitfor currently executing transactions that are accessing the table to commit orrollback before it can proceed. In addition, ALTER TABLE commands that create anew clustered index must wait for all SELECT statements that access the tableto complete (or their containing transactions to commit). Even though theoriginal index exists throughout the creation of the new clustered index, notransactions whose execution spans the creation of the index can be accessingthe table, because the original table must be dropped when clustered index isrestructured.

如果当前事务正在访问表,我们必须在提交或回滚该事务之后才开始执行CREATE INDEX ALTER TABLE命令。除此之外,我们还必须等到表上所有的SELECT语句都完成之后(或者提交了包含的事务之后),才能使用ALTER TABLE命令创建新的聚集索引。即使原先的索引在新的聚集索引创建之后仍然存在,由于重组聚集索引时必须删除原表,以致于在整个索引创建过程中没有任何事务能访问表。

 

Oncea CREATE INDEX or ALTER TABLE command that creates a secondary index beginsexecuting, queries may access the table for read access, but may not update thetable. If an ALTER TABLE command is changing the clustered index, all queriesmust wait until the operation completes.

创建非聚集索引时,一旦CREATE INDEX ALTER TABLE开始执行,查询可能以读取的形式访问表,但无法更新表。如果使用ALTER TABLE命令更改聚集索引,所有查询必须等到该命令完成后才可执行。

 

Anewly-created secondary index contains only data that is current in the tableas of the time the CREATE INDEX or ALTER TABLE command begins to execute.Specifically, a newly-created index contains only the versions of data as ofthe most-recently committed transactions prior to the creation of the index.The index thus does not contain any rows that were deleted (and thereforemarked for deletion) by transactions that completed before the CREATE INDEX orALTER TABLE began. Similarly, the index contains only current versions of everyrow, and none of the old versions of rows that were updated by transactionsthat ran before the index was created.

新建的非聚集索引包含了CRATE INDEXALTER TABLE命令开始执行时当前表中的数据。具体来说,新建的索引只包含创建索引之前、最近一次提交的事务中的数据版本。因此,索引也不包含CREATE INDEXALTER TABLE命令执行之前由事务删除(被标记为“删除”)的记录。同样地,索引只包含每一行记录的当前版本,并且,创建索引之前运行的事务不能更新旧版本的记录。

 

Becausea newly-created index contains only information about data current at the timethe index was created, queries that need to see data that was deleted orchanged before the index was created cannot use the index. The only queriesthat could be affected by this limitation are those executing in transactionsthat began before the creation of the index was begun. For such queries,unpredictable results could occur. Newer queries can use the index.

因为新创建的索引只包含创建索引时刻的数据信息,如果使用某个查询来查看索引删除或更改前的数据,则不能使用索引。唯一受此局限性影响的查询,是那些在创建索引之前而运行的事务中执行的语句。这样的查询可能导致不可预知的结果。而新的查询可以使用索引。



2.5.Crash Recovery


2.5.崩溃后的恢复

Nodata is lost if the server crashes while an ALTER TABLE command is executing.Recovery, however, is different for clustered indexes and secondary indexes.

执行ALTER TABLE命令的时候,服务器崩溃并不会造成数据的丢失。但是,聚集索引和非聚集索引的恢复方法不尽相同。

 

Ifthe server crashes while creating a secondary index, upon recovery, InnoDBdrops any partially created indexes. All you need to do to create the index isto re-run the ALTER TABLE or CREATE INDEX command.

如果在创建非聚集索引时发生服务器崩溃,InnoDB将会删除部分新建的索引以恢复数据库。想要创建索引,你只需重新运行ALTER TABLE或者CREATE INDEX命令。

 

However,when a crash occurs during the creation of a clustered index, recovery issomewhat more complicated, because the data in the table must be copied to anentirely new clustered index. Remember that all InnoDB tables are stored asclustered indexes. In the following discussion, we use the word table andclustered index interchangeably.

但是,如果在创建聚集索引时发生服务器崩溃,恢复数据库的过程则较为复杂,因为表中的数据必须拷贝到另一个全新的聚集索引中。谨记,InnoDB中的表都以聚集索引的方式存储。在下面的讨论中,word表格和聚集索引将互换使用。

 

TheInnoDB Plugin creates the new clustered index by copying the existing data fromthe original table to a temporary table that has the desired index structure.Once the data is completely copied to this temporary table, the original tableis renamed with a different temporary table name. The temporary tablecomprising the new clustered index is then renamed with the name of theoriginal table, and the original table is then dropped from the database.

InnoDB Plugin新建聚集索引的方法是,首先将原表中的数据拷贝到一个临时表并且该临时表拥有预期的索引结构;数据拷贝完成后,将原表重命名(不可与临时表相同),包含新的聚集索引的临时表则重命名为原表之前的名字,并将原表从数据库中删除。

 

Ifa system crash occurs while creating a new clustered index, no data is lost,but users must complete the recovery process using the temporary tables thatexist during the process.

如果在新建聚集索引时发生系统崩溃,数据不会丢失,但是用户必须使用现有的临时表来恢复进程。

 

Usersrarely re-create a clustered index or re-define primary keys on large tables.Because system crashes are uncommon and the situation described here is rare,this manual does not provide information on recovering from this scenario.Instead, please see the InnoDB web site: http://www.innodb.com/support/tips.

用户极少会在大表中重新创建聚集索引或重新定义主键。由于系统崩溃以及这里所描述的情况是很罕见的,所以本手册不提供该情况下恢复数据库的相关信息,想了解更多请访问网站http://www.innodb.com/support/tips.

 

2.6.Limitations


2.6.局限性

Take the followingconsiderations into account when creating or dropping indexes using the InnoDBPlugin:

使用InnoDB Plugin创建或删除索引时需要充分考虑下面的情况:

 

Duringindex creation, files are written to the temporary directory ($TMPDIR on Unix,%TEMP% on Windows, or the value of--tmpdir configuration variable). Eachtemporary file is large enough to hold one column that makes up the new index,and each one is removed as soon as it is merged into the final index.

在创建索引时,文件被写入到临时目录中(Unix写入$TMPDIRWindows写入%TEMP%,或者写入--tmpdir配置变量的值)。每个临时文件都足够大,能够容纳一列记录以便建立新的索引,并且每个临时文件被合并到最终的索引时,该临时文件也被删除。

 

Dueto a limitation of MySQL, the table is copied, rather than using “Fast IndexCreation” when you create an index on a TEMPORARY TABLE. This has been reportedasMySQL Bug #39833.

由于MySQL的限制,在临时表中创建索引时需要直接复制表,而不使用“快速创建索引”。有人已针对此情况提交了名为MySQL Bug #39833的报告。

 

Toavoid consistency issues between the InnoDB data dictionary and the MySQL datadictionary, the table is copied, rather than using Fast Index Creation when youuse the ALTER TABLE ... RENAME COLUMN syntax.

当使用ALTER TABLE...RENAME COLUMN语法时,为了区别InnoDB数据字典和 MySQL 数据字典,我们也直接拷贝表,而不是使用“快速创建索引”。

 

Thecommand ALTER IGNORE TABLE t ADDUNIQUE INDEX does not delete duplicate rows. This has been reported asMySQL Bug #40344. The IGNORE keywordis ignored. If any duplicate rows exist, the operation fails with the followingerror message:

ERROR23000: Duplicate entry '347' for key'pl'

ALTER IGNORE TABLE t ADDUNIQUE INDEX命令并不会删除重复的记录。有人已针对此情况提交了名为MySQLBug #40344的报告。IGNORE关键词被忽略。如果存在重复的记录,操作将失败并显示以下错误信息:

ERROR 23000: Duplicate entry '347'for key 'pl'

 

Asnoted above, a newly-created index contains only information about data currentat the time the index was created. Therefore, you should not run queries in atransaction that might use a secondary index that did not exist at thebeginning of the transaction. There is no way for InnoDB to access “old” datathat is consistent with the rest of the data read by the transaction. See thediscussion of locking inSection 2.4, “Concurrency Considerations”.

如上所述,新创建的索引只包含创建索引时刻的当前数据信息。因此,在可能包含非聚集索引的事务中不能运行查询语句,该事务运行的前期并不存在非聚集索引。InnoDB并不能访问与由事务读取的剩余数据相一致的“旧”数据。更多关于锁定的讨论请参考第2.4章节“并发性的考虑”。

 

Priorto InnoDB Plugin 1.0.4, unexpected results could occur if a query attempts touse an index created after the start of the transaction containing the query.If an old transaction attempts to access a “too new” index, InnoDB Plugin 1.0.4and later reports an error:

ERRORHY000: Table definition has changed, please retry transaction

InnoDB Plugin 1.0.4版本之前,如果一个事务包含某查询语句并且该查询尝试使用事务运行之前所创建的索引,可能会导致意想不到的结果。如果一个旧的事务试图访问“太新”的索引,InnoDB Plugin 1.0.4会报告这样一条错误:

ERROR HY000: Table definition has changed, please retry transaction

 

Asthe error message suggests, committing (or rolling back) the transaction, andrestarting it, cures the problem.

错误信息表明,提交(或回滚)事务并重启,能够解决此问题。

 

InnoDBPlugin 1.0.2 introduces some improvements in error handling when users attemptto drop indexes. See sectionSection 8.7, “Better Error Handling when Dropping Indexes”for details.

InnoDB Plugin 1.0.2 在用户试图删除索引时产生的错误的处理方法上作出了一些改进。详情参考第8.7章节“删除索引时产生的错误及其更好的处理方法”。

 

MySQL5.1 does not support efficient creation or dropping of FOREIGN KEY constraints.Therefore, if you use ALTER TABLE to add or remove a REFERENCES constraint, thechild table will be copied, rather than using “Fast Index Creation”.

MySQL 5.1不支持有效的创建或删除FOREIGN KEY约束。因此,如果使用ALTER TABLE添加或移除REFERENCES约束,子表需要复制数据,而不使用“快速创建索引”。

 



原创粉丝点击