Metadata Locking在5.5里所做改动带来的影响

来源:互联网 发布:2016年中国出口数据 编辑:程序博客网 时间:2024/05/16 05:13

转自天堂:http://www.cnblogs.com/zuoxingyu/archive/2013/03/15/2960862.html

原文地址:http://www.mysqlperformanceblog.com/2013/02/01/implications-of-metadata-locking-changes-in-mysql-5-5/

While most of the talk recently has mostly been around the new changes in MySQL 5.6 (and that is understandable), I have had lately some very interesting cases to deal with, with respect to the Metadata Locking related changes that were introduced in MySQL 5.5.3. It appears that the implications of Metadata Locking have not been covered well, and since there are still a large number of MySQL 5.0 and 5.1 installations that would upgrade or are in the process of upgrading to MySQL 5.5, I thought it necessary to discuss what these implications exactly are.

我们最近讨论的热点围绕在MySQL5.6的新改动(这也是可以理解的),我最近有一些非常有趣的案例涉及到从5.5.3开始引进的Metadata Locking相关的改动。貌似Metadata Locking带来的问题并没有处理得很好。由于有很大一部分的5.0和5.1的将会升级到5.5或者正在升级中,我想有必要讨论一下这些影响到底是什么。

To read what Metadata Locking exactly is please read this section here in the MySQL manual.

有关Metadata Locking信息请看 MySQL manual.

Let’s start off with having a look at the Meta Data Locking behavior prior to MySQL 5.5.3 

我们先看看在5.5.3之前Meta Data Locking是怎么工作的

Metadata Locking behavior prior to MySQL 5.5.3   5.5.3之前版本是如何处理MetaData Locking的

Prior to MySQL 5.5.3 a statement that opened a table only held meta data locks till the end of the statement and not the end of the transaction. This meant that transaction was not really isolated, because the same query could return different results if executed twice and if a DDL was executed between the query invocations. Let me give you an example via a simple test case where I will add a new column to the table while a transaction in REPEATABLE-READ isolation mode is ACTIVE.

在5.5.3之前的版本里,一个语句打开一个表后,会一直持有Metadata Lock,直到这个语句结束,而不会持有到事务结束。意思就是说,事务不是真正意义上的隔离,因为如果在两次相同的查询之间做了一个DDL的操作,那么相同的查询将会返回不同的结果。我们看一个简单的例子,在这个例子里我将增加一列。事务隔离级别是REPEATABLE-READ

复制代码
session1 > select @@tx_isolation;+-----------------+| @@tx_isolation  |+-----------------+| REPEATABLE-READ |+-----------------+1 row in set (0.00 sec)session1 > start transaction;Query OK, 0 rows affected (0.00 sec)session1 > select * from test where id=1;+----+------+| id | x    |+----+------+|  1 | foo  |+----+------+1 row in set (0.00 sec)session2 > ALTER TABLE test add column c char(32) default 'dummy_text';Query OK, 2 rows affected (0.57 sec)Records: 2  Duplicates: 0  Warnings: 0session1 > select * from test where id=1;Empty set (0.00 sec)session1 > rollback;Query OK, 0 rows affected (0.00 sec)session1 > start transaction;Query OK, 0 rows affected (0.00 sec)session1 > select * from test where id=1;+----+------+------------+| id | x    | c          |+----+------+------------+|  1 | foo  | dummy_text |+----+------+------------+1 row in set (0.00 sec)
复制代码

And you can see how isolation is broken because the SELECT was not repeatable although transaction isolation level of REPEATABLE-READ was used. This behavior of versions prior to 5.5 also means that queries could be written in different order to the binary log breaking locking semantics and contrary to serialization concepts. For example take a look at the following excerpt from the binary log of a case when an UPDATE transaction is mixed with an ALTER:

你可以看到事务隔离是怎么样被破坏的---尽管事务隔离级别是REPEATABLE-READ,还是不能重复SELECT。5.5版本之前的这个表现意味着,SQL可以以不同的顺序写入到binlog文件,这违反了锁定义以及串行化的概念。看看下面的例子,来自一个UPDATE事务和ALTER混合的二进制日志文件。

复制代码
# at 536#130201 11:21:03 server id 1  end_log_pos 658   Query   thread_id=8     exec_time=0     error_code=0SET TIMESTAMP=1359714063/*!*/;ALTER TABLE test add column id_2 int(11) default 0 after id/*!*/;# at 658#130201 11:21:39 server id 1  end_log_pos 726   Query   thread_id=7     exec_time=0     error_code=0SET TIMESTAMP=1359714099/*!*/;BEGIN/*!*/;# at 726# at 773#130201 11:21:35 server id 1  end_log_pos 773   Table_map: `test`.`test` mapped to number 17#130201 11:21:35 server id 1  end_log_pos 829   Update_rows: table id 17 flags: STMT_END_FBINLOG 'L5cLURMBAAAALwAAAAUDAAAAABEAAAAAAAEABHRlc3QABHRlc3QAAwMD/gL+CQY=L5cLURgBAAAAOAAAAD0DAAAAABEAAAAAAAEAA///+AIAAAAAAAAAA2JhcvgCAAAAAAAAAANob3A='/*!*/;### UPDATE test.test### WHERE###   @1=2 /* INT meta=0 nullable=0 is_null=0 */###   @2=0 /* INT meta=0 nullable=1 is_null=0 */###   @3='bar' /* STRING(9) meta=65033 nullable=1 is_null=0 */### SET###   @1=2 /* INT meta=0 nullable=0 is_null=0 */###   @2=0 /* INT meta=0 nullable=1 is_null=0 */###   @3='hop' /* STRING(9) meta=65033 nullable=1 is_null=0 */# at 829#130201 11:21:39 server id 1  end_log_pos 856   Xid = 85COMMIT/*!*/;
复制代码

Note how ALTER is logged before the UPDATE, because ALTER did not block waiting for the transaction to commit.

关注下ALTER是怎么被记录到UPDATE之前的,原因在于ALTER根本不需要等待事务的提交

For the reasons described above the implementation of Metadata Locking was changed, starting MySQL 5.5.3. Let’s see how this works now.

上面描述的Metadata Locking的实现已经从5.5.3发生了改变,我们看看它是如何工作的

Metadata Locking behavior starting MySQL 5.5.3  5.5.3之后版本是如何处理的Metadata Locking

Starting with 5.5.3 DDL statements that modify the table metadata are executed in an isolated fashion consistent with transactional behavior. This means that any open transaction will hold metadata locks on the table it has accessed for as long as the transaction is open. Since an open transaction retains metadata locks on all tables that were opened by the transaction, hence any DDL operation cannot commence till all the transactions that accessed that table are open. Let’s see this in affect via a simple test case:

从5.5.3开始DDL语句以一个隔离的事务行为方式执行元数据的修改。也就是说,任何已经开始的事务将一直持有表的元数据锁直到事务提交。由于开始的事务会持有事务关联的所有表的元数据锁,所以任何DDL操作在前面的事务提交前是不能够执行的。我们通过一个简单的例子看看它的影响。

复制代码
session1 > start transaction;Query OK, 0 rows affected (0.00 sec)session1 > select * from test order by id;+----+------+| id | x    |+----+------+|  1 | foo  ||  2 | bar  |+----+------+2 rows in set (0.00 sec)session2 > ALTER TABLE test add column c char(32) default 'dummy_text';session3 > show processlist;+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+| Id | User     | Host      | db   | Command | Time | State                           | Info                                                        |+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+|  1 | msandbox | localhost | test | Sleep   |  140 |                                 | NULL                                                        ||  2 | msandbox | localhost | test | Query   |    3 | Waiting for table metadata lock | ALTER TABLE test add column c char(32) default 'dummy_text' ||  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist                                            |+----+----------+-----------+------+---------+------+---------------------------------+-------------------------------------------------------------+3 rows in set (0.00 sec)
复制代码

You can see how the ALTER blocks, because the transaction in session1 is still open and once the transaction in session1 is closed, the ALTER proceeds through successfully:

ALTER操作被阻塞了。原因在于session1里的事务还未提交,在session1的事务关闭之后,ALTER才能够执行成功

session1 > rollback;Query OK, 0 rows affected (0.00 sec)session2 > ALTER TABLE test add column c char(32) default 'dummy_text';Query OK, 2 rows affected (46.77 sec)Records: 2  Duplicates: 0  Warnings: 0

Let’s see where the ALTER spent most of its time:   看看ALTER在哪个步骤上花了很多时间

复制代码
session2 > show profiles;+----------+-------------+-------------------------------------------------------------+| Query_ID | Duration    | Query                                                       |+----------+-------------+-------------------------------------------------------------+|        1 | 46.78110075 | ALTER TABLE test add column c char(32) default 'dummy_text' |+----------+-------------+-------------------------------------------------------------+1 row in set (0.00 sec)session2 > show profile for query 1;+------------------------------+-----------+| Status                       | Duration  |+------------------------------+-----------+| starting                     |  0.000060 || checking permissions         |  0.000003 || checking permissions         |  0.000003 || init                         |  0.000005 || Opening tables               |  0.000045 || System lock                  |  0.000006 || setup                        |  0.000016 || creating table               |  0.168283 || After create                 |  0.000061 || copy to tmp table            |  0.165808 || rename result table          | 46.446738 || end                          |  0.000035 || Waiting for query cache lock |  0.000003 || end                          |  0.000006 || query end                    |  0.000003 || closing tables               |  0.000008 || freeing items                |  0.000016 || cleaning up                  |  0.000004 |+------------------------------+-----------+18 rows in set (0.00 sec)
复制代码

So the ALTER waited on the meta data locks just after the table with the new structure had been created and populated with data but before the old table was swapped with the new one. Note that ALTER is a multi-step process, the old table is locked in shared mode and then something similar to the following steps are taken: a new table with the new structure is created and then INSERT INTO new_table SELECT * FROM old_table is done and then RENAME old_table to tmp_table, new_table to old_table and finally DROP tmp_table.
Let’s see another example, this time trying a RENAME:

 ALTER仅仅只在新表结构创建好,导入数据后,做交换新旧表的时候,才开始锁等待。注意到ALTER是一个多步操作,旧表加共享锁,然后执行一些类似下面的操作:创建一个新结构的表,从旧表导入数据,然后重命名旧表为临时表,然后把新表重命名为旧表,最后删除临时表。我们看看另外一个例子,这次试试RENAME

复制代码
session2 > RENAME TABLE test to test_2;session3 > show processlist;+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+| Id | User     | Host      | db   | Command | Time | State                           | Info                        |+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+|  1 | msandbox | localhost | test | Sleep   |   49 |                                 | NULL                        ||  2 | msandbox | localhost | test | Query   |   35 | Waiting for table metadata lock | RENAME TABLE test to test_2 ||  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist            |+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+3 rows in set (0.00 sec)
复制代码

And you can see that the RENAME is also blocked, because a transaction that accessed the table “test” is still open.

RENAME也被阻塞了,原因在于事务访问的表test还没关闭

So we have an interesting conclusion here that the ALTER waits only at the last stages when its making changes to the table metadata, a table ALTER that alters a big table can keep executing without any hindrance, copying rows from the table with the old structure to the table with the new structure and will only wait at the last step when its about to make changes to table metadata.

Let’s see another interesting side-affect of metadata locking.

那么在这里我们得到一个很有趣的结论:ALTER只在修改表元数据的时候,只会在最后阶段才会等待锁释放,对一个大表的ALTER操作可以顺畅的执行,从旧表拷贝数据到新表,然后只会在最后一步修改表元数据的时候,才会有锁等待。我们看看另外一个有趣的metadata locking侧面影响

Now there is another interesting side-affect, and that is that when the ALTER comes at the state where it needs to wait for metadata locks, at that point the ALTER simply blocks any type of queries to the table, we know that writes would be blocked anyhow for the entire duration of the ALTER, but reads would be blocked as well at the time when the ALTER is waiting for metadata locks. Let’s see this in action via another test case:

另外一个有趣的侧面影响就是,当做一个ALTER的时候,它需要等待metadata locks,在那个阶段,ALTER简单的阻塞在这个表上的所有类型的操作。我们知道在ALTER整个过程中,写操作肯定是会阻塞的,但是在等待metadata locks的时间里,读操作也阻塞了。我们通过另外一个例子来看看:

复制代码
session1 > start transaction;Query OK, 0 rows affected (0.00 sec)session1 > select * from test_2 order by id;+----+------+------------+| id | x    | c          |+----+------+------------+|  1 | foo  | dummy_text ||  2 | bar  | dummy_text |+----+------+------------+2 rows in set (0.00 sec)session6 > ALTER TABLE test_2 DROP COLUMN c;session7 > select * from test_2 order by id;session8 > select * from test_2 order by id;session9 > select * from test_2 order by id;session10 > select * from test_2 order by id;session3 > show processlist;+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+| Id | User     | Host      | db   | Command | Time | State                           | Info                             |+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+|  1 | msandbox | localhost | test | Sleep   |  403 |                                 | NULL                             ||  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist                 ||  6 | msandbox | localhost | test | Query   |  229 | Waiting for table metadata lock | ALTER TABLE test_2 DROP COLUMN c ||  7 | msandbox | localhost | test | Query   |  195 | Waiting for table metadata lock | select * from test_2 order by id ||  8 | msandbox | localhost | test | Query   |  180 | Waiting for table metadata lock | select * from test_2 order by id ||  9 | msandbox | localhost | test | Query   |  169 | Waiting for table metadata lock | select * from test_2 order by id || 10 | msandbox | localhost | test | Query   |   55 | Waiting for table metadata lock | select * from test_2 order by id |+----+----------+-----------+------+---------+------+---------------------------------+----------------------------------+7 rows in set (0.00 sec)
复制代码

And you can see that the table is blocked for any kind of operation. Let’s see the profiling information for one of the queries that was blocked to see where the query spent most of its time:

可以看到这个表上的任何操作都阻塞了。我们看看profiling信息,看看那个被阻塞的操作,在什么地方消耗了最多的时间

复制代码
session10 > show profile for query 1;+----------------------+------------+| Status               | Duration   |+----------------------+------------+| starting             |   0.000058 || checking permissions |   0.000006 || Opening tables       | 213.028481 || System lock          |   0.000009 || init                 |   0.000014 || optimizing           |   0.000002 || statistics           |   0.000005 || preparing            |   0.000006 || executing            |   0.000001 || Sorting result       |   0.000002 || Sending data         |   0.000040 || end                  |   0.000003 || query end            |   0.000002 || closing tables       |   0.000003 || freeing items        |   0.000007 || logging slow query   |   0.000002 || cleaning up          |   0.000002 |+----------------------+------------+17 rows in set (0.00 sec)
复制代码

And you can see how the query spent nearly all its time waiting in the “Opening tables” state. Now this behavior with respect to ALTER making the table inaccessible in some cases is not really documented and as such I have reported a bug:http://bugs.mysql.com/bug.php?id=67647

Metadata locking behaves differently for queries that are serviced from the Query Cache, let’s see what happens in that case.

可以看到,这个操作花费的时间,基本上都在“Opening tables”上了。这种做法使得ALTERT操作的表在某些场景下变得不可访问了,官方文档里也确实没有标注出来。

Metadata Locking and Query Cache 

How does metadata locking behave with query_cache? That is an important question. If Query Cache is enabled and the SELECT can be serviced from the Query Cache then the SELECT will not block on the ALTER even though the ALTER is waiting for meta data locks. Why? Because in such a case no table open operation has to be performed. Let’s see this scenario via a test case:

在有query_cache的时候,metadata locking又是什么表现呢?这是个有趣的问题。如果query cache启用了,select 可以直接从query cache返回数据,那么即使ALTER还在等待meta data 锁,select 还是不会被阻塞。为什么会这样呢?原因在于,在这个场景下,不需要有打开表的操作。我们通过一个例子来看看

复制代码
session1 > start transaction;Query OK, 0 rows affected (0.00 sec)session1 > select * from test_2 order by id;+----+------+| id | x    |+----+------+|  1 | foo  ||  2 | bar  |+----+------+2 rows in set (0.00 sec)session6 > RENAME TABLE test_2 to test;session10 > select * from test_2 order by id;+----+------+| id | x    |+----+------+|  1 | foo  ||  2 | bar  |+----+------+2 rows in set (0.00 sec)session3 > show processlist;+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+| Id | User     | Host      | db   | Command | Time | State                           | Info                        |+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+|  1 | msandbox | localhost | test | Sleep   |   22 |                                 | NULL                        ||  3 | msandbox | localhost | test | Query   |    0 | NULL                            | show processlist            ||  6 | msandbox | localhost | test | Query   |    3 | Waiting for table metadata lock | RENAME TABLE test_2 to test || 10 | msandbox | localhost | test | Sleep   |   37 |                                 | NULL                        |+----+----------+-----------+------+---------+------+---------------------------------+-----------------------------+4 rows in set (0.00 sec)
复制代码

The query proceeds without being blocked on anything while the RENAME is still waiting for metadata locks. Let’s see the profiling information for this query:

这个查询在RENAME还在等待metadata locks的时候,就顺利执行完了。我们看看这个SQL的profiling信息

复制代码
session10 > show profile for query 11;+--------------------------------+----------+| Status                         | Duration |+--------------------------------+----------+| starting                       | 0.000018 || Waiting for query cache lock   | 0.000003 || checking query cache for query | 0.000007 || checking privileges on cached  | 0.000003 || checking permissions           | 0.000005 || sending cached result to clien | 0.000011 || logging slow query             | 0.000002 || cleaning up                    | 0.000002 |+--------------------------------+----------+8 rows in set (0.00 sec)
复制代码

You can see that no table open operation was performed and hence no wait.

可以看到,这里没有打开表的操作,所以也就没有锁等待了。

Does the fact that the table has already been opened and table object is in the table_cache change anything with respect to metadata locks.

从根源上来说,这个表已经打开了,表对象存在于table_cache,并没做任何metadata的修改操作。

Metadata Locking and Table Cache

No matter if a connection accesses a table that is already in the Table Cache, any query to a table that has a DDL operation waiting, will block. Why? Because MySQL sees that the old entries in the Table Cache have to be invalidated, and any query that accesses the table will have to reopen the modified table and there will be new entries in the Table Cache. Let’s see this phenomenon in action:

不管被访问的表是否已经存在于Table Cache,别的对正在做DDL操作并且在等待中的表操作都被阻塞了,这是为什么?原因在于MYSQL看到存在于Table Cache里的数据已经过期了,访问这个表的查询需要重新打开表,然后新数据加载到Cache里。我们看看这个现象

复制代码
session6 > ALTER TABLE test add column c char(32) default 'dummy_text';Query OK, 2 rows affected (59.80 sec)session10 > show status like 'Open%tables';+---------------+-------+| Variable_name | Value |+---------------+-------+| Open_tables   | 30    || Opened_tables | 0     |+---------------+-------+2 rows in set (0.00 sec)session10 > select * from test order by id;+----+------+------------+| id | x    | c          |+----+------+------------+|  1 | foo  | dummy_text ||  2 | bar  | dummy_text |+----+------+------------+2 rows in set (53.78 sec)session10 > show status like 'Open%tables';+---------------+-------+| Variable_name | Value |+---------------+-------+| Open_tables   | 30    || Opened_tables | 1     |+---------------+-------+2 rows in set (0.00 sec)session10 > show profile for query 18;+----------------------+-----------+| Status               | Duration  |+----------------------+-----------+| starting             |  0.000059 || checking permissions |  0.000010 || Opening tables       | 53.786685 || System lock          |  0.000009 || init                 |  0.000012 || optimizing           |  0.000003 || statistics           |  0.000007 || preparing            |  0.000006 || executing            |  0.000001 || Sorting result       |  0.000004 || Sending data         |  0.000033 || end                  |  0.000003 || query end            |  0.000002 || closing tables       |  0.000004 || freeing items        |  0.000009 || logging slow query   |  0.000002 || cleaning up          |  0.000002 |+----------------------+-----------+17 rows in set (0.00 sec)
复制代码

As you can see that the SELECT query still blocks, and the status counter Opened_tables is also incremented once the query finishes.

So much for the explanation, now let’s take a look at the consequences.

正如看到的那样,SELECT还是被阻塞了,每次访问后Opened_tables计数器都加1。说了这么多,下面做一个总结

Consequences   结论

The consequences of these changes in metadata locking is that, if you have some really hot tables, for example in web applications its typical to see a “sessions” table that is accessed on every request, then care should be taken when you have to ALTER the table otherwise it can easily cause a stall as many threads can get piled up waiting for table metadata lock bringing down the MySQL server or causing all the connections to get depleted.

metadata locking新的改动带来的影响就是:如果你有一个热点数据表,比方说在web应用里,典型的"sessions"表,每个请求都需要访问它,如果必须对这个表做ALTER操作,那么你就得非常小心了,因为它很容易导致很多进程在等待metadata locking,然后把服务器搞挂了,或者把连击数全部消耗完。

There are some other interesting consequences as well for application that use MySQL versions prior to 5.5:

这里还有另外一些在使用5.5版本之前的很有趣的结果

  • I remember a customer case where there is a reporting slave that daily runs a long running transaction, this transactions tends to run for hours. Now everyday one of the tables was renamed and swapped and that table was the one that is read from by the long running transaction. As the slave tried to execute the rename query it would simply block waiting for the long running transaction to finish, this would cause the slave to lag for hours waiting for the transaction to be completed, as you know that the slave is single-threaded so it cannot really apply any other event. This was never an issue when the application was using MySQL version < 5.5 as the datastore.
  • 我记得一个客户每天在从服务器上跑一个很长时间的查询去汇总数据,这个事务会跑好几个小时。现在开始其中的某个表每天都要重命名,然后被交换,而在那个费时很久的查询又会访问这个表。当slave在做重命名操作的时候,它会阻塞这个长查询的完成,这将导致salve滞后接小时去等待查询的完成。我们知道,slave是单线程的,所以它不能执行别的操作。在5.5之前的版本里不存在这个问题。
  • There was another interesting case this time with how Active MQ uses MySQL when in HA mode. In HA mode there are two Active MQ servers, both try to do something similar to the following sequence of events:
  • 另外一个有趣的案例就是,当使用 Active MQ做HA的时候,HA架构里有2个 Active MQ服务器,两个服务器都去做类型下面的操作。
复制代码
session1 > CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;Query OK, 0 rows affected (0.09 sec)session1 > insert into t1 values(null);Query OK, 1 row affected (0.21 sec)session1 > start transaction;Query OK, 0 rows affected (0.00 sec)session1 > select * from t1 where i=1 for update;+---+| i |+---+| 1 |+---+1 row in set (0.00 sec)session2 > CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;
复制代码

When using MySQL 5.1 the second CREATE would just fail immediately with the error “ERROR 1050 (42S01): Table ‘t1′ already exists”, but because of how meta data locking works in 5.5 this is no longer the case, the second CREATE will simply block with unintended consequences. A workaround here would be to set lock_wait_timeout variable to a very low value and then execute the CREATE TABLE, this will make sure that the CREATE fails immediately (however due to a different reason):

如果在5.1下运行,第二个CREATE会立刻返回一个“ERROR 1050 (42S01): Table ‘t1′ already exists”错误,但是由于5.5里的meta data locking,结果不再是这样的了。第二个CREATE会被意外阻塞。一个解决的办法就是把lock_wait_timeout变量得非常短,然后再执行CREATE TABLE操作,这能确保CREATE立刻返回失败(然而失败的原因不同)

session2 > set session lock_wait_timeout=1;CREATE TABLE t1(i int(11) not null auto_increment primary key) ENGINE=InnoDB;

However, I feel that the CREATE TABLE should fail in such a case when the table already exists and there is no other DDL like a DROP table waiting to run on the same table, and as such I have reported the bug: http://bugs.mysql.com/bug.php?id=67873