Innodb double write


https://www.percona.com/blog/2006/08/04/innodb-double-write/#more-72


One of the more interesting techniques Innodb uses is called “doublewrite”. It means Innodb will write data twice when it performs tablespace writes – writes to the log files are done only once.

So why is doublewrite needed? It is needed to achieve data safety in case of partial page writes. Innodb does not log full pages to the log files, but uses what is called “physiological” logging, which means log records contain the page number for the operation as well as the operation data (i.e. update the row) and log sequence information. Such a logging structure is great as it requires less data to be written to the log; however, it requires pages to be internally consistent. It does not matter which page version it is – it could be the “current” version, in which case Innodb will skip the page update operation, or the “former” one, in which case Innodb will perform the update. If the page is inconsistent, recovery can’t proceed.
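To make that version check concrete, here is a minimal Python sketch of the idea behind physiological log records and redo: a record names a page, carries the operation data and a log sequence number (LSN), and recovery re-applies it only if the page is still at a “former” version. The class and field names are illustrative assumptions, not Innodb’s actual structures.

    # Illustrative sketch only: not Innodb code.
    from dataclasses import dataclass, field
    from typing import Callable, Dict

    @dataclass
    class Page:
        page_no: int
        lsn: int                       # LSN of the last change already applied
        rows: Dict[int, str] = field(default_factory=dict)

    @dataclass
    class LogRecord:
        page_no: int
        lsn: int                       # LSN assigned to this change
        redo: Callable[[Page], None]   # the operation data, e.g. "update row 7"

    def apply_record(rec: LogRecord, page: Page) -> None:
        if page.lsn >= rec.lsn:
            return                     # "current" version: skip the page update
        rec.redo(page)                 # "former" version: perform the update
        page.lsn = rec.lsn

    # Usage: the update is applied once, then skipped on a second pass.
    page = Page(page_no=42, lsn=100)
    rec = LogRecord(page_no=42, lsn=101, redo=lambda p: p.rows.update({7: "new"}))
    apply_record(rec, page)            # page.lsn becomes 101
    apply_record(rec, page)            # no-op, page is already current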

Now let’s talk a bit about partial page writes – what they are and why they happen. A partial page write is when a page write request submitted to the OS completes only partially: for example, out of a 16K Innodb page only the first 4KB is updated and the other parts remain in their former state. Most typically partial page writes happen when a power failure happens. They can also happen on an OS crash – there is a chance the operating system will split your 16K write into several writes and the failure happens just between their execution. One reason for splitting could be file fragmentation – most file systems use 4K block sizes by default, so a 16K page could span more than one fragment. Also, if software RAID is used, the page may fall on a stripe border, requiring multiple IO requests. The same happens with hardware RAID on power failure if it does not have a battery-backed cache. If a single write is issued to the disk itself it should in theory be completed even if power goes down, as there should be enough power accumulated inside the drive to complete it. I honestly do not know if this is always the case – it is hard to check, as it is not the only reason for partial page writes. I just know they tend to happen, and before Innodb doublewrite was implemented I had a couple of data corruptions due to it.
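The sketch below simulates a torn 16K page written in 4K file-system blocks and shows how an internal consistency check detects it. The whole-page checksum layout here is an assumption for illustration only; Innodb’s real page format and checksums are more involved.

    # Illustrative sketch: detect a torn 16K page with a whole-page checksum.
    import zlib

    PAGE_SIZE = 16 * 1024
    FS_BLOCK = 4 * 1024                # typical file-system block size

    def make_page(payload: bytes) -> bytes:
        body = payload.ljust(PAGE_SIZE - 4, b"\x00")
        return body + zlib.crc32(body).to_bytes(4, "big")

    def page_is_consistent(page: bytes) -> bool:
        body, stored = page[:-4], page[-4:]
        return zlib.crc32(body).to_bytes(4, "big") == stored

    old_page = make_page(b"old row data")
    new_page = make_page(b"new row data")

    # Power fails after only the first 4K block of the new page reaches disk;
    # the remaining blocks still contain the old page.
    torn = new_page[:FS_BLOCK] + old_page[FS_BLOCK:]

    print(page_is_consistent(new_page))   # True
    print(page_is_consistent(torn))       # False: recovery cannot use this page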

So how does doublewrite work? You can think of it as one more short-term log file allocated inside the Innodb tablespace – it contains space for 100 pages. When Innodb flushes pages from the Innodb buffer pool it does so multiple pages at a time. So several pages are written to the doublewrite buffer (sequentially), fsync() is called to ensure they make it to the disk, then the pages are written to their real locations and fsync() is called a second time. Now on recovery Innodb checks the doublewrite buffer contents against the pages in their original locations. If a page is inconsistent in the doublewrite buffer it is simply discarded; if it is inconsistent in the tablespace it is recovered from the doublewrite buffer.
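Here is a rough Python sketch of that flush-and-recover sequence. The file layout, the consistency check and the helper callbacks are simplified assumptions meant to show the ordering of writes and fsync() calls, not Innodb’s actual implementation.

    # Illustrative sketch of the doublewrite flush and recovery ordering.
    import os
    from typing import Callable, Dict

    PAGE_SIZE = 16 * 1024

    def flush_batch(dblwr_fd: int, data_fd: int, pages: Dict[int, bytes]) -> None:
        # 1) Write the whole batch sequentially into the doublewrite area.
        os.lseek(dblwr_fd, 0, os.SEEK_SET)
        for page in pages.values():
            os.write(dblwr_fd, page)
        os.fsync(dblwr_fd)                       # first fsync: copies are durable

        # 2) Only then write each page to its real location in the tablespace.
        for page_no, page in pages.items():
            os.pwrite(data_fd, page, page_no * PAGE_SIZE)
        os.fsync(data_fd)                        # second fsync

    def recover(dblwr_pages: Dict[int, bytes],
                read_page: Callable[[int], bytes],
                write_page: Callable[[int, bytes], None],
                is_consistent: Callable[[bytes], bool]) -> None:
        for page_no, copy in dblwr_pages.items():
            if not is_consistent(copy):
                continue                         # torn copy in the buffer: discard it
            if not is_consistent(read_page(page_no)):
                write_page(page_no, copy)        # torn page in tablespace: restore it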

How much does the doublewrite buffer affect MySQL performance? Even though doublewrite requires each page to be written twice, its overhead is far less than double. The write to the doublewrite buffer is sequential, so it is pretty cheap. It also allows Innodb to save on fsync()s – instead of calling fsync() for each page write, Innodb submits multiple page writes and then calls fsync(), which allows the operating system to optimize the order in which writes are executed and to use multiple devices in parallel. This optimization could be used without doublewrite though; it was just implemented at the same time. So in general I would expect no more than a 5-10% performance loss due to the use of doublewrite.
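As a back-of-the-envelope illustration of the fsync() saving (hypothetical batch size, not a measured figure):

    # Hypothetical flush batch of 64 dirty pages.
    batch_pages = 64

    # Naive approach: one fsync() per page written to its final location.
    fsyncs_per_page = batch_pages            # 64 calls

    # Batched approach used with doublewrite: one fsync() after the sequential
    # doublewrite-buffer write and one after the tablespace writes.
    fsyncs_batched = 2

    print(fsyncs_per_page, fsyncs_batched)   # 64 vs 2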

Can you disable doublewrite? If you do not care about your data (i.e. slaves on RAID0), or if your file system guarantees you that no partial page writes can occur, you can disable doublewrite by setting innodb_doublewrite=0. It is, however, not worth the trouble in most cases.
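For reference, a minimal my.cnf fragment that turns doublewrite off could look like the following; this is a sketch only, and the accepted values and behaviour should be checked against your MySQL version.

    [mysqld]
    # Disable the doublewrite buffer. Only do this if you can afford to lose
    # the data or your storage stack guarantees atomic 16K page writes.
    innodb_doublewrite = 0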

http://wangyuanzju.blog.163.com/blog/static/1302920067710473482

Doublewrite is one of the interesting techniques used by Innodb: when Innodb performs tablespace data writes it writes the data twice (the log is written only once).

Why is doublewrite needed?
The goal is to guarantee data safety in case of a partial page write – that is, a failure that occurs while a data page is only half written. Innodb does not record entire data pages in the log; instead it uses a technique called “physiological” logging, where a log record contains only the page number, the operation performed on the data (such as updating one row) and the log sequence number. The advantage of this log structure is that it reduces the amount of data written to the log, but it also requires the page itself to remain internally consistent. The version of the page does not matter: the data page may be the current version (in which case Innodb will, during crash recovery, skip the update to the page) or an earlier version (in which case Innodb will perform the update). But if the data inside the page is inconsistent, recovery cannot proceed.

Partial page writes
What is a partial page write, and why does it happen? A partial page write is when a page write submitted to the operating system completes only partially – for example, only 4K of a 16K Innodb data page is updated while the rest still holds the old data. Most partial page writes occur on power failure; they can also occur when the operating system crashes, because the OS may split a 16K write into several write operations (file fragmentation can cause this) and the failure happens while these writes are being executed. When software RAID is used, a data page may happen to straddle a stripe boundary, again requiring multiple IO operations and therefore again risking a partial write. The same can happen on power failure when hardware RAID is used without a battery-backed cache. When only a single write is sent to the disk itself, in theory the hardware can complete the write even if power is lost, because the drive should have enough power stored internally to finish the operation. Frankly, I do not know whether that is always the case – it is hard to verify, and it is not the only cause of partial page writes. All I know is that they can happen, and before Innodb implemented doublewrite I ran into a number of cases of data corruption caused by them.

How does doublewrite work?
You can think of doublewrite as a short-term log file allocated inside the Innodb tablespace, with room for 100 data pages. When Innodb writes out pages from the buffer pool it writes several pages at a time: the pages are first written sequentially into the doublewrite buffer and fsync() is called to guarantee that this data reaches the disk; only then are the pages written out to their actual storage locations, and fsync() is called again. During crash recovery Innodb checks the contents of the doublewrite buffer against the pages at their original storage locations: if a page is in an inconsistent state in the doublewrite buffer it is simply discarded; if it is inconsistent at its original location it is restored from the doublewrite buffer.

How does the doublewrite buffer affect MySQL performance?
Although doublewrite requires every data page to be written twice, its performance overhead is far less than a factor of two. Writing to the doublewrite buffer is sequential, so the cost is small. Doublewrite also reduces the fsync() operations Innodb performs: instead of calling fsync() for every page written, it can submit several write operations and call fsync() once at the end, which lets the operating system optimize the order in which the writes are executed and use several storage devices in parallel. These optimizations could be used without doublewrite as well; in fact they were implemented at the same time as doublewrite. Overall, therefore, I expect the performance cost of using doublewrite to be no more than 5% to 10%.

Can doublewrite be disabled?
If you do not care about data consistency (for example on slaves using RAID0), or if your file system guarantees that partial page writes cannot occur, you can disable doublewrite by setting the innodb_doublewrite parameter to 0. In most cases, however, this is not worth the trouble.
