Wait Events: Log File Sync

Source: Internet. Editor: 程序博客网. Date: 2024/04/24 14:18

log file parallel write

The log file parallel write wait event has three parameters: files, blocks, and requests. In Oracle Database 10g, this wait event falls under the System I/O wait class. Keep the following key thoughts in mind when dealing with the log file parallel write wait event.

The log file parallel write event belongs only to the LGWR process.

A slow LGWR can increase the commit time of foreground processes.

Significant log file parallel write wait time is most likely an I/O issue.

log file sync

The log file sync wait event has one parameter: buffer#. In Oracle Database 10g, this wait event falls under the Commit wait class. Keep the following key thoughts in mind when dealing with the log file sync wait event.

The log file sync wait event is related to transaction terminations (commits or rollbacks).

When a process spends a lot of time on the log file sync event, it is usually indicative of too many commits or short transactions.

log file switch (checkpoint incomplete)

The log file switch (checkpoint incomplete) wait event has no wait parameters. In Oracle Database 10g, this wait event falls under the Configuration wait class. Keep the following key thought in mind when dealing with the log file switch (checkpoint incomplete) wait event.

Excessive log switches caused by small log files and a high transaction rate.
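The key thought above can be checked by counting log switches per hour. The following is a minimal sketch, assuming the switch timestamps have already been fetched (for example from the FIRST_TIME column of v$log_history) into a Python list. The function names and the limit of 6 switches per hour are illustrative assumptions, not Oracle recommendations.

```python
from collections import Counter
from datetime import datetime


def switches_per_hour(first_times):
    """Count log switches per clock hour from a list of datetime values."""
    counts = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in first_times
    )
    return dict(sorted(counts.items()))


def busy_hours(first_times, max_per_hour=6):
    """Return the hours whose switch count exceeds max_per_hour.

    max_per_hour is an assumed rule-of-thumb threshold; tune it per system.
    """
    per_hour = switches_per_hour(first_times)
    return [hour for hour, n in per_hour.items() if n > max_per_hour]
```

Hours returned by busy_hours are candidates for a closer look at redo log file size and transaction rate.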

The log file sync wait event mainly means that a server process is waiting for all of its redo to be written to the redo log file.

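LGWR writes the redo log files sequentially, but it writes to the multiple redo log files of one group "simultaneously" by using asynchronous I/O. The log file parallel write wait event should therefore appear only when a group contains more than one redo log file. (Thanks to the reply in post #3 for the tip.)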

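When log file sync waits appear, the cause may be that the application commits too often instead of committing in batches (many waits, but a very short average wait time). It may also be an I/O problem, in which case log file parallel write waits should appear as well.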
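The batch-commit idea can be sketched as follows. This is only an illustration: an in-memory sqlite3 database stands in for a real Oracle connection, and the table t, the function names, and the batch sizes are all hypothetical. The point is that the number of commits, and so the number of times a session would wait on log file sync, drops as the batch size grows.

```python
import sqlite3


def insert_rows(conn, rows, batch_size):
    """Insert rows, committing once per batch_size rows; return the commit count."""
    commits = 0
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO t (id, payload) VALUES (?, ?)", row)
        if i % batch_size == 0:
            conn.commit()
            commits += 1
    if len(rows) % batch_size != 0:
        conn.commit()  # flush the final partial batch
        commits += 1
    return commits


def demo(batch_size, n=1000):
    """Load n rows into a scratch table and report (commits, rows stored)."""
    conn = sqlite3.connect(":memory:")  # stand-in for an Oracle connection
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
    rows = [(i, "x") for i in range(n)]
    commits = insert_rows(conn, rows, batch_size)
    stored = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return commits, stored
```

With 1000 rows, demo(1) issues 1000 commits while demo(100) issues 10. Whether batching is acceptable depends on the application's transaction semantics.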

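1. Log file sync is the time from the start of a commit to its end. Log file parallel write is the time from when LGWR starts writing the redo file until that write ends. It follows that log file sync includes log file parallel write. So whenever log file sync waits show up, look at log file parallel write first. If the average log file sync wait (which can also be called the commit response time) is 20 ms and log file parallel write is 19 ms, the problem is obvious: slow redo file I/O is dragging down commits.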
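The comparison in point 1 can be sketched with a little arithmetic. The sketch assumes two snapshots of each event taken from v$system_event, each reduced to a (TOTAL_WAITS, TIME_WAITED_MICRO) pair. The function names are hypothetical.

```python
def avg_wait_ms(begin, end):
    """Average wait in ms between two (total_waits, time_waited_micro) snapshots."""
    waits = end[0] - begin[0]
    micros = end[1] - begin[1]
    if waits <= 0:
        return 0.0
    return micros / waits / 1000.0


def lgwr_share(sync_ms, write_ms):
    """Fraction of the commit wait explained by the LGWR write, capped at 1.0."""
    if sync_ms <= 0:
        return 0.0
    return min(write_ms / sync_ms, 1.0)


# Figures from the text: 20 ms log file sync, 19 ms log file parallel write.
sync_ms = avg_wait_ms((1000, 5_000_000), (2000, 25_000_000))
write_ms = avg_wait_ms((900, 4_000_000), (1900, 23_000_000))
share = lgwr_share(sync_ms, write_ms)
```

A share close to 1.0, as with the 20 ms and 19 ms figures, points at redo file I/O. A small share points away from it.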

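2. Log file sync covers more than log file parallel write. The server process starts the commit and notifies LGWR to write redo; LGWR finishes writing and notifies the process that the commit is complete. These notifications consume CPU. Apart from them, a commit also performs work such as advancing the SCN. If log file sync and log file parallel write differ greatly, I/O is not the problem. CPU may be scarce instead, so that the notifications between the process and LGWR, or other CPU-bound operations, cannot get enough CPU and are delayed.

In this case, check CPU utilization and load. If both are high, CPU is what is lengthening the log file sync response time. The database usually shows side effects as well: latch and mutex contention, for example, becomes worse than usual, because contention intensifies when CPU is tight.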

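3. Log file sync and log file parallel write differ greatly, yet CPU utilization is not high. This is less common and falls into the hard-to-diagnose category. I/O is fast, CPU is plentiful, and the log file parallel write response time is short, but the log file sync response time is long. This is the hardest case to pin down. Compare the redo-related statistics (in v$sysstat) and the changes in redo-related latches across the board.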
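The three cases above can be summarized as a small decision function. This is a rough sketch: the function name, the return strings, and both thresholds (80% of commit time spent in the LGWR write, 85% CPU utilization) are assumptions chosen for illustration and do not come from Oracle.

```python
def triage(sync_ms, write_ms, cpu_util,
           io_share_threshold=0.8, cpu_threshold=0.85):
    """Classify a log file sync problem into one of the three cases.

    sync_ms   average log file sync wait, in ms
    write_ms  average log file parallel write wait, in ms
    cpu_util  host CPU utilization as a fraction between 0 and 1
    Both thresholds are illustrative assumptions.
    """
    share = write_ms / sync_ms if sync_ms > 0 else 0.0
    if share >= io_share_threshold:
        return "case 1: redo file I/O"
    if cpu_util >= cpu_threshold:
        return "case 2: CPU shortage"
    return "case 3: compare redo statistics and redo latches"
```

For example, triage(20, 19, 0.3) lands in case 1, while triage(20, 2, 0.95) lands in case 2.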

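For example, look at the average response time of redo synch time. Not every redo synch time involves a commit, but every commit involves redo synch time. If redo synch time responds quickly while log file sync is slow, the problem lies in the notification phase between LGWR and the process. Then there is redo entries: the real meaning of this count is the number of times processes write redo into the log buffer. Also keep an eye on the redo log space wait time and redo log space requests statistics and on the log buffer space wait event. The size of the log buffer usually does not affect log file sync, but changes in log buffer activity show how the redo volume is changing.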
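A comparison of these statistics between two points in time might look like the sketch below. It assumes each snapshot is a dict mapping v$sysstat statistic names to values, and that redo synch time is reported in centiseconds; verify the unit for your Oracle version. The function name and the output keys are hypothetical.

```python
REDO_STATS = (
    "redo synch time",
    "redo synch writes",
    "redo entries",
    "redo size",
    "user commits",
    "redo log space requests",
    "redo log space wait time",
)


def redo_summary(begin, end):
    """Summarize redo activity between two v$sysstat snapshots (dicts)."""
    delta = {name: end.get(name, 0) - begin.get(name, 0) for name in REDO_STATS}

    def per(num, den):
        # Guard against an idle interval with a zero denominator.
        return delta[num] / delta[den] if delta[den] > 0 else 0.0

    return {
        # Assumption: redo synch time is in centiseconds, so multiply by 10 for ms.
        "avg_redo_synch_ms": per("redo synch time", "redo synch writes") * 10.0,
        "redo_bytes_per_commit": per("redo size", "user commits"),
        "redo_entries_per_commit": per("redo entries", "user commits"),
        "log_space_requests": delta["redo log space requests"],
    }
```

Comparing avg_redo_synch_ms with the average log file sync wait over the same interval is one way to apply the check described above.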

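On the effect of the log buffer on log file sync:

Under the newer IMU (in-memory undo) mechanism, redo data is first held in the shared pool and is copied to the log buffer at commit time. If a wait occurs at that point, it is log buffer space. Only the step from the log buffer to disk is reported as log file sync.

The old mechanism works the same way: waits before the log buffer are log buffer space, and waits after the log buffer are log file sync.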

4. Control file I/O can affect log file sync.

I have not yet had time to study this in depth; I only observed it previously in databases at Alibaba.

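5. Log file sync and buffer busy waits.

The two are not directly related. Some other cause, such as redo-related latches, makes log file sync and buffer busy waits appear at the same time. In that case neither of them is the culprit; the real culprit is degraded log buffer access performance.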

6. Shipping redo to a remote Data Guard standby in synchronous mode can also cause log file sync waits.