Latch and Lock in Oracle

Source: Internet  Editor: 程序博客网  Time: 2024/04/29 05:17

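Reposting an article.

Have you ever seen the wooden doors in rural villages? The lock on the outside of the door is the lock, and the bolt on the inside is the latch.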

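Differences between Latch and Lock:
1.   A latch is a mechanism that provides mutually exclusive access to in-memory data structures, whereas a lock acquires a shared resource object in one of several modes, and those modes are either compatible or incompatible with one another. It follows that latch access, even for a read, is exclusive: at any moment only one process can pin a given piece of memory. Fortunately this step is very brief, otherwise system performance could not be guaranteed. Starting with 9i, multiple processes are allowed to read the same memory block at the same time, but the performance is not as good as one might expect.
2.   A latch exists only in memory and can be accessed only by the current instance, whereas a lock applies to database objects, and in a RAC environment locks can be detected and accessed across instances.
3.   A latch is held and released almost instantly, whereas a lock is released only when the transaction ends properly, so how long it is held depends on the size of the transaction.
4.   A latch is not queued, whereas a lock is queued.
5.   Latches do not deadlock, whereas locks can (although deadlocks are very rare in Oracle).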

Latches and Internal Locks
Latches and internal locks protect internal database and memory structures. Both are inaccessible to users, because users have no need to control their occurrence or duration. The following section helps to interpret the Enterprise Manager LOCKS and LATCHES monitors.

Latches
Latches are simple, low-level serialization mechanisms to protect shared data structures in the system global area (SGA). For example, latches protect the list of users currently accessing the database and protect the data structures describing the blocks in the buffer cache. A server or background process acquires a latch for a very short time while manipulating or looking at one of these structures. The implementation of latches is operating system dependent, particularly in regard to whether and how long a process will wait for a latch.

Internal Locks
Internal locks are higher-level, more complex mechanisms than latches and serve a variety of purposes.

Dictionary Cache Locks
These locks are of very short duration and are held on entries in dictionary caches while the entries are being modified or used. They guarantee that statements being parsed do not see inconsistent object definitions.

Dictionary cache locks can be shared or exclusive. Shared locks are released when the parse is complete. Exclusive locks are released when the DDL operation is complete.

File and Log Management Locks
These locks protect various files. For example, one lock protects the control file so that only one process at a time can change it. Another lock coordinates the use and archiving of the redo log files. Datafiles are locked to ensure that multiple instances mount a database in shared mode or that one instance mounts it in exclusive mode. Because file and log locks indicate the status of files, these locks are necessarily held for a long time.

Tablespace and Rollback Segment Locks
These locks protect tablespaces and rollback segments. For example, all instances accessing a database must agree on whether a tablespace is online or offline. Rollback segments are locked so that only one instance can write to a segment.

Latches are low-level serialization mechanisms used to protect shared
data structures in the SGA. The implementation of latches is operating
system dependent, particularly in regard to whether a process will wait
for a latch and for how long. A latch is a type of lock that can be
very quickly acquired and freed. Latches are typically used to prevent
more than one process from executing the same piece of code at a
given time. Associated with each latch is a cleanup procedure that will
be called if a process dies while holding the latch. Latches have
an associated level that is used to prevent deadlocks. Once a
process acquires a latch at a certain level it cannot subsequently
acquire a latch at a level that is equal to or less than that level
(unless it acquires it nowait).
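
The level rule above can be illustrated with a small sketch. This is not Oracle code: the class and method names are hypothetical, and contention is not modelled, so the sketch only shows which requests the ordering rule would allow. Because a process may only wait for latches at a higher level than any it holds, two processes can never wait on each other in a cycle.

```python
class LatchLevelError(Exception):
    """Raised when a blocking request would violate the level ordering."""


class LatchSession:
    """Tracks the latch levels held by one hypothetical process."""

    def __init__(self):
        self.held_levels = []

    def acquire(self, name, level, nowait=False):
        # A willing-to-wait request must target a level strictly above
        # every level this process already holds.
        highest = max(self.held_levels, default=-1)
        if not nowait and level <= highest:
            raise LatchLevelError(
                f"cannot wait for {name!r} at level {level} "
                f"while holding level {highest}"
            )
        # Contention is not modelled, so any permitted request succeeds.
        self.held_levels.append(level)
        return True

    def release(self, level):
        self.held_levels.remove(level)
```

A nowait request is exempt from the rule because it never blocks, so it cannot take part in a wait cycle.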

Latches vs Enqueues:
Enqueues are another type of locking mechanism used in Oracle. An
enqueue is a more sophisticated mechanism which permits several
concurrent processes to have varying degrees of sharing of "known"
resources. Any object which can be used concurrently can be protected
with enqueues. A good example is locks on tables. We allow varying
levels of sharing on tables, e.g. two processes can lock a table in
share mode or in share update mode. One difference is that the
enqueue is obtained using an OS-specific locking mechanism. An enqueue
allows the user to store a value in the lock, i.e. the mode in which we
are requesting it. The OS lock manager keeps track of the resources
locked. If a process cannot be granted the lock because the mode
requested is incompatible with the mode held and the lock is requested
with wait, the OS puts the requesting process on a wait queue which is
serviced in FIFO order. Another difference between latches and enqueues
is that latches have no ordered queue of waiters as enqueues do.
Latch waiters may either use timers to wake up and retry or spin (only
on multiprocessors). Since all waiters are concurrently retrying
(depending on the scheduler), anyone might get the latch, and
conceivably the first one to try might be the last one to get it.
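
To make the contrast concrete, here is a minimal sketch of an enqueue-style lock with request modes and a FIFO wait queue. It is an illustration under simplifying assumptions, not Oracle's implementation: only two hypothetical modes ("S" for share, "X" for exclusive) are modelled, and the `Enqueue` class and its methods are invented names.

```python
from collections import deque

# Simplified compatibility matrix: only share/share can coexist.
COMPATIBLE = {("S", "S")}


class Enqueue:
    def __init__(self):
        self.holders = []        # list of (pid, mode) currently granted
        self.waiters = deque()   # FIFO queue of (pid, mode) requests

    def _compatible(self, mode):
        return all((held, mode) in COMPATIBLE for _, held in self.holders)

    def request(self, pid, mode):
        # Grant only if nobody is queued ahead and the mode is compatible;
        # otherwise join the back of the queue.
        if not self.waiters and self._compatible(mode):
            self.holders.append((pid, mode))
            return True
        self.waiters.append((pid, mode))
        return False

    def release(self, pid):
        self.holders = [(p, m) for p, m in self.holders if p != pid]
        granted = []
        # Serve waiters strictly in arrival order and stop at the first
        # one whose mode is still incompatible.
        while self.waiters and self._compatible(self.waiters[0][1]):
            waiter = self.waiters.popleft()
            self.holders.append(waiter)
            granted.append(waiter[0])
        return granted
```

In this sketch a share request that arrives after a queued exclusive request waits behind it, which is the ordering guarantee that latch waiters do not get.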

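When a process needs to access a data structure in the SGA, it must take a latch to lock the resource. The latch stays held while the process works on the data structure and is released when the operation completes. Each latch protects different data, and each latch has its own name.
Oracle uses an atomic "test and set" operation to take a latch and an atomic "set and free" operation to release it. When one process has obtained a latch and started its work, other processes wait for the latch to be released. For example, there is a latch for allocating space in the redo log, a latch for redo log copies, and a latch for archive control, each protecting a different redo log operation. Latches guarantee that shared information stays consistent under concurrent access. If the process holding a latch dies unexpectedly, PMON cleans up the latch so that the resource it protects can be used by other processes.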
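
A rough sketch of the test-and-set idea follows. Real latches rely on a hardware atomic instruction; here a `threading.Lock` merely stands in for that atomicity, and the `Latch` class and its method names are hypothetical.

```python
import threading


class Latch:
    def __init__(self, name):
        self.name = name
        self._flag = 0                    # 0 = free, 1 = held
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def test_and_set(self):
        """Atomically return the old flag value and set the flag to 1."""
        with self._atomic:
            old = self._flag
            self._flag = 1
            return old

    def try_get(self):
        # A single attempt: it succeeds only if the flag was free before.
        return self.test_and_set() == 0

    def free(self):
        with self._atomic:
            self._flag = 0
```

The key point is that reading the old value and writing the new one happen as one indivisible step, so two processes can never both see the latch as free.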
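A latch can be requested in one of two modes: willing-to-wait or no-wait. Latches are normally requested in willing-to-wait mode. In that mode, if the request fails the process waits for a while and retries until it gets the latch; if it still cannot get the latch after a certain number of retries, the request returns failure. In no-wait mode, a failed request does not wait and returns failure immediately. Some latches are always requested in willing-to-wait mode, some always in no-wait mode, and some in either mode.
If a latch request fails, the process retries. Once the number of retries reaches the value specified by the _SPIN_COUNT database parameter, the process goes to sleep, and when the sleep ends it requests the latch again; if that still fails, it sleeps again. The first sleep lasts 100 milliseconds, and each subsequent sleep is twice as long as the previous one. This cycle consumes a large amount of CPU.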
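
The retry cycle described above can be sketched as follows. This is a simplified model, not Oracle's algorithm: `get_latch`, `try_get`, and `max_sleeps` are hypothetical names, the sketch assumes a fresh round of spinning after every sleep, and the 100 ms first sleep is taken from the text. Passing `sleep=None` skips real sleeping so the schedule can be inspected.

```python
def get_latch(try_get, spin_count=2000, first_sleep_ms=100,
              sleep=None, max_sleeps=10):
    """Spin up to spin_count times, then sleep with doubling intervals.

    Returns the list of sleep durations (in ms) that were needed.
    """
    sleeps = []
    sleep_ms = first_sleep_ms
    while True:
        # Spin phase: retry without giving up the CPU.
        for _ in range(spin_count):
            if try_get():
                return sleeps
        # Guard added for the sketch so it cannot loop forever.
        if len(sleeps) == max_sleeps:
            raise TimeoutError("latch not obtained")
        # Sleep phase: give up the CPU; each sleep is twice the last.
        if sleep is not None:
            sleep(sleep_ms / 1000.0)
        sleeps.append(sleep_ms)
        sleep_ms *= 2
```

For example, calling `get_latch(latch_attempt, sleep=time.sleep)` with some `latch_attempt` callable would actually pause between rounds; the doubling keeps a long wait from turning into continuous spinning.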
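When the volume of requests for a particular latch is very high, large numbers of latch waits occur; this is latch contention. Under heavy latch contention, CPU usage stays high and database performance drops sharply.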

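Spinning does not mean trying several latches one by one because there are multiple latches. Spinning means holding on to the current CPU while repeatedly attempting test-and-set on a latch held by a process on another CPU; it does not imply that there is more than one latch.
Pin and spin are two completely independent concepts. A pin reflects a process *executing*, that is, accessing some memory object (object heap / buffer ....) for a very short period of time. Pin is about Oracle's internal memory resources, whereas spin is about CPU resources.
The objects managed by pins are one level finer than latches, down to individual structural units. For example, each buffer in the buffer cache has a pin state that controls access to it, while many buffers hang off the same list (cache buffer chain), and each list has one latch. Seen from another angle, the buffer cache also has two other major kinds of list, the write list and the free list, and from that angle each list likewise has a latch. In practice a latch often protects a group of buffers, while each individual buffer has its own pin state (a pin means the buffer is currently being accessed, or equivalently executed or used, by a process; a pin carries a shared or exclusive flag).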

Waiting for a latch

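The OCP book says that the implementation of latches depends on the OS and the platform.
A latch can simply be thought of as a memory location (0 and non-zero indicate whether it is held).
On a single CPU, when a process requests a latch that is held, it immediately gives up the CPU, sleeps, and then checks again.
  Question 1: does it check again when it wakes up, or is there a notification mechanism (the other process notifies it on release)?
           2: is the time this waiting process spends what shows up as the latch free wait event?
On multiple CPUs, the process holding the latch may be running on another CPU, so the requesting process keeps its CPU and spins (a counter?), counts up to the spin value, and then looks at the latch again. It repeats this step several times (are the number of repeats and the count value OS dependent? I do not know how they are related or what the numbers are: question 3), and if it still has not got the latch, it sleeps. The reason given is that holding and checking a latch takes less time and costs less than giving up and regaining the CPU.
  Question 4: this reason is hard to understand.

1: does it check again when it wakes up, or is there a notification mechanism (the other process notifies it on release)?

Setting its own timer (sleep, wake up) and being notified when another process releases the latch are two different kinds of wake-up mechanism.
By default, only the library cache and shared pool latches use notification-based wake-up.

      2: is the time this waiting process spends what shows up as the latch free wait event?

The sleep time is the latch free wait event time.

3: Whether a process uses another CPU while holding a latch is a question worth examining. Generally, a process running a continuous task should not hop between CPUs at will, because that inevitably involves context switches and so on, which are fairly expensive.

4: Why does a process spin instead of sleeping? Because sleeping gives up the CPU, and another process then gets the CPU and runs, which requires a context switch (the CPU cache, registers, and so on all change, and on top of that a process going from sleep back to execution may pass through the ready and running states in the OS). A context switch costs more than spinning 2000 times (the default), which is why Oracle adopts this strategy.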

Latches and latch contention

http://www.net130.com  Published: 2004-6-6

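Introduction
The Oracle RDBMS uses several different types of locking mechanism, and the latch is one of them. This article focuses on the concept of the latch, explains how latches are implemented, and describes what causes latch contention.

What is a latch

A latch is a serialization locking mechanism used to protect shared data structures in the SGA. The implementation of latches is operating system dependent, particularly in regard to whether a process needs to wait for a latch and for how long.

A latch is a lock that can be acquired and released extremely quickly; it is typically used to protect the data structures that describe blocks in the buffer cache. Each latch also has an associated cleanup procedure, which is called if the process holding the latch dies. Latches also have an associated level, used to prevent deadlocks: once a process has obtained a latch at a certain level, it cannot obtain another latch at a level equal to or lower than that level.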

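Latch and Enqueue

Enqueue is another locking mechanism used by Oracle. It is more complex and lets several concurrent processes share certain resources to varying degrees. Any object that can be used concurrently can be protected with an enqueue. A typical example is table locking, where different levels of sharing are allowed on a table. Unlike a latch, an enqueue uses an operating-system-specific locking mechanism, and it lets the user store a tag in the lock to indicate the mode in which the lock is requested. The operating system lock manager keeps track of all locked resources. If a process cannot get the lock it requested, the operating system places the requesting process on a wait queue that is serviced in FIFO order. Latches, by contrast, have no ordered queue of waiters as enqueues do: a process waiting for a latch either uses a timer to wake up and retry, or spins (only on multiprocessor systems).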

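When is a latch needed

When a process is about to access a data structure in the SGA, it needs to obtain a latch. Once it has the latch, it holds it until it no longer uses the data structure, and only then is the latch released. The latch name identifies which data structure a latch protects.

Oracle operates on latches with atomic instructions. When the required latch is already held by another process, the process executing the instruction stops executing some of its instructions until the latch is released. Fundamentally, a latch prevents concurrent access to a shared data structure. Because the instructions that set and free a latch are indivisible, the operating system can guarantee that only one process gets the latch, and because each is only a single instruction, it executes very quickly. A latch is held only for a very short time, and there is a cleanup mechanism for the case where the holder terminates abnormally; this cleanup is done by the Oracle background process PMON.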

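What causes latch contention

Latches protect data structures in the SGA from being accessed by multiple users at the same time. If a process cannot get the latch it needs immediately, it has to wait, which adds CPU overhead and slows the system down. The extra CPU usage is caused by the process 'spinning': the process repeatedly tries to get the latch at intervals, sleeping between consecutive attempts, and keeps doing so until it gets the latch.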

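How to identify internal latch contention

Server manager monitor is a very useful tool for monitoring latch waits, requests, and contention. You can also query the related data dictionary tables: v$latch, v$latchholder, v$latchname.

Each row of v$latch holds the statistics for one type of latch, and each column reflects the activity of a different type of latch request. The types of latch request differ in whether the requesting process continues when the latch is not immediately available. On that basis, latch requests fall into two types: willing-to-wait and immediate.

Willing-to-wait: if the requested latch is not immediately available, the requesting process waits a very short time and then requests it again. The process repeats this until it gets the latch.

Immediate: if the requested latch is not immediately available, the requesting process does not wait and simply carries on.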

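The following columns in v$latch reflect willing-to-wait requests:

GETS---the number of successful willing-to-wait requests for a latch.

MISSES---the number of times an initial willing-to-wait request for a latch was unsuccessful.

SLEEPS---the number of times a process waited for a latch after an unsuccessful initial willing-to-wait request.

The following columns in v$latch reflect immediate requests:

IMMEDIATE_GETS---the number of successful immediate requests for a latch.

IMMEDIATE_MISSES---the number of unsuccessful immediate requests for a latch.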
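
As a reading aid, the following sketch shows how these five counters would move under the definitions above. It is a toy model with hypothetical names (`LatchStats` and its methods), not how Oracle maintains v$latch.

```python
from dataclasses import dataclass


@dataclass
class LatchStats:
    gets: int = 0
    misses: int = 0
    sleeps: int = 0
    immediate_gets: int = 0
    immediate_misses: int = 0

    def record_wait_request(self, missed_first_try, sleeps):
        # A willing-to-wait request keeps trying until it succeeds,
        # so it always ends up counted as a get.
        self.gets += 1
        if missed_first_try:
            self.misses += 1
        self.sleeps += sleeps

    def record_immediate_request(self, success):
        # An immediate request is counted once, as a get or as a miss.
        if success:
            self.immediate_gets += 1
        else:
            self.immediate_misses += 1
```

Note that a single willing-to-wait request can add several sleeps but at most one miss, which is why SLEEPS can exceed MISSES on a heavily contended latch.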

We can get information about latches by querying v$latch, v$latchholder, and v$latchname, for example:

/* Given a latch address, find the latch name */
col name for a40
select a.name from v$latchname a, v$latch b
where b.addr = '&addr'
and b.latch#=a.latch#;

/* Show system-wide latch statistics */
column name format A32 truncate heading "LATCH NAME"
column pid heading "HOLDER PID"
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+)
and a.latch# = c.latch#
order by a.latch#;

/* Show latch statistics for a given latch name */
select c.name,a.addr,a.gets,a.misses,a.sleeps,
a.immediate_gets,a.immediate_misses,b.pid
from v$latch a, v$latchholder b, v$latchname c
where a.addr = b.laddr(+) and a.latch# = c.latch#
and c.name like '&latch_name%' order by a.latch#;

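There are more than 40 kinds of latch, but the ones a DBA should mainly care about are:

Cache buffers chains latch: needed when a user process searches the SGA for database cache buffers.

Cache buffers LRU chain latch: used when a user process needs to scan the LRU (least recently used) chain in the buffer cache that contains all the dirty blocks.

Redo log buffer latch: controls the allocation of space for each redo entry in the redo log buffer.

Row cache objects latch: used when a user process accesses cached data dictionary values.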

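The rest of this article focuses on how to detect and reduce redo log buffer latch contention. Access to the redo log buffer is controlled by the redo log buffer latches, of which there are two types: the redo allocation latch and the redo copy latch.

The redo allocation latch controls the allocation of space for redo entries in the redo log buffer. An Oracle user process can allocate space for a redo entry in the redo log buffer only after it has obtained the redo allocation latch, and because an instance has only one redo allocation latch, only one user process at a time allocates space in the buffer. After obtaining the latch, the user process first allocates space for the redo entry and then, still holding the latch, copies the entry into the buffer. This kind of copy is called "copying on the redo allocation latch". When the copy is finished, the user process releases the latch.

The maximum size of a redo entry that can be "copied on the redo allocation latch" is defined by the initialization parameter LOG_SMALL_ENTRY_MAX_SIZE, and varies by operating system.

The redo copy latch applies only to multi-CPU systems. In a multi-CPU instance, if a redo entry is too large, exceeding the value of LOG_SMALL_ENTRY_MAX_SIZE, it cannot be "copied on the redo allocation latch", and the user process must obtain a redo copy latch instead. An instance can have multiple redo copy latches; their number is set by the initialization parameter LOG_SIMULTANEOUS_COPIES, which defaults to the number of CPUs.

On a single-CPU system there is no redo copy latch, and every redo entry, regardless of size, is "copied on the redo allocation latch".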
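
The decision described in the preceding paragraphs can be summarised in a few lines. This is only a restatement of the rules in the text, with a hypothetical function name and made-up sizes in the usage example; it does not reflect Oracle's internal code.

```python
def latch_held_during_copy(entry_size, cpu_count, log_small_entry_max_size):
    """Return the latch under which a redo entry is copied into the buffer."""
    # Single-CPU systems have no redo copy latch at all.
    if cpu_count <= 1:
        return "redo allocation"
    # On multi-CPU systems, entries above the threshold are copied
    # under a redo copy latch instead.
    if entry_size > log_small_entry_max_size:
        return "redo copy"
    return "redo allocation"
```

For instance, with an assumed threshold of 80 bytes on a 4-CPU machine, a 500-byte entry would be copied under a redo copy latch, while a 40-byte entry would be copied on the redo allocation latch.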

Excessive access to the redo log buffer leads to redo log buffer latch contention, which degrades system performance. The following query detects this kind of contention:

col name for a40
SELECT ln.name,gets,misses,immediate_gets,immediate_misses
FROM v$latch l,v$latchname ln
WHERE ln.name IN('redo allocation','redo copy') AND ln.latch#=l.latch#;

If the ratio of misses to gets exceeds 1%, or the ratio of immediate_misses to (immediate_gets+immediate_misses) exceeds 1%, consider taking steps to reduce latch contention.
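
The two ratios are easy to compute from one row of query output. The helper below is a small sketch with a hypothetical name; the 1% threshold comes from the text, and the numbers in the usage example are invented.

```python
def latch_contention(gets, misses, immediate_gets, immediate_misses,
                     threshold=0.01):
    """Compute both miss ratios and flag contention above the threshold."""
    wait_ratio = misses / gets if gets else 0.0
    immediate_total = immediate_gets + immediate_misses
    immediate_ratio = (
        immediate_misses / immediate_total if immediate_total else 0.0
    )
    return {
        "wait_ratio": wait_ratio,
        "immediate_ratio": immediate_ratio,
        "contended": wait_ratio > threshold or immediate_ratio > threshold,
    }
```

For example, `latch_contention(10000, 50, 900, 100)` gives a willing-to-wait ratio of 0.5% and an immediate ratio of 10%, so the latch would be flagged because of the immediate requests alone.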

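Most redo log buffer latch contention occurs on multi-CPU systems, when two or more Oracle processes try to get the same latch at the same time. Because an instance has only one redo allocation latch, reducing contention for it means reducing the time each process holds the latch. This can be done by lowering the initialization parameter LOG_SMALL_ENTRY_MAX_SIZE, which reduces the number and size of the redo entries copied on the redo allocation latch. If you observe redo copy latch contention, you can raise the LOG_SIMULTANEOUS_COPIES initialization parameter to increase the number of latches; its default is the number of CPUs, and it can be increased up to twice the number of CPUs.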

http://www.oracle.com.cn/viewthread.php?tid=38808