Oracle RAC Cache Fusion Mechanism Explained

Reposted from: http://blog.csdn.net/tianlesoftware/article/details/6534239

Introduction

      This post is about Oracle Cache Fusion technology as implemented in Oracle Database 10g RAC. It covers cache fusion only and assumes you already know the RAC architecture; if not, please check the Oracle documentation first. You can also visit my previous post about Oracle RAC installation for some basic information and installation details.

      Cache fusion was partially implemented in Oracle 8i OPS (Oracle Parallel Server). Before Oracle 8i the situation was different. In a multi-instance Oracle Parallel Server, if one instance asked for a block of data that was currently modified by another instance of the same database, the holding instance had to write the block to disk so that the requesting instance could read it. This is called a “disk ping”, and it greatly hurt database performance. With Oracle 8i, partial cache fusion was implemented.

      Oracle 8i OPS had a background process called the “Block Server Process”, which was responsible for cache fusion. The scenarios in which cache fusion worked in Oracle 8i OPS, and those in which it did not, are described below. Of course these limitations are not present in Oracle 10g RAC.

    

 

      So when a requesting instance asked for a block that the holding instance had in read or write mode, and the block was dirty, cache fusion worked: the block was copied from the holding instance's cache to the requesting instance. But if the block was present in the holding instance and not dirty, the requesting instance had to read it from the datafile. Also, if the block was open for write in the holding instance and another instance wanted to update the same block, the holding instance had to write the block to disk so that the requesting instance could read it.

 

Concept of cache fusion

      Cache Fusion is basically about fusing the memory buffer caches of multiple instances into one single cache. For example, if we have 3 instances in a RAC using the same datafiles, each with its own buffer cache in its own SGA, then cache fusion makes the database behave as if it had a single instance whose buffer cache is the sum of the buffer caches of all 3 instances. The figure below shows what I mean.

            

 

      This behavior is possible because of the high-speed interconnect that exists in the cluster between the instances. Each instance is connected to every other instance through this interconnect, which makes it possible to share memory between 2 or more servers. Previously only datafile sharing was possible; now, because of the interconnect, even the cache memory can be shared.

 

      But how does this help? Well, suppose one instance holds a data block and is updating it, and another instance needs the same block. The block can then be copied from the holding instance's buffer cache to the requesting instance's buffer cache over the high-speed interconnect. This interconnect is a private connection made just for sending data blocks and other messages between instances; external users cannot use it. It is this interconnect that makes multiple servers behave like a cluster: the servers are bound together by it.

 

      Moving on: we now know how the cluster is formed, what the backbone of the cluster is, and what exactly we call “cache fusion”. Next we will see how cache fusion works. But before that we need to discuss a few topics that are important to understand.

 

We will discuss the following topics before discussing Cache Fusion

1. Cache Coherency

2. Multi-Version consistency model

3. Resource Co-ordination – Synchronization

4. Global Cache Service (GCS)

5. Global Enqueue Service

6. Global Resource Directory

7. GCS resource modes and roles

8. Past Images

9. Block access modes and buffer states

     I promise this won't be too heavy. Let's look at an overview of these concepts. I won't go into the details, just enough for you to understand cache fusion.

 

2.1 Cache Coherency

      Consider a single-instance database: whenever a user queries data, he gets a consistent view of it. For example, suppose one user has already read a block of data and changed some rows in the buffer cache. If another user wants to read data from the same block, Oracle makes a copy of that block in the buffer cache and applies the undo information from the undo tablespace to get a consistent view of the data. This consistent data is then presented to the user who wants to read it. This is called maintaining consistency of data.
      Now consider a multi-instance RAC system, where a data block might not be present in the local instance; a user might be updating it in some other instance. If the data blocks are already available in the local instance, they are immediately available to the user. If they are present in some other instance within the cluster, they are transferred into the local buffer cache.
      Maintaining the consistency of data blocks across the buffer caches of multiple instances is called “Cache Coherency”.

 

 

2.2 Multi-Version consistency model

      The multi-version consistency model distinguishes between the current version of a data block and one or more read-consistent versions of it. The current block is the one that contains all the changes, committed as well as uncommitted. For example, a user fires a DML on a data block that is not present in any instance. The block is read from disk into the buffer cache, where the value gets changed. The user then commits and fires another DML on the same data block. Now that data block is dirty and contains committed as well as uncommitted changes.
      Suppose this data block is requested by another user for reading. Oracle then makes a copy, applies the undo information to build a Consistent Read (“CR”) copy of the block, and ships it to the requesting instance. Thus we have multiple versions of the same data block, each consistent with respect to the user who requested it.
      During the course of operation there can be many more versions of the same data block, each consistent with respect to some point in time.
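The idea of building a CR copy can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a block is a dictionary of row values, each undo record is a hypothetical `(scn, row_key, old_value)` tuple, and commit status is ignored entirely. Real Oracle undo handling is far more involved.

```python
def consistent_read_copy(current_rows, undo_records, query_scn):
    """Build a CR copy of a block as of query_scn (toy model).

    current_rows : dict mapping row key -> current value
    undo_records : list of (scn, row_key, old_value); each record holds the
                   value the row had BEFORE the change made at that SCN
    query_scn    : point in time the reader must see
    """
    cr_copy = dict(current_rows)  # never touch the current block itself
    # Roll changes back newest-first until the copy is as of query_scn.
    for scn, row_key, old_value in sorted(undo_records, reverse=True):
        if scn > query_scn:
            cr_copy[row_key] = old_value
    return cr_copy


# Hypothetical history of one row: 30 -> 24 at SCN 100, 24 -> 40 at SCN 200.
current = {10: 40}
undo = [(100, 10, 30), (200, 10, 24)]
as_of_150 = consistent_read_copy(current, undo, query_scn=150)
```

Each distinct `query_scn` can yield a different copy, which is why several consistent versions of one block may exist at the same time.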

 

 

2.3 Resource Co-ordination – Synchronization

      In a multi-instance system such as RAC, where the same resources (for example data blocks) are used concurrently, effective synchronization is required to maintain consistency. Within the shared cache, the co-ordination of concurrent tasks is called synchronization. The synchronization provided by Oracle RAC gives cluster-wide concurrency control over resources and in turn ensures the integrity of shared data. Although there is synchronization within the cache, it comes at some cost. At a low level, synchronization is just a data copy or data transfer operation.
      According to Oracle studies, accessing a block in the local cache is much faster than accessing it from another instance's cache within the cluster. A local cache access is an in-memory copy, whereas for a remote cache the data must be transferred over the high-speed interconnect, which is obviously slower than an in-memory copy. Worst is the read from disk, which is much slower than either of the other two. The numbers below show the block access time for these 3 methods.

 

For example:

      Block access in local cache ~ 0.01 msec

      Block access in remote cache ~ 2.5 msec

      Block access on disk ~ 14 msec+
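To put these three figures in perspective, the short Python sketch below computes the relative slowdowns and an average access time for a workload mix. The 90/8/2 split between local, remote and disk accesses is purely an assumed example, not a measurement, and the disk figure uses the 14 msec lower bound quoted above.

```python
# Access times from the list above, in microseconds (integers keep the math exact).
LOCAL_US = 10       # ~0.01 msec
REMOTE_US = 2500    # ~2.5 msec
DISK_US = 14000     # ~14 msec (lower bound)


def slowdown(slower_us, faster_us):
    """How many times slower one access path is than another."""
    return slower_us / faster_us


def average_access_us(local_pct, remote_pct, disk_pct):
    """Weighted average access time for a given percentage mix of accesses."""
    if local_pct + remote_pct + disk_pct != 100:
        raise ValueError("percentages must add up to 100")
    total = local_pct * LOCAL_US + remote_pct * REMOTE_US + disk_pct * DISK_US
    return total / 100


remote_vs_local = slowdown(REMOTE_US, LOCAL_US)   # remote cache vs local cache
disk_vs_remote = slowdown(DISK_US, REMOTE_US)     # disk vs remote cache
mixed = average_access_us(90, 8, 2)               # assumed 90/8/2 workload mix
```

Even with only a small share of remote and disk accesses, those two terms dominate the average, which is why keeping blocks local matters so much.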

 

 

2.4 Global Cache Service

      Global Cache Service (GCS) is the main component of Oracle Cache Fusion technology. It is represented by the background processes LMSn. There can be a maximum of 10 LMS processes per instance. The main function of GCS is to track the status and location of data blocks. The status of a data block means its mode and role (I will explain mode and role below). GCS is the main mechanism by which cache coherency among multiple caches is maintained. GCS is also responsible for block transfers between instances.

 

2.5 Global Enqueue Service

      Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. This involves all non-cache-fusion inter-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and transactions. It performs this operation for resources that are accessed by more than one instance.
      Enqueue services are also present in a single-instance database, where they are responsible for locking rows in a table using different locking modes. To understand more about enqueues, check the Oracle documentation about locking.

 

2.6 Global Resource Directory

      GES and GCS together maintain the Global Resource Directory (GRD). The GRD is like an in-memory database that contains details about all the blocks present in the cache. The GRD knows the location of the latest version of a block, the mode of the block, the role of the block (mode and role are discussed shortly), and so on. Whenever a user asks for a data block, GCS gets all the information from the GRD. The GRD is a distributed resource, meaning that each instance maintains some part of it. This distributed nature of the GRD is key to the fault tolerance of RAC. The GRD is stored in the SGA.

 

Typically the GRD contains the following information, and more:

      1. Data Block Address: the address of the data block being modified

      2. Location of the most current version of the data block

      3. Modes of the data block

      4. Roles of the data block

      5. SCN number of the data block

      6. Image of the data block: could be the current image or a past image.
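As a rough illustration, one GRD entry and the distributed nature of the directory could be modelled like this in Python. The field names, the `master_instance` function and its modulo rule are assumptions made for this sketch; the real GRD is an internal SGA structure and Oracle decides the master of a resource by its own means.

```python
from dataclasses import dataclass


@dataclass
class GrdEntry:
    """One directory record, with the fields listed above (names are illustrative)."""
    dba: int              # data block address
    current_holder: int   # instance holding the most current version
    mode: str             # "N", "S" or "X"
    role: str             # "L" (local) or "G" (global)
    scn: int              # SCN of the data block
    image: str            # "current" or "past"


def master_instance(dba, instance_count):
    """Pick which instance keeps the entry for this block (simple modulo, assumed)."""
    return dba % instance_count + 1


def build_directory(entries, instance_count):
    """Spread entries over the instances so that each holds only part of the GRD."""
    directory = {inst: {} for inst in range(1, instance_count + 1)}
    for entry in entries:
        directory[master_instance(entry.dba, instance_count)][entry.dba] = entry
    return directory


entries = [
    GrdEntry(dba=100, current_holder=3, mode="S", role="L", scn=5000, image="current"),
    GrdEntry(dba=101, current_holder=1, mode="X", role="G", scn=5010, image="current"),
    GrdEntry(dba=102, current_holder=2, mode="N", role="G", scn=4990, image="past"),
]
grd = build_directory(entries, instance_count=3)
```

Because each instance holds only a slice of the directory, losing one instance loses only that slice, which the survivors then have to rebuild.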

 

2.7 GCS resource modes and roles

      The mode of a data block is decided by whether the resource holder intends to modify the data or read it. The modes are as follows:

      1. Null (N) Mode: Null mode is the least restrictive mode. It indicates no access rights and acts as a placeholder.

      2. Shared (S) Mode: Shared mode indicates that the data block is being read and not modified. Other sessions can still read the data block.

      3. Exclusive (X) Mode: Exclusive mode indicates exclusive access to the block. No other resource can write to this data block, although it can still have a consistent read on it.

 

GCS resources also have roles. The different roles are as follows:

      1. Local: When a data block is first read into an instance from disk, it has a local role, meaning that only 1 copy of the data block exists in the cache. No other instance's cache has a copy of this block.

      2. Global: Global role indicates that multiple copies of the data block exist across the clustered instances. For example, a user connected to one of the instances requests a data block. The block is read from disk into that instance, and the role granted is local. If another instance requests the same block, it is copied to the requesting instance and the role becomes global.

 

      This role and mode information is maintained in the GRD (Global Resource Directory) by GCS (Global Cache Service).
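The three modes and two roles can be written down as small Python enums, together with a compatibility check derived from the descriptions above. The `can_coexist` function is an illustrative simplification, not an Oracle API.

```python
from enum import Enum


class Mode(Enum):
    NULL = "N"        # no access rights, placeholder only
    SHARED = "S"      # block is being read, not modified
    EXCLUSIVE = "X"   # exclusive access, block may be modified


class Role(Enum):
    LOCAL = "L"
    GLOBAL = "G"


def can_coexist(held, requested):
    """Can one instance hold `held` while another instance holds `requested`?"""
    if Mode.NULL in (held, requested):
        return True   # NULL grants no access, so it never conflicts
    # Two readers are fine; anything involving EXCLUSIVE conflicts.
    return held is Mode.SHARED and requested is Mode.SHARED


readers_ok = can_coexist(Mode.SHARED, Mode.SHARED)
writer_blocks_reader = not can_coexist(Mode.EXCLUSIVE, Mode.SHARED)
```

When a request conflicts with a held mode, the holder has to be downgraded first, which is what GCS arranges before granting the new mode.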

 

 

2.8 Past Images

      The Past Image concept was introduced in Oracle 9i to maintain data integrity. In an Oracle database, a typical block is not written to disk immediately after it is dirtied; this reduces excessive I/O. When the same dirty block is requested by another instance for write or read purposes, an image of the block is created in the owning instance and the block is then shipped to the requesting instance. This image copy of the block is called a Past Image (PI). In the event of failure Oracle can reconstruct the block by reading PIs. It is also possible to have more than 1 PI of a block, depending on how many times the block was requested while dirty.

      A past image of a block is different from a CR (consistent read) image. A past image is needed in order to create a CR image by applying undo data.

 

“Juggling” Data with Multiple Past Images

      1. Multiple Past Image versions of a data block may be kept by different instances

      2. Upon a checkpoint, only the current image is written to disk; Past Images are discarded

      3. In the event of a failure, the current version of a block can be reconstructed from PIs

      4. Since PIs are kept in memory, they aid in avoiding frequent disk writes

      5. This avoids the “disk pinging” experienced with 8i OPS due to frequent writes to disk

      6. Data is “juggled” in memory, without touching down on the disk

 

See also: Oracle RAC Past Image (PI) explained

http://blog.csdn.net/tianlesoftware/archive/2011/06/07/6529870.aspx

 

 

2.9 Block access modes and buffer states

      An additional concurrency control concept is the buffer state, which is the state of a buffer in the local cache of an instance. The buffer state of a block relates to the access mode of the block. For example, if a buffer state is exclusive current (XCUR), the instance owns the resource in exclusive mode.
      To see a buffer's state, query the “status” column of the V$BH dynamic performance view.

 

      The V$BH view provides information about block access modes and their buffer state names as follows:

      1. With a block access mode of NULL, the buffer state name is CR: the instance can perform a consistent read of the block, that is, if the instance holds an older version of the data.

      2. With a block access mode of S, the buffer state name is SCUR: the instance has shared access to the block and can only perform reads.

      3. With a block access mode of X, the buffer state name is XCUR: the instance has exclusive access to the block and can modify it.

      4. With a block access mode of NULL, the buffer state name is PI: the instance has made changes to the block but retains copies of it as past images to record its state before changes.

 

 

      Only the SCUR and PI buffer states are Real Application Clusters-specific. There can be only one copy of any one block buffered in the XCUR state in the cluster database at any time. To perform modifications on a block, a process must assign an XCUR buffer state to the buffer containing the data block.
      For example, if another instance requests read access to the mostcurrent version of the same block, then Oracle changes the accessmode from exclusive to shared, sends a current read version of theblock to the requesting instance, and keeps a PI buffer if thebuffer contained a dirty block.

      At this point, the first instance has the current block and the requesting instance also has the current block in shared mode. Therefore, the role of the resource becomes global. There can be multiple shared current (SCUR) versions of this block cached throughout the cluster database at any time.
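The mapping between access modes and buffer state names, and the rule that a block may be buffered in XCUR state only once in the whole cluster, can be captured in a small Python check. The state names are written in upper case as in this article, and the `xcur_rule_holds` helper is a hypothetical illustration rather than anything Oracle provides.

```python
# Access mode -> buffer state names, as listed in this section.
BUFFER_STATES = {
    "NULL": ("CR", "PI"),
    "S": ("SCUR",),
    "X": ("XCUR",),
}

KNOWN_STATES = {state for states in BUFFER_STATES.values() for state in states}


def xcur_rule_holds(buffers):
    """Check the 'only one XCUR copy per block' rule.

    buffers: list of (instance, state) pairs describing every cached
             buffer of ONE data block across the cluster.
    """
    states = [state for _instance, state in buffers]
    unknown = [state for state in states if state not in KNOWN_STATES]
    if unknown:
        raise ValueError(f"unknown buffer state(s): {unknown}")
    return states.count("XCUR") <= 1


# Many shared current copies plus a past image: allowed.
shared_ok = xcur_rule_holds([(1, "SCUR"), (2, "SCUR"), (2, "PI"), (3, "CR")])
# Two exclusive current copies of the same block: never allowed.
double_xcur = xcur_rule_holds([(1, "XCUR"), (2, "XCUR")])
```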

 

 

Block transfer using Cache Fusion

      Let's consider a very detailed example of how block transfer happens between different instances. For this example I am assuming a 3-node RAC system, and also that any DML statement is followed by a commit. So if I say that a user executed an update, that means the user executed update + commit. There is, however, no checkpoint until the end.

 

Stage 1

      In stage 1 a data block is requested by user C, who is connected to instance 3. So the data block is read into the buffer cache of instance 3.

 

SQL> select sales_rank from salesman where salesid = 10;

 

      Assume this gives a value of 30. This block is read for the first time and is not present in any other instance. So the role of the block is LOCAL and the block is read in SHARED mode. Also there are NO PAST IMAGES. So we describe this stage as instance 3 having SL0 mode (SHARED, LOCAL, 0 PAST IMAGES).

 

These lock modes are explained in my blog:

      Oracle RAC Past Image (PI) explained

      http://blog.csdn.net/tianlesoftware/archive/2011/06/07/6529870.aspx

       

 

Stage 2

     In stage 2 user B issues the same select statement against the salesman table. Instance 2 needs the same block; therefore, the block is shipped from instance 3 to instance 2 via the cache fusion interconnect. There is no disk read at this time. Both instances hold the block in SHARED mode (S) and the role is LOCAL (L). Notice that even though the block is present in more than one instance, we still say the role is local because the block is not yet dirtied. Had the block been dirty when requested by the other instance, the role would have changed to global.

 

Stage 3

      In stage 3 user B decides to update the row and commit at instance 2. The new sales rank is 24. At this stage, instance 2 acquires an EXCLUSIVE lock for updating the data, and the SHARED lock on instance 3 is downgraded to a NULL lock.

      SQL> update salesman set sales_rank = 24 where salesid = 10;

      SQL> commit;

 

     So instance 2 has mode XL0 (Exclusive, Local with 0 past images) and instance 3 has a NULL lock, which is just a placeholder. The role of the block is still LOCAL because the block has been dirtied for the first time only on instance 2 and no other instance has a dirty copy of it. If another instance now tries to update the same block, the role will change to global.

 

Stage 4

      In stage 4 user A decides to update the same row, and hence the same block, in instance 1 with a sales rank of 40. It finds that the block is dirty in instance 2. Therefore the data block is shipped from instance 2 to instance 1; however, a PAST IMAGE of the data block is created on instance 2, and the lock mode on instance 2 is downgraded to NULL with a GLOBAL role. Instance 2 now has NG1 (NULL lock with GLOBAL role and 1 PAST IMAGE). At this time instance 1 has an EXCLUSIVE lock with GLOBAL role (XG0).

 

Stage 5

      User C executes a select statement from instance 3 on the same row. Since the data block on instance 1 is the most recent copy (the GRD (Global Resource Directory) knows which instance has the latest copy of a data block), it is shipped to instance 3. As a result the lock on instance 1 is converted to SHARED GLOBAL with 1 PAST IMAGE. The lock is changed to SHARED and not NULL because instance 3 asked for a shared lock (for reading data) and not an exclusive lock (for updating data). Had instance 3 asked for an exclusive lock, instance 1 would have ended up with a NULL lock.

      Also instance 3 will now hold SG0 (SHARED, GLOBAL with 0 PAST IMAGES).

            

 

Stage 6

      User B issues the same select statement against the salesman table on instance 2. Instance 2 requests a consistent copy of the buffer from another instance, which happens to be the current master.
      Therefore instance 1 ships the block to instance 2, where it is acquired with SG1 (SHARED, GLOBAL with 1 PAST IMAGE). So the mode on instance 2 becomes SG1.

 

Stage 7

      User C on instance 3 updates the same row. Therefore instance 3 requires an exclusive lock, and instance 1 and instance 2 are downgraded to a NULL lock with GLOBAL role and 1 PAST IMAGE. Instance 3 will have an EXCLUSIVE lock, GLOBAL role and no PAST IMAGES (XG0).

 

Stage 8

      A checkpoint is initiated and a “Write to Disk” takes place at instance 3. As a result the previous past images are discarded (as they are no longer required for recovery) and instance 3 holds the block with an EXCLUSIVE lock, LOCAL role and no PAST IMAGES (XL0).

      From here on, if any instance wants to read or write the same block, a copy will again be shipped from instance 3.
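The eight stages can be replayed with a small Python simulation. It is a toy model built only from the rules described in this walkthrough: the role is tracked once per block, a past image is kept whenever a dirty block leaves the instance holding it in X mode, and a checkpoint discards all past images. The class and method names are invented for this sketch, and real GCS behaviour has many more cases.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class BlockResource:
    """Toy model of one data block's lock state across RAC instances."""
    role: str = "L"                                      # "L" local, "G" global
    mode: Dict[int, str] = field(default_factory=dict)   # instance -> "N", "S" or "X"
    pi: Dict[int, int] = field(default_factory=dict)     # instance -> past-image count
    x_holder: Optional[int] = None                       # instance holding X, if any
    dirty: bool = False                                  # current version not on disk yet

    def label(self, inst: int) -> str:
        """State in the article's notation, e.g. 'SL0' or 'NG1'."""
        return f"{self.mode.get(inst, 'N')}{self.role}{self.pi.get(inst, 0)}"

    def _keep_past_image(self, requester: int) -> None:
        holder = self.x_holder
        if holder is not None and holder != requester and self.dirty:
            # A dirty block leaves its X holder: keep a PI, role becomes global.
            self.pi[holder] = self.pi.get(holder, 0) + 1
            self.role = "G"

    def read(self, inst: int) -> None:
        if self.x_holder == inst:
            return                            # already holds the current block
        self._keep_past_image(inst)
        if self.x_holder is not None:
            self.mode[self.x_holder] = "S"    # reader asked for S, so X -> S
            self.x_holder = None
        self.mode[inst] = "S"

    def write(self, inst: int) -> None:
        self._keep_past_image(inst)
        for other in self.mode:
            if other != inst:
                self.mode[other] = "N"        # writer asked for X, so others -> N
        self.mode[inst] = "X"
        self.x_holder = inst
        self.dirty = True

    def checkpoint(self) -> None:
        # The current image goes to disk; all past images are discarded.
        self.pi = {inst: 0 for inst in self.pi}
        self.role = "L"
        self.dirty = False


def run_stages():
    """Replay stages 1 to 8 and return the per-instance labels after each one."""
    blk = BlockResource()
    steps = [
        lambda: blk.read(3),    # stage 1: user C selects on instance 3
        lambda: blk.read(2),    # stage 2: user B selects on instance 2
        lambda: blk.write(2),   # stage 3: user B updates on instance 2
        lambda: blk.write(1),   # stage 4: user A updates on instance 1
        lambda: blk.read(3),    # stage 5: user C selects on instance 3
        lambda: blk.read(2),    # stage 6: user B selects on instance 2
        lambda: blk.write(3),   # stage 7: user C updates on instance 3
        blk.checkpoint,         # stage 8: checkpoint, write at instance 3
    ]
    history = []
    for step in steps:
        step()
        history.append({inst: blk.label(inst) for inst in (1, 2, 3)})
    return history


stages = run_stages()
```

Calling `run_stages()` returns one dictionary per stage, so `stages[3][2]` is the label of instance 2 after stage 4. Labels for instances that the walkthrough does not mention at a given stage are simply whatever this simplified model produces.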

 

Diagrams:

Read/Read Cache Fusion – GCS Processing

Write/Write Cache Fusion – GCS Processing

Blocks to Disk – GCS Processing

 

                   

 

 

Online Instance Recovery Steps

 

The steps are as follows:

1. Instance failure is detected by the Cluster Manager and GCS

2. Reconfiguration of GES resources (enqueues); the global resource directory is frozen

3. Reconfiguration of GCS resources; involves redistribution among surviving instances

4. One of the surviving instances becomes the “recovering instance”

5. The SMON process of the recovering instance starts the first pass of redo log read of the failed instance's redo log thread

6. SMON finds BWR (block written records) in the redo and removes them as their PI is already written to disk

7. SMON prepares the recovery set of the blocks modified by the failed instance but not written to disk

8. Entries in the recovery list are sorted by first dirty SCN

9. SMON informs each block's master node to take ownership of the block for recovery

10. The second pass of log read begins.

11. Redo is applied to the datafiles.

12. The Global Resource Directory is unfrozen
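Steps 5 to 8 (the first pass over the redo) can be sketched in Python as follows. The redo record layout `(kind, block, scn)` is invented for illustration, and the handling of block written records is a simplification: a BWR is taken to mean that every earlier change to that block is already on disk.

```python
def build_recovery_set(redo_thread):
    """First pass over a failed instance's redo thread (toy model).

    redo_thread: list of (kind, block, scn) in log order, where kind is
                 "change" for a block modification or "bwr" for a block
                 written record.
    Returns a list of (block, first_dirty_scn) sorted by first dirty SCN.
    """
    first_dirty = {}
    for kind, block, scn in redo_thread:
        if kind == "change":
            # Remember only the first SCN at which the block became dirty.
            first_dirty.setdefault(block, scn)
        elif kind == "bwr":
            # Block reached disk: earlier changes need no recovery.
            first_dirty.pop(block, None)
        else:
            raise ValueError(f"unknown redo record kind: {kind}")
    return sorted(first_dirty.items(), key=lambda item: item[1])


redo = [
    ("change", "A", 100),
    ("change", "B", 110),
    ("change", "C", 120),
    ("bwr", "A", 130),      # block A was written to disk here
    ("change", "A", 140),   # ...and dirtied again afterwards
    ("change", "B", 150),
]
recovery_set = build_recovery_set(redo)
```

Only the blocks left in this set need redo applied in the second pass, which keeps recovery work proportional to what was actually lost.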
