How to guarantee the integrity of a transaction commit

Source: Internet  Editor: 程序博客网  Time: 2024/05/16 15:58

I am happy to discuss database technology; you can email me at tieyingz@cs.cmu.edu

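When a commit touches multiple rows and the database provides row-level locks, the question is how to keep the multi-row commit atomic. If one row is committed while another is not yet, a concurrent txn can read an inconsistent result. Both single-node and distributed transactions face this problem.

How can this be solved? It takes a global lock (more precisely, a shared flag). How that lock is implemented differs from database to database. Oracle, for example, locks the table and releases the table lock only after all rows have been committed. A distributed transaction also needs a global lock, because several nodes are involved; Oracle places it on the commit point site. One way to picture it: a central server holds a meta table, and when a txn reads a piece of data that has already been written but whose version has not yet been set, it looks in the meta table to see whether that data has been marked as committed.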
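The meta-table lookup described above can be sketched as follows. This is a minimal illustration, not any database's actual mechanism: `RowVersion`, `MetaTable`, and `is_visible` are hypothetical names, and the meta table is an in-memory dictionary standing in for the central server.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RowVersion:
    value: str
    txn_id: int                           # transaction that wrote this row
    commit_version: Optional[int] = None  # None: written, but version not yet set


class MetaTable:
    """Hypothetical central table: txn_id -> commit version once committed."""

    def __init__(self) -> None:
        self._committed: Dict[int, int] = {}

    def mark_committed(self, txn_id: int, commit_version: int) -> None:
        # A single write flips the flag for every row the txn touched.
        self._committed[txn_id] = commit_version

    def commit_version_of(self, txn_id: int) -> Optional[int]:
        return self._committed.get(txn_id)


def is_visible(row: RowVersion, meta: MetaTable, read_version: int) -> bool:
    version = row.commit_version
    if version is None:
        # Row is written but unstamped: fall back to the shared flag.
        version = meta.commit_version_of(row.txn_id)
    return version is not None and version <= read_version


meta = MetaTable()
rows = [RowVersion("x=1", txn_id=7), RowVersion("y=2", txn_id=7)]
before = [is_visible(r, meta, read_version=100) for r in rows]
meta.mark_committed(7, commit_version=50)
after = [is_visible(r, meta, read_version=100) for r in rows]
print(before, after)  # [False, False] [True, True]
```

Because both rows resolve their status through the same flag, a reader sees either neither row or both, never just one of them.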

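Guaranteeing commit atomicity therefore costs performance, especially in distributed transactions, where it adds extra RPC overhead. Keeping the flag described above on the local node would greatly reduce the RPC cost. A draft design follows:

There is one GlobalMetaTable, and every node has a LocalMetaTable. GlobalMeta covers GlobalTxn; LocalMeta covers both LocalTxn and GlobalTxn. When a GlobalTxn starts to commit, it first obtains a GlobalCommitId, updates its Tuple in the GlobalMetaTable, and then synchronizes the LocalMetaTable of every node it touched.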
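A minimal in-memory sketch of this draft is shown below. The class names follow the draft, but the method names (`record`, `lookup`, `commit_global_txn`) and the counter used for GlobalCommitId are assumptions made for illustration. The synchronization step is modeled as direct method calls; in a real system it would be RPCs that can fail or be delayed.

```python
import itertools
from typing import Dict, Iterable, Optional


class LocalMetaTable:
    """Per-node table: txn_id -> commit id, consulted without an RPC."""

    def __init__(self) -> None:
        self.commit_ids: Dict[int, int] = {}

    def record(self, txn_id: int, commit_id: int) -> None:
        self.commit_ids[txn_id] = commit_id

    def lookup(self, txn_id: int) -> Optional[int]:
        return self.commit_ids.get(txn_id)


class GlobalMetaTable:
    def __init__(self) -> None:
        self._next_commit_id = itertools.count(1)  # assumed id generator
        self.commit_ids: Dict[int, int] = {}

    def commit_global_txn(self, txn_id: int,
                          touched_nodes: Iterable[LocalMetaTable]) -> int:
        commit_id = next(self._next_commit_id)  # 1. obtain GlobalCommitId
        self.commit_ids[txn_id] = commit_id     # 2. update the txn's tuple
        for node in touched_nodes:              # 3. sync every touched node
            node.record(txn_id, commit_id)
        return commit_id


global_meta = GlobalMetaTable()
node_a, node_b, node_c = LocalMetaTable(), LocalMetaTable(), LocalMetaTable()
cid = global_meta.commit_global_txn(42, [node_a, node_b])
```

Until step 3 reaches a node, a reader on that node would still have to fall back to the GlobalMetaTable, so the local copy saves RPCs only after synchronization completes.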

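Below is a reposted description of two-phase commit. Its defining trait is that it blocks in order to guarantee that the commit is atomic.

Preconditions

The nodes of the system are split into two roles: one node is designated the coordinator, and the other nodes are participants (cohorts).

Every node is assumed to have stable storage with a write-ahead log; put simply, it can write log records, and the log lives in persistent storage. For the protocol to succeed, no node may crash before it receives the commit, the write-ahead log records must never be lost, and the coordinator and every participant must be able to communicate with each other. The conditions for success are therefore quite strict.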

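Algorithm

There are two phases.

Commit-request phase (also called the voting phase)

1. The coordinator sends a query to commit message to all cohorts and waits until it has received a reply from every one of them.

2. Each cohort executes the transaction on its local node (the coordinator will later ask it to commit this transaction) and writes its local redo and undo logs.

3. Each cohort replies with an agreement message if execution succeeded (that is, the cohort agrees to commit), or with an abort message if execution failed.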

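Commit phase (also called the completion phase)

Success

If the coordinator receives an agreement message from every participant:

1. The coordinator sends a commit message to all cohorts.

2. Each participant completes the commit and releases all locks and resources held during the transaction.

3. Each participant replies to the coordinator with an acknowledgement.

4. The coordinator completes the transaction once it has received all acknowledgements.

Failure

If any participant replied with an abort message during the commit-request phase:

1. The coordinator sends a rollback message to all cohorts.

2. Each participant undoes its local transaction (using the undo log) and releases the resources and locks held during the transaction.

3. Each participant replies to the coordinator with an acknowledgement.

4. The coordinator undoes the transaction once it has received an acknowledgement from every participant.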
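The two phases above can be sketched in a few lines. This is a single-process illustration under strong assumptions: messages are plain method calls that never get lost, nothing crashes, and logging is not modeled. `Cohort`, `Vote`, and `two_phase_commit` are names chosen for the example.

```python
from enum import Enum
from typing import List


class Vote(Enum):
    AGREEMENT = "agreement"
    ABORT = "abort"


class Cohort:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def query_to_commit(self) -> Vote:
        # Execute locally and write redo/undo logs (not modeled), then vote.
        if self.can_commit:
            self.state = "prepared"
            return Vote.AGREEMENT
        self.state = "failed"
        return Vote.ABORT

    def commit(self) -> str:
        self.state = "committed"    # locks would be released here
        return "ack"

    def rollback(self) -> str:
        self.state = "rolled_back"  # undo from the undo log, release locks
        return "ack"


def two_phase_commit(cohorts: List[Cohort]) -> str:
    # Phase 1: collect one vote from every cohort.
    votes = [c.query_to_commit() for c in cohorts]
    # Phase 2: commit only if every vote is an agreement.
    if all(v is Vote.AGREEMENT for v in votes):
        acks = [c.commit() for c in cohorts]
        outcome = "committed"
    else:
        acks = [c.rollback() for c in cohorts]
        outcome = "rolled_back"
    if len(acks) != len(cohorts):
        raise RuntimeError("missing acknowledgement")
    return outcome


happy = [Cohort("a"), Cohort("b")]
unhappy = [Cohort("a"), Cohort("b", can_commit=False)]
print(two_phase_commit(happy), two_phase_commit(unhappy))
```

A single abort vote is enough to roll back every cohort, including those that voted to commit.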

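Drawbacks

The biggest drawback of two-phase commit is that it is a blocking protocol. A node blocks while it waits for a reply, and other transactions that need the resources it holds must wait. If the coordinator goes down, the cohorts can never finish their transactions, as the following cases show:

If a participant has sent an agreement message to the coordinator, it blocks until it receives the commit or rollback reply. If the coordinator goes down at this point and never recovers, this cohort stays blocked unless it can obtain the global commit/abort decision from another cohort. (Why can't a timeout be used here to abort the transaction? As I understand it, if a participant decides on its own to abort the unfinished transaction, the global data may become inconsistent, because it cannot know whether the other nodes aborted as well.)

After the coordinator sends the query to commit message, it blocks until it has received replies from all participants. If a participant goes down at this point and cannot recover, the coordinator unblocks itself as follows: because the coordinator is the only node that decides between commit and rollback, it can use a timeout. If it hears nothing from a cohort within some period, it stops waiting and sends an abort message to all cohorts. This is another drawback of the protocol: it is biased toward aborting such a case rather than completing it.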
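The coordinator-side timeout can be sketched with a thread-safe queue standing in for the network. `collect_votes` and its parameters are hypothetical names; the only library behavior relied on is that `queue.Queue.get(timeout=...)` raises `queue.Empty` when nothing arrives in time.

```python
import queue
import time


def collect_votes(replies: "queue.Queue[str]", expected: int,
                  timeout_s: float) -> str:
    """Return "commit" only if `expected` agreements arrive before the deadline."""
    deadline = time.monotonic() + timeout_s
    for _ in range(expected):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "abort"
        try:
            vote = replies.get(timeout=remaining)
        except queue.Empty:
            return "abort"  # a silent cohort is treated like a failed one
        if vote != "agreement":
            return "abort"
    return "commit"


all_replied: "queue.Queue[str]" = queue.Queue()
all_replied.put("agreement")
all_replied.put("agreement")

one_silent: "queue.Queue[str]" = queue.Queue()
one_silent.put("agreement")

decision_ok = collect_votes(all_replied, expected=2, timeout_s=0.05)
decision_timeout = collect_votes(one_silent, expected=2, timeout_s=0.05)
```

The sketch shows the bias the text mentions: a slow but healthy cohort is indistinguishable from a dead one, so the coordinator aborts in both cases.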

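Failure handling in two-phase commit: timeouts and retransmission

First, some ground rules (from the book 《数据库与事务处理》; why is it designed this way?)

The coordinator force-writes the transaction-begin record and the commit record to non-volatile storage;

Participants force-write the prepare record and the commit record to non-volatile storage.

The log written to non-volatile storage lets a node find out, after a crash, what state it was in within the transaction when it crashed.

As the way the protocol works shows, two-phase commit can recover quickly from all of these failures without losing log information precisely because it maintains a transaction log during execution that records what recovery needs.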
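A forced write can be sketched as an append followed by `flush` and `os.fsync`. The JSON-lines record format and the function names `force_write` and `last_state` are assumptions for illustration; real databases use their own binary log formats.

```python
import json
import os
import tempfile


def force_write(log_path: str, record: dict) -> None:
    """Append one record and do not return until it has been pushed to disk."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
        log.flush()             # push Python's buffer to the OS
        os.fsync(log.fileno())  # ask the OS to push it to the device


def last_state(log_path: str, txn_id: int) -> str:
    """After a crash, replay the log to find the txn's last recorded state."""
    state = "unknown"
    if not os.path.exists(log_path):
        return state
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            record = json.loads(line)
            if record["txn"] == txn_id:
                state = record["state"]
    return state


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "participant.log")
    force_write(path, {"txn": 1, "state": "prepared"})
    state_after_prepare = last_state(path, 1)
    force_write(path, {"txn": 1, "state": "committed"})
    state_after_commit = last_state(path, 1)
    state_other_txn = last_state(path, 2)
```

A node must finish the forced write before it sends the corresponding message; otherwise it could promise something it no longer remembers after a crash.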

Now let us analyze how 2PC behaves under different kinds of failures:

Site failures
Lost messages
Network partition

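Site failures

(1) A participant fails before writing the ready record to its log. In this case the coordinator's timeout expires and it decides to abort. All participants abort their subtransactions. When the failed participant recovers, its restart procedure simply aborts the transaction, with no need to ask other sites.
(2) A participant fails after writing the ready record to its log. In this case the other participants' sites terminate the transaction (commit or abort). When the failed site recovers, its restart procedure must ask the coordinator or some other participant for the outcome of the transaction (commit or abort) and then act accordingly. This case requires access to remote recovery information.
(3) The coordinator fails after writing the prepare record to its log but before writing the global-commit or global-abort record. In this case every participant that has already answered READY must wait for the coordinator to recover. The coordinator's restart procedure resumes the commit protocol from the beginning: it reads the participants' identities from the prepare record in the log and sends them the PREPARE message again. Every ready participant must recognize the new PREPARE message as a duplicate of the earlier one.
(4) The coordinator fails after writing the global-commit or global-abort record to its log but before writing the complete record. In this case the coordinator must, on restart, send its decision to all participants again; participants that had not yet received the command have to wait until the coordinator recovers. As before, a participant must not be affected by receiving the command message twice.
(5) The coordinator fails after writing the complete record to its log. In this case the transaction has already finished, and no action is needed on restart.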
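The five site-failure cases reduce to a lookup on the role of the restarting site and the last record found in its log. The sketch below encodes exactly that mapping; `restart_action` and the record names passed as strings are hypothetical.

```python
from typing import Optional


def restart_action(role: str, last_record: Optional[str]) -> str:
    """Map (role, last log record for the txn) to the restart step."""
    if role == "participant":
        if last_record is None:
            return "abort locally"                       # case (1)
        if last_record == "ready":
            return "ask coordinator or another participant for the outcome"  # case (2)
    elif role == "coordinator":
        if last_record == "prepare":
            return "resend PREPARE to all participants"  # case (3)
        if last_record in ("global-commit", "global-abort"):
            return f"resend {last_record} to all participants"  # case (4)
        if last_record == "complete":
            return "nothing to do"                       # case (5)
    raise ValueError(f"case not covered: {role!r}, {last_record!r}")


print(restart_action("participant", "ready"))
print(restart_action("coordinator", "global-commit"))
```

Only case (2) needs information from another site; every other case can be decided from the local log alone.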

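Lost messages

(1) A reply message (READY or ABORT) from a participant is lost. In this case the coordinator's timeout expires and the whole transaction is aborted. Note that only the coordinator detects this failure, and from its point of view it looks exactly like a participant failure. From the participant's point of view, however, things are different: the participant does not consider itself failed and so does not run its restart procedure.
(2) A PREPARE message is lost. The participant stays in its waiting state. Because the coordinator receives no reply, the global outcome is the same as in the previous case.
(3) A command message (commit or abort) is lost. Under the protocol of Figure 4.15, the participant is left uncertain about the command. Introducing a timeout at the participant removes the problem simply: if no message has arrived when the timeout, measured from the moment the reply was sent, expires, the participant sends a request asking for the command to be resent.
(4) An ACK message is lost. The coordinator is left uncertain whether the participant received the command. Introducing a timeout at the coordinator removes the problem simply: if no ACK has arrived when the timeout, measured from the moment the command was sent, expires, the coordinator sends the command again. The best way for the participant site to handle this is to send the ACK again, even if the subtransaction has completed in the meantime and is no longer active.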
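Case (4) requires the participant to treat a repeated command as harmless and to acknowledge it again. A minimal sketch, assuming an in-process call in place of the network and a hypothetical `ack_arrives` callback that decides whether a given ACK gets through:

```python
from typing import Callable, Dict


class Participant:
    def __init__(self) -> None:
        self.outcomes: Dict[int, str] = {}  # txn_id -> "commit" or "abort"
        self.times_applied = 0

    def on_command(self, txn_id: int, command: str) -> str:
        if txn_id not in self.outcomes:
            # First delivery: do the local commit/abort work exactly once.
            self.outcomes[txn_id] = command
            self.times_applied += 1
        # Duplicate delivery: change nothing, but still acknowledge.
        return "ACK"


def send_until_acked(participant: Participant, txn_id: int, command: str,
                     ack_arrives: Callable[[int], bool],
                     max_tries: int = 5) -> int:
    """Resend the command after each (simulated) timeout; return tries used."""
    for attempt in range(1, max_tries + 1):
        participant.on_command(txn_id, command)
        if ack_arrives(attempt):
            return attempt
    raise TimeoutError(f"no ACK for txn {txn_id} after {max_tries} tries")


p = Participant()
# Simulate a link that loses the first two ACKs.
tries = send_until_acked(p, 1, "commit", lambda attempt: attempt >= 3)
print(tries, p.times_applied)  # 3 1
```

The command was delivered three times but applied once, which is the idempotence the text asks for.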

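Network partition

Assume a simple network partition that splits the sites into two groups: the group containing the coordinator is called the coordinator group, and the other is called the participant group. From the coordinator's point of view, the partition is equivalent to the failure of a set of participants, similar to cases 1(1) and 1(2) above: the coordinator makes its decision and sends the command to all participants in the coordinator group, so those sites terminate the transaction correctly. From the point of view of the members of the participant group, the partition is equivalent to a coordinator failure, similar to cases 1(3) and 1(4) above.

Note that recovery at a site that processes distributed transactions is more complex than in a centralized database. In a centralized database there are only two possibilities: a transaction either committed or did not, and the recovery mechanism performs the corresponding redo or undo. In a distributed database other situations are possible:

(1) A participant is ready (case 1(2)).
(2) The coordinator has started phase 1 (case 1(3)).
(3) The coordinator has started phase 2 (case 1(4)).
The recovery mechanism of a distributed database management system can recognize each of these situations and handle it accordingly.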

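Other cases:

1. The coordinator times out while waiting for votes. It sends an abort message to all participants and terminates the transaction.

2. The coordinator times out while waiting for a commit-done message. It contacts the participant to confirm its commit-done message. If the coordinator cannot reach that cohort and so cannot tell whether it committed normally, it gives up, because participants that have already committed can no longer roll back.

What should happen then? The other participants have already committed, so the transaction cannot be rolled back. One way to think about it: after several retries fail to obtain the commit-done message, the coordinator may give up and leave it to the participant to sort things out when it recovers. Two situations are then possible: a. the participant finds in its local log that the local commit completed, so the commit-done message probably never reached the coordinator because of a network failure, and the matter can simply be ignored; b. the participant finds in its local log that the commit has not completed. Having reached this point, it is certain that the participant was prepared to commit, but it does not know what the coordinator decided, so it asks the coordinator and commits or rolls back according to the reply.

3. The coordinator crashes between sending the prepare message and sending the commit message. After it recovers and restarts, it finds that it never performed the commit, so it plays safe (it does not know whether it had sent the prepare message, or whether the other participants were prepared to commit) and sends an abort message to all participants, terminating the transaction.

4. The coordinator crashes after sending the commit message. In this case there is no guarantee that every participant received the commit message, so it sends the commit message to all participants again to make sure the transaction commits properly.

5. A participant crashes while waiting for the commit or abort message. After restart it finds a prepare record for the transaction in its log and tries to ask the coordinator for the transaction status, then commits or aborts according to the reply. If it cannot reach the coordinator, it asks the other cohorts for the transaction status; if any node has already committed or aborted (which shows that the coordinator did send the corresponding message), it does the same.

6. A participant crashes after receiving the commit message and completing the commit. The coordinator may still be waiting for this participant's commit-done reply, so the participant proactively contacts the coordinator to report the transaction status.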
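Case 5 above, where a recovering participant asks the coordinator first and the other cohorts second, can be sketched as follows. The `Query` callables are hypothetical stand-ins for RPC stubs; each returns "commit", "abort", or `None` when the outcome is unknown, and raises `ConnectionError` when the peer is unreachable.

```python
from typing import Callable, Iterable, Optional

Query = Callable[[int], Optional[str]]


def resolve_in_doubt(txn_id: int, ask_coordinator: Query,
                     ask_cohorts: Iterable[Query]) -> Optional[str]:
    """Return the global outcome if anyone knows it, else None (stay blocked)."""
    for ask in [ask_coordinator, *ask_cohorts]:
        try:
            outcome = ask(txn_id)
        except ConnectionError:
            continue  # unreachable: try the next site
        if outcome in ("commit", "abort"):
            return outcome
    # Nobody knows: the participant must keep its locks and keep waiting.
    return None


def unreachable(txn_id: int) -> Optional[str]:
    raise ConnectionError("site is down")


def still_prepared(txn_id: int) -> Optional[str]:
    return None


def already_committed(txn_id: int) -> Optional[str]:
    return "commit"


learned = resolve_in_doubt(9, unreachable, [still_prepared, already_committed])
blocked = resolve_in_doubt(9, unreachable, [still_prepared, unreachable])
```

The `None` result is the blocking case from the Drawbacks section: a prepared participant may not choose an outcome on its own.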

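Personal summary:

Two-phase commit in effect takes the centralized commit procedure (the usual way a single-node database commits a transaction) and splits it in two at the commit step: before committing, a participant sends a message to the coordinator and waits until the coordinator confirms that it has collected ready-to-commit messages from everyone, and only then commits. Finally, of course, each participant still has to reply with an ack message so that the coordinator can confirm the local commit succeeded.

Readers who are interested can look at Oracle's implementation: put simply, it locks the table, not just the rows.

https://docs.oracle.com/cd/B28359_01/server.111/b28310/ds_txns003.htm: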

Places a distributed lock on modified tables, which prevents reads

Queries that start after a node has prepared cannot access the associated locked data until all phases complete. The time is insignificant unless a failure occurs (see "Deciding How to Handle In-Doubt Transactions").

Two-Phase Commit Mechanism

Unlike a transaction on a local database, a distributed transaction involves altering data on multiple databases. Consequently, distributed transaction processing is more complicated, because the database must coordinate the committing or rolling back of the changes in a transaction as a self-contained unit. In other words, the entire transaction commits, or the entire transaction rolls back.

The database ensures the integrity of data in a distributed transaction using the two-phase commit mechanism. In the prepare phase, the initiating node in the transaction asks the other participating nodes to promise to commit or roll back the transaction. During the commit phase, the initiating node asks all participating nodes to commit the transaction. If this outcome is not possible, then all nodes are asked to roll back.

All participating nodes in a distributed transaction should perform the same action: they should either all commit or all perform a rollback of the transaction. The database automatically controls and monitors the commit or rollback of a distributed transaction and maintains the integrity of the global database (the collection of databases participating in the transaction) using the two-phase commit mechanism. This mechanism is completely transparent, requiring no programming on the part of the user or application developer.

The commit mechanism has the following distinct phases, which the database performs automatically whenever a user commits a distributed transaction:

Phase / Description

Prepare phase: The initiating node, called the global coordinator, asks participating nodes other than the commit point site to promise to commit or roll back the transaction, even if there is a failure. If any node cannot prepare, the transaction is rolled back.
Commit phase: If all participants respond to the coordinator that they are prepared, then the coordinator asks the commit point site to commit. After it commits, the coordinator asks all other nodes to commit the transaction.
Forget phase: The global coordinator forgets about the transaction.

This section contains the following topics:

Prepare Phase
Commit Phase
Forget Phase

Prepare Phase

The first phase in committing a distributed transaction is the prepare phase. In this phase, the database does not actually commit or roll back the transaction. Instead, all nodes referenced in a distributed transaction (except the commit point site, described in the "Commit Point Site") are told to prepare to commit. By preparing, a node:

Records information in the redo logs so that it can subsequently either commit or roll back the transaction, regardless of intervening failures
Places a distributed lock on modified tables, which prevents reads

When a node responds to the global coordinator that it is prepared to commit, the prepared node promises to either commit or roll back the transaction later, but does not make a unilateral decision on whether to commit or roll back the transaction. The promise means that if an instance failure occurs at this point, the node can use the redo records in the online log to recover the database back to the prepare phase.

Types of Responses in the Prepare Phase

When a node is told to prepare, it can respond in the following ways:

Response / Meaning

Prepared: Data on the node has been modified by a statement in the distributed transaction, and the node has successfully prepared.
Read-only: No data on the node has been, or can be, modified (only queried), so no preparation is necessary.
Abort: The node cannot successfully prepare.
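The three response types determine what the coordinator does next. The sketch below is an illustration of that decision, not Oracle's implementation: `Response` and `plan_second_phase` are hypothetical names, and returning only the prepared nodes as the ones that still need a message is a simplification based on the text (read-only nodes drop out, and an aborting node has already rolled back locally).

```python
from enum import Enum
from typing import Dict, List, Tuple


class Response(Enum):
    PREPARED = "prepared"
    READ_ONLY = "read-only"
    ABORT = "abort"


def plan_second_phase(responses: Dict[str, Response]) -> Tuple[str, List[str]]:
    """Return (decision, nodes that still need to be told about it)."""
    prepared = sorted(n for n, r in responses.items() if r is Response.PREPARED)
    if any(r is Response.ABORT for r in responses.values()):
        return "rollback", prepared
    if not prepared:
        return "none", []  # every node was read-only: no commit phase
    return "commit", prepared


print(plan_second_phase({"hq": Response.PREPARED, "sales": Response.READ_ONLY}))
```

With one read-only node the commit phase involves only the prepared node, which is why the text says read-only responses make the commit phase finish faster.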

Prepared Response

When a node has successfully prepared, it issues a prepared message. The message indicates that the node has records of the changes in the online log, so it is prepared either to commit or perform a rollback. The message also guarantees that locks held for the transaction can survive a failure.

Read-Only Response

When a node is asked to prepare, and the SQL statements affecting the database do not change any data on the node, the node responds with a read-only message. The message indicates that the node will not participate in the commit phase.

There are three cases in which all or part of a distributed transaction is read-only:

Case: Partially read-only
Conditions: Any of the following occurs:
    Only queries are issued at one or more nodes.
    No data is changed.
    Changes rolled back due to triggers firing or constraint violations.
Consequence: The read-only nodes recognize their status when asked to prepare. They give their local coordinators a read-only response. Thus, the commit phase completes faster because the database eliminates read-only nodes from subsequent processing.

Case: Completely read-only with prepare phase
Conditions: All of the following occur:
    No data changes.
    Transaction is not started with SET TRANSACTION READ ONLY statement.
Consequence: All nodes recognize that they are read-only during prepare phase, so no commit phase is required. The global coordinator, not knowing whether all nodes are read-only, must still perform the prepare phase.

Case: Completely read-only without two-phase commit
Conditions: All of the following occur:
    No data changes.
    Transaction is started with SET TRANSACTION READ ONLY statement.
Consequence: Only queries are allowed in the transaction, so global coordinator does not have to perform two-phase commit. Changes by other transactions do not degrade global transaction-level read consistency because of global SCN coordination among nodes. The transaction does not use undo segments.

Note that if a distributed transaction is set to read-only, then it does not use undo segments. If many users connect to the database and their transactions are not set to READ ONLY, then they allocate undo space even if they are only performing queries.

Abort Response

When a node cannot successfully prepare, it performs the following actions:

Releases resources currently held by the transaction and rolls back the local portion of the transaction.
Responds to the node that referenced it in the distributed transaction with an abort message.

These actions then propagate to the other nodes involved in the distributed transaction so that they can roll back the transaction and guarantee the integrity of the data in the global database. This response enforces the primary rule of a distributed transaction: all nodes involved in the transaction either all commit or all roll back the transaction at the same logical time.

Steps in the Prepare Phase

To complete the prepare phase, each node excluding the commit point site performs the following steps:

1. The node requests that its descendants, that is, the nodes subsequently referenced, prepare to commit.
2. The node checks to see whether the transaction changes data on itself or its descendants. If there is no change to the data, then the node skips the remaining steps and returns a read-only response (see "Read-Only Response").
3. The node allocates the resources it needs to commit the transaction if data is changed.
4. The node saves redo records corresponding to changes made by the transaction to its redo log.
5. The node guarantees that locks held for the transaction are able to survive a failure.
6. The node responds to the initiating node with a prepared response (see "Prepared Response") or, if its attempt or the attempt of one of its descendents to prepare was unsuccessful, with an abort response (see "Abort Response").

These actions guarantee that the node can subsequently commit or roll back the transaction on the node. The prepared nodes then wait until a COMMIT or ROLLBACK request is received from the global coordinator.

After the nodes are prepared, the distributed transaction is said to be in-doubt (see "In-Doubt Transactions"). It retains in-doubt status until all changes are either committed or rolled back.
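The prepare steps quoted above form a recursion over the tree of referenced nodes. The sketch below models only the control flow, under stated assumptions: `Node`, `changes_data`, and `can_prepare` are hypothetical, and resource allocation, redo logging, and lock durability are left out.

```python
from typing import Sequence


class Node:
    def __init__(self, name: str, changes_data: bool, can_prepare: bool = True,
                 descendants: Sequence["Node"] = ()) -> None:
        self.name = name
        self.changes_data = changes_data
        self.can_prepare = can_prepare
        self.descendants = list(descendants)

    def prepare(self) -> str:
        # Ask the descendants to prepare before deciding locally.
        replies = [d.prepare() for d in self.descendants]
        # No change on this node or below it: answer read-only and stop.
        if not self.changes_data and all(r == "read-only" for r in replies):
            return "read-only"
        # This node or one of its descendants could not prepare.
        if "abort" in replies or (self.changes_data and not self.can_prepare):
            return "abort"
        # Allocating resources, saving redo, and securing locks would go here.
        return "prepared"


tree = Node("hq", changes_data=False, descendants=[
    Node("sales", changes_data=True),
    Node("archive", changes_data=False),
])
print(tree.prepare())  # prepared
```

A node that changes nothing itself still answers "prepared" when a descendant did change data, because the read-only check covers the node and everything below it.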

Commit Phase

The second phase in committing a distributed transaction is the commit phase. Before this phase occurs, all nodes other than the commit point site referenced in the distributed transaction have guaranteed that they are prepared, that is, they have the necessary resources to commit the transaction.

Steps in the Commit Phase

The commit phase consists of the following steps:

1. The global coordinator instructs the commit point site to commit.
2. The commit point site commits.
3. The commit point site informs the global coordinator that it has committed.
4. The global and local coordinators send a message to all nodes instructing them to commit the transaction.
5. At each node, the database commits the local portion of the distributed transaction and releases locks.
6. At each node, the database records an additional redo entry in the local redo log, indicating that the transaction has committed.
7. The participating nodes notify the global coordinator that they have committed.

When the commit phase is complete, the data on all nodes of the distributed system is consistent.

Guaranteeing Global Database Consistency

Each committed transaction has an associated system change number (SCN) to uniquely identify the changes made by the SQL statements within that transaction. The SCN functions as an internal timestamp that uniquely identifies a committed version of the database.

In a distributed system, the SCNs of communicating nodes are coordinated when all of the following actions occur:

A connection occurs using the path described by one or more database links
A distributed SQL statement executes
A distributed transaction commits

Among other benefits, the coordination of SCNs among the nodes of a distributed system ensures global read-consistency at both the statement and transaction level. If necessary, global time-based recovery can also be completed.

During the prepare phase, the database determines the highest SCN at all nodes involved in the transaction. The transaction then commits with the high SCN at the commit point site. The commit SCN is then sent to all prepared nodes with the commit decision.
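The SCN rule in the paragraph above, combined with the ordering of the commit phase, can be sketched as follows. This is an illustration only: `commit_with_scn` is a hypothetical function, the node names are made up, and the sketch assumes that the commit SCN is exactly the highest SCN observed, which the quoted text implies but does not spell out.

```python
from typing import Dict, List, Tuple


def commit_with_scn(commit_point_site: str,
                    node_scns: Dict[str, int]) -> Tuple[int, List[Tuple[str, int]]]:
    """Return the commit SCN and the order in which nodes commit with it."""
    if commit_point_site not in node_scns:
        raise ValueError("commit point site must be one of the involved nodes")
    commit_scn = max(node_scns.values())  # highest SCN among all involved nodes
    others = sorted(n for n in node_scns if n != commit_point_site)
    # The commit point site commits first; the SCN then travels with the
    # commit decision to the remaining prepared nodes.
    order = [commit_point_site] + others
    return commit_scn, [(node, commit_scn) for node in order]


scn, plan = commit_with_scn("sales", {"sales": 120, "hq": 135, "warehouse": 101})
print(scn, plan)
```

Using one SCN on every node is what lets a later query see the whole distributed transaction or none of it.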

Forget Phase

After the participating nodes notify the commit point site that they have committed, the commit point site can forget about the transaction. The following steps occur:

1. After receiving notice from the global coordinator that all nodes have committed, the commit point site erases status information about this transaction.
2. The commit point site informs the global coordinator that it has erased the status information.
3. The global coordinator erases its own information about the transaction.

0 0