Applications of the Quorum Mechanism in Open-Source Distributed Systems

Source: Internet. Editor: 程序博客网. Time: 2024/06/05 23:39

- Contents
- Introduction
  - Quorum-based voting in commit protocols
  - References
- Quorum mechanism in practice - Zookeeper
  - Read operations
  - Write operations
  - Zookeeper transaction
  - Question - Can reads return inconsistent data (can different clients see different views)?
  - Leader Election and Atomic Broadcast
- Quorum mechanism in practice - Redis Sentinel

Introduction

A quorum is the minimum number of votes that a distributed transaction has to obtain in order to be allowed to perform an operation in a distributed system.
A quorum-based technique is implemented to enforce consistent operation in a distributed system.

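In other words, a quorum is the minimum number of votes an operation must obtain before it is allowed to execute; distributed systems use it to keep operations consistent.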

Quorum-based voting in commit protocols

In a distributed database system, a transaction could be executing its operations at multiple sites.
Since atomicity requires every distributed transaction to be atomic, the transaction must have the same fate (commit or abort) at every site.
In case of network partitioning, sites are partitioned and the partitions may not be able to communicate with each other.

Every site in the system is assigned a vote Vi. Let us assume that the total number of votes in the system is V and the abort and commit quorums are Va and Vc, respectively.
Then the following rules must be obeyed in the implementation of the commit protocol:

Va + Vc > V, where 0 < Vc, Va <= V.
Before a transaction commits, it must obtain a commit quorum Vc: the votes of at least one site that is prepared to commit, plus those of zero or more waiting sites, must total at least Vc.
Before a transaction aborts, it must obtain an abort quorum Va: the votes of zero or more sites that are prepared to abort, plus those of any waiting sites, must total at least Va.

The first rule ensures that a transaction cannot be committed and aborted at the same time.
The next two rules indicate the votes that a transaction has to obtain before it can terminate one way or the other.

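Suppose a cluster has 5 servers and each server holds 1 vote on whether a transaction commits or aborts. With majority quorums (Vc = Va = 3, so Va + Vc = 6 > 5), committing a transaction and aborting it each require at least 3 votes.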
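The first rule and the 5-server example can be checked mechanically. The following is a minimal sketch, assuming one vote per site; the function names `valid_quorums` and `majority` are hypothetical and not taken from any library.

```python
def valid_quorums(total_votes: int, commit_quorum: int, abort_quorum: int) -> bool:
    """Rule 1: Va + Vc > V, with 0 < Vc, Va <= V."""
    in_range = (0 < commit_quorum <= total_votes
                and 0 < abort_quorum <= total_votes)
    return in_range and commit_quorum + abort_quorum > total_votes


def majority(total_votes: int) -> int:
    """Smallest vote count that is more than half of all votes."""
    return total_votes // 2 + 1


# Five servers with one vote each, majority quorums for both outcomes.
V = 5
Vc = Va = majority(V)

print(Vc, Va)                    # 3 3
print(valid_quorums(V, Vc, Va))  # True: 3 + 3 > 5
print(valid_quorums(V, 2, 3))    # False: 2 + 3 = 5, so commit and abort could both reach quorum
```

Asymmetric splits such as Vc = 1, Va = 5 also pass the check; equal majority quorums are simply the symmetric choice used in the example.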

References

Quorum (distributed computing)

Quorum mechanism in practice - Zookeeper

Atomic broadcast and leader election use the notion of quorum to guarantee a consistent view of the system.
One example is acknowledging a leader proposal: the leader can only commit once it receives an acknowledgement from a quorum of servers.

In Zookeeper, both atomic broadcast and leader election rely on the quorum mechanism.

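Zookeeper's quorum mode has the following characteristics:

1 leader + n followers (there is also an observer role, which is not considered here)
Every server keeps a copy of the data
Read requests (which do not change state) are handled locally; write requests (which change state) are all forwarded to the leader, which processes them and then synchronizes them to the followers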
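One way to see why a quorum gives a consistent view is that any two majorities of the same ensemble share at least one server. The sketch below checks this by brute force for a hypothetical 5-server ensemble; the server names are made up and observers are ignored.

```python
from itertools import combinations

servers = ["s1", "s2", "s3", "s4", "s5"]
quorum = len(servers) // 2 + 1  # 3 of 5

# Every possible majority, and whether each pair of majorities overlaps.
quorums = [set(q) for q in combinations(servers, quorum)]
all_overlap = all(a & b for a, b in combinations(quorums, 2))

print(len(quorums))  # 10
print(all_overlap)   # True

# Groups smaller than a majority do not have this property.
minorities = [set(q) for q in combinations(servers, 2)]
minorities_overlap = all(a & b for a, b in combinations(minorities, 2))
print(minorities_overlap)  # False, e.g. {s1, s2} and {s3, s4} are disjoint
```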

Read operations

ZooKeeper servers process read requests (exists, getData, and getChildren) locally.

When a server receives, say, a getData request from a client, it reads its state and returns it to the client. Because it serves requests locally, ZooKeeper is pretty fast at serving read dominated workloads. We can add more servers to the ZooKeeper ensemble to serve more read requests, increasing overall throughput capacity.

[ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals. Requests, Transactions, and Identifiers]

Write operations

Upon receiving a write request, a follower forwards it to the leader. The leader executes the request speculatively and broadcasts the result of the execution as a state update, in the form of a transaction. A transaction comprises the exact set of changes that a server must apply to the data tree when the transaction is committed.

A server determines that a transaction has been committed by following a protocol called Zab: the ZooKeeper Atomic Broadcast protocol.

[ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals. Zab: Broadcasting State Updates]
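The commit condition described above (the leader commits once a quorum of servers has acknowledged a proposal) can be sketched as a toy model. This is not Zab itself: `Proposal` and `ToyLeader` are hypothetical names, the leader's own acknowledgement is assumed to count toward the quorum, and networking, ordering, and recovery are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Proposal:
    zxid: int
    change: str
    acks: set = field(default_factory=set)
    committed: bool = False


class ToyLeader:
    def __init__(self, server_ids):
        self.server_ids = list(server_ids)
        self.quorum = len(self.server_ids) // 2 + 1

    def on_ack(self, proposal: Proposal, server_id: str) -> bool:
        """Record an acknowledgement; commit once a quorum has acknowledged."""
        if server_id in self.server_ids:
            proposal.acks.add(server_id)  # a set ignores duplicate acks
        if not proposal.committed and len(proposal.acks) >= self.quorum:
            proposal.committed = True
        return proposal.committed


leader = ToyLeader(["leader", "f1", "f2", "f3", "f4"])
p = Proposal(zxid=1, change="set /a = 1")

print(leader.on_ack(p, "leader"))  # False: 1 ack, quorum is 3
print(leader.on_ack(p, "f1"))      # False: 2 acks
print(leader.on_ack(p, "f1"))      # False: duplicate ack is not counted twice
print(leader.on_ack(p, "f2"))      # True: 3 of 5 servers have acknowledged
```

Note that the proposal commits while `f3` and `f4` have not acknowledged yet, which is the source of the stale reads discussed later.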

Zookeeper transaction

A transaction is treated as a unit, in the sense that all changes it contains must be applied atomically.
When a ZooKeeper ensemble applies transactions, it makes sure that all changes are applied atomically and there is no interference from other transactions. There is no rollback mechanism like with traditional relational databases.
When the leader generates a new transaction, it assigns to the transaction an identifier that we call a ZooKeeper transaction ID (zxid).

[ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals. Requests, Transactions, and Identifiers]
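The quoted passage does not describe the layout of a zxid. According to the same chapter of the book, a zxid is a 64-bit integer whose high 32 bits hold the epoch and whose low 32 bits hold a counter. The sketch below illustrates that layout; `make_zxid` and `split_zxid` are hypothetical helpers, not part of any ZooKeeper API.

```python
def make_zxid(epoch: int, counter: int) -> int:
    """Pack an epoch (high 32 bits) and a counter (low 32 bits) into one integer."""
    return (epoch << 32) | (counter & 0xFFFFFFFF)


def split_zxid(zxid: int) -> tuple:
    """Return (epoch, counter) for a packed zxid."""
    return zxid >> 32, zxid & 0xFFFFFFFF


a = make_zxid(1, 7)  # 7th transaction of epoch 1
b = make_zxid(2, 0)  # first transaction of epoch 2

print(hex(a))         # 0x100000007
print(split_zxid(a))  # (1, 7)
print(a < b)          # True: any zxid from a later epoch compares greater
```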

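Question - Can reads return inconsistent data (can different clients see different views)?

The answer is yes. A write returns as soon as it has succeeded on a quorum of servers, so the remaining servers may not have applied the update yet; this is the price of fast reads. If a client must read the latest data, it can call sync before reading. Because not every application needs up-to-date reads, Zookeeper does not call sync internally.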

One drawback of using fast reads is not guaranteeing precedence order for read operations. That is, a read operation may return a stale value, even though a more recent update to the same znode has been committed.
To guarantee that a given read operation returns the latest updated value, a client calls sync followed by the read operation.
[ZooKeeper: Wait-free coordination for Internet-scale systems]

Sometimes developers mistakenly assume one other guarantee that ZooKeeper does not in fact make. This is:
Simultaneously Consistent Cross-Client Views
ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API before it performs its read.
[ZooKeeper Programmer’s Guide. Consistency Guarantees]
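The client A / client B scenario can be reproduced with a toy model of two servers, one of which lags behind. This is only a sketch: `ToyServer` and its methods are hypothetical, and its `sync` is a simplified stand-in that applies every update delivered so far, not the real ZooKeeper call.

```python
class ToyServer:
    def __init__(self):
        self.data = {}
        self.pending = []  # committed updates not yet applied locally

    def deliver(self, path, value):
        self.pending.append((path, value))

    def apply_pending(self):
        for path, value in self.pending:
            self.data[path] = value
        self.pending.clear()

    def read(self, path):
        return self.data.get(path)  # local read, no coordination

    def sync(self):
        """Toy stand-in for sync: catch up with everything delivered so far."""
        self.apply_pending()


fast, slow = ToyServer(), ToyServer()
for s in (fast, slow):
    s.data["/a"] = 0

# Client A's update /a = 1 is committed; `fast` applies it, `slow` lags behind.
for s in (fast, slow):
    s.deliver("/a", 1)
fast.apply_pending()

print(fast.read("/a"))  # 1
print(slow.read("/a"))  # 0, a stale read for client B
slow.sync()
print(slow.read("/a"))  # 1, after sync the read is up to date
```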

Leader Election and Atomic Broadcast

These two parts are the core of Zookeeper. For details, see the following references; they are not covered further here:

ZooKeeper: Wait-free coordination for Internet-scale systems
ZooKeeper: Distributed Process Coordination. Chapter 9. ZooKeeper Internals.
ZooKeeper’s atomic broadcast protocol: Theory and practice