[PostgreSQL Source Code Analysis, Part 4] Synchronous Replication Source Code Analysis: Supporting Multiple Synchronous Standbys

Source: Internet | Editor: 程序博客网 | Time: 2024/06/06 02:05

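The previous article walked through the synchronous replication code as it was before PG 9.6. As we saw there, when deciding whether waiting backends could stop waiting, the primary tracked only one synchronous standby (the one with the highest priority). Every other standby was merely a potential synchronous standby, which limits reliability. PG 9.6 enhances synchronous replication to support multiple synchronous standbys.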

About multiple synchronous standby support

First, here is what the release notes say about multiple synchronous standbys:

Support synchronous replication with multiple synchronous standby servers, not just one (Masahiko Sawada, Beena Emerson, Michael Paquier, Fujii Masao, Kyotaro Horiguchi)

The number of standby servers that must acknowledge a commit before it's considered done is now configurable as part of the synchronous_standby_names parameter.

The main change is in how the synchronous_standby_names parameter is set. Its documentation reads:

Specifies a list of standby servers that can support synchronous replication, as described in Section 25.2.8. There will be one or more active synchronous standbys; transactions waiting for commit will be allowed to proceed after these standby servers confirm receipt of their data. The synchronous standbys will be those whose names appear earlier in this list, and that are both currently connected and streaming data in real-time (as shown by a state of streaming in the pg_stat_replication view). Other standby servers appearing later in this list represent potential synchronous standbys. If any of the current synchronous standbys disconnects for whatever reason, it will be replaced immediately with the next-highest-priority standby. Specifying more than one standby name can allow very high availability.

This parameter specifies a list of standby servers using either of the following syntaxes:

num_sync ( standby_name [, ...] )
standby_name [, ...]

where num_sync is the number of synchronous standbys that transactions need to wait for replies from, and standby_name is the name of a standby server. For example, a setting of 3 (s1, s2, s3, s4) makes transaction commits wait until their WAL records are received by three higher-priority standbys chosen from standby servers s1, s2, s3 and s4.

The second syntax was used before PostgreSQL version 9.6 and is still supported. It's the same as the first syntax with num_sync equal to 1. For example, 1 (s1, s2) and s1, s2 have the same meaning: either s1 or s2 is chosen as a synchronous standby.

The name of a standby server for this purpose is the application_name setting of the standby, as set in the primary_conninfo of the standby's WAL receiver. There is no mechanism to enforce uniqueness. In case of duplicates one of the matching standbys will be considered as higher priority, though exactly which one is indeterminate. The special entry * matches any application_name, including the default application name of walreceiver.

Note: Each standby_name should have the form of a valid SQL identifier, unless it is *. You can use double-quoting if necessary. But note that standby_names are compared to standby application names case-insensitively, whether double-quoted or not.

If no synchronous standby names are specified here, then synchronous replication is not enabled and transaction commits will not wait for replication. This is the default configuration. Even when synchronous replication is enabled, individual transactions can be configured not to wait for replication by setting the synchronous_commit parameter to local or off.

This parameter can only be set in the postgresql.conf file or on the server command line.
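The priority rule in the quoted documentation can be pictured with the small C sketch below: a standby's priority is the 1-based position of the first list entry that matches its application_name, compared case-insensitively, with `*` matching any name. `standby_priority` and `names_equal_ci` are hypothetical helpers written for this illustration (ASCII-only names, no quoting), not PostgreSQL functions.

```c
#include <ctype.h>
#include <string.h>

/* ASCII case-insensitive comparison of two standby names. */
static int names_equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;            /* equal only if both strings ended */
}

/*
 * Hypothetical helper: 1-based priority of a standby whose application_name
 * is app_name, given the names of synchronous_standby_names in list order.
 * The earliest matching entry wins and "*" matches any name.  Returns 0 if
 * the standby is not listed, i.e. it can never become synchronous.
 */
int standby_priority(const char *const *names, int nnames,
                     const char *app_name)
{
    for (int i = 0; i < nnames; i++)
    {
        if (strcmp(names[i], "*") == 0 || names_equal_ci(names[i], app_name))
            return i + 1;
    }
    return 0;
}
```

With the list `s1, s2, s3, s4`, a standby named `S3` gets priority 3; with `s1, *`, an unlisted standby such as the default `walreceiver` gets priority 2.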

As the documentation shows, to support multiple synchronous standbys the parameter now takes the form

num_sync ( standby_name [, ...] )

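where num_sync is the number of synchronous standbys (1 when the old syntax is used), followed by the names of the candidate synchronous standbys.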
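To make the two syntaxes concrete, here is a hand-written C sketch that turns a setting such as `3 (s1, s2, s3, s4)` or `s1, s2` into num_sync and a name list. PostgreSQL itself parses the parameter with a grammar-based parser, so `parse_sync_names`, `SyncNames` and the fixed size limits are assumptions of this sketch; it also ignores double-quoted names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NAMES    16
#define MAX_NAME_LEN 64

/* Hypothetical parsed form of synchronous_standby_names. */
typedef struct
{
    int  num_sync;                        /* standbys a commit waits for */
    int  nmembers;                        /* number of names in the list */
    char names[MAX_NAMES][MAX_NAME_LEN];  /* names in priority order */
} SyncNames;

static const char *skip_spaces(const char *p)
{
    while (isspace((unsigned char) *p))
        p++;
    return p;
}

/*
 * Parse "num_sync ( name [, ...] )" or the pre-9.6 "name [, ...]".
 * Returns 0 on success and -1 on a syntax error.  Quoting is not handled.
 */
int parse_sync_names(const char *s, SyncNames *out)
{
    const char *p = skip_spaces(s);
    int         paren = 0;

    out->num_sync = 1;                    /* old syntax means num_sync = 1 */
    out->nmembers = 0;
    if (*p == '\0')
    {
        out->num_sync = 0;                /* empty: sync replication off */
        return 0;
    }

    if (isdigit((unsigned char) *p))
    {
        char       *end;
        long        n = strtol(p, &end, 10);
        const char *q = skip_spaces(end);

        if (*q == '(')                    /* "N (" starts the new syntax */
        {
            out->num_sync = (int) n;
            paren = 1;
            p = q + 1;
        }                                 /* else: a name starting with a digit */
    }

    for (;;)
    {
        size_t len = 0;

        p = skip_spaces(p);
        while (p[len] != '\0' && p[len] != ',' && p[len] != ')' &&
               !isspace((unsigned char) p[len]))
            len++;
        if (len == 0 || len >= MAX_NAME_LEN || out->nmembers == MAX_NAMES)
            return -1;
        memcpy(out->names[out->nmembers], p, len);
        out->names[out->nmembers][len] = '\0';
        out->nmembers++;

        p = skip_spaces(p + len);
        if (*p != ',')
            break;
        p++;                              /* another name follows */
    }

    if (paren)
    {
        if (*p != ')')
            return -1;
        p = skip_spaces(p + 1);
    }
    /* Trailing junk or a non-positive count is rejected by this sketch. */
    if (*p != '\0' || out->num_sync < 1)
        return -1;
    return 0;
}
```

`3 (s1, s2, s3, s4)` yields num_sync = 3 with four names, while `s1, s2` and `1 (s1, s2)` both yield num_sync = 1 with two names, matching the documentation's statement that the two forms are equivalent.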

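After the parameter changes it has to be parsed, which is done by a grammar-based parser; those details are not analyzed here. Whether there is one synchronous standby or several, a backend committing a transaction calls SyncRepWaitForLSN to wait for the standbys' replies, exactly as in the earlier implementation, so that path is not analyzed again either.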

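To support multiple synchronous standbys, the key change is in deciding whether a given LSN has reached all of the required synchronous standbys. The rest of this article analyzes how the primary makes that decision.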

Implementation of multiple synchronous standbys

Along with the new parameter syntax, a new struct records the parsed standby configuration:
    /*
     * Struct for the configuration of synchronous replication.
     *
     * Note: this must be a flat representation that can be held in a single
     * chunk of malloc'd memory, so that it can be stored as the "extra" data
     * for the synchronous_standby_names GUC.
     */
    typedef struct SyncRepConfigData
    {
        int         config_size;    /* total size of this struct, in bytes */
        int         num_sync;       /* number of sync standbys that we need to
                                     * wait for */
        int         nmembers;       /* number of members in the following list */
        /* member_names contains nmembers consecutive nul-terminated C strings */
        char        member_names[FLEXIBLE_ARRAY_MEMBER];
    } SyncRepConfigData;
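The struct records the parsed parameter value: config_size is the total size of the struct in bytes, num_sync is the configured number of synchronous standbys, nmembers is the number of names in the standby list, and member_names stores those names. The parameter itself is parsed by a grammar-based parser.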
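The "flat representation" note means the whole configuration lives in one malloc'd chunk, with the names packed back to back as NUL-terminated strings. The sketch below builds and reads such a chunk. `DemoSyncRepConfig`, `make_config` and `config_member` are hypothetical stand-ins, and C99's flexible array member `char member_names[]` replaces PostgreSQL's `FLEXIBLE_ARRAY_MEMBER` macro.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Standalone stand-in for SyncRepConfigData. */
typedef struct DemoSyncRepConfig
{
    int  config_size;    /* total size of this struct, in bytes */
    int  num_sync;       /* number of sync standbys to wait for */
    int  nmembers;       /* number of names in member_names */
    char member_names[]; /* nmembers consecutive NUL-terminated strings */
} DemoSyncRepConfig;

/* Pack all names into a single malloc'd chunk. */
DemoSyncRepConfig *make_config(int num_sync, const char *const *names, int n)
{
    size_t size = offsetof(DemoSyncRepConfig, member_names);

    for (int i = 0; i < n; i++)
        size += strlen(names[i]) + 1;       /* +1 for each terminating NUL */

    DemoSyncRepConfig *cfg = malloc(size);
    if (cfg == NULL)
        return NULL;
    cfg->config_size = (int) size;
    cfg->num_sync = num_sync;
    cfg->nmembers = n;

    char *dst = cfg->member_names;
    for (int i = 0; i < n; i++)
    {
        size_t len = strlen(names[i]) + 1;  /* copy the NUL as well */
        memcpy(dst, names[i], len);
        dst += len;
    }
    return cfg;
}

/* Return the idx-th name (idx < nmembers) by stepping over earlier strings. */
const char *config_member(const DemoSyncRepConfig *cfg, int idx)
{
    const char *p = cfg->member_names;

    for (int i = 0; i < idx; i++)
        p += strlen(p) + 1;
    return p;
}
```

Because everything sits in a single allocation of `config_size` bytes, the structure can be copied with one `memcpy` and released with one `free`.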

Releasing blocked backends is implemented in SyncRepReleaseWaiters. Its steps are as follows.

Step 1: Call SyncRepGetOldestSyncRecPtr to check whether enough synchronous standbys exist and to obtain their positions:
    /*
     * Check whether we are a sync standby or not, and calculate the oldest
     * positions among all sync standbys.
     */
    got_oldest = SyncRepGetOldestSyncRecPtr(&writePtr, &flushPtr,
                                            &applyPtr, &am_sync);
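SyncRepGetOldestSyncRecPtr is a new function in 9.6. It determines whether enough qualifying synchronous standbys are available and computes the oldest positions they have reached.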

Step 2: Inside SyncRepGetOldestSyncRecPtr

First, the function's header comment:
    /*
     * Calculate the oldest Write, Flush and Apply positions among sync standbys.
     *
     * Return false if the number of sync standbys is less than
     * synchronous_standby_names specifies. Otherwise return true and
     * store the oldest positions into *writePtr, *flushPtr and *applyPtr.
     *
     * On return, *am_sync is set to true if this walsender is connecting to
     * sync standby. Otherwise it's set to false.
     */

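The function computes the oldest Write, Flush and Apply positions across all synchronous standbys. If fewer synchronous standbys are available than num_sync requires, it returns false. On return, *am_sync is true if this walsender process is connected to a synchronous standby.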
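The "oldest position" computation amounts to taking, separately for write, flush and apply, the minimum over the chosen synchronous standbys: a level has been reached on all of them only once the slowest one has reached it. The sketch assumes positions are plain 64-bit integers (as `XLogRecPtr` is) and uses the hypothetical names `StandbyPos` and `oldest_sync_positions`; it leaves out the `am_sync` bookkeeping and the locking of the real function.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLSN;   /* stand-in for XLogRecPtr, a 64-bit WAL position */

typedef struct
{
    DemoLSN write;   /* last position written by the standby */
    DemoLSN flush;   /* last position flushed to disk */
    DemoLSN apply;   /* last position replayed */
} StandbyPos;

/*
 * Sketch: if at least num_sync synchronous standbys are given, store the
 * minimum write/flush/apply positions among them and return true;
 * otherwise return false so the caller keeps the waiters blocked.
 */
bool oldest_sync_positions(const StandbyPos *sync, int nsync, int num_sync,
                           DemoLSN *write, DemoLSN *flush, DemoLSN *apply)
{
    if (nsync == 0 || nsync < num_sync)
        return false;

    *write = sync[0].write;
    *flush = sync[0].flush;
    *apply = sync[0].apply;
    for (int i = 1; i < nsync; i++)
    {
        if (sync[i].write < *write)
            *write = sync[i].write;
        if (sync[i].flush < *flush)
            *flush = sync[i].flush;
        if (sync[i].apply < *apply)
            *apply = sync[i].apply;
    }
    return true;
}
```

Because the three minima are taken independently, they may come from different standbys.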

To find the synchronous standbys (and hence how many there are), it calls SyncRepGetSyncStandbys:
    /* Get standbys that are considered as synchronous at this moment */
    sync_standbys = SyncRepGetSyncStandbys(am_sync);

SyncRepGetSyncStandbys returns all of the current synchronous standbys, choosing them by priority.

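It first sorts the candidates into two lists: result, which collects the standbys with the highest priority (priority 1), and pending, which holds the standbys of every other priority. This keeps the behavior nicely compatible with earlier versions, which supported exactly one synchronous standby: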
        /*
         * If the priority is equal to 1, consider this standby as sync and
         * append it to the result. Otherwise append this standby to the
         * pending list to check if it's actually sync or not later.
         */
        if (this_priority == 1)
        {
            result = lappend_int(result, i);
            if (am_sync != NULL && walsnd == MyWalSnd)
                *am_sync = true;
            if (list_length(result) == SyncRepConfig->num_sync)
            {
                list_free(pending);
                return result;      /* Exit if got enough sync standbys */
            }
        }
        else
        {
            pending = lappend_int(pending, i);
            if (am_sync != NULL && walsnd == MyWalSnd)
                am_in_pending = true;

            /*
             * Track the highest priority among the standbys in the pending
             * list, in order to use it as the starting priority for later
             * scan of the list. This is useful to find quickly the sync
             * standbys from the pending list later because we can skip
             * unnecessary scans for the unused priorities.
             */
            if (this_priority < next_highest_priority)
                next_highest_priority = this_priority;
        }
    }

The variable next_highest_priority records the highest priority (that is, the smallest value) among the standbys other than those with priority 1.
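The first pass can be modeled without PostgreSQL's List API as below. The arrays, the function `first_pass`, and the convention that priority 0 means "not a synchronous candidate" (with inactive or non-streaming walsenders assumed to be filtered out already) are assumptions of this sketch. It shows how priority-1 standbys are collected directly, how the pass stops early once num_sync of them are found, and how next_highest_priority is tracked for the pending entries.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the first pass.  prio[i] is walsender i's priority.
 * Priority-1 standbys go to result[]; all other candidates go to pending[].
 * Returns true if result[] already holds num_sync entries (early exit).
 */
bool first_pass(const int *prio, int nsenders, int num_sync,
                int lowest_priority,
                int *result, int *nresult,
                int *pending, int *npending,
                int *next_highest_priority)
{
    *nresult = 0;
    *npending = 0;
    *next_highest_priority = lowest_priority + 1;   /* "none seen yet" */

    for (int i = 0; i < nsenders; i++)
    {
        int this_priority = prio[i];

        if (this_priority == 0)
            continue;                 /* not a synchronous candidate */

        if (this_priority == 1)
        {
            result[(*nresult)++] = i;
            if (*nresult == num_sync)
                return true;          /* enough sync standbys already */
        }
        else
        {
            pending[(*npending)++] = i;
            if (this_priority < *next_highest_priority)
                *next_highest_priority = this_priority;
        }
    }
    return false;
}
```

With priorities {3, 0, 1, 2, 2} and num_sync = 2, only walsender 2 lands in result, walsenders 0, 3 and 4 land in pending, and next_highest_priority ends up as 2.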

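Whether enough synchronous standbys are available is decided from the list lengths: if the standbys already in result plus those in pending do not exceed num_sync, every pending standby is treated as synchronous and the function returns right away: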
    /*
     * Consider all pending standbys as sync if the number of them plus
     * already-found sync ones is lower than the configuration requests.
     */
    if (list_length(result) + list_length(pending) <= SyncRepConfig->num_sync)
    {
        bool        needfree = (result != NIL && pending != NIL);

        /*
         * Set *am_sync to true if this walsender is in the pending list
         * because all pending standbys are considered as sync.
         */
        if (am_sync != NULL && !(*am_sync))
            *am_sync = am_in_pending;

        result = list_concat(result, pending);
        if (needfree)
            pfree(pending);
        return result;
    }

Otherwise, the remaining synchronous standbys are picked from the pending list by comparing priorities. The code is:
    /*
     * Find the sync standbys from the pending list.
     */
    priority = next_highest_priority;
    while (priority <= lowest_priority)
    {
        ListCell   *cell;
        ListCell   *prev = NULL;
        ListCell   *next;

        next_highest_priority = lowest_priority + 1;

        for (cell = list_head(pending); cell != NULL; cell = next)
        {
            i = lfirst_int(cell);
            walsnd = &WalSndCtl->walsnds[i];

            next = lnext(cell);

            this_priority = walsnd->sync_standby_priority;
            if (this_priority == priority)
            {
                result = lappend_int(result, i);
                if (am_sync != NULL && walsnd == MyWalSnd)
                    *am_sync = true;

                /*
                 * We should always exit here after the scan of pending list
                 * starts because we know that the list has enough elements to
                 * reach SyncRepConfig->num_sync.
                 */
                if (list_length(result) == SyncRepConfig->num_sync)
                {
                    list_free(pending);
                    return result;      /* Exit if got enough sync standbys */
                }

                /*
                 * Remove the entry for this sync standby from the list to
                 * prevent us from looking at the same entry again.
                 */
                pending = list_delete_cell(pending, cell, prev);

                continue;
            }

            if (this_priority < next_highest_priority)
                next_highest_priority = this_priority;

            prev = cell;
        }

        priority = next_highest_priority;
    }

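Since next_highest_priority already holds the highest priority other than 1, the scan starts with priority set to that value. Each round walks the whole pending list and checks whether the standby behind each walsender has exactly the current priority. Along the way it recomputes next_highest_priority from the entries that remain, and that value becomes the priority for the next round: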

        for (cell = list_head(pending); cell != NULL; cell = next)
        {
            i = lfirst_int(cell);
            walsnd = &WalSndCtl->walsnds[i];
            next = lnext(cell);
            this_priority = walsnd->sync_standby_priority;
            if (this_priority == priority)
            {
                result = lappend_int(result, i);
                if (am_sync != NULL && walsnd == MyWalSnd)
                    *am_sync = true;
                if (list_length(result) == SyncRepConfig->num_sync)
                {
                    list_free(pending);
                    return result;      /* Exit if got enough sync standbys */
                }
                pending = list_delete_cell(pending, cell, prev);
                continue;
            }
            if (this_priority < next_highest_priority)
                next_highest_priority = this_priority;
            prev = cell;
        }
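If they match, that standby is synchronous: it is appended to result and removed from pending, and the code checks whether enough standbys have now been found.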

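The loop ends once priority > lowest_priority. lowest_priority is the number of names in synchronous_standby_names (SyncRepConfig->nmembers), i.e. the lowest priority a listed standby can have. In practice the function returns from inside the loop, as the code comment notes, because the earlier list-length check guarantees that pending holds enough standbys to reach num_sync.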
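The pending-list scan can likewise be sketched with plain arrays. Instead of deleting list cells, this hypothetical `scan_pending` compacts the array in place; otherwise it follows the same rounds: accept every standby whose priority equals the current one, remember the smallest priority left over, and use that as the next round's priority until `needed` standbys are found or priority passes lowest_priority.

```c
/*
 * Hypothetical sketch of the pending-list scan.  prio[i] is walsender i's
 * priority and pending[] holds walsender indexes that are not priority 1;
 * pending[] is compacted in place as entries are accepted.  Accepted
 * walsender indexes are appended to out[].  Returns how many were accepted.
 */
int scan_pending(const int *prio, int *pending, int npending,
                 int start_priority, int lowest_priority,
                 int needed, int *out)
{
    int nout = 0;
    int priority = start_priority;

    while (priority <= lowest_priority)
    {
        int next_highest_priority = lowest_priority + 1;
        int kept = 0;

        for (int k = 0; k < npending; k++)
        {
            int i = pending[k];
            int this_priority = prio[i];

            if (this_priority == priority)
            {
                out[nout++] = i;             /* this standby is synchronous */
                if (nout == needed)
                    return nout;             /* got enough sync standbys */
                continue;                    /* drop it from pending */
            }
            if (this_priority < next_highest_priority)
                next_highest_priority = this_priority;
            pending[kept++] = i;             /* keep it for a later round */
        }
        npending = kept;
        priority = next_highest_priority;
    }
    return nout;
}
```

For priorities {3, 0, 1, 2, 2}, pending = {0, 3, 4}, a starting priority of 2 and lowest_priority = 4, asking for three standbys accepts walsenders 3 and 4 in the priority-2 round and walsender 0 in the priority-3 round.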

Step 3: Check whether the waiters can be woken
    /*
     * If the number of sync standbys is less than requested or we aren't
     * managing a sync standby then just leave.
     */
    if (!got_oldest || !am_sync)
    {
        LWLockRelease(SyncRepLock);
        announce_next_takeover = !am_sync;
        return;
    }
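Note that if fewer synchronous standbys are available than the configured number (got_oldest is false), the wait queues are not woken either, and commits on the primary stay blocked. In that sense the multi-standby mode is a strict synchronous mode: it guarantees that WAL (xlog) has reached all num_sync synchronous standbys.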
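The blocking behavior described here can be illustrated with a hypothetical `commit_can_proceed` check for the flush level: if fewer than num_sync synchronous standbys are available, the answer is "no" regardless of how far the connected ones have progressed. This models only the outcome, not PostgreSQL's code path (which wakes queued backends rather than checking each commit), and the array of flush positions of the chosen synchronous standbys is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the release rule for the flush level.  flush[] holds
 * the flush positions of the nsync standbys currently chosen as synchronous.
 * A commit whose WAL ends at commit_lsn may return only if at least num_sync
 * such standbys exist and the oldest of their flush positions has reached it.
 */
bool commit_can_proceed(const uint64_t *flush, int nsync, int num_sync,
                        uint64_t commit_lsn)
{
    /* Not enough sync standbys: got_oldest would be false, keep waiting. */
    if (num_sync < 1 || nsync < num_sync)
        return false;

    uint64_t oldest = flush[0];
    for (int i = 1; i < nsync; i++)
        if (flush[i] < oldest)
            oldest = flush[i];
    return oldest >= commit_lsn;
}
```

With num_sync = 2, a single connected standby that has flushed far past the commit still cannot release it.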

The rest wakes the wait queues, the same flow as analyzed previously, so it is not examined in detail again:
    /*
     * Set the lsn first so that when we wake backends they will release up to
     * this location.
     */
    if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < writePtr)
    {
        walsndctl->lsn[SYNC_REP_WAIT_WRITE] = writePtr;
        numwrite = SyncRepWakeQueue(false, SYNC_REP_WAIT_WRITE);
    }
    if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < flushPtr)
    {
        walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = flushPtr;
        numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);
    }
    if (walsndctl->lsn[SYNC_REP_WAIT_APPLY] < applyPtr)
    {
        walsndctl->lsn[SYNC_REP_WAIT_APPLY] = applyPtr;
        numapply = SyncRepWakeQueue(false, SYNC_REP_WAIT_APPLY);
    }
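The wake-up step can be sketched as follows: each wait mode has its own release LSN, which is only ever moved forward, and after moving it the mode's queue is drained of every waiter whose LSN is now covered. In this hypothetical model (`SharedState`, `WaitQueue`, `release_waiters`), "waking" a waiter just means dequeuing and counting it, and each queue is assumed to be kept sorted by LSN, so the scan can stop at the first waiter that is still ahead.

```c
#include <stdint.h>

enum { WAIT_WRITE, WAIT_FLUSH, WAIT_APPLY, NUM_WAIT_MODES };

/* Hypothetical per-mode wait queue: waiter LSNs sorted ascending. */
typedef struct
{
    uint64_t waiters[32];
    int      head;    /* first entry still waiting */
    int      count;   /* total entries */
} WaitQueue;

typedef struct
{
    uint64_t  lsn[NUM_WAIT_MODES];     /* release point per mode */
    WaitQueue queue[NUM_WAIT_MODES];
} SharedState;

/* "Wake" (dequeue and count) waiters whose LSN is <= the release point. */
static int wake_queue(SharedState *st, int mode)
{
    WaitQueue *q = &st->queue[mode];
    int        woken = 0;

    while (q->head < q->count && q->waiters[q->head] <= st->lsn[mode])
    {
        q->head++;
        woken++;
    }
    return woken;
}

/* Move a mode's release point only forward, then wake its queue. */
static int advance(SharedState *st, int mode, uint64_t new_lsn)
{
    if (st->lsn[mode] < new_lsn)
    {
        st->lsn[mode] = new_lsn;
        return wake_queue(st, mode);
    }
    return 0;
}

void release_waiters(SharedState *st, uint64_t write, uint64_t flush,
                     uint64_t apply, int woken[NUM_WAIT_MODES])
{
    woken[WAIT_WRITE] = advance(st, WAIT_WRITE, write);
    woken[WAIT_FLUSH] = advance(st, WAIT_FLUSH, flush);
    woken[WAIT_APPLY] = advance(st, WAIT_APPLY, apply);
}
```

The forward-only check matters because the computed oldest positions may go backwards, for example when a lagging standby takes over as a synchronous one; a smaller value must not lower a release point that earlier waiters were already woken against.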



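This has been a brief analysis of how PG 9.6 implements multiple synchronous standbys. To summarize:
  • It is fully compatible with the synchronous standby settings of older versions.
  • WAL (xlog) must reach all of the synchronous standbys.
  • The number of live synchronous standbys must reach the configured num_sync, otherwise commits block. Personally, I think this need not be so strict in practice, since it can seriously hurt the primary's performance.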


