[Translated from a MOS article] RMAN duplicate hangs when using DCD and TCPS

Source: Internet  Editor: 程序博客网  Time: 2024/05/01 04:14

RMAN duplicate hangs when DCD and TCPS are in use.

Source:
RMAN Duplicate hangs when using DCD and TCPS (Doc ID 1676197.1)

Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

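Symptoms:
RMAN active duplicate for standby hangs during the datafile copy phase. SSL Oracle Net and Dead Connection Detection (DCD) are in use.

The hang is intermittent: sometimes the duplicate works, and at other times it hangs for many days until the processes are killed at both the operating system and database level.
RMAN debug shows the following message repeating: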

RMAN-06731: command backup:x% complete, time left HH:MM:SS

Sample RMAN debug output:

RMAN-12016: using channel ORA_DISK_8
RMAN-08580: channel ORA_DISK_1: starting datafile copy
RMAN-08522: input datafile file number=00012 name=+OFD_DATA/ofmim01q/datafile/ofm_tbs_oaam_indx.272.810048785
...
RMAN-08581: channel ORA_DISK_4: datafile copy complete, elapsed time: 00:00:16
RMAN-08592: output file name=+OFN_DAT/ofmiy01q/datafile/ofm_ias_iau.373.842790419 tag=TAG20140321T065222
RMAN-08581: channel ORA_DISK_7: datafile copy complete, elapsed time: 00:00:16
RMAN-06731: command backup:94.1% complete, time left 00:21:05
//
// RMAN-06731 and % complete repeats here
// Process is completely stalled
RMAN-06731: command backup:94.1% complete, time left 00:21:05

On the primary database, we can see 8 sessions hung on the wait event "remote db file write", whose wait time simply keeps increasing:

SQL> select SID ,SERIAL# , INST_ID , USERNAME, OSUSER || '@' || MACHINE OSINFO, SUBSTR(PROGRAM,0,20) PROGRAM,
  2  TO_CHAR(LOGON_TIME,'yyyy-mm-dd hh24:mi:ss') LOGON_TIME, EVENT, SECONDS_IN_WAIT SIW from gv$session where type <> 'BACKGROUND' and PROGRAM like 'rman%'
  3  ORDER BY USERNAME, INST_ID, SID;

 SID SERIAL# INST_ID USERNAME OSINFO            PROGRAM              LOGON_TIME           EVENT                                 SIW
---- ------- ------- -------- ----------------- -------------------- -------------------- ------------------------------ ----------
 632    5635       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:27  SQL*Net message from client            34
 758    2535       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:30  SQL*Net message from client             4
 948     441       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:35  remote db file write                52036
1010     369       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:36  remote db file write                35532
1073     215       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:37  remote db file write                52935
1136     291       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:37  remote db file write                54014
1199     753       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:38  remote db file write                41651
1325    1595       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:39  remote db file write                42730
1388    2121       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:39  remote db file write                50771
1451    1351       2 SYS      oracle@auq5583l   rman@auq5583l (TNS V 2014-04-08 17:29:40  remote db file write                47650

The hung processes must be killed at both the database and OS level.

Cause:
Unfortunately, the combination of expire_time and TCPS is not supported by Oracle, because the NTZ layer (used for TCPS communication) uses routines that are not async-signal-safe.
Using routines that are not async-signal-safe can cause unpredictable results such as hangs or crashes.

Solution:
Do not use DCD with SSL Oracle Net. Remove sqlnet.expire_time from the sqlnet.ora file or set it to 0 (zero).
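
To check whether DCD is currently enabled, it is enough to look for a non-zero sqlnet.expire_time entry in sqlnet.ora. The following is a minimal Python sketch of such a check, not an Oracle-provided tool. The function names expire_time_minutes and dcd_enabled are hypothetical, and the parser assumes a simple "name = value" layout with "#" comments, which covers typical files but not every form sqlnet.ora can take.

```python
import re

# Matches a line such as "SQLNET.EXPIRE_TIME = 10" (parameter names are case-insensitive).
_EXPIRE_RE = re.compile(r"^sqlnet\.expire_time\s*=\s*(\d+)\s*$", re.IGNORECASE)


def expire_time_minutes(sqlnet_ora_text):
    """Return the last SQLNET.EXPIRE_TIME value found (in minutes), or 0 if it is not set.

    Hypothetical helper: assumes one "name = value" pair per line and "#" comments.
    """
    value = 0
    for raw_line in sqlnet_ora_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # ignore comments and surrounding blanks
        match = _EXPIRE_RE.match(line)
        if match:
            value = int(match.group(1))
    return value


def dcd_enabled(sqlnet_ora_text):
    """DCD is treated as active when expire_time is greater than zero."""
    return expire_time_minutes(sqlnet_ora_text) > 0


if __name__ == "__main__":
    sample = (
        "# SQLNET.EXPIRE_TIME = 5\n"   # commented out, ignored
        "sqlnet.expire_time = 10\n"    # active setting
    )
    print("DCD enabled:", dcd_enabled(sample))
```

In practice the text would be read from the sqlnet.ora file used by the database server, whose location depends on the installation.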


If you need to keep the connection alive due to firewall issues, consider using the operating system's TCP KEEPALIVE parameters instead, e.g.:

TCP_KEEPIDLE (the amount of time until the first keepalive packet is sent)
TCP_KEEPCNT (the number of probes to send)
TCP_KEEPINTVL (the interval between keepalive packets)
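
To make the three parameters above concrete, here is a small Python sketch that sets them on a single socket. It is an illustration of what each option controls, not how Oracle Net is configured: for database connections the values normally come from system-wide OS settings (on Linux, the net.ipv4.tcp_keepalive_time, tcp_keepalive_probes and tcp_keepalive_intvl kernel parameters). The helper name enable_keepalive and the example values are assumptions, and the per-socket constants are only available on some platforms, so the code skips any that are missing.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive for one socket and tune it where the platform allows.

    Hypothetical helper; the default values are examples only.
    Returns a dict of the options that were applied, read back from the socket.
    """
    # SO_KEEPALIVE switches keepalive probing on for this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    wanted = (
        ("TCP_KEEPIDLE", idle_s),       # idle time before the first probe is sent
        ("TCP_KEEPINTVL", interval_s),  # interval between successive probes
        ("TCP_KEEPCNT", probes),        # number of unanswered probes before giving up
    )
    applied = {}
    for name, value in wanted:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is None:
            continue
        sock.setsockopt(socket.IPPROTO_TCP, option, value)
        applied[name] = sock.getsockopt(socket.IPPROTO_TCP, option)
    return applied


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print(enable_keepalive(s))
```

The idle time should be shorter than the firewall's idle-connection timeout, otherwise the firewall may drop the connection before the first probe is sent.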


Otherwise, if you need to use DCD, you must use non-SSL Oracle Net.

 
