Instances Unable To Start If MTU Size Is Different for Cluster_interconnect (Doc ID 300388.1)

Source: Internet | Posted by: 禁用搜狗输入法云计算 | Editor: 程序博客网 | Time: 2024/05/22 06:57

For Cluster_interconnect, if the MTU settings of the interconnect (heartbeat) network cards differ between nodes, instances may be unable to start.

APPLIES TO:

Oracle Server - Enterprise Edition - Version 9.0.1.0 to 11.2.0.3 [Release 9.0.1 to 11.2]
Information in this document applies to any platform.

***Checked for relevance on 07-Jan-2010***

SYMPTOMS

If the MTU size on the network cards used for the interconnect (the heartbeat network cards) differs between the cluster member nodes, the RAC instance(s) will not start.

CHANGES

Network configuration

CAUSE (identified mainly by analyzing the network configuration and checking the logs)

The MTU size is set on the private network interface (the heartbeat interface used for the interconnect between nodes). For example, here are the interfaces of two cluster members:

node 1
eth0 Link encap:Ethernet HWaddr 00:0E:0C:08:4B:D5
inet addr: xxx.x.x.x Bcast:xxx.x.x.x Mask:255.255.255.0
inet6 addr: fe80::20e:cff:fe08:4bd5/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1

node 2
eth0 Link encap:Ethernet HWaddr 00:0E:0C:08:03:59
inet addr: xxx.x.x.x Bcast:xxx.x.x.x Mask:255.255.255.0
inet6 addr: fe80::20e:cff:fe08:359/64 Scope:Link
UP BROADCAST RUNNING MULTICAST *MTU:1500* Metric:1
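
A mismatch like the one in the listing above can also be spotted with a small script. This is only a minimal sketch, assuming the ifconfig text has already been collected from each node; the function names parse_mtu and find_mtu_mismatch are hypothetical, and the sample strings are abridged from the output shown above.

```python
import re

def parse_mtu(ifconfig_output):
    """Return the MTU found in ifconfig output, or None if none is present."""
    # Matches "MTU:9000" (legacy net-tools format) and "mtu 9000" (newer format).
    match = re.search(r"\bMTU[: ]\s*(\d+)", ifconfig_output, re.IGNORECASE)
    return int(match.group(1)) if match else None

def find_mtu_mismatch(outputs_by_node):
    """Return ({node: mtu}, consistent) for a dict of node -> ifconfig text."""
    mtus = {node: parse_mtu(text) for node, text in outputs_by_node.items()}
    consistent = len(set(mtus.values())) == 1
    return mtus, consistent

# Sample lines abridged from the two nodes above.
node1 = "UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1"
node2 = "UP BROADCAST RUNNING MULTICAST *MTU:1500* Metric:1"

mtus, consistent = find_mtu_mismatch({"node 1": node1, "node 2": node2})
print(mtus, "consistent" if consistent else "MISMATCH")
```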

If different MTU sizes are configured, a startup will hang and the following errors appear in the alert log:

Tue Mar 1 01:50:35 2005
lmon registered with NM - instance id 2 (internal mem no 1)
Tue Mar 1 01:50:36 2005
Reconfiguration started (old inc 0, new inc 2)
List of nodes:
0 1
Global Resource Directory frozen
Update rdomain variables
Communication channels reestablished
* domain 0 valid = 0 according to instance 0
Tue Mar 1 01:55:44 2005
IPC Send timeout to 0.0 inc 9 for msg type 53 from opid 5
Tue Mar 1 01:59:25 2005
Trace dumping is performing id=[cdmp_20050301095925]
Tue Mar 1 01:59:31 2005
Reconfiguration started (old inc 2, new inc 3)
List of nodes:

Typically you see timeouts in the alert log and in the trace files of the background processes (LMD and LMON).
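
To pick these timeouts out of a long alert log, a short filter can help. The sketch below is an illustration only: the regular expression is modeled on the single "IPC Send timeout" line in the excerpt above, so the message format in other Oracle versions may differ, and find_ipc_timeouts is a hypothetical helper name.

```python
import re

# Pattern modeled on the one timeout line shown in the alert-log excerpt.
IPC_TIMEOUT = re.compile(
    r"IPC Send timeout to (\S+) inc (\d+) for msg type (\d+) from opid (\d+)"
)

def find_ipc_timeouts(alert_log_text):
    """Return (target, incarnation, msg_type, opid) for each timeout line."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in IPC_TIMEOUT.finditer(alert_log_text)
    ]

sample = """Tue Mar 1 01:55:44 2005
IPC Send timeout to 0.0 inc 9 for msg type 53 from opid 5
Tue Mar 1 01:59:25 2005
Trace dumping is performing id=[cdmp_20050301095925]
"""

print(find_ipc_timeouts(sample))
```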

SOLUTION

Identify the interface being used by Oracle RAC, and therefore the one behind the hang, by printing the IPC information with oradebug ipc (see Metalink note 181489.1).

Check the network configuration of the interconnect interface, for example with ifconfig: /sbin/ifconfig eth0

Ping the IP address of the network card with a packet size that should fit all interconnect interfaces, and see which size gets through. Use the -M do switch to prohibit IP fragmentation, for example:
ping <nodename> -s <biggest-size-that fits> -M do
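
The value passed to -s is the ICMP payload size, not the MTU itself. For IPv4 without IP options, the largest payload that fits in one unfragmented packet is the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. The sketch below works this out and builds the matching Linux ping command; the host name node2-priv and the helper names are hypothetical.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo header

def max_ping_payload(mtu):
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES

def ping_command(host, mtu, count=3):
    """Build a Linux ping command that forbids fragmentation (-M do)."""
    return ["ping", "-c", str(count), "-M", "do",
            "-s", str(max_ping_payload(mtu)), host]

# node2-priv is a placeholder for the peer's private interconnect name.
for mtu in (1500, 9000):
    print(mtu, "->", " ".join(ping_command("node2-priv", mtu)))
```

If the ping sized for MTU 9000 fails while the one sized for MTU 1500 succeeds, some interface or switch on the path is not configured for the larger size.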

Based on the result of the previous test, choose a suitable MTU value and configure the cluster interconnect (heartbeat) interfaces to use that same MTU size on all cluster member nodes.

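PS: The official documentation (http://docs.oracle.com/cd/B19306_01/server.102/b14237/initparams025.htm#REFRN10017) gives the following information:

The CLUSTER_INTERCONNECTS parameter defines a private network, and it affects which network interfaces are selected for the GCS and GES services.

The parameter is mainly used for the following purposes:

1. To override the default interconnect.

2. To increase bandwidth when a single network cannot meet the bandwidth requirements of the RAC database.

CLUSTER_INTERCONNECTS explicitly overrides the following interconnect information, which is otherwise taken from the cluster registry:

1. The network classifications stored in the OCR, which can be viewed with the oifcfg command.

2. The default interconnect chosen by Oracle.

The default value of the parameter is empty. It can contain one or more IP addresses, separated by colons.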
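
As an illustration of the colon-separated format, here is a minimal sketch that splits such a value into individual addresses. It assumes IPv4 addresses only (a colon separator would be ambiguous with IPv6), the addresses 10.0.0.1 and 10.0.0.2 are made up, and parse_cluster_interconnects is a hypothetical helper, not an Oracle API.

```python
import ipaddress

def parse_cluster_interconnects(value):
    """Split a colon-separated list of IPv4 addresses; empty means unset."""
    if not value.strip():
        return []
    # IPv4Address raises ValueError on anything that is not a valid address.
    return [ipaddress.IPv4Address(part.strip()) for part in value.split(":")]

# Made-up private addresses, for illustration only.
addresses = parse_cluster_interconnects("10.0.0.1:10.0.0.2")
print([str(a) for a in addresses])
```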

0 0