VIPs Often Go Offline Unexpectedly and Relocate to Another Node (Doc ID 1297867.1)

In this Document

Symptoms
Cause
Solution

APPLIES TO:

Oracle Database - Enterprise Edition - Version 10.2.0.5 and later
Information in this document applies to any platform.

SYMPTOMS

VIPs often go offline unexpectedly, with the following message in crsd.log:

2011-02-17 15:11:16.437: [ CRSAPP][11321]32CheckResource error for ora.node02.vip error code = 1
2011-02-17 15:11:16.441: [ CRSRES][11321]32In stateChanged, ora.node02.vip target is ONLINE
2011-02-17 15:11:16.441: [ CRSRES][11321]32ora.node02.vip on node02 went OFFLINE unexpectedly
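To see how often this happens, you can count the unexpected-offline messages per resource. The following is a minimal sketch. It assumes that each relevant crsd.log line ends in the `<resource> on <node> went OFFLINE unexpectedly` form shown above. The function name `vip_offline_summary` is hypothetical and not part of Oracle Clusterware.

```shell
# Hypothetical helper: count "went OFFLINE unexpectedly" events per resource
# in a crsd.log-style file. Assumes resource names start with "ora." as above.
# Usage: vip_offline_summary /path/to/crsd.log
vip_offline_summary() {
    # Keep only the resource name (e.g. ora.node02.vip) from matching lines.
    sed -n 's/.*\(ora\.[^ ]*\) on [^ ]* went OFFLINE unexpectedly.*/\1/p' "$1" |
        sort | uniq -c |
        # uniq -c prints "<count> <name>"; print "<name> <count>" instead.
        awk '{ print $2, $1 }'
}
```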

To enable VIP tracing, run the following commands:

#crsctl debug log res "ora.node01.vip:5"
#crsctl debug log res "ora.node02.vip:5"

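The following messages appear in the generated VIP trace under CRS_HOME/log/node02. The key lines are the failed checks for interface en1 ("IsIfAlive: RX packets checked if=en1 failed" and "Interface en1 checked failed"):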

2011-02-18 15:32:39.481: [ RACG][1] [4587556][1][ora.node02.vip]: Fri Feb 18 15:32:37 GMT+08:00 2011 [ 8257768 ] About to execute command: /usr/sbin/ping -S 192.168.220.36 -c 1 -w 1 192.168.220.33
Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] IsIfAlive: RX packets checked if=en1 failed

2011-02-18 15:32:39.481: [ RACG][1] [4587556][1][ora.node02.vip]: Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] Interface en1 checked failed (host=node02)
Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] IsIfAlive: end for if=en1
Fri Feb 18 15:32:39 GMT+08:00 2011 [ 8257768 ] checkIf: end for if=en1
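In a long trace, the failed interface checks can be pulled out with `grep`. This is a minimal sketch that matches only the two message forms shown above. The function name `vip_check_failures` is hypothetical. The trace file path is passed in because its exact location under CRS_HOME/log/<node> is not given here.

```shell
# Hypothetical helper: print the failed interface-check lines from a VIP trace.
# Matches the two failure messages seen above; other messages are ignored.
# Usage: vip_check_failures /path/to/vip_trace_file
vip_check_failures() {
    grep -E 'IsIfAlive: RX packets checked if=[^ ]+ failed|Interface [^ ]+ checked failed' "$1"
}
```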

You can reset the VIP tracing to the default level by using the following commands:

#crsctl debug log res "ora.node01.vip:0"
#crsctl debug log res "ora.node02.vip:0"
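On clusters with more nodes, the same `crsctl debug log res` command can be repeated for each VIP. The following is a minimal sketch. It assumes that every VIP resource follows the `ora.<node>.vip` naming shown above. The function name `set_vip_trace` is hypothetical.

```shell
# Hypothetical wrapper around the crsctl command shown above.
# Sets the debug level of ora.<node>.vip for every node given.
# Usage: set_vip_trace 5 node01 node02   # enable tracing
#        set_vip_trace 0 node01 node02   # reset to the default level
set_vip_trace() {
    level=$1
    shift
    for node in "$@"; do
        # Stop at the first node where crsctl reports an error.
        crsctl debug log res "ora.${node}.vip:${level}" || return 1
    done
}
```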

CAUSE

The issue can be caused by poor network performance. As the trace above shows, the VIP check pings the gateway using the public IP as the source address.

From "man ping" on AIX:

-S hostname/IP addr
Uses the IP address as the source address in outgoing ping packets.

-c Count
Specifies the number of echo requests, as indicated by the Count
variable, to be sent (and received).

-w timeout
This option works only with the -c option. It causes ping to wait
for a maximum of 'timeout' seconds for a reply (after sending the
last packet).

So the following command checks whether one packet sent from 192.168.220.36 to 192.168.220.33 receives a reply within 1 second:

ping -S 192.168.220.36 -c 1 -w 1 192.168.220.33
==>192.168.220.36 is the public IP, 192.168.220.33 is the gateway.

If the network is slow, the "ping" command above does not receive a reply within 1 second, so the interface check fails. The VIP then goes offline unexpectedly and relocates to another node.
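Before you contact the network team, you can repeat the same check several times to see how often it would fail. The following is a minimal sketch. It uses the AIX ping options described above, and those options differ on other platforms. The function name `count_ping_failures` and the attempt count are hypothetical. The function calls `ping` from PATH; on AIX this is the /usr/sbin/ping seen in the trace.

```shell
# Hypothetical helper: repeat the ping that the VIP check runs (AIX syntax)
# and print how many attempts got no reply within the timeout.
# Usage: count_ping_failures <public_ip> <gateway> [attempts] [timeout_secs]
count_ping_failures() {
    src=$1 gw=$2 attempts=${3:-10} timeout=${4:-1}
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # One echo request from the public IP, wait up to $timeout seconds.
        if ! ping -S "$src" -c 1 -w "$timeout" "$gw" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `count_ping_failures 192.168.220.36 192.168.220.33 20 1` reports how many of 20 attempts would have failed the 1-second check.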

SOLUTION

To resolve the issue, ask your network administrator to tune the network so that the ping command receives a reply within 1 second.

If the network performance cannot be improved, use the following temporary workaround (not recommended):

1. Stop all node applications.
% srvctl stop nodeapps -n <hostname>

2. Back up the racgvip script, then modify it.

Change:
# timeout of ping in number of loops (1 sec)
PING_TIMEOUT=" -c 1 -w 1"

To:
# timeout of ping in number of loops (3 sec)
PING_TIMEOUT=" -c 1 -w 3"

3. Start the node applications and other necessary resources.
% srvctl start nodeapps -n <hostname>
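You can script step 2 so that the backup and the edit happen together. The following is a minimal sketch. It assumes that the script contains the exact `PING_TIMEOUT=" -c 1 -w 1"` line shown above. The function name `raise_racgvip_ping_timeout` is hypothetical, and the path is a placeholder. The sketch writes through a temporary file instead of using `sed -i`, because not every sed, including AIX sed, supports in-place editing. Run it only while nodeapps are stopped (step 1).

```shell
# Hypothetical helper for step 2: back up racgvip, then change the ping
# wait from 1 s to the given number of seconds (default 3).
# Usage: raise_racgvip_ping_timeout /path/to/racgvip 3
raise_racgvip_ping_timeout() {
    script=$1 secs=${2:-3}
    # Keep an untouched copy; cp -p preserves mode and timestamps.
    cp -p "$script" "$script.bak" || return 1
    # Rewrite the PING_TIMEOUT line and its comment, reading from the backup.
    sed -e "s/^PING_TIMEOUT=\" -c 1 -w 1\"/PING_TIMEOUT=\" -c 1 -w $secs\"/" \
        -e "s/^# timeout of ping in number of loops (1 sec)/# timeout of ping in number of loops ($secs sec)/" \
        "$script.bak" > "$script.tmp" || return 1
    # Copy the edited text back over the original so its owner and
    # permissions stay unchanged, then remove the temporary file.
    cat "$script.tmp" > "$script" && rm -f "$script.tmp"
}
```

Run the function only once. A second run would overwrite racgvip.bak with the already modified script.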
