Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1

In this Document

 Purpose
 Details
   Bug 10332426 - HAIP fails to start due to network mismatch
   Bug 19270660 - AIX: category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8
   Bug 16445624 - AIX: HAIP fails to start
   Bug 13989181 - AIX: HAIP fails to start with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0
   Note 1447517.1 - AIX: HAIP fails to start if bpf and other devices using same major/minor number
   Bug 10253028 - "oifcfg iflist -p -n" not showing HAIP on AIX as expected
   Bug 13332363 - Wrong MTU for HAIP on Solaris
   Bug 10114953 - only one HAIP is create on HP-UX
   Bug 10363902 - HAIP Infiniband support for Linux and Solaris
   Bug 10357258 - Many HAIP started on Solaris IPMP - not affecting 11.2.0.3
   Bug 10397652/ 12767231 - HAIP not failing over when private network fails - not affecting 11.2.0.3
   Bug 11077756 - allow root script to continue upon HAIP failure
   Bug 12546712 - not affecting 11.2.0.3
   HAIP fails to start if default gateway is configured for VLAN for private network on network switch
   Bug 12425730 - HAIP does not start, 11.2.0.3 not affected
   ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481
   11gR2 GI HAIP Resource Not Created in Solaris 11 if IPMP is Used for Private Network
 References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.2 and later
Information in this document applies to any platform.

PURPOSE

This document lists known HAIP issues in 11gR2/12c Grid Infrastructure. Refer to note 1210883.1 for an explanation of the HAIP feature.

DETAILS

Bug 10332426 - HAIP fails to start due to network mismatch

Issue: HAIP fails to start while running rootupgrade.sh

Symptom:

  • Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" 
encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
  • $GRID_HOME/log/<hostname>/gipcd/gipcd.log
2010-12-12 09:41:35.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces
2010-12-12 09:41:40.201: [ CLSINET][1088543040] Returning NETDATA: 0 interfaces

Solution:

The cause is a mismatch between the private network information stored in OCR and the configuration on the OS. The output of the following commands should be consistent with each other regarding network adapter name, subnet and netmask; see note 1296579.1 for what to check.

oifcfg iflist -p -n
oifcfg getif
ifconfig
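
The comparison between OCR and the OS can be partly scripted. The following is only a minimal sketch: the function name check_private_net and the two saved-output files are hypothetical, and it assumes the column layouts seen in the sample outputs elsewhere in this note (interface name in column 1 and subnet in column 2 for both commands, network type in column 4 of "oifcfg getif"). It compares adapter name and subnet only; the netmask still has to be checked against the ifconfig output by hand.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle tool): report each cluster_interconnect
# entry from "oifcfg getif" and whether the same interface/subnet pair is
# present in "oifcfg iflist -p -n".
#   $1 = file holding saved "oifcfg getif" output
#   $2 = file holding saved "oifcfg iflist -p -n" output
check_private_net() {
    awk '
        # First file read (iflist): remember every "interface subnet" pair
        # that the OS reports.
        FILENAME == ARGV[1] { os[$1 " " $2] = 1; next }
        # Second file read (getif): check each private interconnect entry
        # that is stored in OCR.
        $4 ~ /cluster_interconnect/ {
            key = $1 " " $2
            if (key in os) print "OK " key
            else           print "MISMATCH " key
        }
    ' "$2" "$1"
}

# Example use, as the Grid Infrastructure owner:
#   oifcfg getif        > /tmp/getif.out
#   oifcfg iflist -p -n > /tmp/iflist.out
#   check_private_net /tmp/getif.out /tmp/iflist.out
```

Any line starting with MISMATCH points at an adapter name or subnet that differs between OCR and the OS.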

Bug 19270660 - AIX: category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8 

Issue: HAIP fails to start on AIX

Symptom:

  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

2014-07-21 16:38:59.240: [ USRTHRD][4372]{0:0:2} failed to create arp 
2014-07-21 16:38:59.240: [ USRTHRD][4115]{0:0:2} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: ARP device /dev/bpf4, interface en8

Solution/Workaround:

Bug 19270660 is fixed in 12.1.0.2. Apply interim patch 19270660 if the issue is encountered.

 

Bug 16445624 - AIX: HAIP fails to start

Issue: HAIP fails to start if the root script (root.sh or rootupgrade.sh) is executed via sudo (not directly as the root user) or if the bpf device is not functioning properly


Symptom:

  • Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} failed to create arp
2010-12-04 17:19:54.893: [ USRTHRD][2084] {0:3:37} (null) category: -2, operation: ioctl, loc: bpfopen:2,os, OS error: 14, other:

OR

2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 2, other: 

OR

2011-09-29 16:44:46.771: [ USRTHRD][3600] {0:3:14} (null) category: -2, operation: open, loc: bpfopen:1,os, OS error: 22, other: 

OR

 

2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2  

OR 


Various other OS error codes can be seen as well.
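
When several of these messages are mixed in one log, it can help to reduce them to the distinct operation / location / OS error combinations. This is a minimal sketch, assuming the log lines keep the "operation: ..., loc: ..., OS error: N" layout shown above; the function name bpf_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print each distinct "operation location os-error"
# combination found in bpfopen failure lines of an orarootagent log.
#   $1 = path to orarootagent_root.log
bpf_errors() {
    # Keep only bpfopen failures, cut each line down to the three fields
    # of interest, then drop repeats.
    grep 'loc: bpfopen' "$1" |
        sed 's/.*operation: \([^,]*\), loc: \([^,]*\),.*OS error: \([0-9]*\).*/\1 \2 \3/' |
        LC_ALL=C sort -u
}

# Example use:
#   bpf_errors $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log
```

Each output line has the form "open bpfopen:1 2", that is the failing operation, the code location, and the OS error number.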

Solution/Workaround:

On AIX and Solaris, a command executed via sudo or a similar tool may not have the full root environment, which can cause HAIP startup to fail.

The solution is to obtain and apply patch 16445624 on AIX.

The workaround is to execute the root script (root.sh or rootupgrade.sh) directly as the real root user.

If the root script has already failed, try one or all of the following:

 - reboot the node

 - execute "/usr/sbin/tcpdump -D" as the root user; if the timestamp of the bpf device is not updated, delete the device and re-run the same "tcpdump -D" command

Before re-running the root script, verify that the bpf devices exist and that their timestamps have been updated:

ls -ltr /dev/bpf*
cr--------   1 root     system       42,  0 Oct 03 10:32 /dev/bpf0
..

 

 

Bug 13989181 - AIX: HAIP fails to start with: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0

Duplicate Bug 14358011

Issue: HAIP fails to start on AIX

Symptom:

  • $GRID_HOME/log/<hostname>/agent/ohasd/orarootagent_root/orarootagent_root.log

2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} failed to create arp 
2012-04-21 12:36:43.951: [ USRTHRD][2572] {0:0:2} (null) category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0, ifr en2 
... 
2012-04-21 17:12:41.086: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] Start of HAIP aborted 
2012-04-21 17:12:41.086: [   AGENT][3343] {0:0:2} UserErrorException: Locale is 
2012-04-21 17:12:41.087: [ora.cluster_interconnect.haip][3343] {0:0:2} [start] clsnUtils::error Exception type=2 string=

CRS-5017: The resource action "ora.cluster_interconnect.haip start" encountered the following error: Start action for HAIP aborted

Solution/Workaround:

Bug 13989181 is fixed in 11.2.0.4. Apply interim patch 13989181 if the issue is encountered.

 

 

Note 1447517.1 - AIX: HAIP fails to start if bpf and other devices using same major/minor number

Issue: HAIP fails to start on AIX because other system devices use the same major/minor number as the bpf devices

orarootagent_root.log shows: category: -2, operation: SETIF, loc: bpfopen:21,o, OS error: 22, other: dev /dev/bpf0, ifr en15

The solution is to ensure that no other device uses the same major/minor number as a bpf device; refer to note 1447517.1 for more details.
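
A quick way to look for such a clash is to scan a long listing of /dev. This is a minimal sketch rather than the procedure from note 1447517.1: the function name bpf_conflicts is hypothetical, it assumes "ls -l" prints device numbers as "major, minor" in columns 5 and 6 (as in the /dev/bpf0 listing earlier in this note), and it assumes that only devices of the same type (character or block) count as a clash.

```shell
#!/bin/sh
# Hypothetical helper: read "ls -l /dev" output on stdin and print every
# non-bpf device that has the same type and major/minor pair as a bpf device.
bpf_conflicts() {
    awk '
        # Device nodes show "major, minor" where regular files show a size,
        # so column 5 is a number followed by a comma.
        $5 ~ /^[0-9]+,$/ {
            key = substr($1, 1, 1) " " $5 $6     # e.g. "c 42,0"
            name = $NF
            sub(/.*\//, "", name)                # strip any leading path
            if (name ~ /^bpf[0-9]+$/) bpf[key] = name
            else                      other[name] = key
        }
        END {
            for (n in other)
                if (other[n] in bpf)
                    print n " conflicts with " bpf[other[n]] " (" other[n] ")"
        }
    '
}

# Example use, as root:
#   ls -l /dev | bpf_conflicts
```

No output means that no clash was found under these assumptions.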

 

Bug 10253028 - "oifcfg iflist -p -n" not showing HAIP on AIX as expected

Issue: "oifcfg iflist -p -n" not showing HAIP on AIX

Fixed in: Expected behaviour on AIX

Symptom:

  • "oifcfg getif" output
en12  10.0.1.0  global  public
en13  10.1.1.0  global  cluster_interconnect
  • "ifconfig -a" output
en13: flags=5e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN>
       inet 10.1.1.143 netmask 0xffffff00 broadcast 10.1.1.255
       inet 169.254.228.154 netmask 0xffff0000 broadcast 169.254.255.255
        tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
..
Note that HAIP exists (the 169.254.228.154 address on en13).
  • v$cluster_interconnects
SQL> select * from gv$cluster_interconnects;

  INST_ID NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- 
        1 en13            169.254.228.154  NO
        2 en13            169.254.55.162   NO
  • "oifcfg iflist -p -n" output
en12  10.0.1.0  PUBLIC  255.255.255.0
en13  10.1.1.0  PUBLIC  255.255.255.0

Note: HAIP is usually expected to be listed here as well; however, it is not listed on AIX.

  

Bug 13332363 - Wrong MTU for HAIP on Solaris

Issue: Wrong MTU size for HAIP on Solaris, refer to note 1290585.1 for more details.

Fixed in: 11.2.0.2 GI PSU5, 11.2.0.3 GI PSU1, 11.2.0.4

 

Bug 10114953 - only one HAIP is create on HP-UX

Issue: Only one HAIP created on HP-UX

2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} Arp::sCreateSocket {
2013-05-29 17:21:31.280: [ USRTHRD][29499] {0:0:56578} failed to create arp
2013-05-29 17:21:31.281: [ USRTHRD][29499] {0:0:56578} (null) category: -2, operation: ssclsi_dlpi_request, loc: dlpireq:8,na, OS error: 4, other:


The bug is fixed in 11.2.0.4; on releases earlier than 11.2.0.4, patch 10114953 is required.

The OS kernel parameter dlpi_max_ub_promisc must be set to a value greater than 1 for the patch to be effective.

To find out the value of dlpi_max_ub_promisc: kctune -v dlpi_max_ub_promisc

Refer to bug 15940367

 

Bug 10363902 - HAIP Infiniband support for Linux and Solaris

Issue: GIPC HA is disabled or HAIP fails to start if the cluster interconnect is InfiniBand or any other network hardware that has a hardware (MAC) address longer than 6 bytes

Fixed in: 11.2.0.3 for Linux and Solaris

Symptom:

  • Output of root script:
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'racnode1'
CRS-5017: The resource action "ora.cluster_interconnect.haip start" 
encountered the following error:
Start action for HAIP aborted
CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'racnode1' failed
  • $GRID_HOME/log/<hostname>/gipcd/gipcd.log
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} Arp::sCreateSocket {
2010-12-07 13:23:08.560: [ USRTHRD][3858] {0:0:62} failed to create arp
2010-12-07 13:23:08.561: [ USRTHRD][3858] {0:0:62} (null) category: -2, 
operation: ssclsi_aix_get_phys_addr, loc: aixgetpa:4,n, OS error: 2, other:


 

 

Bug 10357258 - Many HAIP started on Solaris IPMP - not affecting 11.2.0.3

Issue: Many HAIPs are created after the active NIC fails in IPMP

Fixed in: 11.2.0.3 and 11.2.0.2 GI PSU3. Interim patch 10357258 exists for 11.2.0.2 and patch 11865154 for 11.2.0.2.1. Affects Solaris only.

Symptom:

  • ifconfig output:
nxge3:2: flags=21000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu  1500 index 5
          inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
nxge3:3: flags=21000842<BROADCAST,RUNNING,MULTICAST,IPv4,STANDBY> mtu 1500  index 5
          inet 169.254.20.88 netmask ffff0000 broadcast 169.254.255.255
..

Note that the same HAIP shows up multiple times.
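
Duplicated addresses like these can be spotted mechanically. This is a minimal sketch, assuming HAIP addresses are the link-local 169.254.x.x addresses on lines starting with "inet", as in the output above; the function name dup_haip is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ifconfig -a" output on stdin and print every
# 169.254.x.x address that is plumbed on more than one logical interface.
dup_haip() {
    awk '
        # Count how often each link-local address appears on an inet line.
        $1 == "inet" && $2 ~ /^169\.254\./ { count[$2]++ }
        END {
            for (ip in count)
                if (count[ip] > 1)
                    print ip " plumbed " count[ip] " times"
        }
    '
}

# Example use:
#   ifconfig -a | dup_haip
```

No output means that every HAIP address appears only once.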

 

Bug 10397652/ 12767231 - HAIP not failing over when private network fails - not affecting 11.2.0.3

Issue: HAIP does not fail over even when the private network experiences a problem (e.g. a switch port is disabled) because the OS does not provide reliable link information

Fixed in: Bug 12767231 is fixed in 11.2.0.2 GI PSU4, 11.2.0.3

The workaround on AIX is to set the "MONITOR" flag for all private network adapters:

ifconfig en1 monitor
ifconfig en1
en1: flags=5e080863,2c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,
GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),PSEG,LARGESEND,CHAIN,MONITOR
        inet 192.168.10.83 netmask 0xfffffc00 broadcast 192.168.11.255
        inet 169.254.74.136 netmask 0xffff8000 broadcast 169.254.127.255
         tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0
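
Whether the flag is actually set on a private adapter can be checked from the ifconfig output. This is a minimal sketch, assuming the flag appears as the word MONITOR in the comma-separated flag list as shown above; the function name has_monitor is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read "ifconfig <interface>" output on stdin and
# succeed (exit status 0) only if MONITOR appears in the flag list.
has_monitor() {
    # Match MONITOR as a whole flag: preceded by "<" or "," and followed
    # by ",", ">" or the end of a (possibly wrapped) line.
    grep -Eq '(<|,)MONITOR(,|>|$)'
}

# Example use on AIX for one private adapter:
#   ifconfig en1 | has_monitor || echo "en1: MONITOR flag not set"
```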

Bug 11077756 - allow root script to continue upon HAIP failure

Issue: A HAIP startup failure causes the root script to fail. The fix for this bug allows the root script to continue so that the HAIP issue can be worked on later.

Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and above

Note: the consequence is that HAIP will be disabled. Once the cause is identified and the solution is implemented, HAIP needs to be enabled during an outage window. To enable it, run the following as root on ALL nodes:

# $GRID_HOME/bin/crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=1" -init
# $GRID_HOME/bin/crsctl stop crs
# $GRID_HOME/bin/crsctl start crs

 

Bug 12546712 - not affecting 11.2.0.3

Issue: ASM crashes because HAIP does not fail over when two or more private networks fail; refer to note 1323995.1 for more details.

 

HAIP fails to start if default gateway is configured for VLAN for private network on network switch

Issue: HAIP fails to start if a default gateway is configured on the network switch for the VLAN of the private network

orarootagent_root.log shows: PROBE: conflict detected src { 169.254.12.247, <gateway MAC on switch> }, target { 0.0.0.0, <private NIC MAC> }

The solution is to remove the default gateway setting on the network switch for the private network (VLAN); refer to Note 1366211.1 for more details.

 

Bug 12425730 - HAIP does not start, 11.2.0.3 not affected

Issue: HAIP fails to start; gipcd.log shows rank 0 or "-1" for the private network

Fixed in: 11.2.0.2 GI PSU6, 11.2.0.3 and onward, refer to note 1374360.1 for details.

 

 

ASM on Non-First Node (Second or Others) Fails to Start: PMON (ospid: nnnn): terminating the instance due to error 481

If HAIP is not running, instance startup can be affected. Refer to Note 1383737.1 for details.

 

11gR2 GI HAIP Resource Not Created in Solaris 11 if IPMP is Used for Private Network

HAIP will not be enabled on Solaris 11 if IPMP is configured for private network. This is by design. Refer to Note 1512141.1
