Host hardware fault causes a RAC node reboot

Source: Internet  Editor: 程序博客网  Time: 2024/06/06 03:23

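Last night a RAC node rebooted. Applications were not affected, but the cause still had to be found.

1. Check the database alert.log. It shows the instance starting up again directly, with no entries at all before the restart.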

2012-11-11 06:00:00.091000 +08:00
Setting Resource Manager plan SCHEDULER[0x310D]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Starting background process VKRM
VKRM started with pid=60, OS id=23499
2012-11-11 06:00:06.599000 +08:00
Begin automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
2012-11-11 06:01:16.131000 +08:00
End automatic SQL Tuning Advisor run for special tuning task  "SYS_AUTO_SQL_TUNING_TASK"
2012-11-11 22:28:42.709000 +08:00
Adjusting the default value of parameter parallel_max_servers
from 1280 to 985 due to the value of parameter processes (1000)
Starting ORACLE instance (normal)
****************** Huge Pages Information *****************
Huge Pages memory pool detected (total: 35840 free: 35840)
DFLT Huge Pages allocation successful (allocated: 3001)
***********************************************************
2012-11-11 22:28:43.755000 +08:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
2012-11-11 22:28:50.135000 +08:00
Private Interface 'bond1:1' configured from GPnP for use as a private interconnect.
  [name='bond1:1', type=1, ip=169.254.61.86, mac=00-1b-21-d5-26-b0, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'bond0' configured from GPnP for use as a public interface.
  [name='bond0', type=1, ip=10.4.124.235, mac=e4-1f-13-80-57-c1, net=10.4.124.224/27, mask=255.255.255.224, use=public/1]
Public Interface 'bond0:1' configured from GPnP for use as a public interface.
  [name='bond0:1', type=1, ip=10.4.124.245, mac=e4-1f-13-80-57-c1, net=10.4.124.224/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /oracle/app/oracle/product/11.2.0/db_1/dbs/initSMPDB3.ora
System parameters with non-default values:

ASM log

2012-11-11 22:28:05.078000 +08:00
* instance_number obtained from CSS = 3, checking for the existence of node 0...
* node 0 does not exist. instance_number = 3
Starting ORACLE instance (normal)
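The "no entries before the restart" observation can be checked mechanically by looking for the widest pause between consecutive timestamps in alert.log. The following is only a minimal sketch: the function name `largest_gap` is made up for illustration, and it assumes every timestamp has the `YYYY-MM-DD HH:MM:SS.ffffff +HH:MM` shape seen in the excerpt above.

```python
import re
from datetime import datetime

# Timestamps shaped like "2012-11-11 22:28:42.709000 +08:00".
TS = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{6} [+-]\d{2}:\d{2}")


def largest_gap(log_text):
    """Return (gap, before, after) for the widest pause between consecutive
    timestamps, or None if fewer than two timestamps are present."""
    stamps = [
        # %z accepts an offset written as "+08:00" on Python 3.7+
        datetime.strptime(m.group(0), "%Y-%m-%d %H:%M:%S.%f %z")
        for m in TS.finditer(log_text)
    ]
    pairs = list(zip(stamps, stamps[1:]))
    if not pairs:
        return None
    before, after = max(pairs, key=lambda p: p[1] - p[0])
    return after - before, before, after


# Timestamps copied from the alert.log excerpt above.
sample = """\
2012-11-11 06:00:06.599000 +08:00
2012-11-11 06:01:16.131000 +08:00
2012-11-11 22:28:42.709000 +08:00
2012-11-11 22:28:43.755000 +08:00
"""

gap, before, after = largest_gap(sample)
print(f"no entries for {gap} between {before:%H:%M:%S} and {after:%H:%M:%S}")
```

On the excerpt, the widest gap ends at the 22:28:42 startup entry, which matches the observation that the instance logged nothing on its way down.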

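2. Linux system logs: /var/log/error and messages

error: the suspicious entry is "Memory for crash kernel" (note that it is stamped 22:22:21, i.e. during the boot that followed the reboot)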

Nov 11 22:22:21 dtydb5 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Nov 11 22:22:45 dtydb5 automount[17304]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Nov 11 22:26:32 dtydb5 ntpd[19555]: 10.7.0.81 is inappropriate address for the fudge command, line ignored
Nov 11 22:26:33 dtydb5 logger: Oracle HA daemon is enabled for autostart.
Nov 11 22:26:34 dtydb5 logger: exec /oracle/11.2.0/grid/perl/bin/perl -I/oracle/11.2.0/grid/perl/lib /oracle/11.2.0/grid/bin/crswrapexece.pl /oracle/11.2.0/grid/crs/install/s_crsconfig_dtydb5_env.txt /oracle/11.2.0/grid/bin/ohasd.bin "reboot"
Nov 11 22:27:07 dtydb5 smartd[20467]: Problem creating device name scan list
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_spec: failed to store path info
Nov 11 22:27:56 dtydb5 multipathd: uevent trigger error
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_vmb: failed to store path info
Nov 11 22:27:56 dtydb5 multipathd: uevent trigger error
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_vdbg: failed to store path info
messages: syslogd restarted at 22:22:18 as the system booted, which should not be a problem in itself

Nov 11 22:22:18 dtydb5 syslogd 1.4.1: restart.
Nov 11 22:22:19 dtydb5 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 11 22:22:19 dtydb5 kernel: Linux version 2.6.18-194.el5 (mockbuild@x86-005.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Mar 16 21:52:39 EDT 2010
Nov 11 22:22:19 dtydb5 kernel: Command line: ro root=/dev/rootvg/LogVol00 rhgb quiet
Nov 11 22:22:19 dtydb5 kernel: BIOS-provided physical RAM map:

3. RAC logs. The main suspicion was still that the RAC node had been evicted and restarted, which in turn rebooted the server.

crsd log: /oracle/11.2.0/grid/log/dtydb5/crsd/crsdOUT.log

2012-11-11 22:28:14
Changing directory to /oracle/11.2.0/grid/log/dtydb5/crsd
2012-11-11 22:28:14
CRSD REBOOT
/oracle/11.2.0/grid/log/dtydb5/crsd/crsd.l01
2012-11-11 22:20:20.413: [UiServer][1171753280] {3:22096:3634} Sending message to PE. ctx= 0xd671ea0
2012-11-11 22:20:20.414: [   CRSPE][1169652032] {3:22096:3634} Processing PE command id=593485. Description: [Stat Resource : 0x2aaaadda9a60]
2012-11-11 22:20:20.418: [UiServer][1171753280] {3:22096:3634} Done for ctx=0xd671ea0
2012-11-11 22:28:14.786: [ default][900772256] First attempt: init CSS context succeeded.
[  clsdmt][1087560000]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dtydb5DBG_CRSD))
2012-11-11 22:28:14.791: [  clsdmt][1087560000]PID for the Process [21647], connkey 1
2012-11-11 22:28:14.792: [  clsdmt][1087560000]Creating PID [21647] file for home /oracle/11.2.0/grid host dtydb5 bin crs to /oracle/11.2.0/grid/crs/init/
2012-11-11 22:28:14.792: [  clsdmt][1087560000]Writing PID [21647] to the file [/oracle/11.2.0/grid/crs/init/dtydb5.pid]
2012-11-11 22:28:15.308: [ default][1087560000] Policy Engine is not initialized yet!
2012-11-11 22:28:15.308: [ default][900772256] CRS Daemon Starting
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: AGENT  1
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: AGFW  0
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: CLSFRAME  0

ohasd.log: /oracle/11.2.0/grid/log/dtydb5/ohasd/ohasd.log

2012-11-11 22:27:08.498: [ default][3640775072] OHASD Daemon Starting. Command string :reboot
2012-11-11 22:27:08.500: [ default][3640775072] Initializing OLR
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: for disk 0 (/oracle/11.2.0/grid/cdata/dtydb5.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (4630), lsn (4630)
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: my id set: (931531576, 1028247821, 0, 0, 0)
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: 1st set: (931531576, 1028247821, 0, 0, 0)
2012-11-11 22:27:08.520: [  OCRRAW][3640775072]proprioo: 2nd set: (0, 0, 0, 0, 0)
2012-11-11 22:27:08.551: [ default][3640775072] Running mode check...
2012-11-11 22:27:08.551: [ default][3640775072] OHASD running as the Privileged user
2012-11-11 22:27:08.551: [ default][3640775072] Loading debug levels...
2012-11-11 22:27:08.553: [ default][3640775072] OCR Logging level for Module: AGFW  0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLSFRAME  0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLSVER  0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLUCLS  0
2012-11-11 22:27:08.555: [ default][3640775072] OCR Logging level for Module: CRSAPP  0
2012-11-11 22:27:08.555: [ default][3640775072] OCR Logging level for Module: CRSCCL  0

CRS alert log: alertdtydb5.log

2012-11-11 22:27:08.548
[ohasd(19651)]CRS-2112:The OLR service started on node dtydb5.
2012-11-11 22:27:08.620
[ohasd(19651)]CRS-1301:Oracle High Availability Service started on node dtydb5.
2012-11-11 22:27:08.647
[ohasd(19651)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2012-11-11 22:27:10.481
[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5815:Agent '/oracle/11.2.0/grid/bin/oraagent_grid' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:2:2} in /oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log.
2012-11-11 22:27:10.592
[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log"
2012-11-11 22:27:11.496
2012-11-11 22:27:11.496
[/oracle/11.2.0/grid/bin/orarootagent.bin(20781)]CRS-5016:Process "/oracle/11.2.0/grid/bin/acfsload" spawned by agent "/oracle/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/11.2.0/grid/log/dtydb5/agent/ohasd/orarootagent_root/orarootagent_root.log"
2012-11-11 22:27:26.622
[/oracle/11.2.0/grid/bin/oraagent.bin(20912)]CRS-5815:Agent '/oracle/11.2.0/grid/bin/oraagent_grid' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:5:2} in /oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log.
2012-11-11 22:27:29.974
[gpnpd(20934)]CRS-2328:GPNPD started on node dtydb5.
After checking, there were no network or disk problems, and nothing else abnormal either.
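Ruling out an eviction, as suspected in step 3, amounts to confirming that none of the Clusterware logs mention one. A rough sketch of such a scan follows. The keyword list in `SUSPECT` and the function name are assumptions for illustration only, not something the original post used, and real eviction messages may be worded differently on other versions.

```python
import re

# Hypothetical keyword list (not from the original post); tune it for your site.
SUSPECT = re.compile(r"evict|fenc|split.?brain|missing for", re.IGNORECASE)


def suspicious_lines(log_text, pattern=SUSPECT):
    """Return the log lines that match the pattern, in original order."""
    return [line for line in log_text.splitlines() if pattern.search(line)]


# Three entries copied from the CRS alert log excerpt above.
sample = """\
[ohasd(19651)]CRS-2112:The OLR service started on node dtydb5.
[ohasd(19651)]CRS-1301:Oracle High Availability Service started on node dtydb5.
[gpnpd(20934)]CRS-2328:GPNPD started on node dtydb5.
"""

print(suspicious_lines(sample))
```

An empty result on every Clusterware log is only weak evidence, since it depends on the keywords chosen, but it fits the finding that the logs show plain startup messages and nothing about the node being removed from the cluster.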

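4. With nothing wrong at the system level, the only thing left to look at was the server hardware.

Logging in to the server's management port through its web interface showed the entry below, which essentially settles the question: the hardware reported "CPU 4:Cache error occurred." This is something only a hardware engineer can fix.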

E 30 11/11/2012 22:19:21 OEM Event OEM Event CPU 4:Cache error occurred.
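To see how this hardware event lines up with the other logs, the key timestamps quoted in the steps above can be merged into one timeline. This is a sketch under stated assumptions: all clocks are treated as being in the same time zone and roughly in sync (the management controller's clock may in fact differ from the OS clock), the year is prepended by hand to the syslog timestamp because syslog omits it, and `parse_any` and the source labels are illustrative names.

```python
from datetime import datetime

FORMATS = [
    "%Y-%m-%d %H:%M:%S.%f",  # Oracle alert and Clusterware logs
    "%m/%d/%Y %H:%M:%S",     # management event log (month/day order assumed)
    "%Y %b %d %H:%M:%S",     # syslog, with the year prepended by hand
]


def parse_any(text):
    """Parse a timestamp using the first format in FORMATS that fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")


# Entries copied from the logs quoted in this post, deliberately unordered.
events = [
    ("2012-11-11 22:28:42.709000", "alert.log", "Starting ORACLE instance (normal)"),
    ("2012 Nov 11 22:22:18", "messages", "syslogd 1.4.1: restart."),
    ("2012-11-11 22:27:08.498", "ohasd.log", "OHASD Daemon Starting. Command string :reboot"),
    ("11/11/2012 22:19:21", "management", "CPU 4:Cache error occurred."),
    ("2012-11-11 22:28:05.078000", "ASM log", "Starting ORACLE instance (normal)"),
    ("2012-11-11 22:20:20.418", "crsd.l01", "last entry before the gap"),
]

timeline = sorted((parse_any(ts), src, msg) for ts, src, msg in events)
for when, src, msg in timeline:
    print(when.strftime("%H:%M:%S"), f"{src:<11}", msg)
```

Sorted this way, the cache error is the earliest event, the OS boot at 22:22:18 comes next after the last crsd entry, and every Oracle startup message follows the boot, which is consistent with a hardware-triggered reset rather than a cluster eviction.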

