Host hardware fault causes a RAC node reboot
Source: Internet  Editor: 程序博客网  Time: 2024/06/06 03:23
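Last night a RAC node rebooted. The application was not affected, but the cause still had to be found.
1. Check the database alert.log. It shows the database restarting directly, with no entries at all leading up to the restart.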
2012-11-11 06:00:00.091000 +08:00
Setting Resource Manager plan SCHEDULER[0x310D]:DEFAULT_MAINTENANCE_PLAN via scheduler window
Setting Resource Manager plan DEFAULT_MAINTENANCE_PLAN via parameter
Starting background process VKRM
VKRM started with pid=60, OS id=23499
2012-11-11 06:00:06.599000 +08:00
Begin automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
2012-11-11 06:01:16.131000 +08:00
End automatic SQL Tuning Advisor run for special tuning task "SYS_AUTO_SQL_TUNING_TASK"
2012-11-11 22:28:42.709000 +08:00
Adjusting the default value of parameter parallel_max_servers
from 1280 to 985 due to the value of parameter processes (1000)
Starting ORACLE instance (normal)
****************** Huge Pages Information *****************
Huge Pages memory pool detected (total: 35840 free: 35840)
DFLT Huge Pages allocation successful (allocated: 3001)
***********************************************************
2012-11-11 22:28:43.755000 +08:00
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
2012-11-11 22:28:50.135000 +08:00
Private Interface 'bond1:1' configured from GPnP for use as a private interconnect.
  [name='bond1:1', type=1, ip=169.254.61.86, mac=00-1b-21-d5-26-b0, net=169.254.0.0/16, mask=255.255.0.0, use=haip:cluster_interconnect/62]
Public Interface 'bond0' configured from GPnP for use as a public interface.
  [name='bond0', type=1, ip=10.4.124.235, mac=e4-1f-13-80-57-c1, net=10.4.124.224/27, mask=255.255.255.224, use=public/1]
Public Interface 'bond0:1' configured from GPnP for use as a public interface.
  [name='bond0:1', type=1, ip=10.4.124.245, mac=e4-1f-13-80-57-c1, net=10.4.124.224/27, mask=255.255.255.224, use=public/1]
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
Autotune of undo retention is turned on.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options.
Using parameter settings in server-side pfile /oracle/app/oracle/product/11.2.0/db_1/dbs/initSMPDB3.ora
System parameters with non-default values:
ASM log
2012-11-11 22:28:05.078000 +08:00
* instance_number obtained from CSS = 3, checking for the existence of node 0...
* node 0 does not exist. instance_number = 3
Starting ORACLE instance (normal)
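The "no entries before the restart" observation in step 1 can be checked mechanically by looking for the widest pause between timestamp lines in alert.log. The following is a minimal sketch, not part of the original investigation. It assumes the log has been read one entry per line and that timestamp lines look like the ones in the excerpt. The names `TS_LINE`, `largest_gap`, and `SAMPLE` are made up for illustration, and the sample holds only the timestamp lines shown above.

```python
import re
from datetime import datetime

# Timestamp lines as they appear in the alert.log excerpt, e.g.
# "2012-11-11 06:00:00.091000 +08:00". Fractions and the offset are ignored.
TS_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ [+-]\d{2}:\d{2}$")


def largest_gap(lines):
    """Return (gap, before, after) for the widest pause between timestamp lines.

    Returns None when fewer than two timestamp lines are present.
    """
    stamps = [
        datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        for m in (TS_LINE.match(line.strip()) for line in lines)
        if m
    ]
    best = None
    for before, after in zip(stamps, stamps[1:]):
        gap = after - before
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Timestamp lines copied from the excerpt; message lines are skipped anyway.
SAMPLE = [
    "2012-11-11 06:00:00.091000 +08:00",
    "Starting background process VKRM",
    "2012-11-11 06:00:06.599000 +08:00",
    "2012-11-11 06:01:16.131000 +08:00",
    "2012-11-11 22:28:42.709000 +08:00",
    "Starting ORACLE instance (normal)",
    "2012-11-11 22:28:43.755000 +08:00",
    "2012-11-11 22:28:50.135000 +08:00",
]

gap, before, after = largest_gap(SAMPLE)
print(f"no entries for {gap} between {before} and {after}")
```

On this excerpt the widest pause runs from the end of the SQL Tuning Advisor task to "Starting ORACLE instance (normal)", which matches the reading that the instance wrote nothing before it went down.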
2. Linux system logs: /var/log/error and messages
error: the suspicious entry is the "Memory for crash kernel" message
Nov 11 22:22:21 dtydb5 kernel: Memory for crash kernel (0x0 to 0x0) notwithin permissible range
Nov 11 22:22:45 dtydb5 automount[17304]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Nov 11 22:26:32 dtydb5 ntpd[19555]: 10.7.0.81 is inappropriate address for the fudge command, line ignored
Nov 11 22:26:33 dtydb5 logger: Oracle HA daemon is enabled for autostart.
Nov 11 22:26:34 dtydb5 logger: exec /oracle/11.2.0/grid/perl/bin/perl -I/oracle/11.2.0/grid/perl/lib /oracle/11.2.0/grid/bin/crswrapexece.pl /oracle/11.2.0/grid/crs/install/s_crsconfig_dtydb5_env.txt /oracle/11.2.0/grid/bin/ohasd.bin "reboot"
Nov 11 22:27:07 dtydb5 smartd[20467]: Problem creating device name scan list
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_spec: failed to store path info
Nov 11 22:27:56 dtydb5 multipathd: uevent trigger error
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_vmb: failed to store path info
Nov 11 22:27:56 dtydb5 multipathd: uevent trigger error
Nov 11 22:27:56 dtydb5 multipathd: asm!.asm_ctl_vdbg: failed to store path info
messages: syslogd restarted at 22:22:18, which should not be a problem in itself
Nov 11 22:22:18 dtydb5 syslogd 1.4.1: restart.
Nov 11 22:22:19 dtydb5 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Nov 11 22:22:19 dtydb5 kernel: Linux version 2.6.18-194.el5 (mockbuild@x86-005.build.bos.redhat.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Mar 16 21:52:39 EDT 2010
Nov 11 22:22:19 dtydb5 kernel: Command line: ro root=/dev/rootvg/LogVol00 rhgb quiet
Nov 11 22:22:19 dtydb5 kernel: BIOS-provided physical RAM map:
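The syslogd restart line is a convenient marker for when the operating system came back up. Below is a small sketch for pulling such markers out of /var/log/messages. It assumes the sysklogd 1.4.1 line format shown in the excerpt (other syslog daemons word this differently), and `RESTART` and `boot_markers` are hypothetical names used only here.

```python
import re

# Matches lines such as "Nov 11 22:22:18 dtydb5 syslogd 1.4.1: restart."
# Group 1 is the syslog timestamp (no year), group 2 the host name.
RESTART = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (\S+) syslogd [\d.]+: restart\."
)


def boot_markers(lines):
    """Return (timestamp, host) for every syslogd restart line found."""
    found = []
    for line in lines:
        m = RESTART.match(line)
        if m:
            found.append((m.group(1), m.group(2)))
    return found


# Lines copied from the messages excerpt.
SAMPLE = [
    "Nov 11 22:22:18 dtydb5 syslogd 1.4.1: restart.",
    "Nov 11 22:22:19 dtydb5 kernel: klogd 1.4.1, log source = /proc/kmsg started.",
    "Nov 11 22:22:19 dtydb5 kernel: Command line: ro root=/dev/rootvg/LogVol00 rhgb quiet",
]

for stamp, host in boot_markers(SAMPLE):
    print(f"{host} logging restarted at {stamp}")
```

A restart line that is immediately followed by kernel boot messages, as here, indicates a full reboot rather than a syslogd service restart on a running system.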
3. RAC logs. The main suspicion was still that the RAC node had been evicted and that the eviction rebooted the server.
crsd log: /oracle/11.2.0/grid/log/dtydb5/crsd/crsdOUT.log
2012-11-11 22:28:14
Changing directory to /oracle/11.2.0/grid/log/dtydb5/crsd
2012-11-11 22:28:14
CRSD REBOOT
/oracle/11.2.0/grid/log/dtydb5/crsd/crsd.l01
2012-11-11 22:20:20.413: [UiServer][1171753280] {3:22096:3634} Sending message to PE. ctx= 0xd671ea0
2012-11-11 22:20:20.414: [ CRSPE][1169652032] {3:22096:3634} Processing PE command id=593485. Description: [Stat Resource : 0x2aaaadda9a60]
2012-11-11 22:20:20.418: [UiServer][1171753280] {3:22096:3634} Done for ctx=0xd671ea0
2012-11-11 22:28:14.786: [ default][900772256] First attempt: init CSS context succeeded.
[ clsdmt][1087560000]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=dtydb5DBG_CRSD))
2012-11-11 22:28:14.791: [ clsdmt][1087560000]PID for the Process [21647], connkey 1
2012-11-11 22:28:14.792: [ clsdmt][1087560000]Creating PID [21647] file for home /oracle/11.2.0/grid host dtydb5 bin crs to /oracle/11.2.0/grid/crs/init/
2012-11-11 22:28:14.792: [ clsdmt][1087560000]Writing PID [21647] to the file [/oracle/11.2.0/grid/crs/init/dtydb5.pid]
2012-11-11 22:28:15.308: [ default][1087560000] Policy Engine is not initialized yet!
2012-11-11 22:28:15.308: [ default][900772256] CRS Daemon Starting
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: AGENT 1
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: AGFW 0
2012-11-11 22:28:15.311: [ default][900772256] ENV Logging level for Module: CLSFRAME 0
ohasd.log: /oracle/11.2.0/grid/log/dtydb5/ohasd/ohasd.log
2012-11-11 22:27:08.498: [ default][3640775072] OHASD Daemon Starting. Command string :reboot
2012-11-11 22:27:08.500: [ default][3640775072] Initializing OLR
2012-11-11 22:27:08.520: [ OCRRAW][3640775072]proprioo: for disk 0 (/oracle/11.2.0/grid/cdata/dtydb5.olr), id match (1), total id sets, (1) need recover (0), my votes (0), total votes (0), commit_lsn (4630), lsn (4630)
2012-11-11 22:27:08.520: [ OCRRAW][3640775072]proprioo: my id set: (931531576, 1028247821, 0, 0, 0)
2012-11-11 22:27:08.520: [ OCRRAW][3640775072]proprioo: 1st set: (931531576, 1028247821, 0, 0, 0)
2012-11-11 22:27:08.520: [ OCRRAW][3640775072]proprioo: 2nd set: (0, 0, 0, 0, 0)
2012-11-11 22:27:08.551: [ default][3640775072] Running mode check...
2012-11-11 22:27:08.551: [ default][3640775072] OHASD running as the Privileged user
2012-11-11 22:27:08.551: [ default][3640775072] Loading debug levels...
2012-11-11 22:27:08.553: [ default][3640775072] OCR Logging level for Module: AGFW 0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLSFRAME 0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLSVER 0
2012-11-11 22:27:08.554: [ default][3640775072] OCR Logging level for Module: CLUCLS 0
2012-11-11 22:27:08.555: [ default][3640775072] OCR Logging level for Module: CRSAPP 0
2012-11-11 22:27:08.555: [ default][3640775072] OCR Logging level for Module: CRSCCL 0
CRS alert log: alertdtydb5.log
2012-11-11 22:27:08.548
[ohasd(19651)]CRS-2112:The OLR service started on node dtydb5.
2012-11-11 22:27:08.620
[ohasd(19651)]CRS-1301:Oracle High Availability Service started on node dtydb5.
2012-11-11 22:27:08.647
[ohasd(19651)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred
2012-11-11 22:27:10.481
[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5815:Agent '/oracle/11.2.0/grid/bin/oraagent_grid' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:2:2} in /oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log.
2012-11-11 22:27:10.592
[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log"
2012-11-11 22:27:11.496
2012-11-11 22:27:11.496
[/oracle/11.2.0/grid/bin/orarootagent.bin(20781)]CRS-5016:Process "/oracle/11.2.0/grid/bin/acfsload" spawned by agent "/oracle/11.2.0/grid/bin/orarootagent.bin" for action "check" failed: details at "(:CLSN00010:)" in "/oracle/11.2.0/grid/log/dtydb5/agent/ohasd/orarootagent_root/orarootagent_root.log"
2012-11-11 22:27:26.622
[/oracle/11.2.0/grid/bin/oraagent.bin(20912)]CRS-5815:Agent '/oracle/11.2.0/grid/bin/oraagent_grid' could not find any base type entry points for type 'ora.daemon.type'. Details at (:CRSAGF00108:) {0:5:2} in /oracle/11.2.0/grid/log/dtydb5/agent/ohasd/oraagent_grid/oraagent_grid.log.
2012-11-11 22:27:29.974
[gpnpd(20934)]CRS-2328:GPNPD started on node dtydb5.
After checking, there were no network or disk problems, and nothing else wrong on the clusterware side.
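When the CRS alert log is long, tallying entries by message code gives a quick overview of what the clusterware reported around the restart. This is a rough sketch under the assumption that each message starts with a bracketed process name followed by a `CRS-nnnn:` code, as in the excerpt. `CRS_CODE` and `count_codes` are illustrative names, and the two long CRS-5815 entries in the sample are abridged.

```python
import re
from collections import Counter

# A CRS message code directly after the bracketed process name,
# e.g. "[ohasd(19651)]CRS-2112:The OLR service started ..."
CRS_CODE = re.compile(r"\](CRS-\d+):")


def count_codes(lines):
    """Count how often each CRS message code appears in the given lines."""
    counts = Counter()
    for line in lines:
        m = CRS_CODE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Entries taken from the alert log excerpt; "(abridged)" marks shortened text.
SAMPLE = [
    "[ohasd(19651)]CRS-2112:The OLR service started on node dtydb5.",
    "[ohasd(19651)]CRS-1301:Oracle High Availability Service started on node dtydb5.",
    "[ohasd(19651)]CRS-8017:location: /etc/oracle/lastgasp has 2 reboot advisory log files, 0 were announced and 0 errors occurred",
    "[/oracle/11.2.0/grid/bin/oraagent.bin(20785)]CRS-5815:Agent (abridged)",
    "[/oracle/11.2.0/grid/bin/oraagent.bin(20912)]CRS-5815:Agent (abridged)",
    "[gpnpd(20934)]CRS-2328:GPNPD started on node dtydb5.",
]

for code, n in sorted(count_codes(SAMPLE).items()):
    print(code, n)
```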
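4. With nothing wrong on the system side, the only thing left to check was the server hardware.
Logging in to the server's management port through its web interface turned up the entry below, which essentially settles the question. The hardware reported "CPU 4:Cache error occurred.", and a fault like this can only be handled by a hardware engineer.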
E 30 11/11/2012 22:19:21 OEM Event OEM Event CPU 4:Cache error occurred.
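Putting the timestamps from the different logs on one timeline makes the order of events easier to see. The sketch below uses only timestamps quoted in this post, truncated to whole seconds. It assumes the syslog lines belong to 2012 and that the management controller clock and the OS clock are comparable, which was not verified here. `EVENTS` and `timeline` are names invented for the example.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# (label, timestamp) pairs copied from the logs quoted in this post.
EVENTS = [
    ("syslogd restart (messages)", "2012-11-11 22:22:18"),
    ("CPU 4 cache error (management log)", "2012-11-11 22:19:21"),
    ("instance start (alert.log)", "2012-11-11 22:28:42"),
    ("last crsd.l01 entry before the restart", "2012-11-11 22:20:20"),
    ("OHASD daemon starting (ohasd.log)", "2012-11-11 22:27:08"),
]


def timeline(events):
    """Sort events by time and return (label, seconds since the first event)."""
    parsed = sorted((datetime.strptime(ts, FMT), label) for label, ts in events)
    start = parsed[0][0]
    return [(label, int((when - start).total_seconds())) for when, label in parsed]


for label, offset in timeline(EVENTS):
    print(f"+{offset:4d}s  {label}")
```

With these inputs the hardware event comes first and the syslogd restart follows about three minutes later, which fits the conclusion that the hardware fault preceded the reboot. The crsd entry one minute after the event may simply reflect a small clock difference between the controller and the OS.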