MC/ServiceGuard Testing (2/2)


Recently I have been busy with pre-go-live audits, SAT (site acceptance testing), and all kinds of failover tests (power, network, cluster, storage) for every system at work.

So the cluster package testing was delayed by two days; tonight I finally found time to finish configuring and testing it. Updates below:

 

 

  • Start testing the package
    • Add a new shared VG, vg1:
      • Create vg1

Add a LUN on the NetApp:
NetApp# lun show

        /vol/vol1/iscsi1/lun0         10g (10737418240)   (r/w, online, unmap)

NetApp#lun map /vol/vol1/iscsi1/lun0 ig0

lun map: auto-assigned ig0=1
DB1# ioscan -H 255
DB1# insf -H 255
DB1# ioscan -funC disk
The new disk appears:

disk     16  255/0/5.0.0.1  sdisk      CLAIMED     DEVICE       NETAPP  LUN

                           /dev/dsk/c8t0d1   /dev/rdsk/c8t0d1

# mkdir /dev/vgdb1

# mknod /dev/vgdb1/group c 64 0x020000

# vgcreate /dev/vgdb1 /dev/dsk/c8t0d1

Increased the number of physical extents per physical volume to 2559.

Volume group "/dev/vgdb1" has been successfully created.

Volume Group configuration for /dev/vgdb1 has been saved in /etc/lvmconf/vgdb1.conf

# lvcreate -n lv_db1 /dev/vgdb1

Logical volume "/dev/vgdb1/lv_db1" has been successfully created with

character device "/dev/vgdb1/rlv_db1".

Volume Group configuration for /dev/vgdb1 has been saved in /etc/lvmconf/vgdb1.conf

# lvextend -l 2559 /dev/vgdb1/lv_db1
# newfs -F vxfs /dev/vgdb1/rlv_db1

# mkdir /db1

# mount /dev/vgdb1/lv_db1 /db1

# umount /db1

# vgchange -a n /dev/vgdb1
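A side note on the `mknod` minor number used above: by common HP-UX LVM convention (an assumption here, not stated explicitly in this log), each VG's group file uses major number 64 and a minor number carrying a unique VG number in the high byte, with the low 16 bits zero, so VG number 2 gives 0x020000. A quick sketch of the arithmetic:

```shell
# Compute the conventional group-file minor number for a given VG number.
# Hypothetical helper illustrating the 0x020000 value used above.
vg_minor() {
    printf "0x%06x\n" $(( $1 << 16 ))
}

vg_minor 2   # the value used for /dev/vgdb1/group above
```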

Perform similar steps on the second node (scan for the devices, but do not create the VG).

Export the VG information for the second node:
# vgexport -v -s -p -m /tmp/vgdb1.map vgdb1
# vgexport -v -s -p -f /tmp/vgdb1pv vgdb1

Beginning the export process on Volume Group "vgdb1".

/dev/dsk/c8t0d1

# rcp /tmp/vgdb1.map /tmp/vgdb1pv db2:/tmp
On the second node:

# mkdir /dev/vgdb1


# mknod /dev/vgdb1/group c 64 0x020000

# vgimport -v -f /tmp/vgdb1pv -m /tmp/vgdb1.map vgdb1

Beginning the import process on Volume Group "vgdb1".

vgimport: Warning:  Volume Group belongs to different CPU ID.

Can not determine if Volume Group is in use on another system. Continuing.

Logical volume "/dev/vgdb1/lv_db1" has been successfully created

with lv number 1.

Volume group "/dev/vgdb1" has been successfully created.

Warning: A backup of this volume group may not exist on this machine.

Please remember to take a backup using the vgcfgbackup command after activating the volume group.
Activate the VG and check its attributes: OK.

  • Modify the cluster configuration file:
    • Add to mcdb.conf:
      USER_NAME  oracle
      USER_HOST  CLUSTER_MEMBER_NODE

USER_ROLE  MONITOR
VOLUME_GROUP            /dev/vgdb1
# cmcheckconf -v -C /etc/cmcluster/mcdb.conf
# cmapplyconf -v -C /etc/cmcluster/mcdb.conf
 

  • Install Oracle
    • Following the Oracle documentation. The current environment already has a database, which will be migrated into the cluster.
      Parameters that need changing:
      control_files: /u01/oradata/ora10/control01.ctl, control02.ctl
      db_recovery_file_dest: /u01/flash_recovery_area
      spfile: /opt/oracle/products/ora10/dbs/spfileora10.ora
      Then migrate the database files as a whole with RMAN.
      Put the password file and parameter file on shared storage: recreate the password file;
      migrate the parameter file via the flash_recovery_area.
    • Extend /opt on node2 to 20 GB:
      /opt
   LV Name                     /dev/vg00/lvol6
   LV Status                   available/syncd
   LV Size (Mbytes)            20480
   Current LE                  640
   Allocated PE                640
   Used PV                     1

Node2# lvextend -L 20480 /dev/vg00/lvol6
 

# extendfs -F vxfs /dev/vg00/lvol6   ---> error

vxfs extendfs: /dev/vg00/lvol6 is mounted, cannot extend.

# fsadm -F vxfs -b 20460M /opt
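The error above is why fsadm was needed: extendfs only works on an unmounted filesystem, while `fsadm -F vxfs -b` grows a mounted VxFS online. A hypothetical helper that picks the right tool based on a mount-table file (the demo uses a temp file standing in for /etc/mnttab; tool names are the HP-UX ones used above):

```shell
# Decide which VxFS grow command applies, based on whether the device
# appears in the mount table. Hypothetical helper, dry-run only.
MNTTAB=/tmp/mnttab.demo                 # stand-in for /etc/mnttab

grow_cmd() {
    dev=$1; mnt=$2; size_mb=$3
    if grep -q "^$dev " "$MNTTAB"; then
        echo "fsadm -F vxfs -b ${size_mb}M $mnt"     # online grow
    else
        echo "extendfs -F vxfs $dev"                 # offline grow
    fi
}

printf '%s\n' "/dev/vg00/lvol6 /opt vxfs" > "$MNTTAB"
grow_cmd /dev/vg00/lvol6 /opt 20480
```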

  • Manual database setup:
    Only the first node has the database software installed, with one (local) database created; the second node has neither the software nor a database. Can we take a shortcut: copy the software directly to the second machine, then migrate the database onto shared storage? Yes:

First copy the Oracle directory trees wholesale (/var, /etc, /opt) and see whether they work.
SQL> alter system set db_recovery_file_dest='/db1/flash_recovery_area';
SQL> alter system set control_files='/db1/oradata/ora10/control01.ctl,/db1/oradata/ora10/control02.ctl' scope=spfile;
Copy the spfile to the shared directory, then create a local pfile pointing at it: spfile='/db1/dbs/spfileora10.ora'
The control_files command above was typed wrongly; corrected: alter system set control_files='/db1/oradata/ora10/control01.ctl','/db1/oradata/ora10/control02.ctl' scope=spfile;

Rename all datafiles and redo log files to the new location.
SQL> alter database open;
Copy the initora10 file to node2.

Mount node2's shared storage manually.

The password file could also be placed on shared storage.

OK! The DB is now on shared storage.

  • Create the cluster package
    • cd /etc/cmcluster
    • mkdir db1
    • cd db1
       

# cmmakepkg -p db1.conf

Package template is created.

This file must be edited before it can be used.

# cmmakepkg -s db1.cntl

Package control script is created.

This file must be edited before it can be used.

Edit the two files.
db1.cntl (defaults omitted; only the modified lines are listed):
VG[0]=vgdb1

LV[0]="/dev/vgdb1/lv_db1"; FS[0]="/db1"; FS_MOUNT_OPT[0]=""

IP[0]="10.68.14.225"

SUBNET[0]="10.68.14.0"

 

SERVICE_NAME[0]="DB_RESOURCE"

SERVICE_CMD[0]="/etc/cmcluster/db1/oracle.sh monitor"

SERVICE_RESTART[0]="-r 2"

 

#SERVICE_NAME[1]="LSNR_RESOURCE"

#SERVICE_CMD[1]="/etc/cmcluster/db1/oracle.sh listener_monitor"

#SERVICE_RESTART[1]="-r 2"

 

function customer_defined_run_cmds

{

        /etc/cmcluster/db1/oracle.sh startup

        test_return 51

}

 

function customer_defined_halt_cmds

{

        /etc/cmcluster/db1/oracle.sh shutdown

        test_return 52

}

 

db1.conf

PACKAGE_NAME            db1  

 

NODE_NAME               db1

NODE_NAME               db2

 

RUN_SCRIPT              /etc/cmcluster/db1/db1.cntl    

RUN_SCRIPT_TIMEOUT              NO_TIMEOUT

HALT_SCRIPT             /etc/cmcluster/db1/db1.cntl    

HALT_SCRIPT_TIMEOUT             NO_TIMEOUT

 

SERVICE_NAME                   DB_RESOURCE

SERVICE_FAIL_FAST_ENABLED      NO                      

SERVICE_HALT_TIMEOUT           30

 

SERVICE_NAME                   LSNR_RESOURCE

SERVICE_FAIL_FAST_ENABLED      NO                      

SERVICE_HALT_TIMEOUT           30

 

SUBNET                  10.68.14.0

 

In addition, a separate oracle.sh was written:

 

#!/usr/bin/sh

ORACLE_HOME=/opt/oracle/products/ora10

SID_NAME=ora10

export ORACLE_HOME

export ORACLE_SID=${SID_NAME}

 

#monitor interval

MONITOR_INTERVAL=10

 

#monitor process

set -A DBA_MONITOR_PROCESSES ora_smon_${SID_NAME} ora_pmon_${SID_NAME} ora_lgwr_${SID_NAME} ora_dbw0_${SID_NAME} ora_ckpt_${SID_NAME} LISTENER

 

function db_shutdown

{

print "Begin listener shutdown at `date`!"

su - oracle -c "${ORACLE_HOME}/bin/lsnrctl" <<EOF
stop
exit
EOF

print "End listener shutdown at `date`!"

 

print "begin instance shutdown at `date`"

su - oracle -c "${ORACLE_HOME}/bin/sqlplus /nolog" <<EOF

connect / as sysdba

shutdown immediate

EOF

print "end instance shutdown at `date`"

}

 

function db_startup

{

print "Begin listener startup at `date`!"

su - oracle -c "${ORACLE_HOME}/bin/lsnrctl" <<EOF
start
exit
EOF

print "Begin instance startup at `date`!"

su - oracle -c "${ORACLE_HOME}/bin/sqlplus /nolog" <<EOF
connect / as sysdba
startup
EOF

print "End oracle startup at `date`."

 

}

 

function db_monitor

{

sleep ${MONITOR_INTERVAL}

typeset -i n=0

 

  for i in ${DBA_MONITOR_PROCESSES[@]}

  do

    DBA_MONITOR_PROCESSES_PID[$n]=`ps -fu oracle | awk '/'${i}'/ { print $2 }'`

    if [[ ${DBA_MONITOR_PROCESSES_PID[$n]} = "" ]]

    then

        print "\n"

        print "\n *** ${i} has failed at startup time, ABORTING Oracle! ***"

        exit 1

    fi

    (( n = n + 1 ))

  done

 

 

  sleep ${MONITOR_INTERVAL}

 set -A MONITOR_PROCESSES_PID ${DBA_MONITOR_PROCESSES_PID[@]}

 

  while true

  do

    for i in ${MONITOR_PROCESSES_PID[@]}

    do

      kill -s 0 ${i} > /dev/null

      if [[ $? != 0 ]]

      then

        print "\n"

        print "\n *** process ${i} has died, ABORTING Oracle! ***"

        exit 1

      fi

    done

 

    #check_ext_proc_lsnr

    sleep ${MONITOR_INTERVAL}

 

  done

}

 

if [[ $# != 1 ]]; then

    print "\n *** ${0} called with an incorrect number of arguments ***\n"

    print "Usage: ${0} [ shutdown | startup | monitor ]"

    print "$#: $@"

    exit 1

fi

 

print "\n *** $0 called with $1 argument! ***\n"

case $1 in

 

  startup)

        db_startup

  ;;

 

  shutdown)

        db_shutdown

  ;;

 

  monitor)

        db_monitor

  ;;

 

  *)

    print "Usage: ${0} [ shutdown | startup | monitor ]"

  ;;

esac

 

 

rcp the three files to node2, then:
cmcheckconf -v -C /etc/cmcluster/mcdb.conf -P /etc/cmcluster/db1/db1.conf

cmapplyconf -v -C /etc/cmcluster/mcdb.conf -P /etc/cmcluster/db1/db1.conf

 

When applying, only the .conf file is distributed automatically; user-defined files, such as the control script (.cntl) and helper scripts, are not, and must be distributed by hand.
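Since only the .conf is distributed automatically, the control script and helpers have to be pushed manually. A hypothetical dry-run helper (file list and node name are assumptions; it only prints the rcp commands rather than executing them, matching the rcp usage earlier in this log):

```shell
# Print the rcp commands needed to push user-defined package files to a node.
# Dry run only: pipe the output to sh (or drop the echo) to actually copy.
PKG_DIR=/etc/cmcluster/db1
NODE=db2
for f in db1.conf db1.cntl oracle.sh; do
    echo "rcp ${PKG_DIR}/${f} ${NODE}:${PKG_DIR}/"
done
```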
 

The auto-switch problem:

On each of the two nodes, use cmmodpkg to fix the package's auto switch setting.

 

My script was fairly rough: partly borrowed, partly improvised. At first Oracle would not start at all; once it finally started, the package died as soon as the script finished. The monitor service had been written as an empty function, so it exited immediately.

I then adapted a monitor routine: essentially an infinite loop that polls process status at a fixed interval.

Later, with the services commented out in the control script, the package could still start; the cause turned out to be that the services had not been removed from the .conf file.
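The monitor pattern that finally worked can be demonstrated with a minimal runnable sketch (a generic sh example, not the ServiceGuard script itself): poll a PID with kill -0 and fall out of the loop once the process disappears.

```shell
# Minimal process-monitor loop: poll a PID until it dies, then report.
# Generic illustration of the db_monitor pattern; interval shortened for demo.
MONITOR_INTERVAL=1

sleep 3 &              # stand-in for an Oracle background process
TARGET_PID=$!

while kill -0 "$TARGET_PID" 2>/dev/null; do
    sleep "$MONITOR_INTERVAL"
done
echo "process $TARGET_PID has exited, the package would now fail over"
```

The loop body is where the real script checks every process in DBA_MONITOR_PROCESSES; exiting the loop (with a non-zero status in the real service) is what tells ServiceGuard the service has failed.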
 

# cmviewcl -v

 

CLUSTER      STATUS      

mcdb         up          

 

  NODE         STATUS       STATE       

  db1          up           running     

 

    Network_Parameters:

    INTERFACE    STATUS       PATH                NAME        

    PRIMARY      up           0/1/2/0             lan0        

    PRIMARY      up           0/1/2/1             lan1        

    STANDBY      up           0/3/1/0             lan2        

 

  NODE         STATUS       STATE       

  db2          up           running     

 

    Network_Parameters:

    INTERFACE    STATUS       PATH                NAME        

    PRIMARY      up           0/1/2/0             lan0        

    PRIMARY      up           0/1/2/1             lan1        

    STANDBY      up           0/3/1/0             lan2        

 

    PACKAGE      STATUS       STATE        AUTO_RUN     NODE        

    db1          up           running      enabled      db2         

 

      Policy_Parameters:

      POLICY_NAME     CONFIGURED_VALUE

      Failover        configured_node

      Failback        manual

 

      Script_Parameters:

      ITEM       STATUS   MAX_RESTARTS  RESTARTS   NAME

      Service    uninitia            0         0   DB_RESOURCE

      Service    uninitia            0         0   LSNR_RESOURCE

      Subnet     up                                10.68.14.0

 

      Node_Switching_Parameters:

      NODE_TYPE    STATUS       SWITCHING    NAME                     

      Primary      up           enabled      db1                      

      Alternate    up           enabled      db2          (current) 

 

 

Why the listener failed to start: listener.ora was copied from node2 to node1 and its IP was never changed.

Also, on both nodes the listener IP should be the VIP; check /etc/hosts.
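So listener.ora on both nodes should bind the package VIP rather than either node's physical address. A hypothetical fragment (SID, ORACLE_HOME, and the VIP 10.68.14.225 are taken from the package configuration above; the port is an assumption):

```
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.68.14.225)(PORT = 1521))
    )
  )

SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (ORACLE_HOME = /opt/oracle/products/ora10)
      (SID_NAME = ora10)
    )
  )
```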

 

Debug the script manually first:

./oracle.sh startup

./oracle.sh monitor

./oracle.sh shutdown

 

A few remaining issues:

1) After manually stopping the DB, the package status is still up.

2) A debug or maintenance mode should be added.
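For issue 2, a common ServiceGuard pattern (shown here as a hypothetical sketch, not part of the original script) is a "pause file": while it exists the monitor skips its checks, so the DB can be stopped by hand for maintenance without the package failing over.

```shell
# Maintenance-mode check: while the pause file exists, skip monitoring.
# Hypothetical paths; create the flag with `touch` before manual DB work.
MAINT_FLAG=/tmp/db1.maint    # demo path; something like /etc/cmcluster/db1/db1.debug in practice

in_maintenance() {
    [ -f "$MAINT_FLAG" ]
}

# Inside the monitor loop one would do:
#   if in_maintenance; then sleep $MONITOR_INTERVAL; continue; fi
touch "$MAINT_FLAG"
in_maintenance && echo "monitoring paused"
rm -f "$MAINT_FLAG"
in_maintenance || echo "monitoring active"
```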

 

 
