Disaster-recovery switchover between two RAC clusters, plus changing the virtual IPs and SCAN IP of the DR RAC

Source: Internet. Published under: linux standard output redirection. Editor: 程序博客网. Time: 2024/05/18 01:12

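Background: the hospital's production database servers currently run an 11.2.0.4 RAC whose storage sits on an SVC storage-virtualization device; this is referred to below as the primary RAC. A standby set of database servers also runs an 11.2.0.4 RAC, again on SVC storage; this is referred to below as the DR RAC. The hospital wants the DR servers to take over promptly if the primary RAC crashes or its storage fails, so that clients can resume work within a short time, without any configuration change and without data loss.
Key concepts: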
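SVC: IBM's SVC_v7000 is software that provides disaster recovery at the storage layer. Storage managed by it is organized as primary and secondary volumes; the two can be configured to keep their data consistent, and a switchover controls whether the servers currently use the primary volume or the secondary volume.
VIP: a virtual IP is what client applications connect to, and it supports failover. Put simply, when one node dies another takes over automatically and the client notices nothing. While all nodes are running normally, each node's VIP is bound to its public NIC, so ifconfig on Linux shows two IP addresses on the public NIC. If a node goes down, its VIP moves to a node that is still running, so the surviving node's public NIC then carries three IP addresses.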
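One way to confirm the VIP placement described above is to count the IPv4 addresses on the public interface. The helper below is only a sketch and is not part of the original procedure: the name count_ipv4 is made up for illustration, eth0 is assumed to be the public NIC, and the input is assumed to be the one-line-per-address format printed by ip -4 -o addr show.

```shell
#!/bin/sh
# count_ipv4 IFACE (hypothetical helper): read `ip -4 -o addr show` output
# on stdin and print how many IPv4 addresses are bound to IFACE.
# In the -o format, field 2 is the interface name and field 3 is "inet".
count_ipv4() {
    awk -v ifc="$1" '$2 == ifc && $3 == "inet" { n++ } END { print n + 0 }'
}
```

On a node this could be run as: ip -4 -o addr show | count_ipv4 eth0. Note that whichever node currently hosts the SCAN VIP would show one additional address on the same interface.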
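Core task: for the cluster and database to start normally after switching from the primary RAC to the DR RAC, and for clients to keep working without any configuration change, the virtual IPs on the primary and DR RACs must be identical. As a consequence, the two RACs cannot both be powered on at the same time.
After the previous day's database checks, cluster checks, and discussions with the users, it was clear that testing this capability requires changing the virtual IPs of the DR servers. Because a SCAN IP is also configured, two things have to be changed this time: the SCAN IP and the VIPs.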
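Steps for changing the virtual IPs:
(1) Manually back up the OCR of the primary RAC so that it can be restored if a change goes wrong; do this on both nodes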
[root@gddb1 bin]# ./ocrconfig -showbackup
[root@gddb1 bin]# ./ocrconfig -manualbackup
[root@gddb2 bin]# ./ocrconfig -manualbackup
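Before moving on it can help to confirm that the manual backup really exists. The sketch below picks the newest manual backup out of the ocrconfig -showbackup listing. It assumes each listing line has the form "node date time path" and that manual backups are named backup_YYYYMMDD_HHMMSS.ocr; both the function name and that assumed format are illustrative and should be checked against the real output.

```shell
#!/bin/sh
# latest_manual_backup (hypothetical helper): read `ocrconfig -showbackup`
# output on stdin and print the path of the most recent manual backup.
# Assumed line format: <node> <YYYY/MM/DD> <HH:MM:SS> <path>
latest_manual_backup() {
    awk '$NF ~ /backup_[0-9]+_[0-9]+\.ocr$/ { print $2, $3, $NF }' |
        sort | tail -n 1 | awk '{ print $3 }'
}
```

A possible use: ./ocrconfig -showbackup | latest_manual_backup, which prints nothing when no manual backup is listed.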
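(2) Stop all CRS resources, leaving only the CRS background processes running
--Stop the related resources. These are cluster-wide stops, so stopping from one node stops both sides (to be sure, the commands were run once on each node at the time):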
[root@gddb1 ~]# cd /u01/app/11.2.0/grid_1/bin/
[root@gddb1 bin]# ./srvctl stop database -d test
[root@gddb1 bin]# ./srvctl stop listener
[root@gddb1 bin]# ./srvctl stop scan_listener
[root@gddb1 bin]# ./srvctl stop scan
[root@gddb1 bin]# ./srvctl stop cvu
[root@gddb1 bin]# ./srvctl stop nodeapps -n gddb1
[root@gddb1 bin]# ./srvctl stop nodeapps -n gddb2
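The stop order above can be captured as a small dry-run helper, so the sequence can be reviewed before anything is executed. This is a sketch only: print_stop_plan is a hypothetical name, and it merely prints the same srvctl commands used above for a given database and node list.

```shell
#!/bin/sh
# print_stop_plan DB NODE... (hypothetical helper): print, without running
# anything, the srvctl stop commands from step (2) in the order used above.
print_stop_plan() {
    db=$1
    shift
    echo "srvctl stop database -d $db"
    echo "srvctl stop listener"
    echo "srvctl stop scan_listener"
    echo "srvctl stop scan"
    echo "srvctl stop cvu"
    # nodeapps are stopped last, one node at a time
    for node in "$@"; do
        echo "srvctl stop nodeapps -n $node"
    done
}
```

For this cluster the call would be print_stop_plan test gddb1 gddb2. Once the printed plan looks right, it could be piped to sh -x as root with the Grid bin directory on PATH.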

(3) Update the IPs in the operating system hosts file, including the SCAN IP; do this on both nodes
192.168.0.252 gddb1-vip
192.168.0.254 gddb2-vip
192.168.0.53 node-scan
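Since both nodes must carry identical entries, a quick consistency check of the hosts file on each node may save a failed restart later. The sketch below hard-codes the three mappings listed above; the function names hosts_ip and check_hosts are illustrative, not an existing tool.

```shell
#!/bin/sh
# hosts_ip FILE NAME (hypothetical helper): print the first IP that FILE
# maps NAME to, ignoring anything after a '#'.
hosts_ip() {
    awk -v name="$2" '
        { sub(/#.*/, "") }
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "$1"
}

# check_hosts FILE: compare FILE with the three mappings from step (3).
# Prints each mismatch and returns non-zero if any is found.
check_hosts() {
    rc=0
    while read -r ip name; do
        got=$(hosts_ip "$1" "$name")
        if [ "$got" != "$ip" ]; then
            echo "MISMATCH $name: expected $ip, found ${got:-nothing}"
            rc=1
        fi
    done <<EOF
192.168.0.252 gddb1-vip
192.168.0.254 gddb2-vip
192.168.0.53 node-scan
EOF
    return $rc
}
```

Running check_hosts /etc/hosts on gddb1 and on gddb2 should print nothing and return 0 when both files match the intended addresses.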

------------------------------ The following steps are done on one node only ------------------------------
(1) Modify the VIP addresses; run as root on one node
[root@gddb1 bin]# ./srvctl modify nodeapps -n gddb1 -A 192.168.0.252/255.255.255.0/eth0
[root@gddb1 bin]# ./srvctl modify nodeapps -n gddb2 -A 192.168.0.254/255.255.255.0/eth0
--Check on both nodes that the VIP addresses were changed successfully
[root@gddb1 bin]# ./srvctl config nodeapps -a
Network exists: 1/192.168.0.0/255.255.255.0/eth0, type static
VIP exists: /gddb1-vip/192.168.0.252/192.168.0.0/255.255.255.0/eth0, hosting node gddb1
VIP exists: /gddb2-vip/192.168.0.254/192.168.0.0/255.255.255.0/eth0, hosting node gddb2
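The check above can be reduced to name/IP pairs that are easy to compare between the two nodes. This sketch relies only on the "VIP exists:" line format shown in the output above; list_vips is a hypothetical helper name.

```shell
#!/bin/sh
# list_vips (hypothetical helper): read `srvctl config nodeapps -a` output
# on stdin and print "<vip-name> <vip-address>" for every "VIP exists:" line.
# Splitting on '/', field 2 is the VIP name and field 3 its address.
list_vips() {
    awk -F'/' '/^VIP exists:/ { print $2, $3 }'
}
```

For example, ./srvctl config nodeapps -a | list_vips on each node, followed by a comparison of the two results against the hosts file entries.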

(2) Change the SCAN IP
[root@node1 bin]# ./srvctl config scan
SCAN name: rac-scan, Network: 1/192.168.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /node-scan/192.168.0.54
[root@node1 bin]# ./srvctl stop scan_listener
PRCC-1016 : LISTENER_SCAN1 was already stopped
PRCR-1005 : Resource ora.LISTENER_SCAN1.lsnr is already stopped

[root@node1 bin]# ./srvctl stop scan
PRCC-1016 : scan1 was already stopped
PRCR-1005 : Resource ora.scan1.vip is already stopped

[root@node1 bin]# ./srvctl status scan
SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running
[root@node1 bin]# ./srvctl modify scan -n node-scan
[root@node1 bin]# ./srvctl config scan
SCAN name: rac-scan, Network: 1/192.168.0.0/255.255.255.0/eth0
SCAN VIP name: scan1, IP: /rac-scan/192.168.0.54
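After srvctl modify scan it is worth comparing the address that Clusterware now reports with the one intended in the hosts file. The following is a minimal sketch based on the "SCAN VIP name:" line format shown above; scan_ips and scan_matches are hypothetical helper names.

```shell
#!/bin/sh
# scan_ips (hypothetical helper): read `srvctl config scan` output on stdin
# and print the IP address from each "SCAN VIP name:" line.
scan_ips() {
    sed -n 's|^SCAN VIP name: .*, IP: /[^/]*/\(.*\)$|\1|p'
}

# scan_matches EXPECTED_IP: succeed only if at least one SCAN VIP is read
# from stdin and every one of them has the address EXPECTED_IP.
scan_matches() {
    ips=$(scan_ips)
    [ -n "$ips" ] || return 1
    for ip in $ips; do
        [ "$ip" = "$1" ] || return 1
    done
}
```

A possible check: ./srvctl config scan | scan_matches 192.168.0.53 || echo "SCAN IP differs from hosts entry".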

Problem encountered:
SQL> startup
ORA-00119: invalid specification for system parameter REMOTE_LISTENER
ORA-00132: syntax error or unresolved network name 'node-scan:1521'

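Fix: change the SCAN name, and the host name used in the connect-string configuration (tnsnames.ora) on both nodes, to node-scan. The parameter file also references this name through REMOTE_LISTENER, so the names must be kept consistent.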

[grid@gddb1 admin]$ pwd
/u01/app/oracle/product/11.2.0/db_1/network/admin
[grid@gddb1 admin]$ cat tnsnames.ora
# tnsnames.ora Network Configuration File: /u01/app/oracle/product/11.2.0/db_1/network/admin/tnsnames.ora
# Generated by Oracle configuration tools.

TEST =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node-scan)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = test)
    )
  )
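Because the ORA-00132 error comes from a name mismatch, listing the host names that tnsnames.ora actually uses makes the comparison with REMOTE_LISTENER straightforward. The helper below is a sketch (tns_hosts is a made-up name) and assumes a grep that supports the -o option, as GNU and BSD grep do.

```shell
#!/bin/sh
# tns_hosts (hypothetical helper): read a tnsnames.ora file on stdin and
# print every HOST value it contains, one per line.
tns_hosts() {
    grep -oiE 'HOST *= *[^) ]+' | sed 's/^[^=]*= *//'
}
```

For example, tns_hosts < tnsnames.ora | sort -u should print only node-scan on both nodes once the fix above has been applied.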

----------------------------------------------------------------------------------------------------
(3) Start CRS manually
[root@gddb1 bin]# ./crsctl enable crs
[root@gddb1 bin]# ./crsctl start crs
[root@gddb2 bin]# ./crsctl enable crs
[root@gddb2 bin]# ./crsctl start crs

./srvctl start nodeapps -n gddb1
./srvctl start nodeapps -n gddb2

(4) Finally, restart the whole cluster:
Run as root on both nodes:
./crsctl stop crs
./crsctl start crs
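crsctl start crs returns before the whole stack is up, so a short polling loop can help before starting the instances. The sketch below is a generic retry helper plus a check for message CRS-4537 ("Cluster Ready Services is online") in crsctl check crs output; the function names, retry counts, and delays are assumptions.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD... (hypothetical helper): run CMD up to TRIES
# times, sleeping DELAY seconds between attempts; return 0 as soon as CMD
# succeeds, or 1 if it never does.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        [ "$i" -lt "$tries" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# crs_online: succeed if `crsctl check crs` output read from stdin contains
# message CRS-4537 (Cluster Ready Services is online).
crs_online() {
    grep -q 'CRS-4537'
}
```

One possible invocation from the Grid bin directory: wait_until 30 10 sh -c './crsctl check crs | grep -q CRS-4537', which polls every 10 seconds for up to 30 attempts.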

--Start the instances:
srvctl start instance -d test -i test2

srvctl start instance -d test -i test1

(5) Other useful commands:
crs_start -all --start any other resources that have not yet started

0 0