Oracle 10g + ASM + RAC Cluster (Linux)

Source: Internet | Published under: EC software usage | Editor: 程序博客网 | Date: 2024/06/04 19:07
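Building an ASM-based RAC on Linux
Key points of a RAC cluster:
    1. Shared storage.
    2. The nodes must talk to each other to keep the cluster running in a coordinated way, so every node needs both a public (external) network and a private (internal) network.
    3. CRS cluster software: clusterware is needed to coordinate the nodes.
    4. Oracle Cluster Registry (OCR): the cluster is registered here; stored on shared disk.
    5. Voting disk: acts as the arbiter the nodes use to decide which of them keeps control; stored on shared disk.
    6. Virtual IP (VIP): used for client connections. The address is managed by the clusterware and becomes reachable once the cluster is up.
    Ways to access shared storage (storage systems):
    1. Cluster file system (CFS)
    2. Automatic Storage Management (ASM)
    3. Network file system (NFS)
    4. Raw devices (RAW)
    Single-node file systems such as FAT32, NTFS and ext3 cannot be used as shared storage.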
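    The cluster in this guide uses the following storage layout:
    Item                  Storage type         Location
    Clusterware software  Local file system    Local disk
    Voting disk           RAW                  Shared disk
    OCR                   RAW                  Shared disk
    Database software     Local file system    Local disk
    Database              ASM                  Shared disk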


Environment: 2 virtual machines (rac1, rac2), each with 2 NICs and 2 GB of memory, plus one 30 GB shared disk.


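Pre-installation preparation:
I. Network and host names (on every node)
 1. Set the NIC IP addresses: vi /etc/sysconfig/network-scripts/ifcfg-eth0 (and ifcfg-eth1 for the private network)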


 2. Edit the hosts file: vi /etc/hosts
#rac1
192.168.56.10  rac1
10.10.10.10   rac1priv
192.168.56.211  rac1vip
#rac2
192.168.56.11  rac2
10.10.10.11   rac2priv
192.168.56.212  rac2vip
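A quick way to confirm that a node's /etc/hosts really contains all six names is sketched below. This is a minimal bash sketch, not part of the original procedure: the function name `check_rac_hosts` and the `RAC_NAMES` variable are hypothetical, and the names are simply copied from the hosts block above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original steps): report any RAC host
# name missing from a hosts file. Names match the /etc/hosts block above.
RAC_NAMES="rac1 rac1priv rac1vip rac2 rac2priv rac2vip"

check_rac_hosts() {
    local hosts_file=${1:-/etc/hosts}
    local name missing=0
    for name in $RAC_NAMES; do
        # Look for the name as a whole field (after the IP) on a non-comment line.
        if ! awk -v n="$name" '
                $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                END { exit !found }' "$hosts_file"; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

Run `check_rac_hosts` (or `check_rac_hosts /path/to/hosts`) on each node; it prints nothing and returns 0 when every name is present.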
 3. Set the host name (HOSTNAME=rac1 or rac2): vi /etc/sysconfig/network


 4. When done, restart the network service: service network restart


II. Disable unneeded services (on every node)
chkconfig autofs off
chkconfig acpid off
chkconfig sendmail off
chkconfig cups-config-daemon off
chkconfig cups off
chkconfig xfs off
chkconfig lm_sensors off
chkconfig gpm off
chkconfig openibd off
chkconfig pcmcia off
chkconfig cpuspeed off
chkconfig nfslock off
chkconfig ip6tables off
chkconfig rpcidmapd off
chkconfig apmd off
chkconfig arptables_jf off
chkconfig microcode_ctl off
chkconfig rpcgssd off
chkconfig ntpd off
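The same list can be driven by a loop that skips services not installed on a given system, so chkconfig does not print errors for them. A minimal bash sketch, assuming `chkconfig --list NAME` fails for an unknown service as it does on RHEL; `disable_services` and `SERVICES` are hypothetical names, and the service list is the one above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disable the services listed above, skipping any that
# are not installed (chkconfig --list exits non-zero for unknown services).
SERVICES="autofs acpid sendmail cups-config-daemon cups xfs lm_sensors gpm
openibd pcmcia cpuspeed nfslock ip6tables rpcidmapd apmd arptables_jf
microcode_ctl rpcgssd ntpd"

disable_services() {
    local svc
    for svc in $SERVICES; do
        if chkconfig --list "$svc" >/dev/null 2>&1; then
            chkconfig "$svc" off && echo "disabled: $svc"
        else
            echo "skipped (not installed): $svc"
        fi
    done
}
```

Run `disable_services` as root on each node; the output shows which services were switched off and which were absent.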
III. Install the system packages Oracle needs (on every node) (list not fully verified)
Install the packages Oracle depends on.
Mount the installation DVD:
[root@node1 ~]# mount /dev/cdrom /mnt
mount: block device /dev/cdrom is write-protected, mounting read-only
Point yum at the DVD:
[root@node1 ~]# vi /etc/yum.repos.d/rhel-debuginfo.repo
[rhel-debuginfo]
name=Red Hat Enterprise Linux $releasever - $basearch - Debug
baseurl=file:///mnt/Server
enabled=1
gpgcheck=0
When done, clear the yum cache:
[root@node1 ~]# yum clean all
Install the packages:
[root@server1 yum.repos.d]# yum install -y lib*


yum install -y  binutils-* libXp*  compat-libstdc++-33-* elfutils-libelf-* elfutils-libelf-devel-* gcc-* gcc-c++-* glibc-* glibc-common-* glibc-devel-* glibc-headers-* ksh-* libaio-* libgcc-* libstdc++-*  make-* sysstat-* unixODBC-*  unixODBC-devel-*
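To see which of these packages are still missing after the yum run, `rpm -q` can be queried per package. A minimal sketch; `missing_packages` and `REQUIRED_PKGS` are hypothetical names, and the list only repeats the packages named in the yum command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every package from the yum command above that
# rpm does not report as installed (rpm -q exits non-zero in that case).
REQUIRED_PKGS="binutils compat-libstdc++-33 elfutils-libelf elfutils-libelf-devel
gcc gcc-c++ glibc glibc-common glibc-devel glibc-headers ksh libaio libgcc
libstdc++ libXp make sysstat unixODBC unixODBC-devel"

missing_packages() {
    local pkg status=0
    for pkg in $REQUIRED_PKGS; do
        if ! rpm -q "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
            status=1
        fi
    done
    return "$status"
}
```

It prints one missing package per line and returns non-zero if anything is missing.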




Packages not covered above can also be installed directly from the DVD with rpm:
mount /dev/cdrom /mnt
cd /mnt/Server
rpm -Uvh compat-db-4*
rpm -Uvh libaio-0*
rpm -Uvh compat-libstdc++-33-3*
rpm -Uvh compat-gcc-34-3*
rpm -Uvh compat-gcc-34-c++-3*
rpm -Uvh libXp-1*
rpm -Uvh openmotif-2*
rpm -Uvh gcc-4*
rpm -Uvh glibc-2.5-12.i686.rpm


IV. Create the oracle user and the dba group on every RAC node (use the same UID and GID on all nodes)
groupadd -g 1100 dba
useradd -u 1000 -g dba oracle
passwd oracle


V. Configure SSH user equivalence (run on every machine)
su - oracle


/usr/bin/ssh-keygen -t rsa
/usr/bin/ssh-keygen -t dsa


On the second node:
cd .ssh
scp id_rsa.pub rac1:/home/oracle/.ssh/id_rsa.pub2
scp id_dsa.pub rac1:/home/oracle/.ssh/id_dsa.pub2


On the first node:
cd .ssh
cat id_dsa.pub id_dsa.pub2 id_rsa.pub id_rsa.pub2 > authorized_keys
chmod 644 authorized_keys
scp authorized_keys rac2:/home/oracle/.ssh


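Note: the first time you ssh to a remote host, its RSA key is unknown, so you are asked to confirm that you want to connect. SSH records the host's key and does not ask again on later connections.
On every machine, log in as oracle and run: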
ssh rac1 date
ssh rac1priv date
ssh rac2 date
ssh rac2priv date
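Once the host keys have been accepted interactively as above, the equivalence can be rechecked without any prompts. A minimal sketch, assuming OpenSSH, where `-o BatchMode=yes` makes ssh fail instead of asking for a password; `check_ssh_equivalence` and `RAC_SSH_HOSTS` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify password-less ssh to every public/private name.
# BatchMode=yes makes ssh fail rather than prompt, so a prompt shows up as FAILED.
RAC_SSH_HOSTS="rac1 rac1priv rac2 rac2priv"

check_ssh_equivalence() {
    local host failed=0
    for host in $RAC_SSH_HOSTS; do
        if ssh -o BatchMode=yes "$host" date >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            failed=1
        fi
    done
    return "$failed"
}
```

Run it as oracle on each node; every line should start with `ok:`.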


VI. Tune system parameters


1. Edit the kernel parameters: vi /etc/sysctl.conf (as root)
kernel.core_uses_pid = 1
fs.file-max = 65536
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 1024 65000
net.core.rmem_default = 1048576
net.core.rmem_max = 1048576
net.core.wmem_default = 262144
net.core.wmem_max = 262144
kernel.shmmni = 4096
kernel.sem = 500 64000 100 128
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1


Apply the settings with: sysctl -p
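To confirm that the running kernel actually picked up the values, each key in sysctl.conf can be compared with `sysctl -n`. A minimal sketch; `check_sysctl` is a hypothetical name, and it assumes simple `key = value` lines as in the file above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare sysctl.conf with the running kernel values.
# sysctl -n prints multi-value keys (e.g. kernel.sem) tab-separated, so all
# whitespace is squeezed to single spaces before comparing.
check_sysctl() {
    local conf=${1:-/etc/sysctl.conf}
    local key want have status=0
    while IFS='=' read -r key want || [ -n "$key" ]; do
        key=$(echo "$key" | tr -d '[:space:]')
        case "$key" in ''|'#'*) continue ;; esac   # skip blanks and comments
        want=$(echo "$want" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
        have=$(sysctl -n "$key" 2>/dev/null | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//')
        if [ "$want" != "$have" ]; then
            echo "$key: expected '$want', running '$have'"
            status=1
        fi
    done < "$conf"
    return "$status"
}
```

After `sysctl -p`, `check_sysctl` should print nothing; any line it prints names a key whose running value differs from the file.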


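2. Edit /etc/profile (vi /etc/profile) and add the following:
if [ "$USER" = "oracle" ]; then
    if [ "$SHELL" = "/bin/ksh" ]; then
        ulimit -p 16384
        ulimit -n 65536
    else
        ulimit -u 16384 -n 65536
    fi
fi
Afterwards, log in again as oracle and check the new limits with ulimit -u and ulimit -n.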


3. Append the following to /etc/csh.login (vi /etc/csh.login):
if ( $USER == "oracle" ) then
limit maxproc 16384
limit descriptors 65536
umask 022
endif
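4. Set the user limits: vi /etc/security/limits.conf
Each soft limit must not exceed its hard limit; the nofile values match the 65536 used in /etc/profile.
oracle  soft    nofile  65536
oracle  hard    nofile  65536
oracle  soft    nproc   10240
oracle  hard    nproc   16384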
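A soft limit above its hard limit is invalid, so limits.conf is worth checking mechanically. A minimal sketch in bash with awk; `check_limits` is a hypothetical name, and it only understands the four-column `domain type item value` layout used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flag any domain/item whose soft limit exceeds its hard
# limit. A type of "-" sets both; "unlimited" counts as the largest value.
check_limits() {
    awk '
        /^[[:space:]]*#/ || NF < 4 { next }
        {
            key = $1 " " $3
            seen[key] = 1
            if ($2 == "-") { soft[key] = $4; hard[key] = $4 }
            else if ($2 == "soft") soft[key] = $4
            else if ($2 == "hard") hard[key] = $4
        }
        END {
            bad = 0
            for (k in seen) {
                if (!(k in soft) || !(k in hard)) continue
                s = soft[k]; h = hard[k]
                if (h == "unlimited") continue
                if (s == "unlimited" || s + 0 > h + 0) {
                    printf "soft > hard for %s: %s > %s\n", k, s, h
                    bad = 1
                }
            }
            exit bad
        }' "${1:-/etc/security/limits.conf}"
}
```

Run against a file with soft nofile 655360 and hard nofile 635360, it reports the nofile line; with the values above it prints nothing.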








VII. Set the oracle user's environment variables: vi /home/oracle/.bash_profile
export ORACLE_BASE=/oracle/app/oracle
export ORACLE_HOME=$ORACLE_BASE/product/10.2/db_1
export ORA_CRS_HOME=$ORACLE_BASE/product/10.2/crs
export ORACLE_SID=test1      # use test2 on rac2
export PATH=$ORACLE_HOME/bin:$ORA_CRS_HOME/bin:$PATH
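Because ORACLE_SID is the only line that differs between the nodes, the profile could pick it from the host name instead, so one file works on both. A sketch, assuming the short host names rac1 and rac2 and the test1/test2 mapping above; `oracle_sid_for_host` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical addition to ~/.bash_profile: derive ORACLE_SID from the short
# host name (rac1 -> test1, rac2 -> test2), so the file can be copied as-is.
oracle_sid_for_host() {
    case "$1" in
        rac1) echo test1 ;;
        rac2) echo test2 ;;
        *) return 1 ;;
    esac
}

# ${HOSTNAME%%.*} strips any domain part; nothing is exported on other hosts.
if sid=$(oracle_sid_for_host "${HOSTNAME%%.*}"); then
    export ORACLE_SID=$sid
fi
```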


VIII. Create the /oracle directory and give the oracle user ownership of it
mkdir /oracle
chown -R oracle:dba /oracle
chmod -R 755 /oracle




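IX. Configure the hangcheck timer (optional; you can skip it and instead set the clock of the node you install from slightly behind the other nodes)
vi /etc/rc.local
Add:
modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
After adding this line, remember to reboot or run the command by hand so that it takes effect!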


To load the module immediately, run:


modprobe -v hangcheck-timer hangcheck_tick=30 hangcheck_margin=180


Check that it loaded; output like the following means success:


lsmod | grep hangcheck_timer


hangcheck_timer         8153  0


------------------- How to change the clock (make node 1 slower than node 2)
date -s 13:00:00      (example: set the time to exactly 13:00)
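To see how far apart the node clocks actually are before and after changing them, the remote time can be compared with the local time over ssh. A minimal sketch relying on the ssh equivalence from step V; `clock_skew` is a hypothetical name, and the ssh round trip adds up to about a second of error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many seconds the remote node's clock is ahead
# of the local one (negative = remote is behind). Needs password-less ssh.
clock_skew() {
    local remote_now local_now
    remote_now=$(ssh -o BatchMode=yes "$1" date +%s) || return 1
    local_now=$(date +%s)
    echo $(( remote_now - local_now ))
}
```

For example, run `clock_skew rac2` on rac1: a positive number means rac1 is behind rac2 by roughly that many seconds.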


X. Partition the shared disk


1. Check the disk information: fdisk -l
Disk /dev/sda: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          13      104391   83  Linux
/dev/sda2              14         535     4192965   82  Linux swap / Solaris
/dev/sda3             536        3916    27157882+  83  Linux


Disk /dev/sdb: 32.2 GB, 32212254720 bytes
255 heads, 63 sectors/track, 3916 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdb doesn't contain a valid partition table
2. Partition /dev/sdb (two 400 MB partitions for the OCR and the voting disk, the rest for ASM):
fdisk /dev/sdb
 Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.




The number of cylinders for this disk is set to 3916.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)


Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-3916, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-3916, default 3916): +400M   


Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (51-3916, default 51): 
Using default value 51
Last cylinder or +size or +sizeM or +sizeK (51-3916, default 3916): +400M


Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 3
First cylinder (101-3916, default 101): 
Using default value 101
Last cylinder or +size or +sizeM or +sizeK (101-3916, default 3916): 
Using default value 3916


Command (m for help): w
The partition table has been altered!


Calling ioctl() to re-read partition table.
Syncing disks.


 




Map the partitions to raw devices (on both nodes)
Node 1:
cd  /etc/udev/rules.d
vi 60-raw.rules
Add the following:
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sdb3", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="raw[1-3]",OWNER="oracle",GROUP="dba",MODE="660"


Save the file, then apply it with: start_udev
Verify with raw -qa:
[root@rac1 rules.d]# raw -qa
/dev/raw/raw1: bound to major 8, minor 17
/dev/raw/raw2: bound to major 8, minor 18
/dev/raw/raw3: bound to major 8, minor 19
Node 2:
cd  /etc/udev/rules.d
vi 60-raw.rules
Add the following:
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"
ACTION=="add", KERNEL=="sdb3", RUN+="/bin/raw /dev/raw/raw3 %N"
ACTION=="add", KERNEL=="raw[1-3]",OWNER="oracle",GROUP="dba",MODE="660"


partprobe     # re-read the partition table
start_udev    # apply the udev rules
raw -qa       # check the bindings
[root@rac2 rules.d]# raw -qa
/dev/raw/raw1: bound to major 8, minor 17
/dev/raw/raw2: bound to major 8, minor 18
/dev/raw/raw3: bound to major 8, minor 19
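The `raw -qa` check can be scripted so it is easy to repeat on both nodes and after reboots. A minimal sketch that parses the output format shown above; `check_raw_bindings` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm raw1..raw3 appear as bound in "raw -qa" output
# ("/dev/raw/rawN: bound to major M, minor m").
check_raw_bindings() {
    local out n status=0
    out=$(raw -qa) || return 1
    for n in 1 2 3; do
        if ! printf '%s\n' "$out" | grep -q "^/dev/raw/raw$n: bound to major"; then
            echo "raw$n is not bound"
            status=1
        fi
    done
    return "$status"
}
```

Ownership from the udev rule (oracle:dba, mode 660) can be checked separately with `ls -l /dev/raw`.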


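XI. Upload the installation software to /oracle
      Unpack it:
            gunzip *.gz
            for f in *.cpio; do cpio -idcmv < "$f"; done
            unzip '*.zip'      (quoted, so unzip expands the pattern itself and unpacks every archive)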


chown -R oracle:dba /oracle
chmod -R 755 /oracle


Installation:
I. Install the clusterware
Run the installer as the oracle user.

As root, run the rootpre.sh script on both nodes:
[root@rac1 ~]# cd /oracle/clusterware/
[root@rac1 clusterware]# cd rootpre/
[root@rac1 rootpre]# ls
rootpre.sh
[root@rac1 rootpre]# ./rootpre.sh
No OraCM running 
[root@rac1 rootpre]# scp rootpre.sh rac2:/oracle
The authenticity of host 'rac2 (192.168.56.30)' can't be established.
RSA key fingerprint is c6:99:59:37:f5:e5:0d:9e:c6:72:18:ab:1c:2a:46:19.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'rac2,192.168.56.30' (RSA) to the list of known hosts.
root@rac2's password: 
rootpre.sh                                                            100% 2981     2.9KB/s   00:00    


 
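When it has finished, answer y at the installer prompt asking whether rootpre.sh has been run.

The installer screen then appears, and the installation can begin.

Ignore the warnings.

Run the following scripts on each node. Finish a script on one node before running it on the other node; never run them on both nodes at the same time.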
/oracle/app/oracle/oraInventory/orainstRoot.sh


/oracle/app/oracle/product/10.2/crs/root.sh
The second script takes a long time to run; be patient!




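On the second node, the second script (root.sh) fails with an error.

This is expected: it is a known bug when installing Oracle 10g on Red Hat 5.
Fixes: 1. Upgrade (patch) the clusterware directly.
       2. Edit vipca.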


We use the second method:
cd /oracle/app/oracle/product/10.2/crs/bin
vi vipca
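Add the line unset LD_ASSUME_KERNEL to it.
 (I put this line in the wrong place the first time, which cost me a reinstall.)
Correct placement: on a new line right after the fi that closes the block setting and exporting LD_ASSUME_KERNEL.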
 
As the oracle user, run oifcfg iflist:


[oracle@rac2 ~]$ oifcfg iflist
eth0  192.168.56.0
eth1  10.10.10.0




Then, as root, run the following from /oracle/app/oracle/product/10.2/crs/bin:


./oifcfg setif -global eth0/192.168.56.0:public 
./oifcfg setif -global eth1/10.10.10.0:cluster_interconnect




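Then run ./vipca again: it reports an error.

Clicking Next anyway still gives an error.

My guess was that the vipca services had not started.

So I backed out and started over. Applying the clusterware patch set would also have fixed it, but I chose to reinstall.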


Removing the clusterware:
1. As root: cd $ORA_CRS_HOME/install
     and run ./rootdelete.sh (on every node),
     then ./rootdeinstall.sh (on the local node only)


2. Stop the Nodeapps on all nodes:


srvctl stop nodeapps -n <node name>


3. rm -f /etc/init.d/init.cssd
   rm -f /etc/init.d/init.crs
   rm -f /etc/init.d/init.crsd
   rm -f /etc/init.d/init.evmd
   rm -f /etc/rc2.d/K96init.crs
   rm -f /etc/rc2.d/S96init.crs
   rm -f /etc/rc3.d/K96init.crs
   rm -f /etc/rc3.d/S96init.crs
   rm -f /etc/rc5.d/K96init.crs
   rm -f /etc/rc5.d/S96init.crs
   rm -Rf /etc/oracle/scls_scr
   rm -f /etc/inittab.crs
   cp /etc/inittab.orig /etc/inittab
4. rm -rf <CRS Install Location>/*




5. dd if=/dev/zero of=/dev/raw/raw1 bs=8192 count=2560
   dd if=/dev/zero of=/dev/raw/raw2 bs=8192 count=12800
6. Reboot after the cleanup.










------ Second installation: this time only one script has to be run at the end....


 


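The same error appeared again.

Edited vipca again, correctly this time.
I had put the line in the wrong place before, no wonder it failed. Be careful here!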
When it has finished, just click OK.

Install the Oracle database software
Run /oracle/app/oracle/product/10.2/db_1/root.sh as root on each node.
Create the listener as the oracle user:
netca
Done.






Create the database as the oracle user: dbca
dbca
Installation complete.


------------------------- Upgrading the cluster database ---------------
1. Upgrade the clusterware first
[oracle@rac1 oracle]$ cd Disk1
[oracle@rac1 Disk1]$ ls
install  patch_note.htm  response  runInstaller  stage
[oracle@rac1 Disk1]$ ./runInstaller
Run these two commands on each node (as root):
[root@rac1 bin]# /oracle/app/oracle/product/10.2/crs/bin/crsctl stop crs    # stop the clusterware
Stopping resources.
Successfully stopped CRS resources 
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.
[root@rac1 bin]# /oracle/app/oracle/product/10.2/crs/install/root102.sh    # patch the clusterware and start it again
Creating pre-patch directory for saving pre-patch clusterware files
Completed patching clusterware files to /oracle/app/oracle/product/10.2/crs
Relinking some shared libraries.
Relinking of patched files is complete.
WARNING: directory '/oracle/app/oracle/product/10.2' is not owned by root
WARNING: directory '/oracle/app/oracle/product' is not owned by root
WARNING: directory '/oracle/app/oracle' is not owned by root
WARNING: directory '/oracle/app' is not owned by root
WARNING: directory '/oracle' is not owned by root
Preparing to recopy patched init and RC scripts.
Recopying init and RC scripts.
Startup will be queued to init within 30 seconds.
Starting up the CRS daemons.
Waiting for the patched CRS daemons to start.
  This may take a while on some systems.
.
10205 patch successfully applied.
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 1 values from OCR.
Successfully deleted 1 keys from OCR.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: rac1 rac1priv rac1
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
clscfg -upgrade completed successfully
Creating '/oracle/app/oracle/product/10.2/crs/install/paramfile.crs' with data used for CRS configuration
Setting CRS configuration values in /oracle/app/oracle/product/10.2/crs/install/paramfile.crs




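-- Upgrade the Oracle database software
The database patch is packaged together with the clusterware patch in the same patch set, so just run the same installer again.

Error: this happened because I had not yet shut down the database and the listener.
So we simply stop the clusterware instead (on both nodes, as root):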
/oracle/app/oracle/product/10.2/crs/bin/crsctl stop crs
Then run the script /oracle/app/oracle/product/10.2/db_1/root.sh (as root).
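--- Upgrade the database
1. First bring the clusterware back up (on every node):
[root@rac2 /]# cd /etc/init.d
[root@rac2 init.d]# ./init.crs start
Startup will be queued to init within 30 seconds.
2. Check the alert log:
cd $ORACLE_BASE/admin/test/bdump
tail -f alert_test2.log
The log shows that the database will not start normally; it has to be started in upgrade mode (startup upgrade).
3. Create a pfile (text parameter file)
① First look at the existing parameter file:
cd $ORACLE_HOME/dbs
  Its content:
       SPFILE='+DATADG/test/spfiletest.ora'
② In sqlplus, create a pfile from that spfile and save it under /home/oracle: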


SQL> create pfile='/home/oracle/a.txt' from SPFILE='+DATADG/test/spfiletest.ora';




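③ Edit the pfile just created (a.txt):
Comment out cluster_database, so the parameter falls back to its default (false) and the database is opened exclusively by this one instance, as an upgrade requires.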
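The same edit can be done with sed instead of vi. A minimal sketch, assuming GNU sed (`-i`) and a pfile generated from the spfile, where the entry appears as `*.cluster_database=true`; `disable_cluster_database` is a hypothetical name. It leaves `cluster_database_instances` untouched and keeps a `.bak` copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out the cluster_database entry in a pfile.
# "#" starts a comment in an init.ora-style pfile. The trailing "=" in the
# pattern keeps cluster_database_instances from matching.
disable_cluster_database() {
    local pfile=${1:-/home/oracle/a.txt}
    cp -p "$pfile" "$pfile.bak"    # keep a copy of the original pfile
    sed -i 's/^\([^#]*cluster_database[[:space:]]*=\)/#\1/' "$pfile"
}
```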
④ Start the database in upgrade mode:
SQL> startup upgrade pfile='/home/oracle/a.txt';
ORACLE instance started.


Total System Global Area  599785472 bytes
Fixed Size    2098112 bytes
Variable Size  163580992 bytes
Database Buffers  427819008 bytes
Redo Buffers    6287360 bytes
Database mounted.
Database opened.




SQL> spool /home/oracle/upgrd.log        (start an upgrade log)
SQL> @?/rdbms/admin/catupgrd.sql         (run the upgrade)
SQL> @?/rdbms/admin/utlrp.sql            (recompile invalid objects)
SQL> spool off
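The same sequence can be run non-interactively from the shell with a here-document. This is a sketch under stated assumptions: it mirrors the order used above (catupgrd.sql, then utlrp.sql, then shutdown immediate), assumes the oracle user's environment points at the local instance and that `sqlplus / as sysdba` works through OS authentication; `run_catupgrd` is a hypothetical name. Check the patch set readme for the exact post-upgrade steps of your version.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: feed the upgrade steps to sqlplus via a here-document.
# Arguments: pfile with cluster_database commented out, and the spool log path.
run_catupgrd() {
    local pfile=${1:-/home/oracle/a.txt}
    local log=${2:-/home/oracle/upgrd.log}
    # "?" is expanded by SQL*Plus to ORACLE_HOME; the shell leaves it alone here.
    sqlplus / as sysdba <<EOF
startup upgrade pfile='$pfile'
spool $log
@?/rdbms/admin/catupgrd.sql
@?/rdbms/admin/utlrp.sql
spool off
shutdown immediate
exit
EOF
}
```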




After the upgrade, be sure to shut the database down cleanly:
shutdown immediate


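Then start it again normally with startup (without the pfile, so the original spfile and its cluster_database setting are used again).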