Implementing Highly Available NFS Shared Directories

Source: Internet | Editor: 程序博客网 | Time: 2024/04/28 13:45

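Requirements

New solutions grow out of real needs. For anyone running a platform, the key question is how to offer users cluster resources that are stable, secure, and efficient. This change brings the platform's software installation path and the users' home directories under unified management, so that users can log in to the platform around the clock (7×24).
The platform uses OpenLDAP for unified user management. Since it replaced NIS, it has run without a single outage, so it has proven reasonably stable and dependable. Managing users centrally through OpenLDAP requires every node in the platform to share a single /home directory.
In addition, all software on the platform is installed as modules, so versions can be loaded and switched flexibly. This also requires every node to share the same storage path, /opt. Without shared storage, the directories would have to be synchronized periodically to keep their contents identical on all nodes, which makes management and maintenance considerably more expensive.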

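Solution

Stability requires redundant nodes. On the hardware side, this is handled by a disk array with dual controllers and multipath configured. There are at least two front-end nodes, each acting as a backup for the other. On the software side, Corosync+Pacemaker provides the HA solution.

Getting Started

Configuring the Storage and Servers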

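1. Hardware configuration
One DELL MD-series disk array and two R720 servers
Array interface: 12Gb SAS
Disks: 12 × 4T 7.2k NL-SAS
RAID 6 in a 9+2 layout (9 data disks plus 2 parity disks), with 1 hot spare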
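
As a quick sanity check on this layout, the raw data capacity can be estimated from the disk count. The sketch below assumes that RAID 6 always costs two disks' worth of parity, and it ignores formatting overhead and the difference between decimal and binary terabytes. The function name raid6_usable_tb is made up for illustration.

```shell
# Estimate raw RAID 6 data capacity in whole terabytes (illustrative helper).
# Arguments: total disks, hot spares, size of one disk in TB.
raid6_usable_tb() (
    total=$1
    spares=$2
    size_tb=$3
    # RAID 6 stores two disks' worth of parity; hot spares hold no data.
    data_disks=$(( total - spares - 2 ))
    echo $(( data_disks * size_tb ))
)

# 12 disks, 1 hot spare, 4 TB each (the layout described above)
raid6_usable_tb 12 1 4
```

For the array above this gives 9 data disks × 4 TB = 36 TB before any overhead, which is the space the two VDs are carved from.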

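2. IP address layout
The array has two management IPs: 192.168.242.38/192.168.242.39
Server IPs: 192.168.242.3/192.168.242.4
HOSTNAME: hpc-242-003/hpc-242-004
Virtual IP: 192.168.242.40
Heartbeat addresses: 10.0.0.3/10.0.0.4
A single network cable connects one NIC on each server directly to the other

3. Storage mapping
The RAID 6 group made up of 11 disks is split into 2 VDs (virtual disks), used for the /opt and /home directories respectively. Both VDs must be mapped to both servers at the same time. The end result looks like this: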

[root@hpc-242-003 ~]# multipath -ll
3600a0980006e2e77000001a3563a2f05 dm-1 DELL,MD34xx
size=5.0T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 7:0:0:1 sdc 8:32 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:1:1 sde 8:64 active ready running
3600a0980006e2e77000001a6563a2f24 dm-0 DELL,MD34xx
size=28T features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| `- 7:0:0:0 sdb 8:16 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  `- 7:0:1:0 sdd 8:48 active ready running

The same devices are visible on the other node:

    [root@hpc-242-004 ~]# multipath -ll
    3600a0980006e2e77000001a3563a2f05 dm-0 DELL,MD34xx
    size=5.0T features='0' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=1 status=active
    | `- 5:0:1:1 sde 8:64 active ready running
    `-+- policy='round-robin 0' prio=1 status=enabled
      `- 5:0:0:1 sdc 8:32 active ready running
    3600a0980006e2e77000001a6563a2f24 dm-1 DELL,MD34xx
    size=28T features='0' hwhandler='0' wp=rw
    |-+- policy='round-robin 0' prio=1 status=active
    | `- 5:0:1:0 sdd 8:48 active ready running
    `-+- policy='round-robin 0' prio=1 status=enabled
      `- 5:0:0:0 sdb 8:16 active ready running
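
Comparing the two listings by eye works for two VDs, but the check can also be scripted. The following is a minimal sketch that pulls the WWIDs out of multipath -ll output so the lists from both nodes can be compared. It assumes the output format shown above, where each map header line reads "WWID dm-N vendor,product". The helper name extract_wwids is hypothetical.

```shell
# Print the sorted WWIDs found in `multipath -ll` output read from stdin.
# A map header line has the device-mapper name (dm-N) as its second field;
# path and policy lines never do, so they are skipped.
extract_wwids() {
    awk '$2 ~ /^dm-[0-9]+$/ { print $1 }' | sort
}

# Possible usage on one node (requires root and ssh access to the peer):
#   local_ids=$(multipath -ll | extract_wwids)
#   peer_ids=$(ssh 192.168.242.4 multipath -ll | extract_wwids)
#   [ "$local_ids" = "$peer_ids" ] && echo "both nodes see the same VDs"
```

Note that the dm-N numbers and sdX names differ between the nodes, as the listings show, so only the WWIDs are suitable for this comparison.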

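That completes the storage mapping. The exact procedure is not covered in detail here, because it may differ from one storage array to another. The general principle is to bind hosts, SAS ports, and VDs together through the array's management interface: here the two VDs form one resource group, the two hosts form another, and the two groups are mapped to each other over the SAS links.

4. Software installation and configuration

The software to install is the multipath package plus the HA components Corosync+Pacemaker.

Installing multipath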

yum install device-mapper-multipath.x86_64

After multipath is installed there is no configuration file by default. Copy /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf into /etc:

cp /usr/share/doc/device-mapper-multipath-0.4.9/multipath.conf  /etc/multipath.conf

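After copying it, make one small change to this parameter:

defaults {
        user_friendly_names yes
}

Change yes to no; otherwise the devices are given alias names instead of being listed by WWID. Then start the multipath service: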

 /etc/init.d/multipathd start

If multipath -ll still shows no disk devices after the service has started, rebooting the system resolves it.
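
The user_friendly_names edit described above can also be scripted, which helps when it has to be repeated on both nodes. This is a minimal sketch, not the author's method: the function name disable_friendly_names is made up, and it assumes the option appears on its own uncommented line, as it does in the sample file.

```shell
# Rewrite "user_friendly_names yes" to "user_friendly_names no" in a
# multipath.conf-style file. Commented-out lines are left untouched.
disable_friendly_names() (
    conf=$1
    tmp=$(mktemp) || exit 1
    # Write the edited copy to a temp file, then copy it back so the
    # original file keeps its owner and permissions.
    sed -E 's/^([[:space:]]*user_friendly_names[[:space:]]+)yes[[:space:]]*$/\1no/' \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Possible usage (as root):
#   disable_friendly_names /etc/multipath.conf
```
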

Installing Corosync+Pacemaker

yum install -y corosync

Configuring corosync

[root@hpc-242-004 corosync]# cat corosync.conf
# Autoconfigured by Intel Manager for Lustre
# DO NOT EDIT -- CHANGES MADE HERE WILL BE LOST
compatibility: whitetank
totem {
    version: 2
    secauth: off
    threads: 0
    token: 5000
    token_retransmits_before_loss_const: 10
    max_messages: 20
    rrp_mode: active
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.0.1
        mcastport: 4870
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: no
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
service {
    name: pacemaker
    ver: 1
}
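
In this configuration bindnetaddr is 10.0.0.0, while the heartbeat addresses are 10.0.0.3 and 10.0.0.4: the value is the network address of the heartbeat subnet, not the address of either host. The sketch below derives a network address from a host address and prefix length. The article does not state the heartbeat netmask, so the /24 used in the example is an assumption, and network_addr is an illustrative helper name.

```shell
# Print the IPv4 network address for a dotted-quad address and prefix length.
network_addr() (
    ip=$1
    prefix=$2
    IFS=.
    set -- $ip    # split the dotted quad into $1..$4
    ip_int=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask by shifting and trimming back to 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( ip_int & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 ))  $(( net & 255 ))
)

# Heartbeat address of the first node, assuming a /24 netmask
network_addr 10.0.0.3 24
```

Under that assumption both heartbeat addresses map to 10.0.0.0, so the same corosync.conf can be used unchanged on both nodes.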

Generating the key file

corosync-keygen

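Note that with secauth: off, as in the configuration above, corosync does not use the key to authenticate or encrypt cluster traffic; the key only takes effect once secauth is set to on. Copy the key file and the configuration file to the other node:

scp /etc/corosync/authkey 192.168.242.4:/etc/corosync/
scp /etc/corosync/corosync.conf 192.168.242.4:/etc/corosync/corosync.conf

That completes the corosync configuration. Next, configure pacemaker.

Installing pacemaker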

 yum install -y pacemaker

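After pacemaker is installed, the interactive crm shell is not available by default; it is provided by the separate crmsh package.
Installing crmsh

wget  http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/x86_64/crmsh-1.2.6-0.rc2.2.1.x86_64.rpm
rpm -ivh crmsh-1.2.6-0.rc2.2.1.x86_64.rpm

The installation reports a number of missing dependencies, so install them first: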

yum install -y python-dateutil python-lxml

Then run the installation again:

rpm -ivh crmsh-1.2.6-0.rc2.2.1.x86_64.rpm

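Starting the corosync and pacemaker services

The service section of the configuration file integrates pacemaker with corosync. Because it uses ver: 1, corosync loads the pacemaker plugin but does not launch the pacemaker daemons itself, which matches what I see in testing: pacemaker still has to be started manually after corosync. When stopping, killing the corosync process is enough, because the pacemaker processes die along with it.

/etc/init.d/corosync start
/etc/init.d/pacemaker start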

Once both nodes are up, check that the cluster status is healthy, and then add the resources.
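
One way to script this status check is to read the "Online:" line that crm_mon prints (crm_mon -1 prints the status once and exits). The sketch below assumes the line has the form "Online: [ node1 node2 ]", as in the crm_mon output shown later in this article. The helper name online_nodes is hypothetical.

```shell
# Print one online node name per line, parsed from crm_mon output on stdin.
# Expected line format: "Online: [ nodeA nodeB ]"
online_nodes() {
    awk '$1 == "Online:" { for (i = 3; i < NF; i++) print $i }'
}

# Possible usage on a cluster node:
#   crm_mon -1 | online_nodes
```

If the command lists both hpc-242-003 and hpc-242-004, both nodes have joined the cluster.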

Adding Resources

The two VDs allocated on the array are added to pacemaker as resources. Because the storage is shared out over NFS, a virtual IP is also needed as a managed resource.

crm configure primitive optdir ocf:heartbeat:Filesystem params device=/dev/mapper/3600a0980006e2e77000001a3563a2f05p1 directory=/opt fstype=xfs op start timeout=60 op stop timeout=60
crm configure primitive homedir ocf:heartbeat:Filesystem params device=/dev/mapper/3600a0980006e2e77000001a6563a2f24p1 directory=/home options=rw,usrquota,grpquota fstype=xfs op start timeout=60 op stop timeout=60
crm configure primitive vip ocf:heartbeat:IPaddr params ip=192.168.242.40 nic=eth0:0 cidr_netmask=24

Check the resource status:

[root@hpc-242-004 corosync]# crm resource list
 Resource Group: webservice
     homedir    (ocf::heartbeat:Filesystem):    Started
     vip    (ocf::heartbeat:IPaddr):    Started
     optdir (ocf::heartbeat:Filesystem):    Started

Or use crm_mon:

Last updated: Fri Nov  6 17:19:59 2015
Last change: Fri Nov  6 10:16:49 2015
Stack: classic openais (with plugin)
Current DC: hpc-242-003 - partition with quorum
Version: 1.1.11-97629de
2 Nodes configured, 2 expected votes
3 Resources configured
Online: [ hpc-242-003 hpc-242-004 ]
 Resource Group: webservice
     homedir    (ocf::heartbeat:Filesystem):    Started hpc-242-003
     vip        (ocf::heartbeat:IPaddr):        Started hpc-242-003
     optdir     (ocf::heartbeat:Filesystem):    Started hpc-242-003
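
Since the two filesystems and the virtual IP only work as an NFS service when they sit on the same node, it can be useful to check that from a script. The following sketch parses crm_mon output, assuming resource lines end in "Started NODE" as above. The function names started_nodes and group_on_one_node are made up for illustration.

```shell
# Print the distinct nodes on which resources are reported as Started.
# Reads crm_mon output from stdin.
started_nodes() {
    awk 'NF >= 2 && $(NF-1) == "Started" { print $NF }' | sort -u
}

# Succeed only if every started resource runs on one and the same node.
group_on_one_node() {
    started_nodes | awk 'END { exit !(NR == 1) }'
}

# Possible usage on a cluster node:
#   crm_mon -1 | group_on_one_node && echo "all resources are on one node"
```
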

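Adding resources deserves a closer look. First, list the resource agent classes that the cluster supports:

[root@hpc-242-003 ~]# crm
crm(live)# ra
crm(live)ra# classes
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)ra#

List all resource agents in a given class: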

crm(live)ra# list lsb
auditd                blk-availability      corosync              corosync-notifyd      crond                 dkms_autoinstaller    functions
gmond                 halt                  htcacheclean          httpd                 ip6tables             iptables              iscsi
iscsid                kdump                 killall               lvm2-lvmetad          lvm2-monitor          mdmonitor             multipathd
netconsole            netfs                 network               nfs                   nfslock               nscd                  nslcd
pacemaker             postfix               quota_nld             rdisc                 restorecond           rpcbind               rpcgssd
rpcidmapd             rpcsvcgssd            rsyslog               salt-minion           sandbox               saslauthd             single
sshd                  sysstat               udev-post             winbind               zabbix_agentd
crm(live)ra# list ocf heartbeat
CTDB               Delay              Dummy              Filesystem         IPaddr             IPaddr2            IPsrcaddr          LVM
MailTo             Route              SendArp            Squid              VirtualDomain      Xinetd             apache             conntrackd
db2                dhcpd              ethmonitor         exportfs           iSCSILogicalUnit   mysql              named              nfsnotify
nfsserver          pgsql              postfix            rsyncd             symlink            tomcat
crm(live)ra#

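To see how a particular resource agent is configured, use the info command, which shows the format of every parameter available when adding the resource: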

crm(live)ra# info ocf:heartbeat:Filesystem
Manages filesystem mounts (ocf:heartbeat:Filesystem)
Resource script for Filesystem. It manages a Filesystem on a
shared storage medium.
The standard monitor operation of depth 0 (also known as probe)
checks if the filesystem is mounted. If you want deeper tests,
set OCF_CHECK_LEVEL to one of the following values:
10: read first 16 blocks of the device (raw read)
This doesn't exercise the filesystem at all, but the device on
which the filesystem lives. This is noop for non-block devices
such as NFS, SMBFS, or bind mounts.
20: test if a status file can be written and read
The status file must be writable by root. This is not always the
case with an NFS mount, as NFS exports usually have the
"root_squash" option set. In such a setup, you must either use
read-only monitoring (depth=10), export with "no_root_squash" on
your NFS server, or grant world write permissions on the
directory where the status file is to be placed.
Parameters (*: required, []: default):
device* (string): block device
    The name of block device for the filesystem, or -U, -L options for mount, or NFS mount specification.
directory* (string): mount point
    The mount point for the filesystem.
fstype* (string): filesystem type
    The type of filesystem to be mounted.
options (string):
    Any extra options to be given as -o options to mount.
    For bind mounts, add "bind" here and set fstype to "none".

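Next, the three resources are placed in a single resource group, webservice, so that all three always start together on the same node. The group is added as follows:

[root@hpc-242-004 corosync]# crm
crm(live)# configure
crm(live)configure# group webservice vip homedir optdir

This defines the webservice resource group and adds the resources to it.

To make failover reliable, a few more parameters need to be configured:

[root@hpc-242-004 corosync]# crm
crm(live)# configure
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# verify
crm(live)configure# commit

Setting no-quorum-policy=ignore matters in a two-node cluster: when one node fails, the survivor alone no longer holds quorum, and with the default policy it would stop its resources instead of taking them over. verify checks that the configuration is valid, and commit applies the changes. Resources configured on the command line have no effect until they are committed; once commit is run, the configuration is written to the cib.xml file.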

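One question needs consideration: after resources have failed over from a faulty node to a healthy one, should they move back once the faulty node recovers? Here they should not, because every switchover briefly interrupts the service and has at least some small impact on the business. This is where resource stickiness comes in.
Resource stickiness expresses how strongly a resource prefers to stay on the node where it is currently running.
Stickiness values and their effects:
0: The default. The resource is placed at the most suitable location in the system, which means it is moved when a "better" or less loaded node becomes available. This is almost equivalent to automatic failback, except that the resource may move to a node other than the one it was previously active on.
Greater than 0: The resource prefers to stay where it is, but will move if a more suitable node becomes available. The higher the value, the stronger the preference to stay.
Less than 0: The resource prefers to move away from its current location. The larger the absolute value, the stronger the preference to leave.
INFINITY: The resource always stays where it is unless it is forced off because the node is no longer eligible to run it (node shutdown, node standby, reaching the migration-threshold, or a configuration change). This is almost equivalent to disabling automatic failback completely.
-INFINITY: The resource always moves away from its current location.
The default stickiness for all resources is set through rsc_defaults. Note that the value 0 used here is the same as the default, so on its own it does not prevent failback; a positive value such as 100, or INFINITY, is what keeps resources on the surviving node. The setting is applied as follows: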

crm(live)configure# rsc_defaults resource-stickiness=0

With the configuration complete, review all of it:

[root@hpc-242-004 corosync]# crm
crm(live)# configure
crm(live)configure# show
node hpc-242-003
node hpc-242-004
primitive homedir Filesystem \
    params device="/dev/mapper/3600a0980006e2e77000001a6563a2f24p1" directory="/home" options="rw,usrquota,grpquota" fstype=xfs \
    op start timeout=10 interval=0 \
    op stop timeout=10 interval=0
primitive optdir Filesystem \
    params device="/dev/mapper/3600a0980006e2e77000001a3563a2f05p1" directory="/opt" fstype=xfs \
    op start timeout=10 interval=0 \
    op stop timeout=10 interval=0
primitive vip IPaddr \
    params ip=192.168.242.40 nic="eth0:0" cidr_netmask=24
group webservice homedir vip optdir \
    meta target-role=Started
property cib-bootstrap-options: \
    dc-version=1.1.11-97629de \
    expected-quorum-votes=2 \
    no-quorum-policy=ignore \
    symmetric-cluster=true \
    cluster-infrastructure="classic openais (with plugin)" \
    last-lrm-refresh=1446713502 \
    stonith-enabled=false
rsc_defaults rsc-options: \
    resource-stickiness=0
rsc_defaults rsc_defaults-options: \
    failure-timeout=20m \
    migration-threshold=3

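Summary

With the configuration in place, the result was tested by shutting down or rebooting the active machine. The webservice resource group starts on the other node, and both disks are remounted on the new active node. Because clients mount through the virtual IP, their access is interrupted briefly while the resources move, but the overall impact is small.
Corosync+Pacemaker is very powerful. As mentioned earlier, it can manage a wide range of resources, such as httpd, mysql, and oracle, and high availability for many applications can be built on it. In actual testing, however, some minor stability problems showed up: if the corosync process on the active node is killed directly, instead of the node being rebooted or shut down, the failover does not go smoothly. This needs further investigation.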
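
To put a number on the "brief interruption" seen during the failover test, a client can probe the virtual IP once per second and measure the longest gap afterwards. The following is only a sketch: probe_vip and longest_outage are hypothetical helpers, the ping options assume the Linux iputils ping, and an ICMP reply from 192.168.242.40 shows that the address has moved, not that the NFS mounts are already usable again.

```shell
# Emit one "<epoch-seconds> <up|down>" sample per second for the given IP.
# Runs until interrupted; redirect the output to a file during the test.
probe_vip() {
    vip=$1
    while :; do
        if ping -c 1 -W 1 "$vip" > /dev/null 2>&1; then s=up; else s=down; fi
        echo "$(date +%s) $s"
        sleep 1
    done
}

# Read samples from stdin and print the longest outage in seconds, measured
# from the first "down" sample of a run to the next "up" sample.
longest_outage() {
    awk '
        $2 == "down" && start == "" { start = $1 }
        $2 == "up"   && start != "" { d = $1 - start; if (d > max) max = d; start = "" }
        END { print max + 0 }
    '
}

# Possible usage on a client:
#   probe_vip 192.168.242.40 > samples.log    # stop with Ctrl-C after the test
#   longest_outage < samples.log
```
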

0 0