bug1234513

Source: Internet   Published: JS location methods   Editor: 程序博客网   Time: 2024/06/05 16:04

Bug 1234513 - NFS locks can become out-of-sync between client & server when client process killed with signal

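Reproducing this bug and writing the test case took a full five days. I started out knowing almost nothing (my idea of what a lock is was hazy) and ended up with a case I am happy with, which felt rewarding. A summary follows: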

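=Symptom:

After the while loop on the client has exited and no test_lock process is running, i.e. when the client holds no locks at all, lslk on the server still shows a lock held by lockd.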


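=Steps to reproduce:

1. On the client, run the while loop (it repeatedly locks and unlocks a file); lslk lists the locks on the file system.

2. On the server, lslk shows the corresponding lock held by lockd.

3. On the client, interrupt the loop and kill all test_lock processes.

4. On the server, run lslk to print the lock information.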


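=Root cause:

1. When the export is mounted with nfsvers=3 and the client program calls fcntl to set a write lock, lslk on the server shows lockd. This is normal.

2. The client runs the loop to lock and unlock the file over and over, and because of the timeout the program is terminated by the SIGTERM that timeout sends.

3. /etc/init.d/nfslock restart releases the lock (on RHEL the script is /etc/init.d/nfslock).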
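
Point 2 depends on how timeout ends the command it supervises. The following is a minimal sketch, assuming GNU coreutils timeout, which sends SIGTERM by default and exits with status 124 when the limit is reached. The helper name run_with_timeout is made up for illustration, and sleep stands in for test_lock.

```shell
# run_with_timeout LIMIT CMD...: run CMD under timeout and report how it ended.
# Assumes GNU coreutils timeout (exit status 124 means the limit was hit).
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out"
    else
        echo "exited with $rc"
    fi
}

run_with_timeout 1 sleep 5   # sleep is terminated by SIGTERM after 1 second
run_with_timeout 5 true      # finishes on its own well before the limit
```

In the test case the interrupted command is test_lock, so the signal can arrive while a lock is still held on the NFS file.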


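=Problems solved:

1. The lslk and lslocks commands

   a. A shell finds its commands under /bin and /sbin, but some commands are not installed by default, so you have to install them yourself (each command is provided by a package).

          rpm -qf `which lslk` / rpm -qf `which lslocks`    # check whether the command is installed and which package owns it

          yum provides */lslk   /   yum provides */lslocks   # find the package that provides the command

   b. Wrapping lslk and lslocks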

           #rhel7 uses lslocks provided by util-linux
           if which lslocks >/dev/null 2>&1; then
                   lsLocks=lslocks
           # rhel6 uses lslk from package lslk
           elif which lslk >/dev/null 2>&1; then
                   lsLocks=lslk
           else
                  echo "{Warn} Need to use tool lslocks or lslk to execute this test."
                  report_result $TEST FAIL
                  exit 1
            fi

            When only one side uses lslk:

                  lsLocks=lslocks

                  which lslocks >/dev/null 2>&1 || lsLocks=lslk
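
The wrapper above can also be written as a small function that returns the first tool it finds. This is only a sketch: pick_tool is a hypothetical helper, not part of the original test, and it uses the shell builtin command -v instead of which.

```shell
# pick_tool NAME...: print the first candidate found in PATH; fail if none is.
pick_tool() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer lslocks (util-linux, RHEL 7) and fall back to lslk (RHEL 6).
if lsLocks=$(pick_tool lslocks lslk); then
    echo "using $lsLocks"
else
    echo "{Warn} Need to use tool lslocks or lslk to execute this test." >&2
fi
```

Keeping the candidate list in one call makes the preference order explicit, and adding a third tool is a one-word change.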

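 2. Passwordless login:

     $ssh-keygen -t rsa   # generate a key pair

     $ssh-copy-id -i ~/.ssh/id_rsa.pub  root@xxxx    # install the public key on xxxx (when the local machine logs in to the remote one, its private key is matched against the public key installed there)

     $ssh root@xxxx   # check that login works without a password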
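
In a script, the last check can be done without any risk of hanging at a password prompt. A minimal sketch follows: BatchMode and ConnectTimeout are standard OpenSSH options, while the function name and the SSH_CMD override are hypothetical and exist only so the function can be dry-run without a real server.

```shell
# can_login_without_password TARGET: succeed only if key-based login works.
# SSH_CMD is a hypothetical override for dry runs; it defaults to the real ssh.
can_login_without_password() {
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    ${SSH_CMD:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Example call (root@xxxx is the placeholder host used above):
#   can_login_without_password root@xxxx || echo "key setup incomplete"
```

The test case runs commands such as ssh $SERVER $lsLocks, so a check like this at setup time would turn a missing key into an early, clear failure.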

  3. To avoid the complexity of signal handling, log in over ssh to check the remote machine.

  4. Using screen:

        run 'screen -dm bash -c "while true; do timeout 3 ./test_lock $nfsmp/stats; sleep 1; done &> screen.log"'
        run "ps aux | grep -v grep | grep SCREEN"
        if [ $? -ne 0 ]; then
                run "cat screen.log"
        fi
  5. Learn to debug: run "rhts-sync-block -s testing $HOSTNAME"

     Execution blocks here; open another terminal to verify.

#rhel7 uses lslocks provided by util-linux
if which lslocks >/dev/null 2>&1; then
        lsLocks=lslocks
# rhel6 uses lslk from package lslk
elif which lslk >/dev/null 2>&1; then
        lsLocks=lslk
else
        echo "{Warn} Need to use tool lslocks or lslk to execute this test."
        report_result $TEST FAIL
        exit 1
fi

Server() {
    rlPhaseStartSetup do-$role-Setup-
        rlFileBackup /etc/exports
        run "mkdir -p $expdir"
        run 'echo "$expdir *(rw,no_root_squash)" > /etc/exports'
        run "service_nfs restart"
        run "exportfs -v" -
    rlPhaseEnd

    rlPhaseStartTest do-$role-Test-
        run "rhts-sync-set -s servReady"
        run "rhts-sync-block -s testDone $CLIENT"
    rlPhaseEnd

    rlPhaseStartCleanup do-$role-Cleanup-
        rlFileRestore
        run "rm -rf $expdir"
        run "service nfs restart"
    rlPhaseEnd
}

Client() {
    rlPhaseStartSetup do-$role-Setup-
        run "mkdir -p $nfsmp"
        run "rhts-sync-block -s servReady $SERVER"
        run "ls_nfsvers $SERVER" -
    rlPhaseEnd

    for V in $(ls_nfsvers $SERVER); do
    rlPhaseStartTest do-$role-Test-vers${V}
        run "mount -o vers=$V $SERVER:$expdir $nfsmp"
        run "ssh $SERVER service nfslock restart"
        run "ssh $SERVER $lsLocks" 0 "Server should not have lock"
        run 'screen -dm bash -c "while true; do timeout 3 ./test_lock $nfsmp/stats; sleep 1; done &> screen.log"'
        run "ps aux | grep -v grep | grep SCREEN"
        if [ $? -ne 0 ]; then
                run "cat screen.log"
        fi
        # when lslk find test_lock then break cycle and print lock
        run "while :; do $lsLocks | grep -q test_lock && break; done"
        run "$lsLocks"
        run "ssh $SERVER $lsLocks"
        run "sleep 100"
        run "pkill -9 screen"
        run "ps aux | grep -v grep | grep while" 1 "Should be killed"
        run "sleep 100"
        run "$lsLocks | grep test_lock" 1 "Client should have released lock"
        run "ssh $SERVER $lsLocks | grep lockd" 1 "Server should have released lock"
        run "rm $nfsmp/stats"
        run "sleep 30"
        run "umount $nfsmp"
    rlPhaseEnd
    done

    rlPhaseStartCleanup do-$role-Cleanup-
        run "rhts-sync-set -s testDone"
        run "service nfslock restart"
        run "rm -rf $nfsmp"
    rlPhaseEnd
}

rlJournalStart
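
The script waits for test_lock to show up with an unbounded while : loop, which spins forever if the lock never appears. A bounded variant might look like the sketch below; wait_until is a hypothetical helper, not part of the original test or of its test library.

```shell
# wait_until LIMIT CMD...: retry CMD about once per second until it succeeds
# or LIMIT attempts have been made. Returns 0 on success, 1 on timeout.
wait_until() {
    remaining=$1
    shift
    while [ "$remaining" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 1
}

# Possible use in the test (needs the real tools, so it is left as a comment):
#   wait_until 60 sh -c "$lsLocks | grep -q test_lock" || echo "lock never appeared"
wait_until 3 true && echo "condition met"
```

Sleeping between attempts also avoids starting the lock-listing tool in a tight loop while the lock test is running.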


     

    
