Signals in Depth: A Comprehensive Analysis

Source: Internet  Editor: 程序博客网  Date: 2024/05/21 15:44

1: Basic Concepts

Signals are software interrupts. Signals are classic examples of asynchronous events. They occur at what appear to be random times to the process. The process can’t simply test a variable (such as errno) to see whether a signal has occurred; instead, the process has to tell the kernel “if and when this signal occurs, do the following.”

  • Let the default action apply.
  • Catch the signal
  • Ignore the signal

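The passage quoted above contains several very important concepts, but the phrase "catch the signal" deserves special emphasis. I do not know how the Chinese translation of the book renders it. My reading: "catch" refers only to deliberately handling a signal with a user-defined function, which is different from letting the default action apply. The discussion of the signal function below touches on this.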

2: Important APIs

signal:

#include <signal.h>
void (*signal(int signo, void (*func)(int)))(int);
Returns: previous disposition of signal if OK, SIG_ERR on error

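Below is the implementation of signal from APUE, which the book uses as the basis for its discussion. Keep in mind that the implementation of signal differs between platforms.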

#include "apue.h"

/* Reliable version of signal(), using POSIX sigaction(). */
Sigfunc *
signal(int signo, Sigfunc *func)
{
    struct sigaction act, oact;

    act.sa_handler = func;
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    if (signo == SIGALRM) {
#ifdef SA_INTERRUPT
        act.sa_flags |= SA_INTERRUPT;
#endif
    } else {
        act.sa_flags |= SA_RESTART;
    }
    if (sigaction(signo, &act, &oact) < 0)
        return(SIG_ERR);
    return(oact.sa_handler);
}

The prototype for the signal function states that the function requires two arguments and returns a pointer to a function that returns nothing (void).

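  • Syntactically, the signal prototype above returns a pointer to a function, and in the implementation above return(oact.sa_handler) does indeed return the previous handler (oact = old action).
  • The func argument can be replaced by either of two macros, SIG_IGN or SIG_DFL, and the return value can be SIG_ERR. In C terms these three are just macro definitions:
#define SIG_ERR (void (*)())-1
#define SIG_DFL (void (*)())0
#define SIG_IGN (void (*)())1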

  Coming back to the question raised at the start, let me quote the book:

When we send the SIGTERM signal, the process is terminated, since it doesn’t catch the signal, and the default action for the signal is termination

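In other words, for SIGTERM the default termination is not described as "catching"; only explicitly passing a function as func to signal() counts as catching a signal. Understanding this matters for what follows, even though it is only a small difference in viewpoint.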
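The distinction can be tried out with a minimal sketch. The names catch_once and on_usr1 are made up for illustration, and SIGUSR1 is assumed to be unblocked in a single-threaded program.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>

/* Written by the handler; sig_atomic_t is safe to store from a handler. */
static volatile sig_atomic_t caught = 0;

static void on_usr1(int signo)
{
    (void)signo;
    caught = 1;
}

/*
 * "Catch" SIGUSR1 with a user-defined handler, raise it, then put the
 * previous disposition back.
 * Returns 1 if the handler ran, 0 if it did not, -1 if signal() failed.
 */
int catch_once(void)
{
    void (*prev)(int) = signal(SIGUSR1, on_usr1);

    if (prev == SIG_ERR)
        return -1;
    caught = 0;
    raise(SIGUSR1);          /* the handler runs before raise() returns */
    signal(SIGUSR1, prev);   /* restore the old disposition */
    return caught;
}
```

Without the call to signal(), the same raise(SIGUSR1) would apply the default action and terminate the process, which is the case the book does not call "catching".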

Question: what happens if a signal arrives while the signal function itself is executing?

kill and raise

#include <signal.h>
int kill(pid_t pid, int signo);
int raise(int signo);

The only point that needs attention here is which processes kill can send a signal to:

pid > 0 The signal is sent to the process whose process ID is pid.
pid == 0 The signal is sent to all processes whose process group ID equals the process group ID of the sender and for which the sender has permission to send the signal.

This design follows one rule:

real or effective user ID of the sender has to equal the real or effective user ID of the receiver.

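Other special cases can be dealt with as they come up. In short, both sending and receiving a signal are subject to conditions. A natural follow-up: if kill sends a signal to the calling process itself and that signal is not blocked, the signal is delivered (and its handler runs) before kill returns.
Question: does the same hold for raise? And where are the potential bugs to watch for when using these two functions?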
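The behaviour described for kill can be checked with a small sketch. The helper name kill_self_hits is hypothetical, and the sketch assumes a single-threaded process in which SIGUSR1 is not blocked.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t hits = 0;

static void count_hit(int signo)
{
    (void)signo;
    hits++;
}

/*
 * Send SIGUSR1 to ourselves with kill() and look at the counter right
 * after kill() returns.
 * Returns the number of handler runs seen at that point, or -1 on error.
 */
int kill_self_hits(void)
{
    struct sigaction sa, old;
    int seen;

    sa.sa_handler = count_hit;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old) < 0)
        return -1;

    hits = 0;
    if (kill(getpid(), SIGUSR1) < 0)
        seen = -1;
    else
        seen = hits;         /* handler has already run at this point */

    sigaction(SIGUSR1, &old, NULL);
    return seen;
}
```

Replacing kill(getpid(), SIGUSR1) with raise(SIGUSR1) is one way to explore the question about raise.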

alarm and pause

#include <unistd.h>
unsigned int alarm(unsigned int seconds);
Returns: 0 or number of seconds until previously set alarm

Naturally, we need to look at how alarm is used:

  • If, when we call alarm, a previously registered alarm clock for the process has not yet expired, the number of seconds left for that alarm clock is returned as the value of this function. That previously registered alarm clock is replaced by the new value.
  • If a previously registered alarm clock for the process has not yet expired and if the seconds value is 0, the previous alarm clock is canceled. The number of seconds left for that previous alarm clock is still returned as the value of the function.

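Be careful when using alarm: if we intend to catch the signal that alarm generates, we must install the handler first and only then call alarm. Otherwise, depending on scheduling, SIGALRM can arrive before the handler is installed, and its default action terminates the process.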
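As a rough illustration of the return values and of the "handler first, alarm second" ordering, here is a sketch. The name alarm_demo is invented, and it assumes no alarm is pending when it is called.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>
#include <unistd.h>

static void on_alarm(int signo)
{
    (void)signo;             /* nothing to do: the timers below never expire */
}

/*
 * out[0]: return value of alarm(10) (0 when no alarm was pending)
 * out[1]: return value of alarm(5)  (seconds left on the 10 s alarm)
 * out[2]: return value of alarm(0)  (seconds left on the 5 s alarm, which
 *         this call cancels)
 * Returns 0 on success, -1 if the handler could not be installed.
 */
int alarm_demo(unsigned int out[3])
{
    struct sigaction sa, old;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old) < 0)   /* install the handler first */
        return -1;

    out[0] = alarm(10);                      /* then arm the timer */
    out[1] = alarm(5);                       /* replaces the 10 s alarm */
    out[2] = alarm(0);                       /* cancels the 5 s alarm */

    sigaction(SIGALRM, &old, NULL);
    return 0;
}
```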

#include <unistd.h>
int pause(void);
Returns: −1 with errno set to EINTR

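pause is designed to be interrupted by a signal, so a return value of −1 is what its designers treat as the normal case.
APUE gives examples of implementing sleep with alarm and pause, but designing a function that holds up in every situation is not that simple. The points to consider are: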

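  • Because alarm is used, we must consider whether it conflicts with an alarm already set outside the function, and work out how to avoid that conflict.
  • Handling SIGALRM overwrites any SIGALRM handler installed outside the function; this needs to be dealt with.
  • The process can still be scheduled out between alarm() and pause(); if the alarm fires in that window, pause() hangs.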

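Solving all of these problems properly, in a way that holds up in practice, is a small challenge. Once sigprocmask and sigsuspend have been discussed, implementing it yourself makes a good second small experiment.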

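Given how alarm behaves, an obvious use is to put an upper time limit on an interruptible system call. This is a good way to handle operations that can block, such as reading from a very slow device, but this kind of design also has several pitfalls. If it is implemented with longjmp, the drawbacks of longjmp must be taken into account: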

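  • Damage to the stack state: more precisely, some automatic variables and register variables may be rolled back to earlier values.
  • When signal handlers nest, there is no way to prevent the jump from accidentally aborting the handler of the outer level.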

These aspects will be recorded and analyzed in the next article, on processes.

Signal Sets

#include <signal.h>
int sigemptyset(sigset_t *set);
int sigfillset(sigset_t *set);
int sigaddset(sigset_t *set, int signo);
int sigdelset(sigset_t *set, int signo);
All four return: 0 if OK, −1 on error
int sigismember(const sigset_t *set, int signo);
Returns: 1 if true, 0 if false, −1 on error

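This one is simple: just remember that a set is used as a mask. sigfillset sets every bit to 1, so using the result as a mask blocks all signals; sigemptyset does the opposite; sigaddset and sigdelset do what their names say.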
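A short sketch of the five calls; the function name set_demo and its bit-encoded return value are only for illustration.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>

/*
 * Build a set step by step and report what it contains.
 * bit 0: SIGINT is a member after add
 * bit 1: SIGQUIT is a member after add followed by delete
 * bit 2: SIGTERM is a member after sigfillset
 */
int set_demo(void)
{
    sigset_t set;
    int r = 0;

    sigemptyset(&set);            /* start with no signals */
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGQUIT);
    sigdelset(&set, SIGQUIT);     /* take SIGQUIT out again */

    if (sigismember(&set, SIGINT) == 1)
        r |= 1;
    if (sigismember(&set, SIGQUIT) == 1)
        r |= 2;

    sigfillset(&set);             /* now every signal is a member */
    if (sigismember(&set, SIGTERM) == 1)
        r |= 4;
    return r;
}
```

Note that building a set blocks nothing by itself; the set only takes effect when it is passed to a call such as sigprocmask.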

Sigprocmask

#include <signal.h>
int sigprocmask(int how, const sigset_t *restrict set, sigset_t *restrict oset);
Returns: 0 if OK, −1 on error

Usage is also simple: oset returns the previous mask, set is the new value to apply, and how decides how the new value is applied:

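  • SIG_BLOCK: the new mask is the union (OR) of the current mask and set; every signal whose bit is 1 in the result is blocked.
  • SIG_UNBLOCK: the new mask is the intersection (AND) of the current mask and the complement of set; put simply, the signals named in set are let through.
  • SIG_SETMASK: the mask is set directly to the value of set.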

Note: when sigprocmask unblocks signals that are pending, at least one of them is delivered, and its action taken, before sigprocmask returns.

Sigpending Function

#include <signal.h>
int sigpending(sigset_t *set);
Returns: 0 if OK, −1 on error

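It simply returns the set of signals that are currently pending, so it is very useful for finding out which blocked signals have already been generated.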

Sigaction Function

#include <signal.h>
int sigaction(int signo, const struct sigaction *restrict act, struct sigaction *restrict oact);
Returns: 0 if OK, −1 on error

It provides the functionality of signal in a more elaborate way, with many more options; the signal implementation shown at the top is built on sigaction:

struct sigaction {
    void     (*sa_handler)(int);  /* addr of signal handler, */
                                  /* or SIG_IGN, or SIG_DFL */
    sigset_t sa_mask;             /* additional signals to block */
    int      sa_flags;            /* signal options, Figure 10.16 */

    /* alternate handler */
    void     (*sa_sigaction)(int, siginfo_t *, void *);
};

sa_flags drags in a whole pile of further details. Sticking to the essentials, a few points deserve attention:

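  • There are two handlers, sa_handler and sa_sigaction; the latter takes more arguments. Which one is used depends on whether SA_SIGINFO is set in sa_flags.
  • As the comment above says, sa_mask adds extra signals to block, but the change is temporary: when the handler returns, the process's signal mask is restored to its previous value.
  • While the handler runs, the signal being handled is blocked automatically, so repeated occurrences usually do not accumulate; the exact behaviour depends on the operating system.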

The rest is best picked up gradually while designing programs and reading source code.

Sigsetjmp and Siglongjmp

#include <setjmp.h>
int sigsetjmp(sigjmp_buf env, int savemask);
Returns: 0 if called directly, nonzero if returning from a call to siglongjmp
void siglongjmp(sigjmp_buf env, int val);

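Consider this scenario (it comes up again in the article on setjmp and longjmp): if longjmp is called from a handler, the handler never returns normally and control goes straight back to main, so the state of the signal mask, to which the system automatically added the signal on entry to the handler, is left undetermined.
sigsetjmp and siglongjmp therefore add saving and restoring of the process signal mask: when savemask is nonzero, the call to sigsetjmp saves the current signal mask in env, and a later siglongjmp restores the saved value.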

Sigsuspend

#include <signal.h>int sigsuspend(const sigset_t *sigmask);Returns: −1 with errno set to EINTR

we need a way to both restore the signal mask and put the process to sleep in a single atomic operation

Consider the following code:

/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
    err_sys("SIG_BLOCK error");

/* critical region of code */

/* restore signal mask, which unblocks SIGINT */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
    err_sys("SIG_SETMASK error");

/* window is open */
pause();    /* wait for signal to occur */

To eliminate that window we need an atomic operation, which is why sigsuspend exists. Using it, the code above should be changed as follows:

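/* block SIGINT and save current signal mask */
if (sigprocmask(SIG_BLOCK, &newmask, &oldmask) < 0)
    err_sys("SIG_BLOCK error");

/* critical region of code */

/* atomically install the old mask and wait for a signal */
if (sigsuspend(&oldmask) != -1)
    err_sys("SIG_SUSPEND ERROR");

/* sigsuspend returns with SIGINT blocked again; restore the old mask */
if (sigprocmask(SIG_SETMASK, &oldmask, NULL) < 0)
    err_sys("SIG_SETMASK error");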

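This simply reorders the steps. Note that if the signal you are waiting for never occurs, the program stays blocked. Also, the changes that sigsuspend and sigaction (through sa_mask) make to the process's signal mask are temporary: when sigsuspend returns, or when the handler returns, the signal mask is restored to its previous value.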
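As one possible take on the sleep exercise mentioned earlier, here is a sketch built on sigsuspend. The name my_sleep is made up. It saves and restores the caller's SIGALRM handler and mask and closes the alarm/pause window, but it makes no attempt to preserve an alarm the caller had already set, which is a limitation to keep in mind.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>
#include <unistd.h>

static void sig_alrm(int signo)
{
    (void)signo;             /* nothing to do, just wake up sigsuspend */
}

/* Returns the number of unslept seconds (0 if the full time elapsed). */
unsigned int my_sleep(unsigned int seconds)
{
    struct sigaction newact, oldact;
    sigset_t newmask, oldmask, suspmask;
    unsigned int unslept;

    if (seconds == 0)
        return 0;            /* alarm(0) would cancel, not schedule */

    /* Install our handler, remembering the caller's. */
    newact.sa_handler = sig_alrm;
    sigemptyset(&newact.sa_mask);
    newact.sa_flags = 0;
    sigaction(SIGALRM, &newact, &oldact);

    /* Block SIGALRM so it cannot arrive before we are waiting. */
    sigemptyset(&newmask);
    sigaddset(&newmask, SIGALRM);
    sigprocmask(SIG_BLOCK, &newmask, &oldmask);

    alarm(seconds);

    suspmask = oldmask;
    sigdelset(&suspmask, SIGALRM);   /* make sure SIGALRM can get through */
    sigsuspend(&suspmask);           /* atomically unblock and wait */

    /* Some signal was caught; SIGALRM is blocked again at this point. */
    unslept = alarm(0);

    sigaction(SIGALRM, &oldact, NULL);
    sigprocmask(SIG_SETMASK, &oldmask, NULL);
    return unslept;
}
```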

Abort

#include <stdlib.h>
void abort(void);
This function never returns

The man page actually describes this well; the summary below is based on it [1].

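  • abort() first unblocks SIGABRT and then raises that signal for the calling process.
  • If SIGABRT is ignored, or caught by a handler that returns, abort() still terminates the program; the only way around this is for the handler not to return, for example by using the longjmp mentioned above.
  • According to [1], when abort() terminates the process, all open streams are closed and flushed.
  • abort() never returns.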
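The second point can be checked without killing the test program by running abort() in a child. This is a sketch; abort_child_termsig is a made-up name.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_abrt(int signo)
{
    (void)signo;             /* return normally: abort() must still terminate */
}

/*
 * Fork a child that catches SIGABRT with a returning handler and then
 * calls abort(). Returns the signal that terminated the child, 0 if the
 * child exited normally, -1 on error.
 */
int abort_child_termsig(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {                  /* child */
        signal(SIGABRT, on_abrt);
        abort();
        _exit(0);                    /* not reached */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```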

System

parent should be blocked while the system function is executing. Indeed, this is what POSIX.1 specifies. Otherwise, when the child created by system terminates, it would fool the caller of system into thinking that one of its own children terminated.

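I reread this passage no fewer than ten times; the signal it refers to is SIGCHLD. One important point: when a shell runs a command it first forks and then execs, so when system is used to run a program the process relationship is as follows:
[Figure missing in the source]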
As for why SIGINT and SIGQUIT are ignored in the caller while system is executing, the book gives this reason:

Since the command that is executed by system can be an interactive command (as is the ed program in this example) and since the caller of system gives up control while the program executes, waiting for it to finish, the caller of system should not be receiving these two terminal-generated signals.

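In other words, signals aimed at the command that is currently running should not be acted on by the parent (the caller of system), because that could lead it to wrong conclusions.
One implementation given in the book: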

int
system(const char *cmdstring)    /* with appropriate signal handling */
{
    pid_t pid;
    int status;
    struct sigaction ignore, saveintr, savequit;
    sigset_t chldmask, savemask;
    .......
    if ((pid = fork()) < 0) {
        status = -1;    /* probably out of processes */
    } else if (pid == 0) {    /* child */
        /* restore previous signal actions & reset signal mask */
        sigaction(SIGINT, &saveintr, NULL);
        sigaction(SIGQUIT, &savequit, NULL);
        sigprocmask(SIG_SETMASK, &savemask, NULL);
        execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
        _exit(127);    /* exec error */
    } else {    /* parent */
        while (waitpid(pid, &status, 0) < 0)
            if (errno != EINTR) {
                status = -1;    /* error other than EINTR from waitpid() */
                break;
            }
    .......
}

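This excerpt keeps only the part of most interest. The process that waitpid waits for here is /bin/sh, and the implementation obtains the child's termination status by calling waitpid, so we can conclude that waitpid does not depend on SIGCHLD. Question: what, then, actually wakes up a process that is blocked in wait? This will be verified in the later article on processes.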

For the same reason, care is needed when interpreting the returned status: the process waitpid waits for is /bin/sh, not the command being executed.
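A small sketch of decoding the status returned by the standard system function. The helper shell_exit_status is invented for illustration, and it assumes a POSIX shell is available as /bin/sh.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <stdlib.h>
#include <sys/wait.h>

/*
 * Run cmd through system() and return the exit status of the shell that
 * ran it, or -1 if system() failed or the shell did not exit normally.
 * The status describes /bin/sh, not necessarily the command itself.
 */
int shell_exit_status(const char *cmd)
{
    int status = system(cmd);

    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, shell_exit_status("exit 3") reports the shell's own exit status. A value of 127 is ambiguous: the implementation above uses it when exec of the shell fails, and shells conventionally use it when the command cannot be found.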

Sleep

#include <unistd.h>
unsigned int sleep(unsigned int seconds);
Returns: 0 or number of unslept seconds

#include <time.h>
int clock_nanosleep(clockid_t clock_id, int flags, const struct timespec *reqtp, struct timespec *remtp);
Returns: 0 if slept for requested time or error number on failure

#include <time.h>
int nanosleep(const struct timespec *reqtp, struct timespec *remtp);
Returns: 0 if slept for requested time or −1 on error

Covering this in as much detail as the book does would be tedious, so here are only the most important points:

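  • If sleep is implemented with alarm, the considerations from the alarm section near the beginning of this article apply.
  • Linux implements sleep with nanosleep, which does not generate any signal, so there is no need to worry about interactions with other functions.
  • Using an absolute time improves precision. This statement is about scheduling: with a relative time, delays introduced by frequent scheduling can cause a real-time task to miss its timing requirements. Think carefully about the situations in which this happens.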
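As a sketch of how remtp is typically used, the hypothetical helper sleep_ms below restarts nanosleep with the unslept time whenever a signal interrupts it.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <errno.h>
#include <time.h>

/*
 * Sleep for ms milliseconds, resuming after interruptions by signals.
 * Returns 0 once the full time has elapsed, -1 on any other error.
 */
int sleep_ms(long ms)
{
    struct timespec req, rem;

    req.tv_sec = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;           /* continue with the time still unslept */
    }
    return 0;
}
```

Each restart adds a little scheduling delay on top of the relative interval, which is the drift that an absolute deadline (clock_nanosleep with TIMER_ABSTIME) avoids.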

Sigqueue

#include <signal.h>
int sigqueue(pid_t pid, int signo, const union sigval value);
Returns: 0 if OK, −1 on error

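Imagine a real-time scenario in which the number of times a signal occurs cannot be ignored. The signals then need to be stored in a queue.
Linux's support for this feature: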

  • supports sigqueue
  • queues signals even if the caller doesn’t use the SA_SIGINFO flag
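Queueing can be observed with a real-time signal. The sketch below assumes Linux; queue_demo and on_rt are made-up names, and SIGRTMIN is assumed to be unblocked and unused on entry. Three signals are queued while blocked, and each is delivered with its own value once the mask is restored.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>
#include <unistd.h>

static volatile sig_atomic_t rt_count = 0;
static volatile sig_atomic_t rt_sum = 0;

static void on_rt(int signo, siginfo_t *info, void *ctx)
{
    (void)signo;
    (void)ctx;
    rt_count++;
    rt_sum += info->si_value.sival_int;   /* value passed to sigqueue() */
}

/*
 * Queue the values 10, 20, 30 to ourselves while SIGRTMIN is blocked,
 * then unblock it. Stores the number of deliveries and the sum of the
 * received values. Returns 0 on success, -1 on error.
 */
int queue_demo(int *out_count, int *out_sum)
{
    struct sigaction sa, old;
    sigset_t block, saved;
    union sigval v;
    int i, rc = 0;

    sa.sa_sigaction = on_rt;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_SIGINFO;
    if (sigaction(SIGRTMIN, &sa, &old) < 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    sigprocmask(SIG_BLOCK, &block, &saved);

    rt_count = 0;
    rt_sum = 0;
    for (i = 1; i <= 3; i++) {
        v.sival_int = i * 10;
        if (sigqueue(getpid(), SIGRTMIN, v) < 0)
            rc = -1;
    }

    sigprocmask(SIG_SETMASK, &saved, NULL);   /* queued signals arrive now */
    *out_count = rt_count;
    *out_sum = rt_sum;

    sigaction(SIGRTMIN, &old, NULL);
    return rc;
}
```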

3: Important Concepts in Program Design

fork

When a process calls fork, the child inherits the parent’s signal dispositions. Here, since the child starts off with a copy of the parent’s memory image, the address of a signal-catching function has meaning in the child.

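This means that whatever signal handling we set up before fork is inherited by the child. The inheritance follows from how fork is implemented, so tracing through the fork source code should confirm it.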
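Short of reading the fork source, the inheritance can be checked from user space. In this sketch (child_inherits_handler is a hypothetical name) the child queries its own disposition for SIGUSR1 and reports through its exit status.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_usr1(int signo)
{
    (void)signo;
}

/*
 * Install a handler, fork, and let the child check its disposition.
 * Returns 1 if the child sees the parent's handler, 0 if not, -1 on error.
 */
int child_inherits_handler(void)
{
    struct sigaction sa, old, seen;
    int status;
    pid_t pid;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old) < 0)
        return -1;

    pid = fork();
    if (pid == 0) {                              /* child */
        sigaction(SIGUSR1, NULL, &seen);         /* query only */
        _exit(seen.sa_handler == on_usr1 ? 0 : 1);
    }

    sigaction(SIGUSR1, &old, NULL);              /* parent: clean up */
    if (pid < 0 || waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```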

Interrupted System Calls

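  In general we do not want system calls to be interrupted by signals, but some system calls can leave the program blocked indefinitely, and in those cases this feature is very useful.
  There are two sides to this: when designing a program we can take advantage of it and deliberately interrupt certain system calls, but we must then handle the value returned by the interrupted call, which undoubtedly adds to the design work of writing an application.
  This is why automatic restart was introduced. However, an interrupted system call should not be restarted in every situation, so further design options appeared.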

On FreeBSD 8.0, Linux 3.2.0, and Mac OS X 10.6.8, when signal handlers are installed with the signal function, interrupted system calls will be restarted.

  What we know so far:

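  • signal enables automatic restart by default on the platforms quoted above.
  • sigaction, as a complement, makes automatic restart optional through SA_RESTART; it is off by default.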
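The second point can be seen by interrupting a blocking read. This is a sketch with an invented name, read_is_interrupted. The timer repeats every 50 ms so that a signal arriving just before read() is entered cannot leave the call blocked forever.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX/XSI declarations under strict -std modes */
#include <errno.h>
#include <signal.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int signo)
{
    (void)signo;
}

/*
 * Block in read() on an empty pipe and let SIGALRM interrupt it.
 * Returns 1 if read() failed with EINTR, 0 otherwise, -1 on setup error.
 */
int read_is_interrupted(void)
{
    struct sigaction sa, old;
    struct itimerval on = { {0, 50000}, {0, 50000} };   /* every 50 ms */
    struct itimerval off = { {0, 0}, {0, 0} };
    int fds[2], saved_errno;
    ssize_t n;
    char c;

    if (pipe(fds) < 0)
        return -1;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                       /* no SA_RESTART */
    sigaction(SIGALRM, &sa, &old);

    setitimer(ITIMER_REAL, &on, NULL);
    n = read(fds[0], &c, 1);               /* nothing is ever written */
    saved_errno = errno;
    setitimer(ITIMER_REAL, &off, NULL);    /* disarm before restoring */

    sigaction(SIGALRM, &old, NULL);
    close(fds[0]);
    close(fds[1]);
    return n == -1 && saved_errno == EINTR;
}
```

Setting sa_flags to SA_RESTART instead would make read() resume after every timer tick, so this particular demo would never return.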

Reentrant Functions

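  A signal handler may well be invoked while some function is executing, and usually this causes no problem. But if the handler then calls the very function that was interrupted, problems can arise.
  There are two ways to deal with this: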

  • Make the function reentrant.
  • Block the relevant signals.

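Once that is solved correctly, there is one more consideration: if the handler changes some global value, it overwrites the value that was in effect when the handler was entered. One approach is to save such variables on entry to the handler and restore them before it returns: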

Therefore, as a general rule, when calling the functions listed in Figure 10.4 from a signal handler, we should save and restore errno.
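The save-and-restore rule for errno looks like this in practice. It is a sketch with made-up names; the failing write() merely stands in for any call in the handler that may change errno.

```c
#define _XOPEN_SOURCE 700   /* expose POSIX declarations under strict -std modes */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

static void careful_handler(int signo)
{
    int saved_errno = errno;          /* save on entry */

    (void)signo;
    if (write(-1, "x", 1) < 0) {
        /* errno is now EBADF: the interrupted code must not see that */
    }
    errno = saved_errno;              /* restore before returning */
}

/*
 * Returns 1 if errno set by the "main line" code survives a signal
 * handler that itself triggers an error, 0 if not, -1 on setup error.
 */
int errno_preserved(void)
{
    struct sigaction sa, old;
    int ok;

    sa.sa_handler = careful_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old) < 0)
        return -1;

    errno = ENOENT;                   /* pretend an earlier call failed */
    raise(SIGUSR1);
    ok = (errno == ENOENT);

    sigaction(SIGUSR1, &old, NULL);
    return ok;
}
```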

SIGCLD Semantics

First, two definitions: zombie process and SIGCHLD.

In UNIX System terminology, a process that has terminated, but whose parent has not yet waited for it, is called a zombie.

SIGCHLD : Whenever a process terminates or stops, the SIGCHLD signal is sent to the parent. By default, this signal is ignored, so the parent must catch this signal if it wants to be notified whenever a child’s status changes. The normal action in the signal-catching function is to call one of the wait functions to fetch the child’s process ID and termination status.

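Recall, though, that our earlier experiments all called the wait family of functions directly to obtain the child's termination status, rather than doing what is described above. This leaves me with a question: is it necessary to catch this signal explicitly and call one of the wait functions from the handler? This is Experiment 1; the test platform is Fedora, kernel version 4.1x.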

Signal process

First, here are the key passages I excerpted from APUE:

1. First, a signal is generated for a process (or sent to a process) when the event that causes the signal occurs.
2. When the signal is generated, the kernel usually sets a flag of some form in the process table.
3. We say that a signal is delivered to a process when the action for a signal is taken. During the time between the generation of a signal and its delivery, the signal is said to be pending.
4. A process has the option of blocking the delivery of a signal. If a signal that is blocked is generated for a process, and if the action for that signal is either the default action or to catch the signal, then the signal remains pending for the process until the process either (a) unblocks the signal or (b) changes the action to ignore the signal.
5. The system determines what to do with a blocked signal when the signal is delivered, not when it’s generated. This allows the process to change the action for the signal before it’s delivered.
6. Each process has a signal mask that defines the set of signals currently blocked from delivery to that process.

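There is a lot above, but every item matters. When an event occurs, a signal is generated, and the kernel then sets a signal flag in the corresponding entry for the process. The most important concept is that delivery refers to the moment the signal's action is taken; between generation and delivery the signal is said to be pending. An even more important point follows: blocking a signal means holding it back before its action is taken, whatever the disposition is, with the exception of ignore. It follows naturally that the pending flag is cleared at one of three moments: catch, default, or ignore. Put simply, the system clears the flag when the action is taken.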

4: Experiment Code

Experiment 1

Problem: What is the relationship between the wait family of functions and SIGCHLD?

Experiment 2

Problem: The process creates a file and writes the integer 0 to the file. The process then calls fork, and the parent and child alternate incrementing the counter in the file. Each time the counter is incremented, print which process (parent or child) is doing the increment.

Experiment 3

Problem: Write a program that calls fwrite with a large buffer (about one gigabyte). Before calling fwrite, call alarm to schedule a signal in 1 second. In your signal handler, print that the signal was caught and return. Does the call to fwrite complete? What’s happening?

Experiment 4

Problem: Modify Figure 3.5 as follows: (a) change BUFFSIZE to 100; (b) catch the SIGXFSZ signal using the signal_intr function, printing a message when it’s caught, and returning from the signal handler; and (c) print the return value from write if the requested number of bytes wasn’t written. Modify the soft RLIMIT_FSIZE resource limit (Section 7.11) to 1,024 bytes and run your new program, copying a file that is larger than 1,024 bytes. (Try to set the soft resource limit from your shell. If you can’t do this from your shell, call setrlimit directly from the program.)

To keep this article from growing longer, the experiments are recorded in a separate article.

References

[1] abort : https://linux.die.net/man/3/abort
