Inside Timer Function Execution

Source: Internet   Posted by: 摄影美工培训   Editor: 程序博客网   Time: 2024/05/20 20:46

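We call sleep() to put a process to sleep whenever a condition cannot yet be met, and we call poll and select all the time when writing network programs. Want to know how the kernel implements scheduling based on precise time? Let's talk it through...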
1 Introduction
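Deferred execution comes in two kinds:
The first does not need precise timing control, for example softirqs and the tasklet mechanism, which are processed at the end of each asynchronous interrupt handler or run by the kernel thread ksoftirqd.
The second needs precise timing control. Structures that put processes to sleep, such as work queues [run by the kernel thread keventd], wait queues [woken by the producer] and completions, all depend on the timer mechanism: after a precise time interval, the kernel carries out some deferred operation.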
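2 Timer types
   Low-resolution timers: resolution of one jiffy (1/HZ, i.e. 1 ms at HZ=1000), driven by the PIT (programmable interval timer, built around the 8253 chip).
   High-resolution timers: can reach nanosecond resolution; for example, a sound card driver may need to send data to the card at very short periodic intervals.
Because the periodic tick generated by the timer stays active for the kernel's entire lifetime, the system cannot remain in a power-saving state for long; dynamic ticks were introduced to solve this.
There are now low-resolution dynamic ticks and high-resolution dynamic ticks, and all four combinations (low or high resolution, periodic or dynamic tick) are valid in the kernel.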
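3 Implementation of low-resolution timers
   On IA-32 systems, the HPET or the PIT is usually chosen as the periodic clock source for the timer interrupt, which fires HZ times per second (about 100 times at HZ=100). A high HZ suits desktop and multimedia systems with frequent interactive use; a low HZ suits servers and batch-processing machines.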
Implementation overview:
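3-1 Clock event device initialization
We can now walk through the initialization of the kernel's clock subsystem. Right after power-on, the system must register the IRQ0 timer interrupt, initialize the clock source devices, clock event devices, tick devices and so on, and choose a suitable operating mode. Since there is nothing especially important to do right after boot, the default is low-resolution + periodic tick mode; later the kernel switches modes according to the hardware configuration (for example, whether a high-resolution timer such as the HPET is present) and the software configuration (for example, whether high-resolution timers were enabled through command-line parameters or the kernel configuration). On a system that supports hrtimer high-resolution mode and has dynamic ticks enabled, hrtimer switches from low to high resolution the first time the IRQ0 softirq runs, and then switches further into NOHZ mode. IRQ0 is the system timer interrupt and is handled through the global clock event device (global_clock_event), defined as follows: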
```
static struct irqaction irq0 = {
        .handler = timer_interrupt,
        .flags   = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_IRQPOLL | IRQF_TIMER,
        .name    = "timer"
};
```
Its interrupt handler, timer_interrupt, is shown in simplified form in Listing 12:
```
static irqreturn_t timer_interrupt(int irq, void *dev_id)
{
        /* ... */
        global_clock_event->event_handler(global_clock_event);
        /* ... */
        return IRQ_HANDLED;
}
```
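Besides updating the running-time statistics and profiling data of the process on the local CPU, the more important job of global_clock_event->event_handler is global work such as updating jiffies. Depending on the environment, this global clock event device's event_handler may be tick_handle_periodic or tick_handle_periodic_broadcast in low-resolution mode, and hrtimer_interrupt in high-resolution mode. Currently only the HPET or the PIT can serve as global_clock_event. Its initialization flow is shown in Listing 13: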
```
void __init time_init(void)
{
        late_time_init = x86_late_time_init;
}

static __init void x86_late_time_init(void)
{
        x86_init.timers.timer_init();
        tsc_init();
}

/* x86_init.timers.timer_init is a callback pointer to hpet_time_init */
void __init hpet_time_init(void)
{
        if (!hpet_enable())
                setup_pit_timer();
        setup_default_timer_irq();
}
```
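As the listing above shows, the system prefers the HPET as global_clock_event; the PIT becomes global_clock_event only when the HPET is not enabled. While the HPET is being enabled, it is registered both as a clock source device and as a clock event device: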
```
hpet_enable
    hpet_clocksource_register
    hpet_legacy_clockevent_register
        clockevents_register_device(&hpet_clockevent);
```
clockevents_register_device raises the CLOCK_EVT_NOTIFY_ADD event, which asks for a corresponding tick_device to be created; the event handler tick_notify then adds the new tick_device:
```
clockevents_register_device    triggers event CLOCK_EVT_NOTIFY_ADD
tick_notify                    receives event CLOCK_EVT_NOTIFY_ADD
    tick_check_new_device
        tick_setup_device
```
While the tick_device is being set up, its event_handler is chosen according to whether the newly added clock event device uses broadcast. The tick device handlers are as follows:
|               | low resolution mode  | high resolution mode |
|---------------|----------------------|----------------------|
| periodic tick | tick_handle_periodic | hrtimer_interrupt    |
| dynamic tick  | tick_nohz_handler    | hrtimer_interrupt    |
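So with low resolution and a periodic tick, tick_handle_periodic is used as the timer interrupt handler; it runs once on every timer interrupt and calls the following functions:
tick_handle_periodic(struct clock_event_device *dev)   // tick-common.c
   -> tick_periodic(cpu)
               -> do_timer(1)                                        // 1 = one jiffy
               -> update_process_times(user_mode(get_irq_regs()))    // the saved registers show whether the CPU was in user mode
               -> profile_tick(CPU_PROFILING)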
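The handler dispatch described above comes down to a function pointer stored in the clock event device. The user-space sketch below imitates that pattern under simplifying assumptions: `toy_clock_event_device`, `toy_tick_handle_periodic` and the other `toy_` names are hypothetical stand-ins that only count ticks, not the kernel's real structures or functions.

```c
#include <assert.h>

/* Hypothetical stand-in for struct clock_event_device: only the
 * event_handler callback is modelled. */
struct toy_clock_event_device {
    void (*event_handler)(struct toy_clock_event_device *dev);
};

static unsigned long long toy_jiffies_64;   /* global tick counter */
static unsigned long toy_user_ticks;        /* ticks charged as user time */
static unsigned long toy_system_ticks;      /* ticks charged as system time */
static int toy_in_user_mode;                /* pretend result of user_mode(regs) */

/* Stand-in for do_timer(ticks): advance the global tick count. */
static void toy_do_timer(unsigned long ticks)
{
    toy_jiffies_64 += ticks;
}

/* Stand-in for update_process_times(user_tick): charge one tick. */
static void toy_update_process_times(int user_tick)
{
    if (user_tick)
        toy_user_ticks++;
    else
        toy_system_ticks++;
}

/* Same shape as tick_handle_periodic -> tick_periodic. */
static void toy_tick_handle_periodic(struct toy_clock_event_device *dev)
{
    (void)dev;
    toy_do_timer(1);                         /* one jiffy per periodic tick */
    toy_update_process_times(toy_in_user_mode);
}

/* What timer_interrupt does: call whichever handler is installed. */
void toy_timer_interrupt(struct toy_clock_event_device *dev)
{
    dev->event_handler(dev);
}

struct toy_clock_event_device toy_global_clock_event = {
    .event_handler = toy_tick_handle_periodic,
};
```

In this model, switching to high-resolution mode would simply mean storing a different handler (hrtimer_interrupt in the table above) in event_handler; the interrupt entry point itself does not change.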
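3-2 The event handler performs the following two steps:
do_timer() -- run only by the CPU in charge of global timekeeping: updates jiffies, the wall-clock time and the system load (average run-queue length)
update_process_times() -- run by every CPU: updates the current process's utime and stime, raises the timer softirq that runs this CPU's expired timer functions,
                                    drives CPU scheduling, and runs the currently registered POSIX CPU timers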
3-3 do_timer()
/*
* The 64-bit jiffies value is not atomic - you MUST NOT read it
* without sampling the sequence number in xtime_lock.
* jiffies is defined in the linker script...
*/
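With dynamic ticks, ticks can be greater than 1 (several jiffies may pass between two actual timer interrupts), which helps save power.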
void do_timer(unsigned long ticks)
{
   jiffies_64 += ticks;
   update_wall_time();  // (to be explored)
   calc_global_load();
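/* System load:
1 System load average: the average number of processes in the run queue (the run-queue length) over a given interval. As a rule of thumb, performance is fine as long as each CPU has no more than 3 active processes; more than 5 tasks per CPU indicates a serious performance problem on the machine.
2 Computation, in fixed point with 2048 (= 1 << 11) representing 1.0:
     load(t) = (load(t-1) * EXP_m + n * (2048 - EXP_m)) >> 11
where load(t-1) is the value computed one sampling period (LOAD_FREQ = 5*HZ+1 ticks, about 5 s) earlier,
n is the number of active processes right now (runnable plus TASK_UNINTERRUPTIBLE), scaled by 2048,
m is the averaging window in minutes (1, 5 or 15), and EXP_m = 2048 * e^(-5/(60*m)):
EXP_1 = 1884, EXP_5 = 2014, EXP_15 = 2037; 2048 is the precision (scaling) factor.
*/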
}
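The fixed-point update in the comment can be tried on its own. The sketch below is modelled on the kernel's `calc_load` helper as it looked in kernels of this era (later versions add rounding), using the constants named above; treat it as an illustration of the arithmetic rather than a copy of the kernel source. `load_hundredths` is a hypothetical helper for reading the result.

```c
#include <assert.h>

#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point = 2048 */
#define EXP_1    1884UL             /* 2048 * e^(-5/60):  1-minute window */
#define EXP_5    2014UL             /* 2048 * e^(-5/300): 5-minute window */
#define EXP_15   2037UL             /* 2048 * e^(-5/900): 15-minute window */

/* One exponential-decay step, applied once per ~5 s sampling period.
 * load:   previous average, fixed point (x2048)
 * exp:    decay factor of the window (EXP_1, EXP_5 or EXP_15)
 * active: current number of active tasks, fixed point (n * FIXED_1) */
unsigned long calc_load(unsigned long load, unsigned long exp,
                        unsigned long active)
{
    load *= exp;
    load += active * (FIXED_1 - exp);
    return load >> FSHIFT;
}

/* Convert a fixed-point load to hundredths, e.g. 164 -> 8 (i.e. 0.08). */
unsigned long load_hundredths(unsigned long load)
{
    return (load * 100) >> FSHIFT;
}
```

Starting from an idle system with one task continuously runnable, twelve steps (about one minute) bring the 1-minute average to roughly 1 - e^(-1), about 0.63, which is the exponential smoothing the formula describes.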
3-4 update_process_times(int user_tick)    updates per-CPU statistics; we are still inside the asynchronous interrupt handler
/*
* Called from the timer interrupt handler to charge one tick to the current
* process. user_tick is 1 if the tick is user time, 0 for system.
*/
void update_process_times(int user_tick)
{
struct task_struct *p = current;
int cpu = smp_processor_id();
/* Note: this timer irq context must be accounted for as well. */
account_process_tick(p, user_tick);
run_local_timers();
rcu_check_callbacks(cpu, user_tick);
printk_tick();
scheduler_tick();
run_posix_cpu_timers(p);
}
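3-4-1 Updating CPU time:
This runs on every timer interrupt that actually fires (with dynamic ticks, several ticks may elapse between two real interrupts). The current process's utime grows by one jiffy (cputime_one_jiffy, i.e. 1000000000/HZ ns, or 10^7 ns per interrupt at HZ=100), the thread group's run time is updated, and so are the per-CPU kernel statistics: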
/*
* 'kernel_stat.h' contains the definitions needed for doing
* some kernel statistics (CPU usage, context switches ...),
* used by rstatd/perfmeter
*/
struct cpu_usage_stat {
cputime64_t user; //cpustat->user = cputime64_add(cpustat->user, tmp);
cputime64_t nice;
cputime64_t system;
cputime64_t softirq;
cputime64_t irq;
cputime64_t idle;
cputime64_t iowait;
cputime64_t steal;
cputime64_t guest;
};
/*
* Account a single tick of cpu time.
* @p: the process that the cpu time gets accounted to
* @user_tick: indicates if the tick is a user or a system tick
*/
void account_process_tick(struct task_struct *p, int user_tick)
{
cputime_t one_jiffy_scaled = cputime_to_scaled(cputime_one_jiffy);
struct rq *rq = this_rq();
if (user_tick)
  account_user_time(p, cputime_one_jiffy, one_jiffy_scaled);
else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
  account_system_time(p, HARDIRQ_OFFSET, cputime_one_jiffy,
        one_jiffy_scaled);
else
  account_idle_time(cputime_one_jiffy);
}
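The three-way decision in account_process_tick can be pulled out as a pure function. In this sketch the three booleans stand in for `user_tick`, `p == rq->idle` and `irq_count() != HARDIRQ_OFFSET`; `classify_tick` and `enum tick_kind` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum tick_kind { TICK_USER, TICK_SYSTEM, TICK_IDLE };

/* user_tick:       the interrupted context was user mode
 * is_idle_task:    the current task is this CPU's idle task (p == rq->idle)
 * nested_irq_work: more than the timer hardirq itself is active
 *                  (irq_count() != HARDIRQ_OFFSET), e.g. a softirq was interrupted */
enum tick_kind classify_tick(bool user_tick, bool is_idle_task,
                             bool nested_irq_work)
{
    if (user_tick)
        return TICK_USER;
    if (!is_idle_task || nested_irq_work)
        return TICK_SYSTEM;
    return TICK_IDLE;              /* idle task and nothing but the tick running */
}
```

The `nested_irq_work` input is why an idle CPU that was in the middle of a softirq when the tick arrived is charged system time rather than idle time.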
3-4-2 Raising the softirq that processes this CPU's timer functions
/*
* Called by the local, per-CPU timer interrupt on SMP.
*/
void run_local_timers(void)
{
hrtimer_run_queues();
raise_softirq(TIMER_SOFTIRQ);
//sets TIMER_SOFTIRQ in this CPU's softirq pending mask; the softirq runs on irq exit or in the ksoftirqd kernel thread (its handler is analyzed in detail below)
softlockup_tick();
}
3-4-3 Process scheduling
/*
* This function gets called by the timer code, with HZ frequency.
* We call it with interrupts disabled.
*
* It also gets called by the fork code, when changing the parent's
* timeslices.
*/
void scheduler_tick(void)
{
int cpu = smp_processor_id();
struct rq *rq = cpu_rq(cpu);
struct task_struct *curr = rq->curr;
sched_clock_tick();
spin_lock(&rq->lock);
update_rq_clock(rq);
update_cpu_load(rq);
curr->sched_class->task_tick(rq, curr, 0);
spin_unlock(&rq->lock);
perf_event_task_tick(curr, cpu);
#ifdef CONFIG_SMP
rq->idle_at_tick = idle_cpu(cpu);
trigger_load_balance(rq, cpu); //raises SCHED_SOFTIRQ when rebalancing is due, balancing load across CPUs
#endif
}
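4 Processing timers in softirq context
1 Comparing times
Getting the jiffies value since boot:
static inline u64 get_jiffies_64(void)   // 64-bit version; on 32-bit, jiffies_64 must be read under the xtime_lock seqlock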
{
return (u64)jiffies;
}
2 Standard time comparison macros; all arguments are jiffies values:
#define time_after(a,b)  \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)(b) - (long)(a) < 0))
#define time_before(a,b) time_after(b,a)
#define time_after_eq(a,b) \
(typecheck(unsigned long, a) && \
typecheck(unsigned long, b) && \
((long)(a) - (long)(b) >= 0))
#define time_before_eq(a,b) time_after_eq(b,a)
/*
* Calculate whether a is in the range of [b, c].
*/
#define time_in_range(a,b,c) \
(time_after_eq(a,b) && \
time_before_eq(a,c))
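These macros stay correct when jiffies wraps around (a 32-bit counter at HZ=1000 wraps after about 49.7 days), as long as the two values are less than LONG_MAX ticks apart. The sketch below re-implements the same arithmetic as plain functions without typecheck(); the `_ul` names are hypothetical, and like the kernel it relies on the compiler converting out-of-range unsigned values to long by two's-complement wrapping (which GCC and Clang do).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Same arithmetic as the kernel macros, minus typecheck():
 * subtract as unsigned, then interpret the difference as signed. */
static inline bool time_after_ul(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;      /* true if a is later than b */
}

static inline bool time_after_eq_ul(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;     /* true if a is later than or equal to b */
}

static inline bool time_before_ul(unsigned long a, unsigned long b)
{
    return time_after_ul(b, a);
}
```

A plain `a > b` comparison gives the wrong answer as soon as one value has wrapped past zero and the other has not; the subtraction keeps working because only the distance between the two values matters.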
3 Time conversion
unsigned int inline jiffies_to_msecs(const unsigned long j)
{
#if HZ <= MSEC_PER_SEC && !(MSEC_PER_SEC % HZ)
return (MSEC_PER_SEC / HZ) * j;
//e.g. jiffies=1000 (1000 ticks since boot) and HZ=100 (100 timer interrupts per second) gives 10000 ms = 10 s
#elif HZ > MSEC_PER_SEC && !(HZ % MSEC_PER_SEC)
return (j + (HZ / MSEC_PER_SEC) - 1)/(HZ / MSEC_PER_SEC);
#else
# if BITS_PER_LONG == 32
return (HZ_TO_MSEC_MUL32 * j) >> HZ_TO_MSEC_SHR32;
# else
return (j * HZ_TO_MSEC_NUM) / HZ_TO_MSEC_DEN;
# endif
#endif
}
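For the common configurations HZ = 100, 250 or 1000, the first `#if` branch applies and the conversion is a single multiplication. The sketch below takes HZ as a run-time parameter so several values can be compared; `jiffies_to_msecs_hz` and `msecs_to_jiffies_hz` are hypothetical names, and the round-up in the second function reflects how the kernel's msecs_to_jiffies handles this HZ case, so that a timeout never expires early.

```c
#include <assert.h>

#define MSEC_PER_SEC 1000UL

/* Only the case of the first #if branch: hz <= 1000 and 1000 % hz == 0. */
unsigned long jiffies_to_msecs_hz(unsigned long j, unsigned long hz)
{
    assert(hz <= MSEC_PER_SEC && MSEC_PER_SEC % hz == 0);
    return (MSEC_PER_SEC / hz) * j;            /* each tick lasts 1000/hz ms */
}

/* Inverse direction, rounding up to whole ticks. */
unsigned long msecs_to_jiffies_hz(unsigned long m, unsigned long hz)
{
    assert(hz <= MSEC_PER_SEC && MSEC_PER_SEC % hz == 0);
    return (m + (MSEC_PER_SEC / hz) - 1) / (MSEC_PER_SEC / hz);
}
```

At HZ=100, for example, a 15 ms request becomes 2 ticks (20 ms), never 1 tick (10 ms).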
unsigned int inline jiffies_to_usecs(const unsigned long j);
unsigned long msecs_to_jiffies(const unsigned int m);
unsigned long usecs_to_jiffies(const unsigned int u);
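4 Converting between jiffies and timeval / timespec
Inside the kernel, time is expressed as a jiffies offset or an absolute jiffies value, but users defining timed operations think in seconds rather than in HZ ticks, so the kernel provides conversions between the two.
A programmer calls sleep(2) --> the time is converted to a structure:
struct timeval  { time_t tv_sec; suseconds_t tv_usec; /* microseconds */ };
struct timespec { time_t tv_sec; long tv_nsec;        /* nanoseconds */ };
                        --> then converted to relative jiffies: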
/* Same for "timeval"
*
* Well, almost. The problem here is that the real system resolution is
* in nanoseconds and the value being converted is in micro seconds.
* Also for some machines (those that use HZ = 1024, in-particular),
* there is a LARGE error in the tick size in microseconds.
* The solution we use is to do the rounding AFTER we convert the
* microsecond part. Thus the USEC_ROUND, the bits to be shifted off.
* Instruction wise, this should cost only an additional add with carry
* instruction above the way it was done above.
*/
unsigned long
timeval_to_jiffies(const struct timeval *value)
{
unsigned long sec = value->tv_sec;
long usec = value->tv_usec;
if (sec >= MAX_SEC_IN_JIFFIES){
  sec = MAX_SEC_IN_JIFFIES;
  usec = 0;
}
return (((u64)sec * SEC_CONVERSION) +
  (((u64)usec * USEC_CONVERSION + USEC_ROUND) >>
   (USEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
}
unsigned long timespec_to_jiffies(const struct timespec *value);
void jiffies_to_timeval(const unsigned long jiffies, struct timeval *value);
void jiffies_to_timespec(const unsigned long jiffies, struct timespec *value);
          ---> then converted to absolute jiffies:
m = jiffies + n;   /* jiffies: the kernel's current jiffies value; n: the relative jiffies from the previous step */
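To see the relative-to-absolute step end to end, the sketch below converts a (seconds, nanoseconds) interval into ticks and then adds the current jiffies. It is deliberately simpler than timeval_to_jiffies above: it divides directly instead of using the scaled SEC_CONVERSION/USEC_CONVERSION multiplications, and it rounds the sub-second part up so the timer cannot fire early. The function names are hypothetical, and nsec is assumed to be below one second.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000ULL

/* Relative ticks for an interval of sec seconds + nsec nanoseconds
 * (nsec < 1e9); the nanosecond part is rounded up to a whole tick. */
unsigned long long timespec_to_rel_jiffies(unsigned long long sec,
                                           unsigned long long nsec,
                                           unsigned long long hz)
{
    unsigned long long ticks = sec * hz;
    ticks += (nsec * hz + NSEC_PER_SEC - 1) / NSEC_PER_SEC;
    return ticks;
}

/* Absolute expiry as stored in timer->expires: m = jiffies + n. */
unsigned long long expiry_from_now(unsigned long long now_jiffies,
                                   unsigned long long rel_ticks)
{
    return now_jiffies + rel_ticks;
}
```

For sleep(2) at HZ=100 this gives n = 200, so a request made at jiffies = 4000 expires at 4200.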
5 Dynamic timer internals
5-1 Data structures
Timer data structure:
struct timer_list {
struct list_head entry;
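unsigned long expires;                    //absolute jiffies value
void (*function)(unsigned long);       //expiry callback, typically wakes up a process
unsigned long data;                       //argument passed to function, often a process's task_struct pointer
struct tvec_base *base;                  //base of this CPU's timer wheel (array of timer lists)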
#ifdef CONFIG_TIMER_STATS
void *start_site;
char start_comm[16];
int start_pid;
#endif
#ifdef CONFIG_LOCKDEP
struct lockdep_map lockdep_map;
#endif
};
Timer wheel (arrays of timer lists) structure:
struct tvec_base {
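spinlock_t lock;                           //lock protecting this timer base
struct timer_list *running_timer; //the timer whose callback is currently running
unsigned long timer_jiffies;         //next tick to process; all timers due earlier have already run
unsigned long next_timer;           //jiffies of the earliest pending expiry
struct tvec_root tv1;               //lowest 8 bits: timers due within the next 256 jiffies (2.56 s at HZ=100), in 256 lists
struct tvec tv2;                          //timers due 2^8 .. 2^14-1 jiffies ahead, in 64 lists
//each tv2 list covers an interval of 256 ticks
struct tvec tv3;                          //timers due 2^14 .. 2^20-1 jiffies ahead, in 64 lists
struct tvec tv4;                          //timers due 2^20 .. 2^26-1 jiffies ahead, in 64 lists
struct tvec tv5;                          //timers due 2^26 .. 2^32-1 jiffies ahead, in 64 lists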
} ____cacheline_aligned;
struct tvec_base boot_tvec_bases;
static DEFINE_PER_CPU(struct tvec_base *, tvec_bases) = &boot_tvec_bases;
Each CPU gets a pointer to its own timer wheel, initialized to boot_tvec_bases.
#define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
#define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
#define TVN_SIZE (1 << TVN_BITS)     /* 64 when CONFIG_BASE_SMALL is 0 */
#define TVR_SIZE (1 << TVR_BITS)     /* 256 when CONFIG_BASE_SMALL is 0 */
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)
struct tvec {
struct list_head vec[TVN_SIZE];     //64 lists
};
struct tvec_root {
struct list_head vec[TVR_SIZE];     //256 lists
};
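5-2 How dynamic timers are implemented
The timer wheel is divided into 5 groups. The first group has 256 entries, each a doubly linked list holding the timers due within the next 0-255 ticks; every other group has 64 entries, i.e. 64 doubly linked lists. In the second group each list covers an interval of 2^8 = 256 ticks; in the third group each list covers 2^14 ticks, and so on. The kernel mainly looks at the first group. Each group has a current position (index). On every timer interrupt the kernel scans the first group, runs all timer functions in the list at its current position, and advances the position by 1; when it reaches 256 it wraps to 0, the timers in the second group's current list are redistributed into the first group, and the second group's position is advanced by 1. Likewise, the third group refills the second group and advances its position, and so on. Because timers move between groups only by relinking list nodes, this is very efficient.
   How is each group's current position represented? Through struct tvec_base->timer_jiffies, which records a point in time before which all expired timers have already been run; timer_jiffies normally equals jiffies or lags slightly behind it. Each group's index is computed as follows: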
#define INDEX(N) ((base->timer_jiffies >> (TVR_BITS + (N) * TVN_BITS)) & TVN_MASK)
Note: the first group's index is base->timer_jiffies & TVR_MASK
Second group, N=0: INDEX(0) = (base->timer_jiffies >> TVR_BITS) & TVN_MASK
Third group, N=1: INDEX(1) = (base->timer_jiffies >> (TVR_BITS + 1*TVN_BITS)) & TVN_MASK
...
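Put differently, timer_jiffies is read as a mixed-radix number: the low 8 bits give the first group's position and each following 6-bit field gives the position in the next group. The hypothetical helper below evaluates the index expressions above for a given timer_jiffies, assuming CONFIG_BASE_SMALL is off (TVR_BITS = 8, TVN_BITS = 6).

```c
#include <assert.h>

#define TVN_BITS 6                  /* CONFIG_BASE_SMALL == 0 */
#define TVR_BITS 8
#define TVN_MASK ((1UL << TVN_BITS) - 1)
#define TVR_MASK ((1UL << TVR_BITS) - 1)

/* Current position of group 1..5 (tv1..tv5) for a given timer_jiffies.
 * Groups 2..5 correspond to INDEX(0)..INDEX(3). */
unsigned long group_index(unsigned long timer_jiffies, int group)
{
    if (group == 1)
        return timer_jiffies & TVR_MASK;
    /* INDEX(N) with N = group - 2 */
    return (timer_jiffies >> (TVR_BITS + (group - 2) * TVN_BITS)) & TVN_MASK;
}
```

For example, at timer_jiffies = 256 the first group's index has just wrapped to 0 while the second group's index has moved from 0 to 1.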
5-3 Adding a timer to the per-CPU timer wheel
1 Defining a timer
Static definition (at compile time):
#define DEFINE_TIMER(_name, _function, _expires, _data)  \
struct timer_list _name =    \
  TIMER_INITIALIZER(_function, _expires, _data)
#define TIMER_INITIALIZER(_function, _expires, _data) {  \
  .entry = { .prev = TIMER_ENTRY_STATIC }, \
  .function = (_function),   \
  .expires = (_expires),    \
  .data = (_data),    \
  .base = &boot_tvec_bases,   \
  __TIMER_LOCKDEP_MAP_INITIALIZER(  \
   __FILE__ ":" __stringify(__LINE__)) \
}
Dynamic initialization (at run time):
struct timer_list my_timer;
init_timer(&my_timer);
my_timer.expires = jiffies + 1*HZ;              /* one second from now */
my_timer.data = (unsigned long)&my_timer;       /* data is an unsigned long */
my_timer.function = my_function;                /* void my_function(unsigned long data) */
2 Activating a timer: adding it to the current CPU's timer wheel
2-1
/**
* add_timer - start a timer
* @timer: the timer to be added
*
* The kernel will do timer ->function(->data) callback from the
* timer interrupt at the ->expires point in the future. The
* current time is 'jiffies'.
*
* The timer's ->expires, ->function (and if the handler uses it, ->data)
* fields must be set prior calling this function.
*
* Timers with an ->expires field in the past will be executed in the next
* timer tick.
*/
void add_timer(struct timer_list *timer)
{
    BUG_ON(timer_pending(timer));     //ensure timer->entry.next == NULL, i.e. the timer is not already queued
     mod_timer(timer, timer->expires); //expires is an absolute jiffies value
}
2-2 mod_timer: changes a timer's expiry time (expires) and queues the timer if it is not queued yet
/**
* mod_timer - modify a timer's timeout
* @timer: the timer to be modified
* @expires: new timeout in jiffies
*
* mod_timer() is a more efficient way to update the expire field of an
* active timer (if the timer is inactive it will be activated)
*
* mod_timer(timer, expires) is equivalent to:
*
*     del_timer(timer); timer->expires = expires; add_timer(timer);
*
* Note that if there are multiple unserialized concurrent users of the
* same timer, then mod_timer() is the only safe way to modify the timeout,
* since add_timer() cannot modify an already running timer.
*
* The function returns whether it has modified a pending timer or not.
* (ie. mod_timer() of an inactive timer returns 0, mod_timer() of an
* active timer returns 1.)
*/
int mod_timer(struct timer_list *timer, unsigned long expires) //expires is an absolute value
{
/*
* This is a common optimization triggered by the
* networking code - if the timer is re-modified
* to be the same thing then just return:
*/
if (timer_pending(timer) && timer->expires == expires)
  return 1;
return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
}
2-3  __mod_timer
static inline int
__mod_timer(struct timer_list *timer, unsigned long expires,
      bool pending_only, int pinned)
{
struct tvec_base *base, *new_base;
unsigned long flags;
int ret = 0 , cpu;
timer_stats_timer_set_start_info(timer); //with CONFIG_TIMER_STATS, record which process armed this timer so statistics about it can be collected when the timer runs
/*
void __timer_stats_timer_set_start_info(struct timer_list *timer, void *addr=0)
{
if (timer->start_site)
  return;
timer->start_site = addr;
memcpy(timer->start_comm, current->comm, TASK_COMM_LEN);
timer->start_pid = current->pid;
}
*/
BUG_ON(!timer->function);
base = lock_timer_base(timer, &flags);
// lock the timer's current base (struct tvec_base, i.e. per_cpu(tvec_bases)->lock) with interrupts saved and disabled
if (timer_pending(timer)) {      //the timer is already queued
  detach_timer(timer, 0);
  if (timer->expires == base->next_timer &&
      !tbase_get_deferrable(timer->base))
   base->next_timer = base->timer_jiffies;
  ret = 1;
} else {
  if (pending_only)
   goto out_unlock;
}
debug_activate(timer, expires);
new_base = __get_cpu_var(tvec_bases);
cpu = smp_processor_id();
#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
  int preferred_cpu = get_nohz_load_balancer();
  if (preferred_cpu >= 0)
   cpu = preferred_cpu;
}
#endif
new_base = per_cpu(tvec_bases, cpu);
if (base != new_base) {
  /*
   * We are trying to schedule the timer on the local CPU.
   * However we can't change timer's base while it is running,
   * otherwise del_timer_sync() can't detect that the timer's
   * handler yet has not finished. This also guarantees that
   * the timer is serialized wrt itself.
   */
  if (likely(base->running_timer != timer)) {
   /* See the comment in lock_timer_base() */
   timer_set_base(timer, NULL);
   spin_unlock(&base->lock);
   base = new_base;
   spin_lock(&base->lock);
   timer_set_base(timer, base);
  }
}
timer->expires = expires;
if (time_before(timer->expires, base->next_timer) &&
     !tbase_get_deferrable(timer->base))
  base->next_timer = timer->expires;
internal_add_timer(base, timer);
out_unlock:
spin_unlock_irqrestore(&base->lock, flags);
return ret;
}
2-4 internal_add_timer: the core routine that queues a timer
static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
{
unsigned long expires = timer->expires; //usually later than jiffies; base->timer_jiffies stays close to jiffies
unsigned long idx = expires - base->timer_jiffies; //distance from the current timer_jiffies
struct list_head *vec;
if (idx < TVR_SIZE) {                   //due within 255 ticks of timer_jiffies
  int i = expires & TVR_MASK;
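/*
*Pick the list within the first group. E.g. timer_jiffies=254 and timer->expires=259: idx=5, so the timer fires 5 ticks from now.
*The first group's current position is 254; counting 5 slots forward wraps around to slot 3, and indeed
*i = 259 & 255 = 3, so the timer is inserted into slot 3 of the first group, which is correct.
*/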
  vec = base->tv1.vec + i;
}
else if (idx < 1<< (TVR_BITS + TVN_BITS)) {       //due 256 .. 2^14-1 ticks after timer_jiffies
     int i = (expires >> TVR_BITS) & TVN_MASK;
      vec = base->tv2.vec + i;
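/*
* Key idea: timer->expires (used to compute the insertion slot) > jiffies >= timer_jiffies (the reference for
* each group's current index). Because timer->expires is always > timer_jiffies, the insertion position in
* every group is relative to that group's current position.
* Again assume timer_jiffies = 254 and timer->expires = 512: idx = 258, so the timer fires 258 ticks from now.
* i = (512 >> 8) & 63 = 2, so it is inserted into slot 2 of the second group.
* The first group's current position is timer_jiffies = 254, and the second group's is INDEX(0) = 0.
* Two ticks later, timer_jiffies = 256: the first group's index wraps to 0, so the second group's list at
* INDEX(0) = 1 is moved down into the first group. 258 ticks later (timer_jiffies = 512) the first group's
* index wraps to 0 again and the second group's list at INDEX(0) = 2 is moved down; our timer is re-inserted
* at slot 512 % 256 = 0 of the first group, which is exactly the slot being processed, so it runs on time.
*/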
} else if (idx < 1 << (TVR_BITS + 2 * TVN_BITS)) {
  int i = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
  vec = base->tv3.vec + i;
} else if (idx < 1 << (TVR_BITS + 3 * TVN_BITS)) {
  int i = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
  vec = base->tv4.vec + i;
} else if ((signed long) idx < 0) {
  /*
   * Can happen if you add a timer with expires == jiffies,
   * or you set a timer to go off in the past
   */
  vec = base->tv1.vec + (base->timer_jiffies & TVR_MASK);
}
else
{
  int i;
  /* If the timeout is larger than 0xffffffff on 64-bit
   * architectures then we use the maximum timeout:
   */
  if (idx > 0xffffffffUL) {
   idx = 0xffffffffUL;
   expires = idx + base->timer_jiffies;
  }
  i = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
  vec = base->tv5.vec + i;
}
/*
* Timers are FIFO:
*/
list_add_tail(&timer->entry, vec);
}
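The decision chain in internal_add_timer can be checked without a kernel by returning the chosen level and slot instead of linking the timer. `choose_slot` and `struct slot_choice` are hypothetical names; the constants assume CONFIG_BASE_SMALL is off (TVR_BITS = 8, TVN_BITS = 6).

```c
#include <assert.h>

#define TVN_BITS 6
#define TVR_BITS 8
#define TVN_SIZE (1UL << TVN_BITS)
#define TVR_SIZE (1UL << TVR_BITS)
#define TVN_MASK (TVN_SIZE - 1)
#define TVR_MASK (TVR_SIZE - 1)

/* Where a timer would be queued: level 1..5 (tv1..tv5) and slot index. */
struct slot_choice { int level; unsigned long slot; };

/* Same branches as internal_add_timer, without touching any list. */
struct slot_choice choose_slot(unsigned long timer_jiffies, unsigned long expires)
{
    unsigned long idx = expires - timer_jiffies;
    struct slot_choice c;

    if (idx < TVR_SIZE) {
        c.level = 1; c.slot = expires & TVR_MASK;
    } else if (idx < 1UL << (TVR_BITS + TVN_BITS)) {
        c.level = 2; c.slot = (expires >> TVR_BITS) & TVN_MASK;
    } else if (idx < 1UL << (TVR_BITS + 2 * TVN_BITS)) {
        c.level = 3; c.slot = (expires >> (TVR_BITS + TVN_BITS)) & TVN_MASK;
    } else if (idx < 1UL << (TVR_BITS + 3 * TVN_BITS)) {
        c.level = 4; c.slot = (expires >> (TVR_BITS + 2 * TVN_BITS)) & TVN_MASK;
    } else if ((long)idx < 0) {
        /* expires is in the past: put it in the slot processed next */
        c.level = 1; c.slot = timer_jiffies & TVR_MASK;
    } else {
        if (idx > 0xffffffffUL) {      /* clamp very long timeouts (64-bit) */
            idx = 0xffffffffUL;
            expires = idx + timer_jiffies;
        }
        c.level = 5; c.slot = (expires >> (TVR_BITS + 3 * TVN_BITS)) & TVN_MASK;
    }
    return c;
}
```

The first two cases reproduce the examples worked through in the comments above: slot 3 of tv1 for expires = 259, and slot 2 of tv2 for expires = 512, both with timer_jiffies = 254.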
3 The timer softirq: processing this CPU's timers
As described above, once update_process_times has accounted the running process's time, it calls run_local_timers.
3-1 Raising the timer softirq
/*
* Called by the local, per-CPU timer interrupt on SMP.
*/
void run_local_timers(void)
{
hrtimer_run_queues();
raise_softirq(TIMER_SOFTIRQ);
softlockup_tick();
}
void raise_softirq(unsigned int nr)
{
unsigned long flags;
local_irq_save(flags);
raise_softirq_irqoff(nr);
local_irq_restore(flags);
}
inline void raise_softirq_irqoff(unsigned int nr)
{
__raise_softirq_irqoff(nr);   //set bit nr in this CPU's 32-bit pending mask (local_cpu_data->softirq_pending): that softirq now has work
/*
* If we're in an interrupt or softirq, we're done
* (this also catches softirq-disabled code). We will
* actually run the softirq once we return from
* the irq or softirq.
*
* Otherwise we wake up ksoftirqd to make sure we
* schedule the softirq soon.
*/
if (!in_interrupt())
  wakeup_softirqd(); //not in interrupt context: wake the ksoftirqd kernel thread so the softirq runs soon
}
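raise_softirq itself does very little: it sets one bit in the per-CPU pending mask, and the handler runs later, on interrupt exit or in ksoftirqd. The user-space model below shows that bitmask behaviour; the `toy_` functions are hypothetical, and only the first three softirq numbers (HI_SOFTIRQ = 0, TIMER_SOFTIRQ = 1, NET_TX_SOFTIRQ = 2) follow the kernel's ordering.

```c
#include <assert.h>
#include <stddef.h>

/* First three softirq numbers in kernel order; the rest are not modelled. */
enum { HI_SOFTIRQ = 0, TIMER_SOFTIRQ = 1, NET_TX_SOFTIRQ = 2, TOY_NR_SOFTIRQS = 3 };

typedef void (*toy_softirq_fn)(void);

static unsigned int toy_softirq_pending;                 /* one bit per softirq */
static toy_softirq_fn toy_softirq_vec[TOY_NR_SOFTIRQS];  /* registered handlers */
static int toy_log[8];                                   /* order in which handlers ran */
static int toy_log_len;

void toy_open_softirq(unsigned int nr, toy_softirq_fn fn)
{
    toy_softirq_vec[nr] = fn;
}

/* Like __raise_softirq_irqoff: only sets the pending bit, so raising an
 * already-pending softirq again changes nothing. */
void toy_raise_softirq(unsigned int nr)
{
    toy_softirq_pending |= 1u << nr;
}

/* Like the core loop of __do_softirq: snapshot the mask, clear it, then run
 * handlers from the lowest bit upward. Unlike the kernel, this loops until
 * nothing is pending instead of eventually handing off to ksoftirqd. */
void toy_do_softirq(void)
{
    unsigned int pending;

    while ((pending = toy_softirq_pending) != 0) {
        toy_softirq_pending = 0;
        for (unsigned int nr = 0; pending != 0; nr++, pending >>= 1)
            if ((pending & 1u) && nr < TOY_NR_SOFTIRQS && toy_softirq_vec[nr] != NULL)
                toy_softirq_vec[nr]();
    }
}

/* Two demo handlers that only record that they ran. */
void toy_hi_action(void)    { toy_log[toy_log_len++] = HI_SOFTIRQ; }
void toy_timer_action(void) { toy_log[toy_log_len++] = TIMER_SOFTIRQ; }
```

Raising TIMER_SOFTIRQ twice before the softirq runs still produces a single run of the timer handler, which is why every tick can call raise_softirq unconditionally.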
3-2 Running the timer softirq. The softirq handler associated with the timer interrupt is:
/*
* This function runs timers and the timer-tq in bottom half context.
*/
static void run_timer_softirq(struct softirq_action *h)
{
struct tvec_base *base = __get_cpu_var(tvec_bases);
perf_event_do_pending();
hrtimer_run_pending();
if (time_after_eq(jiffies, base->timer_jiffies))
         __run_timers(base);
}
3-3 Calling the timer functions
/**
* __run_timers - run all expired timers (if any) on this CPU.
* @base: the timer vector to be processed.
*
* This function cascades all vectors and executes all expired timer
* vectors.
*/
static inline void __run_timers (struct tvec_base *base)
{
struct timer_list *timer;
spin_lock_irq(&base->lock);
while (time_after_eq(jiffies, base->timer_jiffies)) {
//while jiffies >= timer_jiffies: each iteration processes one tick and advances timer_jiffies by 1, until it passes jiffies
  struct list_head work_list;
  struct list_head *head = &work_list;
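  int index = base->timer_jiffies & TVR_MASK;    //first-group index of the tick being processed
  /*
   * Cascade timers: the first group is the main focus. Whenever its index wraps to 0 (timer_jiffies % 256 == 0),
   *                        one list of the next group up is redistributed downward, and so on; INDEX(n) is the current index of group n+2
   */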
  if (!index &&
   (!cascade(base, &base->tv2, INDEX(0))) &&
    (!cascade(base, &base->tv3, INDEX(1))) &&
     !cascade(base, &base->tv4, INDEX(2)))
   cascade(base, &base->tv5, INDEX(3));
  ++base->timer_jiffies; //advance timer_jiffies by one tick
  list_replace_init(base->tv1.vec + index, &work_list);
//move this slot's whole list (including any timers just cascaded into it) onto work_list and leave the slot empty
  while (!list_empty(head)) {
   void (*fn)(unsigned long);
   unsigned long data;
   timer = list_first_entry(head, struct timer_list,entry);   //handle each timer in turn
   fn = timer->function;
   data = timer->data;
   timer_stats_account_timer(timer); //timer statistics: record information about the process that armed this timer
   set_running_timer(base, timer);
   detach_timer(timer, 1);        //unlink the timer from the doubly linked list
   spin_unlock_irq(&base->lock);
   {
    int preempt_count = preempt_count();
#ifdef CONFIG_LOCKDEP
    /*
     * It is permissible to free the timer from
     * inside the function that is called from
     * it, this we need to take into account for
     * lockdep too. To avoid bogus "held lock
     * freed" warnings as well as problems when
     * looking into timer->lockdep_map, make a
     * copy and use that here.
     */
    struct lockdep_map lockdep_map =
     timer->lockdep_map;
#endif
    /*
     * Couple the lock chain with the lock chain at
     * del_timer_sync() by acquiring the lock_map
     * around the fn() call here and in
     * del_timer_sync().
     */
    lock_map_acquire(&lockdep_map);
    trace_timer_expire_entry(timer);
    fn(data); //run the callback
    trace_timer_expire_exit(timer);
    lock_map_release(&lockdep_map);
    if (preempt_count != preempt_count()) {
     printk(KERN_ERR "huh, entered %p "
            "with preempt_count %08x, exited"
            " with %08x?\n",
            fn, preempt_count,
            preempt_count());
     BUG();
    }
   }
   spin_lock_irq(&base->lock);
  }
}
set_running_timer(base, NULL);
spin_unlock_irq(&base->lock);
}
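/*
* Purpose: re-insert every timer in slot index of group tv into the timer wheel; because base->timer_jiffies
*          has advanced, the timers land in new positions in lower groups
* base   the timer wheel
* tv     the group being cascaded
* index  the current index of timer_jiffies in this group (may be 0)
* Returns index, i.e. 0 when this group's index has wrapped around, telling the caller to cascade the next group too
*/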
static int cascade(struct tvec_base *base, struct tvec *tv, int index)
{
/* cascade all the timers from tv up one level */
struct timer_list *timer, *tmp;
struct list_head tv_list;
list_replace_init(tv->vec + index, &tv_list);
/*
* We are removing _all_ timers from the list, so we
* don't have to detach them individually.
*/
list_for_each_entry_safe(timer, tmp, &tv_list, entry) {
  BUG_ON(tbase_get_base(timer->base) != base);
  internal_add_timer(base, timer);
}
return index;
}
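To watch cascading end to end, the sketch below shrinks the wheel to two levels (tv1 and tv2, so timers must be less than 2^14 ticks ahead) and replaces callbacks with a recorded firing tick. It follows the same sequence as __run_timers above (cascade when the tv1 index wraps to 0, then take the whole current tv1 slot and advance timer_jiffies), but all `toy_` names are hypothetical, and locking, deferrable timers and tv3..tv5 are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TVR_BITS 8
#define TVN_BITS 6
#define TVR_SIZE (1UL << TVR_BITS)
#define TVN_SIZE (1UL << TVN_BITS)
#define TVR_MASK (TVR_SIZE - 1)
#define TVN_MASK (TVN_SIZE - 1)

/* Singly linked node instead of struct list_head; fired_at instead of a callback. */
struct toy_timer {
    struct toy_timer *next;
    unsigned long expires;
    unsigned long fired_at;        /* tick at which the timer "ran" */
};

/* Only tv1 and tv2: enough for timers less than 2^14 ticks ahead. */
struct toy_base {
    unsigned long timer_jiffies;
    struct toy_timer *tv1[TVR_SIZE];
    struct toy_timer *tv2[TVN_SIZE];
};

/* Append at the tail so each slot stays FIFO, like list_add_tail. */
static void toy_append(struct toy_timer **head, struct toy_timer *t)
{
    while (*head != NULL)
        head = &(*head)->next;
    t->next = NULL;
    *head = t;
}

/* Two-level version of internal_add_timer. */
void toy_add_timer(struct toy_base *b, struct toy_timer *t)
{
    unsigned long idx = t->expires - b->timer_jiffies;

    if ((long)idx < 0)                          /* already due: next tick */
        toy_append(&b->tv1[b->timer_jiffies & TVR_MASK], t);
    else if (idx < TVR_SIZE)
        toy_append(&b->tv1[t->expires & TVR_MASK], t);
    else                                        /* assumes idx < 2^14 */
        toy_append(&b->tv2[(t->expires >> TVR_BITS) & TVN_MASK], t);
}

/* Like cascade(): empty one tv2 slot and re-add its timers, which now
 * land in tv1 because timer_jiffies has caught up with them. */
static void toy_cascade(struct toy_base *b, unsigned long index)
{
    struct toy_timer *t = b->tv2[index];

    b->tv2[index] = NULL;
    while (t != NULL) {
        struct toy_timer *next = t->next;
        toy_add_timer(b, t);
        t = next;
    }
}

/* Like __run_timers: process every tick up to and including now. */
void toy_run_timers(struct toy_base *b, unsigned long now)
{
    while ((long)(now - b->timer_jiffies) >= 0) {
        unsigned long index = b->timer_jiffies & TVR_MASK;
        struct toy_timer *t;

        if (index == 0)                         /* tv1 wrapped: refill from tv2 */
            toy_cascade(b, (b->timer_jiffies >> TVR_BITS) & TVN_MASK);

        t = b->tv1[index];                      /* detach the whole slot */
        b->tv1[index] = NULL;
        while (t != NULL) {
            struct toy_timer *next = t->next;
            t->fired_at = b->timer_jiffies;     /* "run the callback" */
            t = next;
        }
        ++b->timer_jiffies;
    }
}
```

With the example from the internal_add_timer comment (timer_jiffies = 254, expires = 512), the timer waits in tv2 slot 2, is cascaded into tv1 slot 0 when timer_jiffies reaches 512, and fires on that same tick.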

0 0