Linux Interrupt
Source: Internet. Editor: 程序博客网. Time: 2024/05/21 11:34
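In interviews we often ask, or are asked, one question: what are the differences between the interrupt bottom-half mechanisms softirq, tasklet, and workqueue, and why did Linux design all of them? Only a few people can answer it clearly. This article analyzes the differences in detail.
The code analysis in this article is based on Linux kernel 3.18.22 and the arm64 architecture. The best way to learn is still "RTFSC" (read the source code).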
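1. Linux interrupts
Interrupt handling on arm64 follows the same flow as on every other CPU architecture: normal execution is interrupted, the CPU enters the interrupt service routine, saves the context, handles the interrupt, and restores the context. [1]
While an interrupt is being handled, the arm64 CPU's global interrupts are disabled automatically (the interrupt bit in PSTATE is masked). To support nested interrupts, the interrupt service routine has to re-enable interrupts itself. Linux currently does not use nested interrupts. [2]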
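1.1 Enabling and disabling CPU interrupts
On arm64, the local CPU's global interrupts are disabled and enabled through the IRQ mask bit (the I bit) of PSTATE, which software accesses through the DAIF register. SPSR (Saved Program Status Register) only holds the copy of PSTATE that is saved on exception entry. [3]
The arm64 functions that Linux uses to disable and enable local CPU interrupts are implemented as follows.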
- arch/arm64/include/asm/irqflags.h:
- local_irq_disable() -> raw_local_irq_disable() -> arch_local_irq_disable()
- local_irq_enable() -> raw_local_irq_enable() -> arch_local_irq_enable()
static inline void arch_local_irq_enable(void)
{
    asm volatile(
        // (1) Clear the I flag in DAIF (immediate #2) to enable interrupts
        "msr    daifclr, #2     // arch_local_irq_enable"
        :
        :
        : "memory");
}

static inline void arch_local_irq_disable(void)
{
    asm volatile(
        // (2) Set the I flag in DAIF (immediate #2) to disable interrupts
        "msr    daifset, #2     // arch_local_irq_disable"
        :
        :
        : "memory");
}

static inline unsigned long arch_local_irq_save(void)
{
    unsigned long flags;
    asm volatile(
        // (3) Save the DAIF flags, then disable interrupts
        "mrs    %0, daif        // arch_local_irq_save\n"
        "msr    daifset, #2"
        : "=r" (flags)
        :
        : "memory");
    return flags;
}

static inline unsigned long arch_local_save_flags(void)
{
    unsigned long flags;
    asm volatile(
        // (4) Read the current DAIF flags without changing them
        "mrs    %0, daif        // arch_local_save_flags"
        : "=r" (flags)
        :
        : "memory");
    return flags;
}
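The difference between the save/restore pair and the plain disable/enable pair is easy to miss in the assembly. The following is a minimal userspace sketch, not kernel code: a single `bool` stands in for the I bit, and every `model_*` name is hypothetical. It shows why a helper that may be called with interrupts already masked should save and restore the state instead of enabling unconditionally.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the I bit of PSTATE: true means IRQs are masked. */
static bool irq_masked;

static void model_irq_disable(void) { irq_masked = true; }
static void model_irq_enable(void)  { irq_masked = false; }

/* Modeled on arch_local_irq_save(): return the old state, then mask. */
static bool model_irq_save(void)
{
    bool flags = irq_masked;

    irq_masked = true;
    return flags;
}

/* Put back exactly the state that was saved. */
static void model_irq_restore(bool flags) { irq_masked = flags; }

/* Safe in any context: the caller's mask state survives the call. */
static void helper_with_save_restore(void)
{
    bool flags = model_irq_save();
    /* ... critical section ... */
    model_irq_restore(flags);
}

/* Unsafe if the caller had IRQs masked: it unmasks unconditionally. */
static void helper_with_disable_enable(void)
{
    model_irq_disable();
    /* ... critical section ... */
    model_irq_enable();
}
```

If the caller enters `helper_with_disable_enable()` with interrupts masked, they come back unmasked, which is the classic bug that the save/restore variant avoids.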
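1.2 The interrupt controller (GIC)
The previous section covered how a CPU handles its global interrupt mask, but one job remains: delivering external interrupts, internal interrupts, inter-processor interrupts, and so on to multiple CPUs according to priority, affinity, and whether they are private. The component responsible is the interrupt controller, the GIC (Generic Interrupt Controller). [4]
From the software point of view, the GIC consists of two functional blocks: [5]
- Distributor. Connects to every interrupt source in the system. Its registers configure each interrupt's attributes independently: priority, state, security, routing information, and enable status. It determines which interrupts can be forwarded to a CPU core.
- CPU interface. The CPU core receives interrupts through it. Its registers mainly provide a way to mask, identify, and control the states of interrupts forwarded to that core. Each CPU core has its own CPU interface.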
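The GIC divides interrupts into the following types: [6]
- SGI (Software Generated Interrupt), interrupt IDs 0-15. Usually used to implement IPIs.
- PPI (Private Peripheral Interrupt), interrupt IDs 16-31. Private interrupts: each CPU has its own independent copy, for example the per-core timer interrupt.
- SPI (Shared Peripheral Interrupt), interrupt IDs 32-1019 (IDs 1020-1023 are reserved for special purposes). The most commonly used peripheral interrupts; they can be routed to one or more CPUs.
- LPI (Locality-specific Peripheral Interrupt). Message-based interrupts; not supported in GICv1 and GICv2.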
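The ID ranges in the list above can be expressed as a small classifier. This is an illustrative sketch only: the enum and function names are made up for this example, the SPI range is taken as 32-1019 with 1020-1023 treated as special IDs per the GIC architecture specification, and LPIs are assumed to start at ID 8192 as in GICv3.

```c
#include <assert.h>

/* Hypothetical names; they do not come from the kernel or the GIC manuals. */
enum gic_int_type {
    GIC_SGI,      /* software generated, used for IPIs */
    GIC_PPI,      /* private to one CPU */
    GIC_SPI,      /* shared peripheral interrupt */
    GIC_SPECIAL,  /* 1020-1023, e.g. the spurious interrupt ID */
    GIC_LPI,      /* message-based, GICv3 and later */
    GIC_OTHER     /* reserved or extended ranges, not modeled here */
};

static enum gic_int_type gic_classify(unsigned int intid)
{
    if (intid <= 15)
        return GIC_SGI;
    if (intid <= 31)
        return GIC_PPI;
    if (intid <= 1019)
        return GIC_SPI;
    if (intid <= 1023)
        return GIC_SPECIAL;
    if (intid >= 8192)
        return GIC_LPI;
    return GIC_OTHER;
}
```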
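The GIC is not hard to understand in principle, but once technical details such as cascading are involved, the whole initialization process becomes fairly complex. You can download the GIC manuals (GIC-400, GIC-500) and study them yourself; "GIC代码分析" is also a good analysis article.
All operations for a given GIC are collected in an irq_chip structure. Taking GIC-400 as an example, its operations are: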
- drivers/irqchip/irq-gic.c:
static struct irq_chip gic_chip = {
    .name             = "GIC",
    .irq_mask         = gic_mask_irq,
    .irq_unmask       = gic_unmask_irq,
    .irq_eoi          = gic_eoi_irq,
    .irq_set_type     = gic_set_type,
    .irq_retrigger    = gic_retrigger,
#ifdef CONFIG_SMP
    .irq_set_affinity = gic_set_affinity,
#endif
    .irq_set_wake     = gic_set_wake,
};
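1.3 The Linux interrupt handling flow
Judging from the code, Linux handles an interrupt roughly as follows. For each GIC interrupt source, Linux allocates a corresponding irq_desc structure. irq_desc holds two interrupt handlers, desc->handle_irq() and desc->action->handler(), which represent the two levels of interrupt handling: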
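desc->handle_irq(). The first-level handler. The system assigns it at initialization time according to the characteristics of the interrupt source. Different types of interrupt sources need different GIC operations, and these common GIC operations are factored out into the first-level handlers. The implementations include:
handle_fasteoi_irq();
handle_simple_irq();
handle_edge_irq();
handle_level_irq();
handle_percpu_irq();
handle_percpu_devid_irq();
desc->action->handler(). The second-level handler, registered by the user to implement the driver service routine of a specific device; it contains no GIC-specific code. Because several devices can share one interrupt source, one desc can carry multiple actions, organized as a linked list.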
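The two-level structure described above can be modeled in a few lines of userspace C. This is a simplified sketch under stated assumptions: `model_desc`, `model_action`, and the other `model_*` names are hypothetical, return values 0 and 1 stand in for IRQ_NONE and IRQ_HANDLED, and a counter stands in for the controller operation (such as an EOI) that a real flow handler performs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical mirrors of irqaction and irq_desc. */
struct model_action {
    int (*handler)(int irq, void *dev_id);  /* second level, per device */
    void *dev_id;
    struct model_action *next;              /* shared line: chain of actions */
};

struct model_desc {
    void (*handle_irq)(int irq, struct model_desc *desc);  /* first level */
    struct model_action *action;
    int eoi_count;                          /* counts controller operations */
};

/* Walk the action chain; returns how many handlers claimed the interrupt. */
static int model_handle_event(int irq, struct model_desc *desc)
{
    int handled = 0;

    for (struct model_action *a = desc->action; a; a = a->next)
        handled += a->handler(irq, a->dev_id);
    return handled;
}

/* First-level flow handler: device handlers first, controller work after. */
static void model_flow_handler(int irq, struct model_desc *desc)
{
    model_handle_event(irq, desc);
    desc->eoi_count++;                      /* stands in for chip->irq_eoi() */
}

/* Device handler: claims the interrupt only if its device has work pending. */
static int model_dev_handler(int irq, void *dev_id)
{
    int *pending = dev_id;

    (void)irq;
    if (!*pending)
        return 0;                           /* like IRQ_NONE */
    *pending = 0;
    return 1;                               /* like IRQ_HANDLED */
}
```

On a shared line every action is called, and each handler decides from its own device state whether the interrupt was meant for it, which is why a shared handler needs a dev_id to tell the devices apart.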
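1.4 Registering an interrupt service routine
The two-level structure in the previous section shows that the second-level handler desc->action->handler is registered by the user. The registration process is analyzed below: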
- kernel/irq/manage.c:
- request_irq() -> request_threaded_irq() -> __setup_irq()
static inline int __must_check
request_irq(unsigned int irq, irq_handler_t handler, unsigned long flags,
            const char *name, void *dev)
{
    return request_threaded_irq(irq, handler, NULL, flags, name, dev);
}

| →
int request_threaded_irq(unsigned int irq, irq_handler_t handler,
                         irq_handler_t thread_fn, unsigned long irqflags,
                         const char *devname, void *dev_id)
{
    struct irqaction *action;
    struct irq_desc *desc;
    int retval;

    /*
     * Sanity-check: shared interrupts must pass in a real dev-ID,
     * otherwise we'll have trouble later trying to figure out
     * which interrupt is which (messes up the interrupt freeing
     * logic etc).
     */
    if ((irqflags & IRQF_SHARED) && !dev_id)
        return -EINVAL;

    // (1) Look up the desc that corresponds to the IRQ number
    desc = irq_to_desc(irq);
    if (!desc)
        return -EINVAL;

    if (!irq_settings_can_request(desc) ||
        WARN_ON(irq_settings_is_per_cpu_devid(desc)))
        return -EINVAL;

    // (2) If handler is NULL, the user wants a threaded interrupt.
    //     Initialize action->handler to irq_default_primary_handler(),
    //     which is very simple: it only returns IRQ_WAKE_THREAD
    if (!handler) {
        if (!thread_fn)
            return -EINVAL;
        handler = irq_default_primary_handler;
    }

    // (3) Allocate a new action structure
    action = kzalloc(sizeof(struct irqaction), GFP_KERNEL);
    if (!action)
        return -ENOMEM;

    action->handler = handler;
    action->thread_fn = thread_fn;
    action->flags = irqflags;
    action->name = devname;
    action->dev_id = dev_id;

    chip_bus_lock(desc);
    // (4) Install the new action into desc
    retval = __setup_irq(irq, desc, action);
    chip_bus_sync_unlock(desc);

    if (retval)
        kfree(action);

#ifdef CONFIG_DEBUG_SHIRQ_FIXME
    if (!retval && (irqflags & IRQF_SHARED)) {
        /*
         * It's a shared IRQ -- the driver ought to be prepared for it
         * to happen immediately, so let's make sure....
         * We disable the irq to make sure that a 'real' IRQ doesn't
         * run in parallel with our fake.
         */
        unsigned long flags;

        disable_irq(irq);
        local_irq_save(flags);

        handler(irq, dev_id);

        local_irq_restore(flags);
        enable_irq(irq);
    }
#endif
    return retval;
}

|| →
static int
__setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
{
    struct irqaction *old, **old_ptr;
    unsigned long flags, thread_mask = 0;
    int ret, nested, shared = 0;
    cpumask_var_t mask;

    if (!desc)
        return -EINVAL;

    if (desc->irq_data.chip == &no_irq_chip)
        return -ENOSYS;
    if (!try_module_get(desc->owner))
        return -ENODEV;

    /*
     * Check whether the interrupt nests into another interrupt
     * thread.
     */
    nested = irq_settings_is_nested_thread(desc);
    // (4.1) Check whether the interrupt is a nested (threaded) interrupt
    if (nested) {
        if (!new->thread_fn) {
            ret = -EINVAL;
            goto out_mput;
        }
        /*
         * Replace the primary handler which was provided from
         * the driver for non nested interrupt handling by the
         * dummy function which warns when called.
         */
        new->handler = irq_nested_primary_handler;
    } else {
        // (4.2) Check whether the interrupt can be threaded.
        //       If _IRQ_NOTHREAD is not set and forced threading is
        //       enabled (force_irqthreads=1), force threading:
        //       new->thread_fn = new->handler;
        //       new->handler = irq_default_primary_handler;
        if (irq_settings_can_thread(desc))
            irq_setup_forced_threading(new);
    }

    /*
     * Create a handler thread when a thread function is supplied
     * and the interrupt does not nest into another interrupt
     * thread.
     */
    // (4.3) For a threaded interrupt, create the corresponding thread
    if (new->thread_fn && !nested) {
        struct task_struct *t;
        static const struct sched_param param = {
            .sched_priority = MAX_USER_RT_PRIO/2,
        };

        // Create the thread
        t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
                           new->name);
        if (IS_ERR(t)) {
            ret = PTR_ERR(t);
            goto out_mput;
        }

        sched_setscheduler_nocheck(t, SCHED_FIFO, &param);

        /*
         * We keep the reference to the task struct even if
         * the thread dies to avoid that the interrupt code
         * references an already freed task_struct.
         */
        get_task_struct(t);
        // Store it in the ->thread member
        new->thread = t;
        /*
         * Tell the thread to set its affinity. This is
         * important for shared interrupt handlers as we do
         * not invoke setup_affinity() for the secondary
         * handlers as everything is already set up. Even for
         * interrupts marked with IRQF_NO_BALANCE this is
         * correct as we want the thread to move to the cpu(s)
         * on which the requesting code placed the interrupt.
         */
        set_bit(IRQTF_AFFINITY, &new->thread_flags);
    }

    if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
        ret = -ENOMEM;
        goto out_thread;
    }

    /*
     * Drivers are often written to work w/o knowledge about the
     * underlying irq chip implementation, so a request for a
     * threaded irq without a primary hard irq context handler
     * requires the ONESHOT flag to be set. Some irq chips like
     * MSI based interrupts are per se one shot safe. Check the
     * chip flags, so we can avoid the unmask dance at the end of
     * the threaded handler for those.
     */
    if (desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)
        new->flags &= ~IRQF_ONESHOT;

    /*
     * The following block of code has to be executed atomically
     */
    // (4.4) Find the last action in the chain
    raw_spin_lock_irqsave(&desc->lock, flags);
    old_ptr = &desc->action;
    old = *old_ptr;
    if (old) {
        /*
         * Can't share interrupts unless both agree to and are
         * the same type (level, edge, polarity). So both flag
         * fields must have IRQF_SHARED set and the bits which
         * set the trigger type must match. Also all must
         * agree on ONESHOT.
         */
        if (!((old->flags & new->flags) & IRQF_SHARED) ||
            ((old->flags ^ new->flags) & IRQF_TRIGGER_MASK) ||
            ((old->flags ^ new->flags) & IRQF_ONESHOT))
            goto mismatch;

        /* All handlers must agree on per-cpuness */
        if ((old->flags & IRQF_PERCPU) !=
            (new->flags & IRQF_PERCPU))
            goto mismatch;

        /* add new interrupt at end of irq queue */
        do {
            /*
             * Or all existing action->thread_mask bits,
             * so we can find the next zero bit for this
             * new action.
             */
            thread_mask |= old->thread_mask;
            old_ptr = &old->next;
            old = *old_ptr;
        } while (old);
        // With more than one action, set the shared flag to 1
        shared = 1;
    }

    /*
     * Setup the thread mask for this irqaction for ONESHOT. For
     * !ONESHOT irqs the thread mask is 0 so we can avoid a
     * conditional in irq_wake_thread().
     */
    if (new->flags & IRQF_ONESHOT) {
        /*
         * Unlikely to have 32 resp 64 irqs sharing one line,
         * but who knows.
         */
        if (thread_mask == ~0UL) {
            ret = -EBUSY;
            goto out_mask;
        }
        /*
         * The thread_mask for the action is or'ed to
         * desc->thread_active to indicate that the
         * IRQF_ONESHOT thread handler has been woken, but not
         * yet finished. The bit is cleared when a thread
         * completes. When all threads of a shared interrupt
         * line have completed desc->threads_active becomes
         * zero and the interrupt line is unmasked. See
         * handle.c:irq_wake_thread() for further information.
         *
         * If no thread is woken by primary (hard irq context)
         * interrupt handlers, then desc->threads_active is
         * also checked for zero to unmask the irq line in the
         * affected hard irq flow handlers
         * (handle_[fasteoi|level]_irq).
         *
         * The new action gets the first zero bit of
         * thread_mask assigned. See the loop above which or's
         * all existing action->thread_mask bits.
         */
        new->thread_mask = 1 << ffz(thread_mask);

    } else if (new->handler == irq_default_primary_handler &&
               !(desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)) {
        /*
         * The interrupt was requested with handler = NULL, so
         * we use the default primary handler for it. But it
         * does not have the oneshot flag set. In combination
         * with level interrupts this is deadly, because the
         * default primary handler just wakes the thread, then
         * the irq lines is reenabled, but the device still
         * has the level irq asserted. Rinse and repeat....
         *
         * While this works for edge type interrupts, we play
         * it safe and reject unconditionally because we can't
         * say for sure which type this interrupt really
         * has. The type flags are unreliable as the
         * underlying chip implementation can override them.
         */
        pr_err("Threaded irq requested with handler=NULL and !ONESHOT for irq %d\n",
               irq);
        ret = -EINVAL;
        goto out_mask;
    }

    // (4.5) For the first action, do some initialization
    if (!shared) {
        ret = irq_request_resources(desc);
        if (ret) {
            pr_err("Failed to request resources for %s (irq %d) on irqchip %s\n",
                   new->name, irq, desc->irq_data.chip->name);
            goto out_mask;
        }

        init_waitqueue_head(&desc->wait_for_threads);

        /* Setup the type (level, edge polarity) if configured: */
        if (new->flags & IRQF_TRIGGER_MASK) {
            ret = __irq_set_trigger(desc, irq,
                                    new->flags & IRQF_TRIGGER_MASK);

            if (ret)
                goto out_mask;
        }

        desc->istate &= ~(IRQS_AUTODETECT | IRQS_SPURIOUS_DISABLED | \
                          IRQS_ONESHOT | IRQS_WAITING);
        irqd_clear(&desc->irq_data, IRQD_IRQ_INPROGRESS);

        if (new->flags & IRQF_PERCPU) {
            irqd_set(&desc->irq_data, IRQD_PER_CPU);
            irq_settings_set_per_cpu(desc);
        }

        if (new->flags & IRQF_ONESHOT)
            desc->istate |= IRQS_ONESHOT;

        if (irq_settings_can_autoenable(desc))
            irq_startup(desc, true);
        else
            /* Undo nested disables: */
            desc->depth = 1;

        /* Exclude IRQ from balancing if requested */
        if (new->flags & IRQF_NOBALANCING) {
            irq_settings_set_no_balancing(desc);
            irqd_set(&desc->irq_data, IRQD_NO_BALANCING);
        }

        // Set the interrupt affinity
        /* Set default affinity mask once everything is setup */
        setup_affinity(irq, desc, mask);

    } else if (new->flags & IRQF_TRIGGER_MASK) {
        unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;
        unsigned int omsk = irq_settings_get_trigger_mask(desc);

        if (nmsk != omsk)
            /* hope the handler works with current trigger mode */
            pr_warning("irq %d uses trigger mode %u; requested %u\n",
                       irq, nmsk, omsk);
    }

    // (4.6) Insert the new action into the desc list
    new->irq = irq;
    *old_ptr = new;

    irq_pm_install_action(desc, new);

    /* Reset broken irq detection when installing new handler */
    desc->irq_count = 0;
    desc->irqs_unhandled = 0;

    /*
     * Check whether we disabled the irq via the spurious handler
     * before. Reenable it and give it another chance.
     */
    // (4.7) If the interrupt was previously disabled as spurious, re-enable it
    if (shared && (desc->istate & IRQS_SPURIOUS_DISABLED)) {
        desc->istate &= ~IRQS_SPURIOUS_DISABLED;
        __enable_irq(desc, irq);
    }

    raw_spin_unlock_irqrestore(&desc->lock, flags);

    /*
     * Strictly no need to wake it up, but hung_task complains
     * when no hard interrupt wakes the thread up.
     */
    // (4.8) Wake up the thread that belongs to the threaded interrupt
    if (new->thread)
        wake_up_process(new->thread);

    register_irq_proc(irq, desc);
    new->dir = NULL;
    register_handler_proc(irq, new);
    free_cpumask_var(mask);

    return 0;

mismatch:
    if (!(new->flags & IRQF_PROBE_SHARED)) {
        pr_err("Flags mismatch irq %d. %08x (%s) vs. %08x (%s)\n",
               irq, new->flags, new->name, old->flags, old->name);
#ifdef CONFIG_DEBUG_SHIRQ
        dump_stack();
#endif
    }
    ret = -EBUSY;

out_mask:
    raw_spin_unlock_irqrestore(&desc->lock, flags);
    free_cpumask_var(mask);

out_thread:
    if (new->thread) {
        struct task_struct *t = new->thread;

        new->thread = NULL;
        kthread_stop(t);
        put_task_struct(t);
    }
out_mput:
    module_put(desc->owner);
    return ret;
}
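Two pieces of logic in __setup_irq() are worth isolating: the flag compatibility check for shared lines in step (4.4), and the allocation of a thread_mask bit for ONESHOT actions. The sketch below models both in userspace C. The `M_*` flag values and all `model_*` names are assumptions made for this example, and `model_ffz()` is a plain loop standing in for the kernel's ffz().

```c
#include <assert.h>

/* Flag values chosen for this model; treat them as assumptions. */
#define M_TRIGGER_MASK 0x0000000fUL
#define M_SHARED       0x00000080UL
#define M_PERCPU       0x00000400UL
#define M_ONESHOT      0x00002000UL

/* Returns 1 when a new action may join a line already held by old_flags. */
static int model_can_share(unsigned long old_flags, unsigned long new_flags)
{
    if (!((old_flags & new_flags) & M_SHARED))
        return 0;                     /* both sides must agree to share */
    if ((old_flags ^ new_flags) & M_TRIGGER_MASK)
        return 0;                     /* trigger type must match */
    if ((old_flags ^ new_flags) & M_ONESHOT)
        return 0;                     /* all must agree on ONESHOT */
    if ((old_flags & M_PERCPU) != (new_flags & M_PERCPU))
        return 0;                     /* all must agree on per-cpuness */
    return 1;
}

/* Index of the first zero bit; the caller guarantees that one exists. */
static unsigned int model_ffz(unsigned long word)
{
    unsigned int bit = 0;

    while (word & 1UL) {
        word >>= 1;
        bit++;
    }
    return bit;
}

/* Pick the thread_mask bit for a new ONESHOT action; 0 means no bit is free. */
static unsigned long model_alloc_thread_mask(unsigned long used)
{
    if (used == ~0UL)
        return 0;
    return 1UL << model_ffz(used);
}
```

Each ONESHOT action on a shared line owns one bit, so the line can be unmasked only after every woken thread has cleared its own bit.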
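1.5 Threaded interrupts
As the previous section shows, request_irq() registers a traditional interrupt, while calling request_threaded_irq() directly registers a threaded interrupt. The main purpose of threading is to move work out of interrupt context into a thread, which shortens the time the system runs with interrupts disabled and improves real-time behavior.
The thread that belongs to an interrupt is named according to this rule: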
t = kthread_create(irq_thread, new, "irq/%d-%s", irq, new->name);
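The ps command lists the interrupt threads in the system. Note that they are real-time threads (SCHED_FIFO):
root@:/ # ps | grep "irq/"
root 171 2 0 0 irq_thread 0000000000 S irq/389-charger
root 239 2 0 0 irq_thread 0000000000 S irq/296-PS_int-
root 247 2 0 0 irq_thread 0000000000 S irq/297-1124000
root 1415 2 0 0 irq_thread 0000000000 S irq/293-goodix_
root@a0255:/ #
In the creation and handling flow of a threaded interrupt, threads and actions correspond one-to-one: each interrupt handler that the user registers gets its own interrupt thread.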
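1.6 Enabling and disabling peripheral interrupts
The earlier sections covered enabling and disabling the local CPU's global interrupts. To enable or disable a single interrupt source, use enable_irq()/disable_irq(). These end up calling mainly the GIC chip functions: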
- kernel/irq/manage.c:
- enable_irq() -> __enable_irq() -> irq_enable()
void enable_irq(unsigned int irq)
{
    unsigned long flags;
    struct irq_desc *desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);

    if (!desc)
        return;
    if (WARN(!desc->irq_data.chip,
             KERN_ERR "enable_irq before setup/request_irq: irq %u\n", irq))
        goto out;

    __enable_irq(desc, irq);
out:
    irq_put_desc_busunlock(desc, flags);
}

| →
void __enable_irq(struct irq_desc *desc, unsigned int irq)
{
    switch (desc->depth) {
    case 0:
 err_out:
        WARN(1, KERN_WARNING "Unbalanced enable for IRQ %d\n", irq);
        break;
    case 1: {
        if (desc->istate & IRQS_SUSPENDED)
            goto err_out;
        /* Prevent probing on this irq: */
        irq_settings_set_noprobe(desc);
        irq_enable(desc);
        check_irq_resend(desc, irq);
        /* fall-through */
    }
    default:
        desc->depth--;
    }
}

|| →
void irq_enable(struct irq_desc *desc)
{
    // Call the corresponding GIC chip function
    irq_state_clr_disabled(desc);
    if (desc->irq_data.chip->irq_enable)
        desc->irq_data.chip->irq_enable(&desc->irq_data);
    else
        desc->irq_data.chip->irq_unmask(&desc->irq_data);
    irq_state_clr_masked(desc);
}
- kernel/irq/manage.c:
- disable_irq() -> __disable_irq_nosync() -> __disable_irq() -> irq_disable()
void disable_irq(unsigned int irq)
{
    if (!__disable_irq_nosync(irq))
        synchronize_irq(irq);
}

| →
static int __disable_irq_nosync(unsigned int irq)
{
    unsigned long flags;
    struct irq_desc *desc = irq_get_desc_buslock(irq, &flags, IRQ_GET_DESC_CHECK_GLOBAL);

    if (!desc)
        return -EINVAL;
    __disable_irq(desc, irq);
    irq_put_desc_busunlock(desc, flags);
    return 0;
}

|| →
void __disable_irq(struct irq_desc *desc, unsigned int irq)
{
    if (!desc->depth++)
        irq_disable(desc);
}

||| →
void irq_disable(struct irq_desc *desc)
{
    // Call the corresponding GIC chip function
    irq_state_set_disabled(desc);
    if (desc->irq_data.chip->irq_disable) {
        desc->irq_data.chip->irq_disable(&desc->irq_data);
        irq_state_set_masked(desc);
    }
}

| →
void synchronize_irq(unsigned int irq)
{
    struct irq_desc *desc = irq_to_desc(irq);

    if (desc) {
        __synchronize_hardirq(desc);
        /*
         * We made sure that no hardirq handler is
         * running. Now verify that no threaded handlers are
         * active.
         */
        // For a threaded interrupt, wait until the thread has finished
        wait_event(desc->wait_for_threads,
                   !atomic_read(&desc->threads_active));
    }
}
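The depth counter in __disable_irq() and __enable_irq() makes the two calls nest: only the first disable masks the line and only the matching last enable unmasks it. A minimal userspace sketch of that counting follows; `struct model_irq` and its fields are hypothetical, and the `unbalanced` counter stands in for the kernel's "Unbalanced enable" warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-IRQ state touched by enable/disable. */
struct model_irq {
    unsigned int depth;   /* nesting level of disables; 0 means enabled */
    bool masked;          /* stand-in for the controller-side mask */
    int unbalanced;       /* counts enable calls without a matching disable */
};

static void model_disable_irq(struct model_irq *d)
{
    if (!d->depth++)      /* only the first disable touches the hardware */
        d->masked = true;
}

static void model_enable_irq(struct model_irq *d)
{
    switch (d->depth) {
    case 0:
        d->unbalanced++;  /* the kernel prints a warning here */
        break;
    case 1:
        d->masked = false; /* last enable: unmask the line */
        /* fall through */
    default:
        d->depth--;
    }
}
```

A driver that calls disable_irq() twice must therefore call enable_irq() twice before the interrupt is delivered again.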
1.7 Interrupt affinity
Also based on capabilities that the GIC chip provides, we can configure the affinity of an interrupt source to CPUs.
- kernel/irq/manage.c:
- irq_set_affinity() -> __irq_set_affinity() -> irq_set_affinity_locked() -> irq_do_set_affinity()
static inline int
irq_set_affinity(unsigned int irq, const struct cpumask *cpumask)
{
    return __irq_set_affinity(irq, cpumask, false);
}

| →
int __irq_set_affinity(unsigned int irq, const struct cpumask *mask, bool force)
{
    struct irq_desc *desc = irq_to_desc(irq);
    unsigned long flags;
    int ret;

    if (!desc)
        return -EINVAL;

    raw_spin_lock_irqsave(&desc->lock, flags);
    ret = irq_set_affinity_locked(irq_desc_get_irq_data(desc), mask, force);
    raw_spin_unlock_irqrestore(&desc->lock, flags);
    return ret;
}

|| →
int irq_set_affinity_locked(struct irq_data *data, const struct cpumask *mask,
                            bool force)
{
    struct irq_chip *chip = irq_data_get_irq_chip(data);
    struct irq_desc *desc = irq_data_to_desc(data);
    int ret = 0;

    if (!chip || !chip->irq_set_affinity)
        return -EINVAL;

    if (irq_can_move_pcntxt(data)) {
        ret = irq_do_set_affinity(data, mask, force);
    } else {
        irqd_set_move_pending(data);
        irq_copy_pending(desc, mask);
    }

    if (desc->affinity_notify) {
        kref_get(&desc->affinity_notify->kref);
        schedule_work(&desc->affinity_notify->work);
    }
    irqd_set(data, IRQD_AFFINITY_SET);

    return ret;
}

||| →
int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
                        bool force)
{
    struct irq_desc *desc = irq_data_to_desc(data);
    struct irq_chip *chip = irq_data_get_irq_chip(data);
    int ret;

    // Call the corresponding GIC chip function
    ret = chip->irq_set_affinity(data, mask, force);
    switch (ret) {
    case IRQ_SET_MASK_OK:
    case IRQ_SET_MASK_OK_DONE:
#ifdef CONFIG_MTK_IRQ_NEW_DESIGN
        update_affinity_settings(desc, mask, true);
#else
        cpumask_copy(data->affinity, mask);
#endif
    case IRQ_SET_MASK_OK_NOCOPY:
        irq_set_thread_affinity(desc);
        ret = 0;
    }

    return ret;
}
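2. Linux interrupt bottom halves
Next come the famous interrupt bottom halves: softirq, tasklet, and workqueue. Their main purpose is to reduce the time the system spends with interrupts disabled: only a small amount of critical code runs in the interrupt itself, and most processing is moved to a context that does not need interrupts disabled.
Threaded interrupts, described above, are the most aggressive approach, but most of the time the top-half/bottom-half split is still needed.
workqueue is analyzed in detail in another article; this article covers only softirq and tasklet.
2.1 preempt_count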
static __always_inline int preempt_count(void)
{
    return current_thread_info()->preempt_count;    /* 0 => preemptable, <0 => bug */
}
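Before starting, some background on preempt_count: it is a field in the thread_info structure that indicates whether the current process can be preempted.
Preemption means that a process running in kernel space, if it does not give up the CPU voluntarily, is forcibly deprived of the CPU when its time slice runs out or a higher-priority task becomes ready.
However, the process may be in the middle of a critical operation that must not be preempted, because preempting it would break the system. Linux therefore provides the preempt_count field: when it is 0 the process can be preempted, and when it is greater than 0 it cannot.
When an interrupt returns to kernel mode, the kernel checks whether preemption is allowed: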
- arch/arm64/kernel/entry.S:
- el1_irq() -> el1_preempt() -> preempt_schedule_irq()
    .align  6
el1_irq:
    kernel_entry 1
    enable_dbg
#ifdef CONFIG_TRACE_IRQFLAGS
    bl      trace_hardirqs_off
#endif
#ifdef CONFIG_MTPROF
    bl      MT_trace_hardirqs_off
#endif

    irq_handler

#ifdef CONFIG_PREEMPT
    get_thread_info tsk
    ldr     w24, [tsk, #TI_PREEMPT]         // get preempt count
    // (1) If preempt_count != 0, skip the preemption check
    cbnz    w24, 1f                         // preempt count != 0
    ldr     x0, [tsk, #TI_FLAGS]            // get flags
    // (2) If preempt_count == 0 and TIF_NEED_RESCHED is set,
    //     reschedule
    tbz     x0, #TIF_NEED_RESCHED, 1f       // needs rescheduling?
    bl      el1_preempt
1:
#endif
#ifdef CONFIG_MTPROF
    bl      MT_trace_hardirqs_on
#endif
#ifdef CONFIG_TRACE_IRQFLAGS
    bl      trace_hardirqs_on
#endif
    kernel_exit 1
ENDPROC(el1_irq)

#ifdef CONFIG_PREEMPT
el1_preempt:
    mov     x24, lr
    // (3) Preemptive scheduling
1:  bl      preempt_schedule_irq            // irq en/disable is done inside
    ldr     x0, [tsk, #TI_FLAGS]            // get new tasks TI_FLAGS
    tbnz    x0, #TIF_NEED_RESCHED, 1b       // needs rescheduling?
    ret     x24
#endif

| →
asmlinkage __visible void __sched preempt_schedule_irq(void)
{
    enum ctx_state prev_state;

    /* Catch callers which need to be fixed */
    BUG_ON(preempt_count() || !irqs_disabled());

    prev_state = exception_enter();

    do {
        __preempt_count_add(PREEMPT_ACTIVE);
        local_irq_enable();
        __schedule();
        local_irq_disable();
        __preempt_count_sub(PREEMPT_ACTIVE);

        /*
         * Check again in case we missed a preemption opportunity
         * between schedule and now.
         */
        barrier();
    } while (need_resched());

    exception_exit(prev_state);
}
Although any preempt_count greater than 0 disables preemption, Linux further partitions the bits of preempt_count by scenario:
/*
 *         PREEMPT_MASK:    0x000000ff
 *         SOFTIRQ_MASK:    0x0000ff00
 *         HARDIRQ_MASK:    0x000f0000
 *             NMI_MASK:    0x00100000
 *       PREEMPT_ACTIVE:    0x00200000
 */
#define PREEMPT_BITS    8
#define SOFTIRQ_BITS    8
#define HARDIRQ_BITS    4
#define NMI_BITS        1
Each scenario uses its own bits to disable and enable preemption:
- Normal context (PREEMPT_MASK): preempt_disable() and preempt_enable().
- Softirq context (SOFTIRQ_MASK): local_bh_disable() and local_bh_enable().
- Hard interrupt context (HARDIRQ_MASK): __irq_enter() and __irq_exit().
- NMI context (NMI_MASK): nmi_enter() and nmi_exit().
Conversely, the value of preempt_count also tells us which context we are currently in:
#define in_irq()                (hardirq_count())
#define in_softirq()            (softirq_count())
#define in_interrupt()          (irq_count())
#define in_serving_softirq()    (softirq_count() & SOFTIRQ_OFFSET)
#define in_nmi()                (preempt_count() & NMI_MASK)

#define hardirq_count() (preempt_count() & HARDIRQ_MASK)
#define softirq_count() (preempt_count() & SOFTIRQ_MASK)
#define irq_count()     (preempt_count() & (HARDIRQ_MASK | SOFTIRQ_MASK \
                                 | NMI_MASK))
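To see how the bit fields and the context tests fit together, here is a small userspace sketch. The bit widths come from the listing above; the shift, mask, and offset macros are reconstructed from those widths, `SOFTIRQ_DISABLE_OFFSET` as twice `SOFTIRQ_OFFSET` is stated from memory of the kernel headers and should be treated as an assumption, and the `model_*` functions and the global counter are hypothetical stand-ins for the kernel macros and thread_info->preempt_count.

```c
#include <assert.h>

#define PREEMPT_BITS  8
#define SOFTIRQ_BITS  8
#define HARDIRQ_BITS  4
#define NMI_BITS      1

/* Each field starts where the previous one ends. */
#define PREEMPT_SHIFT 0
#define SOFTIRQ_SHIFT (PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT (SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT     (HARDIRQ_SHIFT + HARDIRQ_BITS)

#define FIELD_MASK(bits, shift) (((1UL << (bits)) - 1) << (shift))
#define PREEMPT_MASK  FIELD_MASK(PREEMPT_BITS, PREEMPT_SHIFT)
#define SOFTIRQ_MASK  FIELD_MASK(SOFTIRQ_BITS, SOFTIRQ_SHIFT)
#define HARDIRQ_MASK  FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK      FIELD_MASK(NMI_BITS, NMI_SHIFT)

#define SOFTIRQ_OFFSET          (1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET          (1UL << HARDIRQ_SHIFT)
/* Assumption: bh-disable adds twice the offset used while serving a softirq. */
#define SOFTIRQ_DISABLE_OFFSET  (2 * SOFTIRQ_OFFSET)

/* Hypothetical stand-in for thread_info->preempt_count. */
static unsigned long model_count;

static int model_in_irq(void)             { return (model_count & HARDIRQ_MASK) != 0; }
static int model_in_softirq(void)         { return (model_count & SOFTIRQ_MASK) != 0; }
static int model_in_serving_softirq(void) { return (model_count & SOFTIRQ_OFFSET) != 0; }
static int model_preemptible(void)        { return model_count == 0; }
```

With this layout, a section that has only disabled bottom halves reports in_softirq() but not in_serving_softirq(), because the lowest softirq bit is set only while a softirq handler is actually running.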
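2.2 softirq
Back to the top-half/bottom-half architecture: Linux moves most work out of interrupt context so that it runs with interrupts enabled, but it also wants that work to run soon. To make sure it runs quickly, softirq uses __local_bh_disable_ip() to disable preemption.
softirq is implemented as follows:
- The system supports a fixed set of softirqs; the softirq_vec array records their handler functions: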
enum
{
    HI_SOFTIRQ=0,
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    BLOCK_SOFTIRQ,
    BLOCK_IOPOLL_SOFTIRQ,
    TASKLET_SOFTIRQ,
    SCHED_SOFTIRQ,
    HRTIMER_SOFTIRQ,
    RCU_SOFTIRQ,    /* Preferable RCU should always be the last softirq */

    NR_SOFTIRQS
};

// Register the handler of a softirq
void open_softirq(int nr, void (*action)(struct softirq_action *))
{
    softirq_vec[nr].action = action;
}

// The two softirqs TASKLET_SOFTIRQ and HI_SOFTIRQ serve tasklets.
    open_softirq(TASKLET_SOFTIRQ, tasklet_action);
    open_softirq(HI_SOFTIRQ, tasklet_hi_action);
- irq_stat[cpu].__softirq_pending records the pending state of every softirq on each CPU; raise_softirq() sets a softirq's pending bit:
void raise_softirq(unsigned int nr)
{
    unsigned long flags;

    local_irq_save(flags);
    raise_softirq_irqoff(nr);
    local_irq_restore(flags);
}

| →
inline void raise_softirq_irqoff(unsigned int nr)
{
    __raise_softirq_irqoff(nr);

    if (!in_interrupt())
        wakeup_softirqd();
}

|| →
void __raise_softirq_irqoff(unsigned int nr)
{
    trace_softirq_raise(nr);
    or_softirq_pending(1UL << nr);
}

||| →
#define or_softirq_pending(x)  (local_softirq_pending() |= (x))

#ifndef __ARCH_IRQ_STAT
extern irq_cpustat_t irq_stat[];        /* defined in asm/hardirq.h */
#define __IRQ_STAT(cpu, member) (irq_stat[cpu].member)
#endif

  /* arch independent irq_stat fields */
#define local_softirq_pending() \
    __IRQ_STAT(smp_processor_id(), __softirq_pending)
- A softirq runs at one of two points: on interrupt exit in irq_exit(), or in the ksoftirqd thread:
softirq uses smpboot_register_percpu_thread() to create a ksoftirqd thread on each CPU:
root@:/ # ps | grep softirq
root 3 2 0 0 smpboot_th 0000000000 S ksoftirqd/0
root 12 2 0 0 __kthread_ 0000000000 R ksoftirqd/1
root 16 2 0 0 __kthread_ 0000000000 R ksoftirqd/2
root 20 2 0 0 __kthread_ 0000000000 R ksoftirqd/3
root 24 2 0 0 __kthread_ 0000000000 R ksoftirqd/4
root 28 2 0 0 __kthread_ 0000000000 R ksoftirqd/5
root 32 2 0 0 __kthread_ 0000000000 R ksoftirqd/6
root 36 2 0 0 __kthread_ 0000000000 R ksoftirqd/7
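Softirqs run preferentially in irq_exit(); when a time limit or another condition is exceeded, execution moves to the ksoftirqd thread. A softirq runs in the ksoftirqd thread when any of the following holds:
- irq_exit()->__do_softirq() has been running for more than 2 ms.
- irq_exit()->__do_softirq() has looped over the softirqs more than 10 times.
- While irq_exit()->__do_softirq() is running, the current thread needs to be rescheduled.
- raise_softirq() is called outside interrupt context.
Softirq processing also calls the handlers in softirq_vec[] one by one in priority order, so an earlier softirq can delay the ones after it. Keep this in mind when writing code.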
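The pending bitmask, the priority-ordered scan, and the restart limit can be sketched together in userspace C. This is a simplified model, not the kernel's __do_softirq(): all `model_*` names are hypothetical, handlers take no argument, and the 2 ms time limit and the need-resched check are left out, so only the restart count causes a hand-off to the thread.

```c
#include <assert.h>

#define MODEL_NR_SOFTIRQS 10
#define MODEL_MAX_RESTART 10

static unsigned int model_pending;                  /* like __softirq_pending */
static void (*model_vec[MODEL_NR_SOFTIRQS])(void);  /* like softirq_vec[] */
static int model_deferred;                          /* 1 = handed to the thread */

static void model_open_softirq(int nr, void (*action)(void)) { model_vec[nr] = action; }
static void model_raise_softirq(int nr) { model_pending |= 1U << nr; }

static void model_do_softirq(void)
{
    int max_restart = MODEL_MAX_RESTART;

    model_deferred = 0;
    while (model_pending) {
        unsigned int pending = model_pending;   /* snapshot the pending bits */

        model_pending = 0;                      /* handlers may raise again */
        /* Lower numbers run first, so they have higher priority. */
        for (int nr = 0; nr < MODEL_NR_SOFTIRQS; nr++)
            if ((pending & (1U << nr)) && model_vec[nr])
                model_vec[nr]();

        if (model_pending && --max_restart == 0) {
            model_deferred = 1;                 /* kernel: wakeup_softirqd() */
            break;
        }
    }
}

/* Sample handlers used to observe ordering and the restart limit. */
static int model_log[8], model_log_len;
static void model_timer_action(void)  { model_log[model_log_len++] = 1; }
static void model_net_rx_action(void) { model_log[model_log_len++] = 3; }

static int model_storm_runs;
static void model_storm_action(void)
{
    model_storm_runs++;
    model_raise_softirq(6);                     /* re-raises itself every time */
}
```

A handler that keeps re-raising itself runs for the allowed number of passes and is then left pending for the thread, which is how the kernel keeps a softirq storm from monopolizing interrupt exit.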
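2.3 tasklet
Linux already has softirq, so why does it also need tasklet? The main reason is that a softirq can run on several CPUs at once and may hit many reentrancy problems, whereas a given tasklet runs on only one CPU at a time and does not have to deal with reentrancy and mutual exclusion. In addition, Linux discourages users from adding new softirqs.
The implementation of tasklet is analyzed below:
- The per-CPU variables tasklet_vec/tasklet_hi_vec record, as linked lists, the tasklets that the current CPU has to process: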
void __init softirq_init(void)
{
    int cpu;

    for_each_possible_cpu(cpu) {
        // (1) tasklet_vec is the list of low-priority tasklets
        per_cpu(tasklet_vec, cpu).tail =
            &per_cpu(tasklet_vec, cpu).head;
        // (2) tasklet_hi_vec is the list of high-priority tasklets
        per_cpu(tasklet_hi_vec, cpu).tail =
            &per_cpu(tasklet_hi_vec, cpu).head;
    }
}
- Queueing (pushing) a tasklet:
static inline void tasklet_schedule(struct tasklet_struct *t)
{
    if (!test_and_set_bit(TASKLET_STATE_SCHED, &t->state))
        __tasklet_schedule(t);
}

| →
void __tasklet_schedule(struct tasklet_struct *t)
{
    unsigned long flags;

    local_irq_save(flags);
    // (1) Append the new tasklet to the tail of this CPU's list
    t->next = NULL;
    *__this_cpu_read(tasklet_vec.tail) = t;
    __this_cpu_write(tasklet_vec.tail, &(t->next));
    // (2) Raise the softirq that processes tasklets
    raise_softirq_irqoff(TASKLET_SOFTIRQ);
    local_irq_restore(flags);
}
- Processing a tasklet:
static void tasklet_action(struct softirq_action *a)
{
    struct tasklet_struct *list;

    local_irq_disable();
    // (1) Take every tasklet currently on the list
    list = __this_cpu_read(tasklet_vec.head);
    // (2) Reset tasklet_vec.head and tasklet_vec.tail to their initial
    //     state so that new tasklets can be queued
    __this_cpu_write(tasklet_vec.head, NULL);
    __this_cpu_write(tasklet_vec.tail, this_cpu_ptr(&tasklet_vec.head));
    local_irq_enable();

    // (3) Process the tasklets on the detached list one by one
    while (list) {
        struct tasklet_struct *t = list;

        list = list->next;

        // (4) Tasklet lock: a tasklet runs on only one CPU at a time
        if (tasklet_trylock(t)) {
            if (!atomic_read(&t->count)) {
                // (6) Clear TASKLET_STATE_SCHED before the tasklet runs.
                //     From now on the tasklet can be queued again, but it
                //     cannot run yet
                if (!test_and_clear_bit(TASKLET_STATE_SCHED,
                                        &t->state))
                    BUG();
                // (7) Run the actual tasklet function
                t->func(t->data);
                // (8) Release the tasklet lock; other CPUs can now run
                //     this tasklet
                tasklet_unlock(t);
                continue;
            }
            tasklet_unlock(t);
        }

        local_irq_disable();
        // (5) If the tasklet lock could not be taken, put the tasklet back
        //     on this CPU's tasklet_vec list to run next time
        t->next = NULL;
        *__this_cpu_read(tasklet_vec.tail) = t;
        __this_cpu_write(tasklet_vec.tail, &(t->next));
        __raise_softirq_irqoff(TASKLET_SOFTIRQ);
        local_irq_enable();
    }
}
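The list handling in tasklet_schedule() and tasklet_action() relies on a tail pointer that points at the last `next` field, which makes appending constant-time with no special case for an empty list. The following single-threaded userspace sketch shows only that queueing discipline. It assumes one CPU, models TASKLET_STATE_SCHED as a plain int, omits the trylock and the disable count, and uses hypothetical `model_*` names throughout.

```c
#include <assert.h>
#include <stddef.h>

struct model_tasklet {
    struct model_tasklet *next;
    int scheduled;                  /* stand-in for TASKLET_STATE_SCHED */
    void (*func)(unsigned long);
    unsigned long data;
};

/* One list only, as if there were a single CPU. */
static struct model_tasklet *model_head;
static struct model_tasklet **model_tail = &model_head;

/* Queue at the tail unless already queued; returns 1 if it was added. */
static int model_tasklet_schedule(struct model_tasklet *t)
{
    if (t->scheduled)
        return 0;
    t->scheduled = 1;
    t->next = NULL;
    *model_tail = t;                /* link behind the current last element */
    model_tail = &t->next;          /* tail now points at the new last 'next' */
    return 1;
}

/* Detach the whole list, then run each tasklet once; returns the count run. */
static int model_tasklet_action(void)
{
    struct model_tasklet *list = model_head;
    int ran = 0;

    model_head = NULL;
    model_tail = &model_head;       /* back to the initial, empty state */

    while (list) {
        struct model_tasklet *t = list;

        list = list->next;
        t->scheduled = 0;           /* may be queued again from inside func */
        t->func(t->data);
        ran++;
    }
    return ran;
}

/* Sample tasklet body used to observe that each tasklet ran once. */
static unsigned long model_sum;
static void model_add(unsigned long data) { model_sum += data; }
```

Scheduling a tasklet that is already queued is a no-op, so a burst of interrupts that each schedule the same tasklet results in a single run.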
References
- [1] ARM Cortex-A Series Programmer's Guide for ARMv8-A
- [2] ARM Cortex-A Series Programmer's Guide for ARMv8-A
- [3] ARM Cortex-A Series Programmer's Guide for ARMv8-A
- [4] GIC代码分析
- [5] ARM Cortex-A Series Programmer's Guide for ARMv8-A
- [6] ARM Cortex-A Series Programmer's Guide for ARMv8-A