Linux进程调度

来源：互联网发布：java大整数编辑：程序博客网时间：2024/05/17 21:51

一、策略

1.1 IO消耗性进程和CPU消耗性进程

IO消耗性进程：

大部分时间用来提交IO请求或等待IO请求。

缩短运行时间，提高运行频率(优先调度)。

CPU消耗性进程：

大部分时间用在执行代码上。

延长运行时间，降低运行频率。

1.2 进程优先级

优先级高的先运行，低的后运行，相同优先级的轮流运行。

动态优先级。

nice值：

范围-20到19 ，默认为0 。

值越小优先级越高。

实时优先级：

范围0到99。

优先级高于普通进程。

1.3 时间片

太长太短都不好。

进程不一定一次用完它所有的时间片。

所有进程时间片用完后，重新计算时间片。

1.4 进程抢占

进程切换到TASK_RUNNING状态，会检查其优先级，决定是否抢占。

时间片为0的进程，会被抢占。

二、Linux调度算法

2.1 可执行队列

运行队列struct runqueue，每个CPU一个。

Linux-2.6.32-220.el6.x86_64代码中的可执行队列结构体：

/* * This is the main, per-CPU runqueue data structure. * * Locking rule: those places that want to lock multiple runqueues * (such as the load balancing or the thread migration code), lock * acquire operations must be ordered by ascending &runqueue. */struct rq {/* runqueue lock: */spinlock_t lock;/* * nr_running and cpu_load should be in the same cacheline because * remote CPUs use both these fields when doing load calculation. */unsigned long nr_running;#define CPU_LOAD_IDX_MAX 5unsigned long cpu_load[CPU_LOAD_IDX_MAX];#ifndef __GENKSYMS__unsigned long last_load_update_tick;#endif#ifdef CONFIG_NO_HZ#ifndef __GENKSYMS__unsigned char nohz_balance_kick;#elseunsigned long last_tick_seen;unsigned char in_nohz_recently;#endif#endif#ifndef __GENKSYMS__unsigned int skip_clock_update;#endif/* capture load from *all* tasks on this cpu: */struct load_weight load;unsigned long nr_load_updates;u64 nr_switches;#ifdef __GENKSYMS__u64 nr_migrations_in;#endifstruct cfs_rq cfs;struct rt_rq rt;#ifdef CONFIG_FAIR_GROUP_SCHED/* list of leaf cfs_rq on this cpu: */struct list_head leaf_cfs_rq_list;#endif#ifdef CONFIG_RT_GROUP_SCHEDstruct list_head leaf_rt_rq_list;#endif/* * This is part of a global counter where only the total sum * over all CPUs matters. A task can increase this counter on * one CPU and if it got migrated afterwards it may decrease * it on another CPU. Always updated under the runqueue lock: */unsigned long nr_uninterruptible;struct task_struct *curr, *idle;unsigned long next_balance;struct mm_struct *prev_mm;u64 clock;atomic_t nr_iowait;#ifdef CONFIG_SMPstruct root_domain *rd;struct sched_domain *sd;unsigned char idle_at_tick;/* For active balancing */int post_schedule;int active_balance;int push_cpu;/* cpu of this runqueue: */int cpu;int online;unsigned long avg_load_per_task;struct task_struct *migration_thread;struct list_head migration_queue;u64 rt_avg;u64 age_stamp;u64 idle_stamp;u64 avg_idle;#endif/* calc_load related fields */unsigned long calc_load_update;long calc_load_active;#ifdef CONFIG_SCHED_HRTICK#ifdef CONFIG_SMPint hrtick_csd_pending;struct call_single_data hrtick_csd;#endifstruct hrtimer hrtick_timer;#endif#ifdef CONFIG_SCHEDSTATS/* latency stats */struct sched_info rq_sched_info;unsigned long long rq_cpu_time;/* could above be rq->cfs_rq.exec_clock + rq->rt_rq.rt_runtime ? *//* sys_sched_yield() stats */unsigned int yld_count;/* schedule() stats */unsigned int sched_switch;unsigned int sched_count;unsigned int sched_goidle;/* try_to_wake_up() stats */unsigned int ttwu_count;unsigned int ttwu_local;/* BKL stats */unsigned int bkl_count;#endif};

运行队列相关函数：

//在<linux/sched.c>中实现//返回给定处理器上的可执行队列。#define cpu_rq(cpu)(&per_cpu(runqueues, (cpu)))   //返回当前处理器上的可执行队列。#define this_rq()(&__get_cpu_var(runqueues))   //返回给定进程所在的可执行队列。#define task_rq(p)cpu_rq(task_cpu(p))                 //操作可执行队列时加锁。static struct rq *task_rq_lock(struct task_struct *p, unsigned long *flags)static inline void task_rq_unlock(struct rq *rq, unsigned long *flags)static struct rq *this_rq_lock(void)//需要加2次锁时，保证加锁顺序与解锁顺序相反。static void double_rq_lock(struct rq *rq1, struct rq *rq2)static void double_rq_unlock(struct rq *rq1, struct rq *rq2)

2.2 优先级数组

struct prio_array优先级数组：

使可运行处理器的每一种优先级都包含一个相应的队列，而这些队列包含对应优先级上的可执行进程链表。

位图：

开始时位图所有位置0，当某个优先级的进程开始准备执行时，位图中相应的位置1。

MAX_PRIO：

定义了系统拥有的优先级个数。默认值为140。

每个处理器维护2个优先级数组，一个活跃数组，一个过期数组。

2.3 重新计算时间片

所有进程时间片都用完，再重新计算时间片，相当耗时。

每个处理器维护2个优先级数组，一个活跃数组，一个过期数组。

2.4 schedule

1. 找到活动优先级数组中第一个被设置的位(该位对应着优先级最高的活动进程)；

2. 调度程序选择这个级别链表里的头一个进程；

3. 如果该进程不是当前进程，context_switch被调用。

2.5 计算优先级和时间片

nice值：static_prio

-20到19，默认0 。

用户设置不可改变。

动态优先级：prio

effective_prio()返回在nice上+5或-5。

动态优先级计算：

判断一个进程是IO消耗型，还是处理器消耗型；

主要通过休眠时间推断；

sleep_avg 0到MAX_SLEEP_AVG 默认10毫秒。

时间片计算：

以静态优先级为基础计算；

创建一个新进程时，父子进程均分时间片。

2.6 唤醒与睡眠

睡眠：

1. 进程把自己设置成休眠状态；

2. 从可执行队列移出，放入等待队列；

3. 调用schedule选择和执行一个其他进程。

唤醒：

1. 进程被设置成休眠状态；

2. 从等待队列移出，放入可执行队列。

休眠通过等待队列wait_queue_header_t处理。

2.7 负载平衡程序

负载平衡程序：

有多个处理器时，负载平衡程序使得各个处理器上调度的进程数均衡。

负载平衡程序有kernel/sched.c中 load_balance()函数实现。

调用时机：

schedule()调用时，可执行队列为空。

定时器定时调用。

三、抢占和上下文切换

3.1 上下文切换

上下文切换：

从一个可执行进程切换到另一个可执行进程。

context_switch()负责切换，schedule()会调用该函数。

知道什么时候调用schedule()：

need_resched来标志是否需要重新执行一次调度。

void set_tsk_need_resched(struct task_struct *tsk) //设置指定进程中的need_resched标志void clear_tsk_need_resched(struct task_struct *tsk) //清除指定进程中的need_resched标志int need_resched(void) //检查need_resched标志的值，如果被设置就返回真，否则返回假。

3.2 用户抢占

用户抢占发生在：

从系统调用返回用户空间。

从中断处理程序返回用户空间。

3.3 内核抢占

调度安全：

只要调度是安全的，内核可以在任何时间抢占。

没有锁，调度就是安全的。

thread_info引入了preempt_count锁计数器，为0时，表示没有锁。

内核抢占发生在：

从中断处理程序，返回内核空间之前。

内核代码再一次具有可抢占性的时候。

内核显示的调用schedule()。

内核中的任务阻塞（会导致调用schedule()）。

四、实时调度

4.1 SCHED_FIFO

比普通进程高。

没有时间片，一直运行，直到运行完或让出处理器。

4.2 SCHED_RR

比普通进程高。

有时间片。

4.3 SCHED_NORMAL

实时优先级范围：

0到MAX_RT_PRIO-1(100)。

nice值范围：

-20到19。

共享了MAX_RT_PRIO到MAX_RT_PRIO+40。

五、与调度相关的系统调用

#include <linux/sched.h>int sched_setscheduler(struct task_struct *, int, struct sched_param *); //设置进程调度策略long sched_setaffinity(pid_t pid, const struct cpumask *new_mask); //设置进程的处理器的亲和力long sched_getaffinity(pid_t pid, struct cpumask *mask); //获取进程的处理器的亲和力

0 0