Linux Power Management (4): CPUFreq
Source: Internet  Editor: 程序博客网  Time: 2024/04/28 19:00
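Introduction to CPUFreq
CPUFreq is a technique for adjusting CPU voltage and frequency at run time. It is also known as DVFS (Dynamic Voltage and Frequency Scaling).
Why CPUFreq is needed
As technology advances, CPU frequencies keep rising, performance keeps improving, and chip manufacturing processes keep getting more advanced. Higher performance, however, also brings more heat. Mobile and embedded devices do not need peak performance all the time, so a mechanism is needed that adjusts frequency and voltage dynamically to balance performance against power consumption.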
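CPUFreq software framework
Like most Linux subsystems, CPUFreq separates mechanism from policy. It consists of three modules:
cpufreq core: wraps and abstracts the cpufreq governors and cpufreq drivers and defines clear interfaces between them, which is what separates mechanism from policy in the design.
cpufreq drivers: sit below the cpufreq core and set the frequency of the actual CPU hardware. A cpufreq driver registers and implements CPU frequency scaling through the cpufreq_driver structure of the standard Linux cpufreq subsystem.
cpufreq governor: sits above the cpufreq core, detects when the CPU should speed up or slow down, and decides the frequency to switch to based on the system state and load. A governor registers and implements its scaling policy through the cpufreq_governor structure of the Linux cpufreq subsystem.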
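How CPUFreq works
Linux cpufreq works by registering a cpufreq driver and cpufreq governors with the core. The governor implements the scaling policy and the driver carries out the actual change, and together they adjust frequency and voltage dynamically. Voltage and frequency are changed in a fixed order. In the driver analyzed later in this article, when the new frequency needs a higher voltage the voltage is raised before the frequency is changed, and when it allows a lower voltage the frequency is changed first and the voltage is lowered afterwards.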
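The ordering described above can be sketched in plain user-space C. This is only an illustration, not kernel code: struct dvfs_state, dvfs_set and the log field are hypothetical names, and the hardware writes are replaced by recording the order of the steps.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a CPU power domain; not a kernel structure. */
struct dvfs_state {
    unsigned int freq_khz;
    unsigned int volt_mv;
    char log[3];    /* order of the last steps: 'V' = voltage, 'F' = frequency */
};

/* Move to a new operating point. A higher voltage is applied before the
 * clock changes and a lower voltage after it, so the CPU never runs
 * faster than its supply voltage allows. */
void dvfs_set(struct dvfs_state *s, unsigned int freq_khz,
              unsigned int volt_mv)
{
    size_t n = 0;

    assert(s != NULL);
    if (volt_mv > s->volt_mv) {         /* scaling up: voltage first */
        s->volt_mv = volt_mv;
        s->log[n++] = 'V';
    }
    if (freq_khz != s->freq_khz) {
        s->freq_khz = freq_khz;
        s->log[n++] = 'F';
    }
    if (volt_mv < s->volt_mv) {         /* scaling down: voltage last */
        s->volt_mv = volt_mv;
        s->log[n++] = 'V';
    }
    s->log[n] = '\0';
}
```

Calling dvfs_set with a higher operating point records the steps "VF", and going back down records "FV".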
CPUFreq sysfs user-space interface
The cpufreq nodes are located under /sys/devices/system/cpu/cpu0/cpufreq:
$ cd /sys/devices/system/cpu/cpu0/cpufreq
The following nodes are present:
shell@tiny4412:/sys/devices/system/cpu/cpu0/cpufreq # ls
affected_cpus
cpuinfo_cur_freq
cpuinfo_max_freq
cpuinfo_min_freq
cpuinfo_transition_latency
related_cpus
scaling_available_governors
scaling_cur_freq
scaling_driver
scaling_governor
scaling_max_freq
scaling_min_freq
scaling_setspeed
stats
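Their meanings are as follows:
- affected_cpus: the online CPUs that share this policy and are scaled together.
- cpuinfo_cur_freq: the current CPU frequency as read from the hardware, in kHz.
- cpuinfo_max_freq: the highest frequency the hardware supports, in kHz.
- cpuinfo_min_freq: the lowest frequency the hardware supports, in kHz.
- cpuinfo_transition_latency: the time needed to switch between two frequencies, in nanoseconds.
- related_cpus: all CPUs, online or offline, that belong to this policy.
- scaling_available_governors: the governors that can currently be selected.
- scaling_cur_freq: the frequency the cpufreq core currently believes the CPU is running at, in kHz.
- scaling_driver: the name of the cpufreq driver in use.
- scaling_governor: the governor in use; writing a governor name switches to it.
- scaling_max_freq: the upper frequency limit of the policy, in kHz.
- scaling_min_freq: the lower frequency limit of the policy, in kHz.
- scaling_setspeed: with the userspace governor, writing a frequency here sets the CPU frequency.
- stats: a directory holding frequency statistics.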
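A user-space program can read these nodes like ordinary text files. The helper below is a minimal sketch; cpufreq_read_node is a hypothetical name, and the code assumes each node holds a single line of text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one cpufreq sysfs node (for example "scaling_governor") from the
 * given directory into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the node cannot be opened or read. */
int cpufreq_read_node(const char *dir, const char *node,
                      char *buf, size_t len)
{
    char path[256];
    FILE *fp;
    int n;

    assert(dir && node && buf && len > 0);
    n = snprintf(path, sizeof(path), "%s/%s", dir, node);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* path would be truncated */

    fp = fopen(path, "r");
    if (!fp)
        return -1;
    if (!fgets(buf, (int)len, fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

For example, cpufreq_read_node("/sys/devices/system/cpu/cpu0/cpufreq", "scaling_cur_freq", buf, sizeof(buf)) would leave the current frequency in kHz in buf as a decimal string.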
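CPUFreq implementation analysis
CPUFreq core layer
The CPUFreq subsystem gathers its common logic into the CPUFreq core module. This common code provides the APIs that the governors, the drivers and other kernel modules need to form a complete CPUFreq subsystem. This section analyzes how some of the important core-layer APIs are implemented and used.
Code location:
/drivers/cpufreq/cpufreq.c
CPUFreq subsystem initialization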
static int __init cpufreq_core_init(void)
{
    int cpu;

    if (cpufreq_disabled())
        return -ENODEV;

    for_each_possible_cpu(cpu) {
        per_cpu(cpufreq_policy_cpu, cpu) = -1;
        init_rwsem(&per_cpu(cpu_policy_rwsem, cpu));
    }

    cpufreq_global_kobject = kobject_create_and_add("cpufreq", &cpu_subsys.dev_root->kobj);
    BUG_ON(!cpufreq_global_kobject);

#if defined(CONFIG_ARCH_SUNXI) && defined(CONFIG_HOTPLUG_CPU)
    /* register reboot notifier for process cpus when reboot */
    register_reboot_notifier(&reboot_notifier);
#endif

    return 0;
}
core_initcall(cpufreq_core_init);
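As shown, the core of the CPUFreq subsystem is initialized during boot through the initcall mechanism. cpufreq_policy_cpu is a per-CPU variable: on an SMP system each CPU can have its own policy or share one with other CPUs. kobject_create_and_add creates the global cpufreq node at /sys/devices/system/cpu/cpufreq, which sits next to the per-CPU cpufreq directories seen earlier under sysfs. Other parameters are placed under this node later.
The argument cpu_subsys is a kernel global that is set up earlier in the boot process, in drivers/base/cpu.c: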
struct bus_type cpu_subsys = {
    .name = "cpu",
    .dev_name = "cpu",
};
EXPORT_SYMBOL_GPL(cpu_subsys);

void __init cpu_dev_init(void)
{
    if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
        panic("Failed to register CPU subsystem");

    cpu_dev_register_generic();
}
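This creates a cpu bus to which every CPU in the system is attached. The root directory of the cpu bus devices is /sys/devices/system/cpu, and a cpu bus node also appears under /sys/bus. Under the root directory the nodes cpu0, cpu1, ... cpux appear in order, one device node per CPU. The CPUFreq subsystem uses cpu_subsys to find the CPU devices in the system and creates the corresponding cpufreq objects under them, as discussed later.
Seen this way, the initialization of the cpufreq subsystem does little of importance: it only initializes a few per-CPU variables and creates one cpufreq node. The following figure is the sequence diagram of the initialization:
Registering a cpufreq_governor
Several governors can exist in the system at the same time. A policy is associated with a governor through the governor pointer in the cpufreq_policy structure. Before a policy can use a governor, the governor must be registered with the cpufreq core through the API the core layer provides: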
int cpufreq_register_governor(struct cpufreq_governor *governor)
{
    int err;

    if (!governor)
        return -EINVAL;

    if (cpufreq_disabled())
        return -ENODEV;

    mutex_lock(&cpufreq_governor_mutex);

    governor->initialized = 0;
    err = -EBUSY;
    if (__find_governor(governor->name) == NULL) {
        err = 0;
        list_add(&governor->governor_list, &cpufreq_governor_list);
    }

    mutex_unlock(&cpufreq_governor_mutex);
    return err;
}
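The core layer defines a global list, cpufreq_governor_list. The registration function first looks the governor up by name with __find_governor() to check whether it has already been registered. If it has not, the structure representing the governor is added to cpufreq_governor_list; otherwise the function returns -EBUSY.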
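The register-if-absent logic can be illustrated outside the kernel with a small singly linked list. This is a simplified sketch: the toy_ names are hypothetical, there is no locking, and names are compared exactly, which may differ from how the kernel compares governor names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct cpufreq_governor. */
struct toy_governor {
    const char *name;
    struct toy_governor *next;
};

struct toy_governor *toy_governor_list;   /* head of the registered list */

/* Look a governor up by name; returns NULL if it is not registered. */
struct toy_governor *toy_find_governor(const char *name)
{
    struct toy_governor *g;

    assert(name != NULL);
    for (g = toy_governor_list; g; g = g->next)
        if (strcmp(g->name, name) == 0)
            return g;
    return NULL;
}

/* Register a governor unless one with the same name already exists. */
int toy_register_governor(struct toy_governor *gov)
{
    if (!gov || !gov->name)
        return -EINVAL;
    if (toy_find_governor(gov->name))
        return -EBUSY;
    gov->next = toy_governor_list;
    toy_governor_list = gov;
    return 0;
}
```

Registering a second governor named "interactive" returns -EBUSY, just as the kernel function does when __find_governor() finds a match.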
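Registering a cpufreq_driver
Unlike governors, only one cpufreq_driver exists in the system. The cpufreq_driver is platform specific and performs the actual frequency change, while choosing the operating frequency is the job of the governor. The system therefore only needs one registered cpufreq_driver, whose sole responsibility is to control the platform's clock system so that the frequency chosen by the governor is applied. The core provides the API cpufreq_register_driver for the registration.
Let us walk through this function: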
int cpufreq_register_driver(struct cpufreq_driver *driver_data)
{
    unsigned long flags;
    int ret;

    if (cpufreq_disabled())
        return -ENODEV;

    /* The verify and init callbacks are mandatory, and at least one of
     * setpolicy and target must be implemented. */
    if (!driver_data || !driver_data->verify || !driver_data->init ||
        ((!driver_data->setpolicy) && (!driver_data->target)))
        return -EINVAL;

    pr_debug("trying to register driver %s\n", driver_data->name);

    if (driver_data->setpolicy)
        driver_data->flags |= CPUFREQ_CONST_LOOPS;

    write_lock_irqsave(&cpufreq_driver_lock, flags);
    /* Check whether the global cpufreq_driver is already set. If not,
     * assign the argument to it, which guarantees that only one
     * cpufreq_driver is ever registered in the system. */
    if (cpufreq_driver) {
        write_unlock_irqrestore(&cpufreq_driver_lock, flags);
        return -EBUSY;
    }
    cpufreq_driver = driver_data;
    write_unlock_irqrestore(&cpufreq_driver_lock, flags);

    /* Create a cpufreq_policy for every cpu via subsys_interface_register. */
    ret = subsys_interface_register(&cpufreq_interface);
    if (ret)
        goto err_null_driver;

    if (!(cpufreq_driver->flags & CPUFREQ_STICKY)) {
        int i;
        ret = -ENODEV;

        /* check for at least one working CPU */
        for (i = 0; i < nr_cpu_ids; i++)
            if (cpu_possible(i) && per_cpu(cpufreq_cpu_data, i)) {
                ret = 0;
                break;
            }

        /* if all ->init() calls failed, unregister */
        if (ret) {
            pr_debug("no CPU initialized for driver %s\n", driver_data->name);
            goto err_if_unreg;
        }
    }

    /* Register for cpu hotplug notifications so that the relationship
     * between the cpu policies can be handled dynamically on hotplug
     * (for example, migrating the managing cpu). */
    register_hotcpu_notifier(&cpufreq_cpu_notifier);
    pr_debug("driver %s up and running\n", driver_data->name);

    return 0;

err_if_unreg:
    subsys_interface_unregister(&cpufreq_interface);
err_null_driver:
    write_lock_irqsave(&cpufreq_driver_lock, flags);
    cpufreq_driver = NULL;
    write_unlock_irqrestore(&cpufreq_driver_lock, flags);
    return ret;
}
The cpufreq_interface structure is:
static struct subsys_interface cpufreq_interface = {
    .name = "cpufreq",
    .subsys = &cpu_subsys,
    .add_dev = cpufreq_add_dev,
    .remove_dev = cpufreq_remove_dev,
};
subsys_interface_register walks every device under the subsystem and calls the add_dev callback of the cpufreq_interface structure with that device as its argument. Here the callback points to cpufreq_add_dev.
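The registration steps above (validate the callbacks, claim the single driver slot, then run a per-CPU callback for every device) can be sketched as follows. This is a user-space illustration under stated assumptions: all toy_ names are hypothetical, locking is omitted, and init() stands in for the work that cpufreq_add_dev does for each CPU.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-in for struct cpufreq_driver: only the callbacks
 * whose presence is checked at registration time. */
struct toy_driver {
    int (*init)(int cpu);
    int (*verify)(int cpu);
    int (*setpolicy)(int cpu);
    int (*target)(int cpu, unsigned int freq_khz);
};

const struct toy_driver *toy_current_driver;   /* at most one driver */

int toy_register_driver(const struct toy_driver *drv, int nr_cpus)
{
    int cpu, ok = 0;

    assert(nr_cpus > 0);
    /* init and verify are mandatory; one of setpolicy/target is needed */
    if (!drv || !drv->init || !drv->verify ||
        (!drv->setpolicy && !drv->target))
        return -EINVAL;
    if (toy_current_driver)
        return -EBUSY;
    toy_current_driver = drv;

    /* visit every CPU, as add_dev is invoked for every device */
    for (cpu = 0; cpu < nr_cpus; cpu++)
        if (drv->init(cpu) == 0)
            ok++;
    if (!ok) {                      /* every init() failed: roll back */
        toy_current_driver = NULL;
        return -ENODEV;
    }
    return 0;
}

/* Example callbacks and driver, analogous to a platform driver. */
int toy_init(int cpu) { (void)cpu; return 0; }
int toy_verify(int cpu) { (void)cpu; return 0; }
int toy_target(int cpu, unsigned int freq_khz)
{
    (void)cpu;
    (void)freq_khz;
    return 0;
}

const struct toy_driver toy_example_driver = {
    .init = toy_init,
    .verify = toy_verify,
    .target = toy_target,
};
```

A second call to toy_register_driver fails with -EBUSY, which mirrors how the kernel keeps a single cpufreq_driver.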
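The following figure is the sequence diagram of the cpufreq_driver registration:
Finally, the __cpufreq_set_policy function makes the policy take effect. At this point the policy of every CPU has been set up and is in operation.
The sequence diagram of __cpufreq_set_policy is as follows:
Other APIs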
int cpufreq_register_notifier(struct notifier_block *nb, unsigned int list);
int cpufreq_unregister_notifier(struct notifier_block *nb, unsigned int list);
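These two APIs register and unregister for cpufreq notifications. The second argument selects the notification type, which is one of:
- CPUFREQ_TRANSITION_NOTIFIER: notified when the frequency changes
- CPUFREQ_POLICY_NOTIFIER: notified when the policy is updated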
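The idea behind transition notifications is a list of callbacks that is walked before and after each frequency change. The sketch below models that in user-space C. It is not the kernel notifier_block API: the toy_ names, the fixed-size array and the example subscriber are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_PRECHANGE, TOY_POSTCHANGE };

struct toy_freqs {
    unsigned int old_khz;
    unsigned int new_khz;
};

typedef void (*toy_notifier_fn)(enum toy_event ev, const struct toy_freqs *f);

#define TOY_MAX_NOTIFIERS 8
toy_notifier_fn toy_chain[TOY_MAX_NOTIFIERS];
size_t toy_chain_len;

/* Add a subscriber; returns -1 if fn is NULL or the chain is full. */
int toy_register_notifier(toy_notifier_fn fn)
{
    if (!fn || toy_chain_len == TOY_MAX_NOTIFIERS)
        return -1;
    toy_chain[toy_chain_len++] = fn;
    return 0;
}

/* Call every subscriber in registration order. */
void toy_notify(enum toy_event ev, const struct toy_freqs *f)
{
    size_t i;

    assert(f != NULL);
    for (i = 0; i < toy_chain_len; i++)
        toy_chain[i](ev, f);
}

/* Example subscriber: remembers the last completed frequency. */
unsigned int toy_last_seen_khz;

void toy_track_freq(enum toy_event ev, const struct toy_freqs *f)
{
    if (ev == TOY_POSTCHANGE)
        toy_last_seen_khz = f->new_khz;
}
```

A subscriber such as toy_track_freq only acts on the post-change event, so it never records a frequency that was announced but not yet applied.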
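cpufreq_driver_target sets the target frequency. It ends up calling the target callback of the cpufreq_driver. The work is done in __cpufreq_driver_target: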
int __cpufreq_driver_target(struct cpufreq_policy *policy,
                            unsigned int target_freq,
                            unsigned int relation)
{
    int retval = -EINVAL;
    unsigned int old_target_freq = target_freq;

    if (cpufreq_disabled())
        return -ENODEV;

    /* Make sure that target_freq is within supported range */
    if (target_freq > policy->max)
        target_freq = policy->max;
    if (target_freq < policy->min)
        target_freq = policy->min;

    pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
             policy->cpu, target_freq, relation, old_target_freq);

    if (target_freq == policy->cur)
        return 0;

    if (cpufreq_driver->target)
        retval = cpufreq_driver->target(policy, target_freq, relation);

    return retval;
}
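The clamping step in this function is easy to check in isolation. The following user-space sketch reproduces only that step; struct toy_policy and toy_clamp_target are hypothetical names, and the driver call is reduced to a return value.

```c
#include <assert.h>

struct toy_policy {
    unsigned int min;   /* kHz */
    unsigned int max;   /* kHz */
    unsigned int cur;   /* kHz */
};

/* Clamp the request to [min, max]. Returns 1 if the driver would have
 * to be called, 0 if the clamped value equals the current frequency. */
int toy_clamp_target(const struct toy_policy *p, unsigned int *target_khz)
{
    assert(p && target_khz && p->min <= p->max);
    if (*target_khz > p->max)
        *target_khz = p->max;
    if (*target_khz < p->min)
        *target_khz = p->min;
    return *target_khz != p->cur;
}
```

A request above policy->max while the CPU already runs at policy->max therefore results in no driver call at all.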
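CPUFreq driver layer
The part a driver engineer usually has to implement is the cpufreq driver, because it differs from one CPU to another. The cpufreq driver controls the platform-specific CPU frequency and voltage. Within the cpufreq framework it is a very simple module: define a struct cpufreq_driver variable, fill in the necessary fields, implement the callbacks according to the platform's characteristics, and register it with the system.
The cpufreq_driver structure is shown below.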
struct cpufreq_driver {
    struct module *owner;           /* usually THIS_MODULE */
    char name[CPUFREQ_NAME_LEN];    /* driver name, e.g. "cpufreq-sunxi" */
    u8 flags;                       /* flags such as CPUFREQ_STICKY, which keeps the
                                       driver registered even if every init call fails */
    bool have_governor_per_policy;

    /* needed by all drivers */
    int (*init) (struct cpufreq_policy *policy);    /* mandatory; run by the cpufreq core
                                                       after a cpu device is added */
    int (*verify) (struct cpufreq_policy *policy);  /* mandatory; called when upper layers
                                                       set a new policy, to check that
                                                       the policy is valid */

    /* define one out of two */
    int (*setpolicy) (struct cpufreq_policy *policy);   /* usually not implemented */
    int (*target) (struct cpufreq_policy *policy,       /* the actual scaling function */
                   unsigned int target_freq,
                   unsigned int relation);

    /* should be defined, if possible */
    unsigned int (*get) (unsigned int cpu);     /* returns the frequency of the given cpu */

    /* optional */
    unsigned int (*getavg) (struct cpufreq_policy *policy,
                            unsigned int cpu);
    int (*bios_limit) (int cpu, unsigned int *limit);

    int (*exit) (struct cpufreq_policy *policy);
    int (*suspend) (struct cpufreq_policy *policy);
    int (*resume) (struct cpufreq_policy *policy);
    struct freq_attr **attr;
};
The following example fills in and implements the necessary members of the cpufreq_driver structure.
static struct cpufreq_driver sunxi_cpufreq_driver = {
    .name = "cpufreq-sunxi",
    .flags = CPUFREQ_STICKY,
    .init = sunxi_cpufreq_init,
    .verify = sunxi_cpufreq_verify,
    .target = sunxi_cpufreq_target,
    .get = sunxi_cpufreq_get,
    .attr = sunxi_cpufreq_attr,
};
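Start with initialization. The init code obtains the clocks and the regulator, reads the configuration from the device tree, and sets up the maximum and minimum frequencies.
The device tree configuration is: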
cpu@0 {
    device_type = "cpu";
    compatible = "arm,cortex-a53","arm,armv8";
    reg = <0x0 0x0>;
    enable-method = "psci";
    cpufreq_tbl = < 480000 648000 720000 816000 912000 1008000 1104000 1152000 1200000>;
    clock-latency = <2000000>;
    clock-frequency = <1008000000>;
    cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0 &SYS_SLEEP_0>;
};
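The initialization function is shown below. Note that this is the module's initcall, sunxi_cpufreq_initcall, not the sunxi_cpufreq_init callback referenced by the .init field: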
static int __init sunxi_cpufreq_initcall(void)
{
    struct device_node *np;
    const struct property *prop;
    struct cpufreq_frequency_table *freq_tbl;
    const __be32 *val;
    int ret, cnt, i;

    np = of_find_node_by_path("/cpus/cpu@0");
    if (!np) {
        CPUFREQ_ERR("No cpu node found\n");
        return -ENODEV;
    }

    if (of_property_read_u32(np, "clock-latency",
                             &sunxi_cpufreq.transition_latency))
        sunxi_cpufreq.transition_latency = CPUFREQ_ETERNAL;

    prop = of_find_property(np, "cpufreq_tbl", NULL);
    if (!prop || !prop->value) {
        CPUFREQ_ERR("Invalid cpufreq_tbl\n");
        ret = -ENODEV;
        goto out_put_node;
    }

    cnt = prop->length / sizeof(u32);
    val = prop->value;

    freq_tbl = kmalloc(sizeof(*freq_tbl) * (cnt + 1), GFP_KERNEL);
    if (!freq_tbl) {
        ret = -ENOMEM;
        goto out_put_node;
    }

    for (i = 0; i < cnt; i++) {
        freq_tbl[i].index = i;
        freq_tbl[i].frequency = be32_to_cpup(val++);
    }

    freq_tbl[i].index = i;
    freq_tbl[i].frequency = CPUFREQ_TABLE_END;

    sunxi_cpufreq.freq_table = freq_tbl;

#ifdef CONFIG_DEBUG_FS
    sunxi_cpufreq.cpufreq_set_us = 0;
    sunxi_cpufreq.cpufreq_get_us = 0;
#endif

    sunxi_cpufreq.last_freq = ~0;

    sunxi_cpufreq.clk_pll = clk_get(NULL, PLL_CPU_CLK);
    if (IS_ERR_OR_NULL(sunxi_cpufreq.clk_pll)) {
        CPUFREQ_ERR("Unable to get PLL CPU clock\n");
        ret = PTR_ERR(sunxi_cpufreq.clk_pll);
        goto out_err_clk_pll;
    }

    sunxi_cpufreq.clk_cpu = clk_get(NULL, CPU_CLK);
    if (IS_ERR_OR_NULL(sunxi_cpufreq.clk_cpu)) {
        CPUFREQ_ERR("Unable to get CPU clock\n");
        ret = PTR_ERR(sunxi_cpufreq.clk_cpu);
        goto out_err_clk_cpu;
    }

    sunxi_cpufreq.vdd_cpu = regulator_get(NULL, CPU_VDD);
    if (IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu)) {
        CPUFREQ_ERR("Unable to get CPU regulator\n");
        ret = PTR_ERR(sunxi_cpufreq.vdd_cpu);
        /* do not return error even if error*/
    }

    /* init cpu frequency from dt */
    ret = __init_freq_dt();
    if (ret == -ENODEV
#ifdef CONFIG_CPU_VOLTAGE_SCALING
        || ret == -EINVAL
#endif
        )
        goto out_err_dt;

    pr_debug("[cpufreq] max: %uMHz, min: %uMHz, ext: %uMHz, boot: %uMHz\n",
             sunxi_cpufreq.max_freq / 1000, sunxi_cpufreq.min_freq / 1000,
             sunxi_cpufreq.ext_freq / 1000, sunxi_cpufreq.boot_freq / 1000);

#ifdef CONFIG_CPU_VOLTAGE_SCALING
    __vftable_show();
    sunxi_cpufreq.last_vdd = sunxi_cpufreq_getvolt();
#endif

    mutex_init(&sunxi_cpufreq.lock);

    ret = cpufreq_register_driver(&sunxi_cpufreq_driver);
    if (ret) {
        CPUFREQ_ERR("failed register driver\n");
        goto out_err_register;
    } else {
        goto out_put_node;
    }

out_err_register:
    mutex_destroy(&sunxi_cpufreq.lock);
out_err_dt:
    if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu)) {
        regulator_put(sunxi_cpufreq.vdd_cpu);
    }
    clk_put(sunxi_cpufreq.clk_cpu);
out_err_clk_cpu:
    clk_put(sunxi_cpufreq.clk_pll);
out_err_clk_pll:
    kfree(freq_tbl);
out_put_node:
    of_node_put(np);
    return ret;
}
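As shown above, the main job of this function is to obtain resources from the device tree, configure the maximum and minimum frequencies, and then register the cpufreq driver.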
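The table-building loop in that function can be tried out in user space. The sketch below decodes big-endian 32-bit cells, as a device tree property stores them, into a sentinel-terminated table. The toy_ names are hypothetical, and TOY_TABLE_END is only a placeholder playing the role of CPUFREQ_TABLE_END.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_TABLE_END 0xFFFFFFFEu   /* placeholder end-of-table marker */

struct toy_freq_entry {
    unsigned int index;
    unsigned int frequency;   /* kHz */
};

/* Decode one big-endian 32-bit cell. */
uint32_t toy_be32_to_cpu(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Build a sentinel-terminated table from raw property bytes.
 * The caller frees the result; NULL is returned if allocation fails. */
struct toy_freq_entry *toy_build_table(const unsigned char *prop, size_t len)
{
    size_t cnt = len / 4, i;
    struct toy_freq_entry *tbl;

    assert(prop != NULL);
    tbl = malloc((cnt + 1) * sizeof(*tbl));
    if (!tbl)
        return NULL;
    for (i = 0; i < cnt; i++) {
        tbl[i].index = (unsigned int)i;
        tbl[i].frequency = toy_be32_to_cpu(prop + 4 * i);
    }
    tbl[cnt].index = (unsigned int)cnt;
    tbl[cnt].frequency = TOY_TABLE_END;
    return tbl;
}
```

The extra slot for the end marker is why the driver allocates cnt + 1 entries.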
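Next, the verify callback. It simply calls cpufreq_frequency_table_verify, which makes sure that at least one valid frequency lies between policy->min and policy->max and that all other limits are respected.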
static int sunxi_cpufreq_verify(struct cpufreq_policy *policy)
{
    return cpufreq_frequency_table_verify(policy, sunxi_cpufreq.freq_table);
}
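The "at least one valid frequency between min and max" check can be sketched as follows. This is a simplified user-space approximation and not the kernel's implementation: toy_table_verify is a hypothetical name, the table is a plain array, and the only repair it performs is widening max to the next larger table entry.

```c
#include <assert.h>
#include <stddef.h>

/* If no table entry lies within [*min, *max], widen *max to the smallest
 * table frequency above it. Returns the number of entries that were
 * already inside the range. */
size_t toy_table_verify(const unsigned int *table, size_t n,
                        unsigned int *min, unsigned int *max)
{
    unsigned int next_larger = ~0u;
    size_t i, count = 0;

    assert(table && min && max && *min <= *max);
    for (i = 0; i < n; i++) {
        if (table[i] >= *min && table[i] <= *max)
            count++;
        else if (table[i] > *max && table[i] < next_larger)
            next_larger = table[i];
    }
    if (!count && next_larger != ~0u)
        *max = next_larger;
    return count;
}
```

With the table {480000, 648000, 720000} and a requested range of 500000 to 600000, no entry fits, so max is widened to 648000.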
The get function returns the current CPU frequency.
static unsigned int sunxi_cpufreq_get(unsigned int cpu)
{
    unsigned int current_freq = 0;
#ifdef CONFIG_DEBUG_FS
    ktime_t calltime = ktime_get();
#endif

    clk_get_rate(sunxi_cpufreq.clk_pll);
    current_freq = clk_get_rate(sunxi_cpufreq.clk_cpu) / 1000;

#ifdef CONFIG_DEBUG_FS
    sunxi_cpufreq.cpufreq_get_us = ktime_to_us(ktime_sub(ktime_get(), calltime));
#endif

    return current_freq;
}
The target function is what actually performs the frequency and voltage change.
static int sunxi_cpufreq_target(struct cpufreq_policy *policy,
                                __u32 freq, __u32 relation)
{
    int ret = 0;
    unsigned int index;
    struct cpufreq_freqs freqs;
#ifdef CONFIG_DEBUG_FS
    ktime_t calltime;
#endif
#ifdef CONFIG_SMP
    int i;
#endif
#ifdef CONFIG_CPU_VOLTAGE_SCALING
    unsigned int new_vdd;
#endif

    mutex_lock(&sunxi_cpufreq.lock);

    /* avoid repeated calls which cause a needless amount of duplicated
     * logging output (and CPU time as the calculation process is
     * done) */
    if (freq == sunxi_cpufreq.last_freq)
        goto out;

    CPUFREQ_DBG(DEBUG_FREQ, "request frequency is %uKHz\n", freq);

    if (unlikely(sunxi_boot_lock))
        freq = freq > sunxi_cpufreq.boot_freq ? sunxi_cpufreq.boot_freq : freq;

    /* try to look for a valid frequency value from cpu frequency table */
    if (cpufreq_frequency_table_target(policy, sunxi_cpufreq.freq_table,
                                       freq, relation, &index)) {
        CPUFREQ_ERR("try to look for %uKHz failed!\n", freq);
        ret = -EINVAL;
        goto out;
    }

    /* frequency is same as the value last set, need not adjust */
    if (sunxi_cpufreq.freq_table[index].frequency == sunxi_cpufreq.last_freq)
        goto out;

    freq = sunxi_cpufreq.freq_table[index].frequency;

    CPUFREQ_DBG(DEBUG_FREQ, "target is find: %uKHz, entry %u\n", freq, index);

    /* notify that cpu clock will be adjust if needed */
    if (policy) {
        freqs.cpu = policy->cpu;
        freqs.old = sunxi_cpufreq.last_freq;
        freqs.new = freq;
#ifdef CONFIG_SMP
        /* notifiers */
        for_each_cpu(i, policy->cpus) {
            freqs.cpu = i;
            cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
        }
#else
        cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
#endif
    }

#ifdef CONFIG_CPU_VOLTAGE_SCALING
    /* get vdd value for new frequency */
    new_vdd = __get_vdd_value(freq * 1000);
    CPUFREQ_DBG(DEBUG_FREQ, "set cpu vdd to %dmv\n", new_vdd);

    /* scaling up: raise the voltage before the frequency */
    if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu) &&
        (new_vdd > sunxi_cpufreq.last_vdd)) {
        CPUFREQ_DBG(DEBUG_FREQ, "set cpu vdd to %dmv\n", new_vdd);
        if (regulator_set_voltage(sunxi_cpufreq.vdd_cpu,
                                  new_vdd*1000, new_vdd*1000)) {
            CPUFREQ_ERR("try to set cpu vdd failed!\n");

            /* notify everyone that clock transition finish */
            if (policy) {
                freqs.cpu = policy->cpu;
                freqs.old = freqs.new;
                freqs.new = sunxi_cpufreq.last_freq;
#ifdef CONFIG_SMP
                /* notifiers */
                for_each_cpu(i, policy->cpus) {
                    freqs.cpu = i;
                    cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
                }
#else
                cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
#endif
            }
            /* leave through "out" so the mutex is released; the original
             * code returned here with the lock still held */
            ret = -EINVAL;
            goto out;
        }
    }
#endif

#ifdef CONFIG_DEBUG_FS
    calltime = ktime_get();
#endif

    /* try to set cpu frequency */
#ifndef CONFIG_SUNXI_ARISC
    if (__set_cpufreq_by_ccu(freq))
#else
    if (arisc_dvfs_set_cpufreq(freq, ARISC_DVFS_PLL1, ARISC_DVFS_SYN, NULL, NULL))
#endif
    {
        CPUFREQ_ERR("set cpu frequency to %uKHz failed!\n", freq);

#ifdef CONFIG_CPU_VOLTAGE_SCALING
        /* restore the previous voltage if it was raised above */
        if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu) &&
            (new_vdd > sunxi_cpufreq.last_vdd)) {
            if (regulator_set_voltage(sunxi_cpufreq.vdd_cpu,
                                      sunxi_cpufreq.last_vdd*1000,
                                      sunxi_cpufreq.last_vdd*1000)) {
                CPUFREQ_ERR("try to set voltage failed!\n");
                sunxi_cpufreq.last_vdd = new_vdd;
            }
        }
#endif

        /* set cpu frequency failed */
        if (policy) {
            freqs.cpu = policy->cpu;
            freqs.old = freqs.new;
            freqs.new = sunxi_cpufreq.last_freq;
#ifdef CONFIG_SMP
            /* notifiers */
            for_each_cpu(i, policy->cpus) {
                freqs.cpu = i;
                cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
            }
#else
            cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
#endif
        }

        ret = -EINVAL;
        goto out;
    }

#ifdef CONFIG_DEBUG_FS
    sunxi_cpufreq.cpufreq_set_us = ktime_to_us(ktime_sub(ktime_get(), calltime));
#endif

#ifdef CONFIG_CPU_VOLTAGE_SCALING
    /* scaling down: lower the voltage after the frequency; the regulator
     * may hold an error pointer, so test it with IS_ERR_OR_NULL */
    if (!IS_ERR_OR_NULL(sunxi_cpufreq.vdd_cpu) &&
        (new_vdd < sunxi_cpufreq.last_vdd)) {
        CPUFREQ_DBG(DEBUG_FREQ, "set cpu vdd to %dmv\n", new_vdd);
        if (regulator_set_voltage(sunxi_cpufreq.vdd_cpu,
                                  new_vdd*1000, new_vdd*1000)) {
            CPUFREQ_ERR("try to set voltage failed!\n");
            new_vdd = sunxi_cpufreq.last_vdd;
        }
    }
    sunxi_cpufreq.last_vdd = new_vdd;
#endif

    /* notify that cpu clock has been adjusted */
    if (policy) {
#ifdef CONFIG_SMP
        for_each_cpu(i, policy->cpus) {
            freqs.cpu = i;
            cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
        }
#else
        cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
#endif
    }

    sunxi_cpufreq.last_freq = freq;

    CPUFREQ_DBG(DEBUG_FREQ, "DVFS done! Freq[%uMHz] Volt[%umv] ok\n",
                sunxi_cpufreq_get(0) / 1000, sunxi_cpufreq_getvolt());

out:
    mutex_unlock(&sunxi_cpufreq.lock);
    return ret;
}
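The target function relies on cpufreq_frequency_table_target to turn a requested frequency into a table entry according to the relation argument. The sketch below shows the idea in user-space C under simplifying assumptions: the toy_ names are hypothetical, the table is a plain array, and the function returns -1 when no entry qualifies instead of falling back to another entry.

```c
#include <assert.h>
#include <stddef.h>

enum toy_relation {
    TOY_RELATION_L,   /* lowest frequency at or above the target */
    TOY_RELATION_H    /* highest frequency at or below the target */
};

/* Return the index of the best matching entry, or -1 if none qualifies. */
int toy_table_target(const unsigned int *table, size_t n,
                     unsigned int target, enum toy_relation rel)
{
    int best = -1;
    size_t i;

    assert(table != NULL);
    for (i = 0; i < n; i++) {
        unsigned int f = table[i];

        if (rel == TOY_RELATION_L) {
            if (f >= target && (best < 0 || f < table[best]))
                best = (int)i;
        } else {
            if (f <= target && (best < 0 || f > table[best]))
                best = (int)i;
        }
    }
    return best;
}
```

For a request of 700000 kHz against the table {480000, 648000, 720000, 816000}, relation L selects 720000 and relation H selects 648000.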
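The code is fairly easy to follow and is not analyzed further here. The flow chart is as follows:
CPUFreq governor layer
As mentioned above, a governor monitors the system load, selects one of the available operating frequencies according to the current load, and passes that frequency to the cpufreq_driver, which completes the dynamic adjustment. The kernel discussed here offers the following five governors (interactive comes from Android kernels rather than mainline Linux):
- Performance: the performance-first governor. It sets the CPU frequency directly to the maximum of the policy->{min,max} range. It is often chosen as the default governor to shorten boot time, with a switch to another governor afterwards.
- Powersave: the power-first governor. It sets the CPU frequency directly to the minimum of the policy->{min,max} range.
- Userspace: a user-space program sets the frequency through the scaling_setspeed file. Mostly used for debugging.
- Ondemand: adjusts the CPU frequency dynamically according to the current CPU utilization.
- interactive: adjusts the CPU frequency dynamically in an interactive-oriented way. It is similar to Ondemand, was developed by Google, and is widely used on phones, tablets and similar devices. This article focuses on this governor.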
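The difference between the fixed governors and a load-driven one can be illustrated with a few lines of C. This is a deliberately simplified sketch: the toy_ names are hypothetical, and the linear load-based formula is an assumption made for illustration, not the algorithm used by Ondemand or interactive.

```c
#include <assert.h>

enum toy_gov { TOY_PERFORMANCE, TOY_POWERSAVE, TOY_LOAD_BASED };

/* Pick a frequency in [min_khz, max_khz]. load_pct (0..100) is only
 * used by the load-based case, which scales linearly with the load. */
unsigned int toy_pick_freq(enum toy_gov gov, unsigned int min_khz,
                           unsigned int max_khz, unsigned int load_pct)
{
    assert(min_khz <= max_khz && load_pct <= 100);
    switch (gov) {
    case TOY_PERFORMANCE:
        return max_khz;             /* always the top of the range */
    case TOY_POWERSAVE:
        return min_khz;             /* always the bottom of the range */
    default:
        return min_khz + (max_khz - min_khz) / 100 * load_pct;
    }
}
```

With a range of 480000 to 1200000 kHz, a 50% load gives 840000 kHz in the load-based case, while the other two governors ignore the load entirely.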
Here is the cpufreq_governor structure:
struct cpufreq_governor {
    char name[CPUFREQ_NAME_LEN];    /* governor name, "interactive" in this case */
    int initialized;                /* initialization flag */
    int (*governor) (struct cpufreq_policy *policy,
                     unsigned int event);   /* this callback controls the governor's
                                               behavior and is its main entry point */
    ssize_t (*show_setspeed) (struct cpufreq_policy *policy,
                              char *buf);
    int (*store_setspeed) (struct cpufreq_policy *policy,
                           unsigned int freq);
    unsigned int max_transition_latency;    /* HW must be able to switch to
                                               next freq faster than this value in nano secs or we
                                               will fallback to performance governor */
    struct list_head governor_list;         /* every registered governor is added to this list */
    struct module *owner;
};
A governor is defined like this:
struct cpufreq_governor cpufreq_gov_interactive = {
    .name = "interactive",
    .governor = cpufreq_governor_interactive,
    .max_transition_latency = 10000000,
    .owner = THIS_MODULE,
};
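The governor field is the core of this structure. After a cpufreq_governor has been registered, the cpufreq core drives the governor through this field, including initialization, start and exit.
- How is a governor initialized?
Once a governor has been selected for a policy, the core layer applies that CPU's policy through the __cpufreq_set_policy function. If this is a new governor (different from the one previously in use), __cpufreq_governor is called with the CPUFREQ_GOV_POLICY_INIT argument, and __cpufreq_governor in turn calls the governor callback in the cpufreq_governor structure.
The following fragment is the code that runs when the callback receives CPUFREQ_GOV_POLICY_INIT: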
case CPUFREQ_GOV_POLICY_INIT:
    if (have_governor_per_policy()) {
        WARN_ON(tunables);
    } else if (tunables) {
        tunables->usage_count++;
        policy->governor_data = tunables;
        return 0;
    }

    tunables = kzalloc(sizeof(*tunables), GFP_KERNEL);
    if (!tunables) {
        pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
        return -ENOMEM;
    }

    tunables->usage_count = 1;
    tunables->io_is_busy = true;
    tunables->above_hispeed_delay = default_above_hispeed_delay;
    tunables->nabove_hispeed_delay = ARRAY_SIZE(default_above_hispeed_delay);
    tunables->go_hispeed_load = DEFAULT_GO_HISPEED_LOAD;
    tunables->target_loads = default_target_loads;
    tunables->ntarget_loads = ARRAY_SIZE(default_target_loads);
    tunables->min_sample_time = DEFAULT_MIN_SAMPLE_TIME;
    tunables->timer_rate = DEFAULT_TIMER_RATE;
    tunables->boostpulse_duration_val = DEFAULT_MIN_SAMPLE_TIME;
    tunables->timer_slack_val = DEFAULT_TIMER_SLACK;

    spin_lock_init(&tunables->target_loads_lock);
    spin_lock_init(&tunables->above_hispeed_delay_lock);

    policy->governor_data = tunables;
    if (!have_governor_per_policy())
        common_tunables = tunables;

    rc = sysfs_create_group(get_governor_parent_kobj(policy),
                            get_sysfs_attr());
    if (rc) {
        kfree(tunables);
        policy->governor_data = NULL;
        if (!have_governor_per_policy())
            common_tunables = NULL;
        return rc;
    }

    if (!policy->governor->initialized) {
        idle_notifier_register(&cpufreq_interactive_idle_nb);
        cpufreq_register_notifier(&cpufreq_notifier_block,
                                  CPUFREQ_TRANSITION_NOTIFIER);
    }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
    if (!input_handler_register_count) {
        cpumask_clear(&interactive_cpumask);
        rc = input_register_handler(&cpufreq_interactive_input_handler);
        if (rc)
            return rc;
    }
    tunables->input_event_freq = policy->max *
        DEFAULT_INPUT_EVENT_FRFQ_PERCENT / 100;
    tunables->input_dev_monitor = true;
    input_handler_register_count++;
#endif
    break;
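When tunables are not kept per policy, the fragment above shares one tunables object between all policies and tracks its users with usage_count. That reference-counting pattern can be sketched in user-space C as follows. The toy_ names are hypothetical, and the default value is only illustrative.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_tunables {
    int usage_count;
    unsigned int go_hispeed_load;
};

struct toy_tunables *toy_common_tunables;   /* shared by all policies */

/* POLICY_INIT: reuse the shared tunables if present, else allocate. */
struct toy_tunables *toy_policy_init(void)
{
    struct toy_tunables *t = toy_common_tunables;

    if (t) {
        t->usage_count++;
        return t;
    }
    t = calloc(1, sizeof(*t));
    if (!t)
        return NULL;
    t->usage_count = 1;
    t->go_hispeed_load = 99;        /* illustrative default */
    toy_common_tunables = t;
    return t;
}

/* POLICY_EXIT: free the tunables when the last user goes away. */
void toy_policy_exit(struct toy_tunables *t)
{
    assert(t && t->usage_count > 0);
    if (--t->usage_count == 0) {
        free(t);
        toy_common_tunables = NULL;
    }
}
```

Two policies that initialize one after the other receive the same pointer, and the memory is released only after both have exited.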
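The sequence diagram is as follows:
After sysfs_create_group, the corresponding sysfs nodes exist under /sys/devices/system/cpu/cpufreq/interactive. The main nodes are:
boost: how interactive handles bursts of work.
boostpulse: writing to it raises the frequency for a burst of work. How long the raised frequency is held is given by the boostpulse duration (boostpulse_duration_val in the code above).
go_hispeed_load: the high-speed load threshold. When the system load exceeds this value, the frequency is raised to hispeed_freq.
hispeed_freq: the frequency the CPU is raised to when the workload reaches go_hispeed_load.
input_boost: burst handling for input events such as touches on the screen.
min_sample_time: the minimum sample time. Each frequency decision must be held for at least this long.
timer_rate: the sampling rate of the sampling timer.
While the CPU is not idle, timer_rate is the sampling period used to compute the CPU workload. While the CPU is idle, a deferrable timer is used, so the CPU is not woken from idle just to service the timer. The maximum time the timer may be deferred is given by timer_slack, whose default value is 80000 us.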
- How is a governor started?
As with initialization, the governor callback is invoked, this time with the CPUFREQ_GOV_START event:
case CPUFREQ_GOV_START:
    mutex_lock(&gov_lock);

    freq_table = cpufreq_frequency_get_table(policy->cpu);
    /* if hispeed_freq has not been set, default it to policy->max */
    if (!tunables->hispeed_freq)
        tunables->hispeed_freq = policy->max;

    /* walk every CPU covered by this policy */
    for_each_cpu(j, policy->cpus) {
        pcpu = &per_cpu(cpuinfo, j);
        pcpu->policy = policy;
        pcpu->target_freq = policy->cur;
        pcpu->freq_table = freq_table;
        pcpu->floor_freq = pcpu->target_freq;
        pcpu->floor_validate_time = ktime_to_us(ktime_get());
        pcpu->hispeed_validate_time = pcpu->floor_validate_time;
        pcpu->max_freq = policy->max;
        down_write(&pcpu->enable_sem);
        del_timer_sync(&pcpu->cpu_timer);
        del_timer_sync(&pcpu->cpu_slack_timer);
        /* start the timers */
        cpufreq_interactive_timer_start(tunables, j);
        /* once the timers run the governor can work, so mark it enabled */
        pcpu->governor_enabled = 1;
        up_write(&pcpu->enable_sem);
    }

    mutex_unlock(&gov_lock);
    break;
The governor field is set to cpufreq_governor_interactive. Its full implementation is:
static int cpufreq_governor_interactive(struct cpufreq_policy *policy,
                                        unsigned int event)
{
    int rc;
    unsigned int j;
    struct cpufreq_interactive_cpuinfo *pcpu;
    struct cpufreq_frequency_table *freq_table;
    struct cpufreq_interactive_tunables *tunables;
    unsigned long flags;

    if (have_governor_per_policy())
        tunables = policy->governor_data;
    else
        tunables = common_tunables;

    WARN_ON(!tunables && (event != CPUFREQ_GOV_POLICY_INIT));

    switch (event) {
    case CPUFREQ_GOV_POLICY_INIT:
        if (have_governor_per_policy()) {
            WARN_ON(tunables);
        } else if (tunables) {
            tunables->usage_count++;
            policy->governor_data = tunables;
            return 0;
        }

        tunables = kzalloc(sizeof(*tunables), GFP_KERNEL);
        if (!tunables) {
            pr_err("%s: POLICY_INIT: kzalloc failed\n", __func__);
            return -ENOMEM;
        }

        tunables->usage_count = 1;
        tunables->io_is_busy = true;
        tunables->above_hispeed_delay = default_above_hispeed_delay;
        tunables->nabove_hispeed_delay = ARRAY_SIZE(default_above_hispeed_delay);
        tunables->go_hispeed_load = DEFAULT_GO_HISPEED_LOAD;
        tunables->target_loads = default_target_loads;
        tunables->ntarget_loads = ARRAY_SIZE(default_target_loads);
        tunables->min_sample_time = DEFAULT_MIN_SAMPLE_TIME;
        tunables->timer_rate = DEFAULT_TIMER_RATE;
        tunables->boostpulse_duration_val = DEFAULT_MIN_SAMPLE_TIME;
        tunables->timer_slack_val = DEFAULT_TIMER_SLACK;

        spin_lock_init(&tunables->target_loads_lock);
        spin_lock_init(&tunables->above_hispeed_delay_lock);

        policy->governor_data = tunables;
        if (!have_governor_per_policy())
            common_tunables = tunables;

        rc = sysfs_create_group(get_governor_parent_kobj(policy),
                                get_sysfs_attr());
        if (rc) {
            kfree(tunables);
            policy->governor_data = NULL;
            if (!have_governor_per_policy())
                common_tunables = NULL;
            return rc;
        }

        if (!policy->governor->initialized) {
            idle_notifier_register(&cpufreq_interactive_idle_nb);
            cpufreq_register_notifier(&cpufreq_notifier_block,
                                      CPUFREQ_TRANSITION_NOTIFIER);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        if (!input_handler_register_count) {
            cpumask_clear(&interactive_cpumask);
            rc = input_register_handler(&cpufreq_interactive_input_handler);
            if (rc)
                return rc;
        }
        tunables->input_event_freq = policy->max *
            DEFAULT_INPUT_EVENT_FRFQ_PERCENT / 100;
        tunables->input_dev_monitor = true;
        input_handler_register_count++;
#endif
        break;

    case CPUFREQ_GOV_POLICY_EXIT:
        if (!--tunables->usage_count) {
            if (policy->governor->initialized == 1) {
                cpufreq_unregister_notifier(&cpufreq_notifier_block,
                                            CPUFREQ_TRANSITION_NOTIFIER);
                idle_notifier_unregister(&cpufreq_interactive_idle_nb);
            }

            sysfs_remove_group(get_governor_parent_kobj(policy),
                               get_sysfs_attr());
            kfree(tunables);
            common_tunables = NULL;
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        if (input_handler_register_count > 0)
            input_handler_register_count--;
        if (!input_handler_register_count) {
            cpumask_clear(&interactive_cpumask);
            input_unregister_handler(&cpufreq_interactive_input_handler);
        }
#endif
        policy->governor_data = NULL;
        break;

    case CPUFREQ_GOV_START:
        mutex_lock(&gov_lock);

        freq_table = cpufreq_frequency_get_table(policy->cpu);
        if (!tunables->hispeed_freq)
            tunables->hispeed_freq = policy->max;

        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);
            pcpu->policy = policy;
            pcpu->target_freq = policy->cur;
            pcpu->freq_table = freq_table;
            pcpu->floor_freq = pcpu->target_freq;
            pcpu->floor_validate_time = ktime_to_us(ktime_get());
            pcpu->hispeed_validate_time = pcpu->floor_validate_time;
            pcpu->max_freq = policy->max;
            down_write(&pcpu->enable_sem);
            del_timer_sync(&pcpu->cpu_timer);
            del_timer_sync(&pcpu->cpu_slack_timer);
            cpufreq_interactive_timer_start(tunables, j);
            pcpu->governor_enabled = 1;
            up_write(&pcpu->enable_sem);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        cpumask_or(&interactive_cpumask, &interactive_cpumask, policy->cpus);
#endif
        mutex_unlock(&gov_lock);
        break;

    case CPUFREQ_GOV_STOP:
        mutex_lock(&gov_lock);
        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);
            down_write(&pcpu->enable_sem);
            pcpu->governor_enabled = 0;
            del_timer_sync(&pcpu->cpu_timer);
            del_timer_sync(&pcpu->cpu_slack_timer);
            up_write(&pcpu->enable_sem);
        }

#ifdef CONFIG_CPU_FREQ_INPUT_EVNT_NOTIFY
        cpumask_andnot(&interactive_cpumask, &interactive_cpumask, policy->cpus);
#endif
        mutex_unlock(&gov_lock);
        break;

    case CPUFREQ_GOV_LIMITS:
        if (policy->max < policy->cur)
            __cpufreq_driver_target(policy, policy->max, CPUFREQ_RELATION_H);
        else if (policy->min > policy->cur)
            __cpufreq_driver_target(policy, policy->min, CPUFREQ_RELATION_L);

        for_each_cpu(j, policy->cpus) {
            pcpu = &per_cpu(cpuinfo, j);

            down_read(&pcpu->enable_sem);
            if (pcpu->governor_enabled == 0) {
                up_read(&pcpu->enable_sem);
                continue;
            }

            spin_lock_irqsave(&pcpu->target_freq_lock, flags);
            if (policy->max < pcpu->target_freq)
                pcpu->target_freq = policy->max;
            else if (policy->min > pcpu->target_freq)
                pcpu->target_freq = policy->min;
            spin_unlock_irqrestore(&pcpu->target_freq_lock, flags);
            up_read(&pcpu->enable_sem);

            /* Reschedule timer only if policy->max is raised.
             * Delete the timers, else the timer callback may
             * return without re-arm the timer when failed
             * acquire the semaphore. This race may cause timer
             * stopped unexpectedly.
             */
            if (policy->max > pcpu->max_freq) {
                down_write(&pcpu->enable_sem);
                del_timer_sync(&pcpu->cpu_timer);
                del_timer_sync(&pcpu->cpu_slack_timer);
                cpufreq_interactive_timer_start(tunables, j);
                up_write(&pcpu->enable_sem);
            }

            pcpu->max_freq = policy->max;
        }
        break;
    }
    return 0;
}
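This function mainly sets up two timers, cpufreq_interactive_timer and cpufreq_interactive_nop_timer (in the code above they appear as the cpu_timer and cpu_slack_timer fields).
The key part is the implementation of the cpufreq_interactive_timer timer.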