synchronization---per-CPU variable
Source: Internet  Editor: programmer blog network  Date: 2024/06/05 11:14
====================================================================================
Index:
1. Intro
2. Reference
3. Basic theory
4. APIs
4.1. APIs for static per-CPU variable
4.2. APIs for dynamic per-CPU variable
5. Implementation details, based on kernel 2.6.11.12
5.1. impl of static per-CPU variable
5.1.1. UP version
5.1.2. SMP version
5.2. impl of dynamic per-CPU variable
5.2.1. UP version
5.2.2. SMP version
6. Misc tips
====================================================================================
1. Intro
This doc describes per-CPU variables.
====================================================================================
2. Reference
[1] ulk - O'Reilly, Understanding the Linux Kernel, 3rd Edition
//5.2.1. Per-CPU Variables
====================================================================================
3. Basic theory
<<ulk>>
/5.2. Synchronization Primitives
Table 5-2. Various types of synchronization techniques used by the kernel
Technique Description Scope
Per-CPU variables Duplicate a data structure among the CPUs All CPUs
The basic theory of per-CPU variables is:
For per-CPU variables, the kernel arranges them as below:
* variable #0 variable #1 variable #2
* ------------------- ------------------- ------------
* | u0 | u1 | u2 | u3 | | u0 | u1 | u2 | u3 | | u0 | u1 | u
* ------------------- ...... ------------------- .... ------------
A per-CPU variable is in fact an array-like structure: it has NR_CPUS elements, and each element corresponds to one CPU.
[*] <<ulk>> says each element is aligned to the CPU cache line; that is an impl detail, as we will see.
Then, the code only accesses the local CPU's copy of the per-CPU variable.
per-CPU variables are divided into 2 types:
static per-CPU variable
like a simple static variable, it is compiled and linked directly into vmlinux or a module.
dynamic per-CPU variable
like a simple dynamic variable, it is allocated at runtime from the dynamic memory area.
As a synchronization technique, a per-CPU variable alone is not that reliable; consider the following scenario:
task #0, in a system call service routine, is accessing its per-CPU variable copy on CPU #0.
a HW IRQ is issued; the hardirq handler interrupts the system call service routine and runs.
This hardirq handler wakes up a higher priority task #1.
hardirq handler returns.
During IRET, kernel preemption happens, task #1 preempts task #0.
task #1 gets to run.
.....
After some time, task #0 is migrated to another CPU, CPU #1, and gets scheduled and resumed.
!!!__but now, task #0 is still accessing the per-CPU variable copy of CPU #0, not CPU #1. This causes a problem.
So per-CPU variables MUST be used with other synchronization techniques; when accessing per-CPU variables, we need to:
disable preemption
This prevents the scenario above, so task #0 stays on CPU #0 during its access to per-CPU variables.
disable softirq # including _lock_bh
disable hardirq # including _lock_irq / _lock_irqsave
These implicitly disable preemption.
Additionally, if a softirq / hardirq can possibly access a shared per-CPU variable, then this is needed. In fact, these 2 are the rules of using locks. See:
<<kdoc - kernel-locking>>
[*] Note that in the scenario above we use a system call service routine as an example, but this DOES NOT mean per-CPU variables are only used in "user context"; they can be used in ANY context.
====================================================================================
4. APIs
We use different sets of APIs to manipulate static and dynamic per-CPU variables.
To use per-CPU variables, just:
#include <linux/percpu.h>
Don't include other percpu.h files, which are about the architecture-specific implementation details of per-CPU variables.
[*] Note that the following APIs are from kernel 2.6.11.12; the APIs of recent kernels remain the same, but the implementation has changed a lot. For simplicity, we use kernel 2.6.11.12 for the description.
====================================================================================
4.1. APIs for static per-CPU variable
#
# DECLARE_PER_CPU()
# is to externally declare a static per-CPU variable, because its expansion uses the 'extern' keyword.
#
# It is usually used to declare a per-CPU variable in a header file, or as a forward declaration in a C file.
#
#
# DEFINE_PER_CPU()
# is to define a static per-CPU variable.
#
# It is used in C file.
#
#define DECLARE_PER_CPU(type, name)
#define DEFINE_PER_CPU(type, name)
#
# per_cpu(var, cpu)
# Selects the element for CPU cpu of the per-CPU array var.
#
# Note that per_cpu() does NOT retrieve the local CPU's copy of the per-CPU variable, but the copy of the
# specified CPU(!^^__by CPU index).
#
# It should be considered an internal API; it is rarely used in common kernel programming.
#
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
#
# __get_cpu_var(var)
# Gets the local copy of the per-CPU variable(!^^__that is, smp_processor_id() returns the local CPU index,
# which is then passed to per_cpu()).
#
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
#
# get_cpu_var(var)
# Disables kernel preemption, then selects the local CPU's element of the per-CPU array var
#
# put_cpu_var(var)
# Enables kernel preemption (var is not used)
#
#
# As we can see, get_cpu_var() / put_cpu_var() internally disable / enable kernel preemption, so they are the most
# commonly used APIs for static per-CPU variables in programming.
#
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
====================================================================================
4.2. APIs for dynamic per-CPU variable
#
# alloc_percpu(type)
# Dynamically allocates a per-CPU array of type data structures and returns its address
#
#define alloc_percpu(type) \
((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
#
# free_percpu(pointer)
# Releases a dynamically allocated per-CPU array at address pointer
#
static inline void free_percpu(const void *ptr)
#
# per_cpu_ptr(pointer, cpu)
# Returns the address of the element for CPU cpu of the per-CPU array at address pointer
#
# Note that, unlike get_cpu_var() / put_cpu_var(), per_cpu_ptr() does not disable kernel preemption for us; it is
# like __get_cpu_var() in this regard, so when we use per_cpu_ptr(), we need to disable preemption ourselves.
#
#define per_cpu_ptr(ptr, cpu) \
({ \
struct percpu_data *__p = (struct percpu_data *)~(unsigned long)(ptr); \
(__typeof__(ptr))__p->ptrs[(cpu)]; \
})
====================================================================================
5. Implementation details, based on kernel 2.6.11.12
Although the theory / semantics / APIs of per-CPU variables remain the same across kernel versions, in recent kernels the internal implementation has changed a lot and become much more complicated(!^^__the same thing happened to workQ...).
For simplicity, here we describe the implementation based on kernel 2.6.11.12, which is enough to understand the internals of per-CPU variables.
====================================================================================
5.1. impl of static per-CPU variable
====================================================================================
5.1.1. UP version
#
# [*] In fact, when we use "name" to define a static per-CPU variable, the symbol name of this per-CPU variable is
# not directly "name", but "name" prepended with the prefix "per_cpu__". This handling is common to UP / SMP.
#
# In the UP version, a per-CPU variable is defined simply like a regular variable, with no special handling: there
# is only one CPU, so there is only one element, and no need to define the per-CPU variable as an array.
#
#define DEFINE_PER_CPU(type, name) \
__typeof__(type) per_cpu__##name
#
# So, in UP, per_cpu() and __get_cpu_var() just return the per-CPU variable directly.
#
#define per_cpu(var, cpu) (*((void)cpu, &per_cpu__##var))
#define __get_cpu_var(var) per_cpu__##var
#
# get_cpu_var() / put_cpu_var() are common to UP / SMP; it is the internal __get_cpu_var() they call which makes
# the difference.
#
# Note that even in UP, get_cpu_var() still disables kernel preemption, because it needs to avoid the following case:
# task #0 is preempted by task #1 while accessing a per-CPU variable.
# task #1 also accesses the same per-CPU variable.
# task #0 is scheduled again; it sees an inconsistent view of this per-CPU variable.
#
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
====================================================================================
5.1.2. SMP version
#
# In SMP, DEFINE_PER_CPU() performs some special handling when defining a per-CPU variable.
#
# It uses the section attribute ".data.percpu"; the per-CPU variable is then compiled and linked into the
# ".data.percpu" section of vmlinux or the module.
#
# [*] Note that even for SMP, a per-CPU variable is NOT directly defined as "an array of NR_CPUS elements"; we
# will see how the kernel handles this soon.
#
#define DEFINE_PER_CPU(type, name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
#
# get_cpu_var() / put_cpu_var() are common to UP / SMP; it is the internal __get_cpu_var() they call which makes
# the difference.
#
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
#
# per_cpu() in SMP is different from that in UP; it computes:
#
# &"per_cpu__##var" + __per_cpu_offset[cpu]
#
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
# define RELOC_HIDE(ptr, off) \
({ unsigned long __ptr; \
__ptr = (unsigned long) (ptr); \
(typeof(ptr)) (__ptr + (off)); })
-----------------------------------------------------------------------------------
#
# setup_per_cpu_areas() sets up the memory area containing the static per-CPU variables of vmlinux.
#
@@trace - how kernel handles static per-CPU variable of vmlinux.
start_kernel()
setup_per_cpu_areas();
#
# __per_cpu_start[] and __per_cpu_end[] are 2 linker symbols, defined in:
# /arch/$(arch)/kernel/vmlinux.lds.S - x86, mips, ppc
# like:
# __per_cpu_start = .;
# .data.percpu : { *(.data.percpu) }
# __per_cpu_end = .;
# . = ALIGN(4096);
#
# As we see, they identify the start and the end of ".data.percpu" section, which contains all the
# static per-CPU variables of vmlinux.
#
/* Created by linker magic */
extern char __per_cpu_start[], __per_cpu_end[];
#
# Compute the size of the ".data.percpu" section.
#
# Allocate a memory area of "size of .data.percpu" x NR_CPUS, from the bootmem allocator.
#
# Copy the content of the ".data.percpu" section into this memory area, duplicated NR_CPUS times.
# And set __per_cpu_offset[] accordingly.
#
/* Copy section for each CPU (we discard the original) */
size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
#ifdef CONFIG_MODULES
if (size < PERCPU_ENOUGH_ROOM)
size = PERCPU_ENOUGH_ROOM;
#endif
ptr = alloc_bootmem(size * NR_CPUS);
for (i = 0; i < NR_CPUS; i++, ptr += size) {
__per_cpu_offset[i] = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
}
-----------------------------------------------------------------------------------
As we can see from above, the memory area allocated by setup_per_cpu_areas() in fact has the following layout:
------------------------------ <- __per_cpu_offset[0]
range #0 for CPU #0
------------------------------
SMP_CACHE_BYTES alignment
------------------------------ <- __per_cpu_offset[1]
range #1 for CPU #1
------------------------------
.
.
.
------------------------------
SMP_CACHE_BYTES alignment
------------------------------ <- __per_cpu_offset[N]
range #N for CPU #N
------------------------------
In theory, a per-CPU variable is an "array of elements", but in the implementation the elements are NOT organized contiguously in RAM; they are separated into the different per-CPU "ranges", like the following:
------------------------------------------- <- __per_cpu_offset[0]
range #0 for CPU #0
-----------------------------------
CPU #0 copy of per-CPU variable #a
-----------------------------------
CPU #0 copy of per-CPU variable #b
-----------------------------------
CPU #0 copy of per-CPU variable #c
-----------------------------------
-------------------------------------------
SMP_CACHE_BYTES alignment
------------------------------------------- <- __per_cpu_offset[1]
range #1 for CPU #1
-----------------------------------
CPU #1 copy of per-CPU variable #a
-----------------------------------
CPU #1 copy of per-CPU variable #b
-----------------------------------
CPU #1 copy of per-CPU variable #c
-----------------------------------
-------------------------------------------
.
.
.
-------------------------------------------
SMP_CACHE_BYTES alignment
------------------------------------------- <- __per_cpu_offset[N]
range #N for CPU #N
-----------------------------------
CPU #N copy of per-CPU variable #a
-----------------------------------
CPU #N copy of per-CPU variable #b
-----------------------------------
CPU #N copy of per-CPU variable #c
-----------------------------------
-------------------------------------------
In fact, the ".data.percpu" section of vmlinux is released after the memory area above has been constructed (!^^__perhaps when the bootmem allocator retires). The SMP per_cpu() actually returns a pointer into the per-CPU range of the memory area, by:
#
# &"per_cpu__##var" + __per_cpu_offset[cpu]
#
# [*] Note that "per_cpu__##var" is the original address(!^^__known at compile/link time) of the per-CPU
# variable in the ".data.percpu" section, which is discarded. We never access this address directly; we just add
# __per_cpu_offset[cpu] to it, to get the actual address of the specified CPU's copy of the per-CPU variable,
# in the per-CPU range of the memory area.
#
# [*] So, this is why we don't define an "array of elements" in SMP DEFINE_PER_CPU(), and how per_cpu() works.
#
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
# define RELOC_HIDE(ptr, off) \
({ unsigned long __ptr; \
__ptr = (unsigned long) (ptr); \
(typeof(ptr)) (__ptr + (off)); })
-----------------------------------------------------------------------------------
[*] How are static per-CPU variables in modules handled ???
Because we use the same APIs to access static per-CPU variables in the kernel and in modules, the module per-CPU variables are also organized into the per-CPU memory ranges described by __per_cpu_offset[NR_CPUS].
As we see from:
setup_per_cpu_areas();
/* Copy section for each CPU (we discard the original) */
size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
#ifdef CONFIG_MODULES
if (size < PERCPU_ENOUGH_ROOM)
size = PERCPU_ENOUGH_ROOM;
#endif
/* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
#ifndef PERCPU_ENOUGH_ROOM
#define PERCPU_ENOUGH_ROOM 32768
#endif
So, besides the per-CPU variables of vmlinux, the memory area also has room for the per-CPU variables of modules.
The kernel duplicates NR_CPUS copies of a module's per-CPU variables into that memory area at module loading time.
As for details, see:
/kernel/module.c - percpu_modinit() and so on # well, not enough energy to investigate.
====================================================================================
5.2. impl of dynamic per-CPU variable
====================================================================================
5.2.1. UP version
#
# The API alloc_percpu() is common to both UP / SMP; it is __alloc_percpu() which makes the difference.
#
#define alloc_percpu(type) \
((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
#
# UP __alloc_percpu() just calls kmalloc(size), to allocate the only element.
#
static inline void *__alloc_percpu(size_t size, size_t align)
{
void *ret = kmalloc(size, GFP_KERNEL);
if (ret)
memset(ret, 0, size);
return ret;
}
#
# Correspondingly, UP free_percpu() is also simple; it just deallocates the only element.
#
static inline void free_percpu(const void *ptr)
{
kfree(ptr);
}
#
# UP per_cpu_ptr() simply returns the only element.
#
#define per_cpu_ptr(ptr, cpu) (ptr)
====================================================================================
5.2.2. SMP version
#
# The API alloc_percpu() is common to both UP / SMP; it is __alloc_percpu() which makes the difference.
#
#define alloc_percpu(type) \
((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
#
# SMP __alloc_percpu() also does not allocate an "array of NR_CPUS elements" for the dynamic per-CPU variable.
#
# Instead, it:
# Allocates a "percpu_data" instance, which is:
#
# struct percpu_data { # represents a dynamic per-CPU variable, but it is internal.
# void *ptrs[NR_CPUS];
# };
#
# Then allocates the per-CPU elements with the NUMA-aware kmem_cache_alloc_node(), saving each element into "percpu_data->ptrs".
#
# Returns an obfuscated value of the "percpu_data *" (its bitwise NOT).
#
#
# So, for an SMP dynamic per-CPU variable, its per-CPU elements are also not contiguous in RAM, just like the static ones.
#
void *__alloc_percpu(size_t size, size_t align)
struct percpu_data *pdata = kmalloc(sizeof (*pdata), GFP_KERNEL);
for (i = 0; i < NR_CPUS; i++) {
if (!cpu_possible(i))
continue;
pdata->ptrs[i] = kmem_cache_alloc_node(
kmem_find_general_cachep(size, GFP_KERNEL),
cpu_to_node(i));
memset(pdata->ptrs[i], 0, size);
#
# Note that here we don't simply return the address of "percpu_data", but an obfuscated value (the bitwise NOT
# of the pointer), so callers cannot dereference it by accident.
#
return (void *) (~(unsigned long) pdata);
#
# Correspondingly, SMP per_cpu_ptr() is to return "percpu_data->ptrs[cpu]".
#
#define per_cpu_ptr(ptr, cpu) \
({ \
struct percpu_data *__p = (struct percpu_data *)~(unsigned long)(ptr); \
(__typeof__(ptr))__p->ptrs[(cpu)]; \
})
#
# And SMP free_percpu() deallocates the per-CPU elements in "percpu_data->ptrs[]", and then the "percpu_data" instance itself.
#
void free_percpu(const void *objp)
#
# Undo the bitwise-NOT obfuscation to get the actual address of "percpu_data".
#
struct percpu_data *p = (struct percpu_data *) (~(unsigned long) objp);
for (i = 0; i < NR_CPUS; i++) {
if (!cpu_possible(i))
continue;
kfree(p->ptrs[i]);
}
kfree(p);
====================================================================================
6. Misc tips
NONE.
Index:
1. Intro
2. Reference
3. Basic theory
4. APIs
4.1. APIs for static per-CPU variable
4.2. APIs for dynamic per-CPU variable
5. Implementation details, based on kernel 2.6.11.12
5.1. impl of static per-CPU variable
5.1.1. UP version
5.1.2. SMP version
5.2. impl of dynamic per-CPU variable
5.2.1. UP version
5.2.2. SMP version
6. Misc tips
====================================================================================
1. Intro
This doc describles per-CPU variable.
====================================================================================
2. Reference
[1] ulk, ulk - OReilly.Understanding.The.Linux.Kernel.3rd.Edition
//5.2.1. Per-CPU Variables
====================================================================================
3. Basic theory
<<ulk>>
/5.2. Synchronization Primitives
Table 5-2. Various types of synchronization techniques used by the kernel
Technique Description Scope
Per-CPU variables Duplicate a data structure among the CPUs All CPUs
The basic theory of per-CPU variable is:
For per-CPU variables, the kernel arrange them like below:
* variable #0 variable #1 variable #2
* ------------------- ------------------- ------------
* | u0 | u1 | u2 | u3 | | u0 | u1 | u2 | u3 | | u0 | u1 | u
* ------------------- ...... ------------------- .... ------------
A per-CPU variable is in fact a array-like structure, its has NR_CPU elements, each element corresponds to a CPU.
[*] <<ulk>> says each element is aligned to CPU cache line, that is impl detail, we will see.
Then, the code only access the local CPU copy of the per-CPU variable.
per-CPU variable is divided into 2 types:
static per-CPU variable
like simple static variable, it is directly compiled and linked into vmlinux or module.
dynamic per-CPU variable
like simple dynamic variable, it is dynamically allocated in dynamic memory area.
As a synchronization techniqure, per-CPU variable alone is not that riable, consider the following scenario:
task #0, system call service routine is accessing a per-CPU variable local copy on CPU #0.
a HW IRQ issued, hardirq handler interrupts system call service routine, and run.
This hardirq handler wakes up a higher priority task #1.
harirq handler returns.
During IRET, kernel preemption happens, task #1 preempts task #0.
task #1 get to run.
.....
AFTER sometime, task #0 is migrated to other CPU #1, and get scheduled and resumed.
!!!__but now, task #0 is still accessing the per-CPU variable copy of CPU #0, not CPU #1. This causes problem.
So per-CPU variable MUST be used with other synchronization techniques, when accessing per-CPU variables, we need to:
disable preemption
This prevents the scenario above, so task #0 always on CPU #0 during its access to per-CPU variables.
disable softirq # including _lock_bh
disable hardirq # include _lock_irq / lock_irqsave
These implicitly disable preemption.
But additionally, if softirq / hardirq can possibly access a shared per-CPU variable, then, it is needed. In fact, these 2 are rules of using locks. See:
<<kdoc - kernerl-locking>>
[*] Note that, in scenario above, we use system call service routine as a example, but this DOES NOT mean per-CPU variable is only used in "user context", it can be used in ANY context.
====================================================================================
4. APIs
We use different sets of APIs to manipulate static per-CPU variable and dynamic per-CPU variable.
For use per-CPU variable, just:
#include <linux/percpu.h>
Don't include other percpu.h, which are about the architecture-specific implementation details of per-CPU variable.
[*] Note that, the following APIs are from kernel 2.6.11.12, the APIs of recent kernel keep the same, but the implementation changes a lot. For simplicity, we use kernel 2.6.11.12 for description.
====================================================================================
4.1. APIs for static per-CPU variable
#
# DECLARE_PER_CPU()
# is to externly declare a static per-CPU variable, because it uses 'extern' keyword in DECLARE_PER_CPU_SECTION().
#
# It is ususally used for declaring per-CPU variable in header file, or forward declaration in C file.
#
#
# DEFINE_PER_CPU()
# is to define a static per-CPU variable.
#
# It is used in C file.
#
#define DECLARE_PER_CPU(type, name)
#define DEFINE_PER_CPU(type, name)
#
# per_cpu(var, cpu)
# Selects the element for CPU cpu of the per-CPU array name.
#
# Note that, per_cpu() is NOT to retrieve local CPU copy of per-CPU variable, but to retrieve the per-CPU variable copy
# of the specified CPU(!^^__by CPU index).
#
# It should be considered as a internal API, and it is rare to use it in common kernel programming.
#
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
#
# __get_cpu_var(var()
# Get the local copy of per-CPU variable(!^^__that is, smp_processor_id() returns the local CPU index, and then, pass
# the local CPU index to per_cpu).
#
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
#
# get_cpu_var(var)
# Disables kernel preemption, then selects the local CPU's element of the per-CPU array name
#
# put_cpu_var(var)
# Enables kernel preemption (name is not used)
#
#
# As we can see, get_cpu_var() / put_cpu_var() internally disable / enable kernel preemption, so they are most commonly
# used APIs for static per-CPU variable in programming.
#
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
===================================================================================
4.2. APIs for dynamic per-CPU variable
#
# alloc_percpu(type)
# Dynamically allocates a per-CPU array of type data structures and returns its address
#
#define alloc_percpu(type) \
((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
#
# free_percpu(pointer)
# Releases a dynamically allocated per-CPU array at address pointer
#
static inline void free_percpu(const void *ptr)
#
# per_cpu_ptr(pointer, cpu)
# Returns the address of the element for CPU cpu of the per-CPU array at address pointer
#
# Note that, unlike get_cpu_var() / put_cpu_var(), per_cpu_ptr() does not disable kernel preemption for us, so it is like
# __get_cpu_var(), so when we use per_cpu_ptr(), we need to disable preemption ourself.
#
#define per_cpu_ptr(ptr, cpu) \
({ \
struct percpu_data *__p = (struct percpu_data *)~(unsigned long)(ptr); \
(__typeof__(ptr))__p->ptrs[(cpu)]; \
})
====================================================================================
5. Implementation details, based on kernel 2.6.11.12
Although the theory / sematics / APIs remain the same for per-CPU variable across different kernel version, but in recent kernel, the internal implementation changes a lot, and much more complicated(!^^__same thing happen to workQ...).
For simplicity, here we describes the implementation based on kernel 2.6.11.12, which is enough to understand the internal of per-CPU variable.
====================================================================================
5.1. impl of static per-CPU variable
====================================================================================
5.1.1. UP version
#
# [*] In fact, when we use "name" to define a static per-CPU variable, the name of this per-CPU variable is not
# directly "name", but preappended with a prefix "per_cpu__". This handling is common to UP / SMP.
#
# In UP version, per-CPU variable is just defined simply like regular variables, no special handling, because there is
# only one CPU, so there is only one element, no need to define per-CPU variable as an array.
#
#define DEFINE_PER_CPU(type, name) \
__typeof__(type) per_cpu__##name
#
# So, In UP, per_cpu() and __get_cpu_var() just return the per-CPU variable directly.
#
#define per_cpu(var, cpu) (*((void)cpu, &per_cpu__##var))
#define __get_cpu_var(var) per_cpu__##var
#
# get_cpu_var() / put_cpu_var() are common to UP / SMP, it is the internal __get_cpu_var() they called which makes
# difference.
#
# Note that, even in UP, get_cpu_var() also disable kernel preemption. because it need to avoid the following case:
# task #0 is preempted by task #1, when accessing a per-CPU variable.
# task #1 will also access a same per-CPU variable.
# task #1 scheduled again, it sees a inconsistent view of this per-CPU variable.
#
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
====================================================================================
5.1.2. SMP version
#
# In SMP, DEFINE_PER_CPU() performs some special handling when defining a per-CPU variable.
#
# It uses the section attribute ".data.percpu", then, the per-CPU variable would be compiled and linked into
# ".data.percpu" section of vmlinux or module.
#
#[*] Note that, even for SMP, a per-CPU variable is NOT directly defined as "a array of NR_CPU element", we will
# see how kernel handles this thing soon.
#
#define DEFINE_PER_CPU(type, name) \
__attribute__((__section__(".data.percpu"))) __typeof__(type) per_cpu__##name
#
# get_cpu_var() / put_cpu_var() are common to UP / SMP, it is the internal __get_cpu_var() they called which makes
# difference.
#
#define get_cpu_var(var) (*({ preempt_disable(); &__get_cpu_var(var); }))
#define put_cpu_var(var) preempt_enable()
#
# per_cpu() of SMP is different from that of UP, it is:
#
# &"per_cpu__##var" + __per_cpu_offset[cpu]
#
#define __get_cpu_var(var) per_cpu(var, smp_processor_id())
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
# define RELOC_HIDE(ptr, off) \
({ unsigned long __ptr; \
__ptr = (unsigned long) (ptr); \
(typeof(ptr)) (__ptr + (off)); })
-----------------------------------------------------------------------------------
#
# setup_per_cpu_areas() is to set up memory area containing static per-CPU variable of vmlinux.
#
@@trace - how kernel handles static per-CPU variable of vmlinux.
start_kernel()
setup_per_cpu_areas();
#
# __per_cpu_start[] and __per_cpu_end[] are 2 linker symbols, defined in:
# /arch/$(arch)/kernel/vmlinux.lds.S - x86, mips, ppc
# like:
# __per_cpu_start = .;
# .data.percpu : { *(.data.percpu) }
# __per_cpu_end = .;
# . = ALIGN(4096);
#
# As we see, they identify the start and the end of ".data.percpu" section, which contains all the
# static per-CPU variables of vmlinux.
#
/* Created by linker magic */
extern char __per_cpu_start[], __per_cpu_end[];
#
# Compute the size of ".data.percpu" section.
#
# Allocate a memory area of "size of .data.percpu" x NR_CPUS, from bootmem allocator.
#
# Copy the content of ".data.percpu" section, into this memory area, duplicated NR_CPUS copy.
# And set __per_cpu_offset[] accordingly.
#
/* Copy section for each CPU (we discard the original) */
size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
#ifdef CONFIG_MODULES
if (size < PERCPU_ENOUGH_ROOM)
size = PERCPU_ENOUGH_ROOM;
#endif
ptr = alloc_bootmem(size * NR_CPUS);
for (i = 0; i < NR_CPUS; i++, ptr += size) {
__per_cpu_offset[i] = ptr - __per_cpu_start;
memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
}
-----------------------------------------------------------------------------------
As we can see from above, the memory area allocated by setup_per_cpu_areas(), is in fact of the following layout:
------------------------------ <- __per_cpu_offset[0]
range #0 for CPU #0
------------------------------
SMP_CACHE_BYTES alignment
------------------------------ <- __per_cpu_offset[1]
range #1 for CPU #1
------------------------------
.
.
.
------------------------------
SMP_CACHE_BYTES alignment
------------------------------ <- __per_cpu_offset[N]
range #N for CPU #N
------------------------------
And in theory, a per-CPU variale is an "array of element", but in implementation, the elements are NOT organized continously in RAM, but separated in the different "range", like the following:
------------------------------------------- <- __per_cpu_offset[0]
range #0 for CPU #0
-----------------------------------
CPU #0 copy of per-CPU variable #a
-----------------------------------
CPU #0 copy of per-CPU variable #b
-----------------------------------
CPU #0 copy of per-CPU variable #c
-----------------------------------
-------------------------------------------
SMP_CACHE_BYTES alignment
------------------------------------------- <- __per_cpu_offset[1]
range #1 for CPU #1
-----------------------------------
CPU #1 copy of per-CPU variable #a
-----------------------------------
CPU #1 copy of per-CPU variable #b
-----------------------------------
CPU #1 copy of per-CPU variable #c
-----------------------------------
-------------------------------------------
.
.
.
-------------------------------------------
SMP_CACHE_BYTES alignment
------------------------------------------- <- __per_cpu_offset[N]
range #N for CPU #N
-----------------------------------
CPU #N copy of per-CPU variable #a
-----------------------------------
CPU #N copy of per-CPU variable #b
-----------------------------------
CPU #N copy of per-CPU variable #c
-----------------------------------
-------------------------------------------
In fact ".data.section" of vmlinux will be released after the memory area above has been constructed (!^^__perharps in bootmem allocator retire time), And actually SMP per_cpu() is to return the pointer to the per-CPU range of the memory area, by:
#
# &"per_cpu__##var" + __per_cpu_offset[cpu]
#
# [*] Note that "per_cpu__##var" is the original address (known at compile/link time) of the
# per-CPU variable in the ".data.percpu" section, which is discarded. We never access that address
# directly; we just add __per_cpu_offset[cpu] to it to get the actual address of the specified
# CPU's copy within the per-CPU range of the memory area.
#
# [*] So this is why SMP DEFINE_PER_CPU() does not define an "array of elements", and this is how per_cpu() works.
#
#define per_cpu(var, cpu) (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
# define RELOC_HIDE(ptr, off) \
({ unsigned long __ptr; \
__ptr = (unsigned long) (ptr); \
(typeof(ptr)) (__ptr + (off)); })
-----------------------------------------------------------------------------------
[*] How are static per-CPU variables in modules handled?
Because the same APIs access static per-CPU variables in both the kernel and modules, module per-CPU variables are also organized into the per-CPU memory ranges described by __per_cpu_offset[NR_CPUS].
As we see from:
setup_per_cpu_areas();
/* Copy section for each CPU (we discard the original) */
size = ALIGN(__per_cpu_end - __per_cpu_start, SMP_CACHE_BYTES);
#ifdef CONFIG_MODULES
if (size < PERCPU_ENOUGH_ROOM)
size = PERCPU_ENOUGH_ROOM;
#endif
/* Enough to cover all DEFINE_PER_CPUs in kernel, including modules. */
#ifndef PERCPU_ENOUGH_ROOM
#define PERCPU_ENOUGH_ROOM 32768
#endif
So, besides the per-CPU variables of vmlinux, the memory area also has room for the per-CPU variables of modules.
The kernel likewise duplicates NR_CPUS copies of a module's per-CPU variables into that memory area at module loading time.
For details, see:
/kernel/module.c - percpu_modinit() and friends # not investigated further here.
====================================================================================
5.2. impl of dynamic per-CPU variable
====================================================================================
5.2.1. UP version
#
# The API alloc_percpu() is common to both UP and SMP; it is __alloc_percpu() that differs.
#
#define alloc_percpu(type) \
((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
#
# UP __alloc_percpu() just calls kmalloc(size) to allocate the single element.
#
static inline void *__alloc_percpu(size_t size, size_t align)
{
void *ret = kmalloc(size, GFP_KERNEL);
if (ret)
memset(ret, 0, size);
return ret;
}
#
# Correspondingly, UP free_percpu() is also simple: it just deallocates the single element.
#
static inline void free_percpu(const void *ptr)
{
kfree(ptr);
}
#
# UP per_cpu_ptr() simply returns the pointer to the single element, ignoring "cpu".
#
#define per_cpu_ptr(ptr, cpu) (ptr)
====================================================================================
5.2.2. SMP version
#
# The API alloc_percpu() is common to both UP and SMP; it is __alloc_percpu() that differs.
#
#define alloc_percpu(type) \
((type *)(__alloc_percpu(sizeof(type), __alignof__(type))))
#
# SMP __alloc_percpu() also does not allocate an "array of NR_CPUS elements" for the dynamic per-CPU variable.
#
# Instead:
# Allocate a "percpu_data" instance, which is:
#
# struct percpu_data { # represents a dynamic per-CPU variable, but it is internal.
# void *ptrs[NR_CPUS];
# };
#
# Then allocate each per-CPU element with the NUMA-aware kmem_cache_alloc_node(), saving each into "percpu_data->ptrs[]".
#
# Return the "encrypted" (bit-complemented) value of the "percpu_data *".
#
#
# So, for an SMP dynamic per-CPU variable, the per-CPU elements are also not contiguous in RAM, just like the static ones.
#
void *__alloc_percpu(size_t size, size_t align)
struct percpu_data *pdata = kmalloc(sizeof (*pdata), GFP_KERNEL);
for (i = 0; i < NR_CPUS; i++) {
if (!cpu_possible(i))
continue;
pdata->ptrs[i] = kmem_cache_alloc_node(
kmem_find_general_cachep(size, GFP_KERNEL),
cpu_to_node(i));
memset(pdata->ptrs[i], 0, size);
#
# Note that we don't simply return the address of "percpu_data", but an "encrypted" (bit-complemented) value.
#
return (void *) (~(unsigned long) pdata);
#
# Correspondingly, SMP per_cpu_ptr() is to return "percpu_data->ptrs[cpu]".
#
#define per_cpu_ptr(ptr, cpu) \
({ \
struct percpu_data *__p = (struct percpu_data *)~(unsigned long)(ptr); \
(__typeof__(ptr))__p->ptrs[(cpu)]; \
})
#
# And SMP free_percpu() deallocates the per-CPU elements in "percpu_data->ptrs[]", and then the "percpu_data" instance itself.
#
void free_percpu(const void *objp)
#
# Decrypt and get the actual address of "percpu_data".
#
struct percpu_data *p = (struct percpu_data *) (~(unsigned long) objp);
for (i = 0; i < NR_CPUS; i++) {
if (!cpu_possible(i))
continue;
kfree(p->ptrs[i]);
}
kfree(p);
====================================================================================
6. Misc tips
NONE.