Kernel Korner - Allocating Memory in the Kernel

In this article, Robert offers a refresher on kernel memory allocation and how it has changed for the 2.6 kernel.

Unfortunately for kernel developers, allocating memory in the kernel is not as simple as allocating memory in user space. A number of factors contribute to the complication, among them:

The kernel is limited to about 1GB of virtual and physical memory.
The kernel's memory is not pageable.
The kernel usually wants physically contiguous memory.
Often, the kernel must allocate the memory without sleeping.
Mistakes in the kernel have a much higher price than they do elsewhere.

Although the kernel certainly does not enjoy the luxury of easy access to an abundance of memory, a little understanding of the issues can go a long way toward making the process relatively painless.

A General-Purpose Allocator

The general interface for allocating memory inside of the kernel is kmalloc():

#include <linux/slab.h>

void * kmalloc(size_t size, int flags);

It should look familiar (it is pretty much the same as user space's malloc(), after all) except that it takes a second argument, flags. Let's ignore flags for a second and see what we recognize. First off, size is the same here as in malloc(): it specifies the size in bytes of the allocation. Upon successful return, kmalloc() returns a pointer to size bytes of memory. The alignment of the allocated memory is suitable for storage of and access to any type of object. As with malloc(), kmalloc() can fail, and you must check its return value against NULL. Let's look at an example:

struct falcon *p;

p = kmalloc(sizeof (struct falcon), GFP_KERNEL);
if (!p) {
    /* the allocation failed - handle appropriately */
}

Flags

The flags field controls the behavior of memory allocation. We can divide flags into three groups: action modifiers, zone modifiers and types. Action modifiers tell the kernel how to allocate memory. They specify, for example, whether the kernel can sleep (that is, whether the call to kmalloc() can block) in order to satisfy the allocation. Zone modifiers, on the other hand, tell the kernel from where the request should be satisfied. For example, some requests may need to be satisfied from memory that hardware can access through direct memory access (DMA). Finally, type flags specify a type of allocation. They group together relevant action and zone modifiers into a single mnemonic. In general, instead of specifying multiple action and zone modifiers, you specify a single type flag.

Table 1 is a listing of the action modifiers, and Table 2 is a listing of the zone modifiers. Many different flags can be used; allocating memory in the kernel is nontrivial. It is possible to control many aspects of memory allocation in the kernel. Your code should use the type flags and not the individual action and zone modifiers. The two most common flags are GFP_ATOMIC and GFP_KERNEL. Nearly all of your kernel memory allocations should specify one of these two flags.

Table 1. Action Modifiers

Flag            Description
__GFP_COLD      The kernel should use cache cold pages.
__GFP_FS        The kernel can start filesystem I/O.
__GFP_HIGH      The kernel can access emergency pools.
__GFP_IO        The kernel can start disk I/O.
__GFP_NOFAIL    The kernel repeats the allocation until it succeeds.
__GFP_NORETRY   The kernel does not retry if the allocation fails.
__GFP_NOWARN    The kernel does not print failure warnings.
__GFP_REPEAT    The kernel repeats the allocation if it fails.
__GFP_WAIT      The kernel can sleep.

Table 2. Zone Modifiers

Flag            Description
__GFP_DMA       Allocate only DMA-capable memory.
No flag         Allocate from wherever available.

The GFP_ATOMIC flag instructs the memory allocator never to block. Use this flag in situations where your code cannot sleep and must remain atomic, such as interrupt handlers, bottom halves and process context code that is holding a lock. Because the kernel cannot block the allocation and try to free up sufficient memory to satisfy the request, an allocation specifying GFP_ATOMIC has a lesser chance of succeeding than one that does not. Nonetheless, if your current context is incapable of sleeping, it is your only choice. Using GFP_ATOMIC is simple:

struct wolf *p;

p = kmalloc(sizeof (struct wolf), GFP_ATOMIC);
if (!p) {
    /* error */
}

Conversely, the GFP_KERNEL flag specifies a normal kernel allocation. Use this flag in code executing in process context without any locks. A call to kmalloc() with this flag can sleep; thus, you must use this flag only when it is safe to do so. The kernel utilizes the ability to sleep in order to free memory, if needed. Therefore, allocations that specify this flag have a greater chance of succeeding. If insufficient memory is available, for example, the kernel can block the requesting code and swap some inactive pages to disk, shrink the in-memory caches, write out buffers and so on.

Sometimes, as when writing an ISA device driver, you need to ensure that the memory allocated is capable of undergoing DMA. For ISA devices, this is memory in the first 16MB of physical memory. To ensure that the kernel allocates from this specific memory, use the GFP_DMA flag. Generally, you would use this flag in conjunction with either GFP_ATOMIC or GFP_KERNEL; you can combine flags with a binary OR operation. For example, to instruct the kernel to allocate DMA-capable memory and to sleep if needed, do:

char *buf;
/* we want DMA-capable memory,
 * and we can sleep if needed */
buf = kmalloc(BUF_LEN, GFP_DMA | GFP_KERNEL);
if (!buf)
    /* error */
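
To make the 16MB figure concrete, the following user-space sketch (not kernel code) checks whether a buffer lies entirely within the first 16MB of physical memory. The names ISA_DMA_LIMIT and fits_isa_dma() are hypothetical and exist only for this illustration; when you pass GFP_DMA, the kernel takes care of the placement itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: ISA DMA uses 24-bit addresses, so 2^24 = 16MB. */
#define ISA_DMA_LIMIT ((uint64_t)1 << 24)

/*
 * Hypothetical helper: returns true if the len bytes starting at
 * physical address phys lie entirely below the 16MB boundary.
 * An empty range is treated as not usable.
 */
bool fits_isa_dma(uint64_t phys, uint64_t len)
{
    if (len == 0 || phys >= ISA_DMA_LIMIT)
        return false;
    /* Compare against the remaining room to avoid overflow in phys + len. */
    return len <= ISA_DMA_LIMIT - phys;
}
```

A buffer that starts below the boundary but runs past it fails the check, which is why the whole range is tested rather than just the start address.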

Table 3 is a listing of the type flags, and Table 4 shows to which type flag each action and zone modifier equates. The header <linux/gfp.h> defines all of the flags.

Table 3. Types

Flag            Description
GFP_ATOMIC      The allocation is high-priority and does not sleep. This is the flag to use in interrupt handlers, bottom halves and other situations where you cannot sleep.
GFP_DMA         This is an allocation of DMA-capable memory. Device drivers that need DMA-capable memory use this flag.
GFP_KERNEL      This is a normal allocation and might block. This is the flag to use in process context code when it is safe to sleep.
GFP_NOFS        This allocation might block and might initiate disk I/O, but it does not initiate a filesystem operation. This is the flag to use in filesystem code when you cannot start another filesystem operation.
GFP_NOIO        This allocation might block, but it does not initiate block I/O. This is the flag to use in block layer code when you cannot start more block I/O.
GFP_USER        This is a normal allocation and might block. This flag is used to allocate memory for user-space processes.

Table 4. Composition of the Type Flags

__GFP_DMA
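
Reading Tables 1 and 3 together suggests how each type flag is built from action and zone modifiers. The sketch below models that composition in user space. Every MODEL_ name and every numeric bit value is a placeholder invented for this illustration, and the compositions are inferred from the table descriptions; consult <linux/gfp.h> in your own tree for the real definitions.

```c
/* Placeholder bits standing in for the modifiers; NOT the kernel's values. */
#define MODEL_DMA_BIT   0x01u  /* stands in for __GFP_DMA  */
#define MODEL_WAIT_BIT  0x10u  /* stands in for __GFP_WAIT */
#define MODEL_HIGH_BIT  0x20u  /* stands in for __GFP_HIGH */
#define MODEL_IO_BIT    0x40u  /* stands in for __GFP_IO   */
#define MODEL_FS_BIT    0x80u  /* stands in for __GFP_FS   */

/* Type flags modeled as combinations of the bits above (assumed). */
#define MODEL_GFP_ATOMIC (MODEL_HIGH_BIT)
#define MODEL_GFP_NOIO   (MODEL_WAIT_BIT)
#define MODEL_GFP_NOFS   (MODEL_WAIT_BIT | MODEL_IO_BIT)
#define MODEL_GFP_KERNEL (MODEL_WAIT_BIT | MODEL_IO_BIT | MODEL_FS_BIT)
#define MODEL_GFP_DMA    (MODEL_DMA_BIT)

/* Nonzero if an allocation with these flags is allowed to sleep. */
int model_may_sleep(unsigned int flags)
{
    return (flags & MODEL_WAIT_BIT) != 0;
}

/* Nonzero if an allocation with these flags may start filesystem I/O. */
int model_may_start_fs_io(unsigned int flags)
{
    return (flags & MODEL_FS_BIT) != 0;
}
```

In this model, MODEL_GFP_DMA | MODEL_GFP_KERNEL keeps the sleep bit and adds the DMA bit, which mirrors the GFP_DMA | GFP_KERNEL combination shown earlier.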

Returning Memory

When you are finished accessing the memory allocated via kmalloc(), you must return it to the kernel. This job is done using kfree(), which is the counterpart to user space's free() library call. The prototype for kfree() is:

#include <linux/slab.h>
void kfree(const void *objp);

kfree()'s usage is identical to the user-space variant. Assume p is a pointer to a block of memory obtained via kmalloc(). The following command, then, would free that block and return the memory to the kernel:

kfree(p);

As with free() in user space, calling kfree() on a block of memory that already has been freed or on a pointer that is not an address returned from kmalloc() is a bug, and it can result in memory corruption. Always balance allocations and frees to ensure that kfree() is called exactly once on the correct pointer. Calling kfree() on NULL is checked for explicitly and is safe, although it is not necessarily a sensible idea.

Let's look at the full allocation and freeing cycle:

struct sausage *s;
s = kmalloc(sizeof (struct sausage), GFP_KERNEL);
if (!s)
    return -ENOMEM;
/* ... */
kfree(s);
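
One common way to make the "exactly once" rule harder to break is to clear the pointer at the moment it is freed. The sketch below shows the idea in user space, with calloc() and free() standing in for kmalloc() and kfree(). The helper names sausage_create() and sausage_release(), and the links field, are hypothetical and not part of any kernel API.

```c
#include <stdlib.h>

struct sausage {
    int links;  /* hypothetical payload field */
};

/* Allocates one zeroed object, or returns NULL if the allocation fails. */
struct sausage *sausage_create(void)
{
    return calloc(1, sizeof(struct sausage));
}

/*
 * Frees *sp and clears the caller's pointer, so that a second call
 * becomes a harmless free(NULL) instead of a double free.
 */
void sausage_release(struct sausage **sp)
{
    free(*sp);   /* free(NULL) is a no-op, much as kfree(NULL) is safe */
    *sp = NULL;
}
```

This only protects the one pointer variable that is passed in; any other copies of the address still dangle, so balancing allocations and frees by design remains necessary.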

Allocating from Virtual Memory

The kmalloc() function returns physically and therefore virtually contiguous memory. This is in contrast to user space's malloc() function, which returns virtually but not necessarily physically contiguous memory. Physically contiguous memory has two primary benefits. First, many hardware devices cannot address virtual memory. Therefore, in order for them to be able to access a block of memory, the block must exist as a physically contiguous chunk of memory. Second, a physically contiguous block of memory can use a single large page mapping. This minimizes the translation lookaside buffer (TLB) overhead of addressing the memory, as only a single TLB entry is required.
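
The TLB argument comes down to simple arithmetic. As a rough sketch, assume 4KB small pages and a 4MB large page (sizes chosen for illustration, as found on some 32-bit x86 configurations); the hypothetical helper tlb_entries_needed() counts how many page mappings, and thus TLB entries, a region requires.

```c
#include <stddef.h>

/*
 * Hypothetical helper: number of page mappings needed to cover `bytes`
 * with pages of `page_size` bytes, rounding up to whole pages.
 * page_size must be nonzero.
 */
size_t tlb_entries_needed(size_t bytes, size_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}
```

Under these assumptions, a 4MB region mapped with 4KB pages needs 1024 entries, whereas the same region mapped with one 4MB page needs a single entry.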

Allocating physically contiguous memory has one downside: it is often hard to find physically contiguous blocks of memory, especially for large allocations. Allocating memory that is only virtually contiguous has a much larger chance of success. If you do not need physically contiguous memory, use vmalloc():

#include <linux/vmalloc.h>
void * vmalloc(unsigned long size);

You then return memory obtained with vmalloc() to the system by using vfree():

#include <linux/vmalloc.h>

void vfree(void *addr);

Here again, vmalloc() and vfree() are used just like user space's malloc() and free() functions:

struct black_bear *p;
p = vmalloc(sizeof (struct black_bear));
if (!p)
    /* error */

/* ... */
vfree(p);

In this particular case, vmalloc() might sleep.

Many allocations in the kernel can use vmalloc(), because few allocations need to appear contiguous to hardware devices. If you are allocating memory that only software accesses, such as data associated with a user process, there is no need for the memory to be physically contiguous. Nonetheless, few allocations in the kernel use vmalloc(). Most choose to use kmalloc(), even if it's not needed, partly for historical and partly for performance reasons. Because the TLB overhead for physically contiguous pages is reduced greatly, the performance gains often are well appreciated. Despite this, if you need to allocate tens of megabytes of memory in the kernel, vmalloc() is your best option.

A Small Fixed-Size Stack

Unlike user-space processes, code executing in the kernel has neither a large nor a dynamically growing stack. Instead, each process in the kernel has a small fixed-size stack. The exact size of the stack is architecture-dependent. Most architectures allocate two pages for the stack, so the stack is 8KB on 32-bit machines.

Because of the small stack, allocations that are large, automatic and on-the-stack are discouraged. Indeed, you never should see anything such as this in kernel code:

#define BUF_LEN 2048
void rabbit_function(void)
{
    char buf[BUF_LEN];
    /* ...  */
}

Instead, the following is preferred:

#define BUF_LEN 2048
void rabbit_function(void)
{
    char *buf;

    buf = kmalloc(BUF_LEN, GFP_KERNEL);
    if (!buf)
        return; /* error! */
    /* ... */
    kfree(buf); /* return the buffer when finished */
}

You seldom see the equivalent of this second version in user space, because there is rarely a reason to perform a dynamic memory allocation when you know the allocation size at the time you write the code. In the kernel, however, you should use dynamic memory any time the allocation size is larger than a handful of bytes or so. This helps prevent stack overflow, which ruins everyone's day.
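
A quick calculation shows why a buffer of this size is a problem. The sketch below assumes a 4KB page size and a two-page stack, matching the 8KB figure given above; the MODEL_ constants and helper names are hypothetical and only serve the arithmetic.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE   4096u  /* assumed page size in bytes */
#define MODEL_STACK_PAGES 2u     /* two pages per kernel stack, per the text */

/* Total size in bytes of the modeled kernel stack. */
size_t model_stack_bytes(void)
{
    return (size_t)MODEL_STACK_PAGES * MODEL_PAGE_SIZE;
}

/* Whole-number percentage of the stack used by one automatic object. */
unsigned int model_stack_percent(size_t obj_bytes)
{
    return (unsigned int)(obj_bytes * 100 / model_stack_bytes());
}
```

Under these assumptions, a single 2048-byte array takes a quarter of the entire stack before any other frame in the call chain is counted.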

Conclusion

With a little understanding, getting a hold of memory in the kernel is demystified and not too much more difficult to do than it is in user space. A few simple rules of thumb can go a long way:

Decide whether you can sleep (that is, whether the call to kmalloc() can block). If you are in an interrupt handler, in a bottom half, or if you hold a lock, you cannot. If you are in process context and do not hold a lock, you probably can.
If you can sleep, specify GFP_KERNEL.
If you cannot sleep, specify GFP_ATOMIC.
If you need DMA-capable memory (for example, for an ISA or broken PCI device), specify GFP_DMA.
Always check for and handle a NULL return value from kmalloc().
Do not leak memory; make sure you call kfree() somewhere.
Ensure that you do not race and call kfree() multiple times and that you never access a block of memory after you free it.
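
The first four rules of thumb can be sketched as a small decision function. This is a user-space model only: the struct model_context fields, the MODEL_ flag values and model_pick_flags() are all hypothetical stand-ins, and real kernel code simply passes the appropriate GFP_ constant at each call site.

```c
#include <stdbool.h>

/* Placeholder flag values for the model; NOT the kernel's GFP_ values. */
enum model_gfp {
    MODEL_ATOMIC = 1,  /* stands in for GFP_ATOMIC */
    MODEL_KERNEL = 2,  /* stands in for GFP_KERNEL */
    MODEL_DMA    = 4   /* stands in for GFP_DMA    */
};

/* Hypothetical description of the calling context. */
struct model_context {
    bool in_interrupt;  /* interrupt handler or bottom half */
    bool holds_lock;    /* caller currently holds a lock */
    bool needs_dma;     /* buffer must be DMA-capable */
};

/* Applies the rules of thumb: cannot sleep -> atomic, else kernel; OR in DMA. */
unsigned int model_pick_flags(const struct model_context *ctx)
{
    unsigned int flags;

    if (ctx->in_interrupt || ctx->holds_lock)
        flags = MODEL_ATOMIC;
    else
        flags = MODEL_KERNEL;

    if (ctx->needs_dma)
        flags |= MODEL_DMA;

    return flags;
}
```

The remaining rules (checking for NULL, freeing exactly once, never touching freed memory) are about discipline around the call rather than flag choice, so they are not captured by this function.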

Robert Love (rml@tech9.net) is a kernel hacker at MontaVista Software and a student at the University of Florida. He is the author of Linux Kernel Development. Robert enjoys fine wine and lives in Gainesville, Florida.