OOM killer

Thomas Habets had an unfortunate experience recently. His Linux system ran out of memory, and the dreaded "OOM killer" was loosed upon the system's unsuspecting processes. One of its victims turned out to be his screen locking program, leaving his session open to whoever might happen to walk by. His response was the oom_pardon patch, which allows the system administrator to exempt certain processes from the OOM killer's revenge. It turns out that SUSE has a similar patch which allows administrators to set the "OOM score" of specific processes, increasing or decreasing their chances of being chosen for an untimely demise.

The OOM killer exists because the Linux kernel, by default, can commit to supplying more memory than it can actually provide. Overcommitting memory in this way allows the kernel to make fuller use of the system's resources, because processes typically do not use all of the memory they claim. As an example, consider the fork() system call, which nominally copies all of a process's memory for the new child process. In fact, all it does is mark the memory as "copy on write" and allow parent and child to share it. Should either change a page shared in this way, a true copy of that page is made. In theory, the kernel could be called upon to copy all of the copy-on-write memory in this way; in practice, that does not happen. If the kernel reserved all of the necessary virtual memory (which includes swap space), some of that space would certainly go unused. Rather than waste that space - and refuse to run programs or satisfy memory allocations that, in practice, it could have handled - the kernel overcommits itself and hopes for the best.
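
To see how far the kernel has overcommitted itself, one can compare the Committed_AS field of /proc/meminfo (the address space the kernel has promised) with the RAM and swap actually present. The following is a minimal sketch, assuming a Linux system that exposes the usual /proc/meminfo fields; the helper names parse_meminfo and commit_summary are made up for illustration and are not part of any library.

```python
from pathlib import Path


def parse_meminfo(text):
    """Turn the text of /proc/meminfo into a dict of {field: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_summary(fields):
    """Compare what the kernel has promised with what it could supply."""
    backing_kb = fields["MemTotal"] + fields["SwapTotal"]
    committed_kb = fields["Committed_AS"]
    return {
        "backing_kb": backing_kb,
        "committed_kb": committed_kb,
        "overcommitted": committed_kb > backing_kb,
    }


if __name__ == "__main__":
    try:
        text = Path("/proc/meminfo").read_text()
        print(commit_summary(parse_meminfo(text)))
    except (OSError, KeyError):
        # Not Linux, or a /proc that lacks the fields this sketch assumes.
        print("could not read the expected fields from /proc/meminfo")
```

A result with "overcommitted" set to True means the kernel has promised more than RAM and swap together could back if every process used everything it asked for.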

When the best does not happen, the OOM killer comes into play; its job is to kill processes and free up some memory. Getting it to kill the right processes has been an ongoing challenge, however. One person's useless memory hog is another's crucial application. Thus, over the years, numerous efforts have been made to refine the OOM killer's heuristics, and patches like "oom_pardon" have been created.
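
The patches discussed here predate it, but later mainline kernels expose a per-process "badness" value in /proc/<pid>/oom_score and an administrator-settable adjustment in /proc/<pid>/oom_score_adj (from -1000, which exempts a process, to 1000). The sketch below only reads those files; it assumes such a kernel, and the function names read_oom_info and likely_victims, as well as the proc_root parameter, are hypothetical helpers for illustration.

```python
import os
from pathlib import Path


def read_oom_info(pid, proc_root="/proc"):
    """Return (oom_score, oom_score_adj) for a process, or None if unreadable."""
    base = Path(proc_root) / str(pid)
    try:
        score = int((base / "oom_score").read_text().strip())
        adj = int((base / "oom_score_adj").read_text().strip())
    except (OSError, ValueError):
        return None
    return score, adj


def likely_victims(pids, proc_root="/proc"):
    """Order processes from most to least likely victim, by oom_score."""
    scored = []
    for pid in pids:
        info = read_oom_info(pid, proc_root)
        if info is not None:
            scored.append((info[0], pid))
    return [pid for _score, pid in sorted(scored, reverse=True)]


if __name__ == "__main__":
    # Show this process's own score and adjustment, if the kernel provides them.
    print(read_oom_info(os.getpid()))
```

Changing a process's chances means writing a new value to its oom_score_adj file; lowering the value generally requires privilege, so this sketch deliberately stops at reading.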

Not everybody agrees that this is a fruitful use of developer time. Andries Brouwer came up with this analogy:

An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

Overcommitting memory and fearing the OOM killer are not necessary parts of the Linux experience, however. Simply setting the sysctl parameter vm/overcommit_memory to 2 turns off the overcommit behavior and keeps the OOM killer forever at bay. Most modern systems should have enough disk space to provide an ample swap file for most situations. Rather than trying to keep pet processes from being killed when overcommitted memory runs out, it might be easier just to avoid the situation altogether.
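
In mode 2 the kernel refuses allocations once the total committed address space would exceed a limit of, roughly, swap plus a percentage of RAM given by vm/overcommit_ratio (50 by default), which is why ample swap matters in this mode. The sketch below reads the current settings and works through that arithmetic. It is an approximation under stated assumptions: it ignores the alternative vm/overcommit_kbytes setting, and the names commit_limit_kb, read_int, and OVERCOMMIT_MODES are illustrative only.

```python
from pathlib import Path

# Meaning of the values accepted by vm/overcommit_memory.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio, hugetlb_kb=0):
    """Approximate the mode-2 commit limit: swap plus a percentage of RAM."""
    return swap_total_kb + (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100


def read_int(path):
    """Read a single integer from a sysctl file, or None if unavailable."""
    try:
        return int(Path(path).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    mode = read_int("/proc/sys/vm/overcommit_memory")
    ratio = read_int("/proc/sys/vm/overcommit_ratio")
    print("mode:", mode, OVERCOMMIT_MODES.get(mode, "unknown"))
    print("ratio:", ratio)
    # Worked example: 8 GiB of RAM, 4 GiB of swap, ratio of 50 percent.
    print("limit kB:", commit_limit_kb(8 * 1024**2, 4 * 1024**2, 50))
```

With 8 GiB of RAM, 4 GiB of swap, and the default ratio, the limit comes to 8 GiB of committed address space: less than RAM plus swap, so a machine switched to mode 2 without enough swap may start refusing allocations well before memory is truly exhausted.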

Source: http://lwn.net/Articles/104179/