Introducing initramfs, a new model for initial RAM disks


The problem. (Why "root=" doesn't scale.)

When the Linux kernel boots the system, it must find and run the first user program, generally called "init". User programs live in filesystems, so the Linux kernel must find and mount the first (or "root") filesystem in order to boot successfully.

Ordinarily, available filesystems are listed in the file /etc/fstab so the mount program can find them. But /etc/fstab is itself a file, stored in a filesystem. Finding the very first filesystem is a chicken-and-egg problem, and to solve it the kernel developers created the kernel command line option "root=", to specify which device the root filesystem lives on.
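To make the "root=" mechanism concrete: the boot loader hands the kernel a single line of space-separated options, which a running system exposes in /proc/cmdline. The minimal sketch below pulls "root=" out of such a line. It assumes options never contain quoted spaces (the kernel's own parser also understands double quotes), and the helper names parse_cmdline and root_device are hypothetical.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {option: value}.

    Simplified sketch: splits on whitespace only (no quote handling).
    Bare flags such as "ro" map to None; if an option repeats, the
    last occurrence wins in this sketch.
    """
    options = {}
    for word in cmdline.split():
        key, sep, value = word.partition("=")  # split at the FIRST "=" only
        options[key] = value if sep else None
    return options


def root_device(cmdline: str):
    """Return the value of root= (e.g. "/dev/sda1"), or None if absent."""
    return parse_cmdline(cmdline).get("root")
```

On a live system this could be called as root_device(open("/proc/cmdline").read()). The string is split at the first "=" only, because a value may itself contain "=".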

Fifteen years ago, "root=" was easy to interpret. It was either a floppy drive or a partition on a hard drive. These days the root filesystem could be on dozens of different types of hardware (SCSI, SATA, flash MTD), or even spread across several of them in a RAID. Its location could move around from boot to boot, as with hot-pluggable USB devices on a system with multiple USB ports: when there are several USB devices, which one is correct? The root filesystem might be compressed (how?), encrypted (with what keys?), or loopback mounted (where?). It could even live out on a network server, requiring the kernel to acquire a DHCP address, perform a DNS lookup, and log in to a remote server (with username and password), all before the kernel can find and run the first userspace program.

These days, "root=" just isn't enough information. Even hard-wiring tons of special-case behavior into the kernel doesn't help with device enumeration, encryption keys, or network logins that vary from system to system. Worse, programming the kernel to perform these kinds of complicated multipart tasks is like writing web software in assembly language: it can be done, but it's considerably easier to simply use the proper tools for the job. The kernel is designed to follow orders, not give them.

With no end to this ever-increasing complexity in sight, the kernel developers decided to back up and find a better way to deal with the whole problem.

The solution

Linux 2.6 kernels bundle a small RAM-based initial root filesystem into the kernel, and if this filesystem contains a program called "/init" the kernel runs that as its first program. At that point, finding some other filesystem containing some other program to run is no longer the kernel's problem, but is now the job of the new program.
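The filesystem the kernel bundles is packed as a cpio archive in the "newc" format (the format "cpio -H newc" writes), which the kernel unpacks into memory at boot. As an illustration, a minimal Python sketch that builds such an archive might look like this. It assumes regular files only (no directories, symlinks, or device nodes), zeroes ownership and timestamps, and uses the hypothetical helper names newc_entry and build_initramfs. Real images are normally produced with cpio or the kernel's own gen_init_cpio tool.

```python
import stat


def _pad4(n: int) -> int:
    """Number of NUL bytes needed to round n up to a multiple of 4."""
    return (4 - n % 4) % 4


def newc_entry(name: str, data: bytes = b"", mode: int = stat.S_IFREG | 0o644,
               ino: int = 0) -> bytes:
    """Encode one newc member: 110-byte ASCII header, NUL-terminated name, data."""
    name_bytes = name.encode() + b"\0"  # c_namesize counts the trailing NUL
    fields = [
        ino,              # c_ino
        mode,             # c_mode: file type bits plus permissions
        0, 0,             # c_uid, c_gid: owned by root
        1,                # c_nlink
        0,                # c_mtime
        len(data),        # c_filesize
        0, 0,             # c_devmajor, c_devminor
        0, 0,             # c_rdevmajor, c_rdevminor (only used for device nodes)
        len(name_bytes),  # c_namesize
        0,                # c_check: unused by the newc format
    ]
    # "070701" magic followed by 13 fields, each 8 hex digits.
    out = b"070701" + b"".join(b"%08X" % f for f in fields) + name_bytes
    out += b"\0" * _pad4(len(out))          # header + name padded to 4 bytes
    out += data + b"\0" * _pad4(len(data))  # file data padded to 4 bytes
    return out


def build_initramfs(files: dict) -> bytes:
    """Pack {path: contents} as executable regular files, then the trailer."""
    archive = b""
    for ino, (path, data) in enumerate(sorted(files.items()), start=1):
        archive += newc_entry(path, data, mode=stat.S_IFREG | 0o755, ino=ino)
    # A member named TRAILER!!! marks the end of the archive.
    archive += newc_entry("TRAILER!!!", mode=0)
    return archive
```

For example, build_initramfs({"init": init_program_bytes}) yields an archive whose only file is /init, and gzip.compress() can shrink it, since the kernel accepts gzip-compressed archives. A shell-script /init would also need /bin/sh (and its parent directories) in the archive, which this sketch does not create.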

The contents of initramfs don't have to be general purpose. If a given system's root filesystem lives on an encrypted network block device, and the network address, login, and decryption key are all to be found on a USB device named "larry" (which requires a password to access), that system's initramfs can have a special-purpose program that knows all about that, and makes it happen.

For small systems that don't need a large root filesystem, the initramfs can serve as the root filesystem itself, with no need to locate or switch to any other.

How is this different from initrd?

The Linux kernel already had a way to provide a RAM-based root filesystem: the initrd mechanism. For 2.4 and earlier kernels, initrd is still the only way to do this sort of thing. But the kernel developers chose to implement a new mechanism in 2.6 for several reasons.

ramdisk vs ramfs

A ramdisk (like initrd) is a RAM-based block device, which means it's a fixed-size chunk of memory that can be formatted and mounted like a disk. This means the contents of the ramdisk have to be formatted and prepared with special tools (such as mke2fs and losetup), and like all block devices it requires a filesystem driver to interpret the data at runtime. This also imposes an artificial size limit that either wastes space (if the ramdisk isn't full, the extra memory it takes up still can't be used for anything else) or limits capacity (if the ramdisk fills up but other memory is still free, you can't expand it without reformatting it).

But ramdisks actually waste even more memory due to caching. Linux is designed to cache all files and directory entries read from or written to block devices, so file data read from or written to the ramdisk is also copied into the "page cache", and directory entries into the "dentry cache". The downside of the ramdisk pretending to be a block device is that it gets treated like a block device.
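A back-of-the-envelope comparison shows how the fixed size and the caching add up. The toy model below is an assumption-laden sketch, not a measurement: it ignores filesystem metadata and the dentry cache, and assumes every byte of file data on the ramdisk has been read once, so it sits in the page cache as well as in the ramdisk itself. The function names are hypothetical.

```python
MiB = 1024 * 1024


def ramdisk_footprint(ramdisk_size: int, file_bytes: int) -> int:
    """Toy model: the ramdisk pins its whole fixed size, plus a page-cache
    copy of every file byte that has been read through it."""
    if file_bytes > ramdisk_size:
        raise ValueError("files do not fit; the ramdisk must be reformatted larger")
    return ramdisk_size + file_bytes


def ramfs_footprint(file_bytes: int) -> int:
    """Toy model: ramfs stores file data only in the page cache, once."""
    return file_bytes


# Example: an 8 MiB ramdisk holding 3 MiB of files.
print(ramdisk_footprint(8 * MiB, 3 * MiB) // MiB)  # 11
print(ramfs_footprint(3 * MiB) // MiB)             # 3
```

Under these assumptions the ramdisk costs 11 MiB to hold 3 MiB of files: 5 MiB of unused ramdisk space plus 3 MiB of duplicate cached data.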

A few years ago, Linus Torvalds had a neat idea: what if Linux's cache could be mounted like a filesystem? Just keep the files in cache and never get rid of them until they're deleted or the system reboots? Linus wrote a tiny wrapper around the cache called "ramfs", and other kernel developers created an improved version called "tmpfs" (which can write the data to swap space, and limit the size of a given mount point so it fills up before consuming all available memory). Initramfs is an instance of ramfs (newer kernels can use tmpfs instead when it is enabled).

These RAM-based filesystems automatically grow or shrink to fit the size of the data they contain. Adding files to a ramfs (or extending existing files) automatically allocates more memory, and deleting or truncating files frees that memory. There's no duplication between block device and cache, because there's no block device. The copy in the cache is the only copy of the data. Best of all, this isn't new code but a new application for the existing Linux caching code, which means it adds almost no size, is very simple, and is based on extremely well-tested infrastructure.
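The grow-and-shrink behavior can be modeled in a few lines. This toy class only illustrates the accounting described above and is not how ramfs is implemented: it assumes 4 KiB pages, charges each file ceil(size / page size) pages, and ignores sparse files and metadata. ToyRamfs and its methods are hypothetical names.

```python
PAGE_SIZE = 4096  # assumption: a common page size; real systems vary


def pages_for(size: int) -> int:
    """Pages needed to hold `size` bytes (ceiling division)."""
    return -(-size // PAGE_SIZE)


class ToyRamfs:
    """Toy accounting model: memory in use follows file sizes, page by page."""

    def __init__(self):
        self.sizes = {}  # file name -> size in bytes

    def append(self, name: str, nbytes: int) -> None:
        """Create or extend a file; may allocate more pages."""
        self.sizes[name] = self.sizes.get(name, 0) + nbytes

    def truncate(self, name: str, size: int) -> None:
        """Shrink a file; pages past the new end are freed."""
        self.sizes[name] = min(self.sizes[name], size)

    def unlink(self, name: str) -> None:
        """Delete a file; all of its pages are freed."""
        del self.sizes[name]

    def pages_in_use(self) -> int:
        return sum(pages_for(s) for s in self.sizes.values())


fs = ToyRamfs()
fs.append("log", 10_000)
print(fs.pages_in_use())  # 3
fs.truncate("log", 4096)
print(fs.pages_in_use())  # 1
fs.unlink("log")
print(fs.pages_in_use())  # 0
```

Unlike the fixed-size ramdisk, the model has no preset capacity to run out of or to leave sitting unused.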

A system using initramfs as its root filesystem doesn't even need a single filesystem driver built into the kernel, because there are no block devices to interpret as filesystems: just files living in memory.

Initrd vs initramfs

The change in underlying infrastructure was a reason for the kernel developers to create a new implementation, but while they were at it they cleaned up a lot of bad behavior and assumptions.

Initrd was designed as a front end to the old "root=" root device detection code, not a replacement for it. It ran a program called "/linuxrc", which was intended to perform setup functions (like logging on to the network, determining which of several devices contained the root partition, or associating a loopback device with a file), tell the kernel which block device contained the real root device (by writing the dev_t number to /proc/sys/kernel/real-root-dev), and then return to the kernel so the kernel could mount the real root device and execute the real init program.
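The "dev_t number" that /linuxrc wrote packs a device's major and minor numbers into one integer. Below is a minimal sketch of that packing, modeled on the kernel's new_encode_dev() helper, which the initrd code appears to use when converting real-root-dev. For small numbers the result is simply major * 256 + minor, so /dev/sda1 (major 8, minor 1) becomes 0x801, written as the decimal string "2049". The name real_root_dev_text is hypothetical.

```python
def new_encode_dev(major: int, minor: int) -> int:
    """Pack major:minor like the kernel's new_encode_dev():
    bits 0-7 hold the low byte of the minor, bits 8-19 the major,
    and bits 20 and up the rest of the minor."""
    return (minor & 0xFF) | (major << 8) | ((minor & ~0xFF) << 12)


def real_root_dev_text(major: int, minor: int) -> str:
    """Text a /linuxrc could write to /proc/sys/kernel/real-root-dev."""
    return str(new_encode_dev(major, minor))


print(real_root_dev_text(8, 1))  # 2049, i.e. 0x801 for /dev/sda1
```

A /linuxrc shell script would do the same thing with echo, writing that number into /proc/sys/kernel/real-root-dev before exiting.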

This assumed that the "real root device" was a block device rather than a network share, and also assumed that initrd wasn't itself going to be the real root filesystem. The kernel didn't even execute "/linuxrc" as the special process ID 1, because that process ID (and its special properties, like being the only process that cannot be killed with "kill -9") was reserved for init, which the kernel was waiting to run after it mounted the real root filesystem.

With initramfs, the kernel developers removed all these assumptions. Once the kernel launches "/init" out of initramfs, the kernel is done making decisions and can go back to following orders. With initramfs, the kernel doesn't care where the real root filesystem is (it's initramfs until further notice), and the "/init" program from initramfs is run as a real init, with PID 1. (If initramfs's init needs to hand that special process ID off to another program, it can use the exec() syscall just like everybody else.)
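The hand-off works because exec() replaces the program running in a process without creating a new process, so the PID stays the same. A real /init would first mount the real root filesystem and switch into it (BusyBox's switch_root is one tool for this) and then exec the real init. The sketch below demonstrates only the PID-preserving part, on Linux: a child Python process prints its PID, execs a fresh interpreter, and prints the PID again. The helper name pids_across_exec is hypothetical.

```python
import subprocess
import sys

# Child program: print our PID, then exec() a new interpreter that prints
# its own PID. flush=True matters: exec() discards any output still
# sitting in Python's buffers.
_CHILD = (
    "import os, sys\n"
    "print(os.getpid(), flush=True)\n"
    "os.execv(sys.executable, "
    "[sys.executable, '-c', 'import os; print(os.getpid())'])\n"
)


def pids_across_exec():
    """Return (PID before exec, PID after exec) observed in one child process."""
    result = subprocess.run([sys.executable, "-c", _CHILD],
                            capture_output=True, text=True, check=True)
    before, after = (int(word) for word in result.stdout.split())
    return before, after


print(pids_across_exec())  # the two numbers match
```

Both numbers come out equal: whatever program is exec()ed inherits the caller's PID. That is how an initramfs /init can pass PID 1 on to the real init.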

Summary

The traditional root= kernel command-line option is still supported and usable, but new developments in the types of initial RAM disks supported by the kernel provide many optimizations and much-needed flexibility for the future of the Linux kernel. The next article in this series, available in next month's issue of TimeSource, explains how you can start making the transition to the new initramfs initial RAM disk mechanism.


Copyright (c) 2006, TimeSys Corp. All rights reserved. Reproduced by LinuxDevices.com with permission. This article was initially published in the TimeSys monthly newsletter.



About the author: Rob Landley is a Senior Linux Engineer at TimeSys Corporation and the new maintainer of BusyBox. He first encountered Linux when the SLS disks came across Fidonet in 1993, and has been building various Linux systems from source code for six years. He co-founded a combination Linux Expo and Science Fiction convention (Penguicon) because that really is his idea of fun, as are writing documentation, researching computer history, urban hiking, and knitting chain mail.

