Hotplug and firmware loading with sysfs.
========================================

The 2.6.x Linux kernels export a device tree through sysfs, which is a
synthetic filesystem generally mounted at "/sys". Among other things,
this filesystem tells userspace what hardware is available, so userspace tools
(such as udev or mdev) can dynamically populate a "/dev" directory with device
nodes representing the currently available hardware.

Notification when hardware is inserted or removed is provided by the
hotplug mechanism. Linux provides two hotplug interfaces: /sbin/hotplug and
netlink.

The combination of sysfs and hotplug obsoleted the older "devfs", which was
removed from the 2.6.16 kernel.

Device nodes:
=============

Sysfs exports major and minor numbers for device nodes with which to populate
/dev via mknod(2). These major and minor numbers are found in files named
"dev", which contain two colon-separated ASCII decimal numbers followed by
exactly one newline. For example:

  $ cat /sys/class/mem/zero/dev
  1:5

Note that the name of the directory containing a dev entry is usually the
traditional name for the device node.
(The above entry is for "/dev/zero".)

Entries for block devices are found at the following locations:

  /sys/block/*/dev
  /sys/block/*/*/dev

Entries for char devices are found at the following locations:

  /sys/bus/*/devices/*/dev
  /sys/class/*/*/dev

A very simple bash script to populate /dev from /sys (without addressing
ownership or permissions of the resulting /dev nodes, and with truly horrible
performance) might look like:

  #!/bin/bash

  # Populate block devices

  for i in /sys/block/*/dev /sys/block/*/*/dev
  do
    # An unmatched glob stays literal, so check the file really exists.
    if [ -f "$i" ]
    then
      MAJOR=$(sed 's/:.*//' < "$i")
      MINOR=$(sed 's/.*://' < "$i")
      # Strip only the trailing "/dev" (anchored with $), then the leading
      # path. Unanchored, the pattern would also match inside "/devices".
      DEVNAME=$(echo "$i" | sed -e 's@/dev$@@' -e 's@.*/@@')
      mknod "/dev/$DEVNAME" b "$MAJOR" "$MINOR"
    fi
  done

  # Populate char devices

  for i in /sys/bus/*/devices/*/dev /sys/class/*/*/dev
  do
    if [ -f "$i" ]
    then
      MAJOR=$(sed 's/:.*//' < "$i")
      MINOR=$(sed 's/.*://' < "$i")
      DEVNAME=$(echo "$i" | sed -e 's@/dev$@@' -e 's@.*/@@')
      mknod "/dev/$DEVNAME" c "$MAJOR" "$MINOR"
    fi
  done

Hotplug:
========

The hotplug mechanism asynchronously notifies userspace when hardware is
inserted, removed, or undergoes a similar significant state change. Linux
provides two interfaces to hotplug: the kernel can spawn a usermode helper
process, or it can send a message to an existing daemon listening to a netlink
socket.

-- Usermode helper

The usermode helper hotplug mechanism spawns a new process to handle each
hotplug event. Each such helper process belongs to the root user (UID 0) and
is a child of the init task (PID 1). The kernel spawns one process per hotplug
event, supplying environment variables to each new process describing that
particular hotplug event.
By default the kernel spawns instances of "/sbin/hotplug", but
this default can be changed by writing a new path into
"/proc/sys/kernel/hotplug" (assuming /proc is mounted).

A simple bash script to record variables from hotplug events might look like:

  #!/bin/bash
  env >> /filename

It's possible to disable the usermode helper hotplug mechanism (by writing an
empty string into /proc/sys/kernel/hotplug), but there's little reason to
do this unless you want to disable an existing hotplug mechanism. (From a
performance perspective, a usermode helper won't be spawned if /sbin/hotplug
doesn't exist, and negative dentries will record the fact it doesn't exist
after the first lookup attempt.)

-- Netlink

A daemon listening to the netlink socket receives a packet of data for each
hotplug event, containing the same information a usermode helper would receive
in environment variables.

The netlink packet contains a set of null terminated text lines.
The first line of the netlink packet combines the $ACTION and $DEVPATH values,
separated by an @ (at sign).
Each line after the first contains a KEYWORD=VALUE pair defining
a hotplug event variable.

Here's a C program to print hotplug netlink events to stdout:

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <sys/poll.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <unistd.h>

  #include <linux/types.h>
  #include <linux/netlink.h>

  void die(char *s)
  {
    write(2,s,strlen(s));
    exit(1);
  }

  int main(int argc, char *argv[])
  {
    struct sockaddr_nl nls;
    struct pollfd pfd;
    char buf[512];

    // Open hotplug event netlink socket

    memset(&nls,0,sizeof(struct sockaddr_nl));
    nls.nl_family = AF_NETLINK;
    nls.nl_pid = getpid();
    nls.nl_groups = -1;

    pfd.events = POLLIN;
    pfd.fd = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_KOBJECT_UEVENT);
    if (pfd.fd==-1)
      die("Not root\n");

    // Listen to netlink socket

    if (bind(pfd.fd, (void *)&nls, sizeof(struct sockaddr_nl)))
      die("Bind failed\n");
    while (-1!=poll(&pfd, 1, -1)) {
      // Leave room for a terminating null in case the packet is truncated.
      int i, len = recv(pfd.fd, buf, sizeof(buf)-1, MSG_DONTWAIT);
      if (len == -1) die("recv\n");
      buf[len] = 0;

      // Print the data to stdout, one null terminated line at a time.
      i = 0;
      while (i<len) {
        printf("%s\n", buf+i);
        i += strlen(buf+i)+1;
      }
    }
    die("poll\n");

    // Dear gcc: shut up.
    return 0;
  }

Hotplug event variables:
========================

Every hotplug event should provide at least the following variables:

  ACTION
    The current hotplug action: "add" to add the device, "remove" to remove it.
    The 2.6.22 kernel can also generate "change", "online", "offline", and
    "move" actions.

  DEVPATH
    Path under /sys at which this device's sysfs directory can be found.

  SUBSYSTEM
    If this is "block", it's a block device. Any other subsystem is
    either a char device or does not have an associated device node.

The following variables are also provided for some devices:

  MAJOR and MINOR
    If these are present, a device node can be created in /dev for this device.
    Some devices (such as network cards) don't generate a /dev node.

    [QUESTION: Any reliable way to get the default name?]

  DRIVER
    If present, a suggested driver (module) for handling this device.
    No relation to whether or not a driver is currently handling the device.

  INTERFACE and IFINDEX
    When SUBSYSTEM=net, these variables indicate the name of the interface
    and a unique integer for the interface. (Note that "INTERFACE=eth0" could
    be paired with "IFINDEX=2" because eth0 isn't guaranteed to come before lo
    and the count doesn't start at 0.)

  FIRMWARE
    The system is requesting firmware for the device. See "Firmware loading"
    below.

Injecting events into hotplug via "uevent":
===========================================

Events can be injected into the hotplug mechanism through sysfs via the
"uevent" files. Each directory in sysfs containing a "dev" file should also
contain a "uevent" file. Write the name of the event (such as "add" or
"remove") to the appropriate uevent file, and the kernel will deliver such
an event for that device via the hotplug mechanism.

Note that in kernel versions 2.6.24 and newer, "uevent" is readable. Reading
from uevent provides the set of "extra" variables associated with this event.

A note about race conditions (or "why bother with netlink?"):
=============================================================

Some simple systems (such as embedded systems) scan sysfs once at boot time
to populate /dev, and ignore any hotplug events. Scanning again to probe for
new devices is a workable option (as long as mknod failing because the
device already exists isn't considered an error condition).

Systems that actually support hotplug should start to handle hotplug events
_before_ scanning sysfs for existing devices, to ensure that any devices
added during the scan reliably have a /dev entry created for them.

Devices removed while scanning /sys may still result in leftover /dev nodes
after the scan. The race is that the scanning process may read the "dev"
entry for a device from sysfs, be interrupted by a hotplug event which attempts
to remove that device, and then the scanning process resumes and creates the
device node for the already-removed device.
In theory this is no more of a security concern than having
a statically allocated /dev (the device node will return -ENODEV to programs
that try to use it), but it's untidy.

In theory, transient devices (which are created and removed again almost
instantly, which can be caused by poorly written drivers that fail their device
probe) could have similar "leftover" /dev entries from the /sbin/hotplug
mechanism. (If two processes are spawned simultaneously, which one completes
first is not guaranteed.) This is not common, but theoretically possible.

These sorts of races are why the netlink mechanism was created. To avoid
such potential races when using netlink, instead of reading each "dev" entry,
fake "add" events by writing to each device's "uevent" file in sysfs. This
filters the sequencing through the kernel, which will not deliver an "add"
event packet to the netlink process for a device that has been removed.

Note also that on very large mainframe systems, /sbin/hotplug can potentially
fork bomb the system during system bringup.

Firmware loading
================

If the hotplug variable FIRMWARE is set, the kernel is requesting firmware
for a device (identified by $DEVPATH). To provide the firmware to the kernel,
do the following:

  echo 1 > /sys/$DEVPATH/loading
  cat /path/to/$FIRMWARE > /sys/$DEVPATH/data
  echo 0 > /sys/$DEVPATH/loading

Note that "echo -1 > /sys/$DEVPATH/loading" will cancel the firmware load
and return an error to the kernel, and /sys/class/firmware/timeout contains a
timeout (in seconds) for firmware loads.

See Documentation/firmware_class/ for more information.

Loading firmware for statically linked devices
==============================================

An advantage of the usermode helper hotplug mechanism is that if initramfs
contains an executable /sbin/hotplug, it can be called even before the kernel
runs init. This allows /sbin/hotplug to supply firmware (out of initramfs) to
statically linked device drivers.
(The netlink mechanism requires a daemon to listen to a socket,
and such a daemon cannot be spawned before init runs.)

For licensing reasons, binary-only firmware should not be linked into the
kernel image, but instead placed in an externally supplied initramfs which
can be passed to the Linux kernel through the old initrd mechanism.
See Documentation/filesystems/ramfs-rootfs-initramfs.txt for details.

stable_api_nonsense:
====================

Note: Sysfs exports a lot of kernel internal state, and the maintainers of
sysfs do not believe that exposing information to userspace for use by
userspace programs constitutes an "API" that must be "stable". The sysfs
infrastructure is maintained by the author of
Documentation/stable_api_nonsense.txt, who seems to believe it applies to
userspace as well. Therefore, at best only a subset of the information in
sysfs can be considered stable from version to version.

The information documented here should remain stable. Some other parts of
sysfs are documented under Documentation/ABI, although that directory comes
with a warning that anything documented there can go away after two years.
Any other information exported by sysfs should be considered debugging info
at best, and probably shouldn't have been exported at all since it's not a
"stable API" intended for use by actual programs.
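
Example: replaying "add" events through uevent
==============================================

The race conditions note above recommends faking "add" events by writing to
each device's "uevent" file instead of reading the "dev" entries directly.
The following is a minimal sketch of that idea, not a definitive
implementation. It assumes a netlink listener (or /sbin/hotplug helper) is
already in place to receive the events, and that it runs with permission to
write to sysfs. The function name replay_add_events and its optional root
argument are hypothetical, introduced only for this example. The search paths
are the ones listed under "Device nodes".

```shell
#!/bin/sh
# Sketch only: ask the kernel to re-deliver an "add" event for every device
# that has a "dev" entry. replay_add_events is a hypothetical name; its
# optional argument is the sysfs mount point (assumed to default to /sys).

replay_add_events()
{
  root="${1:-/sys}"

  for dev in "$root"/block/*/dev "$root"/block/*/*/dev \
             "$root"/bus/*/devices/*/dev "$root"/class/*/*/dev
  do
    # Unmatched globs stay literal, so skip anything that is not a file.
    [ -f "$dev" ] || continue

    # The uevent file sits next to the dev file in the same directory.
    uevent="${dev%/dev}/uevent"

    if [ -w "$uevent" ]
    then
      # The device may vanish mid-scan; ignore a failed write and move on.
      { echo add > "$uevent"; } 2>/dev/null || true
    fi
  done
  return 0
}
```

Calling "replay_add_events" with no argument walks /sys. Passing a different
directory is convenient for trying the function against a scratch tree
without touching real devices.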