Network Device Initialization

Among the various initialization tasks, we are mainly interested in three:

Boot-time options
Two calls to parse_args, one direct and one indirect via parse_early_param, handle configuration parameters that a boot loader such as LILO or GRUB has passed to the kernel at boot time.

Interrupts and timers
Hardware and software interrupts are initialized with init_IRQ and softirq_init, respectively.

Initialization routines
Kernel subsystems and built-in device drivers are initialized by do_initcalls. free_initmem frees the memory that holds initialization-only code and data. This optimization is possible because such routines are tagged (for example, with __init) and grouped into a discardable memory section.

run_init_process determines the first process run on the system, the parent of all other processes; it has a PID of 1 and does not halt until the system is shut down. Normally the program run is init, part of the SysVinit package. However, the administrator can specify a different program through the init= boot-time option. When no such option is provided, the kernel tries to execute the init command from a set of well-known locations, and panics if it cannot find any. The user can also provide boot-time options that will be passed to init.

Device Registration and Initialization

Hardware initialization
This is done by the device driver in cooperation with the generic bus layer (e.g., PCI or USB). The driver, sometimes alone and sometimes with the help of user-supplied parameters, configures such features of each device as the IRQ and I/O address so that they can interact with the kernel.

Software initialization
Before the device can be used, depending on what network protocols are enabled and configured, the user may need to provide some other configuration parameters, such as IP addresses.

Feature initialization
The Linux kernel comes with lots of networking options. Because some of them need per-device configuration, the device initialization boot sequence must take care of them. One example is Traffic Control, the subsystem that implements Quality of Service (QoS) and that decides, therefore, how packets are queued on and dequeued from the device's egress queue (and, with some limitations, also how they are queued on and dequeued from the ingress queue).

Basic Goals of NIC Initialization

Each network device is represented in the Linux kernel by an instance of the net_device data structure. This section looks at how device drivers allocate the resources needed to establish device/kernel communication, such as:

IRQ line
NICs need to be assigned an IRQ and to use it to call for the kernel’s attention when needed. Virtual devices, however, do not need to be assigned an IRQ: the loopback device is an example because its activity is totally internal.

I/O ports and memory registration
It is common for a driver to map an area of its device’s memory (its configuration registers, for example) into the system memory so that read/write operations by the driver will be made on system memory addresses directly; this can simplify the code. I/O ports and memory are registered and released with request_region and release_region, respectively.
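As a rough illustration of how a driver claims its I/O ports with request_region, consider the following probe-time sketch; the base address, extent, and mynic_* names are hypothetical and chosen only for the example.

    #include <linux/ioport.h>
    #include <linux/errno.h>

    #define MYNIC_IO_BASE   0x300   /* hypothetical I/O base address */
    #define MYNIC_IO_EXTENT 32      /* hypothetical number of ports */

    /* Claim the device's I/O port range; fail if another driver already owns it. */
    static int mynic_claim_ports(const char *name)
    {
            if (!request_region(MYNIC_IO_BASE, MYNIC_IO_EXTENT, name))
                    return -EBUSY;
            return 0;
    }

    /* On shutdown, the matching call is:
     *     release_region(MYNIC_IO_BASE, MYNIC_IO_EXTENT);
     */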

Interaction Between Devices and Kernel

Polling
Driven on the kernel side. The kernel checks the device status at regular intervals to see if it has anything to say.

Interrupt
Driven on the device side. The device sends a hardware signal (by generating an interrupt) to the kernel when it needs the kernel's attention.

With an interrupt, an NIC can tell its driver several different things. Among them are:

Reception of a frame
This is the most common and standard situation.

Transmission failure
This kind of notification is generated on Ethernet devices only after a feature called exponential binary backoff has failed (this feature is implemented at the hardware level by the NIC). Note that the driver will not relay this notification to higher network layers; they will come to know about the failure by other means (timer timeouts, negative ACKs, etc.).

DMA transfer has completed successfully
Given a frame to send, the buffer that holds it is released by the driver once the frame has been uploaded into the NIC’s memory for transmission on the medium. With synchronous transmissions (no DMA), the driver knows right away when the frame has been uploaded on the NIC. But with DMA, which uses asynchronous transmissions, the device driver needs to wait for an explicit interrupt from the NIC.

Device has enough memory to handle a new transmission
It is common for an NIC device driver to disable transmissions by stopping the egress queue when that queue does not have sufficient free space to hold a frame of maximum size (e.g., 1,536 bytes for an Ethernet NIC). The queue is then re-enabled when memory becomes available.

The final case in the previous list covers a sophisticated way of throttling transmissions in a manner that can improve efficiency if done properly. In this system, a device driver disables transmissions for lack of queuing space, asks the NIC to issue an interrupt when the available memory is bigger than a given amount (typically the device’s Maximum Transmission Unit, or MTU), and then re-enables transmissions when the interrupt comes.

A device driver can also disable the egress queue before a transmission (to prevent the kernel from generating another transmission request on the device), and re-enable it only if there is enough free memory on the NIC; if not, the device asks for an interrupt that allows it to resume transmission at a later time.
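A rough sketch of this throttling scheme, built around the kernel's netif_stop_queue/netif_wake_queue helpers, might look like the following. The mynic_* names and the private structure are hypothetical; real drivers track the NIC's free transmit space in their own way.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>
    #include <linux/if_ether.h>

    struct mynic_priv {                     /* hypothetical per-device state */
            unsigned int tx_free_bytes;     /* free space left on the NIC */
    };

    static int mynic_hard_start_xmit(struct sk_buff *skb, struct net_device *dev)
    {
            struct mynic_priv *priv = netdev_priv(dev);

            /* ... hand the frame to the NIC (e.g., queue a DMA descriptor) ... */

            /* Stop the egress queue if there is no room for another
             * maximum-sized frame; the kernel stops feeding us frames. */
            if (priv->tx_free_bytes < dev->mtu + ETH_HLEN)
                    netif_stop_queue(dev);
            return 0;
    }

    /* Called from the TX-completion interrupt: re-enable the queue once
     * enough memory has been freed on the NIC. */
    static void mynic_tx_done(struct net_device *dev)
    {
            struct mynic_priv *priv = netdev_priv(dev);

            if (netif_queue_stopped(dev) &&
                priv->tx_free_bytes >= dev->mtu + ETH_HLEN)
                    netif_wake_queue(dev);
    }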

The mapping of IRQs to handlers is stored in a vector of lists, one list of handlers for each IRQ (see Figure 5-2). A list includes more than one element only when multiple devices share the same IRQ. The size of the vector (i.e., the number of possible IRQ numbers) is architecture dependent and can vary from 15 (on an x86) to more than 200. With the introduction of interrupt sharing, even more devices can be supported on a system at once.

irqaction

irqreturn_t (*handler)(int irq, void *dev_id, struct pt_regs *regs)
Function provided by the device driver to handle notifications of interrupts: whenever the kernel receives an interrupt on line irq, it invokes handler. Here are the function’s input parameters:

int irq
IRQ number that generated the notification. Most of the time it is not used by NIC device drivers to accomplish their job; the device ID is sufficient.

void *dev_id
Device identifier. The same driver can be responsible for different devices at the same time, so it needs the device ID to process the notification correctly

struct pt_regs *regs
Structure used to save the content of the processor’s registers at the moment the interrupt interrupted the current process. It is normally not used by the interrupt handler.

unsigned long flags
Set of flags. The ones most relevant to NIC drivers are:

SA_SHIRQ
When set, the device driver can handle shared IRQs.

SA_SAMPLE_RANDOM
When set, the device is making itself available as a source of random events. This can be useful to help the kernel generate random numbers for internal use, and is called contributing to system entropy.

SA_INTERRUPT
When set, the handler runs with interrupts disabled on the local processor. This should be specified only for handlers that can complete very quickly.

void *dev_id
Pointer to the net_device data structure associated with the device. The reason it is declared void * is that NICs are not the only devices that use IRQs; because various device types use different data structures to identify and represent device instances, a generic type declaration is used.

struct irqaction *next
All the devices sharing the same IRQ number are linked together in a list with this pointer.

const char *name
Device name. You can read it by dumping the contents of /proc/interrupts.
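To tie the fields above together, here is a minimal sketch of how a driver might install a shared interrupt handler with request_irq, using the older 2.6-era interface described in this section (SA_SHIRQ and a pt_regs argument; more recent kernels use IRQF_SHARED and a two-argument handler). The mynic_* names are hypothetical.

    #include <linux/interrupt.h>
    #include <linux/netdevice.h>

    static irqreturn_t mynic_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
            struct net_device *dev = dev_id;  /* identifies which device raised the IRQ */

            /* ... read the NIC's status register and handle RX/TX events ... */
            return IRQ_HANDLED;
    }

    static int mynic_open(struct net_device *dev)
    {
            /* dev is passed back as dev_id on every interrupt, so the handler
             * can tell which device (of possibly many on a shared line) fired. */
            if (request_irq(dev->irq, mynic_interrupt, SA_SHIRQ, dev->name, dev))
                    return -EAGAIN;
            return 0;
    }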

Initialization Options

Module options (macros of the module_param family)
These define options you can provide when you load a module. When a component is built into the kernel, you cannot provide values for these options at kernel boot time. However, with the introduction of the sysfs filesystem, you can configure these options at runtime through files under /sys/module (described in the next section).

Boot-time kernel options (macros of the __setup family)
These define options you can provide at boot time with a boot loader. They are used mainly by modules that the user can build into the kernel, and by kernel components that cannot be compiled as modules.
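As an illustration, a minimal __setup handler for a hypothetical mynet= boot option could look like this; the option name and variable are invented for the example.

    #include <linux/init.h>
    #include <linux/kernel.h>

    static int mynet_debug;

    /* str points just past "mynet=" on the kernel command line. */
    static int __init mynet_setup(char *str)
    {
            get_option(&str, &mynet_debug);
            return 1;                       /* option handled */
    }
    __setup("mynet=", mynet_setup);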

Module Options

Each module is assigned a directory under /sys/module. The subdirectory /sys/module/<module>/parameters holds a file for each parameter exported by the module.
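For example, a module that declares a parameter with module_param (the debug_level name here is hypothetical) gets a corresponding file under /sys/module/<module>/parameters, with permissions taken from the macro's third argument:

    #include <linux/module.h>
    #include <linux/moduleparam.h>

    static int debug_level;                         /* hypothetical option */
    module_param(debug_level, int, 0644);           /* type and sysfs permissions */
    MODULE_PARM_DESC(debug_level, "Verbosity of the driver's debug messages");

    /* When loaded as a module, the value can be read and (given the 0644
     * permissions) changed through:
     *     /sys/module/<module>/parameters/debug_level
     */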

Initializing the Device Handling Layer: net_dev_init

Let’s walk through the main parts of net_dev_init; a simplified sketch of the routine follows the list:

  1. The per-CPU data structures used by the two networking software interrupts (softirqs) are initialized.
  2. When the kernel is compiled with support for the /proc filesystem (which is the default configuration), a few files are added to /proc with dev_proc_init and dev_mcast_init.
  3. netdev_sysfs_init registers the net class with sysfs. This creates the directory /sys/class/net, under which you will find a subdirectory for each registered network device. These directories include lots of files, some of which used to be in /proc.
  4. net_random_init initializes a per-CPU vector of seeds that will be used when generating random numbers with the net_random routine.
  5. The protocol-independent destination cache (DST) is initialized with dst_init.
  6. The protocol handler vector ptype_base, used to demultiplex ingress traffic, is initialized.
  7. When the OFFLINE_SAMPLE symbol is defined, the kernel sets up a function to run at regular intervals to collect statistics about the devices’ queue lengths. In this case, net_dev_init needs to create the timer that runs the function regularly.
  8. A callback handler is registered with the notification chain that issues notifications about CPU hotplug events. The callback used is dev_cpu_callback. Currently, the only event processed is the halting of a CPU. When this notification is received, the buffers in the CPU’s ingress queue are dequeued and passed to netif_rx.
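The following is a heavily simplified sketch of how net_dev_init (net/core/dev.c) is laid out. It shows only a few of the steps in the list above, the exact contents and helper names vary by kernel version, and the routine is registered with subsys_initcall so that do_initcalls runs it at boot.

    static int __init net_dev_init(void)
    {
            int i;

            /* Step 1: per-CPU data used by the networking softirqs */
            for_each_possible_cpu(i)
                    skb_queue_head_init(&per_cpu(softnet_data, i).input_pkt_queue);

            /* Step 2: /proc/net entries (only with CONFIG_PROC_FS) */
            dev_proc_init();
            dev_mcast_init();

            /* Step 5: protocol-independent destination cache */
            dst_init();

            /* ... remaining steps: sysfs registration, ptype_base setup,
             * softirq handler registration, CPU-hotplug notifier ... */
            return 0;
    }

    subsys_initcall(net_dev_init);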

User-Space Helpers

/sbin/modprobe
Invoked when the kernel needs to load a module. This helper is part of the module-init-tools package.

/sbin/hotplug
Invoked when the kernel detects that a new device has been plugged into or unplugged from the system. Its main job is to load the correct device driver (module) based on the device identifier. Devices are identified by the bus they are plugged into (e.g., PCI) and the associated ID defined by the bus specification. This helper is part of the hotplug package.

Virtual devices and real devices interact with the kernel in slightly different ways. For example, they differ with regard to the following points:

Initialization
Most virtual devices are assigned a net_device data structure, as real devices are. Often, most of the virtual device’s net_device function pointers are initialized to routines implemented as wrappers, more or less complex, around the function pointers used by the associated real devices. However, not all virtual devices are assigned a net_device instance. Aliasing devices are an example; they are implemented as simple labels on the associated real device.

Configuration
It is common to provide ad hoc user-space tools to configure virtual devices, especially for the high-level fields that apply only to those devices and which could not be configured using standard tools such as ifconfig.

External interface
Each virtual device usually exports a file, or a directory with a few files, to the /proc filesystem. How complex and detailed the information exported with those files is depends on the kind of virtual device and on the design. Files associated with virtual devices are extra files; they do not replace the ones associated with the physical devices. Aliasing devices, which do not have their own net_device instances, are again an exception.

Transmission
When the relationship of virtual device to real device is not one-to-one, the routine used to transmit may need to include, among other tasks, the selection of the real device to use (a sketch follows this list). Because QoS is enforced on a per-device basis, the multiple relationships between virtual devices and associated real devices have implications for the Traffic Control configuration.

Reception
Because virtual devices are software objects, they do not need to engage in interactions with real resources on the system, such as registering an IRQ handler or allocating I/O ports and I/O memory. Their traffic comes secondhand from the physical devices that perform those tasks. Packet reception happens differently for different types of virtual devices. For instance, 802.1Q interfaces register an Ethertype and are passed only those packets received by the associated real devices that carry the right protocol ID. In contrast, bridge interfaces receive any packet that arrives from the associated devices.

External notifications
Notifications from other kernel components about specific events taking place in the kernel are of interest as much to virtual devices as to real ones. Because virtual devices’ logic is implemented on top of real devices, the latter have no knowledge about that logic and therefore are not able to pass on those notifications. For this reason, notifications need to go directly to the virtual devices. Let’s use Bonding as an example: if one device in the bundle goes down, the algorithms used to distribute traffic among the bundle’s members have to be made aware of that so that they do not select the devices that are no longer available.
Unlike these software-triggered notifications, hardware-triggered notifications (e.g., PCI power management) cannot reach virtual devices directly because there is no hardware associated with virtual devices.
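For instance, a bonding-like virtual device might use a transmit routine along these lines. The bundle structure and the selection policy are hypothetical, and the real Bonding driver is considerably more involved; the point is only that the wrapper picks a real device and re-submits the frame through it.

    #include <linux/netdevice.h>
    #include <linux/skbuff.h>

    struct bundle_priv {                    /* hypothetical private data */
            struct net_device *slaves[4];   /* associated real devices */
            unsigned int n_slaves;
            unsigned int next;
    };

    static int bundle_xmit(struct sk_buff *skb, struct net_device *dev)
    {
            struct bundle_priv *priv = netdev_priv(dev);
            struct net_device *real;

            /* Select one of the real devices (simple round robin here)
             * and hand the frame to its own egress path. */
            real = priv->slaves[priv->next++ % priv->n_slaves];
            skb->dev = real;
            return dev_queue_xmit(skb);
    }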

Tuning via /proc Filesystem

In /proc/net, you can find the files created by net_dev_init via dev_proc_init and dev_mcast_init:

dev
Displays, for each network device registered with the kernel, a few statistics about reception and transmission, such as bytes received or transmitted, number of packets, errors, etc.

dev_mcast
Displays, for each network device registered with the kernel, the values of a few parameters used by IP multicast.

wireless
Similar to dev; for each wireless device, it prints the values of a few parameters from the wireless block returned by the dev->get_wireless_stats virtual function. Note that dev->get_wireless_stats returns something only for wireless devices, because only those allocate a data structure to keep such statistics (and so /proc/net/wireless includes only wireless devices).

softnet_stat
Exports statistics about the software interrupts used by the networking code.

