Network Card Drivers (Part 2)


9.4. softnet_data Structure

We will see in Chapter 10 that each CPU has its own queue for incoming frames. Because each CPU has its own data structure to manage ingress and egress traffic, there is no need for any locking among different CPUs. The data structure for this queue, softnet_data, is defined in include/linux/netdevice.h as follows:

    struct softnet_data
    {
        int                    throttle;
        int                    cng_level;
        int                    avg_blog;
        struct sk_buff_head    input_pkt_queue;
        struct list_head       poll_list;
        struct net_device      *output_queue;
        struct sk_buff         *completion_queue;
        struct net_device      backlog_dev;
    };

The structure includes both fields used for reception and fields used for transmission. In other words, both the NET_RX_SOFTIRQ and NET_TX_SOFTIRQ softirqs refer to it. Ingress frames are queued to input_pkt_queue.[*] Egress frames, by contrast, are placed into the specialized queues handled by Traffic Control (the QoS layer) instead of being handled by softirqs and the softnet_data structure; softirqs are still used, however, to clean up transmitted buffers afterward, so that this task does not slow down transmission.

[*] You will see in Chapter 10 that this is no longer true for drivers using NAPI.
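To make the non-NAPI reception path concrete, here is a minimal sketch of how a frame reaches input_pkt_queue. The function name enqueue_to_backlog_sketch is made up for illustration, and details such as throttle handling and statistics are left out; the real logic lives in netif_rx, covered in Chapter 10.

    int enqueue_to_backlog_sketch(struct sk_buff *skb)
    {
        struct softnet_data *queue;
        unsigned long flags;

        local_irq_save(flags);                  /* called from interrupt context */
        queue = &__get_cpu_var(softnet_data);   /* this CPU's private queue */

        if (queue->input_pkt_queue.qlen <= netdev_max_backlog) {
            if (queue->input_pkt_queue.qlen == 0)
                /* queue was empty: put backlog_dev on poll_list and
                 * raise NET_RX_SOFTIRQ so net_rx_action will run */
                netif_rx_schedule(&queue->backlog_dev);
            __skb_queue_tail(&queue->input_pkt_queue, skb);
            local_irq_restore(flags);
            return NET_RX_SUCCESS;
        }

        /* queue is full: drop the frame */
        local_irq_restore(flags);
        kfree_skb(skb);
        return NET_RX_DROP;
    }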

9.4.1. Fields of softnet_data

The following is a brief field-by-field description of this data structure; details will be given in later chapters. Some drivers use the NAPI interface, whereas others have not yet been updated to NAPI; both types of driver use this structure, but some fields are reserved for the non-NAPI drivers.


throttle

 


avg_blog

 


cng_level

These three parameters are used by the congestion management algorithm and are further described following this list, as well as in the "Congestion Management" section in Chapter 10. All three, by default, are updated with the reception of every frame.


input_pkt_queue

This queue, initialized in net_dev_init, is where incoming frames are stored before being processed by the driver. It is used by non-NAPI drivers; those that have been upgraded to NAPI use their own private queues.


backlog_dev

This is an entire embedded data structure (not just a pointer to one) of type net_device, which represents a device that has scheduled net_rx_action for execution on the associated CPU. This field is used by non-NAPI drivers. The name stands for "backlog device." You will see how it is used in the section "Old Interface Between Device Drivers and Kernel: First Part of netif_rx" in Chapter 10.
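Since backlog_dev's poll method is process_backlog (installed by the initialization code shown in the next section), a rough idea of what that method does may help here. The following is only a sketch under a hypothetical name; it leaves out quota accounting, interrupt disabling, and throttle recovery.

    static int process_backlog_sketch(struct net_device *backlog_dev, int *budget)
    {
        struct softnet_data *queue = &__get_cpu_var(softnet_data);
        int work = 0;

        while (work < *budget) {
            struct sk_buff *skb = __skb_dequeue(&queue->input_pkt_queue);

            if (skb == NULL) {
                /* backlog drained: take backlog_dev off poll_list */
                netif_rx_complete(backlog_dev);
                break;
            }
            netif_receive_skb(skb);   /* hand the frame to the protocol layers */
            work++;
        }

        *budget -= work;
        /* nonzero tells the caller there is still work pending */
        return !skb_queue_empty(&queue->input_pkt_queue);
    }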


poll_list

This is a bidirectional list of devices with input frames waiting to be processed. More details can be found in the section "Processing the NET_RX_SOFTIRQ: net_rx_action" in Chapter 10.
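As a preview of Chapter 10, the loop below sketches how the NET_RX_SOFTIRQ handler consumes poll_list. The name rx_action_sketch is made up, locking and the per-run time limit are omitted, and starting the overall budget from netdev_max_backlog is an assumption about the kernel version described here.

    static void rx_action_sketch(void)
    {
        struct softnet_data *queue = &__get_cpu_var(softnet_data);
        int budget = netdev_max_backlog;     /* total work allowed per run */

        while (!list_empty(&queue->poll_list) && budget > 0) {
            struct net_device *dev;

            dev = list_entry(queue->poll_list.next,
                             struct net_device, poll_list);

            /* dev->poll is process_backlog for backlog_dev, or the
             * driver's own poll method for NAPI devices; a nonzero
             * return value means the device is not done yet */
            if (dev->poll(dev, &budget))
                list_move_tail(&dev->poll_list, &queue->poll_list);
        }
    }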


output_queue

 


completion_queue

output_queue is the list of devices that have something to transmit, and completion_queue is the list of buffers that have been successfully transmitted and therefore can be released. More details are given in the section "Processing the NET_TX_SOFTIRQ: net_tx_action" in Chapter 11.
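The transmit-side duties just described can be pictured with the following sketch (the name tx_action_sketch is hypothetical). It assumes the 2.6-era layout where devices on output_queue are chained through net_device->next_sched, and it omits the locking performed by the real net_tx_action.

    static void tx_action_sketch(void)
    {
        struct softnet_data *sd = &__get_cpu_var(softnet_data);

        /* 1. Free the buffers whose transmission has completed. */
        while (sd->completion_queue) {
            struct sk_buff *skb = sd->completion_queue;

            sd->completion_queue = skb->next;
            __kfree_skb(skb);
        }

        /* 2. Give the devices that still have frames queued another
         *    chance to transmit, via the Traffic Control layer. */
        while (sd->output_queue) {
            struct net_device *dev = sd->output_queue;

            sd->output_queue = dev->next_sched;
            qdisc_run(dev);
        }
    }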

throttle is treated as a Boolean variable whose value is true when the CPU is overloaded and false otherwise. Its value depends on the number of frames in input_pkt_queue. When the throttle flag is set, all input frames received by this CPU are dropped, regardless of the number of frames in the queue.[*]

[*] Drivers using NAPI might not drop incoming traffic under these conditions.

avg_blog represents the weighted average value of the input_pkt_queue queue length; it can range from 0 to the maximum length represented by netdev_max_backlog. avg_blog is used to compute cng_level.

cng_level, which represents the congestion level, can take any of the values shown in Figure 9-4. As avg_blog hits one of the thresholds shown in the figure, cng_level changes value. The definitions of the NET_RX_XXX enum values are in include/linux/netdevice.h, and the definitions of the congestion levels mod_cong, lo_cong, and no_cong are in net/core/dev.c.[†] The strings within brackets (/DROP and /HIGH) are explained in the section "Congestion Management" in Chapter 10. avg_blog and cng_level are recalculated with each frame, by default, but recalculation can be postponed and tied to a timer to avoid adding too much overhead.

[†] The NET_RX_XXX values are also used outside this context, and there are other NET_RX_XXX values not used here. The value no_cong_thresh is not used; it was formerly used by process_backlog (described in Chapter 10) to remove a queue from the throttle state under some conditions, back when the kernel still supported that feature (which has since been dropped).

 

Figure 9-4. Congestion level (NET_RX_XXX) based on the average backlog avg_blog

 


avg_blog and cng_level are associated with the CPU and therefore apply to non-NAPI devices, which share the queue input_pkt_queue that is used by each CPU.
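The computation can be summarized with a short sketch, essentially a simplified version of get_sample_stats in net/core/dev.c (the function name used here is made up): avg_blog is a running weighted average of the queue length, and cng_level follows from comparing it against the no_cong, lo_cong, and mod_cong thresholds of Figure 9-4.

    static void update_congestion_sketch(struct softnet_data *sd)
    {
        int blog = sd->input_pkt_queue.qlen;

        /* weighted average: half the old average plus half the new sample */
        sd->avg_blog = (sd->avg_blog >> 1) + (blog >> 1);

        if (sd->avg_blog > mod_cong)
            sd->cng_level = NET_RX_CN_HIGH;   /* heavy congestion */
        else if (sd->avg_blog > lo_cong)
            sd->cng_level = NET_RX_CN_MOD;    /* moderate congestion */
        else if (sd->avg_blog > no_cong)
            sd->cng_level = NET_RX_CN_LOW;    /* low congestion */
        else
            sd->cng_level = NET_RX_SUCCESS;   /* no congestion */
    }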

9.4.2. Initialization of softnet_data

Each CPU's softnet_data structure is initialized by net_dev_init, which runs at boot time and is described in Chapter 5. The initialization code is:

    for (i = 0; i < NR_CPUS; i++) {
        struct softnet_data *queue;

        queue = &per_cpu(softnet_data, i);
        skb_queue_head_init(&queue->input_pkt_queue);
        queue->throttle = 0;
        queue->cng_level = 0;
        queue->avg_blog = 10; /* arbitrary non-zero */
        queue->completion_queue = NULL;
        INIT_LIST_HEAD(&queue->poll_list);
        set_bit(__LINK_STATE_START, &queue->backlog_dev.state);
        queue->backlog_dev.weight = weight_p;
        queue->backlog_dev.poll = process_backlog;
        atomic_set(&queue->backlog_dev.refcnt, 1);
    }
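Note that this loop uses per_cpu(softnet_data, i) because a single CPU initializes every copy at boot time; at runtime, code such as the softirq handlers reaches its own copy with the local per-CPU accessor (for instance __get_cpu_var(softnet_data) in kernels of this vintage), which is what allows the structure to be used without locks.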