blktap(4)

Source: Internet. Editor: 程序博客网. Time: 2024/06/03 22:57

1.1 blkfront:
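Registering the block device: xlblk_init() -> register_blkdev(XENVBD_MAJOR, DEV_NAME), where XENVBD_MAJOR is 202 and DEV_NAME is "xvd".
Registering the xenbus frontend:
xlblk_init() -> xenbus_register_frontend(&blkfront_driver) -> xenbus_register_driver_common() ->
driver_register() -> bus_add_driver(). The final bus_add_driver() is a function of the Linux driver core (an in-kernel function, not a system call); it adds the driver to the bus.
The blkfront_driver registered here is: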
static struct xenbus_driver blkfront_driver = {
    .ids = blkfront_ids,
    .probe = blkfront_probe,
    .remove = blkfront_remove,
    .resume = blkfront_resume,
    .otherend_changed = blkback_changed,
    .is_ready = blkfront_is_ready,
};
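The callback of interest is otherend_changed = blkback_changed.
When the backend's state changes, blkback_changed() is called.
When the backend's state == XenbusStateConnected, the backend finally counts as 'ready' and blkfront_connect(info) is called. Here info is of type struct blkfront_info; it holds everything about the blkfront interface, there is one per vbd, and it is also the gendisk's private_data.
blkback_changed() -> blkfront_connect() -> xlvbd_alloc_gendisk() -> xlvbd_init_blk_queue() completes the initialization of the vbd.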
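The state-driven callback described above can be sketched in user space as follows. This is only a minimal model, not the kernel code: the mock_* types and functions are hypothetical stand-ins for struct xenbus_driver, struct blkfront_info, blkback_changed() and blkfront_connect(), and only the Connected state is handled. The XenbusState values are the ones defined in Xen's public xenbus.h header.

```c
#include <assert.h>
#include <stddef.h>

/* XenbusState values as defined in Xen's public io/xenbus.h. */
enum xenbus_state {
    XenbusStateUnknown = 0,
    XenbusStateInitialising = 1,
    XenbusStateInitWait = 2,
    XenbusStateInitialised = 3,
    XenbusStateConnected = 4,
    XenbusStateClosing = 5,
    XenbusStateClosed = 6
};

/* Hypothetical, heavily reduced stand-in for struct blkfront_info. */
struct mock_front_info {
    int connected; /* set once the virtual disk has been brought up */
};

/* Hypothetical stand-in for struct xenbus_driver: only one callback. */
struct mock_xenbus_driver {
    void (*otherend_changed)(struct mock_front_info *info,
                             enum xenbus_state backend_state);
};

/* Plays the role of blkfront_connect(); here it only flips a flag. */
static void mock_connect(struct mock_front_info *info)
{
    info->connected = 1;
}

/* Plays the role of blkback_changed(): react to the backend's new state. */
static void mock_backend_changed(struct mock_front_info *info,
                                 enum xenbus_state backend_state)
{
    assert(info != NULL);
    switch (backend_state) {
    case XenbusStateConnected:
        mock_connect(info);
        break;
    default:
        /* All other states are ignored in this sketch. */
        break;
    }
}

static const struct mock_xenbus_driver mock_driver = {
    .otherend_changed = mock_backend_changed,
};
```

Calling mock_driver.otherend_changed(&info, XenbusStateConnected) on a zeroed info sets info.connected, while any other state leaves it untouched.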
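Here info->tag_set.ops = &blkfront_mq_ops is set. The tag_set describes the device's set of request queues, and ops holds the callbacks through which the block layer hands requests to the driver.
The request handling here carries "mq" in its names, which stands for multi-queue (blk-mq). On multi-core machines, multi-queue assigns the block layer's submission queues to different CPU cores, so that the I/O load is balanced better and the storage device's I/O efficiency improves.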
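The multi-queue idea can be illustrated with a small user-space sketch. It assumes a plain round-robin mapping from CPU number to queue index, which is a simplification chosen for this example and not necessarily the mapping the kernel uses; struct mock_mq and both functions are hypothetical.

```c
#include <assert.h>

#define MOCK_MAX_QUEUES 8

/* Hypothetical model of a multi-queue device: one counter per queue. */
struct mock_mq {
    unsigned int nr_queues;
    unsigned int submitted[MOCK_MAX_QUEUES];
};

/* Assumption of this sketch: CPUs are spread over the queues round-robin. */
static unsigned int mock_cpu_to_queue(const struct mock_mq *mq,
                                      unsigned int cpu)
{
    assert(mq->nr_queues > 0 && mq->nr_queues <= MOCK_MAX_QUEUES);
    return cpu % mq->nr_queues;
}

/* Submit one request from the given CPU; returns the queue it landed on. */
static unsigned int mock_submit(struct mock_mq *mq, unsigned int cpu)
{
    unsigned int q = mock_cpu_to_queue(mq, cpu);

    mq->submitted[q]++;
    return q;
}
```

With four queues, requests submitted from CPU 0 and CPU 5 land on queues 0 and 1, so submitters on different cores do not contend for a single queue.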
blkfront_mq_ops is:
static struct blk_mq_ops blkfront_mq_ops = {
    .queue_rq = blkif_queue_rq,
};
blkif_queue_rq() -> blkif_queue_request() -> blkif_queue_rw_req()
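static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *rinfo):
This is where the request is actually submitted to the backend.
(1) From the req argument, determine the segments the request involves and hence the number of grefs (grant references) needed. If that exceeds the maximum a ring request can carry directly, INDIRECT_GREFS() is used to account for the extra ones.
(2) Store req in the rinfo->shadow array at index id; the same id is recorded in ring_req. Then fill in ring_req from req.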
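The two steps above can be sketched in user space as follows. The constants are assumptions made for this example: one grant per segment, at most 11 segments carried directly in a ring request, 512 segment entries per indirect page, and a ring of 32 entries. The mock_* names are hypothetical, and the slot search is a plain linear scan rather than whatever bookkeeping the driver really uses.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_DIRECT_SEGS     11   /* assumed direct-segment limit */
#define MOCK_GRANTS_PER_INDIRECT 512  /* assumed: 4096-byte page / 8-byte entry */
#define MOCK_RING_SIZE           32   /* assumed number of ring entries */

/* Step (1): grants needed, assuming one grant per segment. */
static unsigned int mock_max_grefs(unsigned int nr_segments)
{
    unsigned int grefs = nr_segments;

    /* Too many segments: add one grant per indirect page (rounded up). */
    if (grefs > MOCK_MAX_DIRECT_SEGS)
        grefs += (grefs + MOCK_GRANTS_PER_INDIRECT - 1) /
                 MOCK_GRANTS_PER_INDIRECT;
    return grefs;
}

struct mock_request { int sector; };

struct mock_shadow {
    const struct mock_request *req; /* original request, kept for completion */
    int in_use;
};

struct mock_ring_info {
    unsigned int in_flight;
    struct mock_shadow shadow[MOCK_RING_SIZE];
};

/* Step (2): remember req under a free id; returns the id or -1 if full. */
static int mock_shadow_store(struct mock_ring_info *rinfo,
                             const struct mock_request *req)
{
    int id;

    assert(rinfo != NULL && req != NULL);
    for (id = 0; id < MOCK_RING_SIZE; id++) {
        if (!rinfo->shadow[id].in_use) {
            rinfo->shadow[id].req = req;
            rinfo->shadow[id].in_use = 1;
            rinfo->in_flight++;
            return id; /* this id would be written into the ring request */
        }
    }
    return -1;
}
```

Because the id travels with the ring request, a response carrying the same id lets the frontend find the original request again in the shadow array.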

1.2 blkback
Initializing the interface: xen_blkif_init() -> xen_blkif_interface_init()
Registering the xenbus backend: xen_blkif_init() -> xen_blkif_xenbus_init() -> xenbus_register_backend(&xen_blkbk_driver) ->
xenbus_register_driver_common(). Frontend registration goes through the same common function, so it is not repeated here.
static struct xenbus_driver xen_blkbk_driver = {
    .ids = xen_blkbk_ids,
    .probe = xen_blkbk_probe,
    .remove = xen_blkbk_remove,
    .otherend_changed = frontend_changed,
};
(1) xen_blkbk_probe() -> xenbus_watch_pathfmt(dev, &be->backend_watch, backend_changed, "%s/%s", dev->nodename, "physical-device")
xenbus_watch_pathfmt() registers backend_changed as the watch callback on the path dev->nodename + "/physical-device".
Open question: is this node written by the hotplug scripts?
backend_changed() runs immediately after the watch is registered; it creates the vbd and connects to the frontend.
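The path formatting and the "fires once right after registration" behaviour can be modelled in user space like this. It is a sketch only: struct mock_watch and mock_watch_pathfmt() are hypothetical and merely imitate the shape of the xenbus_watch_pathfmt() call above, and the node name used in the usage note is an illustrative value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a xenbus watch. */
struct mock_watch {
    char path[128];
    void (*callback)(struct mock_watch *w);
    int fired; /* how often the callback has run */
};

/* Plays the role of backend_changed(); here it only counts invocations. */
static void mock_on_backend_changed(struct mock_watch *w)
{
    w->fired++;
}

/* Build "<nodename>/<leaf>", register the callback, and fire it once. */
static int mock_watch_pathfmt(struct mock_watch *w,
                              void (*cb)(struct mock_watch *),
                              const char *nodename, const char *leaf)
{
    int n;

    assert(w != NULL && cb != NULL && nodename != NULL && leaf != NULL);
    n = snprintf(w->path, sizeof(w->path), "%s/%s", nodename, leaf);
    if (n < 0 || (size_t)n >= sizeof(w->path))
        return -1; /* path did not fit */
    w->callback = cb;
    w->fired = 0;
    w->callback(w); /* model of the initial event after registration */
    return 0;
}
```

For example, registering with the node name "backend/vbd/1/51712" and the leaf "physical-device" yields the path "backend/vbd/1/51712/physical-device" and runs the callback once.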
(2) frontend_changed()
When the frontend's state changes and state == XenbusStateConnected, the frontend and backend are connected:
frontend_changed() -> connect_ring() -> read_per_ring_refs() -> xen_blkif_map()
This registers the ring's event channel and IRQ handler:
err = bind_interdomain_evtchn_to_irqhandler(blkif->domid, evtchn,
                                            xen_blkif_be_int, 0,
                                            "blkif-backend", ring);
xen_blkif_be_int() -> blkif_notify_work()
blkif_notify_work(struct xen_blkif_ring *ring) handles a notification from the frontend, i.e. the guest OS.
It does two things:
ring->waiting_reqs = 1;
wake_up(&ring->wq);
(3) After (2) has run:
frontend_changed() -> xen_update_blkif_status()
xen_update_blkif_status(struct xen_blkif *blkif) starts one thread for every ring in blkif:
// blkif stores the interface-related information
// blkif->rings holds all rings of this device
// nr_rings = number of hardware queues of the frontend, obtained via connect_ring() -> xenbus_read_unsigned()
for (i = 0; i < blkif->nr_rings; i++) {
    ring = &blkif->rings[i];
    ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s-%d", name, i);
    ... /* error handling */
}
As a result, every ring shared between frontend and backend is served by its own xen_blkif_schedule() thread.
int xen_blkif_schedule(void *arg) // arg = ring, as above
(a) Block until the guest OS sends a notification, see (2): blkif_notify_work() sets ring->waiting_reqs and wakes up ring->wq.
timeout = wait_event_interruptible_timeout(ring->wq,
        ring->waiting_reqs || kthread_should_stop(),
        timeout);
(b) Block until a pending_free entry is available, which means the number of in-flight requests has not reached its limit. For how pending_free changes, see (4).
timeout = wait_event_interruptible_timeout(ring->pending_free_wq,
        !list_empty(&ring->pending_free) || kthread_should_stop(),
        timeout);
(c) Call do_block_io_op(ring).
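Steps (a) to (c) can be traced with a single-threaded sketch in which "blocking" is modelled by returning early. Everything here is hypothetical: struct mock_ring reduces the ring to a few counters, and clearing waiting_reqs before processing and setting it again when work is left over is an assumption of this example rather than something stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of a backend ring to a few counters. */
struct mock_ring {
    int waiting_reqs; /* set by the notification handler */
    int pending_free; /* free slots for in-flight requests */
    int queued;       /* requests the frontend has placed on the ring */
    int handled;      /* requests the worker has taken so far */
};

/* Plays the role of blkif_notify_work(); the real one also wakes ring->wq. */
static void mock_notify_work(struct mock_ring *ring)
{
    ring->waiting_reqs = 1;
}

/* One pass of the worker loop: 1 if work was done, 0 if it would sleep. */
static int mock_schedule_once(struct mock_ring *ring)
{
    assert(ring != NULL && ring->pending_free >= 0);
    if (!ring->waiting_reqs)
        return 0; /* (a) no notification yet */
    if (ring->pending_free == 0)
        return 0; /* (b) no free slot */
    ring->waiting_reqs = 0;
    /* (c) take requests while both requests and free slots remain */
    while (ring->queued > 0 && ring->pending_free > 0) {
        ring->queued--;
        ring->pending_free--;
        ring->handled++;
    }
    if (ring->queued > 0)
        ring->waiting_reqs = 1; /* leftover work: come back later */
    return 1;
}
```

With two free slots and three queued requests, the first pass after a notification handles two requests and leaves waiting_reqs set; the third request is only handled once a slot has been freed.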
xen_blkif_schedule()do_block_io_op()
循环调用__do_block_io_op()直到请求处理完成。
xen_blkif_schedule()do_block_io_op()->__do_block_io_op()
rc = blk_rings->common.req_cons ;//backend已经回复frontend最近的下标
rp = blk_rings->common.sring->req_prod;//frontend的请求的下标
通过判断rp == rc则说明这次backend把所有ring中的请求都处理了。在循环中重点做这些事情：
（a）根据blkif的protocol得到ring里面的request，赋值给req
（b）调用dispatch_rw_block_io()或者 dispatch_discard_io() 或者 dispatch_other_io()，这里看正常情况下调用的dispatch_rw_block_io()
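The producer/consumer indices and the loop that runs until rc == rp can be sketched as follows. This is a simplified user-space model with hypothetical names: it uses free-running indices masked by a power-of-two ring size, keeps both indices in one struct, and leaves out the memory barriers and protocol handling a real shared ring needs.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_RING_SIZE 8 /* must be a power of two */

struct mock_req { int op; }; /* 0 = read, anything else = write */

struct mock_ring {
    unsigned int req_prod; /* advanced by the frontend */
    unsigned int req_cons; /* advanced by the backend */
    struct mock_req slots[MOCK_RING_SIZE];
};

/* Frontend side: place one request on the ring; -1 if the ring is full. */
static int mock_produce(struct mock_ring *r, int op)
{
    if (r->req_prod - r->req_cons >= MOCK_RING_SIZE)
        return -1;
    r->slots[r->req_prod & (MOCK_RING_SIZE - 1)].op = op;
    r->req_prod++;
    return 0;
}

/* Backend side: consume until rc == rp; returns the number consumed. */
static unsigned int mock_consume_all(struct mock_ring *r,
                                     unsigned int *reads,
                                     unsigned int *writes)
{
    unsigned int rc = r->req_cons;
    unsigned int rp = r->req_prod;
    unsigned int n = 0;

    assert((MOCK_RING_SIZE & (MOCK_RING_SIZE - 1)) == 0);
    while (rc != rp) {
        /* (a) copy the request out of the ring slot */
        struct mock_req req = r->slots[rc & (MOCK_RING_SIZE - 1)];

        rc++;
        r->req_cons = rc;
        /* (b) dispatch by operation type; here we only count */
        if (req.op == 0)
            (*reads)++;
        else
            (*writes)++;
        n++;
    }
    return n;
}
```

After producing a read, a write and another read, one call to mock_consume_all() consumes all three and leaves req_cons equal to req_prod, so a second call finds nothing to do.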
xen_blkif_schedule()do_block_io_op()__do_block_io_op()dispatch_rw_block_io()
（a）做一些ring的统计，例如ring->st_rd_req++
（b）把req转换成bio类型，特别地，处理了之前通过grefs管理的请求的page
（c） plug住，然后调用submit_bio()把所有的bio传递给通用块层，调用blk_finish_plug()结束传递。
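The effect of plugging in step (c) can be pictured with a conceptual user-space model: submissions are held back while the plug is active and released to the device as one batch when the plug is finished. The mock_* types and functions are hypothetical, and the fixed-size array and the "flush when full" rule are assumptions of this sketch, not a description of the kernel's plugging code.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PLUG_MAX 16

struct mock_bio { int sector; };

/* Hypothetical plug: holds bios until it is flushed. */
struct mock_plug {
    const struct mock_bio *held[MOCK_PLUG_MAX];
    int count;
};

/* Hypothetical device: counts bios received and batches seen. */
struct mock_device {
    int dispatched;
    int batches;
};

static void mock_start_plug(struct mock_plug *plug)
{
    plug->count = 0;
}

/* Hand every held bio to the device as a single batch. */
static void mock_flush(struct mock_device *dev, struct mock_plug *plug)
{
    if (plug->count == 0)
        return;
    dev->dispatched += plug->count;
    dev->batches++;
    plug->count = 0;
}

/* While plugged, a submitted bio is only queued, not dispatched. */
static void mock_submit_bio(struct mock_device *dev, struct mock_plug *plug,
                            const struct mock_bio *bio)
{
    assert(bio != NULL && plug->count <= MOCK_PLUG_MAX);
    if (plug->count == MOCK_PLUG_MAX)
        mock_flush(dev, plug); /* plug is full: release early */
    plug->held[plug->count++] = bio;
}

static void mock_finish_plug(struct mock_device *dev, struct mock_plug *plug)
{
    mock_flush(dev, plug);
}
```

Submitting three bios between mock_start_plug() and mock_finish_plug() leaves the device untouched until the plug is finished, at which point it receives all three in one batch.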

To summarize the kernel side:

0 0