An Analysis of the Xen Grant Table Mechanism
Source: Internet. Editor: 程序博客网. Date: 2024/06/16 20:29
The grant table is Xen's shared-memory mechanism for communication between domains. It only works when the domain and Xen cooperate. The Xen header describes it as follows:
* Xen's grant tables provide a generic mechanism to memory sharing
* between domains. This shared memory interface underpins the split
* device drivers for block and network IO.
*
* Each domain has its own grant table. This is a data structure that
* is shared with Xen; it allows the domain to tell Xen what kind of
* permissions other domains have on its pages. Entries in the grant
* table are identified by grant references. A grant reference is an
* integer, which indexes into the grant table. It acts as a
* capability which the grantee can use to perform operations on the
* granter’s memory.
*
* This capability-based system allows shared-memory communications
* between unprivileged domains. A grant reference also encapsulates
* the details of a shared page, removing the need for a domain to
* know the real machine address of a page it is sharing. This makes
* it possible to share memory correctly with domains running in
* fully virtualised memory.
Let us first look at how a domain operates on the grant table.
The comment in include/xen/interface/grant_table.h describes these operations:
/*
 * Some rough guidelines on accessing and updating grant-table entries
 * in a concurrency-safe manner. For more information, Linux contains a
 * reference implementation for guest OSes (arch/xen/kernel/grant_table.c).
 *
 * NB. WMB is a no-op on current-generation x86 processors. However, a
 *     compiler barrier will still be required.
 *
 * Introducing a valid entry into the grant table:
 *  1. Write ent->domid.
 *  2. Write ent->frame:
 *      GTF_permit_access:   Frame to which access is permitted.
 *      GTF_accept_transfer: Pseudo-phys frame slot being filled by new
 *                           frame, or zero if none.
 *  3. Write memory barrier (WMB).
 *  4. Write ent->flags, inc. valid type.
 *
 * Invalidating an unused GTF_permit_access entry:
 *  1. flags = ent->flags.
 *  2. Observe that !(flags & (GTF_reading|GTF_writing)).
 *  3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 *  NB. No need for WMB as reuse of entry is control-dependent on success of
 *      step 3, and all architectures guarantee ordering of ctrl-dep writes.
 *
 * Invalidating an in-use GTF_permit_access entry:
 *  This cannot be done directly. Request assistance from the domain controller
 *  which can set a timeout on the use of a grant entry and take necessary
 *  action. (NB. This is not yet implemented!).
 *
 * Invalidating an unused GTF_accept_transfer entry:
 *  1. flags = ent->flags.
 *  2. Observe that !(flags & GTF_transfer_committed). [*]
 *  3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 *  NB. No need for WMB as reuse of entry is control-dependent on success of
 *      step 3, and all architectures guarantee ordering of ctrl-dep writes.
 *  [*] If GTF_transfer_committed is set then the grant entry is 'committed'.
 *      The guest must /not/ modify the grant entry until the address of the
 *      transferred frame is written. It is safe for the guest to spin waiting
 *      for this to occur (detect by observing GTF_transfer_completed in
 *      ent->flags).
 *
 * Invalidating a committed GTF_accept_transfer entry:
 *  1. Wait for (ent->flags & GTF_transfer_completed).
 *
 * Changing a GTF_permit_access from writable to read-only:
 *  Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
 *
 * Changing a GTF_permit_access from read-only to writable:
 *  Use SMP-safe bit-setting instruction.
 */
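The first two recipes in the comment above (introducing a valid entry, and invalidating an unused GTF_permit_access entry) can be sketched in user-space C11. This is only an illustration under stated assumptions: struct demo_grant_entry and the demo_* functions are hypothetical stand-ins, the flag values are restated locally rather than pulled from Xen headers, a release store takes the place of the explicit WMB, and the cmpxchg is retried in a loop rather than attempted once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values restated locally so the sketch builds without Xen headers. */
#define GTF_permit_access 1U
#define GTF_readonly      (1U << 2)
#define GTF_reading       (1U << 3)
#define GTF_writing       (1U << 4)

/* Hypothetical stand-in for grant_entry_v1 (same field order and sizes). */
struct demo_grant_entry {
    _Atomic uint16_t flags;   /* written by the guest and by Xen */
    uint16_t domid;           /* domain being granted access */
    uint32_t frame;           /* frame being shared */
};

/* Introduce a valid GTF_permit_access entry (steps 1-4 of the guidelines). */
static void demo_grant_access(struct demo_grant_entry *ent, uint16_t domid,
                              uint32_t frame, bool readonly)
{
    uint16_t flags =
        (uint16_t)(GTF_permit_access | (readonly ? GTF_readonly : 0U));

    assert(ent);
    ent->domid = domid;       /* 1. write ent->domid */
    ent->frame = frame;       /* 2. write ent->frame */
    /* 3 + 4. The release store orders the two writes above before the
     * flags write, which is what the WMB in the guidelines achieves. */
    atomic_store_explicit(&ent->flags, flags, memory_order_release);
}

/* Invalidate an unused entry. Returns false while the entry is still
 * mapped for reading or writing, true once flags was swapped to 0. */
static bool demo_end_access(struct demo_grant_entry *ent)
{
    uint16_t flags;

    assert(ent);
    flags = atomic_load(&ent->flags);                   /* 1. read flags */
    do {
        if (flags & (GTF_reading | GTF_writing))        /* 2. still in use? */
            return false;
        /* 3. cmpxchg; on failure `flags` is reloaded and rechecked. */
    } while (!atomic_compare_exchange_weak(&ent->flags, &flags, 0));
    return true;
}
```

A caller would treat a false return from demo_end_access as "the peer still has the page mapped" and retry later instead of reusing the entry.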
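grant_entry is a struct that holds the sharing information for one page. We only analyze the v1 grant_entry struct here. A domain's grant table is an array of grant entries. The index of an entry in that array is a uint32_t that serves as the grant reference, abbreviated GR.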
/*
 * Reference to a grant entry in a specified domain's grant table.
 */
typedef uint32_t grant_ref_t;

/*
 * A grant table comprises a packed array of grant entries in one or more
 * page frames shared between Xen and a guest.
 * [XEN]: This field is written by Xen and read by the sharing guest.
 * [GST]: This field is written by the guest and read by Xen.
 */

/*
 * Version 1 of the grant table entry structure is maintained purely
 * for backwards compatibility.  New guests should use version 2.
 */
struct grant_entry_v1 {
    /* GTF_xxx: various type and flag information.  [XEN,GST] */
    uint16_t flags;
    /* The domain being granted foreign privileges. [GST] */
    domid_t  domid;
    /*
     * GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
     * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
     */
    uint32_t frame;
};

The flags field of grant_entry records the type of the grant entry. The two most common types are GTF_permit_access and GTF_accept_transfer. With GTF_permit_access, the domain sharing the page specifies which domain (domid) may access it, for reading and writing, and which page frame (frame) is covered. With GTF_accept_transfer, the domain that owns the entry agrees to accept a page whose ownership is transferred to it by domid.
The flags field of grant_entry also records the current state of the grant entry, for example:
/*
 * Subflags for GTF_permit_access.
 *  GTF_readonly: Restrict @domid to read-only mappings and accesses. [GST]
 *  GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN]
 *  GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN]
 *  GTF_sub_page: Grant access to only a subrange of the page.  @domid
 *                will only be allowed to copy from the grant, and not
 *                map it. [GST]
 */
#define _GTF_readonly          (2)
#define GTF_readonly           (1U<<_GTF_readonly)
#define _GTF_reading           (3)
#define GTF_reading            (1U<<_GTF_reading)
#define _GTF_writing           (4)
#define GTF_writing            (1U<<_GTF_writing)
#define _GTF_sub_page          (8)
#define GTF_sub_page           (1U<<_GTF_sub_page)

/*
 * Subflags for GTF_accept_transfer:
 *  GTF_transfer_committed: Xen sets this flag to indicate that it is committed
 *      to transferring ownership of a page frame. When a guest sees this flag
 *      it must /not/ modify the grant entry until GTF_transfer_completed is
 *      set by Xen.
 *  GTF_transfer_completed: It is safe for the guest to spin-wait on this flag
 *      after reading GTF_transfer_committed. Xen will always write the frame
 *      address, followed by ORing this flag, in a timely manner.
 */
#define _GTF_transfer_committed (2)
#define GTF_transfer_committed  (1U<<_GTF_transfer_committed)
#define _GTF_transfer_completed (3)
#define GTF_transfer_completed  (1U<<_GTF_transfer_completed)
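Xen defines struct grant_table to hold each domain's grant table. For mapping-type grant entries, Xen uses an active_grant_entry to track changes to the mapping. The domain itself has no grant_table struct; it obtains its own grant table by mapping memory pages provided by Xen.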
/* Per-domain grant information. */
struct grant_table {
    /* Table size. Number of frames shared with guest */
    unsigned int          nr_grant_frames;
    /* Shared grant table (see include/public/grant_table.h). */
    union {
        void **shared_raw;
        struct grant_entry_v1 **shared_v1;
        union grant_entry_v2 **shared_v2;
    };
    /* Number of grant status frames shared with guest (for version 2) */
    unsigned int          nr_status_frames;
    /* State grant table (see include/public/grant_table.h). */
    grant_status_t       **status;
    /* Active grant table. */
    struct active_grant_entry **active;
    /* Mapping tracking table. */
    struct grant_mapping **maptrack;
    unsigned int          maptrack_head;
    unsigned int          maptrack_limit;
    /* Lock protecting updates to active and shared grant tables. */
    spinlock_t            lock;
    /* The defined versions are 1 and 2.  Set to 0 if we don't know
       what version to use yet. */
    unsigned              gt_version;
};

 /* Count of writable host-CPU mappings. */
#define GNTPIN_hstw_shift    (0)
#define GNTPIN_hstw_inc      (1 << GNTPIN_hstw_shift)
#define GNTPIN_hstw_mask     (0xFFU << GNTPIN_hstw_shift)
 /* Count of read-only host-CPU mappings. */
#define GNTPIN_hstr_shift    (8)
#define GNTPIN_hstr_inc      (1 << GNTPIN_hstr_shift)
#define GNTPIN_hstr_mask     (0xFFU << GNTPIN_hstr_shift)
 /* Count of writable device-bus mappings. */
#define GNTPIN_devw_shift    (16)
#define GNTPIN_devw_inc      (1 << GNTPIN_devw_shift)
#define GNTPIN_devw_mask     (0xFFU << GNTPIN_devw_shift)
 /* Count of read-only device-bus mappings. */
#define GNTPIN_devr_shift    (24)
#define GNTPIN_devr_inc      (1 << GNTPIN_devr_shift)
#define GNTPIN_devr_mask     (0xFFU << GNTPIN_devr_shift)

/* Active grant entry - used for shadowing GTF_permit_access grants. */
struct active_grant_entry {
    u32           pin;    /* Reference count information.             */
    domid_t       domid;  /* Domain being granted access.             */
    struct domain *trans_domain;
    uint32_t      trans_gref;
    unsigned long frame;  /* Frame being granted.                     */
    unsigned long gfn;    /* Guest's idea of the frame being granted. */
    unsigned      is_sub_page:1; /* True if this is a sub-page grant. */
    unsigned      start:15; /* For sub-page grants, the start offset
                               in the page.                           */
    unsigned      length:16; /* For sub-page grants, the length of the
                                grant.                                */
};

/*
 * Tracks a mapping of another domain's grant reference. Each domain has a
 * table of these, indexes into which are returned as a 'mapping handle'.
 */
struct grant_mapping {
    u32      ref;           /* grant ref */
    u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
    domid_t  domid;         /* granting domain */
};
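The pin field of active_grant_entry packs four 8-bit mapping counters into one 32-bit word, laid out by the GNTPIN_* macros. The sketch below shows how such a packed counter can be incremented and read. The shift and mask values come from the listing above; the demo_* helpers are hypothetical and are not Xen functions, and the sketch uses 1U where the listing uses 1.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the GNTPIN_* macros in the listing. */
#define GNTPIN_hstw_shift 0
#define GNTPIN_hstw_inc   (1U << GNTPIN_hstw_shift)
#define GNTPIN_hstw_mask  (0xFFU << GNTPIN_hstw_shift)
#define GNTPIN_hstr_shift 8
#define GNTPIN_hstr_inc   (1U << GNTPIN_hstr_shift)
#define GNTPIN_hstr_mask  (0xFFU << GNTPIN_hstr_shift)
#define GNTPIN_devw_shift 16
#define GNTPIN_devw_inc   (1U << GNTPIN_devw_shift)
#define GNTPIN_devw_mask  (0xFFU << GNTPIN_devw_shift)
#define GNTPIN_devr_shift 24
#define GNTPIN_devr_inc   (1U << GNTPIN_devr_shift)
#define GNTPIN_devr_mask  (0xFFU << GNTPIN_devr_shift)

/* Hypothetical helper: bump one counter, refusing to overflow its
 * 8-bit field into the neighbouring counter. Returns 0 or -1. */
static int demo_pin_add(uint32_t *pin, uint32_t inc, uint32_t mask)
{
    assert(pin);
    if ((*pin & mask) == mask)
        return -1;              /* counter already at 255 */
    *pin += inc;
    return 0;
}

/* Hypothetical helper: read one counter back out of the packed word. */
static unsigned int demo_pin_count(uint32_t pin, uint32_t mask,
                                   unsigned int shift)
{
    return (pin & mask) >> shift;
}

/* Hypothetical helper: non-zero if any writable mapping exists,
 * whether from a host CPU or from a device. */
static int demo_pin_has_writers(uint32_t pin)
{
    return (pin & (GNTPIN_hstw_mask | GNTPIN_devw_mask)) != 0;
}
```

Two read-only host mappings plus one writable device mapping give pin == 0x00010200, so a single mask test is enough to tell whether the frame is currently mapped writable.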
Xen executes grant-table hypercalls through do_grant_table_op. We focus on the following operations: GNTTABOP_map_grant_ref, GNTTABOP_unmap_grant_ref, GNTTABOP_transfer, and GNTTABOP_copy.
GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref map and unmap a GR.
/*
 * GNTTABOP_map_grant_ref: Map the grant entry (<dom>,<ref>) for access
 * by devices and/or host CPUs. If successful, <handle> is a tracking number
 * that must be presented later to destroy the mapping(s). On error, <handle>
 * is a negative status code.
 * NOTES:
 *  1. If GNTMAP_device_map is specified then <dev_bus_addr> is the address
 *     via which I/O devices may access the granted frame.
 *  2. If GNTMAP_host_map is specified then a mapping will be added at
 *     either a host virtual address in the current address space, or at
 *     a PTE at the specified machine address.  The type of mapping to
 *     perform is selected through the GNTMAP_contains_pte flag, and the
 *     address is specified in <host_addr>.
 *  3. Mappings should only be destroyed via GNTTABOP_unmap_grant_ref. If a
 *     host mapping is destroyed by other means then it is *NOT* guaranteed
 *     to be accounted to the correct grant reference!
 */
struct gnttab_map_grant_ref {
    /* IN parameters. */
    uint64_t host_addr;
    uint32_t flags;               /* GNTMAP_* */
    grant_ref_t ref;
    domid_t  dom;                 /* remote domain */
    /* OUT parameters. */
    int16_t  status;              /* => enum grant_status */
    grant_handle_t handle;
    uint64_t dev_bus_addr;
};
typedef struct gnttab_map_grant_ref gnttab_map_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t);
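The flags field is defined along two dimensions. GNTMAP_device_map and GNTMAP_host_map say whether the mapping is for access by I/O devices (e.g. DMA) or for ordinary memory access by host CPUs. GNTMAP_application_map says whether the mapped page may be accessed by user-space programs in the mapping domain. GNTMAP_contains_pte selects how host_addr is interpreted: as noted in the comment above, the mapping is added either at a host virtual address or at a PTE at the given machine address.
Now let us look at the implementation of gnttab_map_grant_ref: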
static long
gnttab_map_grant_ref(
    XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) uop, unsigned int count)
{
    int i;
    struct gnttab_map_grant_ref op;

    for ( i = 0; i < count; i++ )
    {
        if (i && hypercall_preempt_check())
            return i;
        if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
            return -EFAULT;
        __gnttab_map_grant_ref(&op);
        if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
            return -EFAULT;
    }

    return 0;
}

The __copy_from_guest_offset and __copy_to_guest_offset macros copy arguments from the guest into Xen and from Xen back to the guest. In gnttab_map_grant_ref the guest passes an array of count gnttab_map_grant_ref structures, and each iteration copies exactly one of them, selected by the offset i. If preemption is pending, the function returns the number of operations already processed.
#define __copy_to_guest_offset(hnd, off, ptr, nr) ({    \
    const typeof(*(ptr)) *_s = (ptr);                   \
    char (*_d)[sizeof(*_s)] = (void *)(hnd).p;          \
    ((void)((hnd).p == (ptr)));                         \
    __raw_copy_to_guest(_d+(off), _s, sizeof(*_s)*(nr));\
})

#define __copy_from_guest_offset(ptr, hnd, off, nr) ({  \
    const typeof(*(ptr)) *_s = (hnd).p;                 \
    typeof(*(ptr)) *_d = (ptr);                         \
    __raw_copy_from_guest(_d, _s+(off), sizeof(*_d)*(nr));\
})

The XEN_GUEST_HANDLE_PARAM macro was introduced to distinguish the two ways a guest passes pointers to Xen. A pointer passed directly as a hypercall argument is wrapped with XEN_GUEST_HANDLE_PARAM; otherwise it is wrapped with XEN_GUEST_HANDLE. See http://lists.xen.org/archives/html/xen-devel/2012-08/msg01324.html
include/public/arch-x86/xen.h contains the macro definitions of XEN_GUEST_HANDLE and XEN_GUEST_HANDLE_PARAM. On x86 the two are identical.
#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
    typedef struct { type *p; } __guest_handle_ ## name

/*
 * XEN_GUEST_HANDLE represents a guest pointer, when passed as a field
 * in a struct in memory.
 * XEN_GUEST_HANDLE_PARAM represent a guest pointer, when passed as an
 * hypercall argument.
 * XEN_GUEST_HANDLE_PARAM and XEN_GUEST_HANDLE are the same on X86 but
 * they might not be on other architectures.
 */
#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
    ___DEFINE_XEN_GUEST_HANDLE(name, type);   \
    ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
#define DEFINE_XEN_GUEST_HANDLE(name)   __DEFINE_XEN_GUEST_HANDLE(name, name)
#define __XEN_GUEST_HANDLE(name)        __guest_handle_ ## name
#define XEN_GUEST_HANDLE(name)          __XEN_GUEST_HANDLE(name)
#define XEN_GUEST_HANDLE_PARAM(name)    XEN_GUEST_HANDLE(name)

So XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) expands to the struct type __guest_handle_gnttab_map_grant_ref_t. DEFINE_XEN_GUEST_HANDLE produces that type and its const counterpart:
typedef struct { gnttab_map_grant_ref_t *p; } __guest_handle_gnttab_map_grant_ref_t;
typedef struct { const gnttab_map_grant_ref_t *p; } __guest_handle_const_gnttab_map_grant_ref_t;
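To make the macro expansion concrete, the x86 handle macros can be compiled in user space against a made-up type. In the sketch below the macros are the ones quoted in this article, while demo_op_t and demo_sum_refs are hypothetical. One important caveat: inside Xen a handle holds a guest address that must go through the copy_from_guest family, whereas here the pointer is an ordinary pointer and is dereferenced directly.

```c
#include <assert.h>
#include <stddef.h>

/* Handle macros as quoted from the x86 header in this article. */
#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
    typedef struct { type *p; } __guest_handle_ ## name
#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
    ___DEFINE_XEN_GUEST_HANDLE(name, type);   \
    ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
#define DEFINE_XEN_GUEST_HANDLE(name)   __DEFINE_XEN_GUEST_HANDLE(name, name)
#define __XEN_GUEST_HANDLE(name)        __guest_handle_ ## name
#define XEN_GUEST_HANDLE(name)          __XEN_GUEST_HANDLE(name)
#define XEN_GUEST_HANDLE_PARAM(name)    XEN_GUEST_HANDLE(name)

/* Hypothetical operation type standing in for gnttab_map_grant_ref_t. */
typedef struct demo_op {
    int ref;
    int status;
} demo_op_t;

/* Emits __guest_handle_demo_op_t and __guest_handle_const_demo_op_t. */
DEFINE_XEN_GUEST_HANDLE(demo_op_t);

/* Takes a handle the way a hypercall handler would, then walks the
 * array behind it. Direct dereference is valid only in this sketch. */
static int demo_sum_refs(XEN_GUEST_HANDLE_PARAM(demo_op_t) uop, size_t count)
{
    int sum = 0;

    assert(uop.p != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        sum += uop.p[i].ref;
    return sum;
}
```

Wrapping the pointer in a one-member struct costs nothing at run time but gives every handle a distinct type, so passing a handle of the wrong operation type fails to compile.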
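The mapping itself is done by __gnttab_map_grant_ref, which will be analyzed later.
GNTTABOP_unmap_grant_ref tears down a mapping created earlier. Note that the TLB must be flushed after unmapping, which is done by calling flush_tlb_mask.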
/*
 * GNTTABOP_unmap_grant_ref: Destroy one or more grant-reference mappings
 * tracked by <handle>. If <host_addr> or <dev_bus_addr> is zero, that
 * field is ignored. If non-zero, they must refer to a device/host mapping
 * that is tracked by <handle>
 * NOTES:
 *  1. The call may fail in an undefined manner if either mapping is not
 *     tracked by <handle>.
 *  3. After executing a batch of unmaps, it is guaranteed that no stale
 *     mappings will remain in the device or host TLBs.
 */
struct gnttab_unmap_grant_ref {
    /* IN parameters. */
    uint64_t host_addr;
    uint64_t dev_bus_addr;
    grant_handle_t handle;
    /* OUT parameters. */
    int16_t  status;              /* => enum grant_status */
};
typedef struct gnttab_unmap_grant_ref gnttab_unmap_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t);

static long
gnttab_unmap_grant_ref(
    XEN_GUEST_HANDLE_PARAM(gnttab_unmap_grant_ref_t) uop, unsigned int count)
{
    int i, c, partial_done, done = 0;
    struct gnttab_unmap_grant_ref op;
    struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];

    while ( count != 0 )
    {
        c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
        partial_done = 0;

        for ( i = 0; i < c; i++ )
        {
            if ( unlikely(__copy_from_guest(&op, uop, 1)) )
                goto fault;
            __gnttab_unmap_grant_ref(&op, &(common[i]));
            ++partial_done;
            if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
                goto fault;
            guest_handle_add_offset(uop, 1);
        }

        flush_tlb_mask(current->domain->domain_dirty_cpumask);

        for ( i = 0; i < partial_done; i++ )
            __gnttab_unmap_common_complete(&(common[i]));

        count -= c;
        done += c;

        if (count && hypercall_preempt_check())
            return done;
    }

    return 0;

fault:
    flush_tlb_mask(current->domain->domain_dirty_cpumask);

    for ( i = 0; i < partial_done; i++ )
        __gnttab_unmap_common_complete(&(common[i]));
    return -EFAULT;
}
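The structure of gnttab_unmap_grant_ref is worth isolating: mappings are torn down in batches, the TLB is flushed once per batch, and only after the flush is each unmap completed. The counting model below is a rough user-space illustration of that ordering, not Xen code. DEMO_BATCH_SIZE is a hypothetical stand-in for GNTTAB_UNMAP_BATCH_SIZE, and plain counters replace the real unmap, flush, and completion work.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH_SIZE 4   /* hypothetical stand-in for the batch size */

/* Counters that model what the three phases have done so far. */
struct demo_stats {
    unsigned int unmapped;   /* mappings torn down (phase 1) */
    unsigned int flushes;    /* TLB flushes issued (phase 2) */
    unsigned int covered;    /* unmaps covered by a flush so far */
    unsigned int completed;  /* unmaps completed (phase 3) */
};

static void demo_unmap_batched(size_t count, struct demo_stats *st)
{
    assert(st);
    while (count != 0) {
        size_t c = count < DEMO_BATCH_SIZE ? count : DEMO_BATCH_SIZE;

        /* Phase 1: remove the mappings of this batch. */
        for (size_t i = 0; i < c; i++)
            st->unmapped++;

        /* Phase 2: one flush covers every unmap in the batch. */
        st->flushes++;
        st->covered = st->unmapped;

        /* Phase 3: complete an unmap only once a flush has covered it,
         * so no stale TLB entry can still reach the frame. */
        for (size_t i = 0; i < c; i++) {
            assert(st->completed < st->covered);
            st->completed++;
        }

        count -= c;
    }
}
```

With ten operations and a batch size of four, the model issues three flushes instead of ten, which is the saving the batching is after.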
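GNTTABOP_transfer (called GNTTABOP_transfer_grant_ref in the header comment) hands a page from a source domain to a target domain. Unlike map/unmap, the source domain permanently loses the page once the transfer is done. First the target domain sets up a GR whose flags contain GTF_accept_transfer and whose domid is the source domain. This GR states that the target domain has agreed to receive a page transferred from the source domain. The source domain then starts the transfer through gnttab_transfer.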
/*
 * GNTTABOP_transfer_grant_ref: Transfer <frame> to a foreign domain. The
 * foreign domain has previously registered its interest in the transfer via
 * <domid, ref>.
 *
 * Note that, even if the transfer fails, the specified page no longer belongs
 * to the calling domain *unless* the error is GNTST_bad_page.
 */
struct gnttab_transfer {
    /* IN parameters. */
    xen_pfn_t     mfn;
    domid_t       domid;
    grant_ref_t   ref;
    /* OUT parameters. */
    int16_t       status;
};
typedef struct gnttab_transfer gnttab_transfer_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_transfer_t);
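GNTTABOP_copy copies memory contents from a source domain into a target domain. Xen is well placed to do this because the hypervisor can see the memory layout of every domain. The operation also needs no TLB flush, so it is not necessarily more expensive than map. One approach pays for locking on the CPU memory bus, the other for flushing the CPU TLB, and it is hard to say which costs more. On Intel SNB with NUMA support, latency between CPU and memory is lower and synchronization overhead is smaller, so in this author's opinion copy may even be cheaper than map.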
/*
 * GNTTABOP_copy: Hypervisor based copy
 * source and destinations can be eithers MFNs or, for foreign domains,
 * grant references. the foreign domain has to grant read/write access
 * in its grant table.
 *
 * The flags specify what type source and destinations are (either MFN
 * or grant reference).
 *
 * Note that this can also be used to copy data between two domains
 * via a third party if the source and destination domains had previously
 * grant appropriate access to their pages to the third party.
 *
 * source_offset specifies an offset in the source frame, dest_offset
 * the offset in the target frame and  len specifies the number of
 * bytes to be copied.
 */
#define _GNTCOPY_source_gref      (0)
#define GNTCOPY_source_gref       (1<<_GNTCOPY_source_gref)
#define _GNTCOPY_dest_gref        (1)
#define GNTCOPY_dest_gref         (1<<_GNTCOPY_dest_gref)

struct gnttab_copy {
    /* IN parameters. */
    struct {
        union {
            grant_ref_t ref;
            xen_pfn_t   gmfn;
        } u;
        domid_t  domid;
        uint16_t offset;
    } source, dest;
    uint16_t      len;
    uint16_t      flags;          /* GNTCOPY_* */
    /* OUT parameters. */
    int16_t       status;
};
typedef struct gnttab_copy gnttab_copy_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_copy_t);
gnttab_copy calls __gnttab_copy, which ultimately performs the whole copy with memcpy. That function will be analyzed in detail later.
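Since the copy ends in a memcpy between two frames at given offsets, its core can be modelled in a few lines. The sketch below is a simplified user-space model, not __gnttab_copy itself. It assumes both pages are already resolved to ordinary buffers of 4096 bytes, that source and destination are distinct pages, and that an out-of-range request is rejected. The struct, function, and status names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed frame size */

/* Hypothetical status codes for this sketch only. */
enum { DEMO_OK = 0, DEMO_BAD_RANGE = -1 };

/* One side of the copy: a page plus a byte offset into it, mirroring
 * the offset field of the source/dest members of struct gnttab_copy. */
struct demo_copy_seg {
    uint8_t *page;
    uint16_t offset;
};

/* Copy len bytes from src to dst, refusing any request that would run
 * past the end of either page. Pages must not overlap. */
static int demo_grant_copy(struct demo_copy_seg dst,
                           struct demo_copy_seg src, uint16_t len)
{
    assert(dst.page && src.page);
    if ((uint32_t)src.offset + len > DEMO_PAGE_SIZE ||
        (uint32_t)dst.offset + len > DEMO_PAGE_SIZE)
        return DEMO_BAD_RANGE;

    memcpy(dst.page + dst.offset, src.page + src.offset, len);
    return DEMO_OK;
}
```

Because offset and len are both 16-bit, the sum is widened to 32 bits before the comparison so that it cannot wrap around.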