Analysis of the Xen grant table mechanism


The grant table is Xen's shared-memory mechanism for communication between domains. It only works when the domain and Xen cooperate. The header file describes it as follows:

 * Xen's grant tables provide a generic mechanism to memory sharing
 * between domains. This shared memory interface underpins the split
 * device drivers for block and network IO.
 *
 * Each domain has its own grant table. This is a data structure that
 * is shared with Xen; it allows the domain to tell Xen what kind of
 * permissions other domains have on its pages. Entries in the grant
 * table are identified by grant references. A grant reference is an
 * integer, which indexes into the grant table. It acts as a
 * capability which the grantee can use to perform operations on the
 * granter’s memory.
 *
 * This capability-based system allows shared-memory communications
 * between unprivileged domains. A grant reference also encapsulates
 * the details of a shared page, removing the need for a domain to
 * know the real machine address of a page it is sharing. This makes
 * it possible to share memory correctly with domains running in
 * fully virtualised memory.

Let us first look at how a domain operates on its grant table.

The comments in include/xen/interface/grant_table.h describe how grant table entries should be accessed and updated:

/*
 * Some rough guidelines on accessing and updating grant-table entries
 * in a concurrency-safe manner. For more information, Linux contains a
 * reference implementation for guest OSes (arch/xen/kernel/grant_table.c).
 *
 * NB. WMB is a no-op on current-generation x86 processors. However, a
 *     compiler barrier will still be required.
 *
 * Introducing a valid entry into the grant table:
 *  1. Write ent->domid.
 *  2. Write ent->frame:
 *      GTF_permit_access:   Frame to which access is permitted.
 *      GTF_accept_transfer: Pseudo-phys frame slot being filled by new
 *                           frame, or zero if none.
 *  3. Write memory barrier (WMB).
 *  4. Write ent->flags, inc. valid type.
 *
 * Invalidating an unused GTF_permit_access entry:
 *  1. flags = ent->flags.
 *  2. Observe that !(flags & (GTF_reading|GTF_writing)).
 *  3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 *  NB. No need for WMB as reuse of entry is control-dependent on success of
 *      step 3, and all architectures guarantee ordering of ctrl-dep writes.
 *
 * Invalidating an in-use GTF_permit_access entry:
 *  This cannot be done directly. Request assistance from the domain controller
 *  which can set a timeout on the use of a grant entry and take necessary
 *  action. (NB. This is not yet implemented!).
 *
 * Invalidating an unused GTF_accept_transfer entry:
 *  1. flags = ent->flags.
 *  2. Observe that !(flags & GTF_transfer_committed). [*]
 *  3. Check result of SMP-safe CMPXCHG(&ent->flags, flags, 0).
 *  NB. No need for WMB as reuse of entry is control-dependent on success of
 *      step 3, and all architectures guarantee ordering of ctrl-dep writes.
 *  [*] If GTF_transfer_committed is set then the grant entry is 'committed'.
 *      The guest must /not/ modify the grant entry until the address of the
 *      transferred frame is written. It is safe for the guest to spin waiting
 *      for this to occur (detect by observing GTF_transfer_completed in
 *      ent->flags).
 *
 * Invalidating a committed GTF_accept_transfer entry:
 *  1. Wait for (ent->flags & GTF_transfer_completed).
 *
 * Changing a GTF_permit_access from writable to read-only:
 *  Use SMP-safe CMPXCHG to set GTF_readonly, while checking !GTF_writing.
 *
 * Changing a GTF_permit_access from read-only to writable:
 *  Use SMP-safe bit-setting instruction.
 */
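The first two recipes in that comment (introducing a valid entry, and invalidating an unused GTF_permit_access entry) can be sketched in portable C11. This is only an illustration: struct demo_entry and the demo_* functions are hypothetical names, the value of GTF_permit_access is assumed from the public header rather than quoted in this article, and C11 atomics stand in for the WMB and CMPXCHG primitives a real guest kernel would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GTF_permit_access 1U          /* assumed value of the type field */
#define GTF_reading       (1U << 3)
#define GTF_writing       (1U << 4)

/* Hypothetical v1-shaped entry; flags is atomic so the sketch stays portable. */
struct demo_entry {
    _Atomic uint16_t flags;
    uint16_t domid;
    uint32_t frame;
};

/* Introduce a valid entry: domid and frame first, barrier, flags last. */
static void demo_grant_access(struct demo_entry *ent, uint16_t domid, uint32_t frame)
{
    assert(ent != NULL);
    ent->domid = domid;                               /* step 1 */
    ent->frame = frame;                               /* step 2 */
    atomic_thread_fence(memory_order_release);        /* step 3: plays the role of WMB */
    atomic_store_explicit(&ent->flags, GTF_permit_access,
                          memory_order_relaxed);      /* step 4 */
}

/* Invalidate an unused entry; returns false while the grantee still maps it. */
static bool demo_end_access(struct demo_entry *ent)
{
    uint16_t flags = atomic_load(&ent->flags);
    do {
        if (flags & (GTF_reading | GTF_writing))
            return false;
        /* On failure the compare-exchange reloads flags, and we re-check. */
    } while (!atomic_compare_exchange_weak(&ent->flags, &flags, 0));
    return true;
}
```

Writing flags last matters because Xen treats the entry as live as soon as it sees a valid type, so domid and frame must already be visible by then.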

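grant_entry is a structure that describes how one page is shared. We only analyze version 1 of the structure here. A domain's grant table is an array of grant entries, and the index of an entry in that array, a uint32_t, serves as its grant reference, or GR for short.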

/*
 * Reference to a grant entry in a specified domain's grant table.
 */
typedef uint32_t grant_ref_t;

/*
 * A grant table comprises a packed array of grant entries in one or more
 * page frames shared between Xen and a guest.
 * [XEN]: This field is written by Xen and read by the sharing guest.
 * [GST]: This field is written by the guest and read by Xen.
 */

/*
 * Version 1 of the grant table entry structure is maintained purely
 * for backwards compatibility.  New guests should use version 2.
 */
struct grant_entry_v1 {
    /* GTF_xxx: various type and flag information.  [XEN,GST] */
    uint16_t flags;
    /* The domain being granted foreign privileges. [GST] */
    domid_t  domid;
    /*
     * GTF_permit_access: Frame that @domid is allowed to map and access. [GST]
     * GTF_accept_transfer: Frame whose ownership transferred by @domid. [XEN]
     */
    uint32_t frame;
};
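Because a GR is just an array index and the table is spread over one or more page frames, locating an entry comes down to a division and a remainder. The sketch below assumes 4 KiB pages and a table stored as an array of per-frame pointers; demo_entry_for_ref and the DEMO_* constants are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

typedef uint32_t grant_ref_t;
typedef uint16_t domid_t;

struct grant_entry_v1 {
    uint16_t flags;
    domid_t  domid;
    uint32_t frame;
};

static_assert(sizeof(struct grant_entry_v1) == 8, "v1 entries are 8 bytes");

#define DEMO_ENTRIES_PER_FRAME (DEMO_PAGE_SIZE / sizeof(struct grant_entry_v1))

/* Hypothetical helper: find the v1 entry that a grant reference names. */
static struct grant_entry_v1 *
demo_entry_for_ref(struct grant_entry_v1 **frames, unsigned int nr_frames,
                   grant_ref_t ref)
{
    size_t frame_idx = ref / DEMO_ENTRIES_PER_FRAME;  /* which shared frame */
    size_t slot      = ref % DEMO_ENTRIES_PER_FRAME;  /* which entry inside it */

    if (frame_idx >= nr_frames)
        return NULL;                                  /* reference out of range */
    return &frames[frame_idx][slot];
}
```

With these assumptions each frame holds 512 entries, so reference 513 is the second entry of the second frame.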
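The flags field of grant_entry records the type of the entry. The two most common types are GTF_permit_access and GTF_accept_transfer. With GTF_permit_access, the domain sharing the page specifies which domain (domid) is authorized to access it, for reading and writing, and which page frame (frame) it may access. With GTF_accept_transfer, the domain owning the entry agrees to accept a page whose ownership is transferred to it by domid.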

The flags field also records the current state of the grant entry, e.g.

/*
 * Subflags for GTF_permit_access.
 *  GTF_readonly: Restrict @domid to read-only mappings and accesses. [GST]
 *  GTF_reading: Grant entry is currently mapped for reading by @domid. [XEN]
 *  GTF_writing: Grant entry is currently mapped for writing by @domid. [XEN]
 *  GTF_sub_page: Grant access to only a subrange of the page.  @domid
 *                will only be allowed to copy from the grant, and not
 *                map it. [GST]
 */
#define _GTF_readonly       (2)
#define GTF_readonly        (1U<<_GTF_readonly)
#define _GTF_reading        (3)
#define GTF_reading         (1U<<_GTF_reading)
#define _GTF_writing        (4)
#define GTF_writing         (1U<<_GTF_writing)
#define _GTF_sub_page       (8)
#define GTF_sub_page        (1U<<_GTF_sub_page)

/*
 * Subflags for GTF_accept_transfer:
 *  GTF_transfer_committed: Xen sets this flag to indicate that it is committed
 *      to transferring ownership of a page frame. When a guest sees this flag
 *      it must /not/ modify the grant entry until GTF_transfer_completed is
 *      set by Xen.
 *  GTF_transfer_completed: It is safe for the guest to spin-wait on this flag
 *      after reading GTF_transfer_committed. Xen will always write the frame
 *      address, followed by ORing this flag, in a timely manner.
 */
#define _GTF_transfer_committed (2)
#define GTF_transfer_committed  (1U<<_GTF_transfer_committed)
#define _GTF_transfer_completed (3)
#define GTF_transfer_completed  (1U<<_GTF_transfer_completed)
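Bits 2 and 3 are reused by the two entry types, so a subflag can only be interpreted after the type has been checked. The sketch below shows one way to classify a flags value. The type-field values (GTF_invalid, GTF_permit_access, GTF_accept_transfer, GTF_type_mask) are assumed from the public header rather than quoted in this article, and enum demo_state and demo_classify are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define GTF_invalid             0U   /* assumed type-field values */
#define GTF_permit_access       1U
#define GTF_accept_transfer     2U
#define GTF_type_mask           3U
#define GTF_readonly            (1U << 2)
#define GTF_reading             (1U << 3)
#define GTF_writing             (1U << 4)
#define GTF_transfer_committed  (1U << 2)
#define GTF_transfer_completed  (1U << 3)

/* The same bit means different things depending on the entry type. */
static_assert(GTF_readonly == GTF_transfer_committed, "bit 2 is shared");
static_assert(GTF_reading == GTF_transfer_completed, "bit 3 is shared");

enum demo_state {
    DEMO_INVALID,
    DEMO_ACCESS_IDLE, DEMO_ACCESS_READING, DEMO_ACCESS_WRITING,
    DEMO_XFER_WAITING, DEMO_XFER_COMMITTED, DEMO_XFER_COMPLETED
};

/* Classify a flags value: look at the type first, then at its subflags. */
static enum demo_state demo_classify(uint16_t flags)
{
    switch (flags & GTF_type_mask) {
    case GTF_permit_access:
        if (flags & GTF_writing) return DEMO_ACCESS_WRITING;
        if (flags & GTF_reading) return DEMO_ACCESS_READING;
        return DEMO_ACCESS_IDLE;
    case GTF_accept_transfer:
        if (flags & GTF_transfer_completed) return DEMO_XFER_COMPLETED;
        if (flags & GTF_transfer_committed) return DEMO_XFER_COMMITTED;
        return DEMO_XFER_WAITING;
    default:
        return DEMO_INVALID;   /* GTF_invalid, or a type this sketch ignores */
    }
}
```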

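Xen defines struct grant_table to hold each domain's grant table. For mapping-type grant entries (GTF_permit_access), Xen uses an active_grant_entry to track changes to the mapping. The domain itself has no grant_table structure; it obtains its own grant table by mapping memory pages provided by Xen.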

/* Per-domain grant information. */
struct grant_table {
    /* Table size. Number of frames shared with guest */
    unsigned int          nr_grant_frames;
    /* Shared grant table (see include/public/grant_table.h). */
    union {
        void **shared_raw;
        struct grant_entry_v1 **shared_v1;
        union grant_entry_v2 **shared_v2;
    };
    /* Number of grant status frames shared with guest (for version 2) */
    unsigned int          nr_status_frames;
    /* State grant table (see include/public/grant_table.h). */
    grant_status_t       **status;
    /* Active grant table. */
    struct active_grant_entry **active;
    /* Mapping tracking table. */
    struct grant_mapping **maptrack;
    unsigned int          maptrack_head;
    unsigned int          maptrack_limit;
    /* Lock protecting updates to active and shared grant tables. */
    spinlock_t            lock;
    /* The defined versions are 1 and 2.  Set to 0 if we don't know
       what version to use yet. */
    unsigned              gt_version;
};

 /* Count of writable host-CPU mappings. */
#define GNTPIN_hstw_shift    (0)
#define GNTPIN_hstw_inc      (1 << GNTPIN_hstw_shift)
#define GNTPIN_hstw_mask     (0xFFU << GNTPIN_hstw_shift)
 /* Count of read-only host-CPU mappings. */
#define GNTPIN_hstr_shift    (8)
#define GNTPIN_hstr_inc      (1 << GNTPIN_hstr_shift)
#define GNTPIN_hstr_mask     (0xFFU << GNTPIN_hstr_shift)
 /* Count of writable device-bus mappings. */
#define GNTPIN_devw_shift    (16)
#define GNTPIN_devw_inc      (1 << GNTPIN_devw_shift)
#define GNTPIN_devw_mask     (0xFFU << GNTPIN_devw_shift)
 /* Count of read-only device-bus mappings. */
#define GNTPIN_devr_shift    (24)
#define GNTPIN_devr_inc      (1 << GNTPIN_devr_shift)
#define GNTPIN_devr_mask     (0xFFU << GNTPIN_devr_shift)

/* Active grant entry - used for shadowing GTF_permit_access grants. */
struct active_grant_entry {
    u32           pin;    /* Reference count information.             */
    domid_t       domid;  /* Domain being granted access.             */
    struct domain *trans_domain;
    uint32_t      trans_gref;
    unsigned long frame;  /* Frame being granted.                     */
    unsigned long gfn;    /* Guest's idea of the frame being granted. */
    unsigned      is_sub_page:1; /* True if this is a sub-page grant. */
    unsigned      start:15; /* For sub-page grants, the start offset
                               in the page.                           */
    unsigned      length:16; /* For sub-page grants, the length of the
                                grant.                                */
};

/*
 * Tracks a mapping of another domain's grant reference. Each domain has a
 * table of these, indexes into which are returned as a 'mapping handle'.
 */
struct grant_mapping {
    u32      ref;           /* grant ref */
    u16      flags;         /* 0-4: GNTMAP_* ; 5-15: unused */
    domid_t  domid;         /* granting domain */
};
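The pin field packs four 8-bit counters into one 32-bit word, one per mapping kind. The following sketch shows how such a packed counter can be incremented and queried. demo_pin_host and demo_has_writers are hypothetical helpers, and the saturation check is an assumption of this sketch, not a copy of Xen's own overflow handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GNTPIN_hstw_inc   (1U << 0)
#define GNTPIN_hstw_mask  (0xFFU << 0)
#define GNTPIN_hstr_inc   (1U << 8)
#define GNTPIN_hstr_mask  (0xFFU << 8)
#define GNTPIN_devw_mask  (0xFFU << 16)

/* Account for one more host-CPU mapping; fails if the 8-bit counter is full. */
static int demo_pin_host(uint32_t *pin, bool readonly)
{
    uint32_t mask = readonly ? GNTPIN_hstr_mask : GNTPIN_hstw_mask;
    uint32_t inc  = readonly ? GNTPIN_hstr_inc  : GNTPIN_hstw_inc;

    assert(pin != NULL);
    if ((*pin & mask) == mask)
        return -1;          /* adding one more would spill into the next counter */
    *pin += inc;
    return 0;
}

/* True if any writable mapping (host CPU or device bus) is outstanding. */
static bool demo_has_writers(uint32_t pin)
{
    return (pin & (GNTPIN_hstw_mask | GNTPIN_devw_mask)) != 0;
}
```

A check like demo_has_writers is what allows a hypervisor to decide whether GTF_writing may be cleared in the shared entry once a mapping goes away.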

Xen handles grant-table hypercalls in do_grant_table_op. We focus on the following operations: GNTTABOP_map_grant_ref, GNTTABOP_unmap_grant_ref, GNTTABOP_transfer, GNTTABOP_copy

GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref map a GR and tear that mapping down again.

/*
 * GNTTABOP_map_grant_ref: Map the grant entry (<dom>,<ref>) for access
 * by devices and/or host CPUs. If successful, <handle> is a tracking number
 * that must be presented later to destroy the mapping(s). On error, <handle>
 * is a negative status code.
 * NOTES:
 *  1. If GNTMAP_device_map is specified then <dev_bus_addr> is the address
 *     via which I/O devices may access the granted frame.
 *  2. If GNTMAP_host_map is specified then a mapping will be added at
 *     either a host virtual address in the current address space, or at
 *     a PTE at the specified machine address.  The type of mapping to
 *     perform is selected through the GNTMAP_contains_pte flag, and the
 *     address is specified in <host_addr>.
 *  3. Mappings should only be destroyed via GNTTABOP_unmap_grant_ref. If a
 *     host mapping is destroyed by other means then it is *NOT* guaranteed
 *     to be accounted to the correct grant reference!
 */
struct gnttab_map_grant_ref {
    /* IN parameters. */
    uint64_t host_addr;
    uint32_t flags;               /* GNTMAP_* */
    grant_ref_t ref;
    domid_t  dom;                 /* remote domain */
    /* OUT parameters. */
    int16_t  status;              /* => enum grant_status */
    grant_handle_t handle;
    uint64_t dev_bus_addr;
};
typedef struct gnttab_map_grant_ref gnttab_map_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_map_grant_ref_t);

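The flags have two dimensions. GNTMAP_device_map and GNTMAP_host_map select whether the mapping is for access by I/O devices (e.g. DMA) or for ordinary memory access by the host CPUs. GNTMAP_application_map indicates whether the mapped page is also accessible to user-mode programs in the mapping domain. GNTMAP_contains_pte indicates that host_addr holds the machine address of a PTE to update rather than a host virtual address.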
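As a rough illustration of how these flags combine, the sketch below builds a flags word for a host mapping and applies two sanity rules. The bit positions are assumed from the public header and are not quoted in this article. The rule that GNTMAP_application_map and GNTMAP_contains_pte only make sense together with GNTMAP_host_map is this sketch's reading of the header comments, and both demo_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions assumed from the public header. */
#define GNTMAP_device_map       (1U << 0)
#define GNTMAP_host_map         (1U << 1)
#define GNTMAP_readonly         (1U << 2)
#define GNTMAP_application_map  (1U << 3)
#define GNTMAP_contains_pte     (1U << 4)

/* Sanity-check a flags word before issuing a map request. */
static bool demo_map_flags_ok(uint32_t flags)
{
    /* The request must ask for a device mapping, a host mapping, or both. */
    if (!(flags & (GNTMAP_device_map | GNTMAP_host_map)))
        return false;
    /* Assumption: these two bits only qualify a host mapping. */
    if ((flags & (GNTMAP_application_map | GNTMAP_contains_pte)) &&
        !(flags & GNTMAP_host_map))
        return false;
    return true;
}

/* Build the flags for a host-CPU mapping of a granted page. */
static uint32_t demo_host_map_flags(bool readonly, bool user_visible)
{
    uint32_t flags = GNTMAP_host_map;

    if (readonly)
        flags |= GNTMAP_readonly;
    if (user_visible)
        flags |= GNTMAP_application_map;
    assert(demo_map_flags_ok(flags));
    return flags;
}
```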

Now let us look at the implementation of gnttab_map_grant_ref:

static long
gnttab_map_grant_ref(
    XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) uop, unsigned int count)
{
    int i;
    struct gnttab_map_grant_ref op;

    for ( i = 0; i < count; i++ )
    {
        if (i && hypercall_preempt_check())
            return i;
        if ( unlikely(__copy_from_guest_offset(&op, uop, i, 1)) )
            return -EFAULT;
        __gnttab_map_grant_ref(&op);
        if ( unlikely(__copy_to_guest_offset(uop, i, &op, 1)) )
            return -EFAULT;
    }

    return 0;
}
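The __copy_from_guest_offset and __copy_to_guest_offset macros copy arguments from the guest into Xen and back from Xen to the guest. In gnttab_map_grant_ref the guest passes an array of count gnttab_map_grant_ref structures, and each iteration copies one of them, selected by the offset i. If a preemption check fires after at least one operation has been handled, the function returns the number of operations already completed.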
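The loop in gnttab_map_grant_ref returns the positive count i when it is preempted and 0 when the whole batch is done. The toy model below shows how a caller can resume after the completed operations, under the assumption that the remaining work is simply restarted with an advanced pointer and a reduced count. All demo_* names are hypothetical, and demo_preempt_alternating is a stand-in for hypercall_preempt_check().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_op { int input; int status; };

typedef bool (*demo_preempt_fn)(void);

/* Toy preemption source: requests preemption on every second check. */
static bool demo_preempt_alternating(void)
{
    static unsigned int calls;
    return (++calls % 2) == 0;
}

/* Returns 0 when every op is done, or the number completed if preempted. */
static long demo_process_batch(struct demo_op *ops, unsigned int count,
                               demo_preempt_fn preempt)
{
    assert(ops != NULL || count == 0);
    for (unsigned int i = 0; i < count; i++) {
        if (i && preempt())
            return i;                      /* ops[0..i-1] are finished */
        ops[i].status = ops[i].input * 2;  /* placeholder for the real work */
    }
    return 0;
}

/* Restart after the finished ops until the batch completes; counts restarts. */
static unsigned int demo_run_to_completion(struct demo_op *ops, unsigned int count,
                                           demo_preempt_fn preempt)
{
    unsigned int restarts = 0;
    long done;

    while ((done = demo_process_batch(ops, count, preempt)) > 0) {
        ops   += done;
        count -= (unsigned int)done;
        restarts++;
    }
    return restarts;
}
```

Skipping the preemption check for i == 0 guarantees that every restart makes progress on at least one operation, so the loop always terminates.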

#define __copy_to_guest_offset(hnd, off, ptr, nr) ({    \
    const typeof(*(ptr)) *_s = (ptr);                   \
    char (*_d)[sizeof(*_s)] = (void *)(hnd).p;          \
    ((void)((hnd).p == (ptr)));                         \
    __raw_copy_to_guest(_d+(off), _s, sizeof(*_s)*(nr));\
})

#define __copy_from_guest_offset(ptr, hnd, off, nr) ({  \
    const typeof(*(ptr)) *_s = (hnd).p;                 \
    typeof(*(ptr)) *_d = (ptr);                         \
    __raw_copy_from_guest(_d, _s+(off), sizeof(*_d)*(nr));\
})
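The XEN_GUEST_HANDLE_PARAM macro was introduced to distinguish the two ways a guest passes pointers to Xen: a pointer passed as a hypercall argument is wrapped with XEN_GUEST_HANDLE_PARAM, while a pointer passed as a field of a struct in memory is wrapped with XEN_GUEST_HANDLE. See http://lists.xen.org/archives/html/xen-devel/2012-08/msg01324.html

include/public/arch-x86/xen.h defines the XEN_GUEST_HANDLE and XEN_GUEST_HANDLE_PARAM macros; on x86 the two are identical.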

#define ___DEFINE_XEN_GUEST_HANDLE(name, type) \
    typedef struct { type *p; } __guest_handle_ ## name

/*
 * XEN_GUEST_HANDLE represents a guest pointer, when passed as a field
 * in a struct in memory.
 * XEN_GUEST_HANDLE_PARAM represent a guest pointer, when passed as an
 * hypercall argument.
 * XEN_GUEST_HANDLE_PARAM and XEN_GUEST_HANDLE are the same on X86 but
 * they might not be on other architectures.
 */
#define __DEFINE_XEN_GUEST_HANDLE(name, type) \
    ___DEFINE_XEN_GUEST_HANDLE(name, type);   \
    ___DEFINE_XEN_GUEST_HANDLE(const_##name, const type)
#define DEFINE_XEN_GUEST_HANDLE(name)   __DEFINE_XEN_GUEST_HANDLE(name, name)
#define __XEN_GUEST_HANDLE(name)        __guest_handle_ ## name
#define XEN_GUEST_HANDLE(name)          __XEN_GUEST_HANDLE(name)
#define XEN_GUEST_HANDLE_PARAM(name)    XEN_GUEST_HANDLE(name)
So XEN_GUEST_HANDLE_PARAM(gnttab_map_grant_ref_t) actually names the struct type __guest_handle_gnttab_map_grant_ref_t. DEFINE_XEN_GUEST_HANDLE expands to that type together with its const variant:

typedef struct { gnttab_map_grant_ref_t *p; } __guest_handle_gnttab_map_grant_ref_t;
typedef struct { const gnttab_map_grant_ref_t *p; } __guest_handle_const_gnttab_map_grant_ref_t;
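The handle is nothing more than a typed pointer wrapped in a struct, which lets the compiler reject a handle of the wrong element type. A user-space sketch of the same idea follows. DEMO_DEFINE_HANDLE, demo_op_t and the copy helpers are hypothetical, and plain memcpy stands in for the raw guest copy routines, which in Xen must also cope with faults on guest memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wrap a pointer to `name` in a distinct struct type, as the Xen macro does. */
#define DEMO_DEFINE_HANDLE(name) typedef struct { name *p; } demo_handle_##name

typedef struct { unsigned int ref; short status; } demo_op_t;
DEMO_DEFINE_HANDLE(demo_op_t);

/* Copy element number `off` out of the array behind the handle. */
static void demo_copy_from_offset(demo_op_t *dst, demo_handle_demo_op_t hnd,
                                  size_t off)
{
    assert(dst != NULL && hnd.p != NULL);
    memcpy(dst, hnd.p + off, sizeof(*dst));
}

/* Copy one element back into slot `off` of the array behind the handle. */
static void demo_copy_to_offset(demo_handle_demo_op_t hnd, size_t off,
                                const demo_op_t *src)
{
    assert(src != NULL && hnd.p != NULL);
    memcpy(hnd.p + off, src, sizeof(*src));
}
```

Passing a handle for a different element type to these helpers is a compile-time error, which is the main benefit of the wrapper over a bare void pointer.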

The mapping itself is performed by __gnttab_map_grant_ref, which will be analyzed later.


GNTTABOP_unmap_grant_ref tears down a previously created mapping. Note that the TLB must be flushed after the unmap, which is done by calling flush_tlb_mask once per batch.

/*
 * GNTTABOP_unmap_grant_ref: Destroy one or more grant-reference mappings
 * tracked by <handle>. If <host_addr> or <dev_bus_addr> is zero, that
 * field is ignored. If non-zero, they must refer to a device/host mapping
 * that is tracked by <handle>
 * NOTES:
 *  1. The call may fail in an undefined manner if either mapping is not
 *     tracked by <handle>.
 *  3. After executing a batch of unmaps, it is guaranteed that no stale
 *     mappings will remain in the device or host TLBs.
 */
struct gnttab_unmap_grant_ref {
    /* IN parameters. */
    uint64_t host_addr;
    uint64_t dev_bus_addr;
    grant_handle_t handle;
    /* OUT parameters. */
    int16_t  status;              /* => enum grant_status */
};
typedef struct gnttab_unmap_grant_ref gnttab_unmap_grant_ref_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_unmap_grant_ref_t);

static long
gnttab_unmap_grant_ref(
    XEN_GUEST_HANDLE_PARAM(gnttab_unmap_grant_ref_t) uop, unsigned int count)
{
    int i, c, partial_done, done = 0;
    struct gnttab_unmap_grant_ref op;
    struct gnttab_unmap_common common[GNTTAB_UNMAP_BATCH_SIZE];

    while ( count != 0 )
    {
        c = min(count, (unsigned int)GNTTAB_UNMAP_BATCH_SIZE);
        partial_done = 0;

        for ( i = 0; i < c; i++ )
        {
            if ( unlikely(__copy_from_guest(&op, uop, 1)) )
                goto fault;
            __gnttab_unmap_grant_ref(&op, &(common[i]));
            ++partial_done;
            if ( unlikely(__copy_field_to_guest(uop, &op, status)) )
                goto fault;
            guest_handle_add_offset(uop, 1);
        }

        flush_tlb_mask(current->domain->domain_dirty_cpumask);

        for ( i = 0; i < partial_done; i++ )
            __gnttab_unmap_common_complete(&(common[i]));

        count -= c;
        done += c;

        if (count && hypercall_preempt_check())
            return done;
    }

    return 0;

fault:
    flush_tlb_mask(current->domain->domain_dirty_cpumask);

    for ( i = 0; i < partial_done; i++ )
        __gnttab_unmap_common_complete(&(common[i]));
    return -EFAULT;
}
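The structure of this function is a three-phase batch: tear down up to GNTTAB_UNMAP_BATCH_SIZE mappings, flush the TLB once, then complete the bookkeeping for the whole batch. The toy model below only counts how often each phase runs, to show how batching amortizes the flush. DEMO_BATCH_SIZE is an arbitrary value chosen for the sketch, and struct demo_stats and demo_unmap_all are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BATCH_SIZE 4u   /* arbitrary; not Xen's GNTTAB_UNMAP_BATCH_SIZE */

struct demo_stats {
    unsigned int unmapped;   /* phase 1: mappings torn down */
    unsigned int flushes;    /* phase 2: TLB flushes issued */
    unsigned int completed;  /* phase 3: references released */
};

/* Model of the batched unmap loop: one flush per batch, not per mapping. */
static void demo_unmap_all(unsigned int count, struct demo_stats *st)
{
    assert(st != NULL);
    while (count != 0) {
        unsigned int c = count < DEMO_BATCH_SIZE ? count : DEMO_BATCH_SIZE;

        for (unsigned int i = 0; i < c; i++)
            st->unmapped++;      /* remove the mapping, remember it for later */

        st->flushes++;           /* a single flush covers the whole batch */

        for (unsigned int i = 0; i < c; i++)
            st->completed++;     /* safe only after stale TLB entries are gone */

        count -= c;
    }
}
```

In this model ten unmaps with a batch size of four cost three flushes instead of ten. The ordering also explains why the completion step comes last: a page reference should not be dropped while a stale TLB entry could still reach the page.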

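GNTTABOP_transfer (named GNTTABOP_transfer_grant_ref in the header comment) moves a page from a source domain to a destination domain. Unlike map/unmap, once the transfer is done the source domain loses the page permanently. First the destination domain sets up a GR whose flags include GTF_accept_transfer and whose domid is the source domain; this GR indicates that the destination domain has agreed to receive a page from the source domain. The source domain then starts the transfer through gnttab_transfer.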

/*
 * GNTTABOP_transfer_grant_ref: Transfer <frame> to a foreign domain. The
 * foreign domain has previously registered its interest in the transfer via
 * <domid, ref>.
 *
 * Note that, even if the transfer fails, the specified page no longer belongs
 * to the calling domain *unless* the error is GNTST_bad_page.
 */
struct gnttab_transfer {
    /* IN parameters. */
    xen_pfn_t     mfn;
    domid_t       domid;
    grant_ref_t   ref;
    /* OUT parameters. */
    int16_t       status;
};
typedef struct gnttab_transfer gnttab_transfer_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_transfer_t);
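The handshake around a transfer can be modeled in a few lines: the receiver publishes a GTF_accept_transfer entry naming the source domain, the hypervisor commits, writes the frame and then marks the entry completed, and the receiver finally collects the frame. This is a single-threaded toy with hypothetical demo_* names and an assumed value for GTF_accept_transfer and GTF_type_mask. A real guest needs the barriers and atomic updates described in the guidelines comment, and demo_hypervisor_transfer is not how Xen implements the operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GTF_accept_transfer     2U          /* assumed type-field value */
#define GTF_type_mask           3U          /* assumed */
#define GTF_transfer_committed  (1U << 2)
#define GTF_transfer_completed  (1U << 3)

struct demo_entry { uint16_t flags; uint16_t domid; uint32_t frame; };

/* Receiver: advertise willingness to accept one page from `source`. */
static void demo_offer_slot(struct demo_entry *e, uint16_t source)
{
    assert(e != NULL);
    e->domid = source;
    e->frame = 0;
    e->flags = GTF_accept_transfer;
}

/* Stand-in for the hypervisor side: commit, write the frame, then complete. */
static bool demo_hypervisor_transfer(struct demo_entry *e, uint16_t caller,
                                     uint32_t mfn)
{
    if ((e->flags & GTF_type_mask) != GTF_accept_transfer || e->domid != caller)
        return false;                       /* nobody offered this caller a slot */
    if (e->flags & GTF_transfer_committed)
        return false;                       /* slot already in use */
    e->flags |= GTF_transfer_committed;
    e->frame = mfn;
    e->flags |= GTF_transfer_completed;
    return true;
}

/* Receiver: once completed, take the frame and invalidate the entry. */
static bool demo_collect(struct demo_entry *e, uint32_t *mfn)
{
    if (!(e->flags & GTF_transfer_completed))
        return false;
    *mfn = e->frame;
    e->flags = 0;
    return true;
}
```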

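GNTTABOP_copy copies memory contents from a source domain to a destination domain. Xen is well placed to do this because the hypervisor can see the memory layout of every domain. The operation also needs no TLB flush, so it is not necessarily more expensive than a map. One pays for locking on the CPU memory bus, the other for flushing the CPU's TLB, and it is hard to say which costs more. On Intel SNB (Sandy Bridge), which has NUMA support, the latency between CPU and memory is lower and synchronization is cheaper, so in the author's opinion copy may even be cheaper than map.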

/*
 * GNTTABOP_copy: Hypervisor based copy
 * source and destinations can be eithers MFNs or, for foreign domains,
 * grant references. the foreign domain has to grant read/write access
 * in its grant table.
 *
 * The flags specify what type source and destinations are (either MFN
 * or grant reference).
 *
 * Note that this can also be used to copy data between two domains
 * via a third party if the source and destination domains had previously
 * grant appropriate access to their pages to the third party.
 *
 * source_offset specifies an offset in the source frame, dest_offset
 * the offset in the target frame and  len specifies the number of
 * bytes to be copied.
 */
#define _GNTCOPY_source_gref      (0)
#define GNTCOPY_source_gref       (1<<_GNTCOPY_source_gref)
#define _GNTCOPY_dest_gref        (1)
#define GNTCOPY_dest_gref         (1<<_GNTCOPY_dest_gref)

struct gnttab_copy {
    /* IN parameters. */
    struct {
        union {
            grant_ref_t ref;
            xen_pfn_t   gmfn;
        } u;
        domid_t  domid;
        uint16_t offset;
    } source, dest;
    uint16_t      len;
    uint16_t      flags;          /* GNTCOPY_* */
    /* OUT parameters. */
    int16_t       status;
};
typedef struct gnttab_copy  gnttab_copy_t;
DEFINE_XEN_GUEST_HANDLE(gnttab_copy_t);
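Since source and destination are each a single frame plus an offset, a copy only makes sense if offset + len stays inside the page on both sides. The sketch below models that check followed by the memcpy, assuming 4 KiB pages. demo_page_copy and the DEMO_* status codes are hypothetical and do not reproduce Xen's own status values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */
#define DEMO_OK        0       /* hypothetical status codes */
#define DEMO_BAD_ARG   (-1)

/* Copy len bytes between two page buffers, refusing to cross a page end. */
static int demo_page_copy(uint8_t *dst_page, uint16_t dst_off,
                          const uint8_t *src_page, uint16_t src_off,
                          uint16_t len)
{
    assert(dst_page != NULL && src_page != NULL);
    /* Widen before adding so the sum of two 16-bit values cannot wrap. */
    if ((uint32_t)src_off + len > DEMO_PAGE_SIZE ||
        (uint32_t)dst_off + len > DEMO_PAGE_SIZE)
        return DEMO_BAD_ARG;
    memcpy(dst_page + dst_off, src_page + src_off, len);
    return DEMO_OK;
}
```

A caller that needs to move more than one page worth of data would split the request into several operations, one per page pair.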

gnttab_copy calls __gnttab_copy, which ultimately copies the contents with memcpy. That function will be analyzed in detail later.


