Documentation/virtual/kvm/locking.txt

Chinese translated version of Documentation/virtual/kvm/locking.txt

If you have any comment or update to the content, please contact the
original document maintainer directly.  However, if you have a problem
communicating in English you can also ask the Chinese maintainer for
help.  Contact the Chinese maintainer if this translation is outdated
or if there is a problem with the translation.

Chinese maintainer: 徐红 1534342777@qq.com
---------------------------------------------------------------------
Documentation/virtual/kvm/locking.txt的中文翻译

如果想评论或更新本文的内容,请直接联系原文档的维护者。如果你使用英文
交流有困难的话,也可以向中文版维护者求助。如果本翻译更新不及时或者翻
译存在问题,请联系中文版维护者。

中文版维护者: 徐红 1534342777@qq.com
中文版翻译者: 徐红 1534342777@qq.com

以下为正文

---------------------------------------------------------------------
KVM Lock Overview
=================
KVM锁概要
1. Acquisition Orders
---------------------
1.获取顺序
(to be written)
(待补充)
2: Exception
------------
2:例外
Fast page fault:
快速页故障:
Fast page fault is the fast path which fixes the guest page fault out of
the mmu-lock on x86. Currently, the page fault can be fast only if the
shadow page table is present and it is caused by write-protect, that means
we just need change the W bit of the spte.
快速页故障是 x86 上在不持有 mmu-lock 的情况下修复客户机页故障的快速路径。目前,
只有当影子页表已经存在、并且页故障是由写保护引起时,才能走这条快速路径,也就是
说我们只需要修改 spte 的 W 位。


What we use to avoid all the races is the SPTE_HOST_WRITEABLE bit and the
SPTE_MMU_WRITEABLE bit on the spte:
- SPTE_HOST_WRITEABLE means the gfn is writable on host.
- SPTE_MMU_WRITEABLE means the gfn is writable on mmu. The bit is set when
  the gfn is writable on guest mmu and it is not write-protected by shadow
  page write-protection.
我们用来避免所有竞争的是 spte 上的 SPTE_HOST_WRITEABLE 位和 SPTE_MMU_WRITEABLE
位:
- SPTE_HOST_WRITEABLE 表示该 gfn 在主机上是可写的。
- SPTE_MMU_WRITEABLE 表示该 gfn 在 mmu 中是可写的。当 gfn 在客户机 mmu 中可写、
  并且没有被影子页写保护机制写保护时,该位被置 1。


On fast page fault path, we will use cmpxchg to atomically set the spte W
bit if spte.SPTE_HOST_WRITEABLE = 1 and spte.SPTE_MMU_WRITEABLE = 1; this
is safe because any change to these bits can be detected by cmpxchg.
在快速页故障路径上,如果 spte.SPTE_HOST_WRITEABLE = 1 且
spte.SPTE_MMU_WRITEABLE = 1,我们会用 cmpxchg 原子地设置 spte 的 W 位。这样做是
安全的,因为这些位的任何改变都能被 cmpxchg 检测到。
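
The check-and-update described above can be sketched in C as below. This is a
simplified illustration rather than the actual arch/x86/kvm/mmu.c code: the bit
positions picked for SPTE_HOST_WRITEABLE/SPTE_MMU_WRITEABLE and the helper name
fast_set_spte_writable() are assumptions made for this example.

/*
 * Simplified sketch, kernel context assumed (u64, cmpxchg64() come from
 * the usual kernel headers).  Only an spte that is write-protected purely
 * for tracking purposes may be fixed without holding mmu-lock.
 */
#define PT_WRITABLE_MASK     (1ULL << 1)    /* hardware W bit */
#define SPTE_HOST_WRITEABLE  (1ULL << 10)   /* illustrative bit positions */
#define SPTE_MMU_WRITEABLE   (1ULL << 11)

static bool fast_set_spte_writable(u64 *sptep)
{
        u64 old = *sptep;

        /* Both tracking bits must be set for a lockless fix. */
        if ((old & (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE)) !=
            (SPTE_HOST_WRITEABLE | SPTE_MMU_WRITEABLE))
                return false;

        /*
         * cmpxchg fails if anything in the spte changed after we read it,
         * which is what makes this out-of-mmu-lock update safe.
         */
        return cmpxchg64(sptep, old, old | PT_WRITABLE_MASK) == old;
}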


But we need to carefully check these cases:
1): The mapping from gfn to pfn
The mapping from gfn to pfn may be changed, since we can only ensure that the
pfn is not changed during cmpxchg. This is an ABA problem; for example, the
case below can happen:
但是我们需要认真检查这些情况:
1):从gfn到pfn的映射
既然我们只能保证pfn在cmpxchg期间不会改变,从gfn到pfn的映射就有可能改变。
这是一个ABA问题,例如,下面这种情况会发生:
At the beginning:
开始的时候:
gpte = gfn1


gfn1 is mapped to pfn1 on host
在主机上gfn1被映射到pfn1上
spte is the shadow page table entry corresponding with gpte and
spte = pfn1
spte 是与 gpte 对应的影子页表项,并且 spte = pfn1


   VCPU 0                           VCPU 1
on fast page fault path:
在快速页故障过程中:
   old_spte = *spte;
                                 pfn1 is swapped out:
                                    spte = 0;


                                 pfn1 is re-alloced for gfn2.


                                 gpte is changed to point to
                                 gfn2 by the guest:
                                    spte = pfn1;


   if (cmpxchg(spte, old_spte, old_spte+W))
      mark_page_dirty(vcpu->kvm, gfn1)
           OOPS!!!


We dirty-log for gfn1, which means gfn2 is lost in the dirty bitmap.
我们给gfn1标记脏日志,这意味着gfn2在脏位图中丢失。
For direct sp, we can easily avoid it since the spte of direct sp is fixed
to gfn. For indirect sp, before we do cmpxchg, we call gfn_to_pfn_atomic()
to pin gfn to pfn, because after gfn_to_pfn_atomic():
- We have held the refcount of pfn that means the pfn can not be freed and
  be reused for another gfn.
- The pfn is writable that means it can not be shared between different gfns
  by KSM.
对于直接 sp,我们可以很容易地避免这个问题,因为直接 sp 的 spte 与 gfn 是固定对
应的。对于间接 sp,在执行 cmpxchg 之前,我们会调用 gfn_to_pfn_atomic() 把 gfn 固
定(pin)到 pfn 上,因为在 gfn_to_pfn_atomic() 之后:
- 我们持有该 pfn 的引用计数,这意味着该 pfn 不会被释放并被另一个 gfn 复用。
- 该 pfn 是可写的,这意味着它不会被 KSM 在不同的 gfn 之间共享。


Then, we can ensure the dirty bitmap is correctly set for a gfn.
然后,我们就能保证为gfn设置了正确的脏位图。
Currently, to simplify the whole thing, we disable fast page fault for
indirect shadow pages.
目前,为了简化整个处理流程,我们对间接影子页禁用了快速页故障。
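
As an illustration of the pinning scheme just described (which, per the
paragraph above, is currently side-stepped by disabling the fast path for
indirect shadow pages), the sketch below shows the intended order of
operations. gfn_to_pfn_atomic(), is_error_pfn(), kvm_release_pfn_clean() and
mark_page_dirty() are existing KVM helpers; the function
fast_fix_indirect_spte(), spte_to_pfn(), the pfn_t spelling and the reuse of
PT_WRITABLE_MASK from the previous sketch should be read as illustrative
assumptions.

/* Sketch only: how pinning prevents the ABA problem for an indirect sp. */
static bool fast_fix_indirect_spte(struct kvm_vcpu *vcpu, gfn_t gfn,
                                   u64 *sptep, u64 old_spte)
{
        pfn_t pfn;
        bool fixed = false;

        /*
         * Pin gfn to pfn: we now hold a refcount on the pfn, so it can
         * neither be freed nor be reused for another gfn, and a writable
         * pfn cannot be shared between gfns by KSM.
         */
        pfn = gfn_to_pfn_atomic(vcpu->kvm, gfn);
        if (is_error_pfn(pfn))
                return false;

        /*
         * If the pinned pfn still matches the one recorded in old_spte,
         * the gfn -> pfn mapping has not changed under us.
         */
        if (pfn == spte_to_pfn(old_spte) &&
            cmpxchg64(sptep, old_spte,
                      old_spte | PT_WRITABLE_MASK) == old_spte) {
                mark_page_dirty(vcpu->kvm, gfn);  /* dirty-log the right gfn */
                fixed = true;
        }

        kvm_release_pfn_clean(pfn);               /* drop the pin */
        return fixed;
}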


2): Dirty bit tracking
2):脏位追踪
In the original code, the spte can be fast updated (non-atomically) if the
spte is read-only and the Accessed bit has already been set, since in that
case the Accessed bit and Dirty bit can not be lost.
在原有代码中,如果 spte 是只读的并且 Accessed 位已经置 1,那么 spte 可以被快速地
(非原子地)更新,因为这种情况下 Accessed 位和 Dirty 位都不会丢失。


But it is not true after fast page fault, since the spte can be marked
writable between reading the spte and updating the spte, as in the case below:
但是在引入快速页故障之后这一点不再成立,因为在读取 spte 和更新 spte 之间,spte
可能被标记为可写。比如下面这种情形:
At the beginning:
开始的时候:
spte.W = 0
spte.Accessed = 1


   VCPU 0                                       VCPU 1
In mmu_spte_clear_track_bits():


   old_spte = *spte;


   /* 'if' condition is satisfied. */
   if (old_spte.Accessed == 1 &&
        old_spte.W == 0)
      spte = 0ull;
                                         on fast page fault path:
                                         快速页故障过程中:
                                             spte.W = 1
                                         memory write on the spte:
                                             spte.Dirty = 1




   else
      old_spte = xchg(spte, 0ull)




   if (old_spte.Accessed == 1)
      kvm_set_pfn_accessed(spte.pfn);
   if (old_spte.Dirty == 1)
      kvm_set_pfn_dirty(spte.pfn);
      OOPS!!!


The Dirty bit is lost in this case.
在这种情况下脏位丢失。


In order to avoid this kind of issue, we always treat the spte as "volatile"
if it can be updated out of mmu-lock (see spte_has_volatile_bits()); that
means the spte is always atomically updated in this case.
为了避免这类问题,对于可以在不持有 mmu-lock 的情况下被更新的 spte,我们总是把它
当作“volatile”(易变的)来对待(参见 spte_has_volatile_bits()),也就是说,在
这种情况下 spte 总是被原子地更新。
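
A minimal sketch of that rule, assuming a caller such as
mmu_spte_clear_track_bits() that wants to zap an spte: if the spte may change
outside mmu-lock it must be read and cleared in one atomic step, so that
Accessed/Dirty bits set concurrently are not lost. The helper name and its
boolean parameter stand in for the real spte_has_volatile_bits() test.

/* Sketch only: clearing an spte without losing Accessed/Dirty updates. */
static u64 clear_spte(u64 *sptep, u64 old_spte, bool volatile_spte)
{
        if (!volatile_spte) {
                /*
                 * Nothing can change the spte behind our back: a plain,
                 * cheaper non-atomic write is enough.
                 */
                *sptep = 0ull;
                return old_spte;
        }

        /*
         * The fast page fault path (or hardware) may set the W/Dirty bits
         * at any time, so fetch and clear the spte atomically; the value
         * returned here is what the caller must inspect before calling
         * kvm_set_pfn_accessed()/kvm_set_pfn_dirty().
         */
        return xchg(sptep, 0ull);
}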


3): flush tlbs due to spte update
3):由于spte更新导致TLB刷新


If the spte is updated from writable to readonly, we should flush all TLBs,
otherwise rmap_write_protect will find a read-only spte, even though the
writable spte might be cached on a CPU's TLB.
如果 spte 从可写更新为只读,我们应该刷新所有的 TLB,否则 rmap_write_protect 会
看到一个只读的 spte,而可写的 spte 却可能仍然缓存在某个 CPU 的 TLB 中。


As mentioned before, the spte can be updated to writable out of mmu-lock on
the fast page fault path. In order to easily audit the path, we check in
mmu_spte_update() whether TLBs need to be flushed for this reason, since it is
a common function for updating the spte (present -> present).
像之前提到的,在快速页故障路径上,spte 可以在不持有 mmu-lock 的情况下被更新为可
写。为了便于审查这条路径,我们在 mmu_spte_update() 中检查是否需要因为这个原因刷
新 TLB,因为它是更新 spte(present -> present)的公共函数。


Since the spte is "volatile" if it can be updated out of mmu-lock, we always
update the spte atomically, so the race caused by fast page fault can be
avoided. See the comments in spte_has_volatile_bits() and mmu_spte_update().
由于可以在不持有 mmu-lock 的情况下被更新的 spte 是“volatile”(易变的),我们总
是原子地更新 spte,这样快速页故障造成的竞争就可以避免了。参见
spte_has_volatile_bits() 和 mmu_spte_update() 中的注释。
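
The flush decision can be sketched as below. This is a stand-in for part of
mmu_spte_update(), not its real body; PT_WRITABLE_MASK is the hardware W bit
as in the earlier sketches, the unconditional xchg is a simplification (the
real code only needs it when the spte is volatile), and
kvm_flush_remote_tlbs() is the existing helper that flushes the TLBs of all
VCPUs.

/* Sketch only: the present -> present update path. */
#define PT_WRITABLE_MASK (1ULL << 1)    /* as in the earlier sketches */

static void update_present_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
{
        /* The spte may be changed out of mmu-lock, so update it atomically. */
        u64 old_spte = xchg(sptep, new_spte);

        /*
         * If a writable spte became read-only, some CPU may still hold the
         * old writable translation in its TLB; flush so that
         * rmap_write_protect() can trust a read-only spte.
         */
        if ((old_spte & PT_WRITABLE_MASK) && !(new_spte & PT_WRITABLE_MASK))
                kvm_flush_remote_tlbs(kvm);
}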


3. Reference
3.参考
------------


Name: kvm_lock
Type: raw_spinlock
Arch: any
Protects: - vm_list
- hardware virtualization enable/disable
Comment: 'raw' because hardware enabling/disabling must be atomic /wrt
migration.
名称:   kvm_lock
类型:   raw_spinlock
架构:   任何
保护:   - vm_list
         - 硬件虚拟化的启用/禁用
说明:   使用“raw”是因为硬件的启用/禁用相对于迁移必须是原子的。
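
As a usage illustration (not code taken from virt/kvm/kvm_main.c), the pattern
protected by kvm_lock looks roughly like the sketch below; vm_list and the
vm_list member of struct kvm are the real globals/fields, and kvm_lock is a
raw_spinlock as documented above, while the walk that counts VMs is a made-up
example.

/* Sketch only: any traversal of the global VM list must hold kvm_lock. */
static int count_vms(void)
{
        struct kvm *kvm;
        int n = 0;

        raw_spin_lock(&kvm_lock);
        list_for_each_entry(kvm, &vm_list, vm_list)
                n++;                    /* example work: count registered VMs */
        raw_spin_unlock(&kvm_lock);

        return n;
}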


Name: kvm_arch::tsc_write_lock
Type: raw_spinlock
Arch: x86
Protects: - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
- tsc offset in vmcb
Comment: 'raw' because updating the tsc offsets must not be preempted.
名称:   kvm_arch::tsc_write_lock
类型:   raw_spinlock
架构:   x86
保护:   - kvm_arch::{last_tsc_write,last_tsc_nsec,last_tsc_offset}
         - vmcb 中的 tsc 偏移
说明:   使用“raw”是因为更新 tsc 偏移时不能被抢占。


Name: kvm->mmu_lock
Type: spinlock_t
Arch: any
Protects: - shadow page/shadow tlb entry
Comment: it is a spinlock since it is used in mmu notifier.
名称:   kvm->mmu_lock
类型:   spinlock_t
架构:   任何
保护:   - 影子页/影子 TLB 表项
说明:   它是一个自旋锁,因为它在 mmu notifier 中使用。