Intel Virtualization Technology Roadmap



Although virtualization is one of today's hottest topics, the technology has actually been around for decades. In its early days it was used mainly on servers and mainframes; now, as PC performance keeps growing, virtualization has gradually become popular on the x86 architecture as well.

Virtualization presents physical resources as multiple virtual machines, improving how those resources are shared and utilized

Virtualization can transform an IT environment into a more powerful, more elastic, and more dynamic architecture. By consolidating multiple operating systems onto a single high-performance server, it makes full use of the hardware platform's resources, delivering more applications with less investment, while also simplifying the IT infrastructure, reducing the management burden, and avoiding unnecessary sprawl. Because guest virtual machines are genuinely independent of the underlying hardware, running VMs can be migrated between hosts at run time, enabling truly uninterrupted operation and maximizing business continuity without paying the premium for ultra-high-availability platforms.

Compared with the virtualization technology found on Sun systems (CPU partitioning), virtualization on x86 lags well behind, but it has been improving steadily. A few years ago x86 offered no hardware support at all; the instruction set was not even designed with virtualization in mind, so virtualization had to be implemented entirely in software. The representative products were VMware's offerings and Virtual PC, developed by Connectix before its acquisition by Microsoft. In the server market the dominant products were VMware's GSX Server and, later, ESX Server. These software-only products handled critical instructions through binary emulation/translation, which carries considerable overhead. Para-virtualization appeared later; by avoiding some of the binary translation it improved performance, but isolation problems remained.

Today virtualization has advanced on every front, moving from pure software down into the processor, then to the platform, and now to I/O. The representative technology at the I/O level is Intel Virtualization Technology for Directed I/O, abbreviated Intel VT-d. Before introducing VT-d, let us look at the first step of x86 hardware virtualization: processor-assisted virtualization, namely Intel Virtualization Technology, which comes in two versions, VT-i for the Itanium platform and VT-x for the x86 platform. AMD has a corresponding technology for x86 called AMD-V. This article focuses on VT-x on the x86 platform; VT-i is broadly similar in principle.

The main problems with purely software-based virtualization are performance and isolation. Full virtualization offers good guest operating system independence, but its performance is poor: depending on the workload, it can consume 10%~30% of the host's resources. OS-level virtualization delivers good performance, but the isolation between guests is weak. Whichever software approach is used, isolation is provided by the hypervisor, and stronger isolation inevitably costs performance.


These problems stem largely from the fact that virtualization was never considered when x86 was designed. Let us start with the privilege-level design of x86 processors.

To protect instruction execution, the x86 architecture defines four privilege levels, known as rings, from Ring 0 to Ring 3. Ring 0 is the most privileged and Ring 3 the least. Each level restricts which instructions may execute; for example, the instructions that manipulate the GDT, IDT, LDT, and TSS may only run at privilege level 0, that is, Ring 0. Note that these ring/privilege levels are not the same thing as the process priorities we normally talk about in an operating system.

An operating system must execute some privilege-0 instructions, so Ring 0 is used to run the OS kernel; Ring 1 and Ring 2 were intended for OS services, and Ring 3 for applications. In practice there is no need to use all four levels, and typical operating systems use only two, Ring 0 and Ring 3, as shown in the figure.
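
As a concrete aside (not from the original article), the current ring can be observed from software: the CPL is simply the low two bits of the CS segment selector. A minimal C sketch using GCC-style inline assembly on x86:

    #include <stdio.h>

    /* Minimal sketch (x86, GCC inline assembly): the current privilege level
     * (CPL) is held in the two low bits of the CS segment selector, so an
     * ordinary application normally reports Ring 3, while kernel-mode code
     * would report Ring 0. */
    static unsigned int current_privilege_level(void)
    {
        unsigned short cs;
        __asm__ volatile("mov %%cs, %0" : "=r"(cs));
        return cs & 0x3;            /* bits 0-1 of CS = current privilege level */
    }

    int main(void)
    {
        printf("Running at ring %u\n", current_privilege_level());
        return 0;
    }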

In other words, in a conventional x86 operating system the kernel must run in Ring 0, yet under a VMM (Virtual Machine Monitor, or hypervisor) the guest OS cannot be allowed to run there, because the VMM would then be unable to keep all of the virtual machines under control, much as the old cooperative multitasking systems (Windows 3.1, for example) could not guarantee robust operation. Without processor assistance, the challenge is therefore to run the guest OS at some privilege level other than Ring 0 while still giving the VMM full control of the platform.

The popular solution is Ring Deprivileging, which comes in two flavors: run the guest OS at privilege level 1 (the 0/1/3 model) or at privilege level 3 (the 0/3/3 model).

In either model the guest OS cannot run at privilege level 0, so privileged operations such as loading the GDT, IDT, LDT, and TSS must be emulated, which causes an obvious performance problem, especially under heavy load when these instructions are executed frequently.

At the same time, these instructions are genuinely "privileged": if they are not isolated properly they can seriously threaten other guest OSes and even the host OS. Ring deprivileging relies on the IA32 architecture's segment limits and paging to isolate the VMM from the guest OS. Unfortunately, the 64-bit mode of EM64T does not support segment limits, so running a 64-bit operating system forces the use of paging for protection.

For virtualization, the fatal flaw of paging-based protection is that it does not distinguish between privilege levels 0, 1, and 2, so the guest must run at privilege level 3 (the 0/3/3 model) for paging to separate the host OS from the guest OS. But different pieces of software running at the same privilege level (for example, different virtual machines) cannot be protected from each other by the privilege mechanism. This is the isolation problem IA32 presents today, and it is known as Ring Compression.

Without VT support, IA32 cannot virtualize 64-bit guest operating systems

In practice this shows up as follows: VMware cannot run 64-bit guest operating systems on IA32 CPUs that lack Intel VT, because the guest OSes cannot be safely isolated from one another.


As a chip-assisted virtualization technology, VT improves both virtualization efficiency and virtual machine security. Let us now look at the architectural changes Intel VT introduces. We will mainly discuss the VT technology for IA32, generally called VT-x; the VT technology on the Itanium platform is called VT-i.

VT-x extends IA32 CPU operation into two forms: VMX root operation and VMX non-root operation. VMX root operation is intended for the VMM/hypervisor, and its behavior is essentially the same as traditional IA32; VMX non-root operation is a second IA32 environment that runs under the control of the VMM. Both forms support all four privilege levels, so a virtual machine running in VMX non-root operation can make full use of Ring 0.
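
As a small illustration (not part of the original text), software can ask the processor whether it implements VMX, the instruction-set side of VT-x: CPUID leaf 1 reports it in ECX bit 5. Whether firmware has actually enabled VMX is a separate matter governed by the IA32_FEATURE_CONTROL MSR, which this sketch does not check.

    #include <stdio.h>
    #include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */

    /* Sketch: CPUID leaf 1 reports VMX support in ECX bit 5.  BIOS enabling
     * via the IA32_FEATURE_CONTROL MSR is a separate requirement not checked
     * here. */
    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
            printf("CPUID leaf 1 not supported\n");
            return 1;
        }
        printf("VMX (VT-x) %s by this processor\n",
               (ecx & (1u << 5)) ? "is reported" : "is not reported");
        return 0;
    }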

Two worlds: VMX non-root and VMX root

Contrary to what some articles claim, VT gives both the VMM and the guest OS all of the privilege levels, rather than confining each of them to a single level: the VMM and the guest OS simply run in two different forms of operation.

As a result, operations on the GDT, IDT, LDT, TSS, and so on can run natively inside the virtual machine, whereas previously these privileged instructions had to be emulated, and the VMM is freed from emulating them. This solves both the Ring Aliasing problem (software discovering that it runs at a ring other than the one it was written for) and the Ring Compression problem, greatly improving efficiency. Solving Ring Compression also solves the problem of running 64-bit guest operating systems.

To build this two-form architecture, VT-x defines a data structure called the Virtual-Machine Control Structure (VMCS), which contains a Guest-State Area and a Host-State Area holding the state of the virtual machine and of the host, and two transitions, VM entry and VM exit, for switching between the virtual machine and the VMM. Through the VM-execution control fields of the VMCS, the VMM can specify which instructions or events cause a virtual machine running in VMX non-root operation to perform a VM exit and hand control back to the VMM. In this way VT-x addresses both the isolation problem and the performance problem of virtual machines.
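
To make the flow concrete, here is a deliberately simplified C sketch of the control loop that the VMCS, VM entry, and VM exit make possible. The vmcs_read/vmcs_write/vmlaunch_or_resume helpers are hypothetical wrappers around the real VMREAD/VMWRITE/VMLAUNCH/VMRESUME instructions; an actual VMM is far more involved.

    /* Deliberately simplified sketch of the control flow that VT-x enables.
     * vmcs_write(), vmcs_read() and vmlaunch_or_resume() are hypothetical
     * wrappers around the real VMWRITE/VMREAD/VMLAUNCH/VMRESUME instructions. */

    #include <stdint.h>

    #define VMCS_GUEST_RIP   0x681E   /* VMCS field encodings from Intel's manuals */
    #define VMCS_EXIT_REASON 0x4402

    enum exit_reason { EXIT_CPUID = 10, EXIT_IO_INSTRUCTION = 30 };

    void     vmcs_write(uint32_t field, uint64_t value);  /* hypothetical wrapper */
    uint64_t vmcs_read(uint32_t field);                   /* hypothetical wrapper */
    void     vmlaunch_or_resume(void);                    /* VMLAUNCH / VMRESUME  */

    void run_guest(void)
    {
        for (;;) {
            vmlaunch_or_resume();    /* VM entry: switch to VMX non-root operation */

            /* Execution resumes here after a VM exit: one of the instructions or
             * events selected in the VM-execution control fields occurred. */
            switch ((enum exit_reason)(vmcs_read(VMCS_EXIT_REASON) & 0xFFFF)) {
            case EXIT_CPUID:
                /* Emulate CPUID for the guest, then step past the instruction. */
                vmcs_write(VMCS_GUEST_RIP, vmcs_read(VMCS_GUEST_RIP) + 2);
                break;
            case EXIT_IO_INSTRUCTION:
                /* Forward the port I/O to a device model. */
                break;
            default:
                /* Unhandled exit: stop the guest or inject a fault. */
                return;
            }
        }
    }

Every VM exit is a fairly expensive world switch, which is why a VMM configures the VM-execution controls so that only the instructions and events it genuinely must intercept cause an exit.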

As we can see, the arrival of Intel VT solves the key architectural problems of processor virtualization and greatly eases the performance problems of pure software virtualization. But there is still much more to do.

For a server, I/O is a crucial component. Faster CPUs can process data more quickly, but only if the data can reach the CPU smoothly; so whether it is storage, networking, graphics, or memory, I/O capability is an important part of any enterprise architecture. That is why people invest not only in transfer bandwidth (from Fast Ethernet to Gigabit and on to 10 Gigabit Ethernet) but also heavily in systems and architectures (higher-throughput RAID arrays, multi-tier data centers, and so on).

With virtualization, as overall processor utilization goes up, the demands placed on data I/O rise as well.


The VMM must provide I/O virtualization to handle I/O requests coming from multiple guests. Current virtualization technology handles I/O virtualization in the following ways.
  

Emulated I/O devices: the VMM emulates an I/O device for the guest. By fully emulating the device's functionality, the guest can use the driver written for the real device, which gives excellent compatibility (regardless of whether the device physically exists), but the emulation obviously hurts performance. Examples include the virtual floppy drives that virtual machines present from floppy images, the real S3 Virge 3D graphics card emulated by Virtual PC, and the Sound Blaster 16 sound card emulated by the VMware products.
  

Additional software interfaces: this model resembles device emulation, but the VMM exposes a set of streamlined device interfaces to the virtual machine, improving virtualization efficiency, somewhat like the DirectX technology in Windows. It delivers better performance than pure emulation at the cost of some compatibility. For example, the VMware virtual display adapter offers good display speed but does not fully support DirectDraw, let alone Direct3D; the same applies to the VMware virtual Gigabit NIC. These purely virtual "branded" devices (the VMware display adapter, the VMware NIC) need special drivers that communicate partly and directly with the host and the hardware, and they offer much higher throughput than the fully emulated 10/100 Mbps NICs accessed through ordinary in-guest drivers.
Today's I/O device virtualization relies mainly on emulation or on these software interfaces, so it easily becomes a performance bottleneck; after all, I/O devices are often the bottleneck even on traditional machines. This is why Intel introduced Intel Virtualization Technology for Directed I/O, or Intel VT-d.
The key to I/O virtualization is handling the exchange of data between I/O devices and virtual machines, which mainly involves DMA (direct memory access) and IRQs (interrupt requests). Solve the isolation, protection, and performance problems of these two and you have successful I/O virtualization.
  

Like Intel VT-i and VT-x on the processor side, Intel VT-d is a hardware-assisted virtualization technology, implemented in the north bridge (or, in newer terminology, the MCH). By building DMA virtualization and IRQ virtualization hardware into the north bridge, it enables a new style of I/O virtualization; Intel VT-d can greatly improve the reliability, flexibility, and performance of I/O in a virtualized environment.
Traditional IOMMUs (I/O memory management units) provide a centralized way of managing all DMA, including special forms such as AGP GART, TPT, and RDMA over TCP/IP in addition to ordinary DMA. Because they distinguish devices only by memory address range, they are easy to implement but poor at DMA isolation. VT-d therefore uses a redesigned IOMMU architecture that supports multiple DMA protection domains, finally achieving DMA virtualization. This technique is also called DMA Remapping.
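
Conceptually (ignoring the real VT-d table formats), the check that the DMA-remapping hardware applies to every DMA request can be pictured like this:

    /* Conceptual model (not the real VT-d table format) of what the DMA-remapping
     * hardware in the chipset does for every DMA request: look up which protection
     * domain the requesting device was assigned to, then check that the target
     * address falls inside memory that belongs to that domain. */

    #include <stdbool.h>
    #include <stdint.h>

    struct protection_domain {
        uint64_t base;        /* start of the physical memory owned by the domain */
        uint64_t size;
    };

    struct assigned_device {
        uint16_t bdf;                       /* PCI bus/device/function of the device */
        struct protection_domain *domain;   /* domain the VMM assigned it to */
    };

    /* Returns true if the DMA may proceed; otherwise the hardware would block
     * the access and report a fault to the VMM. */
    bool dma_allowed(const struct assigned_device *dev, uint64_t addr, uint64_t len)
    {
        const struct protection_domain *d = dev->domain;
        return d != NULL &&
               addr >= d->base &&
               addr + len <= d->base + d->size;
    }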
I/O devices generate a great many interrupt requests, and I/O virtualization must separate them correctly and route them to the right virtual machines. Traditionally a device's interrupts can be delivered in two ways: routed through an I/O interrupt controller, or sent directly as MSIs (message signaled interrupts) carried in DMA write requests. Because an MSI embeds the target memory address in the DMA request, the device needs unrestricted access to the whole address space, and interrupts cannot be isolated.
The interrupt-remapping architecture in VT-d solves this by redefining the MSI format. The new MSI is still a DMA write request, but instead of embedding the target memory address it carries a message ID; by maintaining a table, the hardware can use different message IDs to identify different virtual machine domains. VT-d interrupt remapping supports every interrupt source, including the IOAPICs, and every interrupt type, including ordinary MSI and extended MSI-X.
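
The lookup can be sketched conceptually as follows; the entry layout here is illustrative, not the real interrupt-remapping table entry format:

    /* Conceptual sketch of VT-d interrupt remapping: a remappable MSI no longer
     * encodes its destination directly; it carries an index that the chipset
     * looks up in an interrupt-remapping table maintained by the VMM. */

    #include <stdint.h>
    #include <stddef.h>

    struct irte {                 /* one interrupt-remapping table entry (illustrative) */
        uint8_t  present;         /* entry valid? */
        uint8_t  dest_cpu;        /* which (virtual) CPU should receive it */
        uint8_t  vector;          /* interrupt vector to deliver */
        uint16_t source_bdf;      /* device allowed to use this entry */
    };

    /* Resolve a remappable interrupt, or return NULL to signal a fault. */
    const struct irte *remap_interrupt(const struct irte *table, size_t entries,
                                       uint16_t msg_index, uint16_t requester_bdf)
    {
        if (msg_index >= entries || !table[msg_index].present)
            return NULL;                          /* fault: bad or absent entry   */
        if (table[msg_index].source_bdf != requester_bdf)
            return NULL;                          /* fault: device not authorized */
        return &table[msg_index];
    }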
VT-d makes many more changes as well, such as hardware caching and address translation. Together these measures implement I/O device virtualization at the level of the north bridge. In the virtualization model, VT-d ultimately adds two new ways of virtualizing devices:
  

On the left, traditional emulated I/O virtualization; on the right, direct I/O device assignment
Direct I/O device assignment: a physical I/O device is assigned directly to a virtual machine. In this model the driver inside the virtual machine talks to the hardware directly, with little or no involvement from the VMM. For robustness, hardware virtualization support is needed to isolate and protect the hardware resource so that only the designated virtual machine can use it, and the hardware also needs multiple I/O "containers" (partitions) to serve several virtual machines at once. This model almost completely eliminates the need to run drivers inside the VMM. The CPU, although not an I/O device in the usual sense, is in fact handed to virtual machines in exactly this way, with its resources still managed by the VMM.
I/O device sharing: an extension of the direct-assignment model with even higher demands on the hardware. The device must expose multiple function interfaces, each of which can be assigned to a different virtual machine. This model clearly offers very high virtualization performance.
With VT-d, virtual machines can use direct I/O device assignment or I/O device sharing in place of the traditional device emulation and extra software interfaces, greatly improving virtualized I/O performance.
   

The mainstream dual-socket Xeon Stoakley platform will support Intel VT-d
  

The high-end four-socket Caneland platform will also support VT-d
According to the available material, the soon-to-be-released Stoakley and Caneland platforms will include VT-d. Stoakley is the successor to today's Bensley platform for dual-socket Xeon processors, while Caneland succeeds Truland for four-socket Xeons; both chipsets support the latest 45nm Penryn processors.
  

Looking at Intel's virtualization roadmap, virtualization is clearly spreading from the processor to other devices; the progression from VT-i/VT-x to VT-d illustrates this well. For enterprise applications that care about I/O performance, once the processor and I/O are virtualized, the platform as a whole is close to fully virtualized. Going forward, Intel will keep developing VT-d and add virtualization features to more kinds of I/O devices, providing a strong foundation for virtualization infrastructure.


Intel® Virtualization Technology for Directed I/O (VT-d): Enhancing Intel platforms for efficient virtualization of I/O devices

Virtualization solutions allow multiple operating systems and applications to run in independent partitions, all on a single computer. Using virtualization capabilities, one physical computer system can function as multiple "virtual" systems. Intel® Virtualization Technology (Intel VT) improves the performance and robustness of today's virtual machine solutions by adding hardware support for efficient virtual machines.

Intel® Virtualization Technology for Directed I/O (VT-d) extends Intel's Virtualization Technology (VT) roadmap by providing hardware assists for virtualization solutions. VT-d continues the existing support for IA-32 (VT-x) and Itanium® processor (VT-i) virtualization, adding new support for I/O-device virtualization.

Intel VT-d can help end users improve the security and reliability of their systems and also improve the performance of I/O devices in a virtualized environment. This inherently helps IT managers reduce the overall total cost of ownership by reducing potential downtime and increasing productive throughput through better utilization of data center resources.


Introduction

To create virtual machines (or guests), a virtual machine monitor (VMM), also known as a hypervisor, acts as a host and has full control of the platform hardware. The VMM presents guest software (the operating system and application software) with an abstraction of the physical machine and is able to retain selective control of processor resources, physical memory, interrupt management, and data I/O.

A VMM supports virtualization of I/O requests from guest software. This is done in software using either of two well-known models: emulation of devices or paravirtualization. A general reliability and protection requirement for these or any I/O-device virtualization (IOV) models is the ability to isolate and contain device accesses to only those resources that are assigned to the device by the VMM.

Intel VT-d is the latest part of the Intel Virtualization Technology hardware architecture. VT-d helps the VMM better utilize hardware by improving application compatibility and reliability, and providing additional levels of manageability, security, isolation, and I/O performance. By using the VT-d hardware assistance built into Intel’s chipsets the VMM can achieve higher levels of performance, availability, reliability, security, and trust.

Intel® Virtualization Technology for Directed I/O provides VMM software with the following capabilities:

  • Improve reliability and security through device isolation using hardware assisted remapping
  • Improve I/O performance and availability by direct assignment of devices

Hardware Assisted Remapping for Protection

Intel VT-d enables protection by restricting direct memory access (DMA) from devices to pre-assigned domains or physical memory regions. This is achieved by a hardware capability known as DMA-remapping. The VT-d DMA-remapping hardware logic in the chipset sits between the DMA-capable peripheral I/O devices and the computer's physical memory. It is programmed by the computer system software. In a virtualization environment the system software is the VMM; in a native environment where there is no virtualization software, the system software is the native OS. DMA-remapping translates the address of an incoming DMA request to the correct physical memory address and performs checks for permission to access that physical address, based on the information provided by the system software.

Intel VT-d enables system software to create multiple DMA protection domains. Each protection domain is an isolated environment containing a subset of the host physical memory. Depending on the software usage model, a DMA protection domain may represent memory allocated to a virtual machine (VM), or the DMA memory allocated by a guest-OS driver running in a VM, or memory allocated as part of the VMM itself. The VT-d architecture enables system software to assign one or more I/O devices to a protection domain. DMA isolation is achieved by restricting access to a protection domain's physical memory from I/O devices not assigned to it, using address-translation tables. This provides the necessary isolation to assure separation between each virtual machine's computer resources.

When any given I/O device tries to gain access to a certain memory location, the DMA-remapping hardware looks up the address-translation tables for that device's access permission to the specific protection domain. If the device tries to access memory outside the range it is permitted to access, the DMA-remapping hardware blocks the access and reports a fault to the system software. Please see Figure 1.

Figure-1: VT-d DMA Remapping. Device-1 is not assigned to Domain-C, so when Device-1 tries to access Domain-C memory location range, it is restricted by the VT-d hardware.

To improve the performance, frequently used remapping-structure entries such as mapping of I/O devices to protection domains and page-table entries for DMA address translation, are cached. VT-d also supports the Peripheral Component Interconnect Special Interest Group (PCI-SIG) Address Translation Services (ATS) specification, which specifies standard means to allow caching of device specific DMA-translations in the endpoint device.


I/O Performance through Direct Assignment

Virtualization allows the creation of multiple virtual machines on a single server. This consolidation maximizes server hardware utilization, but server applications require a significant amount of I/O performance. Software-based I/O virtualization methods use emulation of the I/O devices. With this emulation layer the VMM provides a consistent view of a hardware device to the VMs, and the device can be shared amongst many VMs. However, emulation can also slow down high-performance I/O devices. VT-d can address this loss of native performance or native capability of a virtualized I/O device by directly assigning the device to a VM.

In this model, the VMM restricts itself to a controlling function for enabling direct assignment of devices to its partitions. Rather than invoking the VMM for all (or most) I/O requests from a partition, the VMM is invoked only when guest software accesses protected resources (such as I/O configuration accesses, interrupt management, etc.) that impact system functionality and isolation.

To support direct VM assignment of I/O devices, a VMM must enforce isolation of DMA requests. I/O devices can be assigned to domains, and the DMA remapping hardware can be used to restrict DMA from an I/O device to the physical memory presently owned by its domain.

When a VM or guest is launched over the VMM, the address space that the guest OS is given as its physical address range, known as the Guest Physical Address (GPA) space, may not be the same as the real Host Physical Address (HPA) space. DMA-capable devices need HPAs to transfer data to and from physical memory locations. In a direct assignment model, however, the guest OS device driver is in control of the device and provides GPAs instead of the HPAs required by the DMA-capable device. The DMA-remapping hardware can be used to do the appropriate conversion. Since the VMM set up the GPA space, it knows the GPA-to-HPA conversion; it programs the DMA-remapping hardware with this information so the hardware can perform the necessary translation. Using the remapping, data can now be transferred directly to the appropriate buffers of the guest rather than going through an intermediate software emulation layer.
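
As an illustration only (the real VT-d hardware walks multi-level translation tables programmed by the VMM), a page-granular GPA-to-HPA lookup can be modeled like this:

    /* Minimal sketch of the GPA-to-HPA translation the DMA-remapping hardware
     * performs on behalf of a directly assigned device.  A single-level,
     * page-granular lookup table stands in for the real multi-level VT-d
     * page tables the VMM would actually program. */

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_SHIFT  12
    #define PAGE_SIZE   (1u << PAGE_SHIFT)
    #define INVALID_HPA ((uint64_t)-1)

    struct dma_mapping {
        const uint64_t *gpa_to_hpa_page;   /* indexed by guest page frame number */
        size_t          num_pages;
    };

    /* Translate a guest-physical DMA address; INVALID_HPA means "fault". */
    uint64_t translate_dma(const struct dma_mapping *m, uint64_t gpa)
    {
        uint64_t gfn = gpa >> PAGE_SHIFT;

        if (gfn >= m->num_pages || m->gpa_to_hpa_page[gfn] == INVALID_HPA)
            return INVALID_HPA;                   /* not mapped for this domain */

        return m->gpa_to_hpa_page[gfn] + (gpa & (PAGE_SIZE - 1));
    }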

Figure 2 - Software Emulation based I/O vs. Hardware based Direct Assignment I/O

Figure 2 illustrates software emulation based I/O in comparison to hardware direct assignment based I/O. In the emulation-based I/O the intermediate software layer controls all the I/O between the VMs and the device: the data is transferred through the emulation layer to the device, and from the device back to the emulation layer.

In the direct assignment model, the unmodified guest OS driver controls the device it is assigned. On the receive path the DMA-remapping hardware converts the GPA provided by the guest OS driver to the correct HPA, such that the data is transferred directly to the buffers of the guest OS (instead of passing through the emulation layer). Interrupt remapping support in VT-d architecture allows interrupt control to also be directly assigned to the VM, further reducing the VMM overheads.


Intel VT-d Usage Models

Enabled OSes and VMMs can utilize the VT-d I/O memory-management functionality to isolate devices to protection domains, preventing devices from performing errant DMA that can affect the functioning of the system.

VT-d can become the foundation for creating secure and isolated work partitions in servers, workstations, and a new class of combined hardware and software offerings called virtual appliances. A virtual appliance is a self-contained execution environment optimized for a predefined set of applications and/or services, such as a virus-scanning and firewall appliance or a hardware management appliance.

Virtual machines in a virtual environment can be segregated into different protection domains from the application end to the device end. This way, a problem with an I/O device in one domain is prevented from affecting the other domains, giving IT users better system reliability and uptime.

Test and development environments using servers with multiple VMs, and workstations with multiple co-existing OSes running in virtualized environments, can all benefit from isolated work partitions.


Server Usage Models

Many server applications are I/O intensive, especially for networking and storage. Key I/O requirements within the data center are scalability and performance; these enable server consolidation, reliability, and availability as mission-critical applications are moved onto virtualized data center servers and infrastructures. Sectors such as government and health care can also benefit from the isolation and security that I/O virtualization provides: multiple partitions running multiple OSes can meet varied mission-critical needs, and the added security helps protect the private individual information that these institutions regularly handle.


Enhancing Performance

Virtualization enables the consolidation of workloads to an under-utilized server. As more work loads are consolidated the I/O usage and bandwidth requirements increase and I/O performance can become a bottleneck. To improve performance a dedicated high performance I/O device can be assigned directly to a VM that needs increased I/O performance. Intel VT-d based I/O virtualization allows high-performance I/O devices, such as multi-port gigabit and 10 gigabit network adapters, to be assigned to particular VMs where I/O performance is critical, without concerns that other VMs on the platform will affect their operation.  Intel is an active participant in the PCI-SIG driven I/O virtualization specification that is working towards having a single device natively shared amongst multiple VMs.


Enhancing Reliability and Security – Native OS and Server Consolidation

The use of multiple I/O devices in consolidated virtualized servers is increasing; up to four networking devices in a virtualized server is not uncommon. Intel VT-d can help VMMs improve reliability and security by isolating these devices to protected domains.

By controlling access of devices to specific memory ranges, end to end (VM to device) isolation can be achieved by the VMM. This helps improve security, reliability and availability.

Device isolation can be achieved on non-virtualized platforms as well. Device driver developers can use isolation of a device to specific memory ranges when debugging hardware, or when debugging a device driver whose DMA is accessing undesired memory ranges.


Getting around “Bounce Buffer” Conditions

System software using Intel VT-d DMA remapping capabilities improves performance by avoiding bounce buffer[i] conditions. When bounce buffers would otherwise be needed between a 32-bit device performing DMA and a physical memory range that is inaccessible due to 32-bit address limitations, system software can use the Intel VT-d DMA remapping capability to redirect the data to high memory rather than performing buffer copies.
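
The contrast can be sketched as follows. The helper functions are hypothetical placeholders for the platform's real DMA and remapping programming interfaces; the point is simply that the remapped path removes the intermediate copy.

    /* Sketch contrasting the classic bounce-buffer path with a remapped path.
     * device_dma_to(), iommu_map() and phys_to_virt() are hypothetical helpers;
     * completion handling and error paths are omitted. */

    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    void     device_dma_to(uint32_t dev_addr, size_t len);   /* hypothetical */
    uint32_t iommu_map(uint64_t host_phys, size_t len);      /* hypothetical */
    void    *phys_to_virt(uint64_t phys);                    /* hypothetical */

    /* Without remapping: the 32-bit device writes into a buffer below 4 GB,
     * and the CPU then copies the data up to the real (high) buffer. */
    void dma_with_bounce(uint64_t high_buf, uint64_t low_buf, size_t len)
    {
        device_dma_to((uint32_t)low_buf, len);
        memcpy(phys_to_virt(high_buf), phys_to_virt(low_buf), len);
    }

    /* With VT-d remapping: give the device a 32-bit alias of the high buffer,
     * so the data lands in place and no copy is needed. */
    void dma_with_remapping(uint64_t high_buf, size_t len)
    {
        uint32_t alias = iommu_map(high_buf, len);   /* programmed into remap tables */
        device_dma_to(alias, len);
    }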


Client Usage Models

Intel® Virtualization Technology (Intel® VT) enables deployment of self-contained virtual appliances from third party vendors to perform vital security and management services for activities such as deep packet inspection and policy compliance on desktop PCs with Intel® vPro™ technology. These tamper resistant virtual appliances provide a more secure, stable environment for critical services and include all necessary software in a single package for greater ease and efficiency. Using VT-d with a services or manageability partition provides an isolated, controlled, and protected environment to support the client platform while assuring memory protections and I/O optimization for the virtual machines.


VT-d based Virtual Appliances

A virtual appliance is a self-contained virtual execution environment optimized for a predefined set of applications and/or services. The Lightweight Virtual Machine Monitor (LVMM) is a Virtual Machine Monitor (VMM) using Intel VT to partition a client platform into two execution environments. One is the user's VM, which can run an OS such as Windows XP* and the applications the user needs, such as video or rendering applications, development and test applications, and typical office applications. The second is a service partition (or Service VM) that runs a services OS (SOS) in an isolated execution environment. The user partition owns all the devices on the platform except for (in this example) the network interface controllers. These are owned by the services partition, providing an ability to monitor and/or filter network traffic and to virtualize the network devices for the other VMs on the client platform. Management applications running in the services partition give a remote console the ability to administer the client system in isolation from the rest of the platform and user environment.



The architecture depicted in Figure 3 shows that the network traffic flows through a physical network interface card (NIC) driver owned by the services partition. A bridge driver then routes the packets between the services partition network stack and the user partition network stack. In the user partition, a virtual NIC driver sends all outgoing packets from the user partition to the bridge driver, and the bridge driver forwards them to the physical NIC.

 

Figure 3 - Client VMM architecture

This networking architecture provides a higher level of protection from malicious network traffic. It also creates the ability to isolate malicious attacks to a single partition and its assigned resources through the use of VT and VT-d. VT-d creates a foundation for a new class of applications based on the "Virtual Appliance" architecture, and it performs better than a virtualization scheme that exposes a NIC device model to the user partition, in which all user-partition accesses to the NIC device are intercepted and emulated to protect against the proliferation of malicious code.

The LVMM and the services partition have to be protected from DMA bus mastering devices mapped to the user partition. These DMA-capable devices can access the entire system memory and can intentionally or unintentionally access (read/write) memory pages hosting the LVMM and services partition code and data structures. Such accesses could compromise IT secrets or render the platform useless by memory corruption. VT-d is used to prevent these device DMA problems.

As stated before, VT-d allows two views of the system memory: Guest Physical Address (GPA) and Host Physical Address (HPA). The LVMM keeps the HPA view, the system physical address space, while the user and services partitions are provided their respective GPA views. The LVMM maintains shadow page tables to translate GPA to HPA for accesses from the CPU. Similarly, using the VT-d DMA-remapping engines and corresponding translation tables, the LVMM maintains GPA-to-HPA mappings for all DMA-capable I/O devices. Figure 4 illustrates this usage model.

Figure 4 - VT-d usage model in the client VMM

DMA mapping is performed as follows:

  • All services partition memory pages are added to one domain, such that only the DMA devices mapped to the services partition (the NICs) can access these pages.
  • All remaining pages (except LVMM and BIOS reserved regions) are added to the user partition domain, and all devices except those mapped to the services partition can access these pages (e.g., iGFX, PCI/PCIe add-on cards, etc.).
  • The LVMM and BIOS reserved regions are protected from DMA accesses by virtue of being absent from the VT-d translation page tables.

This device-to-domain mapping has the following benefits:

  • I/O devices mapped to one domain can't access the memory of another domain. For example PCI/PCIe add-on cards in user partitions can't access the LVMM or the services partition.
  • Device drivers in the services and user partitions run without any changes to comprehend GPA-to-HPA mapping. This translation is transparently performed by VT-d hardware when the device issues an I/O request using GPA.
  • If a device misbehaves by trying to access an address outside of its mapped domain, the VT-d hardware generates a fault. This fault is captured by the LVMM and indicated to the services partition. An optional management application in the services partition can process these faults by taking appropriate actions such as displaying an error message or initiating a platform reboot, depending on the severity of the fault.

Client Usage Models

IT departments face many issues in managing assets and, at the same time, maintaining security. Here are some examples of using VT-d in client usage models.

Client Isolation and Recovery

IT departments benefit from the ability to isolate key manageability and security services from end-user access while still maintaining the same level of flexibility and performance for end-user services. Management and security services are isolated to a virtual management appliance or service partition, consequently protecting the IT services. Another benefit of the user and services partitions is that if the user partition has a critical issue, the service (IT) partition can rebuild the user partition remotely and independently.

Endpoint Access Control

Using VT-d to virtualize devices allows more secure Endpoint Access Control (EAC, also known as Network Access Control). This provides more protection for client access to an enterprise, creating better manageability of the access points. The enterprise determines the parameters of acceptability, expressed in the form of an access policy. The policy is interpreted by a Policy Decision Point (PDP), which controls Policy Enforcement Points (PEPs) that control access. Access controls can include any of the following:

  • Unrestricted access.
  • Conditional access based on traffic filtering.
  • Restricted access where only specific resources are accessible.

EAC follows a methodology that can be broken down into the following general steps:

  • Collection – monitoring, reading and storage of security measurements of the client system.
  • Reporting – formatting collected measurements for consumption by a PDP.
  • Evaluation – interpretation of reports and organizational policies.
  • Enforcement – applies access control rules.
  • Remediation – applies configuration rules designed to bring the platform into compliance.

Outbreak Containment

IT departments continue to face the challenge of containing vulnerabilities. Viruses can enter the PC and attempt to access or harm confidential data or can proliferate throughout the enterprise.  Outbreak Containment provides containment of a threat once it is detected. Intel® VT and VT-d can help detect and contain viruses sooner, limiting the exposure in attacked systems as well as other, connected systems. The Virtual Appliance or Service Partition boot process is monitored to help ensure corrupted or rogue software is not loaded if a different VMM, Virtual Appliance program, drivers, or OS attempts to load.  Consequently, the corrupted partition can be halted and IT notified while allowing the uncorrupted environment, OS, and SW to load.

The corrupted partition can be switched to a private network to facilitate remediation or, in a known threat scenario; the client is updated with a patch to protect it against the outbreak. With a more serious situation, the client may be powered off to protect it and the rest of the network.

Embedded PC Health

Embedded PC Health reduces client PC lifecycle costs by providing embedded asset management, provisioning, self-diagnostic, self-repair, and self-optimization capabilities within the Intel platform. This OS-independent framework, based on Intel Active Management Technology, utilizes platform-specific knowledge from Intel's processor, chipset, and NIC.

Embedded PC Health main objectives are for it to be:

  • Deployable: Utilize currently deployed protocols and services in the IT environment. Minimize the need to develop and deploy new protocols and services.
  • Highly available: Provide remote management capabilities regardless of the operational state of the PC hardware or OS.
  • OS-independent: Provide a base set of platform management functions and interfaces regardless of the OS type or version installed on the PC.
  • Tamper-resistant: Prevent the end user from removing or disabling the remote management service.

Security Implications of VT-d

Trusted partitions and memory protection are created on the PC to allow companies and IT to better secure sensitive data. A virtual appliance or Service Partition manages the multiple secure partitions and facilitates trusted communication of information based on business segment needs and policies. Complexity creates the potential for vulnerabilities; using VT-d means that IT departments need to add complexity only where it is needed, creating safer execution environments and improving the ability to detect and prevent attacks.

VT and VT-d enabled systems only allow code that is approved by IT staff to be loaded. If malware is present in the system, an IT-verified boot procedure will detect the modification and apply the appropriate remediation, such as reloading a safe backup virtual image.

Network based attacks are countered by monitoring memory pages that should not change. Monitoring agents notify the VMM when an invalid page access is attempted and the VMM can respond by blocking such accesses. The Integrity Agents are themselves protected by a VM boundary where direct access between partitions is not allowed.

IT security mechanisms are based on the ability to create isolated execution environments that are less susceptible to attack. Intel VT and VT-d technology are instrumental in creating such trusted environments that can act in the case of malicious attack or hardware failure.


Intel® VT-d requirements

VT-d will be available on Intel Client, Workstation, and select Server products in the second half of 2007.

Hardware

  • A platform that has a chipset with VT-d support

Software enabling required for VT-d

  • A VMM (or Hypervisor) with the support required for VT-d features in the virtualization environment. No changes are required for guests running over the VMM.
  • OS enabling is required for the OS to take advantage of VT-d protection features in a native (non-virtualized) OS environment.

BIOS requirements for the platform

  • BIOS enabling is required for VT-d use. The BIOS needs to expose VT-d capabilities (e.g., the number of DMA remap engines) to the VMM through the ACPI table, as sketched below.
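
The ACPI table used for this is the DMAR (DMA Remapping Reporting) table defined by the VT-d specification. The following C sketch assumes the published layout and omits checksum and sanity checks; it shows how system software might walk the table to count the DMA-remapping engines (DRHD structures).

    /* Illustrative parser skeleton for the ACPI DMAR table through which
     * firmware advertises VT-d; not production code (no checksum validation). */

    #include <stdint.h>

    #pragma pack(push, 1)
    struct acpi_sdt_header {               /* common 36-byte ACPI table header */
        char     signature[4];             /* "DMAR" for this table            */
        uint32_t length;
        uint8_t  revision;
        uint8_t  checksum;
        char     oem_id[6];
        char     oem_table_id[8];
        uint32_t oem_revision;
        uint32_t creator_id;
        uint32_t creator_revision;
    };

    struct acpi_dmar_table {
        struct acpi_sdt_header hdr;
        uint8_t  host_address_width;       /* max DMA physical address width - 1 */
        uint8_t  flags;
        uint8_t  reserved[10];
        /* followed by remapping structures: DRHD (the DMA remap engines), ...  */
    };

    struct dmar_sub_header {
        uint16_t type;                     /* 0 = DRHD: one DMA-remapping engine */
        uint16_t length;
    };
    #pragma pack(pop)

    /* Walk the sub-structures and count the DMA-remapping hardware units. */
    unsigned count_remap_engines(const struct acpi_dmar_table *dmar)
    {
        const uint8_t *p   = (const uint8_t *)dmar + sizeof(*dmar);
        const uint8_t *end = (const uint8_t *)dmar + dmar->hdr.length;
        unsigned engines = 0;

        while (p + sizeof(struct dmar_sub_header) <= end) {
            const struct dmar_sub_header *sub = (const void *)p;
            if (sub->length < sizeof(*sub))
                break;                     /* malformed entry, stop */
            if (sub->type == 0)            /* DRHD */
                engines++;
            p += sub->length;
        }
        return engines;
    }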

Conclusion

The architecture of VT-d provides hardware mechanisms for building a virtualized environment with complete application-to-I/O-device data-transfer isolation. This enables the creation of a virtual environment with greater availability, reliability, and security. With VT-d, software developers can develop and evolve architectures that provide fully protected sharing of I/O resources, are highly available, deliver high performance, and scale to increasing I/O demands.

VT-d support on Intel platforms for I/O-device virtualization complements the existing Intel VT capability to virtualize processor and memory resources. Together, this roadmap of VT technologies offers a complete solution to provide full hardware support for the virtualization of Intel platforms. The virtualization of I/O resources is an important step toward enabling a significant set of emerging usage models in the data center, the enterprise, and the home.


Resources

  • Intel® Virtualization Technology - http://www.intel.com/technology/virtualization/index.htm
  • PCI-SIG I/O Virtualization (IOV) Specifications - Address Translation Services

Related Reading

  • How to Incorporate Intel Virtualization Technology into an Overview of Itanium Architecture
  • How to Solve Virtualization Challenges with VT-x and VT-i
  • Intel® Virtualization Developer Community
  • Intel® Virtualization Technology for Directed I/O

Trademark Information

Intel's trademarks may be used publicly with permission only from Intel. Fair use of Intel's trademarks in advertising and promotion of Intel products requires proper acknowledgement.

*Other names and brands may be claimed as the property of others.

[i] A “bounce buffer” is a memory area used for the temporary storage of data that is copied between an I/O device and a device-inaccessible memory area. This copying imposes significant overhead, resulting in increased latency, reduced throughput, and/or increased CPU load when performing I/O.


About the Author

Thomas Wolfgang Burger is the owner of Thomas Wolfgang Burger Consulting. He has been a consultant, instructor, writer, analyst, and applications developer since 1978. He can be reached at twburger@gmail.com.




