Introduction to Shader Programming - Fundamentals of Vertex Shaders (repost)



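Source: China Game Developer (中国游戏开发者) [ 2002-10-21 ]

Author: Wolfgang F. Engel

　　Contents
　　1.1 What You Are Going To Learn
　　1.2 What You Need to Know/Equipment
　　1.3 How This Introduction is Organized
　　1.4 Vertex Shaders in the Pipeline
　　1.5 Why use Vertex Shaders?
　　1.6 Vertex Shader Tools
　　1.7 Vertex Shader Architecture
　　1.8 High Level View on Vertex Shader Programming

　　Original: Introduction to Shader Programming Fundamentals of Vertex Shaders
　　Translators: clane, wonyee (Soloeden.COM), CGD
　　Version: the first edition (Ver 1.0)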

　　We have seen ever-increasing graphics performance in PCs since the release of the first 3dfx Voodoo cards in 1995. Although this performance increase has allowed PCs to run graphics faster, it arguably has not allowed graphics to run much better. The fundamental limitation thus far in PC graphics accelerators has been that they are mostly fixed-function. Fixed-function means that the silicon designers have hard-coded specific graphics algorithms into the graphics chips, and as a result the game and application developers have been limited to using these specific fixed algorithms.

　　For over a decade, a graphics language known as Photorealistic RenderMan from Pixar Animation Studios has withstood the test of time and has been the choice of professionals for high-quality photorealistic rendering.

　　Pixar's use of RenderMan in its development of feature films such as "Toy Story" and "A Bug's Life" has resulted in a level of photorealistic graphics that has amazed audiences worldwide. RenderMan's programmability has allowed it to evolve as major new rendering techniques were invented. By not imposing strict limits on computations, RenderMan allows programmers the utmost in flexibility and creativity. However, this programmability has limited RenderMan to software-only implementations.

　　Now, for the first time, low-cost consumer hardware has reached the point where it can begin implementing the basics of programmable shading similar to the RenderMan graphics language with real-time performance.

　　The principal 3D APIs (DirectX and OpenGL) have evolved alongside graphics hardware. One of the most important new features in DirectX Graphics is the addition of a programmable pipeline that provides an assembly language interface to the transformation and lighting hardware (vertex shader) and the pixel pipeline (pixel shader). This programmable pipeline gives the developer a lot more freedom to do things that have never been seen in real-time applications before.

　　Shader programming is the new and real challenge for game coders. Face it ...

1.1 What You Are Going To Learn
　　This introduction covers the fundamentals of vertex shader and pixel shader programming. You are going to learn here all the stuff necessary to start programming vertex and pixel shaders for the Windows family of operating systems from scratch.

　　We will deal with:

Writing and compiling a vertex shader program
Lighting with vertex shaders
Transformation with vertex shaders
Writing and compiling a pixel shader program
Texture mapping with the pixel shader
Texture effects
Per-pixel lighting with pixel shaders

　　and much more ...

　　This introduction is an excerpt from the book ShaderX - Vertex and Pixel Shader Programming Tips and Tricks, Wordware Inc., 2002, ISBN 1-55622-041-3.

1.2 What You Need to Know/Equipment
　　You need a basic understanding of the math typically used in a game engine and a basic to intermediate understanding of the DirectX Graphics API. It helps if you know how to use the Transform & Lighting (T&L) pipeline and the SetTextureStageState() calls. If you need help with these topics, I recommend working through an introductory-level text first. For example, "Beginning Direct3D Game Programming" might help :-).

　　Your development system should consist of the following hardware and software:

　　If you are not a lucky owner of a GeForce3/4TI, RADEON 8x00 or an equivalent graphics card (that supports shaders in hardware), the standardized assembly interface will provide highly tuned software vertex shaders that AMD and Intel have optimized for their CPUs. These software implementations should jump in when no vertex-shader-capable hardware is found. There is no comparable software-emulation fallback path for pixel shaders.

1.3 How This Introduction is Organized
　　We work through the fundamentals to a more advanced level in four chapters, first for vertex shaders and later for pixel shaders. Our road map looks like this:

Fundamentals of Vertex Shaders
Programming Vertex Shaders
Fundamentals of Pixel Shaders
Programming Pixel Shaders

　　Let's start by examining the place of vertex shaders in the Direct3D pipeline ...

1.4 Vertex Shaders in the Pipeline
　　The following diagram shows the Source or Polygon, Vertex and Pixel Operations levels of the Direct3D pipeline in a very simplified way:

Figure 1: Direct3D pipeline
　　On the source data level, the vertices are assembled and tessellated. This is the high-order primitive module, which works to tessellate high-order primitives such as N-Patches (as supported by the ATI RADEON 8500 in hardware), quintic Béziers, B-splines and rectangular and triangular (RT) patches. A GPU that supports RT-Patches breaks higher-order lines and surfaces into triangles and vertices.

　　It appears that, beginning with the 21.81 drivers, NVIDIA no longer supports RT-patches on the GeForce3/4TI.

　　A GPU that supports N-Patches generates the control points of a Bézier triangle for each triangle in the input data. This control mesh is based on the positions and normals of the original triangle. The Bézier surface is then tessellated and evaluated, creating more triangles on chip [Vlachos01].
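　　As a rough illustration of how a control mesh can be derived from positions and normals alone, the sketch below computes one edge control point with the curved PN triangle construction described in [Vlachos01]. It is plain Python, not anything Direct3D or the hardware exposes, and the function name edge_control_point is made up for this example.

```python
def edge_control_point(p_i, p_j, n_i):
    """Return the Bezier control point one third of the way from p_i to p_j.

    The point (2*p_i + p_j) / 3 on the flat edge is projected onto the
    tangent plane given by the corner position p_i and its unit normal n_i.
    """
    # w = (p_j - p_i) . n_i, the signed distance of p_j from that tangent plane
    w = sum((b - a) * n for a, b, n in zip(p_i, p_j, n_i))
    return tuple((2.0 * a + b - w * n) / 3.0 for a, b, n in zip(p_i, p_j, n_i))


# A normal perpendicular to the edge leaves the point on the edge itself.
flat = edge_control_point((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 0.0, 1.0))
# A normal along the edge pulls the point back onto the tangent plane at p_i.
pulled = edge_control_point((0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

　　Each triangle edge yields two such points (one per end), which together with the corners and a center point form the cubic control mesh that is then tessellated.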

　　The N-Patches functionality was enhanced in Direct3D 8.1. There is more control over the interpolation order of the positions and normals of the generated vertices. The new D3DRS_POSITIONORDER and D3DRS_NORMALORDER render states control this interpolation order. The position interpolation order can be set to either D3DORDER_LINEAR or D3DORDER_CUBIC.

　　The normal interpolation order can be set to either D3DORDER_LINEAR or D3DORDER_QUADRATIC. In Direct3D 8.0, the position interpolation was hard-wired to D3DORDER_CUBIC and the normal interpolation was hard-wired to D3DORDER_LINEAR. Note: If you use N-Patches together with programmable vertex shaders, you have to store the position and normal information in input registers v0 and v3. That's because the N-Patch tessellator needs to know where this information is in order to notify the driver.

　　The next stage shown in Figure 1 covers the vertex operations in the Direct3D pipeline. There are two different ways of processing vertices.

　　1) The "fixed-function" pipeline. This is the standard Transform & Lighting (T&L) pipeline, where the functionality is essentially fixed. The T&L pipeline can be controlled by setting render states, matrices, and lighting and material parameters.

　　2) Vertex shaders. This is the new mechanism introduced in DirectX 8. Instead of setting parameters to control the pipeline, you write a vertex shader program that executes on the graphics hardware.

　　Our focus is on vertex shaders. It is obvious from the simplified diagram in Figure 1 that Face Culling, User Clip Planes, Frustum Clipping, Homogeneous Divide and Viewport Mapping operate on pipeline stages after the vertex shader. Therefore these stages are fixed and can't be controlled by a vertex shader. A vertex shader is also not capable of writing to vertices other than the one it currently shades. It is also not capable of creating vertices; it generates one output vertex from each vertex it receives as input.
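　　To make the fixed stages that follow the vertex shader a little more tangible, here is a minimal sketch of the homogeneous divide and the viewport mapping applied to one clip-space vertex. It assumes the usual Direct3D conventions (x and y in [-1, 1] after the divide, z in [0, 1], screen y growing downward); the function and parameter names are illustrative only.

```python
def homogeneous_divide(clip):
    """Turn a clip-space (x, y, z, w) position into normalized device coordinates."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def viewport_map(ndc, vp_x, vp_y, width, height, min_z=0.0, max_z=1.0):
    """Map normalized device coordinates to screen space."""
    x, y, z = ndc
    screen_x = vp_x + (x + 1.0) * 0.5 * width
    screen_y = vp_y + (1.0 - y) * 0.5 * height   # y is flipped: +1 is the top edge
    screen_z = min_z + z * (max_z - min_z)
    return (screen_x, screen_y, screen_z)


# The vertex shader only has to deliver the clip-space position;
# these two steps are applied to it afterwards.
ndc = homogeneous_divide((2.0, 2.0, 1.0, 2.0))
screen = viewport_map(ndc, 0, 0, 640, 480)
```

　　The shader's job ends with the clip-space output; nothing in a vertex shader program can alter how these later steps behave.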

　　So what are the capabilities and benefits of using vertex shaders?

1.5 Why use Vertex Shaders?
　　If you use vertex shaders, you bypass the fixed-function pipeline or T&L pipeline. Why would you want to skip them?

　　Because the hardware of a traditional T&L pipeline doesn't support all of the popular vertex attribute calculations on its own, processing is often shared between the geometry engine and the CPU. Sometimes, this leads to redundancy.

　　There is also a lack of freedom. Many of the effects used in games look similar with the hard-wired T&L pipeline. The fixed-function pipeline doesn't give developers the freedom they need to develop unique and revolutionary graphical effects. The procedural model used with vertex shaders enables a more general syntax for specifying common operations. With the flexibility of vertex shaders, developers are able to perform operations including:

Procedural Geometry (cloth simulation, soap bubble [Isidoro/Gosselin])
Advanced Vertex Blending for Skinning and Vertex Morphing (tweening) [Gosselin]
Texture Generation [Riddle/Zecha]
Advanced Keyframe Interpolation (complex facial expression and speech)
Particle System Rendering [Le Grand]
Real-Time Modifications of the Perspective View (lens effects, underwater effect)
Advanced Lighting Models (often in cooperation with the pixel shader) [Bendel]
First Steps to Displacement Mapping [Calver]

　　And there are many more effects possible with vertex shaders, perhaps effects that nobody has thought of before. For example, a lot of SIGGRAPH papers from the last couple of years describe graphical effects that have so far been realized only on SGI hardware. It might be a great challenge to port these effects to consumer hardware with the help of vertex and pixel shaders.

　　In addition to opening up creative possibilities for developers and artists, shaders also attack the problem of constrained video memory bandwidth by executing on-chip on shader-capable hardware. Take, for example, Bézier patches. Given two floating point values per vertex (plus a fixed number of values per primitive), one can design a vertex shader to generate a position, a normal and a number of texture coordinates. Vertex shaders even give you the possibility to decompress compressed position, normal, color, matrix and texture coordinate data and to save a lot of valuable bandwidth without any additional cost [Calver].
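　　One simple form such decompression can take is scale-and-offset quantization; this is only an assumed scheme for illustration, not necessarily the one used in [Calver]. The CPU stores each coordinate as a small integer, and the shader restores it with one multiply-add against two constants. The Python sketch below mimics both sides; all names are hypothetical.

```python
def decompression_constants(lo, hi, bits=16):
    """Scale and offset that a shader would hold in a constant register."""
    levels = (1 << bits) - 1
    return (hi - lo) / levels, lo


def compress(position, lo, hi, bits=16):
    """CPU side: quantize each float in [lo, hi] to an unsigned integer."""
    levels = (1 << bits) - 1
    return tuple(round((p - lo) / (hi - lo) * levels) for p in position)


def decompress(packed, scale, offset):
    """Shader side: one multiply-add per component (packed * scale + offset)."""
    return tuple(q * scale + offset for q in packed)


scale, offset = decompression_constants(-1.0, 1.0)
packed = compress((-1.0, 1.0, 0.3), -1.0, 1.0)
restored = decompress(packed, scale, offset)
```

　　With 16-bit components a position takes half the space of three 32-bit floats, at the price of a quantization error of at most half a step (scale / 2).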

　　And there is also a benefit for your future learning curve. The procedural programming model used by vertex shaders is very scalable. Therefore the addition of new instructions and new registers will happen in a more intuitive way for developers.

1.6 Vertex Shader Tools
　　As you will soon see, you are required to master a specific RISC-oriented assembly language to program vertex shaders, because using the vertex shader means taking responsibility for programming the geometry processor. Therefore, it is important to get the right tools to begin developing shaders as quickly and productively as possible.

　　I would like to present the tools that I am aware of at the time of this writing.

　　1.6.1 NVIDIA Effects Browser 2/3
　　NVIDIA provides its own DirectX 8 SDK, which encapsulates all of its tools, demos and presentations on DirectX 8.0. All the demos use a consistent framework called the Effects Browser.

Figure 2: NVIDIA Effects Browser
　　The Effects Browser is a wonderful tool to test and develop vertex and pixel shaders. You can select the effect you would like to see in the left column. The middle column gives you the ability to see the source of the vertex and/or pixel shader. The right column displays the effect.

　　Not all graphics cards will support all the effects available in the Effects Browser. GeForce3/4TI will support all the effects. Independent of your current graphics card preferences, I recommend downloading the NVIDIA DirectX 8 SDK and trying it out. The many examples, including detailed explanations, show you a variety of the effects possible with vertex and pixel shaders. The upcoming NVIDIA Effects Browser 3 will provide automatic online update capabilities.

　　1.6.2 NVIDIA Shader Debugger
　　Once you have used it, you won't live without it. The NVIDIA Shader Debugger provides you with information about the current state of the temporary registers, the input streams, the output registers, and the constant memory. This data changes interactively while stepping through the shaders. It is also possible to set instruction breakpoints as well as specific breakpoints.

Figure 3: NVIDIA Shader Debugger
　　A user manual that explains all the possible features is provided. You need at least Windows 2000 with Service Pack 1 to run the Shader Debugger, because debug services in DX8 and DX8.1 are only supplied in Windows 2000 and higher. It is important that your application uses software vertex processing (or that you have switched to the reference rasterizer) in the runtime for the debugging process.

　　You are also able to debug pixel shaders with this debugger, but due to a bug in DirectX 8.0 the contents of t0 are never displayed correctly and user-added pixel shader breakpoints will not trigger. DirectX 8.1 fixes these issues, and you receive a warning message if the application finds an installation of DirectX 8.0.

　　1.6.3 Shader City
　　You can find another vertex and pixel shader tool, along with source code, at http://www.palevich.com/3d/ShaderCity/. Designed and implemented by Jack Palevich, Shader City allows you to see any modification of the vertex and/or pixel shaders in the small client window at the upper left:

Figure 4: Shader City
　　The results of a modification of a vertex and/or pixel shader can be seen after they are saved and re-loaded. Besides that, you are able to load index and vertex buffers from a file. The source code for this tool might help you to encapsulate Direct3D in an ActiveX control ... so try it.

　　1.6.4 Vertex Shader Assembler
　　To compile a vertex shader ASCII file (for example basic.vsh) into a binary file (for example basic.vso), you must use a vertex shader assembler. As far as I know, there are two vertex shader assemblers: the Microsoft Vertex Shader Assembler and the NVIDIA vertex and pixel shader macro assembler. The latter provides all of the features of the Microsoft assembler plus many other features, whereas the Microsoft Vertex Shader Assembler gives you the ability to also use D3DX effect files (as of DirectX 8.1).

　　1.6.4.1 NVIDIA NVASM (Vertex and Pixel Shader Macro Assembler)
　　NVIDIA provides its Vertex and Pixel Shader Macro Assembler as part of its DirectX 8 SDK. NVASM has very robust error reporting built into it. It will not only tell you which line the error was on, it is also able to backtrack errors. Good documentation helps you get started. NVASM was written by ShaderX author Kenneth Hurley, who provides additional information in his ShaderX article [Hurley]. We will learn how to use this tool in one of the upcoming examples in the next chapter.

　　1.6.4.2 Microsoft Vertex Shader Assembler
　　The Microsoft vertex shader assembler is delivered in the DirectX 8.1 SDK in C:/dxsdk/bin/DXUtils.

　　Note: The default path of the DirectX 8 SDK is c:/mssdk. The default path of the DirectX 8.1 SDK is c:/dxsdk.

　　If you call vsa.exe from the command line, you will get the following options:

　　usage: vsa -hp012

　　-h : Generate .h files (instead of .vso files)
　　-p : Use C preprocessor (VisualC++ required)

　　-0 : Debug info omitted, no shader validation performed
　　-1 : Debug info inserted, no shader validation performed
　　-2 : Debug info inserted, shader validation performed. (default)

　　I haven't found any documentation for the Vertex Shader Assembler. It is used by the D3DXAssembleShader*() methods or by the effect file method D3DXCreateEffectFromFile(), which compiles the effect file.

　　If you want to be hardware-vendor independent, you should use the Microsoft Vertex Shader Assembler.

　　1.6.4.3 Shader Studio
　　ShaderX author John Schwab has developed a tool that will greatly aid your development of vertex and pixel shaders. Whether you are a beginner or an advanced Direct3D programmer, this tool will save you a lot of time: it allows you to get right down to the development of any shader without actually writing any Direct3D code. Therefore you can spend your precious time working on what's important, the shaders.

Figure 5: Shader Studio: Phong lighting
　　The tool encapsulates a complete vertex and pixel shader engine with a few nice ideas. For a hands-on tutorial and detailed explanation see [Schwab]. The newest version should be available online at http://www.shaderstudio.com/

　　1.6.4.4 NVLink 2.x
　　NVLink is a very interesting tool that allows you to:

Write vertex shaders that consist of "fragments" delimited by #beginfragment and #endfragment statements. For example:

#beginfragment world_transform
dp4 r_worldpos.x, v_position, c_world0
dp4 r_worldpos.y, v_position, c_world1
dp4 r_worldpos.z, v_position, c_world2
#endfragment

Assemble vertex shader files with NVASM into "fragments"
Link those fragments to produce a binary vertex shader at run-time

　　NVLink helps you to generate shaders on demand that will fit into the end user's hardware limits (registers/instructions/constants). The most attractive feature of this tool is that it will cache and optimize your shaders on the fly. NVLink is shown in the NVEffects Browser.
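　　The fragment idea itself can be sketched in a few lines. The following Python model is not NVLink's actual API (which is not described here); it merely splits a source text on #beginfragment / #endfragment, concatenates the requested fragments, and rejects results longer than the 128 instructions a vs.1.1 program may have. The second fragment and its register names are invented for the example.

```python
MAX_INSTRUCTIONS = 128  # program length limit for DirectX 8.x vertex shaders

SOURCE = """
#beginfragment world_transform
dp4 r_worldpos.x, v_position, c_world0
dp4 r_worldpos.y, v_position, c_world1
dp4 r_worldpos.z, v_position, c_world2
#endfragment
#beginfragment copy_diffuse
mov o_diffuse, v_diffuse
#endfragment
"""


def parse_fragments(source):
    """Collect the instruction lines of every named fragment."""
    fragments, current = {}, None
    for raw in source.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#beginfragment"):
            current = line.split()[1]
            fragments[current] = []
        elif line.startswith("#endfragment"):
            current = None
        elif current is not None:
            fragments[current].append(line)
    return fragments


def link(fragments, order):
    """Concatenate fragments in the given order into one shader program."""
    program = [ins for name in order for ins in fragments[name]]
    if len(program) > MAX_INSTRUCTIONS:
        raise ValueError("shader exceeds %d instructions" % MAX_INSTRUCTIONS)
    return program


fragments = parse_fragments(SOURCE)
program = link(fragments, ["world_transform", "copy_diffuse"])
```

　　A real linker additionally has to map the symbolic names (r_worldpos, c_world0, ...) onto physical registers, which is where the hardware limits on registers and constants come into play.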

Figure 6: NVLink
　　You can choose the vertex shader capabilities in the dialog box, and the resulting vertex shader will be shown in output0.nvv in the middle column.

　　Note: the NVLink 2.x example shows the implementation of the fixed-function pipeline in a vertex shader.

　　1.6.4.5 NVIDIA Photoshop Plugins
　　On NVIDIA's web site you will find two frequently updated plugins for Adobe Photoshop: NVIDIA's Normal Map Generator and the Photoshop Compression Plugin. The Normal Map Generator can generate normal maps that can be used, for example, for Dot3 lighting.

Figure 7: NVIDIA Normal Map Generator
　　The plugin requires DirectX 8.0 or later to be installed. The dynamic preview window, located in the upper left corner, shows an example light that is moved with CTRL + left mouse button. You are able to clamp or wrap the edges of the generated normal map by selecting or deselecting the Wrap check box. The height values of the normal map can be scaled by providing a height value in the Scale entry field.

　　There are different options for height generation:

ALPHA - use alpha channel
AVERAGE_RGB - h = average (R, G, B)
BIASED_RGB - h = average (R, G, B) - average of whole image
RED - use red channel
GREEN - use green channel
BLUE - use blue channel
MAX - use max of R, G, B
COLORSPACE - h = 1.0 - [(1.0 - r) * (1.0 - g) * (1.0 - b)]

　　This plugin also works with layers. The readme.txt file provides you with more information about its features.
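　　The height-generation options are simple per-pixel formulas. As a sketch of what each option computes (assuming channel values normalized to [0, 1]; the function itself is hypothetical and not part of the plugin), they can be written as:

```python
def height(r, g, b, a=1.0, mode="AVERAGE_RGB", image_average=0.0):
    """Height value of one pixel for the given generation option.

    image_average is only used by BIASED_RGB and stands for the average
    height of the whole image.
    """
    average = (r + g + b) / 3.0
    options = {
        "ALPHA": a,
        "AVERAGE_RGB": average,
        "BIASED_RGB": average - image_average,
        "RED": r,
        "GREEN": g,
        "BLUE": b,
        "MAX": max(r, g, b),
        "COLORSPACE": 1.0 - (1.0 - r) * (1.0 - g) * (1.0 - b),
    }
    return options[mode]


mid_grey = height(0.5, 0.5, 0.5, mode="COLORSPACE")
```

　　The resulting height field is what the generator then turns into per-pixel normals.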

　　Another Adobe Photoshop plugin provided by NVIDIA is the Photoshop Compression Plugin. It is used by choosing in Adobe Photoshop and then the file format. The following dialog provides a wide variety of features:

Figure 8: NVIDIA Compression Plugin
　　A 3D preview shows the different quality levels that result from different compression formats. This tool can additionally generate mip-maps and convert a height map to a normal map. The provided readme file is very instructive and explains all of the hundreds of features of this tool. As the name implies, both tools support Adobe Photoshop 5.0 and higher.

　　1.6.4.6 Diffusion Cubemap Tool
　　ShaderX author Kenneth Hurley wrote a tool that helps you produce diffusion cube maps. It aids in the extraction of cube maps from digital pictures. The pictures are of a completely reflective ball. The program also allows you to draw an exclusion rectangle to remove the picture taker from the cube map.

　　To extract the reflection maps, first load the picture and then use the mouse to draw the ellipse enclosed in a rectangle. This rectangle should be stretched and moved so that the ellipse falls on the edges of the ball. Then set which direction is associated with the picture in the menu options. The following screenshots use the Negative X and Negative Z directions:

Figure 9: Negative X Sphere Pic
Figure 10: Negative Z Sphere Pic
　　The cube maps are generated with the "Generate" menu option. The program, the source code and much more information can be found at [Hurley].

　　1.6.4.7 DLL Detective with Direct3D Plugin
　　ShaderX author Ádám Moravánszky wrote a tool called DLL Detective. It is very useful not only as a performance analysis tool but also for vertex and pixel shader programming:

Figure 11: DLL Detective
　　It is able to intercept vertex and pixel shaders, disassemble them and write them into a file. A lot of different graphs show the usage of the Direct3D API under different conditions and help to find performance leaks this way. You can even suppress API calls to simulate other conditions. To impede the parallelism of the CPU and GPU usage, you can lock the render target buffer.

　　DLL Detective is especially suited to instrumenting games, or any other applications which run in fullscreen mode, preventing easy access to other windows (like DLL Detective, for example). To instrument such programs, DLL Detective can be configured to control instrumentation via a multimonitor setup, and even from another PC over a network.

　　The full source code and compiled binaries can be downloaded from the web site of the author at http://n.ethz.ch/student/adammo/DLLDetective/index.html.

　　1.6.4.8 3D Studio MAX 4.x / gmax 1.1
　　The new 3D Studio MAX 4.x gives a graphic artist the ability to produce vertex shader code and pixel shader code while producing the models and animations.

Figure 12: 3D Studio Max 4.x/gmax 1.1
　　A WYSIWYG view of your work will appear by displaying multitextures, true transparency, opacity mapping, and the results of custom pixel and vertex shaders.

　　gmax, as a derivative of 3D Studio Max 4.x, does support vertex and pixel shader programming. However, the free gmax product provides no user interface to access or edit these controls. Find more information at discreet.

1.7 Vertex Shader Architecture
　　Let's get deeper into vertex shader programming by looking at a graphical representation of the vertex shader architecture:

Figure 13: Vertex shader architecture
　　All data in a vertex shader is represented by 128-bit quad-floats (4 x 32-bit):

Figure 14: 128 bits
　　A hardware vertex shader can be seen as a typical SIMD (Single Instruction Multiple Data) processor, because one instruction operates on a set of up to four 32-bit variables. This data format is very useful, because most of the transformation and lighting calculations are performed using 4x4 matrices or quaternions. The instructions are very simple and easy to understand. The vertex shader does not allow any loops, jumps or conditional branches, which means that it executes the program linearly - one instruction after the other. The maximum length of a vertex shader program in DirectX 8.x is limited to 128 instructions. Combining vertex shaders, so that one computes the transformation and the next one computes the lighting, is impossible. Only one vertex shader can be active at a time, and the active vertex shader must compute all required per-vertex output data.
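　　To see why the quad-float format suits 4x4 matrix math, the following sketch emulates the four-component dot product (the dp4 instruction) in Python and uses it four times to transform a position, which is how a matrix transform is typically written in a vertex shader (the m4x4 macro expands to four dp4 instructions). The Python function names simply borrow the instruction names; nothing here runs on the GPU.

```python
def dp4(a, b):
    """Four-component dot product of two quad-floats."""
    return sum(x * y for x, y in zip(a, b))


def m4x4(matrix_rows, vector):
    """Transform a quad-float by a 4x4 matrix given as four row vectors.

    One dp4 per output component, executed strictly one after the other,
    matching the linear, branch-free execution model described above.
    """
    return tuple(dp4(row, vector) for row in matrix_rows)


# A matrix that translates by 5 along x; w = 1 makes the translation apply.
translate_x = (
    (1.0, 0.0, 0.0, 5.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0, 1.0),
)
moved = m4x4(translate_x, (1.0, 2.0, 3.0, 1.0))
```

　　Note that a three-component dot product in the first row would ignore the fourth matrix column and therefore drop the translation, which is why positions are transformed with dp4.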

　　A vertex shader uses up to 16 input registers (named v0 - v15, where each register consists of a 128-bit (4 x 32-bit) quad-float) to access vertex input data. The vertex input registers can easily hold the data for a typical vertex: its position coordinates, normal, diffuse and specular color, fog coordinate and point size information, with space left for the coordinates of several textures.

　　The constant registers (Constant Memory) are loaded by the CPU, before the vertex shader starts executing, with parameters defined by the programmer. The vertex shader is not able to write to the constant registers. They are used to store parameters such as light position, matrices, procedural data for special animation effects, vertex interpolation data for morphing/key frame interpolation and more. The constants can be applied within the program and they can even be addressed indirectly with the help of the address register a0.x, but only one constant can be used per instruction. If an instruction needs more than one constant, the additional constant must be loaded into one of the temporary registers before it is required. The names of the constant registers are c0 - c95 or, in the case of the ATI RADEON 8500, c0 - c191.
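　　The one-constant-per-instruction rule is easy to trip over, so here is a small, assumption-laden checker for it. It treats an instruction as plain text, looks only at directly addressed constants such as c0 or c12.x (indirect c[a0.x + n] addressing is ignored), and counts the same register used twice as a single constant. It is a teaching aid, not a replacement for the assembler's own validation.

```python
import re

# Matches directly addressed constant registers: c0, c12, c95.x, ...
CONSTANT = re.compile(r"\bc(\d+)\b")


def distinct_constants(instruction):
    """Set of constant register indices referenced by one instruction."""
    return {int(index) for index in CONSTANT.findall(instruction)}


def violates_one_constant_rule(instruction):
    """True if the instruction reads more than one distinct constant register."""
    return len(distinct_constants(instruction)) > 1


bad = "add r0, c0, c1"                        # two different constants: rejected
workaround = ["mov r1, c1", "add r0, c0, r1"]  # copy one constant to a temporary first
```

　　The workaround costs one extra instruction and one temporary register, which is worth remembering given the 128-instruction and 12-temporary limits.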

　　The temporary registers consist of 12 registers used to perform intermediate calculations. They can be used to load and store data (read/write). The names of the temporary registers are r0 - r11.

　　There are up to 13 output registers (Vertex Output), depending on the underlying hardware. The names of the output registers always start with o for output. The vertex output is passed on to the rasterizer, and your vertex shader program has write-only access to it. The final result is yet another vertex, a vertex transformed to the "homogeneous clip space". Here is an overview of all available registers:

Register | Number of Registers | Properties
Input (v0 - v15) | 16 | RO1
Output (o*) | GeForce 3/4TI: 9; RADEON 8500: 11 | WO
Constants (c0 - c95) | vs.1.1 Specification: 96; RADEON 8500: 192 | RO1
Temporary (r0 - r11) | 12 | R1W3
Address (a0.x) | 1 (vs.1.1 and higher) | WO (W: only with mov)

Table 1

1.8 High Level View on Vertex Shader Programming
　　Only one vertex shader can be active at a time, so it is a good idea to write vertex shaders on a per-task basis. The overhead of switching between different vertex shaders is smaller than, for example, that of a texture change. So if an object needs a special form of transformation or lighting, it will get the proper shader for this task. Let's build an abstract example: You are shipwrecked on a foreign planet. Dressed in your regular armor, armed only with a jigsaw, you move through the candlelit cellars. A monster appears and you crouch behind one of those crates one normally finds on other planets. While thinking about your destiny as a hero who saves worlds with jigsaws, you start counting the number of vertex shaders for this scene.

　　There is one for the monster to animate it, light it and perhaps to reflect its environment. Other vertex shaders will be used for the floor, the walls, the crate, the camera, the candlelight and your jigsaw. Perhaps the floor, the walls, the jigsaw and the crate use the same shader, but the candlelight and the camera might each use one of their own. It depends on your design and the power of the underlying graphics hardware.

　　You might also use vertex shaders on a per-object or per-mesh basis. If for example a *.md3 model consists of, let's say, 10 meshes, you can use 10 different vertex shaders, but that might harm your game performance.

　　Every vertex shader-driven program must run through the following steps:

Check for vertex shader support by checking the D3DCAPS8::VertexShaderVersion field
Declare the vertex shader with the D3DVSD_* macros, to map vertex buffer streams to input registers
Set the vertex shader constant registers with SetVertexShaderConstant()
Compile the previously written vertex shader with D3DXAssembleShader*() (alternatively, it could be pre-compiled using a shader assembler)
Create a vertex shader handle with CreateVertexShader()
Set the vertex shader with SetVertexShader() for a specific object
Delete the vertex shader with DeleteVertexShader()

　　1.8.1 Check for Vertex Shader Support
　　It is important to check which vertex shader software or hardware implementation is installed on the end-user's machine. If there is a lack of support for specific features, the application can fall back to a default behavior (for example fixed-function T&L) or give the user a hint as to what they might do to enable the required features. The following statement checks for support of vertex shader version 1.1:

　　if( pCaps->VertexShaderVersion < D3DVS_VERSION(1,1) )
　　　　return E_FAIL;

　　The following statement checks for support of vertex shader version 1.0:

　　if( pCaps->VertexShaderVersion < D3DVS_VERSION(1,0) )
　　　　return E_FAIL;

　　The D3DCAPS8 structure caps must be filled in the startup phase of the application with a call to GetDeviceCaps(). If you use the Common Files Framework provided with the DirectX 8.1 SDK, this is done by the framework. If your graphics hardware does not support your requested vertex shader version, you must switch to software vertex shaders by using the D3DCREATE_SOFTWARE_VERTEXPROCESSING flag in the CreateDevice() call. The previously mentioned optimized software implementations made by Intel and AMD for their respective CPUs will then process the vertex shaders.
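　　The version check and the software fallback can be sketched without the SDK headers. The snippet below is only an illustration: makeVsVersion mirrors the bit layout that D3DVS_VERSION uses in d3d8types.h (0xFFFE in the high word, major and minor version in the two low bytes), while VertexProcessing and chooseVertexProcessing are hypothetical names introduced for this example and are not part of Direct3D.

```cpp
#include <cstdint>

// Stand-in for D3DVS_VERSION(major, minor) from d3d8types.h:
// 0xFFFE in the high word, major version in bits 8-15, minor version in bits 0-7.
constexpr std::uint32_t makeVsVersion(std::uint32_t major, std::uint32_t minor) {
    return 0xFFFE0000u | (major << 8) | minor;
}

// Hypothetical result type: which vertex processing mode to ask for at device creation.
enum class VertexProcessing { Hardware, Software };

// capsVersion plays the role of D3DCAPS8::VertexShaderVersion.
// If the hardware reports less than the required version, fall back to software,
// i.e. the case in which D3DCREATE_SOFTWARE_VERTEXPROCESSING is passed to CreateDevice().
constexpr VertexProcessing chooseVertexProcessing(std::uint32_t capsVersion,
                                                  std::uint32_t requiredVersion) {
    return capsVersion < requiredVersion ? VertexProcessing::Software
                                         : VertexProcessing::Hardware;
}
```

　　In a real application the result would decide which behavior flag goes into the CreateDevice() call; returning E_FAIL, as in the statements above, is the stricter alternative.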

　　Supported vertex shader versions are:

Version | Functionality
0.0 | DirectX 7
1.0 | DirectX 8 without address register A0
1.1 | DirectX 8 and DirectX 8.1 with one address register A0
2.0 | DirectX 9

Table 2

　　The only difference between the levels 1.0 and 1.1 is the support of the a0 register. The DirectX 8.0 and DirectX 8.1 reference rasterizer and the software emulation delivered by Microsoft and written by Intel and AMD for their respective CPUs support version 1.1. At the time of this writing, only GeForce3/4TI and RADEON 8500-driven boards support version 1.1 in hardware. No known graphics card supports only vs.1.0 at the time of writing, so this is a legacy version.

　　1.8.2 Vertex Shader Declaration
　　You must declare a vertex shader before using it. This declaration can be called a static external interface. An example might look like this:

　　float c[4] = {0.0f,0.5f,1.0f,2.0f};
　　DWORD dwDecl0[] = {
　　　　D3DVSD_STREAM(0),
　　　　D3DVSD_REG(0, D3DVSDT_FLOAT3 ),　　// input register v0
　　　　D3DVSD_REG(5, D3DVSDT_D3DCOLOR ),　// input register v5
　　　　// set a few constants
　　　　D3DVSD_CONST(0,1),*(DWORD*)&c[0],*(DWORD*)&c[1],*(DWORD*)&c[2],*(DWORD*)&c[3],
　　　　D3DVSD_END()
　　};

　　This vertex shader declaration sets data stream 0 with D3DVSD_STREAM(0). Later, SetStreamSource() binds a vertex buffer to a device data stream by using this declaration. You are able to feed different data streams to the Direct3D rendering engine this way.

　　For example, one data stream could hold positions and normals, while a second holds color values and texture coordinates. This also makes switching between single-texture rendering and multi-texture rendering trivial: just don't enable the stream with the second set of texture coordinates.

　　You must declare which input vertex properties or incoming vertex data are mapped to which input register. D3DVSD_REG binds a single vertex register to a vertex element/property from the vertex stream. In our example a D3DVSDT_FLOAT3 value is placed into the first input register (v0) and a D3DVSDT_D3DCOLOR color value is placed into the sixth input register (v5). How a developer maps each input vertex property to a specific input register is only important if one wants to use N-Patches, because the N-Patch tessellator needs the position data in v0 and the normal data in v3. Otherwise the developer is free to define the mapping as they see fit. For example, the position data could be processed by input register 0 (v0) with D3DVSD_REG(0, D3DVSDT_FLOAT3) and the normal data could be processed by input register 3 (v3) with D3DVSD_REG(3, D3DVSDT_FLOAT3).

　　In contrast, the mapping of the vertex data input to specific registers is fixed for the fixed-function pipeline. d3d8types.h holds a list of #defines that predefine the vertex input for the fixed-function pipeline. Specific vertex elements such as position or normal must be placed in specified registers located in the vertex input memory. For example, the vertex position is bound by D3DVSDE_POSITION to register 0, the diffuse color is bound by D3DVSDE_DIFFUSE to register 5, and so on. Here's the whole list from d3d8types.h:

　　#define D3DVSDE_POSITION　　 0
　　#define D3DVSDE_BLENDWEIGHT　1
　　#define D3DVSDE_BLENDINDICES 2
　　#define D3DVSDE_NORMAL　　　3
　　#define D3DVSDE_PSIZE　　　　4
　　#define D3DVSDE_DIFFUSE　　　5
　　#define D3DVSDE_SPECULAR　　6
　　#define D3DVSDE_TEXCOORD0　　7
　　#define D3DVSDE_TEXCOORD1　　8
　　#define D3DVSDE_TEXCOORD2　　9
　　#define D3DVSDE_TEXCOORD3　　10
　　#define D3DVSDE_TEXCOORD4　　11
　　#define D3DVSDE_TEXCOORD5　　12
　　#define D3DVSDE_TEXCOORD6　　13
　　#define D3DVSDE_TEXCOORD7　　14
　　#define D3DVSDE_POSITION2　　15
　　#define D3DVSDE_NORMAL2　　　16

　　The second parameter of D3DVSD_REG specifies the dimensionality and arithmetic data type. The following values are defined in d3d8types.h:

　　// bit declarations for _Type fields
　　#define D3DVSDT_FLOAT1 0x00　// 1D float expanded to (value, 0., 0., 1.)
　　#define D3DVSDT_FLOAT2 0x01　// 2D float expanded to (value, value, 0., 1.)
　　#define D3DVSDT_FLOAT3 0x02　// 3D float expanded to (value, value, value, 1.)
　　#define D3DVSDT_FLOAT4 0x03　// 4D float

　　// 4D packed unsigned bytes mapped to 0. to 1. range
　　// Input is in D3DCOLOR format (ARGB) expanded to (R, G, B, A)
　　#define D3DVSDT_D3DCOLOR 0x04

　　#define D3DVSDT_UBYTE4 0x05 　　// 4D unsigned byte
　　// 2D signed short expanded to (value, value, 0., 1.)
　　#define D3DVSDT_SHORT2 0x06
　　#define D3DVSDT_SHORT4 0x07　　 // 4D signed short

　　Note: GeForce3/4TI does not support D3DVSDT_UBYTE4, as indicated by the D3DVTXPCAPS_NO_VSDT_UBYTE4 caps bit.
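　　The expansion rules stated in the comments of the D3DVSDT_* defines can be made concrete with a small model. The following is only a sketch of how those comments read, not Direct3D code: Float4, expandFloats and expandD3DColor are hypothetical helper names, and the packed color layout 0xAARRGGBB is the usual D3DCOLOR layout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Float4 = std::array<float, 4>;

// Expands 1 to 4 floats into a full input register, as described for
// D3DVSDT_FLOAT1..FLOAT4: missing x/y/z components become 0, a missing w becomes 1.
Float4 expandFloats(const float* values, std::size_t count) {
    Float4 reg = {0.0f, 0.0f, 0.0f, 1.0f};
    for (std::size_t i = 0; i < count && i < 4; ++i) {
        reg[i] = values[i];
    }
    return reg;
}

// Expands a packed D3DCOLOR (0xAARRGGBB) to (R, G, B, A), each mapped to the 0..1 range.
Float4 expandD3DColor(std::uint32_t argb) {
    auto channel = [argb](int shift) { return ((argb >> shift) & 0xFFu) / 255.0f; };
    return {channel(16), channel(8), channel(0), channel(24)};
}
```

　　With these helpers, a D3DVSDT_FLOAT3 position (1, 2, 3) arrives in the shader as (1, 2, 3, 1), which is why the w component does not have to be stored in the vertex buffer.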

　　D3DVSD_CONST loads the constant values into the vertex shader constant memory. The first parameter is the start address in the constant array at which to begin filling data. Possible values range from 0 to 95 or, in the case of the RADEON 8500, from 0 to 191. We start at address 0. The second parameter is the number of constant vectors (quad-floats) to load. One vector is 128 bits long, so we load four 32-bit FLOATs at once. If you want to load a 4x4 matrix, you would use the following statement to load four 128-bit quad-floats into the constant registers c0 - c3:

　　float c[16] = {0.0f, 0.5f, 1.0f, 2.0f,
　　　　　　　　　 0.0f, 0.5f, 1.0f, 2.0f,
　　　　　　　　　 0.0f, 0.5f, 1.0f, 2.0f,
　　　　　　　　　 0.0f, 0.5f, 1.0f, 2.0f};
　　D3DVSD_CONST(0, 4), *(DWORD*)&c[0],*(DWORD*)&c[1],*(DWORD*)&c[2],*(DWORD*)&c[3],
　　　　　　　　　　　　*(DWORD*)&c[4],*(DWORD*)&c[5],*(DWORD*)&c[6],*(DWORD*)&c[7],
　　　　　　　　　　　　*(DWORD*)&c[8],*(DWORD*)&c[9],*(DWORD*)&c[10],*(DWORD*)&c[11],
　　　　　　　　　　　　*(DWORD*)&c[12],*(DWORD*)&c[13],*(DWORD*)&c[14],*(DWORD*)&c[15],

　　D3DVSD_END generates an END token to mark the end of the vertex shader declaration. Another example could be:

　　float c[4] = {0.0f,0.5f,1.0f,2.0f};
　　DWORD dwDecl[] = {
　　　　D3DVSD_STREAM(0), D3DVSD_REG(0, D3DVSDT_FLOAT3 ),　// input register v0
　　　　D3DVSD_REG(3, D3DVSDT_FLOAT3 ),　　　　　　　　　　// input register v3
　　　　D3DVSD_REG(5, D3DVSDT_D3DCOLOR ),　　　　　　　　　// input register v5
　　　　D3DVSD_REG(7, D3DVSDT_FLOAT2 ),　　　　　　　　　　// input register v7
　　　　D3DVSD_CONST(0,1),*(DWORD*)&c[0],*(DWORD*)&c[1],*(DWORD*)&c[2],*(DWORD*)&c[3], D3DVSD_END()
　　};

　　Data stream 0 is set with D3DVSD_STREAM(0). The position values (value, value, value, 1.0) might be bound to v0, the normal values might be bound to v3, the diffuse color might be bound to v5 and one texture coordinate (value, value, 0.0, 1.0) might be bound to v7. The constant register c0 gets one 128-bit value.

　　1.8.3 Setting the Vertex Shader Constant Registers
　　You fill the vertex shader constant registers with SetVertexShaderConstant() and read the values back from these registers with GetVertexShaderConstant():

　　// Set the vertex shader constants
　　m_pd3dDevice->SetVertexShaderConstant( 0, &vZero, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 1, &vOne, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 2, &vWeight, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 4, &matTranspose, 4 );
　　m_pd3dDevice->SetVertexShaderConstant( 8, &matCameraTranspose, 4 );
　　m_pd3dDevice->SetVertexShaderConstant( 12, &matViewTranspose, 4 );
　　m_pd3dDevice->SetVertexShaderConstant( 20, &fLight, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 21, &fDiffuse, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 22, &fAmbient, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 23, &fFog, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 24, &fCaustics, 1 );
　　m_pd3dDevice->SetVertexShaderConstant( 28, &matProjTranspose, 4 );

　　SetVertexShaderConstant() is declared as

　　HRESULT SetVertexShaderConstant(
　　　　DWORD Register,
　　　　CONST void* pConstantData,
　　　　DWORD ConstantCount);

　　As stated earlier, there are at least 96 constant registers (the RADEON 8500 has 192), each of which can be filled with four floating-point values before the vertex shader is executed. The first parameter holds the register address at which to start loading data into the vertex constant array. The last parameter holds the number of constants (4 x 32-bit values) to load into the vertex constant array. So in the first row above, vZero will be loaded into register 0. matTranspose will be loaded into registers 4, 5, 6 and 7. matViewTranspose will be loaded into registers 12, 13, 14 and 15. The registers 16, 17, 18 and 19 are not used. fLight is loaded into register 20. The registers 25, 26 and 27 are not used.

　　So what's the difference between D3DVSD_CONST used in the vertex shader declaration and SetVertexShaderConstant() ? D3DVSD_CONST can be used only once. SetVertexShaderConstant() can be used before every DrawPrimitive*() call.
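　　The register arithmetic of SetVertexShaderConstant() can be checked with a small stand-alone model of the constant memory. This is a sketch under assumptions, not the Direct3D implementation: ConstantFile and setConstants are hypothetical names, and the size of 96 registers is the vs.1.1 minimum mentioned above.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kConstantCount = 96;   // vs.1.1 minimum; 192 on the RADEON 8500
using Float4 = std::array<float, 4>;
using ConstantFile = std::array<Float4, kConstantCount>;

// Mimics the addressing of SetVertexShaderConstant(Register, pConstantData, ConstantCount):
// copies `count` quad-floats from `data` into consecutive registers beginning at
// `startRegister`. Returns false (and copies nothing) if the range does not fit.
bool setConstants(ConstantFile& file, std::size_t startRegister,
                  const float* data, std::size_t count) {
    if (startRegister > kConstantCount || count > kConstantCount - startRegister) {
        return false;
    }
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t j = 0; j < 4; ++j) {
            file[startRegister + i][j] = data[i * 4 + j];
        }
    }
    return true;
}
```

　　Loading a 16-float matrix with a start register of 4 and a count of 4 fills registers 4 to 7 and leaves register 8 untouched, which matches the layout described for matTranspose.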

　　OK, we have now learned how to check the supported version number of the vertex shader hardware, how to declare a vertex shader and how to set the constants in the constant registers of a vertex shader unit. Next we shall learn how to write and compile a vertex shader program.

　　1.8.4 Writing and Compiling a Vertex Shader
　　Before we are able to compile a vertex shader, we must write one ... (old wisdom). I would like to give you a high-level overview of the instruction set first and then give further details of vertex shader programming in the next chapter, named "Programming Vertex Shaders".

　　The syntax for every instruction is:

　　OpName dest, [-]s1 [,[-]s2 [,[-]s3]] ;comment

　　For example:

　　mov r1, r2
　　mad r1, r2, -r3, r4 ;contents of r3 are negated

　　There are 17 different instructions:

Instruction | Parameters | Action

add | dest, src1, src2 | Adds src1 to src2 (negate src2 with - to subtract)

dp3 | dest, src1, src2 | Three-component dot product:
　　dest.x = dest.y = dest.z = dest.w = (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z)

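dp4 | dest, src1, src2 | Four-component dot product:
　　dest.w = (src1.x * src2.x) + (src1.y * src2.y) + (src1.z * src2.z) + (src1.w * src2.w);
　　dest.x = dest.y = dest.z = the scalar result of dp4;

What is the difference between dp4 and mul? dp4 produces a scalar result, whereas mul is a component-by-component vector product.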

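dst | dest, src1, src2 | Calculates a distance vector. The first source operand (src1) is assumed to be the vector (ignored, d*d, d*d, ignored) and the second source operand (src2) is assumed to be the vector (ignored, 1/d, ignored, 1/d).

The resulting vector is:

　　dest.x = 1;
　　dest.y = src1.y * src2.y
　　dest.z = src1.z
　　dest.w = src2.w

　　dst is useful for calculating standard attenuation. Here is a code snippet that calculates the attenuation for a point light: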

　　; r7.w = distance * distance = (x*x) + (y*y) + (z*z)
　　dp3 r7.w, VECTOR_VERTEXTOLIGHT, VECTOR_VERTEXTOLIGHT

　　; VECTOR_VERTEXTOLIGHT.w = 1/sqrt(r7.w)
　　; = 1/||V|| = 1/distance
　　rsq VECTOR_VERTEXTOLIGHT.w, r7.w
　　...
　　; Get the attenuation
　　; d = distance
　　; Parameters for dst:
　　; src1 = (ignored, d * d, d * d, ignored)
　　; src2 = (ignored, 1/d, ignored, 1/d)
　　;
　　; r7.w = d * d
　　; VECTOR_VERTEXTOLIGHT.w = 1/d
　　dst r7, r7.wwww, VECTOR_VERTEXTOLIGHT.wwww
　　; dest.x = 1
　　; dest.y = src0.y * src1.y
　　; dest.z = src0.z
　　; dest.w = src1.w
　　; r7(1, d * d * 1 / d, d * d, 1/d)

　　; c[LIGHT_ATTENUATION].x = a0
　　; c[LIGHT_ATTENUATION].y = a1
　　; c[LIGHT_ATTENUATION].z = a2
　　; (a0 + a1*d + a2* (d * d))
　　dp3 r7.w, r7, c[LIGHT_ATTENUATION]
　　rcp ATTENUATION.w, r7.w
　　...
　　; Scale the light factors by the attenuation
　　mul r6, r5, ATTENUATION.w

expp | dest, src.w | Exponential 2^x with 10-bit precision:
------------------------------------------
　　float w = src.w;
　　float v = (float)floor(src.w);

　　dest.x = (float)pow(2, v);
　　dest.y = w - v;

　　// Reduced precision exponent
　　float tmp = (float)pow(2, w);
　　DWORD tmpd = *(DWORD*)&tmp & 0xffffff00;

　　dest.z = *(float*)&tmpd;
　　dest.w = 1;
--------------------------------------------
Shortcut:

　　dest.x = 2 **(int) src.w
　　dest.y = mantissa(src.w)
　　dest.z = expp(src.w)
　　dest.w = 1.0

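lit | dest, src | Calculates lighting coefficients from two dot products and a power
---------------------------------------------
To calculate the lighting coefficients, set up the registers as follows:

　　src.x = N*L　　　; the dot product between the normal and the direction to the light
　　src.y = N*H　　　; the dot product between the normal and the half vector
　　src.z = ignored　; this value is ignored
　　src.w = specular power　; the value must be between -128.0 and 128.0
----------------------------------------------
Usage: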
　　dp3 r0.x, rn, c[LIGHT_POSITION]
　　dp3 r0.y, rn, c[LIGHT_HALF_ANGLE]
　　mov r0.w, c[SPECULAR_POWER]
　　lit r0, r0
------------------------------------------------
　　dest.x = 1.0;
　　dest.y = max(src.x, 0.0);
　　dest.z = 0.0;
　　if (src.x > 0.0 && src.w == 0.0)
　　　　dest.z = 1.0;
　　else if (src.x > 0.0 && src.y > 0.0)
　　　　dest.z = pow(src.y, src.w);
　　dest.w = 1.0;

logp | dest, src.w | Logarithm log2(x) with 10-bit precision:
------------------------------------------------
　　float v = ABSF(src.w);
　　if (v != 0)
　　{
　　　　int p = (int)(*(DWORD*)&v >> 23) - 127;
　　　　dest.x = (float)p;　// exponent

　　　　p = (*(DWORD*)&v & 0x7FFFFF) | 0x3f800000;
　　　　dest.y = *(float*)&p;　// mantissa;

　　　　float tmp = (float)(log(v)/log(2));
　　　　DWORD tmpd = *(DWORD*)&tmp & 0xffffff00;
　　　　dest.z = *(float*)&tmpd;

　　　　dest.w = 1;
　　}
　　else
　　{
　　　　dest.x = MINUS_MAX();
　　　　dest.y = 1.0f;
　　　　dest.z = MINUS_MAX();
　　　　dest.w = 1.0f;
　　}
--------------------------------------------------
Shortcut:

　　dest.x = exponent((int)src.w)
　　dest.y = mantissa(src.w)
　　dest.z = log2(src.w)
　　dest.w = 1.0

mad | dest, src1, src2, src3 | dest = (src1 * src2) + src3

max | dest, src1, src2 | dest = (src1 >= src2) ? src1 : src2

min | dest, src1, src2 | dest = (src1 < src2) ? src1 : src2

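mov | dest, src | Moves src into dest.
Optimization tip: question every use of mov, because there are often ways to perform the desired operation directly on the source register or to write the result straight into the required output register.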

mul | dest, src1, src2 | Sets dest to the component-by-component product of src1 and src2

　　; To calculate the Cross Product (r5 = r7 X r8),
　　; r0 used as a temp
　　mul r0, r7.zxyw, r8.yzxw
　　mad r5, r7.yzxw, r8.zxyw, -r0

nop | | Does nothing

rcp | dest, src.w | Reciprocal of src.w:
　　if(src.w == 1.0f)
　　{
　　　　dest.x = dest.y = dest.z = dest.w = 1.0f;
　　}
　　else if(src.w == 0)
　　{
　　　　dest.x = dest.y = dest.z = dest.w = PLUS_INFINITY();
　　}
　　else
　　{
　　　　dest.x = dest.y = dest.z = dest.w = 1.0f/src.w;
　　}

Division：

　　; scalar r0.x = r1.x/r2.x
　　RCP r0.x, r2.x
　　MUL r0.x, r1.x, r0.x

rsq | dest, src | Reciprocal square root of src (much more useful than a plain square root):

　　float v = ABSF(src.w);
　　if(v == 1.0f)
　　{
　　　　dest.x = dest.y = dest.z = dest.w = 1.0f;
　　}
　　else if(v == 0)
　　{
　　　　dest.x = dest.y = dest.z = dest.w = PLUS_INFINITY();
　　}
　　else
　　{
　　　　v = (float)(1.0f / sqrt(v));
　　　　dest.x = dest.y = dest.z = dest.w = v;
　　}

Square root:

　　; scalar r0.x = sqrt(r1.x)
　　RSQ r0.x, r1.x
　　MUL r0.x, r0.x, r1.x

sge | dest, src1, src2 | dest = (src1 >= src2) ? 1 : 0

Useful for mimicking conditional statements:

　　; compute r0 = (r1 >= r2) ? r3 : r4
　　; one if (r1 >= r2) holds, zero otherwise
　　SGE r0, r1, r2
　　ADD r1, r3, -r4
　　; r0 = r0*(r3-r4) + r4 = r0*r3 + (1-r0)*r4
　　; effectively, LERP between extremes of r3 and r4
　　MAD r0, r0, r1, r4

slt | dest, src1, src2 | dest = (src1 < src2) ? 1 : 0

Table 3

　　You can download this list as a Word file from http://www.shaderx.com/. Check out the SDK for additional information.
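　　To see how dp3, rsq, dst and rcp cooperate in the point light attenuation snippet from the table, the same steps can be replayed on the CPU. The code below is a minimal sketch that follows the per-component definitions given in Table 3; Float4 and the function names are assumptions made for this example, and a zero-length light vector is not handled.

```cpp
#include <array>
#include <cmath>

using Float4 = std::array<float, 4>;

// dp3: three-component dot product, replicated into all four components.
Float4 dp3(const Float4& a, const Float4& b) {
    float d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    return {d, d, d, d};
}

// dst: dest = (1, src1.y * src2.y, src1.z, src2.w)
Float4 dst(const Float4& s1, const Float4& s2) {
    return {1.0f, s1[1] * s2[1], s1[2], s2[3]};
}

// Attenuation of a point light: 1 / (a0 + a1*d + a2*d*d),
// with coeff = (a0, a1, a2, unused) and d = length of vertexToLight.
float attenuation(const Float4& vertexToLight, const Float4& coeff) {
    float dd = dp3(vertexToLight, vertexToLight)[3];     // d*d   (dp3 r7.w, V, V)
    float invD = 1.0f / std::sqrt(dd);                   // 1/d   (rsq)
    Float4 r7 = dst({dd, dd, dd, dd}, {invD, invD, invD, invD});  // (1, d, d*d, 1/d)
    float denom = dp3(r7, coeff)[3];                     // a0 + a1*d + a2*d*d
    return 1.0f / denom;                                 // rcp
}
```

　　The point of dst is visible in r7: one instruction produces the vector (1, d, d*d, 1/d), so a single dp3 with the attenuation constants yields the whole polynomial.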

　　The Vertex Shader ALU is a multi-threaded vector processor that operates on quad-float data. It consists of two functional units. The SIMD Vector Unit is responsible for the mov, mul, add, mad, dp3, dp4, dst, min, max, slt and sge instructions. The Special Function Unit is responsible for the rcp, rsq, logp, expp and lit instructions. Most of these instructions take one cycle to execute; rcp and rsq take more than one cycle under specific circumstances. They occupy only one slot in the vertex shader, but they actually take longer than one cycle to execute when the result is used immediately, because that leads to a register stall.

　　Application Hints
　　rsq is used, for example, for normalizing vectors to be used in lighting equations. The exponential instruction expp can be used for fog effects, procedural noise generation (see the NVIDIA Perlin Noise example included with the NVIDIA EffectsBrowser), the behavior of particles in a particle system (see the NVIDIA Particle System example included with the NVIDIA EffectsBrowser) or to implement a system for how objects in a game are damaged. You will use it whenever a fast-changing function is necessary. This is in contrast to logarithm functions with logp, which are useful if extremely slow growth is necessary (although they grow quite fast at the beginning). A log function is the inverse of an exponential function, which means it undoes the operation of the exponential function.

　　The lit instruction deals by default with directional lights. It calculates the diffuse and specular factors, with clamping, based on N * L, N * H and the specular power. There is no attenuation involved, but you can apply a separately computed attenuation level to the result of lit by using the dst instruction. This is useful for constructing attenuation factors for point and spot lights.
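　　The clamping behavior of lit is easier to follow in ordinary code. The function below is a sketch that transcribes the pseudocode of the lit entry in Table 3 into C++; Float4 and the function name are assumptions for this example, and the hardware's reduced precision is not modeled.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

using Float4 = std::array<float, 4>;

// src  = (N.L, N.H, ignored, specular power)
// dest = (1, diffuse factor, specular factor, 1)
Float4 lit(const Float4& src) {
    Float4 dest = {1.0f, std::max(src[0], 0.0f), 0.0f, 1.0f};
    if (src[0] > 0.0f && src[3] == 0.0f) {
        dest[2] = 1.0f;                        // power of zero: specular factor is 1
    } else if (src[0] > 0.0f && src[1] > 0.0f) {
        dest[2] = std::pow(src[1], src[3]);    // (N.H) ^ power
    }
    return dest;
}
```

　　A vertex facing away from the light (N * L below zero) gets both a diffuse and a specular factor of zero, so no extra comparison instructions are needed in the shader.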

　　The min and max instructions allow for clamping and absolute value computation.
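　　How clamping and absolute values fall out of min and max can be shown with a component-wise model of the two instructions. This is a sketch with hypothetical helper names (vsMax, vsMin, negate, absolute, clampTo); in shader assembly the negation is simply the - source modifier, as in max r0, r1, -r1.

```cpp
#include <array>
#include <cstddef>

using Float4 = std::array<float, 4>;

// max: dest = (src1 >= src2) ? src1 : src2, per component
Float4 vsMax(const Float4& a, const Float4& b) {
    Float4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = (a[i] >= b[i]) ? a[i] : b[i];
    return r;
}

// min: dest = (src1 < src2) ? src1 : src2, per component
Float4 vsMin(const Float4& a, const Float4& b) {
    Float4 r{};
    for (std::size_t i = 0; i < 4; ++i) r[i] = (a[i] < b[i]) ? a[i] : b[i];
    return r;
}

// Source negation modifier (-src)
Float4 negate(const Float4& a) {
    return {-a[0], -a[1], -a[2], -a[3]};
}

// Absolute value: max of a register and its negation
Float4 absolute(const Float4& r) {
    return vsMax(r, negate(r));
}

// Clamp to [lo, hi]: a max against the lower bound followed by a min against the upper bound
Float4 clampTo(const Float4& r, const Float4& lo, const Float4& hi) {
    return vsMin(vsMax(r, lo), hi);
}
```

　　Both helpers cost one or two instructions and need no branching, which fits the linear execution model of the vertex shader.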

　　Complex Instructions in the Vertex Shader
　　There are also complex instructions that are supported by the vertex shader. The term "macro" should not be used to refer to these instructions, because they are not simply substituted like a C preprocessor macro. You should think carefully before using these instructions. If you use them, you might lose control over your 128-instruction limit and possible optimization paths. On the other hand, the software emulation mode provided by Intel or by AMD for their processors is able to optimize an m4x4 complex instruction (and perhaps others now or in the future). It is also possible that in the future some graphics hardware may use gate count to optimize the m4x4. So, if you need, for example, four dp4 calls in your vertex shader assembly source, it might be a good idea to replace them with m4x4. If you have decided to use, for example, an m4x4 instruction in your shader, you should not use a dp4 call on the same data later, because the transformation results differ slightly. If, for example, both instructions are used for position calculation, z-fighting could result:

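Macro | Parameters | Action | Clocks
exp | dest, src1 | Provides the exponential 2^x with full precision to at least 1/2^20 |
frc | dest, src1 | Returns the fractional portion of each input component | 3
log | dest, src1 | Provides log2(x) with full precision to at least 1/2^20 | 12
m3x2 | dest, src1, src2 | Computes the product of the input vector and a 3x2 matrix | 2
m3x3 | dest, src1, src2 | Computes the product of the input vector and a 3x3 matrix | 3
m3x4 | dest, src1, src2 | Computes the product of the input vector and a 3x4 matrix | 4
m4x3 | dest, src1, src2 | Computes the product of the input vector and a 4x3 matrix | 3
m4x4 | dest, src1, src2 | Computes the product of the input vector and a 4x4 matrix | 4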
　　You are able to perform all transform and lighting operations with these instructions. If it seems to you that some instructions are missing, rest assured that you can achieve them through the existing instructions. For example, the division of two numbers can be realized with a reciprocal and a multiply. You can even implement the whole fixed-function pipeline by using these instructions in a vertex shader. This is shown in the NVLink example of NVIDIA.
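　　The idea of building "missing" operations from existing ones can be tried out with scalar stand-ins for a few instructions. The sketch below models rcp, rsq, sge and mad on single floats and composes division, square root and a branch-free select from them, following the snippets in Table 3; the function names are assumptions, and special cases such as a zero operand are left out.

```cpp
#include <cmath>

// Scalar stand-ins for the instructions (one component only).
float rcp(float x) { return 1.0f / x; }
float rsq(float x) { return 1.0f / std::sqrt(std::fabs(x)); }
float sge(float a, float b) { return (a >= b) ? 1.0f : 0.0f; }
float mad(float a, float b, float c) { return a * b + c; }

// Division: rcp followed by mul
float divide(float a, float b) { return a * rcp(b); }

// Square root: rsq followed by mul, since x * (1 / sqrt(x)) = sqrt(x)
float squareRoot(float x) { return x * rsq(x); }

// (r1 >= r2) ? r3 : r4 without a branch:
// mask is 1 or 0, and mask * (r3 - r4) + r4 picks r3 or r4.
float select(float r1, float r2, float r3, float r4) {
    float mask = sge(r1, r2);
    return mad(mask, r3 - r4, r4);
}
```

　　The select pattern is the usual replacement for the conditional branches that vs.1.1 does not offer.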

　　Putting it All Together
　　Now let's see how these registers and instructions are typically used in the vertex shader ALU.

　　In vs.1.1 there are 16 input registers, 96 constant registers, 12 temporary registers, 1 address register and up to 13 output registers per rasterizer. Each register can hold 4 x 32-bit values. Each 32-bit value is accessible via an x, y, z or w subscript. That is, a 128-bit value consists of an x, y, z and w value. To access these register components, you must add .x, .y, .z or .w at the end of the register name. Let's start with the input registers:

　　Using the Input Registers
　　The 16 input registers can be accessed by using their names v0 to v15. Typical values provided to the input vertex registers are:

Position (x,y,z,w)
Diffuse color (r,g,b,a) -> 0.0 to +1.0
Specular color (r,g,b,a) -> 0.0 to +1.0
Up to 8 texture coordinates (each as s, t, r, q or u, v, w, q), but usually 4 or 6, dependent on hardware support
Fog (f,*,*,*) -> value used in fog equation
Point size (p,*,*,*)

　　You can access the x-component of the position with v0.x, the y-component with v0.y and so on. If you need to know the green component of the RGBA diffuse color, you check v1.y. You may put the fog value, for example, into v7.x. The other three 32-bit components, v7.y, v7.z and v7.w, would not be used. The input registers are read-only. Each instruction may access only one vertex input register. Unspecified components of the input register default to 0.0 for the x, y and z components and to 1.0 for the w component. In the following example the four-component dot product between each of c0 - c3 and v0 is stored in oPos:

　　dp4 oPos.x , v0 , c0
　　dp4 oPos.y , v0 , c1
　　dp4 oPos.z , v0 , c2
　　dp4 oPos.w , v0 , c3

　　Such a code fragment is usually used to map from model space to clip space, with the help of the already concatenated world, view and projection matrices. The four-component dot product performs the following calculation:

　　oPos.x = (v0.x * c0.x) + (v0.y * c0.y) + (v0.z * c0.z) + (v0.w * c0.w)

　　Given that we use unit length (normalized) vectors, it is known that the dot product of two vectors will always lie in the range [-1, 1]. Therefore oPos will always get values in that range. Alternatively, you could use:

　　m4x4 oPos, v0 , c0

　　Don't forget to use those complex instructions consistently throughout your vertex shader because, as described above, there might be slight differences between dp4 and m4x4 results. You are restricted to using only one input register in each instruction.
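　　The relationship between the four dp4 instructions and m4x4 can be written down as a small CPU-side model. The following is a sketch, not the hardware's arithmetic: Float4, Matrix4 and the function names are assumptions, and the matrix is held as four rows in the same way that c0 - c3 are used above.

```cpp
#include <array>

using Float4 = std::array<float, 4>;
using Matrix4 = std::array<Float4, 4>;   // four rows, as held in c0..c3

// dp4: four-component dot product
float dp4(const Float4& a, const Float4& b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// m4x4 dest, v, c0 modeled as four dp4 instructions against consecutive rows
Float4 m4x4(const Float4& v, const Matrix4& rows) {
    return {dp4(v, rows[0]), dp4(v, rows[1]), dp4(v, rows[2]), dp4(v, rows[3])};
}
```

　　Because every output component is a dot product with one register, each register has to contain one row of the matrix that multiplies a column vector. This is presumably why the constants in section 1.8.3 are named matTranspose, matViewTranspose and so on.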

　　All data in an input register remains persistent throughout the vertex shader execution and even longer. That means the input registers retain their data longer than the lifetime of a vertex shader. So it is possible to re-use the data of the input registers in the next vertex shader.

　　Using the Constant Registers (RO1)
　　Typical uses for the constant registers include:

Matrix data: quad-floats are typically one row of a 4x4 matrix
Light characteristics (position, attenuation etc.)
Current time
Vertex interpolation data
Procedural data

　　There are 96 quad-floats (or, in the case of the RADEON 8500, 192 quad-floats) for storing constant data. This is enough room for a reasonably large set of matrices, which can be used, for example, for indexed vertex blending, more commonly known as "matrix palette skinning".

　　The constant registers are read-only from the perspective of the vertex shader, whereas the application can read and write the constant registers. The constant registers retain their data longer than the lifetime of a vertex shader, so it is possible to re-use this data in the next vertex shader. This allows an application to avoid making redundant SetVertexShaderConstant() calls. Reads from out-of-range constant registers return (0.0, 0.0, 0.0, 0.0).

　　You can use only one constant register per instruction, but you can use it several times. For example:

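　　; the following instruction is legal
　　mul r5, c11, c11　; the product of c11 and c11 is stored in r5

　　; but this is illegal, because two different constant registers are used
　　add v0, c4, c3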

　　A more complicated-looking, but legal, example is:

　　; dest = (src1 * src2) + src3
　　mad r0, r0, c20, c20　; multiplies r0 by c20 and adds c20 to the result
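　　The one-constant-register rule can be checked mechanically. The following is a minimal C++ sketch, not part of any Direct3D API: the function names are hypothetical, operands are assumed to be plain strings such as "c11" or "-c4.x", and indexed operands of the form c[a0.x + n] are deliberately not handled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: counts the distinct constant registers (c0, c11, ...)
// among the source operands of one instruction.
// Operands are plain strings such as "c11", "r0", "-c4.x" or "v0.xyzw".
int distinctConstantRegisters(const std::vector<std::string>& sources) {
    std::set<std::string> constants;
    for (std::string op : sources) {
        if (!op.empty() && op[0] == '-') op.erase(0, 1);  // drop source negation
        const std::size_t dot = op.find('.');
        if (dot != std::string::npos) op.erase(dot);      // drop the swizzle
        if (!op.empty() && op[0] == 'c') constants.insert(op);
    }
    return static_cast<int>(constants.size());
}

// An instruction passes this one rule when it names at most one
// distinct constant register, no matter how often it repeats it.
bool obeysConstantRule(const std::vector<std::string>& sources) {
    return distinctConstantRegisters(sources) <= 1;
}
```

　　With this sketch, the sources of mul r5, c11, c11 and mad r0, r0, c20, c20 pass the check, while the sources of add v0, c4, c3 fail it.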

　　Using the Address Register (RW; mov)
　　You access the address registers with a0 to an (more than one address register should be available in vertex shader versions higher than 1.1). The only use of a0 in vs.1.1 is as an indirect addressing operator to offset constant memory.

　　c[a0.x + n]　; supported only in version 1.1 and higher
　　; n is the base address and a0.x is the address offset

　　Here is an example using the address register:

　　...
　　// Set 1
　　mov a0.x,r1.x
　　m4x3 r4,v0,c[a0.x + 9];
　　m3x3 r5,v3,c[a0.x + 9];
　　...

　　Depending on the value that is stored in temporary register r1.x, different constant registers are used in the m4x3 and m3x3 instructions. Please note that register a0 stores only whole numbers and no fractions (integers only) and that a0.x is the only valid component of a0. Further, a vertex shader may write to a0.x only via the mov instruction.

　　Beware of a0.x if only a software emulation mode is available: performance can be significantly reduced [Pallister].
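　　To make the c[a0.x + n] addressing concrete, here is a small CPU-side C++ sketch. It is an illustration only: the type and function names are hypothetical, the register file is fixed at 96 entries, and dropping the fraction by rounding down when a0.x is written is an assumption. The zero result for out-of-range reads follows the statement in the constant register section.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Float4 { float x, y, z, w; };

// 96 constant registers, as on a vs.1.1 part without the RADEON 8500 extension.
using ConstantFile = std::array<Float4, 96>;

// mov a0.x, src: a0.x keeps whole numbers only.
// Assumption: the fraction is dropped by rounding down (floor).
int writeAddressRegister(float value) {
    return static_cast<int>(std::floor(value));
}

// c[a0.x + n]: n is the base address, a0.x the offset.
// Out-of-range reads return (0, 0, 0, 0).
Float4 readIndexedConstant(const ConstantFile& c, int a0x, int n) {
    const int index = a0x + n;
    if (index < 0 || index >= static_cast<int>(c.size())) {
        return Float4{0.0f, 0.0f, 0.0f, 0.0f};
    }
    return c[static_cast<std::size_t>(index)];
}
```

　　In a matrix palette skinning setup, a0.x would typically be derived from a per-vertex matrix index, so that each vertex selects its own block of constant registers.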

　　Using the Temporary Registers (R3W1)
　　You can access the 12 temporary registers using r0 to r11. Here are a few examples:

　　dp3 r2, r1, -c4　; a three-component dot product: dest.x = dest.y = dest.z =
　　　　　　　　　　; dest.w = (r1.x * -c4.x) + (r1.y * -c4.y) + (r1.z * -c4.z)
　　...
　　mov r0.x, v0.x
　　mov r0.y, c4.w
　　mov r0.z, v0.y
　　mov r0.w, c4.w

　　Each temporary register has single write and triple read access. Therefore an instruction could have the same temporary register as a source three times. Vertex shaders cannot read a value from a temporary register before writing to it. If you try to read a temporary register that was not filled with a value, the API will give you an error message while creating the vertex shader, that is, in CreateVertexShader().

　　Using the Output Registers (WO)
　　There are up to 13 write-only output registers that can be accessed using the following register names. They are defined as the inputs to the rasterizer, and the name of each register is preceded by a lowercase 'o'. The output registers are named to suggest their use by pixel shaders.

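Name | Value | Description
oDn | 2 quad-floats | Output color data passed to the pixel shader. oD0 holds the diffuse color and oD1 the specular color.
oPos | 1 quad-float | Output position in clip space. Must be written by the vertex shader.
oTn | up to 8 quad-floats (RADEON 8500: 6; GeForce3/4 Ti: 4) | Output texture coordinates. Required for the maximum number of textures simultaneously bound to the texture blending stage.
oPts.x | 1 scalar float | Output point size. Only the x component is functional.
oFog.x | 1 scalar float | The fog factor to be interpolated and then routed to the fog table. Only the x component is functional.

　　Here is a typical example that shows how to use the oPos, oD0 and oT0 registers: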

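　　dp4 oPos.x , v0 , c4　; projected x position
　　dp4 oPos.y , v0 , c5　; projected y position
　　dp4 oPos.z , v0 , c6　; projected z position
　　dp4 oPos.w , v0 , c7　; projected w position
　　mov oD0, v5　　　　　 ; set the diffuse color
　　mov oT0, v2　　　　　 ; output the texture coordinates from input register v2 to oT0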

　　Using the four dp4 instructions to map from model space to clip space with the already concatenated world, view and projection matrices was shown above. The first mov instruction moves the content of the v5 input register into the color output register, and the second mov instruction moves the values of the v2 register into the first output texture register.
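　　The four dp4 instructions amount to one vector-matrix product in which each constant register supplies one row. A minimal CPU-side C++ sketch of that arithmetic follows; Float4, dp4() and transformPosition() are hypothetical names chosen for illustration, and rows[0] to rows[3] stand in for c4 to c7.

```cpp
#include <cassert>

struct Float4 { float x, y, z, w; };

// dp4: four-component dot product.
float dp4(const Float4& a, const Float4& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
}

// Mirrors the four dp4 instructions: rows[0..3] play the role of c4..c7,
// each holding one row of the concatenated world-view-projection matrix.
Float4 transformPosition(const Float4& v0, const Float4 rows[4]) {
    return Float4{dp4(v0, rows[0]), dp4(v0, rows[1]),
                  dp4(v0, rows[2]), dp4(v0, rows[3])};
}
```

　　Each dp4 writes a single component of oPos, which is why four instructions are needed for a full position.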

　　Using the oFog.x output register is shown in the following example:

　　; scale by the fog parameters
　　; c5.x = fog start
　　; c5.y = fog end
　　; c5.z = 1/range
　　; c5.w = fog max
　　dp4 r2, v0, c2　; r2 = distance to camera
　　sge r3, c0, c0　; r3 = 1
　　add r2, r2, -c5.x　　　　　　; camera space depth (z) - fog start
　　mad r3.x, -r2.x, c5.z, r3.x　; 1.0 - (z - fog start) * 1/range
　　　　　　　　　　　　　　　　 ; because fog=1.0 means no fog, and
　　　　　　　　　　　　　　　　 ; fog=0.0 means full fog
　　max oFog.x, c5.w, r3.x　　　 ; clamp the fog with our custom max value

　　Having a fog distance value permits more general fog effects than using the position's z or w values. The fog distance value is interpolated before it is used as a distance in the standard fog equations later in the pipeline.
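　　The arithmetic of the fog listing can be followed on the CPU with a few lines of C++. This is only a sketch: FogParams and fogFactor() are hypothetical names, the camera-space depth is passed in directly instead of being computed with dp4, and the final [0..1] clamp applied to vertex shader outputs is left out.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical parameter block matching the layout of c5 in the listing.
struct FogParams {
    float start;     // c5.x = fog start
    float end;       // c5.y = fog end (not read by the listing itself)
    float invRange;  // c5.z = 1 / range
    float fogMax;    // c5.w = fog max, used as the lower bound of the factor
};

// Returns the value written to oFog.x: 1.0 means no fog, 0.0 means full fog.
float fogFactor(float cameraDepth, const FogParams& c5) {
    const float one = 1.0f;                            // sge r3, c0, c0
    const float offset = cameraDepth - c5.start;       // add r2, r2, -c5.x
    const float factor = -offset * c5.invRange + one;  // mad r3.x, -r2.x, c5.z, r3.x
    return std::max(c5.fogMax, factor);                // max oFog.x, c5.w, r3.x
}
```

　　With fog start 10, fog end 18 and fog max 0.25, a vertex at depth 14 gets a factor of 0.5, and a vertex at depth 18 is held at 0.25 instead of dropping to full fog.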

　　Every vertex shader must write to at least one component of oPos, or you will get an error message from the assembler.

　　When using vertex shaders, the D3DTSS_TCI_* flags of the D3DTSS_TEXCOORDINDEX texture stage state are ignored. All texture coordinates are mapped in numerical order.

　　Optimization tip: emit to oPos as early as possible to trigger parallelism in the pixel shader. Once the shader is written, try to reorder its instructions so that oPos is written as early as possible.

　　All iterated values transferred out of the vertex shader are clamped to [0..1]. That means negative values are cut off to 0, values above 1 are cut off to 1, and values in between remain unchanged. If you need signed values in the pixel shader, you must bias them in the vertex shader by multiplying them by 0.5 and adding 0.5, and then re-expand them in the pixel shader by using _bx2.
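　　The bias and re-expansion round trip is easy to verify with scalar arithmetic. The C++ sketch below uses hypothetical function names and models _bx2 as 2 * (x - 0.5); it illustrates the arithmetic only and is not shader code.

```cpp
#include <algorithm>
#include <cassert>

// Vertex shader side: map a signed value from [-1, 1] into [0, 1].
float biasForOutput(float signedValue) { return signedValue * 0.5f + 0.5f; }

// Clamp applied to iterated values on their way out of the vertex shader.
float clampIterated(float v) { return std::min(1.0f, std::max(0.0f, v)); }

// Pixel shader side: the _bx2 modifier computes 2 * (x - 0.5).
float expandBx2(float v) { return 2.0f * (v - 0.5f); }
```

　　Without the bias, a value of -0.5 would leave the vertex shader as 0. With it, -0.5 travels as 0.25 and is restored to -0.5 by _bx2.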

　　Swizzling and Masking
　　If you use the input, constant and temporary registers as source registers, you can swizzle the .x, .y, .z and .w values independently of each other. If you use the output and temporary registers as destination registers, you can use the .x, .y, .z and .w values as write masks.

　　Swizzling (only source registers: vn, cn, rn)
　　Swizzling is very useful for efficiency where the components of a source register need to be rotated, as in cross products. Another use is converting constants such as (0.5, 0.0, 1.0, 0.6) into other forms such as (0.0, 0.0, 1.0, 0.0) or (0.6, 1.0, -0.5, 0.6). All registers that are used as source registers in an instruction can be swizzled. For example:

　　mov R1, R2.wxyz;

Figure 15: Swizzling
　　The destination register is R1, where R could be any write-enabled register such as an output register (o*) or any of the temporary registers (r). The source register is R2, where R could be an input (v), constant (c) or temporary (r) register (source registers are located on the right side of the destination register in the instruction syntax).

　　The following instruction copies the negation of R2.x into R1.x, the negation of R2.y into R1.y and R1.z, and the negation of R2.z into R1.w. As shown, all source registers can be negated and swizzled at the same time:

　　mov R1, -R2.xyyz

Figure 16: Swizzling #2
　　Masking (only destination registers: on, rn)
　　A destination register can mask which components are written to it. If you use R1 as the destination register (actually any write-enabled register: o*, r) without a mask, all the components are written from R2 to R1. If you choose, for example:

　　mov R1.x, R2

　　only the x component is written to R1, whereas

　　mov R1.xw, R2

　　writes only the x and w components of R2 to R1. No swizzling or negation is supported on the destination registers.
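　　Source swizzling, source negation and destination masking can be modelled on the CPU in a few lines. The following C++ sketch is illustrative only; Float4, swizzle() and movMasked() are hypothetical names, and a swizzle is passed as a four-letter string.

```cpp
#include <cassert>

struct Float4 { float v[4]; };  // v[0]=x, v[1]=y, v[2]=z, v[3]=w

// Source swizzle: pattern is four letters from "xyzw", e.g. "wxyz" or "xyyz".
// Pass negate=true to model a leading minus sign on the source.
Float4 swizzle(const Float4& src, const char pattern[5], bool negate = false) {
    Float4 out{};
    for (int i = 0; i < 4; ++i) {
        int component = 0;
        switch (pattern[i]) {
            case 'x': component = 0; break;
            case 'y': component = 1; break;
            case 'z': component = 2; break;
            default:  component = 3; break;  // 'w'
        }
        out.v[i] = negate ? -src.v[component] : src.v[component];
    }
    return out;
}

// mov with a destination write mask: only the components whose flag is set
// are overwritten; the others keep their previous value.
void movMasked(Float4& dest, const Float4& src, bool x, bool y, bool z, bool w) {
    const bool mask[4] = {x, y, z, w};
    for (int i = 0; i < 4; ++i) {
        if (mask[i]) dest.v[i] = src.v[i];
    }
}
```

　　For R2 = (1, 2, 3, 4), swizzle(R2, "wxyz") gives (4, 1, 2, 3), matching mov R1, R2.wxyz, and swizzle(R2, "xyyz", true) gives (-1, -2, -2, -3), matching mov R1, -R2.xyyz.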

　　Here is the source for a 3-vector cross-product:

　　; r0 = r1 x r2 (3-vector cross-product)
　　mul r0, r1.yzxw, r2.zxyw
　　mad r0, -r2.yzxw, r1.zxyw, r0

　　This is explained in detail in [LeGrand]. The following table summarizes swizzling and masking:

Component Modifier | Description
R.[x][y][z][w] | Destination mask
R.xwzy (for example) | Source swizzle
-R | Source negation

　　Since any source can be negated, there is no need for a subtract instruction.
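　　The two-instruction cross-product shown earlier in this section relies on both swizzling and source negation, the latter standing in for the missing subtract instruction. As a sanity check, the same two steps can be replayed on the CPU. The C++ below is a sketch with hypothetical helper names that mirror the swizzles and instructions of the listing.

```cpp
#include <cassert>

struct Float4 { float x, y, z, w; };

Float4 yzxw(const Float4& r) { return {r.y, r.z, r.x, r.w}; }  // source swizzle .yzxw
Float4 zxyw(const Float4& r) { return {r.z, r.x, r.y, r.w}; }  // source swizzle .zxyw
Float4 neg(const Float4& r)  { return {-r.x, -r.y, -r.z, -r.w}; }

// mul: component-wise product.
Float4 mul(const Float4& a, const Float4& b) {
    return {a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w};
}

// mad: dest = (src1 * src2) + src3, component-wise.
Float4 mad(const Float4& a, const Float4& b, const Float4& c) {
    return {a.x * b.x + c.x, a.y * b.y + c.y, a.z * b.z + c.z, a.w * b.w + c.w};
}

// r0 = r1 x r2, following the two shader instructions step by step.
Float4 cross(const Float4& r1, const Float4& r2) {
    Float4 r0 = mul(yzxw(r1), zxyw(r2));    // mul r0, r1.yzxw, r2.zxyw
    r0 = mad(neg(yzxw(r2)), zxyw(r1), r0);  // mad r0, -r2.yzxw, r1.zxyw, r0
    return r0;
}
```

　　Note that the w component of the result is r1.w * r2.w - r2.w * r1.w, so it carries no useful information.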

　　Guidelines for Writing Vertex Shaders
　　The most important restrictions you should remember when writing vertex shaders are the following:

They must write to at least one component of the output register oPos
There is a 128-instruction limit
Every instruction may source no more than one constant register, e.g. add r0, c4, c3 will fail
Every instruction may source no more than one input register, e.g. add r0, v1, v2 will fail
There are no C-like conditional statements, but you can mimic an instruction of the form r0 = (r1 >= r2) ? r3 : r4 with the sge instruction
All iterated values transferred out of the vertex shader are clamped to [0..1]
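　　The sge-based replacement for a conditional can be sketched on the CPU as follows. The instruction sequence in the comment is one possible formulation and an assumption of this example, not a listing from the original article, and the C++ function names are hypothetical. The sketch works on a single component.

```cpp
#include <cassert>

// sge: 1.0 when a >= b, otherwise 0.0 (shown here for a single component).
float sge(float a, float b) { return a >= b ? 1.0f : 0.0f; }

// r0 = (r1 >= r2) ? r3 : r4, written without a branch.
// One possible instruction sequence (an assumption of this sketch):
//   sge t,  r1, r2
//   add d,  r3, -r4
//   mad r0, t,  d,  r4
float selectGreaterEqual(float r1, float r2, float r3, float r4) {
    const float t = sge(r1, r2);  // 1.0 or 0.0
    const float d = r3 + (-r4);   // difference between the two candidates
    return t * d + r4;            // r4 when t is 0, r4 + (r3 - r4) when t is 1
}
```

　　Because the result is blended arithmetically, the selected value can differ from r3 by floating-point rounding when r3 and r4 have very different magnitudes.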

　　There are several ways to optimize vertex shaders. Here are a few rules of thumb:

Read the paper by Kim Pallister on optimizing software vertex shaders [Pallister]. It also helps with optimizing vertex shaders running in hardware
When setting vertex shader constant data, try to set all data in one SetVertexShaderConstant() call
Pause and think before using a mov instruction; you may be able to avoid it
Choose instructions that perform multiple operations over instructions that perform single operations. For example, the two instructions

mad r4,r3,c9,r4
mov oD0,r4

can be replaced by the single instruction

mad oD0,r3,c9,r4

Collapse vertex shaders (remove complex instructions such as m4x4 or m3x3) before thinking about optimizations
A rule of thumb for load-balancing between the CPU and the GPU: many calculations in shaders can be pulled outside and reformulated per object instead of per vertex and put into constant registers. If you are doing a calculation that is per object rather than per vertex, do it on the CPU and upload the result to the vertex shader as a constant, rather than doing it on the GPU
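　　A common instance of the per-object rule is concatenating the world, view and projection matrices once on the CPU, so that the shader needs only four constant registers for the transform. The C++ sketch below shows the arithmetic only. Matrix4 and the function names are hypothetical, the row-vector multiplication order is an assumption, and whether the result has to be transposed before upload depends on how the shader reads the rows.

```cpp
#include <cassert>

struct Matrix4 { float m[4][4]; };

// Plain 4x4 matrix product, computed once per object on the CPU.
Matrix4 multiply(const Matrix4& a, const Matrix4& b) {
    Matrix4 r{};  // zero-initialized accumulator
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Hypothetical helper: concatenate world, view and projection so the shader
// needs one matrix in four constant registers instead of three matrices.
Matrix4 concatenate(const Matrix4& world, const Matrix4& view, const Matrix4& proj) {
    return multiply(multiply(world, view), proj);
}
```

　　The sixteen resulting floats can then be written to four consecutive constant registers in a single call.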

　　One of the most interesting ways to reduce your application's bandwidth usage is to use compressed vertex data [Calver].

　　Now that you have an abstract overview of how to write vertex shaders, I would like to mention at least three different ways to compile one.

　　Compiling a Vertex Shader
　　Direct3D uses byte-codes, whereas OpenGL implementations parse a string. Therefore the Direct3D developer needs to assemble the vertex shader source with an assembler. This might help you find bugs earlier in your development cycle, and it also reduces load time.

　　I see three different ways to compile a vertex shader:

Write the vertex shader source into a separate ASCII file, for example test.vsh, and compile it with a vertex shader assembler into a binary file, for example test.vso. This file will be opened and read at game start-up. This way, not every person will be able to read and modify your vertex shader source. Don't forget that NVLink can link together already compiled shader fragments at run time.
Write the vertex shader source into a separate ASCII file or as a char string in your *.cpp file and compile it "on the fly" while the app starts up with the D3DXAssembleShader*() functions.
Write the vertex shader source in an effect file and open this effect file when the app starts up. The vertex shader can be compiled by reading the effect file with D3DXCreateEffectFromFile(). It is also possible to pre-compile an effect file. This way, most of the handling of vertex shaders is simplified and taken over by the effect file functions.

Another way is to use the opcodes shown in d3dtypes.h and build your own vertex assembler/disassembler.

　　Let's review what we have examined so far. After we

checked the vertex shader support with the D3DCAPS8::VertexShaderVersion field,
declared a vertex shader with the D3DVSD_* macros,
set the constant registers with SetVertexShaderConstant(), and
wrote and compiled the vertex shader,

　　we now need to get a handle to call it.

　　1.8.5 Creating a Vertex Shader
　　The CreateVertexShader() function is used to create and validate a vertex shader:

　　HRESULT CreateVertexShader(
　　　　CONST DWORD* pDeclaration,
　　　　CONST DWORD* pFunction,
　　　　DWORD* pHandle,
　　　　DWORD Usage);

　　This function takes a pointer to the vertex shader declaration (which maps vertex buffer streams to different vertex input registers) in pDeclaration and returns the shader handle in pHandle. The second parameter, pFunction, takes the vertex shader instructions assembled by D3DXAssembleShader() / D3DXAssembleShaderFromFile() or the binary code pre-compiled by a vertex shader assembler. With the fourth parameter you can force software vertex processing with D3DUSAGE_SOFTWAREPROCESSING. It must be used when D3DRS_SOFTWAREVERTEXPROCESSING is set to TRUE. When the software processing path is set explicitly, vertex shaders are simulated on the CPU using the software vertex shader implementation of the CPU vendors. If a vertex shader-capable GPU is available, using hardware vertex processing should be faster. You must use this flag or the reference rasterizer for debugging with the NVIDIA Shader Debugger.

　　1.8.6 Setting a Vertex Shader
　　You set a vertex shader for a specific object by calling SetVertexShader() before the DrawPrimitive*() call for this object. This function dynamically loads the vertex shader between the primitive calls.

　　// set the vertex shader
　　m_pd3dDevice->SetVertexShader( m_dwVertexShader );

　　The only parameter you must provide is the handle of the vertex shader created by CreateVertexShader(). The overhead of this call is lower than that of a SetTexture() call, so you are able to use it often.

　　The vertex shader set with SetVertexShader() is executed as many times as there are vertices. For example, if you try to visualize a rotating quad with four vertices implemented as an indexed triangle list, you will see in the NVIDIA Shader Debugger that the vertex shader runs four times before the DrawPrimitive*() function is called.

　　1.8.7 Free Vertex Shader Resources
　　When the game shuts down or when the device is changed, the resources taken by the vertex shader must be released. This is done by calling DeleteVertexShader() with the vertex shader handle:

　　// delete the vertex shader
　　// (m_dwVertexShader is the application's handle; 0xffffffff marks "no shader")
　　if (m_dwVertexShader != 0xffffffff)
　　{
　　　　m_pd3dDevice->DeleteVertexShader( m_dwVertexShader );
　　　　m_dwVertexShader = 0xffffffff;
　　}

　　Summary
　　We have now stepped through the vertex shader creation process at a high level. Let's summarize what was shown so far:

To use vertex shaders, you must check the vertex shader support of the hardware vertex shader implementation installed on your end user's computer with the D3DCAPS8::VertexShaderVersion field.
You must declare which input vertex properties or incoming vertex data have to be mapped to which input register. This mapping is done with the D3DVSD_* macros. You are able to fill the constant registers of the vertex shader with values by using the provided macros or by using the SetVertexShaderConstant() function.
After you have prepared everything this way and you have written a vertex shader, you are able to compile it, retrieve a handle to it by calling CreateVertexShader() and prepare it for execution by using SetVertexShader().
To release the resources that are allocated by the vertex shader, you should call DeleteVertexShader() at the end of your game.

　　What Happens Next?
　　In the next chapter, "Programming Vertex Shaders", we will start writing our first vertex shader. We will discuss basic lighting algorithms and how to implement them.

　　References
　　[Bendel] Steffen Bendel, "Smooth Lighting with ps.1.4", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Calver] Dean Calver, "Vertex Decompression in a Shader", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Gosselin] David Gosselin, "Character Animation with Direct3D Vertex Shaders", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Hurley] Kenneth Hurley, "Photo Realistic Faces with Vertex and Pixel Shaders", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Isidoro/Gosselin] John Isidoro, David Gosselin, "Bubble Shader", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[LeGrand] Scott Le Grand, "Some Overlooked Tricks for Vertex Shaders", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Pallister] Kim Pallister, "Optimizing Software Vertex Shaders", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Riddle/Zecha] Steven Riddle, Oliver C. Zecha, "Perlin Noise and Returning Results from Shader Programs", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Schwab] John Schwab, "Basic Shader Development with Shader Studio", ShaderX, Wordware Inc., pp ?? - ??, 2002, ISBN 1-55622-041-3
　　[Vlachos01] Alex Vlachos, Jörg Peters, Chas Boyd and Jason L. Mitchell, "Curved PN Triangles", ACM Symposium on Interactive 3D Graphics, 2001
　　(http://www.ati.com/na/pages/resource_centre/dev_rel/CurvedPNTriangles.pdf).

　　Additional Resources
　　A lot of information on vertex shaders can be found at the web-sites of NVIDIA (developer.nvidia.com) and ATI (http://www.ati.com/). I would like to name a few:

Author | Article | Published at
Richard Huddy | Introduction to DX8 Vertex Shaders | NVIDIA web site
Erik Lindholm, Mark J Kilgard, Henry Moreton | SIGGRAPH 2001 -- A User Programmable Vertex Engine | NVIDIA web site
Evan Hart, Dave Gosselin, John Isidoro | Vertex Shading with Direct3D and OpenGL | ATI web site
Jason L. Mitchell | Advanced Vertex and Pixel Shader Techniques | ATI web site
Philip Taylor | Series of articles on Shader Programming | http://msdn.microsoft.com/directx
Keshav B. Channa | Geometry Skinning / Blending and Vertex Lighting | http://www.flipcode.com/tutorials/tut_dx8shaders.shtml
Konstantin Martynenko | Introduction to Shaders | http://www.reactorcritical.com/review-shadersintro/review-shadersintro.shtml

　　Acknowledgements
　　I'd like to recognize a couple of individuals who were involved in proofreading and improving this paper (in alphabetical order):

David Callele (University of Saskatchewan)
Jason L. Mitchell (ATI)
Jeffrey Kiel (NVIDIA)