3DNow! Instruction Set
3DNow! — An Overview
3DNow! was AMD's logical extension to Intel's MMX, introduced in 1998. While MMX provided only parallel integer operations, 3DNow! addressed the need for parallel floating-point operations. It works alongside the existing MMX instructions and lets programs mix integer code (MMX) and floating-point code (3DNow!) without switching context, which floating-point work required when only MMX was available. It is easiest to think of 3DNow! as an extension of MMX, since both operate on the same registers and can be used side by side.
3DNow! — The Registers
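One way to picture the register layout covered in this section is a C union over a single 64-bit value. This is only a mental model, not anything AMD defines: the type name mm_reg and the helper mm_from_floats are hypothetical, and the sketch assumes float is a 32-bit IEEE-754 single.

```c
#include <stdint.h>

/* Hypothetical model of one 64-bit MM0-MM7 register; not an AMD-defined type.
   Assumes float is a 32-bit IEEE-754 single. */
typedef union {
    uint64_t q;     /* MMX view: 1 x 64-bit quantity   */
    uint32_t d[2];  /* MMX view: 2 x 32-bit quantities */
    uint16_t w[4];  /* MMX view: 4 x 16-bit quantities */
    uint8_t  b[8];  /* MMX view: 8 x 8-bit quantities  */
    float    f[2];  /* 3DNow! view: 2 x 32-bit floats  */
} mm_reg;

/* Build a register image from two floats.
   On little-endian x86, index 0 is the low half of the register. */
mm_reg mm_from_floats(float lo, float hi)
{
    mm_reg r;
    r.f[0] = lo;
    r.f[1] = hi;
    return r;
}
```

Reading r.d[0] after writing r.f[0] shows the raw bit pattern of the float, which is how the same 64 bits can be treated as integer or floating-point data depending on the instruction applied.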
Like MMX, 3DNow! maps onto the FPU registers, giving the programmer eight 64-bit registers, addressed as MM0 - MM7 just as in MMX. The 3DNow! register format is less flexible than MMX's: each register holds two 32-bit floating-point values. For integer work, the formats are identical to the MMX formats (one 64-bit quantity, two 32-bit quantities, four 16-bit quantities, or eight 8-bit quantities).
3DNow! — State Management
3DNow! uses the same register space as MMX, so it can be cleared with the MMX emms instruction. emms should be used in the same places as with MMX: when transitioning from MMX/3DNow! code to regular floating-point code.
In addition, 3DNow! adds one more state management instruction, femms (Faster emms), which operates very much like emms. Unlike emms, femms leaves the contents of the MMX/3DNow! registers undefined, which allows it to execute faster.
3DNow! — Cache Management
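The prefetching idea in this section can be tried from C without writing assembly. The sketch below uses the GCC/Clang builtin __builtin_prefetch(addr, rw, locality), where rw = 0 is a read hint and rw = 1 is a write hint. Treat the mapping to prefetch and prefetchw as an analogy: whether the compiler emits either instruction depends on the target and flags, and the hint may compile to nothing. The function names and the PREFETCH_AHEAD distance are illustrative assumptions, not tuned values.

```c
#include <stddef.h>

/* Illustrative look-ahead distance in elements; not a tuned value. */
#define PREFETCH_AHEAD 16

/* Sum an array, hinting that data further ahead will be read soon. */
float sum_with_prefetch(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3); /* read hint */
        total += data[i];
    }
    return total;
}

/* Scale an array in place, hinting that data further ahead will be written. */
void scale_with_prefetch(float *data, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 1, 3); /* write hint */
        data[i] *= k;
    }
}
```

The results are identical with or without the hints; only the memory access timing can change, which is why prefetching is safe to experiment with.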
Beginning with 3DNow!, SIMD instruction sets have added a few instructions for managing the processor's data cache. These let the programmer fetch data into the cache while other data is being operated on, effectively hiding RAM latency and preventing the CPU from stalling on cache misses.
3DNow! adds two new instructions, prefetch and prefetchw, for these purposes.
prefetch and prefetchw are almost identical. The only difference is that prefetchw prepares the cache line to be written to. This is useful if the programmer knows that the values located there are about to change. In contrast, prefetch just loads the data into the cache without expecting to write back to it. Early AMD processors such as the K6-2 and K6-III treated prefetchw exactly the same as prefetch. On the AMD Athlon, however, prefetchw caused the processor to mark the cache line as modified.
prefetch and prefetchw take one parameter, the address at which to start loading data. A full cache line is loaded, which is at least 32 bytes.
3DNow! — Integer Instructions
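As a rough guide to the two integer instructions in this section, here is a scalar C model of what each one computes per element. The function names are hypothetical, the arrays stand in for MMX registers, and the first argument plays the role of the destination register, which is overwritten. The model assumes right-shifting a negative int32_t is an arithmetic shift, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>

/* pavgusb model: rounded-up average of 8 unsigned byte pairs. */
void pavgusb_model(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(((unsigned)dst[i] + src[i] + 1u) >> 1);
}

/* pmulhrw model: multiply 4 signed 16-bit pairs, add 0x8000 to round,
   and keep the high 16 bits of each 32-bit product. */
void pmulhrw_model(int16_t dst[4], const int16_t src[4])
{
    for (int i = 0; i < 4; i++) {
        int32_t product = (int32_t)dst[i] * src[i];
        dst[i] = (int16_t)((product + 0x8000) >> 16);
    }
}
```

For example, 3 * 16384 = 49152, whose high 16 bits are 0 when truncated, but the rounding step turns the result into 1.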
While 3DNow! is mainly for floating-point use, it also adds a few integer instructions to MMX.
pavgusb gives the rounded-up average of 8 pairs of unsigned 8-bit quantities. It takes 2 parameters. One must be an MMX register, and the other can be an MMX register or a memory location.
pmulhrw multiplies 4 pairs of 16-bit quantities and returns the high 16 bits of each product, rounded. This is similar to the MMX instruction pmulhw, except that this one rounds. It takes 2 parameters, one of which is an MMX register. The other can be another MMX register or a memory location.
3DNow! — Conversion Instructions
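The two conversion instructions in this section can be modelled per element in C as follows. This is a hedged sketch with hypothetical function names: it assumes pf2id truncates toward zero and saturates out-of-range values, it does not model NaN inputs, and the rounding of large integers in pi2fd_model follows the C implementation rather than the hardware.

```c
#include <stdint.h>

/* pi2fd model: convert 2 signed 32-bit integers to floats. */
void pi2fd_model(float dst[2], const int32_t src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = (float)src[i];
}

/* pf2id model: convert 2 floats to signed 32-bit integers.
   Assumption: truncation toward zero with saturation at the int32 limits.
   NaN inputs are not modelled. */
void pf2id_model(int32_t dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++) {
        float v = src[i];
        if (v >= 2147483648.0f)
            dst[i] = INT32_MAX;
        else if (v <= -2147483648.0f)
            dst[i] = INT32_MIN;
        else
            dst[i] = (int32_t)v; /* C casts truncate toward zero */
    }
}
```

Under this model 2.9 converts to 2 and -2.9 converts to -2, since the fractional part is simply dropped.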
3DNow! provides 2 instructions to convert between integer and floating-point types. They are pi2fd and pf2id, which convert integers to floating point and floating point to integers, respectively.
pi2fd takes 2 parameters. The first is the destination MMX register, which receives the floating-point values, and the second is an MMX register or a memory location holding the integers to convert.
pf2id also takes 2 parameters, for the same purposes, but converts the other way.
3DNow! — Floating Point Instructions
Floating-point operations are the real power behind the 3DNow! instruction set. There are instructions for many kinds of operations, including max and min functions, reciprocals, square roots (and reciprocal square roots), as well as ordinary add, subtract and multiply functions.
Max and Min
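A scalar C model of the two instructions in this section may help. The function names are hypothetical, each two-element array stands in for one 64-bit register holding two floats, and the first argument is the destination that gets overwritten. NaN handling is not modelled.

```c
/* pfmax model: keep the larger value of each of the 2 float pairs. */
void pfmax_model(float dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = (dst[i] > src[i]) ? dst[i] : src[i];
}

/* pfmin model: keep the smaller value of each of the 2 float pairs. */
void pfmin_model(float dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = (dst[i] < src[i]) ? dst[i] : src[i];
}
```

With dst = {1, 5} and src = {3, 2}, pfmax_model leaves {3, 5} in dst, while pfmin_model would leave {1, 2}.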
pfmax computes the maximum of each of the 2 pairs of floating-point values held in its operands. Its first parameter is an MMX register, and its second is another register or a memory location. Once it completes, the first register holds the larger value of each pair.
pfmin operates in the same way as pfmax, only it stores the minimum instead of the maximum.
Comparison
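The comparison instructions in this section produce bit masks rather than flags, which the following scalar C model tries to make concrete. The function names and the LANE_TRUE and LANE_FALSE macros are hypothetical. For clarity the model writes the masks to a separate array, whereas the real instructions overwrite the first operand.

```c
#include <stdint.h>

#define LANE_TRUE  0xFFFFFFFFu  /* all ones: comparison held for this half  */
#define LANE_FALSE 0x00000000u  /* all zeros: comparison failed             */

/* pfcmpeq model: per-half test for a == b. */
void pfcmpeq_model(uint32_t mask[2], const float a[2], const float b[2])
{
    for (int i = 0; i < 2; i++)
        mask[i] = (a[i] == b[i]) ? LANE_TRUE : LANE_FALSE;
}

/* pfcmpge model: per-half test for a >= b. */
void pfcmpge_model(uint32_t mask[2], const float a[2], const float b[2])
{
    for (int i = 0; i < 2; i++)
        mask[i] = (a[i] >= b[i]) ? LANE_TRUE : LANE_FALSE;
}

/* pfcmpgt model: per-half test for a > b. */
void pfcmpgt_model(uint32_t mask[2], const float a[2], const float b[2])
{
    for (int i = 0; i < 2; i++)
        mask[i] = (a[i] > b[i]) ? LANE_TRUE : LANE_FALSE;
}
```

Because each half gets its own mask, the result can be combined with MMX logical instructions such as pand and pandn to select values without branching.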
3DNow! gives us a few instructions for comparing MMX registers.
pfcmpeq checks for equality between a register and another register, or between a register and memory. It compares both 32-bit values at the same time. For each 32-bit half, the instruction sets that half of the first register to all ones if the comparison is true and to all zeros if it is false.
pfcmpge operates the same way pfcmpeq does, only it checks for greater than or equal to, not just equal to.
pfcmpgt operates the same way pfcmpeq does, only it checks for strictly greater than.
Basic Arithmetic
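The operand order of the arithmetic instructions in this section is easy to mix up, so here is a scalar C model of all four. The function names are hypothetical, each array stands in for a register holding two floats (index 0 is the bottom half, index 1 the top half), and the first argument is the destination register.

```c
/* pfadd model: dst = dst + src, per half. */
void pfadd_model(float dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = dst[i] + src[i];
}

/* pfsub model: dst = dst - src, per half. */
void pfsub_model(float dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = dst[i] - src[i];
}

/* pfsubr model: dst = src - dst, per half (reversed operands). */
void pfsubr_model(float dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = src[i] - dst[i];
}

/* pfacc model: bottom = sum of dst's halves, top = sum of src's halves.
   Both sums are computed before storing so dst and src may alias. */
void pfacc_model(float dst[2], const float src[2])
{
    float bottom = dst[0] + dst[1];
    float top = src[0] + src[1];
    dst[0] = bottom;
    dst[1] = top;
}
```

With dst = {1, 2} and src = {10, 20}, pfsub_model gives {-9, -18}, pfsubr_model gives {9, 18}, and pfacc_model gives {3, 30}.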
pfadd adds an MMX register to another MMX register or to a memory location, and stores the sums in the first register.
pfacc performs an accumulation operation. It adds the top and bottom values of the first register and stores the sum in the bottom of that register, then stores the sum of the top and bottom values of the second register or memory location in the top of the first register.
pfsub subtracts a second MMX register or a memory location from the first register (r1=r1-r2). It works just like pfadd, but subtracts instead of adding.
pfsubr performs a reverse subtract. Instead of subtracting the second parameter from the first (r1=r1-r2), this one subtracts the first from the second (r1=r2-r1).
Advanced Arithmetic
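For the instructions in this section, the data movement matters as much as the arithmetic, and a scalar C model shows it. The function names are hypothetical. One deliberate simplification: pfrcp_model uses an exact division, whereas the real pfrcp returns only a low-precision estimate. It also assumes the reciprocal is taken of the bottom half of the source. pfrsqrt is left out to keep the sketch free of math library calls, but it would follow the same pattern.

```c
/* pfmul model: dst = dst * src, per half. */
void pfmul_model(float dst[2], const float src[2])
{
    for (int i = 0; i < 2; i++)
        dst[i] = dst[i] * src[i];
}

/* pfrcp model: reciprocal of the bottom half of src, copied into both
   halves of dst. Uses exact division; the hardware result is only an
   approximation. */
void pfrcp_model(float dst[2], const float src[2])
{
    float r = 1.0f / src[0];
    dst[0] = r;
    dst[1] = r;
}
```

Calling pfrcp_model with src = {4, 100} leaves {0.25, 0.25} in dst: the top half of the source is ignored and the single result is duplicated.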
pfmul multiplies two registers, or a register and a memory location, and stores the results in the first register.
pfrcp stores the reciprocal of the low value of a register or memory location into a register. This instruction is only accurate to 14 bits and takes 2 clock cycles to complete. Higher precision can be obtained by using a few more instructions (listed later). This instruction duplicates the result into the top and bottom halves of the destination register.
pfrsqrt computes the reciprocal square root of a memory location or register and stores it in the top and bottom halves of a destination register, similar to pfrcp. This instruction is only accurate to about 15 bits, and full precision can be obtained by using a few more instructions (listed below).
High Precision Reciprocals and Square Roots
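The refinement idea behind this section is the textbook Newton-Raphson iteration, sketched below in plain C. This shows the mathematics only: how the hardware divides the work between its iteration instructions is not modelled, and the function names are hypothetical. For a reciprocal 1/b with estimate x, one step is x' = x(2 - bx). For a reciprocal square root, one step is x' = x(3 - bx^2)/2.

```c
/* One Newton-Raphson step toward 1/b, starting from estimate x. */
float recip_newton_step(float b, float x)
{
    return x * (2.0f - b * x);
}

/* One Newton-Raphson step toward 1/sqrt(b), starting from estimate x. */
float rsqrt_newton_step(float b, float x)
{
    return 0.5f * x * (3.0f - b * x * x);
}
```

Each step roughly doubles the number of correct bits, so a low-precision starting estimate needs only one or two steps to approach single-precision accuracy. Starting from 0.33 as an estimate of 1/3, a single step already gives about 0.3333.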
For some applications, a quick approximation of a reciprocal or square root may be satisfactory. If more precision is needed, 3DNow! provides a few more instructions that extend pfrcp and pfrsqrt above to higher precision. These improve accuracy by using a Newton-Raphson algorithm.
pfrcpit1 is the first iteration of Newton-Raphson for reciprocals. It takes two input operands: the number whose reciprocal is wanted, and the result of passing that number through pfrcp.
pfrcpit2 is essentially the same as pfrcpit1, except that it is the second iteration. Its inputs are the output of pfrcp or pfrsqrt and the output of pfrcpit1 or pfrsqit1.
pfrsqit1 is the first iteration of Newton-Raphson after using pfrsqrt. It parallels pfrcpit1 for reciprocals.
Trademark Information
3DNow! is a registered trademark of Advanced Micro Devices, Inc.
MMX is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.