How to Write 64-Bit Programs

Source: Internet | Published under: linux gdb | Editor: 程序博客网 | Time: 2024/04/25 08:47

Everything You Need To Know To Start Programming 64-Bit Windows Systems

Background on 64-bit versions of Windows
Just enough x64 architecture to get by
Developing for x64 with Visual C++ 2005
Debugging techniques for your x64 builds

This article uses the following technologies:
Windows, Win64, Visual Studio 2005

Contents

The x64 Operating System
Just Enough x64 to Get By
Developing for x64 with Visual C++
Making Your Code Win64-Compliant
Debugging
What About Managed Code?
Wrap Up

One of the pleasures of working on the bleeding edge of Windows® is poking around in a new technology to see how it works. I don't really feel comfortable with an operating system until I have a little under-the-hood knowledge. So when the 64-bit editions of Windows XP and Windows Server™ 2003 appeared on the scene, I was all over them.

The nice thing about Win64 and the x64 CPU architecture is that they're different enough from their predecessors to be interesting, while not requiring a huge learning curve. While we developers would like to think that moving to x64 is just a recompile away, the reality is that we'll still spend far too much time in the debugger. A good working knowledge of the OS and CPU is invaluable.

In this article I'll boil down my experiences with Win64 and the x64 architecture to the essentials that a hotshot Win32® programmer needs for the move to x64. I'll assume that you know basic Win32 concepts, basic x86 concepts, and why your code should run on Win64. This frees me to focus on the good stuff. Think of this overview as a look at just the important differences relative to your knowledge of Win32 and the x86 architecture.

One nice thing about x64 systems is that, unlike Itanium-based systems, you can run either Win32 or Win64 code on the same machine without serious performance losses. And despite a few obscure differences between the Intel and AMD x64 implementations, the same x64-compatible build of Windows should run on either. You don't need one version of Windows for AMD x64 systems and another for Intel x64 systems.

I've divided the discussion into three broad areas: OS implementation details, just enough x64 CPU architecture to get by, and developing for x64 with Visual C++®.

The x64 Operating System

In any overview of the Windows architecture, I like to start with memory and the address space. Although a 64-bit processor could theoretically address 16 exabytes of memory (2^64 bytes), Win64 currently supports 16 terabytes, which is represented by 44 bits. Why can't you just load a machine up with 16 exabytes to use all 64 bits? There are a number of reasons.

For starters, current x64 CPUs typically allow only 40 bits (1 terabyte) of physical memory to be accessed. The architecture (but no current hardware) can extend this to up to 52 bits (4 petabytes). Even if that restriction were removed, the page tables needed to map that much memory would be enormous.
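The sizes quoted in this section all follow from powers of two. The sketch below checks them at compile time; the helper `bytes_for_bits` is a hypothetical name, not a Windows API, and the 8TB figure is simply the 44-bit range split in half.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes addressable with the given number of address bits.
// Valid for bits < 64, since 2^64 itself does not fit in a uint64_t.
constexpr std::uint64_t bytes_for_bits(unsigned bits) {
    return std::uint64_t{1} << bits;
}

constexpr std::uint64_t TB = 1ull << 40;
constexpr std::uint64_t PB = 1ull << 50;
constexpr std::uint64_t EB = 1ull << 60;

// 44 bits -> 16TB, the Win64 virtual address range.
static_assert(bytes_for_bits(44) == 16 * TB, "44 bits = 16TB");
// Half of that range (43 bits) -> 8TB.
static_assert(bytes_for_bits(43) == 8 * TB, "43 bits = 8TB");
// 40 bits -> 1TB of physical memory on current x64 CPUs.
static_assert(bytes_for_bits(40) == 1 * TB, "40 bits = 1TB");
// 52 bits -> 4PB, the architectural physical limit.
static_assert(bytes_for_bits(52) == 4 * PB, "52 bits = 4PB");
// 2^64 = 16EB, computed as 2^63 * 2 to avoid an out-of-range shift.
static_assert(bytes_for_bits(63) / EB * 2 == 16, "64 bits = 16EB");
```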

Just as in Win32, the addressable range is divided into user-mode and kernel-mode areas. Each process gets its own unique 8TB at the bottom end, while kernel-mode code lives in the upper 8TB and is shared by all processes. The different versions of 64-bit Windows also have different memory limits, as shown in Figure 1 and Figure 2.

Figure 1 General Memory Limits

| | 32-Bit Models | 64-Bit Models |
|---|---|---|
| Total virtual address space (based on a single process) | 4GB | 16TB |
| Virtual address space per 32-bit process | 2GB (3GB if system is booted with /3GB switch) | 4GB if compiled with /LARGEADDRESSAWARE (2GB otherwise) |
| Virtual address space per 64-bit process | Not applicable | 8TB |
| Paged pool | 470MB | 128GB |
| Non-paged pool | 256MB | 128GB |
| System Page Table Entry (PTE) | 660MB to 900MB | 128GB |

Figure 2 Physical Memory and CPU Limits

| | 32-Bit Models | 64-Bit Models |
|---|---|---|
| Windows XP Professional | 4GB (1-2 CPUs) | 128GB (1-2 CPUs) |
| Windows Server 2003, Standard Edition | 4GB (1-4 CPUs) | 32GB (1-4 CPUs) |
| Windows Server 2003, Enterprise Edition | 64GB (1-8 CPUs) | 1TB (1-8 CPUs) |
| Windows Server 2003, Datacenter Edition | 64GB (8-32 CPUs) | 1TB (8-64 CPUs) |

Also just like in Win32, the x64 page size is 4KB. The first 64KB of address space is never mapped in, so the lowest valid address you'd expect to see is 0x10000. Unlike in Win32, system DLLs don't have a default load address near the top of the user-mode address range. Instead, they're loaded above 4GB, typically at addresses around 0x7FF00000000.
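To make those numbers concrete, here is a minimal sketch of a sanity check on user-mode addresses. It assumes the simplified layout described in this section (first 64KB unmapped, an 8TB user-mode range, 4KB pages); the constants and helper names are illustrative, not taken from the Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Values from the text; simplified model only.
constexpr std::uint64_t kPageSize      = 0x1000;         // 4KB pages
constexpr std::uint64_t kLowestValid   = 0x10000;        // first 64KB never mapped
constexpr std::uint64_t kUserModeLimit = 0x80000000000;  // 8TB = 2^43

// Hypothetical helper: could addr be a valid user-mode address in this model?
constexpr bool in_user_range(std::uint64_t addr) {
    return addr >= kLowestValid && addr < kUserModeLimit;
}

// Round an address down to the start of its 4KB page.
constexpr std::uint64_t page_base(std::uint64_t addr) {
    return addr & ~(kPageSize - 1);
}
```

A system DLL address such as 0x7FF00000000 passes the check, while anything in the first 64KB (including a null pointer plus a small offset) fails it.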

A nice feature of many newer x64 processors is support for the CPU No Execute (NX) bit, which Windows uses to implement hardware Data Execution Prevention (DEP). On the x86 platform, many bugs and viruses exist because the CPU can execute data as if it were legal code bytes. A buffer overrun (intentional or not) can end up with the CPU executing through memory that was intended for data storage. With DEP, the OS can set much clearer boundaries around valid code regions, causing the CPU to trap if execution goes outside these expected boundaries. This helps in the continuing battle to make Windows less vulnerable to attack.

In a move designed to catch errors, the x64 linker assigns default load addresses for executables just above 32 bits (4GB). This helps you quickly find pointer-truncation bugs in existing code after it has been ported to Win64. Specifically, if a pointer is stored in a 32-bit value (a DWORD, for example), it will be truncated when running in a Win64 build, making the pointer invalid and thus triggering an access violation. This trick makes it much easier to find those nasty pointer bugs.
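A minimal sketch of the truncation this linker trick exposes, using `std::uint32_t` as a stand-in for DWORD. The address 0x140001000 is a hypothetical module address just above 4GB, chosen for illustration, and the helper names are likewise hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical module address just above 4GB.
constexpr std::uint64_t kHighAddress = 0x140001000;

// Storing a 64-bit pointer value in a 32-bit slot (a DWORD in Windows code)
// keeps only the low 32 bits.
std::uint32_t store_in_dword(std::uint64_t pointer_value) {
    return static_cast<std::uint32_t>(pointer_value);
}

// Reading it back zero-extends, producing a different (invalid) address.
std::uint64_t reload_from_dword(std::uint32_t stored) {
    return stored;
}
```

A value below 4GB survives the round trip unchanged, which is exactly why such bugs can hide in Win32 builds and why loading images above 4GB flushes them out.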

[Editor's Update - 5/2/2006: Handles are defined as pointer values. Thus in Win64, a handle is 8 bytes, not 4 bytes.]

The file format for Win64 is called PE32+. From nearly every viewpoint, the format is structurally identical to the Win32 PE file. Only a few fields, such as ImageBase in the header, have been widened; one field was deleted, and one field was changed to reflect a different CPU type. Figure 3 shows the fields that have changed.

Figure 3 Changes to PE File Fields

| Header Field | Change |
|---|---|
| Magic | Set to 0x20b instead of 0x10b |
| BaseOfData | Deleted |
| ImageBase | Widened to 64 bits |
| SizeOfStackReserve | Widened |
| SizeOfStackCommit | Widened |
| SizeOfHeapReserve | Widened |
| SizeOfHeapCommit | Widened |
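Since the Magic field is the quickest way to tell the two formats apart, a small sketch of that check may help. It assumes you already have a pointer to the start of the optional header; the helper names are hypothetical, while 0x10b and 0x20b are the values in Figure 3 (winnt.h calls them IMAGE_NT_OPTIONAL_HDR32_MAGIC and IMAGE_NT_OPTIONAL_HDR64_MAGIC).

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint16_t kPe32Magic     = 0x10b;
constexpr std::uint16_t kPe32PlusMagic = 0x20b;

enum class PeKind { Pe32, Pe32Plus, Unknown };

// PE fields are little-endian.
std::uint16_t read_le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Magic is the first field of the optional header.
PeKind classify_optional_header(const std::uint8_t* optional_header) {
    switch (read_le16(optional_header)) {
        case kPe32Magic:     return PeKind::Pe32;
        case kPe32PlusMagic: return PeKind::Pe32Plus;
        default:             return PeKind::Unknown;
    }
}

// Width in bytes of ImageBase (and the widened stack/heap size fields).
int image_base_width(PeKind kind) {
    switch (kind) {
        case PeKind::Pe32:     return 4;
        case PeKind::Pe32Plus: return 8;
        default:               return 0;
    }
}
```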

Beyond the PE header, there aren't many changes. A few structures such as IMAGE_LOAD_CONFIG and IMAGE_THUNK_DATA simply had some of their fields widened to 64 bits. The addition of the PDATA section is more interesting, as it highlights one of the major differences between the Win32 and Win64 implementations: exception handling.

In the x86 world, exception handling is stack-based. When a Win32 function contains try/catch or try/finally code, the compiler emits instructions that create a small data block on the stack. In addition, each try data block points to the previous try data structure, thus forming a linked list with the most recently added structure at the list head. As functions are called and exited, the head of the linked list keeps updating. When an exception occurs, the OS walks the linked list of blocks on the stack, looking for the appropriate handler. My January 1997 MSJ article describes this in much more detail, so I'll keep the description to a minimum here.
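The linked-list idea can be modeled in a few lines of portable C++. This is only a sketch of the list walk under simplifying assumptions: the real Win32 records live on the stack and are reached through FS:[0], and real dispatching involves more than a yes/no handler. `TryRecord`, `push_record`, `pop_record`, and `dispatch` are hypothetical names.

```cpp
#include <cassert>
#include <functional>

// One record per active try block; the newest record is the list head.
struct TryRecord {
    TryRecord* previous;               // record of an enclosing try block
    std::function<bool(int)> handler;  // returns true if it handles the code
};

struct ThreadState {
    TryRecord* head = nullptr;         // most recently registered record
};

// Entering a try block links a new record in front of the current head.
void push_record(ThreadState& t, TryRecord& r) { r.previous = t.head; t.head = &r; }

// Leaving a try block unlinks it (assumes a record is registered).
void pop_record(ThreadState& t) { t.head = t.head->previous; }

// Walk from the head toward older records; return how many records were
// visited before one accepted the exception, or -1 if none did.
int dispatch(const ThreadState& t, int code) {
    int visited = 0;
    for (const TryRecord* r = t.head; r != nullptr; r = r->previous) {
        ++visited;
        if (r->handler(code)) return visited;
    }
    return -1;
}
```

Every push and pop here is work done on each function entry and exit, which is the per-call overhead that table-based handling avoids.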

In contrast to Win32 exception handling, Win64 (both x64 and Itanium versions) uses table-based exception handling. No linked list of try data blocks is built on the stack. Instead, each Win64 executable contains a runtime function table. Each function table entry contains both the starting and ending address for the function, as well as the location of a rich set of data about exception-handling code in the function and the function's stack frame layout. See the IMAGE_RUNTIME_FUNCTION_ENTRY structure in WINNT.H and in the x64 SDK for the nitty-gritty on these structures.

When an exception occurs, the OS walks the regular thread stack. As the stack walk encounters each frame and saved instruction pointer, the OS determines within which executable module the instruction pointer lies. The OS then searches the runtime function table in that module, locates the appropriate runtime function entry, and makes the appropriate exception-processing decisions from that data.
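A sketch of the lookup step, assuming the module's function table is already in memory and sorted by start address so that a binary search applies. The struct mirrors the three 32-bit fields of IMAGE_RUNTIME_FUNCTION_ENTRY; `lookup_function` is a hypothetical helper, not the OS's own routine (Win64 exposes that as RtlLookupFunctionEntry).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Addresses are RVAs (offsets from the module's load address).
struct RuntimeFunction {
    std::uint32_t BeginAddress;  // RVA of the function's first byte
    std::uint32_t EndAddress;    // RVA one past the function's last byte
    std::uint32_t UnwindData;    // RVA of the unwind/exception info
};

// Returns the entry covering rva, or nullptr if no entry covers it
// (for example, a simple leaf function that needs no table entry).
const RuntimeFunction* lookup_function(const std::vector<RuntimeFunction>& table,
                                       std::uint32_t rva) {
    // Find the first entry that starts after rva...
    auto it = std::upper_bound(table.begin(), table.end(), rva,
        [](std::uint32_t value, const RuntimeFunction& f) { return value < f.BeginAddress; });
    if (it == table.begin()) return nullptr;
    --it;  // ...so the entry before it is the only candidate.
    return (rva < it->EndAddress) ? &*it : nullptr;
}
```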

What if you're a rocket scientist and you've generated code directly in memory without an underlying PE32+ module? You're still covered in this case. Win64 has an RtlAddFunctionTable API that lets you tell the OS about your dynamically generated code.

The downside to table-based exception handling (relative to the x86 stack-based model) is that looking up function table entries from code addresses takes more time than just walking a linked list. The upside is that functions don't have the overhead of setting up a try data block every time the function executes.

Remember, this is just a quick introduction rather than a full-fledged description of x64 exception processing, however exciting that might be! For a more in-depth overview of the x64 exception model, be sure to read Kevin Frei's blog entry.

x64-compatible versions of Windows don't contain dramatic numbers of truly new APIs; most new Win64 APIs were added to the Windows releases for Itanium processors. In the interest of keeping things brief, the two most important existing APIs are IsWow64Process and GetNativeSystemInfo. These allow Win32 apps to determine whether they're running on Win64 and, if so, the true capabilities of the system. Otherwise, a 32-bit process that calls GetSystemInfo sees the system capabilities as if it were running on a 32-bit system. For instance, GetSystemInfo can only report the address range of 32-bit processes. Figure 4 shows the APIs that were not previously available on x86 but are available for x64.

Figure 4 New 64-Bit APIs

| Functionality | API |
|---|---|
| Exception Handling | RtlAddFunctionTable, RtlDeleteFunctionTable, RtlRestoreContext, RtlLookupFunctionEntry, RtlInstallFunctionTableCallback |
| Registry | RegDeleteKeyEx, RegGetValue, RegQueryReflectionKey |
| NUMA (Non-Uniform Memory Access) | GetNumaAvailableMemoryNode, GetNumaHighestNodeNumber, GetNumaNodeProcessorMask, GetNumaProcessorNode |
| WOW64 Redirection | Wow64DisableWow64FsRedirection, Wow64RevertWow64FsRedirection, RegDisableReflectionKey, RegEnableReflectionKey |
| Miscellaneous | GetLogicalProcessorInformation, QueryWorkingSetEx, SetThreadStackGuarantee, GetSystemFileCacheSize, SetSystemFileCacheSize, EnumSystemFirmwareTables, GetSystemFirmwareTable |

While running a fully 64-bit Windows system sounds great, the reality is that you'll very likely need to run Win32 code for a while. Toward that end, x64 versions of Windows include the WOW64 subsystem, which lets Win32 and Win64 processes run side by side on the same system. However, loading your 32-bit DLL into a 64-bit process, or vice versa, isn't supported. (It's a good thing, trust me.) And you can finally kiss 16-bit legacy code goodbye: it doesn't run on x64 versions of Windows.

In x64 versions of Windows, a process that starts from a 64-bit executable such as Explorer.exe can only load Win64 DLLs, while a process started from a 32-bit executable can only load Win32 DLLs. When a Win32 process makes a call to kernel mode (to read a file, for instance), the WOW64 code intercepts the call silently and invokes the correct x64 equivalent code in its place.
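Because a DLL and the process loading it must agree on bitness, a loader or installer tool sometimes needs to read a module's machine type. The sketch below assumes the standard PE layout (e_lfanew at offset 0x3C of the DOS header, then the "PE\0\0" signature and the IMAGE_FILE_HEADER, whose first field is Machine). `pe_machine` and `can_load` are hypothetical helpers, and `can_load` encodes only the same-bitness rule stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Machine values from winnt.h (IMAGE_FILE_MACHINE_I386 / IMAGE_FILE_MACHINE_AMD64).
constexpr std::uint16_t kMachineI386  = 0x014c;
constexpr std::uint16_t kMachineAmd64 = 0x8664;

static std::uint32_t read_le32(const std::uint8_t* p) {
    return p[0] | (p[1] << 8) | (p[2] << 16) | (std::uint32_t(p[3]) << 24);
}

// Returns IMAGE_FILE_HEADER.Machine for a PE image in memory, or 0 if the
// buffer does not look like a PE file.
std::uint16_t pe_machine(const std::vector<std::uint8_t>& image) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z') return 0;
    std::uint32_t nt = read_le32(&image[0x3C]);  // e_lfanew
    if (nt > image.size() - 6) return 0;         // need signature + Machine
    if (image[nt] != 'P' || image[nt + 1] != 'E' || image[nt + 2] != 0 || image[nt + 3] != 0)
        return 0;
    return static_cast<std::uint16_t>(image[nt + 4] | (image[nt + 5] << 8));
}

// WOW64 rule from the text: same machine type or no load.
bool can_load(std::uint16_t process_machine, std::uint16_t dll_machine) {
    return process_machine == dll_machine;
}
```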

Of course, processes of different lineages (32-bit versus 64-bit) need to be able to communicate with each other. Luckily, all the usual interprocess communication mechanisms that you know and love in Win32 also work in Win64, including shared memory, named pipes, and named synchronization objects.

You might be thinking, "But what about the system directory? The same directory can't hold both 32- and 64-bit versions of system DLLs such as KERNEL32 or USER32, right?" WOW64 magically takes care of this for you by doing selective file system redirection. File activity from a Win32 process that would normally go to the System32 directory is silently redirected to a directory named SysWow64. A Win64 system effectively has two \Windows\System32 directories: one with x64 binaries, the other with the Win32 equivalents.
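A simplified model of the redirection rule as a string rewrite. This is an illustration under stated assumptions: the Windows directory is hard-coded as C:\Windows, and the real WOW64 rules include exceptions that this sketch ignores. `redirect_path` is a hypothetical name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive prefix test (Windows paths are case-insensitive).
static bool starts_with_nocase(const std::string& s, const std::string& prefix) {
    if (s.size() < prefix.size()) return false;
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(s[i])) !=
            std::tolower(static_cast<unsigned char>(prefix[i])))
            return false;
    }
    return true;
}

// For a 32-bit process, rewrite System32 paths to SysWow64; 64-bit
// processes see paths unchanged.
std::string redirect_path(const std::string& path, bool is_32bit_process) {
    const std::string from = "C:\\Windows\\System32";
    const std::string to   = "C:\\Windows\\SysWow64";
    if (!is_32bit_process || !starts_with_nocase(path, from)) return path;
    // Only redirect whole path components ("System32" or "System32\...").
    if (path.size() > from.size() && path[from.size()] != '\\') return path;
    return to + path.substr(from.size());
}
```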

Smooth as it may seem, this can be confusing. For instance, I was at one point using (unbeknownst to me) a 32-bit command prompt. When I ran DIR on Kernel32.dll in the System32 directory, I got exactly the same results as when I did the same thing in the SysWow64 directory. It took a lot of head scratching before I figured out that the file system redirection was working just as it should. That is, even though I thought I was working in the \Windows\System32 directory, WOW64 was redirecting the calls to the SysWow64 directory. Incidentally, if an x64 app really does want to get at the 32-bit system directory, the GetSystemWow64Directory API gives you the correct path. Be sure to read the MSDN® documentation for the complete story.

Beyond file system redirection, another bit of magic performed by WOW64 is registry redirection. Consider my earlier statement about Win32 DLLs not loading into Win64 processes, and then think about COM and its use of the registry to load in-process server DLLs. What if a 64-bit application uses CoCreateInstance to create an object that's implemented in a Win32 DLL? The DLL can't load, right? WOW64 saves the day again by redirecting access from 32-bit applications to the \Software\Classes (and related) registry nodes. The net effect is that Win32 applications have a different (but mostly parallel) view of the registry than x64 applications. As you'd expect, the OS provides an escape hatch for 32-bit applications to read the actual 64-bit registry value by specifying new flag values when calling RegOpenKeyEx and friends.

Drilling down a bit, the last few OS differences near and dear to my heart concern thread-local data. In x86 versions of Windows, the FS register is used to point at per-thread memory areas, including the "last error" and Thread Local Storage (GetLastError and TlsGetValue, respectively). On x64 versions of Windows, the FS register has been replaced by the GS register. Otherwise they work pretty much in the same manner.

Although this article focuses on x64 from the user-mode perspective, there is one important kernel-mode architectural addition to point out. New in Windows for x64 is a technology called PatchGuard, which is aimed at both security and robustness. In a nutshell, user-mode programs or drivers that alter key kernel data structures such as the syscall tables and the interrupt descriptor table (IDT) create security holes and potential stability problems. For the x64 architecture, the Windows folks decided that modifying kernel memory in unsupported ways wouldn't be tolerated. The technology to enforce this is PatchGuard. It uses a kernel-mode thread to monitor changes to critical kernel memory locations. If that memory is changed, the system stops via a bugcheck.

All things considered, if you're familiar with the Win32 architecture and how to write native code that runs on it, you won't find many surprises in the move to Win64. You can consider it to be mostly just a roomier environment.

Just Enough x64 to Get By

Now let's take a look at the CPU architecture itself, since a basic understanding of the CPU's instruction set makes development (especially debugging!) much easier. The first thing you'll notice in compiler-generated x64 code is how remarkably similar it is to the x86 code you know and love. This definitely wasn't the case for those of you who learned Intel IA64 coding.

The second thing you'll notice shortly thereafter is that the register names are slightly different from what you're used to, and that there are many more of them. General-purpose x64 registers have names that begin with R, as in RAX, RBX, and so on. This is an evolution of the old E-based naming scheme for 32-bit x86 registers. Way back in the mists of time, the 16-bit AX register became the 32-bit EAX, the 16-bit BX became the 32-bit EBX, and so on. In the move from 32 bits, all the E registers become R registers in their 64-bit incarnations. Thus, RAX is the successor to EAX, RBX succeeds EBX, RSI replaces ESI, and so on down the line.

In addition, eight new general-purpose registers (R8-R15) were added. The full list of primary 64-bit general-purpose registers is shown in Figure 5.

Figure 5

RAX  RBX  RCX  RDX  RSI  RDI  RSP  RBP
R8   R9   R10  R11  R12  R13  R14  R15

Also, the 32-bit EIP register becomes the RIP register. Of course, 32-bit instructions must continue to run, so the original, smaller versions of these registers (EAX, AX, AL, AH, and so on) are still available.
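One detail worth knowing when you read register dumps: in 64-bit mode, writing a 32-bit register such as EAX zero-extends the result into all of RAX, while writing AX or AL leaves the other bits of RAX alone. A small model of one register and its narrower views (the `Gpr` struct is hypothetical) shows the difference.

```cpp
#include <cassert>
#include <cstdint>

// One general-purpose register (RAX) and its narrower views.
struct Gpr {
    std::uint64_t r = 0;  // RAX

    std::uint32_t e() const { return static_cast<std::uint32_t>(r); }      // EAX
    std::uint16_t x() const { return static_cast<std::uint16_t>(r); }      // AX
    std::uint8_t  l() const { return static_cast<std::uint8_t>(r); }       // AL
    std::uint8_t  h() const { return static_cast<std::uint8_t>(r >> 8); }  // AH

    // 32-bit write: upper 32 bits are cleared (zero-extension).
    void set_e(std::uint32_t v) { r = v; }
    // 16-bit write: upper 48 bits are preserved.
    void set_x(std::uint16_t v) { r = (r & ~std::uint64_t{0xFFFF}) | v; }
    // 8-bit write: upper 56 bits are preserved.
    void set_l(std::uint8_t v)  { r = (r & ~std::uint64_t{0xFF}) | v; }
};
```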

Lest you graphics and scientific programming gurus feel left out, the x64 CPU also has 16 128-bit SSE2 registers, named XMM0 through XMM15. The full set of x64 registers preserved by Windows can be found in the appropriately #ifdef'ed _CONTEXT structure defined in WINNT.H.

At any given time, an x64 CPU is operating in either legacy 32-bit mode or 64-bit mode. In 32-bit mode, the CPU decodes and acts on instructions just like any other x86-class CPU. In 64-bit mode, the CPU makes some slight adjustments to certain instruction encodings to support the new registers and instructions.

If you're familiar with the CPU opcode encoding diagrams, you'll remember that the space for new instruction encodings was disappearing fast, and squeezing in eight new registers isn't an easy task. One way to do this was to eliminate a few rarely used instructions. So far, the only instructions I miss are 64-bit versions of PUSHAD and POPAD, which save and restore all the general-purpose registers on the stack. Another way to free up instruction encoding space was to eliminate segmentation almost entirely in 64-bit mode. CS, DS, ES, and SS are effectively flat, and only FS and GS keep a usable base address (which is how Windows can still use GS for per-thread data). Not that many people will miss segments.

With addresses being 64 bits, you might be wondering about code size. For instance, this is a common 32-bit instruction:

CALL DWORD PTR [XXXXXXXX]

Here, the X'ed portion is a 32-bit address. In 64-bit mode, does this become a 64-bit address, thereby turning a 6-byte instruction into 10 bytes? Luckily, the answer is no. The instruction remains the same size. In 64-bit mode, the 32-bit operand portion of the instruction is treated as a signed offset relative to the instruction pointer, which by then points at the next instruction. An example makes this clearer. In 32-bit mode, here's the instruction to call the 32-bit pointer value stored at address 00020000h:

00401000: CALL DWORD PTR [00020000h]

In 64-bit mode, the same opcode bytes call the 64-bit pointer value stored at address 00421006h: 00401000h, plus 6 bytes for the length of the CALL instruction itself, plus 20000h. A little thought reveals that this relative addressing mode has important ramifications if you're generating code yourself. You can't just specify an 8-byte pointer value in an instruction. Instead, you need to specify a 32-bit relative address to a memory location where the actual 64-bit target address resides. Thus, there's an unspoken assumption that the memory location holding the 64-bit target pointer must lie within 2GB of the instruction that uses it. Not that big a deal for most folks, but if you do dynamic code generation or modify existing code in memory, it can byte you!
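The address arithmetic is easy to get wrong when generating code, so here is a sketch. It assumes the 6-byte `FF 15 disp32` encoding of the indirect CALL and the rule that the displacement is added to the address of the next instruction; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kCallIndirectLen = 6;  // FF 15 followed by a 4-byte displacement

// Effective address = address of the next instruction + sign-extended disp32.
std::uint64_t rip_relative_target(std::uint64_t instr_addr, unsigned instr_len,
                                  std::int32_t disp32) {
    // Converting a negative value to uint64 wraps modulo 2^64, so unsigned
    // addition performs the sign-extended add.
    return instr_addr + instr_len + static_cast<std::uint64_t>(static_cast<std::int64_t>(disp32));
}

// For a code generator: compute the disp32 that reaches `target`, or return
// false if the target is outside the +/-2GB window.
bool encode_disp32(std::uint64_t instr_addr, unsigned instr_len,
                   std::uint64_t target, std::int32_t* disp_out) {
    const std::uint64_t next_ip = instr_addr + instr_len;
    const std::int64_t delta = static_cast<std::int64_t>(target - next_ip);
    if (delta < INT32_MIN || delta > INT32_MAX) return false;
    *disp_out = static_cast<std::int32_t>(delta);
    return true;
}
```

When `encode_disp32` fails, a code generator has to place the 8-byte pointer slot somewhere closer to the instruction.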

A key advantage of all the x64 registers is that compilers can finally generate code that passes most parameters in registers rather than on the stack. Pushing parameters on the stack incurs memory accesses. We've all had it drilled into our heads that a memory access that misses the CPU cache causes the CPU to stall for many cycles waiting for RAM to catch up.

In designing the calling convention, the x64 architecture took the opportunity to clean up the mess of existing Win32 calling conventions such as __stdcall, __cdecl, __fastcall, __thiscall, and so on. In Win64, there's just one native calling convention, and modifiers like __cdecl are ignored by the compiler. The reduction in calling convention flavors is a wonderful boon for debuggability, among other things.

The primary thing to know about the x64 calling convention is its similarity to the x86 fastcall convention. Using the x64 convention, the first four integer arguments (from left to right) are passed in 64-bit registers designated for that purpose:

RCX: 1st integer argument
RDX: 2nd integer argument
R8: 3rd integer argument
R9: 4th integer argument

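Arguments beyond the first four are passed on the stack. The this pointer is considered an integer argument, so it can always be found in the RCX register. Floating-point parameters in the first four positions are passed in the XMM0 through XMM3 registers, with subsequent floating-point parameters placed on the thread stack. Note that the four register slots are assigned by argument position rather than counted separately per type: a floating-point value in the second position goes in XMM1, and an integer in the third position still goes in R8.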
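A sketch of the register-assignment rule, assuming the position-based slot assignment just described. It reports where argument N (counting from zero) lives as seen by the caller just before the CALL; the `arg_location` helper and its output strings are hypothetical.

```cpp
#include <cassert>
#include <string>

enum class ArgType { Integer, Floating };

// Location of argument `index` (0-based, assumed non-negative).
std::string arg_location(int index, ArgType type) {
    static const char* const kIntRegs[4]   = {"RCX", "RDX", "R8", "R9"};
    static const char* const kFloatRegs[4] = {"XMM0", "XMM1", "XMM2", "XMM3"};
    if (index < 4) {
        // Slot N uses the Nth register of the matching kind.
        return type == ArgType::Integer ? kIntRegs[index] : kFloatRegs[index];
    }
    // Later arguments sit above the 32-byte home area, at
    // [RSP + 32 + 8 * (index - 4)] from the caller's point of view.
    return "stack+" + std::to_string(32 + 8 * (index - 4));
}
```

So for a hypothetical `f(int a, double b, int c)`, the arguments land in RCX, XMM1, and R8.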

Drilling into the calling convention a bit, even though an argument can be passed in a register, the compiler still reserves space on the stack for it by decrementing the RSP register. At a minimum, each function that calls other functions must reserve 32 bytes (four 64-bit values) on the stack for its callees. This space allows the register parameters to be easily copied to a well-known stack location. The callee function isn't required to spill the input register parameters to the stack, but the stack space reservation ensures that it can if needed. Of course, if more than four integer parameters are passed, the appropriate additional amount of stack space must be reserved.

Let's look at an example. Consider a function passing two integer parameters to a child function. The compiler not only puts the values in RCX and RDX, it also subtracts 32 bytes from the RSP stack pointer register. In the callee function, the parameters can be accessed in the registers (RCX and RDX). If the callee code needs the registers for some other purpose, it can copy them into the reserved 32-byte stack region. Figure 6 shows the registers and stack after six integer parameters have been passed.

Figure 6 Passing Integers

Parameter stack cleanup is a bit funny on x64 systems. Technically, the caller is responsible for cleaning up the stack, not the callee. However, you'll rarely see RSP adjusted anywhere other than in the prologue and epilogue code. Unlike the x86 compiler, which explicitly pushes parameters onto the stack before each call and removes them afterward, the x64 code generator reserves enough stack space to call whichever target function has the largest parameter list. It then uses the same stack region over and over to set up the parameters when calling child functions.

Put another way, RSP rarely changes. This is quite different from x86 code, where the ESP value fluctuates as parameters are added to and cleared from the stack.

An example helps here. Consider an x64 function that calls three other functions. The first function takes four params (0x20 bytes), the second takes 12 params (0x60 bytes), and the third takes eight params (0x40 bytes). In the prologue, the generated code simply reserves 0x60 bytes on the stack and copies parameter values into the appropriate spots within the 0x60 bytes so that the target functions can locate them.
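The sizing rule in that example reduces to a maximum. A minimal sketch, assuming 8 bytes per parameter and the 32-byte minimum for any function that makes calls; `outgoing_arg_area` is a hypothetical helper, and a real frame also includes locals, saved registers, and padding to keep RSP 16-byte aligned at call sites.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Outgoing-argument area reserved once in the prologue of a function that
// calls at least one other function: 8 bytes per parameter of the widest
// callee, but never less than the 32-byte home area.
std::size_t outgoing_arg_area(std::initializer_list<std::size_t> callee_param_counts) {
    std::size_t max_params = 0;
    for (std::size_t n : callee_param_counts) max_params = std::max(max_params, n);
    return 8 * std::max<std::size_t>(max_params, 4);
}
```

For the three callees above (4, 12, and 8 parameters) this yields 0x60.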

A good description of the more detailed intricacies of the x64 calling convention can be found in Raymond Chen's blog. I won't belabor all the details, but here are some highlights. First, integer parameters that are less than 64 bits are sign extended, then still passed via the appropriate register, if among the first four integer parameters. Second, at no point should any parameter be in a stack location that's not a multiple of 8 bytes, thus preserving 64-bit alignment. Any argument that's not 1, 2, 4, or 8 bytes (including structs) is passed by reference. And finally, structs and unions of 8, 16, 32, or 64 bits are passed as if they were integers of the same size.
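The size rule for aggregates can be checked with `sizeof`. A minimal sketch; the example structs and the helper name are hypothetical, and the rule encoded is only the 1/2/4/8-byte test stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Exactly 1, 2, 4, or 8 bytes: passed by value like an integer of that size.
// Anything else: passed by reference (a pointer to a copy).
constexpr bool passed_by_value(std::size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}

// Hypothetical example aggregates.
struct TwoInts    { std::int32_t a, b; };  // 8 bytes
struct ThreeBytes { char c[3]; };          // 3 bytes
struct TwoDoubles { double x, y; };        // 16 bytes

static_assert(passed_by_value(sizeof(TwoInts)), "8-byte struct travels in a register");
static_assert(!passed_by_value(sizeof(ThreeBytes)), "3-byte struct is passed by reference");
static_assert(!passed_by_value(sizeof(TwoDoubles)), "16-byte struct is passed by reference");
```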

A function's return value is stored in the RAX register. The exception is floating-point types, which are returned in XMM0. Across calls, these registers must be preserved: RBX, RBP, RDI, RSI, R12, R13, R14, and R15. These registers are volatile and can be destroyed: RAX, RCX, RDX, R8, R9, R10, and R11.
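For quick reference while reading disassembly, the same rules can be expressed as a tiny lookup. This sketch covers only the integer registers listed above; Microsoft's x64 documentation also treats RSP and XMM6 through XMM15 as nonvolatile. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>

// Nonvolatile (callee-saved) general-purpose registers from the text.
bool is_nonvolatile_gpr(const std::string& reg) {
    static const std::string kSaved[] = {"RBX", "RBP", "RDI", "RSI",
                                         "R12", "R13", "R14", "R15"};
    return std::find(std::begin(kSaved), std::end(kSaved), reg) != std::end(kSaved);
}

// Register that carries a function's return value.
std::string return_register(bool is_floating_point) {
    return is_floating_point ? "XMM0" : "RAX";
}
```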

Earlier I mentioned that the OS walks stack frames as part of the exception handling mechanism. If you've ever written stack-walking code, you know that the almost ad hoc nature of Win32 frame layout makes the process a tricky proposition. The situation is much better on x64 systems. If a function allocates stack space, calls other functions, preserves any registers, or uses exception handling, that function must use a well-defined set of instructions for generating standard prologues and epilogues.

Enforcing a standard way of creating a function's stack frame is one way the OS can guarantee (in theory) that the stack can always be walked. In addition to consistent, standardized prologues, the compiler and linker must also create an associated function table data entry. For the curious, all these function entries end up in a table that's an array of IMAGE_RUNTIME_FUNCTION_ENTRY structures, defined in winnt.h. How do you find this table? It's pointed to by the IMAGE_DIRECTORY_ENTRY_EXCEPTION entry in the PE header's DataDirectory field.
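Finding the table takes one index and one division. The sketch below mirrors IMAGE_DATA_DIRECTORY and the 12-byte x64 function entry (three 32-bit RVAs). IMAGE_DIRECTORY_ENTRY_EXCEPTION is index 3 in winnt.h; the struct and helper names here are hypothetical stand-ins so the code builds without the Windows headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index of the exception directory in the DataDirectory array.
constexpr int kDirectoryEntryException = 3;

// One x64 function table entry: three 32-bit RVAs, 12 bytes in all.
struct RuntimeFunctionEntry {
    std::uint32_t BeginAddress;
    std::uint32_t EndAddress;
    std::uint32_t UnwindInfoAddress;
};
static_assert(sizeof(RuntimeFunctionEntry) == 12, "x64 function entries are 12 bytes");

// Mirrors IMAGE_DATA_DIRECTORY: an RVA and a size in bytes.
struct DataDirectory {
    std::uint32_t VirtualAddress;
    std::uint32_t Size;
};

// Number of function table entries described by the exception directory.
std::size_t function_entry_count(const DataDirectory* directories) {
    return directories[kDirectoryEntryException].Size / sizeof(RuntimeFunctionEntry);
}
```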

I've covered a lot of architectural ground in a short amount of space. However, with an understanding of these big-picture concepts and an existing knowledge of 32-bit assembly language, you should be able to understand x64 instructions in the debugger within a relatively short period of time. As always, practice makes perfect.

Developing for x64 with Visual C++

Although it was possible to write x64 code with the Microsoft® C++ compiler prior to Visual Studio® 2005, it was a clunky experience in the IDE. For this article I'll therefore assume that you're working with Visual Studio 2005 and that you've selected the x64 tools, which aren't enabled in a default installation. I'll also assume that you have an existing Win32 user-mode project in C++ that you'd like to build for both x86 and x64 platforms.

The first step in building for x64 is to create the 64-bit build configuration. As a good Visual Studio user, you're already aware that your projects have two configurations by default: Debug and Release. All you need to do here is create two more configurations: Debug and Release in their x64 guises.

Begin with your existing project/solution loaded. On the Build menu, select Configuration Manager. In the Configuration Manager dialog box, from the Active solution platform drop-down menu, select New (see Figure 7). You should now see another dialog entitled New Solution Platform.

Figure 7 Creating a New Build Configuration

Select x64 as your new platform (see Figure 8) and leave the other settings in their default state; then click OK. That's it! You should now have four possible build configurations: Win32 Debug, Win32 Release, x64 Debug, and x64 Release. Using the Configuration Manager, you can easily switch between them.

Now let's see how x64-compliant your code is. Make the x64 Debug configuration the default, and then build the project. Unless the code is trivial, odds are that you'll get some compiler errors that don't occur in the Win32 configuration. Unless you've completely forsaken all principles of writing portable C++ code, it's relatively easy to fix these issues so that your code is truly Win32 and x64 ready, without requiring reams of conditionally compiled code.

Figure 8 Selecting the Build Platform

Making Your Code Win64-Compliant

Probably the biggest effort in converting Win32 code to x64 is in getting your type definitions correct. By using the Windows typedef types rather than the C++ compiler's native types (int, long, and so on), you let the Windows headers make it easy to write clean code for both Win32 and x64. You should continue this consistency in your own code. For instance, if Windows passes you an HWND, don't store it in a FARPROC just because it's handy and easy.

Having upgraded a lot of code, I've found that perhaps the most common and easiest error to make is assuming that a pointer value can be stored or transported in a 32-bit type such as an int, long, or even a DWORD. Pointers in Win32 and Win64 are different sizes by necessity, while these integer types remain the same size. However, it's also not feasible for the compiler to disallow storing pointers in integral types. It's a C++ habit that's just too ingrained.

To the rescue come the _PTR types defined in the Windows headers. Types such as DWORD_PTR, INT_PTR, and LONG_PTR let you declare variables that are of integral type, but that are always wide enough to store a pointer on the target platform. For instance, a variable defined as type DWORD_PTR is a 32-bit integer when compiled for Win32 and 64-bit when compiled for Win64. With practice, it became second nature for me when declaring types to ask, "Do I want a DWORD here, or do I really mean DWORD_PTR?"

As you'd expect, there might be occasions when you need to specify exactly how many bytes an integer type occupies. The same header file (Basetsd.h) that defines DWORD_PTR and friends also defines size-specific integers such as INT32, INT64, INT16, UINT32, and DWORD64.
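A minimal sketch of the distinction, using stand-in typedefs with hypothetical `MY_` names so it compiles without the Windows headers: `std::uintptr_t` plays the role of DWORD_PTR, and the fixed-width `<cstdint>` types play the roles of DWORD and INT64.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uintptr_t MY_DWORD_PTR;  // like DWORD_PTR: pointer-sized, unsigned
typedef std::uint32_t  MY_DWORD;      // like DWORD: always 32 bits
typedef std::int64_t   MY_INT64;      // like INT64: always 64 bits

static_assert(sizeof(MY_DWORD_PTR) == sizeof(void*), "DWORD_PTR must hold a pointer");
static_assert(sizeof(MY_DWORD) == 4, "DWORD stays 32 bits");
static_assert(sizeof(MY_INT64) == 8, "INT64 is exactly 64 bits");

// Round-tripping a pointer through the pointer-sized integer is lossless,
// unlike a round trip through MY_DWORD on a 64-bit build.
void* round_trip(void* p) {
    MY_DWORD_PTR bits = reinterpret_cast<MY_DWORD_PTR>(p);
    return reinterpret_cast<void*>(bits);
}
```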

Another issue related to type size differences is printf and sprintf formatting. I've certainly been guilty of using %X or %08X to format pointer values in the past, and have been bitten when I ran that code on x64 systems. The correct way is to use %p, which automatically accounts for the pointer size on the target platform. In addition, printf and sprintf have the I prefix for size-dependent types. For instance, you might use %Iu to print out a UINT_PTR variable. Likewise, if you know the variable will always be a 64-bit signed value, you could use %I64d.
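The %I prefixes are Microsoft-specific. For code that must also build with other compilers, `<cinttypes>` offers portable equivalents; the sketch below (helper names hypothetical) uses PRIuPTR and PRId64, and models the %08X mistake with an explicit cast, since passing a 64-bit value to %X is undefined behavior.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Portable counterpart of %Iu for a pointer-sized unsigned value.
std::string format_uintptr(std::uintptr_t v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%" PRIuPTR, v);
    return buf;
}

// Portable counterpart of %I64d.
std::string format_int64(std::int64_t v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%" PRId64, v);
    return buf;
}

// The %08X habit: only the low 32 bits of the value get printed.
std::string format_as_08x(std::uint64_t v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%08X", static_cast<unsigned>(v));
    return buf;
}
```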

Having cleaned up errors caused by type definitions that aren't Win64 ready, you may still have code that can only run in x86 mode. Or perhaps you need to write two versions of a function, one for Win32 and the other for x64. This is where a set of preprocessor macros comes in handy:

_M_IX86
_M_AMD64
_WIN64

Proper use of the preprocessor macros is essential to writing correct cross-platform code. _M_IX86 and _M_AMD64 are defined only when compiling for the specified processor. _WIN64 is defined when compiling for any 64-bit version of Windows, including the Itanium edition.

When using a preprocessor macro, think hard about what you want. For instance, is the code truly specific to the x64 processor and nothing else? Then use something like:

#ifdef _M_AMD64

On the other hand, if the same code could work on both x64 and Itanium, you might be better off with something like:

#ifdef _WIN64

A convention that I've found useful is that whenever I use one of these macros, I always explicitly create the #else cases so that I know early if I've forgotten something. Consider the following badly written code:

#ifdef _M_AMD64
// My x64 code here
#else
// My x86 code here
#endif

What happens if I now compile this for a third CPU architecture? My x86 code will unintentionally be compiled. A much better way to phrase the previous code is like this:

#ifdef _M_AMD64
// My x64 code here
#elif defined (_M_IX86)
// My x86 code here
#else
#error !!! Need to write code for this architecture
#endif
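The same explicit-#else discipline works for code that must also build with GCC or Clang. The sketch below assumes the compiler-defined macros __x86_64__, __i386__, and __aarch64__ (GCC/Clang) alongside _M_AMD64, _M_IX86, and _M_ARM64 (Visual C++); the ARM64 branch is an addition for illustration, and `target_arch` and `is_win64_build` are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Any architecture not listed stops the build instead of silently
// falling into the wrong branch.
const char* target_arch() {
#if defined(_M_AMD64) || defined(__x86_64__)
    return "x64";
#elif defined(_M_IX86) || defined(__i386__)
    return "x86";
#elif defined(_M_ARM64) || defined(__aarch64__)
    return "arm64";
#else
#error !!! Need to write code for this architecture
#endif
}

// _WIN64 answers a different question: "is this a 64-bit Windows build?"
constexpr bool is_win64_build() {
#ifdef _WIN64
    return true;
#else
    return false;
#endif
}
```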

The one part of my Win32 code that didn't port easily to x64 was my inline assembler, which Visual C++ doesn't support for the x64 target. Fear not, assembler heads. A 64-bit MASM (ML64.exe) is provided and is documented via MSDN. ML64.exe and other x64 tools (including CL.EXE and LINK.EXE) are available from the command line. You can just run the VCVARS64.BAT file, which adds them to your path.

Debugging

You've finally gotten your code to compile cleanly on Win32 and x64 builds. The final piece of the puzzle is running and debugging it. Regardless of whether you build your x64 version on an x64 box, you'll need to use the Visual Studio remote debugging features to debug in x64 mode. Luckily, if you're running the Visual Studio IDE on the 64-bit machine, the IDE takes care of all of the following steps for you. If for some reason you can't use remote debugging, your other option is to use the x64 version of WinDbg. However, you'll be giving up many of the debugging niceties found in the Visual Studio debugger.

If you've never used remote debugging, there's not much cause for concern. Once you get it set up, remote debugging can be as seamless as local debugging.

The first step is to install the 64-bit MSVSMON on the target machine. This is typically done by running the RdbgSetup program that comes with Visual Studio. Once MSVSMON is running, use the Tools menu to configure the appropriate security settings (or lack thereof) for the connection between your 32-bit Visual Studio and the MSVSMON instance.

Next, within Visual Studio you'll want to configure your project to use remote debugging for x64 code, rather than attempting local debugging. You can start this process by bringing up the project's properties (see Figure 9).

Figure 9 Debugging Properties

Make sure your 64-bit configuration is current and select Debugging under Configuration Properties. Near the top is a drop-down menu titled Debugger to launch. Normally this is set to Local Windows Debugger. Change this to Remote Windows Debugger. Below that, you can specify the remote command to execute when you start debugging (the program name, for example) as well as the remote machine name and connection type.

If you set up everything properly, you can start debugging your x64 target application in the same way you start Win32 apps. You'll know you've successfully connected to MSVSMON because its trace window emits a "connected" string each time the debugger successfully attaches. From here, it's mostly the same Visual Studio debugger that you know and love. Be sure to bring up the registers window and look at all those glorious 64-bit registers, and pop into the disassembly window to check out that oh-so-familiar-but-just-a-little-different x64 assembly code.

Note that a 64-bit minidump can't be loaded directly into Visual Studio the way 32-bit dumps can. Instead, you'll need to use remote debugging. Also, interop debugging between native and managed 64-bit code isn't currently supported in Visual Studio 2005.

What About Managed Code?

One of the great things about coding with the Microsoft .NET Framework is that much of the underlying operating system is abstracted away for general-purpose code. In addition, the IL instruction format is CPU agnostic. As a result, at a theoretical level, it should be possible for a .NET-based program binary built on a Win32 system to run unmodified on an x64 system. The reality is a little bit more complicated.

The .NET Framework 2.0 comes with an x64 version. After installing this on my x64 machine, I was able to run the same .NET executables that I'd previously run on my Win32 box. How cool is that? Of course, there's no guarantee that every single .NET-based program will run equally well on Win32 and x64 without a recompile, but it does "just work" a reasonable percentage of the time.

If your managed code explicitly invokes native code (for instance, through P/Invoke in C# or Visual Basic®), you will very likely run into trouble if you try to run against the 64-bit CLR. However, there is a compiler switch (/platform) that allows you to be more specific about which platform your code should run on. For instance, you might want your managed code to run in WOW64, even though a 64-bit CLR is available.

Wrap Up

All things considered, moving to an x64 version of Windows was a relatively painless experience for me. Once you have a good grasp of the relatively minor differences in the OS architecture and tools, it's easy to keep one code base running on both platforms. Visual Studio 2005 makes the effort substantially easier. And more x64-specific versions of device drivers and tools such as Process Explorer from SysInternals.com are appearing every day, so there's no reason not to jump in!

Matt Pietrek has co-written several books on Windows system-level programming as well as the Under the Hood column for MSDN Magazine. Previously he was a primary architect for the NuMega/Compuware BoundsChecker series of products. He now works on the Visual Studio team at Microsoft.