A Catalog of Local Windows Kernel-mode Backdoor Techniques

August, 2007

skape & Skywing
mmiller@hick.org & Skywing@valhallalegends.com

Abstract: This paper presents a detailed catalog of techniques that can be used to create local kernel-mode backdoors on Windows. These techniques include function trampolines, descriptor table hooks, model-specific register hooks, page table modifications, as well as others that have not previously been described. The majority of these techniques have been publicly known far in advance of this paper. However, at the time of this writing, there appears to be no detailed single point of reference for many of them. The intention of this paper is to provide a solid understanding of the subject of local kernel-mode backdoors. This understanding is necessary in order to encourage the thoughtful discussion of potential countermeasures and perceived advancements. In the vein of countermeasures, some additional thoughts are given to the common misconception that PatchGuard, in its current design, can be used to prevent kernel-mode rootkits.

1) Introduction

The classic separation of privileges between user-mode and kernel-mode has been a common feature included in most modern operating systems. This separation allows operating systems to make security guarantees relating to process isolation, kernel-user isolation, kernel-mode integrity, and so on. These security guarantees are needed in order to prevent a lesser privileged user-mode process from being able to take control of the system itself. A kernel-mode backdoor is one method of bypassing these security restrictions.

There are many different techniques that can be used to backdoor the kernel. For the purpose of this document, a backdoor will be considered to be something that provides access to resources that would otherwise normally be restricted by the kernel.
These resources might include executing code with kernel-mode privileges, accessing kernel-mode data, disabling security checks, and so on. To help further limit the scope of this document, the authors will focus strictly on techniques that can be used to provide local backdoors into the kernel on Windows. In this context, a local backdoor is a backdoor that does not rely on or make use of a network connection to provide access to resources. Instead, local backdoors can be viewed as ways of weakening the kernel in an effort to provide access to resources from non-privileged entities, such as user-mode processes.

The majority of the backdoor techniques discussed in this paper have been written about at length and in great detail in many different publications [20, 8, 12, 18, 19, 21, 25, 26]. The primary goal of this paper is to act as a point of reference for some of the common, as well as some of the not-so-common, local kernel-mode backdoor techniques. The authors have attempted to include objective measurements for each technique along with a description of how each technique works. As a part of defining these objective measurements, the authors have attempted to research the origins of some of the more well-known backdoor techniques. Since many of these techniques have been used for such a long time, the origins have proven somewhat challenging to uncover.

The structure of this paper is as follows. In section 2, each of the individual techniques that can be used to provide a local kernel-mode backdoor is discussed in detail. Section 3 provides a brief discussion of general strategies that might be employed to prevent some of the techniques that are discussed. Section 4 attempts to refute some of the common arguments against preventing kernel-mode backdoors in and of themselves.
Finally, section 5 attempts to clarify why Microsoft's PatchGuard should not be considered a security solution with respect to kernel-mode backdoors.

2) Techniques

To help properly catalog the techniques described in this section, the authors have attempted to include objective measurements of each technique. These measurements are broken down as follows:

- Category: The authors have chosen to adopt Joanna Rutkowska's malware categorization in the interest of pursuing a standardized classification [34]. This model describes four types of malware. Type 0 malware categorizes non-intrusive malware; Type I includes malware that modifies things that should otherwise never be modified (code segments, MSRs, etc.); Type II includes malware that modifies things that should be modified (global variables, other data); Type III is not within the scope of this document [33, 34].

  In addition to the four malware types described by Rutkowska, the authors propose Type IIa, which would categorize writable memory that should effectively be considered write-once in a given context. For example, when a global DPC is initialized, the DpcRoutine can be considered write-once. The authors consider this to be a derivative of Type II due to the fact that the memory remains writable and is less likely to be checked than that of Type I.

- Origin: If possible, the first known instance of the technique's use or some additional background on its origin is given.

- Capabilities: The capabilities the backdoor offers. This can be one or more of the following: kernel-mode code execution, access to kernel-mode data, access to restricted resources.
  If a technique allows kernel-mode code execution, then it implicitly has all other capabilities listed.

- Considerations: Any restrictions or special points that must be made about the use of a given technique.

- Covertness: A description of how easily the use of a given technique might be detected.

Since many of the techniques described in this document have been known for quite some time, the authors have taken a best-effort approach to identifying sources of the original ideas. In many cases, this has proved to be difficult or impossible. For this reason, the authors request that any inaccuracy in citation be reported so that it may be corrected in future releases of this paper.

2.1) Image Patches

Perhaps the most obvious approach that can be used to backdoor the kernel involves the modification of code segments used by the kernel itself. This could include modifying the code segments of kernel-mode images like ntoskrnl.exe, ndis.sys, ntfs.sys, and so on. By making modifications to these code segments, it is possible to hijack kernel-mode execution whenever a hooked function is invoked. The possibilities surrounding the modification of code segments are limited only by what the kernel itself is capable of doing.

2.1.1) Function Prologue Hooking

Function hooking is the process of intercepting calls to a given function by redirecting those calls to an alternative function. The concept of function hooking has been around for quite some time and it's unclear who originally presented the idea. There are a number of different libraries and papers that exist which help to facilitate the hooking of functions [21]. With respect to local kernel-mode backdoors, function hooking is an easy and reliable method of creating a backdoor. There are a few different ways in which functions can be hooked.
One of the most common techniques involves overwriting the prologue of the function to be hooked with an architecture-specific jump instruction that transfers control to an alternative function somewhere else in memory. This is the approach taken by Microsoft's Detours library. While prologue hooks are conceptually simple, there is actually quite a bit of code needed to implement them properly.

In order to implement a prologue hook in a portable and reliable manner, it is often necessary to make use of a disassembler that is able to determine the size, in bytes, of individual instructions. The reason for this is that in order to perform the prologue overwrite, the first few bytes of the function to be hooked must be overwritten by a control transfer instruction (typically a jump). On the Intel architecture, control transfer instructions can have one of three operands: a register, a relative offset, or a memory operand. Each operand type controls the size of the jump instruction that will be needed: 2 bytes (for example, jmp eax), 5 bytes (jmp rel32), and 6 bytes (jmp dword ptr [address]), respectively. The disassembler makes it possible to copy the first n instructions from the function's prologue prior to performing the overwrite. The value of n is determined by disassembling each instruction in the prologue until the number of bytes disassembled is greater than or equal to the number of bytes that will be overwritten when hooking the function.

The reason the first n instructions must be saved in their entirety is to make it possible for the original function to be called by the hook function. In order to call the original version of the function, a small stub of code must be generated that will execute the first n instructions of the function's original prologue followed by a jump to instruction n + 1 in the original function's body.
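
The copy-and-jump-back procedure described above can be sketched in user-mode C. This is a minimal illustration under stated assumptions, not Detours' implementation: insn_length is a hypothetical toy length decoder that only recognizes a few common 32-bit prologue instructions (a real hook needs a full disassembler, and copied instructions with relative operands would need their displacements fixed up), and addresses are plain integers describing where the bytes would live rather than real executable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy length decoder: knows only a few common 32-bit x86
 * prologue instructions and returns 0 for anything else. */
static size_t insn_length(const uint8_t *p)
{
    if (p[0] == 0x55)                                  /* push ebp       */
        return 1;
    if (p[0] == 0x6a)                                  /* push imm8      */
        return 2;
    if (p[0] == 0x8b && (p[1] == 0xff || p[1] == 0xec))
        return 2;                          /* mov edi,edi / mov ebp,esp  */
    if (p[0] == 0x83 && p[1] == 0xec)                  /* sub esp, imm8  */
        return 3;
    return 0;
}

/* Smallest number of whole-instruction bytes covering `needed` bytes;
 * 0 if an unknown instruction is reached first. */
static size_t prologue_copy_length(const uint8_t *code, size_t needed)
{
    size_t total = 0;
    while (total < needed) {
        size_t len = insn_length(code + total);
        if (len == 0)
            return 0;
        total += len;
    }
    return total;
}

/* Encode "jmp rel32" (opcode E9), located at `from`, that targets `to`.
 * The displacement is relative to the end of the 5-byte instruction. */
static void encode_jmp_rel32(uint8_t out[5], uint32_t from, uint32_t to)
{
    uint32_t rel = to - (from + 5);
    out[0] = 0xe9;
    out[1] = (uint8_t)rel;                 /* little-endian displacement */
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

/* Build the stub: the copied prologue bytes, then a jump back to the
 * instruction that follows them in the original function.  Returns the
 * stub size in bytes, or 0 if the prologue could not be decoded. */
static size_t build_trampoline(uint8_t *stub, uint32_t stub_addr,
                               const uint8_t *func, uint32_t func_addr)
{
    size_t n = prologue_copy_length(func, 5);  /* hook is a 5-byte jmp */
    if (n == 0)
        return 0;
    memcpy(stub, func, n);
    encode_jmp_rel32(stub + n, stub_addr + (uint32_t)n,
                     func_addr + (uint32_t)n);
    return n + 5;
}
```

For a prologue of mov edi, edi; push ebp; mov ebp, esp, the instruction lengths 2 + 1 + 2 reach the 5 bytes needed for a jmp rel32, so the stub is those three instructions followed by a jump to the fourth instruction of the original function.
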
This stub of code has the effect of allowing the original function to be called without it being diverted by the prologue overwrite. This method of implementing function prologue hooks is used extensively by Detours and other hooking libraries [21].

Recent versions of Windows, such as XP SP2 and Vista, include image files that come with a more elegant way of hooking a function with a function prologue overwrite. In fact, these images have been built with a compiler enhancement that was designed specifically to improve Microsoft's ability to hook its own functions during runtime. The enhancement involves creating functions with a two-byte no-op instruction, such as mov edi, edi, as the first instruction of a function's prologue. In addition to having this two-byte instruction, the compiler also places 5 single-byte no-op instructions immediately before the function itself. The two-byte no-op instruction provides the necessary storage for a two-byte relative short jump instruction to be placed on top of it. The relative short jump, in turn, can then transfer control into another relative jump instruction that has been placed in the 5 bytes that precede the function. The end result is a more deterministic way of hooking a function using a prologue overwrite that does not rely on a disassembler.

A common question is why a two-byte no-op instruction was used rather than two individual no-op instructions. The answer to this has two parts. First, a two-byte no-op instruction can be overwritten in an atomic fashion whereas other prologue overwrites, such as a 5-byte or 6-byte overwrite, cannot. The second part has to do with the fact that having a two-byte no-op instruction prevents race conditions associated with any thread executing code from within the set of bytes that are overwritten when the hook is installed.
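
The two-step patch described above can be modeled on a byte buffer. This is a minimal user-mode sketch, not Microsoft's hot-patching code: the function name and buffer layout are assumptions, and the atomicity requirement is only noted in a comment, since a plain memcpy in user-mode C does not guarantee a single store. The order matters: the long jump is written into the unused padding first, so the short jump only becomes reachable once its target is complete.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simulated hot-patch.  `image` models memory in which 5 padding bytes
 * precede a function whose first instruction is "mov edi, edi" (8B FF);
 * `func_off` is the function's offset in `image` and `func_addr` is the
 * address its first byte would have.  Returns 0 on success, -1 if the
 * layout is not hot-patchable (or has already been patched). */
static int hotpatch(uint8_t *image, size_t func_off, uint32_t func_addr,
                    uint32_t hook_addr)
{
    static const uint8_t short_jmp[2] = { 0xeb, 0xf9 };  /* jmp $-5 */
    uint8_t *func, *pad;
    uint32_t rel;

    if (func_off < 5)
        return -1;
    func = image + func_off;
    pad  = func - 5;
    if (func[0] != 0x8b || func[1] != 0xff)
        return -1;

    /* Step 1: a 5-byte "jmp rel32" in the padding.  No thread executes the
     * padding, so this write does not need to be atomic.  The jump ends at
     * (func_addr - 5) + 5 == func_addr, so rel is relative to func_addr. */
    rel = hook_addr - func_addr;
    pad[0] = 0xe9;
    pad[1] = (uint8_t)rel;
    pad[2] = (uint8_t)(rel >> 8);
    pad[3] = (uint8_t)(rel >> 16);
    pad[4] = (uint8_t)(rel >> 24);

    /* Step 2: replace "mov edi, edi" with a 2-byte short jump back to the
     * padding (EB F9: displacement -7 from the end of the jump).  In a real
     * kernel this must be one 16-bit store so no thread sees half of it;
     * memcpy here only models the resulting bytes. */
    memcpy(func, short_jmp, sizeof short_jmp);
    return 0;
}
```

Because the check on the first two bytes fails once EB F9 is in place, calling the sketch a second time on the same buffer reports the function as already patched instead of stacking hooks.
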
This race condition is common to any type of function prologue overwrite.

To better understand this race condition, consider what might happen if the prologue of a function had two single-byte no-op instructions. Prior to this function being hooked, a thread executes the first no-op instruction. In between the execution of this first no-op and the second no-op, the function in question is hooked in the context of a second thread and the first two bytes are overwritten with the opcodes associated with a relative short jump instruction, such as 0xeb and 0xf9. After the prologue overwrite occurs, the first thread begins executing what was originally the second no-op instruction. However, now that the function has been hooked, the no-op instruction may have been changed from 0x90 to 0xf9. This may have disastrous effects depending on the context that the hook is executed in. While this race condition may seem unlikely, it is nevertheless feasible and can therefore directly impact the reliability of any solution that uses prologue overwrites in order to hook functions.

Category: Type I

Origin: The concept of patching code has "existed since the dawn of digital computing" [21].

Capabilities: Kernel-mode code execution

Considerations: The reliability of a function prologue hook is directly related to the reliability of the disassembler used and the number of bytes that are overwritten in a function prologue. If the two-byte no-op instruction is not present, then it is unlikely that a function prologue overwrite will be able to be multiprocessor safe. Likewise, if a disassembler does not accurately count the size of instructions in relation to the actual processor, then the function prologue hook may fail, leading to an unexpected crash of the system.
One other point that is worth mentioning is that authors of hook functions must be careful not to inadvertently introduce instability issues into the operating system by failing to properly sanitize and check parameters to the function that is hooked. There have been many examples where legitimate software has gone the route of hooking functions without taking these considerations into account [38].

Covertness: At the time of this writing, the use of function prologue overwrites is not considered covert. It is trivial for tools, such as Joanna Rutkowska's System Virginity Verifier [32], to compare the in-memory version of system images with the on-disk versions in an effort to detect in-memory alterations. The Windows Debugger (windbg) will also make an analyst aware of differences between in-memory code segments and their on-disk counterparts.

2.1.2) Disabling SeAccessCheck

In Phrack 55, Greg Hoglund described the benefits of patching nt!SeAccessCheck so that it never returns access denied [19]. This has the effect of causing access checks on securable objects to always grant access, regardless of whether or not the access would normally be granted. As a result, non-privileged users can directly access otherwise privileged resources. This simple modification does not directly make it possible to execute privileged code, but it does indirectly facilitate it by allowing non-privileged users to interact with and modify system processes.

Category: Type I

Origin: Greg Hoglund was the first person to publicly identify this technique in September, 1999 [19].

Capabilities: Access to restricted resources.

Covertness: Like function prologue overwrites, the nt!SeAccessCheck patch can be detected through differences between the mapped image of ntoskrnl.exe and the on-disk version.

2.2) Descriptor Tables

The x86 architecture has a number of different descriptor tables that are used by the processor to handle things like memory management (GDT), interrupt dispatching (IDT), and so on.
In addition to processor-level descriptor tables, the Windows operating system itself also includes a number of distinct software-level descriptor tables, such as the SSDT. The majority of these descriptor tables are heavily relied upon by the operating system and therefore represent a tantalizing target for use in backdoors. Like the function hooking technique described in section 2.1.1, all of the techniques presented in this subsection have been known about for a significant amount of time. The authors have attempted, when possible, to identify the origins of each technique.

2.2.1) IDT

The Interrupt Descriptor Table (IDT) is a processor-relative structure that is used when dispatching interrupts. Interrupts are used by the processor as a means of interrupting program execution in order to handle an event. Interrupts can occur as a result of a signal from hardware or as a result of software asserting an interrupt through the int instruction [23]. The IDT contains 256 descriptors that are associated with the 256 interrupt vectors supported by the processor. Each IDT descriptor can be one of three types of gate descriptors (task, interrupt, trap) which are used to describe where and how control should be transferred when an interrupt for a particular vector occurs. The base address and limit of the IDT are stored in the idtr register, which is populated through the lidt instruction. The current base address and limit of the idtr can be read using the sidt instruction.

The concept of an IDT hook has most likely been around since the origin of the concept of interrupt handling. In most cases, an IDT hook works by redirecting the procedure entry point for a given IDT descriptor to an alternative location. Conceptually, this is the same process involved in hooking any function pointer (which is described in more detail in section 2.5).
The difference comes as a result of the specific code necessary to hook an IDT descriptor.

On the x86 processor, each IDT descriptor is an eight-byte data structure. IDT descriptors that are either an interrupt gate or trap gate descriptor contain the procedure entry point and code segment selector to be used when the descriptor's associated interrupt vector is asserted. In addition to containing control transfer information, each IDT descriptor also contains additional flags that further control what actions are taken. The Windows kernel describes IDT descriptors using the following structure:

```
kd> dt _KIDTENTRY
   +0x000 Offset           : Uint2B
   +0x002 Selector         : Uint2B
   +0x004 Access           : Uint2B
   +0x006 ExtendedOffset   : Uint2B
```

In the above data structure, the Offset field holds the low 16 bits of the procedure entry point and the ExtendedOffset field holds the high 16 bits. Using this knowledge, an IDT descriptor could be hooked by redirecting the procedure entry point to an alternate function.
The following code illustrates how this can be accomplished on 32-bit x86 (it relies on MSVC inline assembly and compiler intrinsics):

```
#include <ntddk.h>
#include <intrin.h>

#pragma pack(push, 1)
typedef struct _IDT_DESCRIPTOR   // mirrors nt!_KIDTENTRY
{
  USHORT OffsetLow;              // Offset: low 16 bits of the handler
  USHORT Selector;
  USHORT Access;
  USHORT OffsetHigh;             // ExtendedOffset: high 16 bits
} IDT_DESCRIPTOR, *PIDT_DESCRIPTOR;

typedef struct _IDT              // 6-byte operand of sidt/lidt
{
  USHORT          Limit;
  PIDT_DESCRIPTOR Descriptors;
} IDT, *PIDT;
#pragma pack(pop)

// Hooks one vector in the IDT of the current processor only.
static NTSTATUS HookIdtEntry(
  IN UCHAR DescriptorIndex,
  IN ULONG_PTR NewHandler,
  OUT PULONG_PTR OriginalHandler OPTIONAL)
{
  PIDT_DESCRIPTOR Descriptor = NULL;
  IDT             Idt;
  ULONG_PTR       Flags;

  __asm sidt [Idt]

  Descriptor = &Idt.Descriptors[DescriptorIndex];

  if (OriginalHandler)
    *OriginalHandler =
      (ULONG_PTR)Descriptor->OffsetLow |
      ((ULONG_PTR)Descriptor->OffsetHigh << 16);

  // Disable interrupts so the two halves are never used half-updated.
  Flags = __readeflags();
  _disable();
  Descriptor->OffsetLow  =
    (USHORT)(NewHandler & 0xffff);
  Descriptor->OffsetHigh =
    (USHORT)((NewHandler >> 16) & 0xffff);
  __writeeflags(Flags);

  // The table was modified in place, so idtr need not be reloaded.
  return STATUS_SUCCESS;
}
```

In addition to hooking an individual IDT descriptor, the entire IDT can be hooked by creating a new table and then setting its information using the lidt instruction.

Category: Type I, although some portions of the IDT may be legitimately hooked.

Origin: The IDT hook has its origins in Interrupt Vector Table (IVT) hooks. In October, 1999, Prasad Dabak et al. wrote about IVT hooks [31]. Sadly, they also seemingly failed to cite their sources. IVT hooks certainly existed prior to 1999. The oldest virus citation the authors could find was from 1994, but DOS was released in 1981 and it is likely the first IVT hooks were seen shortly thereafter. A patent that was filed in December, 1985 entitled "Dual operating system computer" talks about IVT "relocation" in a manner that suggests IVT hooking of some form.

Capabilities: Kernel-mode code execution.

Covertness: Detection of IDT hooks is often trivial and is a common practice for rootkit detection tools [32].

2.2.2) GDT / LDT

The Global Descriptor Table (GDT) and Local Descriptor Table (LDT) are used to store segment descriptors that describe a view of a system's address space. Each processor has its own GDT.
Segment descriptors include the base address, limit, privilege information, and other flags that are used by the processor when translating a logical address (seg:offset) to a linear address. Segment selectors are integers that are used to indirectly reference individual segment descriptors based on their offset into a given descriptor table. Software makes use of segment selectors through segment registers, such as CS, DS, ES, and so on. More detail about the behavior of segmentation can be found in the x86 and x64 system programming manuals [1].

In Phrack 55, Greg Hoglund described the potential for abusing conforming code segments [19]. A conforming code segment, as opposed to a non-conforming code segment, permits control transfers where CPL is numerically greater than DPL. However, the CPL is not altered as a result of this type of control transfer. As such, effective privileges of the caller are not changed. For this reason, it's unclear how this could be used to access kernel-mode memory, since page protections would still prevent lesser privileged callers from accessing kernel-mode pages when paging is enabled.

Derek Soeder identified an awesome flaw in 2003 that allowed a user-mode process to create an expand-down segment descriptor in the calling process's LDT [40]. An expand-down segment descriptor inverts the meaning of the limit associated with a segment descriptor: valid offsets are those above the limit (up to 0xFFFF or 0xFFFFFFFF, depending on the segment's B flag) rather than those at or below it. In effect, the limit describes the lower bound of the segment rather than its upper bound. This is useful because when kernel-mode routines validate addresses passed in from user-mode, they assume flat segments that start at base address zero. This is the same thing as assuming that a logical address is equivalent to a linear address.
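
To make the flat-segment assumption concrete, the following user-mode C model decodes a segment descriptor, applies the processor's limit check, and forms the linear address as base plus offset. It is a simplified sketch (one-byte accesses, 32-bit segments only), the helper names are hypothetical, and ASSUMED_USER_PROBE_ADDRESS stands in for whatever bound a real kernel routine compares against; it is not Windows' actual validation code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed default value of MmUserProbeAddress on 32-bit Windows with a
 * 2 GB user address space; used only to model a base-zero range check. */
#define ASSUMED_USER_PROBE_ADDRESS 0x7fff0000u

typedef struct {
    uint32_t base;
    uint32_t limit;       /* effective limit, after applying G */
    int      expand_down; /* data segment with the E bit set   */
    int      big;         /* B flag: upper bound 0xffffffff    */
} segment;

/* Decode an 8-byte x86 segment descriptor (little-endian byte order). */
static segment decode_descriptor(const uint8_t d[8])
{
    segment s;
    uint32_t raw_limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8) |
                         ((uint32_t)(d[6] & 0x0f) << 16);

    s.base  = (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
              ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    s.limit = (d[6] & 0x80) ? ((raw_limit << 12) | 0xfff) : raw_limit;
    /* S=1 (code/data), type bit 3 clear (data), type bit 2 = expand-down */
    s.expand_down = (d[5] & 0x10) && !(d[5] & 0x08) && (d[5] & 0x04);
    s.big   = (d[6] & 0x40) != 0;
    return s;
}

/* Processor limit check for a one-byte access at `offset`. */
static int segment_check(const segment *s, uint32_t offset)
{
    if (s->expand_down) {
        uint32_t upper = s->big ? 0xffffffffu : 0xffffu;
        return offset > s->limit && offset <= upper;
    }
    return offset <= s->limit;
}

/* Logical-to-linear translation: base + offset, wrapping at 4 GB. */
static uint32_t linear_address(const segment *s, uint32_t offset)
{
    return s->base + offset;
}

/* A validation that assumes a flat segment (base 0), i.e. that the
 * offset it is handed is already the linear address. */
static int probe_assumes_flat(uint32_t offset)
{
    return offset < ASSUMED_USER_PROBE_ADDRESS;
}
```

In this model, an expand-down data segment with base 0x10000000 and limit 0xfff accepts offset 0x70001000, which passes a base-zero check against 0x7fff0000 but translates to linear address 0x80001000.
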
However, when expand-down segment descriptors are used, the linear address will reference a memory location that can be in stark contrast to the address that's being validated by kernel-mode. In order to exploit this condition to escalate privileges, all that's necessary is to identify a system service in kernel-mode that will run with escalated privileges and make use of segment selectors provided by user-mode without properly validating them. Derek gives an example of a MOVS instruction in the int 0x2e handler. This trick can be abused in the context of a local kernel-mode backdoor to provide a way for user-mode code to be able to read and write kernel-mode memory.

In addition to abusing specific flaws in the way memory can be referenced through the GDT and LDT, it's also possible to define custom gate descriptors that would make it possible to call code in kernel-mode from user-mode [23]. One particularly useful type of gate descriptor, at least in the context of a backdoor, is a call gate descriptor. The purpose of a call gate is to allow lesser privileged code to call more privileged code in a secure fashion [45]. To abuse this, a backdoor can simply define its own call gate descriptor and then make use of it to run code in the context of the kernel.

Category: Type IIa, with the exception of the LDT. The LDT may be better classified as Type II considering it exposes an API to user-mode that allows the creation of custom LDT entries (NtSetLdtEntries).

Origin: It's unclear if there were some situational requirements that would be needed in order to abuse the issue described by Greg Hoglund. The flaw identified by Derek Soeder in 2003 was an example of a recurrence of an issue that was found in older versions of other operating systems, such as Linux. For example, a mailing list post made by Morten Welinder to LKML in 1996 describes a fix for what appears to be the same type of issue that was identified by Derek [44].
Creating a custom gate descriptor for use in the context of a backdoor has been done in the past. Greg Hoglund described the use of call gates in the context of a rootkit in 1999 [19].

Capabilities: In the case of the expand-down segment descriptor, access to kernel-mode data is possible. This can also indirectly lead to kernel-mode code execution, but it would rely on another backdoor technique. If a gate descriptor is abused, direct kernel-mode code execution is possible.

Covertness: It is entirely possible to write code that will detect the addition or alteration of entries in the GDT or each individual process LDT. For example, PatchGuard will currently detect alterations to the GDT.

2.2.3) SSDT

The System Service Descriptor Table (SSDT) is used by the Windows kernel when dispatching system calls. The SSDT itself is exported in kernel-mode through the nt!KeServiceDescriptorTable global variable. This variable contains information relating to system call tables that have been registered with the operating system. In contrast to other operating systems, the Windows kernel supports the dynamic registration (nt!KeAddSystemServiceTable) of new system call tables at runtime. The two most common system call tables are those used for native and GDI system calls.

In the context of a local kernel-mode backdoor, system calls represent an obvious target due to the fact that they are implicitly tied to the privilege boundary that exists between user-mode and kernel-mode. The act of hooking a system call handler in kernel-mode makes it possible to expose a privileged backdoor into the kernel using the operating system's well-defined system call interface. Furthermore, hooking system calls makes it possible for the backdoor to alter data that is seen by user-mode and thus potentially hide its presence to some degree.

In practice, system calls can be hooked on Windows using two distinct strategies. The first strategy involves using generic function hooking techniques, which are described in section 2.1.1.
The second strategy involves using the function pointer hooking technique, which is described in section 2.5. Function pointer hooking simply involves altering the function pointer associated with a specific system call index by accessing the system call table that contains the system call to be hooked.

The following code shows a very simple illustration of how one might go about hooking a system call in the native system call table on 32-bit versions of Windows (system call hooking on 64-bit versions of Windows would require PatchGuard to be disabled):

```
#include <ntddk.h>

// Undocumented 32-bit layout of one service table descriptor; the type
// and field names are conventional, not taken from an official header.
typedef struct _SERVICE_TABLE_DESCRIPTOR
{
  PVOID  *ServiceTableBase;     // array of system call handlers
  PULONG  CounterTableBase;
  ULONG   NumberOfServices;
  PUCHAR  ArgumentTable;
} SERVICE_TABLE_DESCRIPTOR;

// Exported by ntoskrnl.exe; its first entry describes the native table.
__declspec(dllimport) SERVICE_TABLE_DESCRIPTOR KeServiceDescriptorTable;

PVOID HookSystemCall(
  PVOID SystemCallFunction,     // an nt!Zw* stub beginning "mov eax, imm32"
  PVOID HookFunction)
{
  // The system call index is the immediate operand of the stub's mov eax.
  ULONG SystemCallIndex =
    *(ULONG *)((PCHAR)SystemCallFunction+1);
  PVOID *NativeSystemCallTable =
    KeServiceDescriptorTable.ServiceTableBase;

  // Swap the entry atomically; faults if the table is mapped read-only.
  return InterlockedExchangePointer(
    &NativeSystemCallTable[SystemCallIndex], HookFunction);
}
```

Category: Type I if a prologue hook is used. Type IIa if the function pointer hook is used. The SSDT (both native and GDI) should effectively be considered write-once.

Origin: System call hooking has been used extensively for quite some time. Since this technique has become so well-known, its actual origins are unclear. The earliest description the authors could find was from M. B. Jones in a paper from 1993 entitled "Interposition agents: Transparently interposing user code at the system interface" [27]. Jones explains in his section on related work that he was unable to find any explicit prior research on the subject of agent-based interposition. However, it seems clear that system calls were being hooked in an ad-hoc fashion far in advance of this point.
The authors were unable to find many of the papers cited by Jones. Plaguez appears to be one of the first (January, 1998) to publicly illustrate the usefulness of system call hooking in Linux with a specific eye toward security, in Phrack 52 [30].

Capabilities: Kernel-mode code execution.

Considerations: On certain versions of Windows XP, the SSDT is marked as read-only. This must be taken into account when attempting to write to the SSDT across multiple versions of Windows.

Covertness: System call hooks on Windows are very easy to detect. Comparing the in-memory SSDTs with the on-disk versions is one of the most common strategies employed.

2.3) Model-specific Registers

Intel processors support a special category of processor-specific registers known as Model-specific Registers (MSRs). MSRs provide software with the ability to control various hardware and software features. Unlike other registers, MSRs are tied to a specific processor model and are not guaranteed to be supported in future versions of a processor line. Some of the features that MSRs offer include enhanced performance monitoring and debugging, among other things. Software can read MSRs using the rdmsr instruction and write MSRs using the wrmsr instruction [23].

This subsection will describe some of the MSRs that may be useful in the context of a local kernel-mode backdoor.

2.3.1) IA32_SYSENTER_EIP

The Pentium II introduced enhanced support for transitioning between user-mode and kernel-mode. This support was provided through the introduction of two new instructions: sysenter and sysexit. AMD processors also introduced new instructions, syscall and sysret, to provide this feature. When a user-mode application wishes to transition to kernel-mode, it issues the sysenter instruction.
When the kernel is ready to return to user-mode, it issues the sysexit instruction. Unlike the call instruction, the sysenter instruction takes no operands. Instead, this instruction uses three specific MSRs that are initialized by the operating system as the target for control transfers [23].

The IA32_SYSENTER_CS (0x174) MSR is used by the processor to set the kernel-mode CS. The IA32_SYSENTER_EIP (0x176) MSR contains the virtual address of the kernel-mode entry point that code should begin executing at once the transition has completed. The third MSR, IA32_SYSENTER_ESP (0x175), contains the virtual address that the stack pointer should be set to. Of these three MSRs, IA32_SYSENTER_EIP is the most interesting in terms of its potential for use in the context of a backdoor. Setting this MSR to the address of a function controlled by the backdoor makes it possible for the backdoor to intercept all system calls made through sysenter after they have trapped into kernel-mode. This provides a very powerful vantage point.

For more information on the behavior of the sysenter and sysexit instructions, the reader should consult both the Intel manuals and John Gulbrandsen's article [23, 15].

Category: Type I

Origin: This feature is provided for the explicit purpose of allowing an operating system to control the behavior of the sysenter instruction. As such, it is only logical that it can also be applied in the context of a backdoor. Kimmo Kasslin mentions a virus from December, 2005 that made use of MSR hooks [25]. Earlier that year, in February, fuzenop from rootkit.com released a proof of concept [12].

Capabilities: Kernel-mode code execution

Considerations: This technique is restricted by the fact that not all processors support this MSR. Furthermore, user-mode processes are not necessarily required to use it in order to transition into kernel-mode when performing a system call.
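
The first of these restrictions can be checked in software. The sketch below, with a hypothetical function name, decodes EAX and EDX values that a caller has already obtained from CPUID leaf 1 (executing CPUID itself is left out to keep the example portable). The family/model/stepping exception follows the check described in Intel's manuals; this is an illustration, not code taken from any particular backdoor or detector, and it does not address the second restriction.

```c
#include <assert.h>
#include <stdint.h>

/* Decide whether sysenter/sysexit and the IA32_SYSENTER_* MSRs can be
 * relied upon, given the EAX and EDX values returned by CPUID leaf 1.
 * SEP is EDX bit 11; early Pentium Pro parts (family 6, model < 3,
 * stepping < 3) set SEP without actually supporting the instructions.
 * Extended family/model fields are ignored, which is enough here. */
static int sysenter_supported(uint32_t cpuid1_eax, uint32_t cpuid1_edx)
{
    uint32_t stepping = cpuid1_eax & 0xf;
    uint32_t model    = (cpuid1_eax >> 4) & 0xf;
    uint32_t family   = (cpuid1_eax >> 8) & 0xf;

    if (!(cpuid1_edx & (1u << 11)))
        return 0;                    /* SEP flag clear */
    if (family == 6 && model < 3 && stepping < 3)
        return 0;                    /* SEP reported, but not usable */
    return 1;
}
```
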
These facts limit the effectiveness of this technique, as it is not guaranteed to work on all machines.

Covertness: Changing the value of the IA32_SYSENTER_EIP MSR can be detected. For example, PatchGuard currently checks to see if the equivalent AMD64 MSR has been modified as a part of its polling checks [36]. It is more difficult for third-party vendors to perform this check due to the simple fact that the default value for this MSR is an unexported symbol named nt!KiFastCallEntry:

```
kd> rdmsr 176
msr[176] = 00000000`804de6f0
kd> u 00000000`804de6f0
nt!KiFastCallEntry:
804de6f0 b923000000      mov     ecx,23h
```

Without symbols, third parties have a more difficult time distinguishing between a value that is sane and one that is not.

2.4) Page Table Entries

When operating in protected mode, x86 processors support virtualizing the address space through the use of a feature known as paging. The paging feature makes it possible to virtualize the address space by adding a translation layer between linear addresses and physical addresses. When paging is not enabled, linear addresses are equivalent to physical addresses. To translate addresses, the processor uses portions of the address being referenced to index directories and tables that convey flags and physical address information that describe how the translation should be performed. The majority of the details on how this translation is performed are outside of the scope of this document. If necessary, the reader should consult section 3.7 of the Intel System Programming Manual [23]. Many other papers in the references also discuss this topic [41].

The paging system is particularly interesting due to its potential for abuse in the context of a backdoor. When the processor attempts to translate a linear address, it walks a number of page tables to determine the associated physical address. When this occurs, the processor makes a check to ensure that the task referencing the address has sufficient rights to do so.
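
For the non-PAE case with 4 KB pages, the walk and the rights check just described can be modeled in a few lines of C. This is a simplified, hypothetical sketch rather than the processor's complete algorithm: PAE, large pages, and supervisor-mode rules are left out. The flag it tests for user access is the U/S bit, bit 2 of both the PDE and the PTE.

```c
#include <assert.h>
#include <stdint.h>

#define PG_PRESENT 0x001u   /* bit 0: P   */
#define PG_WRITE   0x002u   /* bit 1: R/W */
#define PG_USER    0x004u   /* bit 2: U/S */

/* Address split used by non-PAE 32-bit paging with 4 KB pages. */
static uint32_t pde_index(uint32_t va)   { return va >> 22; }
static uint32_t pte_index(uint32_t va)   { return (va >> 12) & 0x3ffu; }
static uint32_t page_offset(uint32_t va) { return va & 0xfffu; }

/* Simplified check for a user-mode (CPL 3) access to a 4 KB page: the PDE
 * and the PTE must both be present with U/S set, and writes additionally
 * need R/W set at both levels. */
static int user_access_allowed(uint32_t pde, uint32_t pte, int is_write)
{
    uint32_t need = PG_PRESENT | PG_USER | (is_write ? PG_WRITE : 0u);
    return (pde & need) == need && (pte & need) == need;
}
```

For example, a kernel page whose PTE is 0x063 (present, writable, accessed, dirty, U/S clear) is denied to user-mode even when its PDE is 0x067; setting bit 2 in the PTE, giving 0x067, is enough for the model to allow both reads and writes.
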
The access check is enforced by checking the User/Supervisor (U/S) bit of the Page-Directory Entry (PDE) and Page-Table Entry (PTE) associated with the page. If this bit is clear, only the supervisor (privilege level 0) is allowed to access the page. If the bit is set, both supervisor and user are allowed to access the page (this isn't always the case, depending on whether or not the WP bit is set in CR0).

The implications surrounding this flag should be obvious. By toggling the flag in the PDE and PTE associated with an address, a backdoor can gain the ability to read or write kernel-mode memory from user-mode. This would indirectly make it possible to gain code execution by making use of one of the other techniques listed in this document.

Category: Type II

Origin: The modification of PDE and PTE entries has been supported since the inception of hardware paging. The authors were not able to find an exact source for the first use of this technique in a backdoor. There have been a number of examples in recent years of tools that abuse the supervisor bit in one way or another[29, 41]. The PaX team provided the first documentation of their PAGEEXEC code in March 2003. In January 1998, Mythrandir mentions the supervisor bit in Phrack 52 but does not explicitly call out how it could be abused[28].

Capabilities: Access to kernel-mode data.

Considerations: Code that attempts to implement this approach would need to properly support PAE and non-PAE processors on x86 in order to work reliably. This approach is also extremely dangerous and potentially unreliable depending on how it interacts with the memory manager. For example, if pages are not properly locked into physical memory, they may be pruned, and thus any PDE or PTE modifications would be lost.
This would result in the user-mode process losing access to a specific page.

Covertness: This approach could be considered fairly covert in the absence of a tool capable of intercepting PDE or PTE modifications. Locking pages into physical memory may make it easier to detect in a polling fashion by walking the set of locked pages and checking whether their associated PDE or PTE has been made accessible to user-mode.

2.5) Function Pointers

Function pointers, which indirectly transfer control of execution from one location to another, are used extensively by the Windows kernel[18]. Like the function prologue overwrite described earlier in this paper, hooking a function by altering a function pointer is an easy way to intercept future calls to a given function. The difference, however, is that hooking a function by altering a function pointer will only intercept indirect calls made to the hooked function through that function pointer. Though this may seem like a fairly significant limitation, it does not drastically limit the set of function pointers that can be abused to provide a kernel-mode backdoor.

The concept itself should be simple enough. All that's necessary is to modify the contents of a given function pointer to point at untrusted code. When the function is invoked through the function pointer, the untrusted code is executed instead. If the untrusted code wishes to be able to call the function that is being hooked, it can save the address that is stored in the function pointer prior to overwriting it. When possible, hooking a function through a function pointer is a simple and elegant solution that should have very little impact on the stability of the system (with the obvious exception of the quality of the replacement function).

Regardless of what approach is taken to hook a function, an obvious question is where the backdoor code associated with a given hook function should be placed.
There are really only two general memory locations in which the code can be stored. It can either be stored in user-mode, which would generally make it specific to a given process, or in kernel-mode, which would make it visible system-wide. Deciding which of the two locations to use is a matter of determining the contextual restrictions of the function pointer being leveraged. For example, if the function pointer is called through at a raised IRQL, such as DISPATCH_LEVEL, then it is not possible to store the hook function's code in pageable memory. Another example of a restriction is the process context in which the function pointer is used. If a function pointer may be called through in any process context, then there are only a finite number of locations in user-mode where the code could be placed. It's important to understand some of the specific locations in which code may be stored.

Perhaps the most obvious locations for code that is to execute in kernel-mode are the kernel pools, such as the PagedPool and NonPagedPool, which are used to store dynamically allocated memory. In some circumstances, it may also be possible to store code in regions of memory that contain code or data associated with device drivers. While these few examples illustrate that there is certainly no shortage of locations in which to store code, there are a few locations in particular that are worth calling out.

One such location is composed of a single physical page that is shared between user-mode and kernel-mode. This physical page is known as SharedUserData, and it is mapped into user-mode as read-only and into kernel-mode as read-write. The virtual address at which this physical page is mapped is static in both user-mode (0x7ffe0000) and kernel-mode (0xffdf0000) on all x86 versions of Windows NT+ (as of Windows XP SP2, these virtual mappings are no longer executable; however, it is entirely possible for a backdoor to alter these page permissions).
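Both views refer to the same physical page at fixed addresses. Translating a location in the kernel-mode view into its user-mode counterpart is therefore simple arithmetic. The sketch below assumes the 32-bit x86 addresses given above and a 4 KB page. The helper names are hypothetical, and the code only computes addresses without touching memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fixed 32-bit x86 mappings of the single SharedUserData page. */
#define SHARED_USER_DATA_USER   0x7FFE0000u /* read-only user-mode view */
#define SHARED_USER_DATA_KERNEL 0xFFDF0000u /* read-write kernel-mode view */
#define PAGE_SIZE_X86           0x1000u

/* True if va lies inside the kernel-mode view.  Unsigned wraparound makes
   addresses below the base produce a huge difference, so one compare works. */
static bool in_shared_kernel_view(uint32_t va)
{
    return (uint32_t)(va - SHARED_USER_DATA_KERNEL) < PAGE_SIZE_X86;
}

/* Same physical page, so the same offset is visible in the user-mode view.
   Returns 0 for addresses outside the kernel-mode view. */
static uint32_t user_view_of(uint32_t kernel_va)
{
    if (!in_shared_kernel_view(kernel_va))
        return 0;
    return SHARED_USER_DATA_USER + (kernel_va - SHARED_USER_DATA_KERNEL);
}
```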
There is also plenty of unused memory within the page that is allocated for SharedUserData. The fact that the mapping address is static makes it a useful location to store small amounts of code without needing to allocate additional storage from the paged or non-paged pool[24].

Though the SharedUserData mapping is quite useful, there is actually an alternative location that can be used to store code that is arguably more covert. This approach involves overwriting a function pointer with the address of some code from the virtual mapping of the native DLL, ntdll.dll. The native DLL is special in that it is the only DLL that is guaranteed to be mapped into the context of every process, including the System process. It is also mapped at the same base address in every process due to assumptions made by the Windows kernel. While these are useful qualities, the best reason for using the ntdll.dll mapping to store code is that doing so makes it possible to store code in a process-relative fashion. Understanding how this works in practice requires some additional explanation.

The native DLL, ntdll.dll, is mapped into the address space of the System process and subsequent processes during kernel and process initialization, respectively. This mapping is performed in kernel-mode by nt!PspMapSystemDll. One can observe the presence of this mapping in the context of the System process through a debugger, as shown below. The same basic steps can be taken to confirm that ntdll.dll is mapped into other processes as well (the !vad command dumps the virtual address descriptors (VADs) for a given process, which describe the memory regions within that process):

  kd> !process 0 0 System
  PROCESS 81291660  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
      DirBase: 00039000  ObjectTable: e1000a68  HandleCount: 256.
      Image: System

  kd> !process 81291660
  PROCESS 81291660  SessionId: none  Cid: 0004    Peb: 00000000  ParentCid: 0000
      DirBase: 00039000  ObjectTable: e1000a68  HandleCount: 256.
      Image: System
      VadRoot 8128f288 Vads 4
  ...

  kd> !vad 8128f288
  VAD     level start end   commit
  ...
  81207d98 ( 1) 7c900 7c9af 5 Mapped  Exe

  kd> dS poi(poi(81207d98+0x18)+0x24)+0x30
  e13591a8  "\WINDOWS\system32\ntdll.dll"

To make use of the ntdll.dll mapping as a location in which to store code, one must understand the implications of altering the contents of the mapping itself. Like all other image mappings, the code pages associated with ntdll.dll are marked as Copy-on-Write (COW) and are initially shared between all processes. When data is written to a page that has been marked with COW, the kernel allocates a new physical page and copies the contents of the shared page into the newly allocated page. This new physical page is then associated with the virtual page that is being written to. Any changes made to the new page are observed only within the context of the process that is making them. This behavior is why altering the contents of a mapping associated with an image file does not lead to changes appearing in all process contexts.

Based on the ability to make process-relative changes to the ntdll.dll mapping, one is able to store code that will only be used when a function pointer is called through in the context of a specific process. When the pointer is called through in any other process context, whatever code exists in the default mapping of ntdll.dll will be executed. To better understand how this may work, it makes sense to walk through a concrete example.

In this example, a rootkit has opted to create a backdoor by overwriting the function pointer that is used when dispatching IRPs with the IRP_MJ_FLUSH_BUFFERS major function for a specific device object.
The prototype for the function that handles IRP_MJ_FLUSH_BUFFERS IRPs is shown below:

  NTSTATUS DispatchFlushBuffers(
      IN PDEVICE_OBJECT DeviceObject,
      IN PIRP Irp);

To create a context-specific backdoor, the rootkit has chosen to overwrite the function pointer described above with an address that resides within ntdll.dll. By default, the rootkit wants all processes except those that are aware of the backdoor to simply have a no-operation occur when IRP_MJ_FLUSH_BUFFERS is sent to the device object. For processes that are aware of the backdoor, the rootkit wants arbitrary code execution to occur in kernel-mode. To accomplish this, the function pointer should be overwritten with an address in ntdll.dll that contains a ret 0x8 instruction. This will simply cause invocations of IRP_MJ_FLUSH_BUFFERS to return (without completing the IRP). The location of this ret 0x8 should be in a portion of code that is rarely executed in user-mode. For processes that wish to execute arbitrary code in kernel-mode, it's as simple as altering the code that exists at the address of the ret 0x8 instruction. After altering the code, the process only needs to issue an IRP_MJ_FLUSH_BUFFERS through the FlushFileBuffers function on the affected device object. The context-dependent execution of code is made possible by the fact that, in most cases, IRPs are processed in the context of the requesting process.

The remainder of this subsection will describe specific function pointers that may be useful targets for use as backdoors. The authors have tried to cover some of the more intriguing examples of function pointers that may be hooked. Still, it goes without saying that there are many more that have not been explicitly described.
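The copy-on-write behavior that underpins the per-process divergence in the ntdll.dll example above can be modeled in a few lines of portable C. The model is a hypothetical, user-mode simplification. A page is a 4 KB byte array shared by every process, and each process accesses it through its own view. The first write through a view stands in for the write fault, in which the memory manager allocates a new physical page and copies the shared contents into it.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BYTES 4096

/* One process's view of a shared, copy-on-write image page. */
struct cow_view {
    const unsigned char *shared; /* page shared by every process */
    unsigned char *private_copy; /* NULL until this process writes */
};

/* Reads see the private copy if one exists, otherwise the shared page. */
static unsigned char cow_read(const struct cow_view *v, size_t off)
{
    return v->private_copy ? v->private_copy[off] : v->shared[off];
}

/* The first write "faults": allocate a private page and copy the shared
   contents into it.  Later writes touch only the private copy.
   Returns false if the private page could not be allocated. */
static bool cow_write(struct cow_view *v, size_t off, unsigned char value)
{
    if (v->private_copy == NULL) {
        v->private_copy = malloc(PAGE_BYTES);
        if (v->private_copy == NULL)
            return false;
        memcpy(v->private_copy, v->shared, PAGE_BYTES);
    }
    v->private_copy[off] = value;
    return true;
}

/* Drops the private copy so the view falls back to the shared page. */
static void cow_release(struct cow_view *v)
{
    free(v->private_copy);
    v->private_copy = NULL;
}
```

After one view writes, every other view continues to read the original shared bytes. The ntdll.dll example relies on exactly this property.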
The authors would be interested to hear about additional function pointers that have unique and useful properties in the context of a local kernel-mode backdoor.

2.5.1) Import Address Table

The Import Address Table (IAT) of a PE image is used to store the absolute virtual addresses of functions that are imported from external PE images[35]. When a PE image is mapped into virtual memory, the dynamic loader (in kernel-mode, this is ntoskrnl) takes care of populating the contents of the PE image's IAT based on the actual virtual address locations of dependent functions (for the sake of simplicity, bound imports are excluded from this explanation). The compiler, in turn, generates code that uses an indirect call instruction to invoke imported functions. Each imported function has a function pointer slot in the IAT. In this fashion, PE images do not need to have any preconceived knowledge of where dependent PE images are going to be mapped in virtual memory. Instead, this knowledge can be postponed until a runtime determination is made.

The fundamental step involved in hooking an IAT entry really just boils down to changing a function pointer. What distinguishes an IAT hook from other types of function pointer hooks is the context in which the overwritten function pointer is called through. Since each PE image has its own IAT, any hook that is made to a given IAT will implicitly only affect the associated PE image. For example, consider a situation where both foo.sys and bar.sys import ExAllocatePoolWithTag. If the IAT entry for ExAllocatePoolWithTag is hooked in foo.sys, only those calls made from within foo.sys to ExAllocatePoolWithTag will be affected. Calls made to the same function from within bar.sys will be unaffected. This type of limitation can actually be a good thing, depending on the underlying motivations for a given backdoor.

Category: Type I; may legitimately be modified, but should point to expected values.

Origin: The origin of the first IAT hook is unclear.
In January 2000, Silvio described hooking via the ELF PLT, which is in some respects functionally equivalent to the IAT in PE images.

Capabilities: Kernel-mode code execution

Considerations: Assuming the calling restrictions of an IAT hook are acceptable for a given backdoor, there are no additional considerations that need to be made.

Covertness: It is possible for modern tools to detect IAT hooks by analyzing the contents of the IAT of each PE image loaded in kernel-mode. To detect discrepancies, a tool need only check whether the virtual address associated with each function in the IAT is indeed the same virtual address as exported by the PE image that contains the dependent function.

2.5.2) KiDebugRoutine

The Windows kernel provides an extensive debugging interface to allow the kernel itself (and third-party drivers) to be debugged in a live, interactive environment (as opposed to after-the-fact, post-mortem crash dump debugging). This debugging interface is used by a kernel debugger program (kd.exe or WinDbg.exe) to perform tasks such as inspecting the running state of the kernel on demand (including memory, registers, and kernel state such as processes and threads). The debugging interface also provides facilities for the kernel to report various events of interest to a kernel debugger, such as exceptions, module load events, debug print output, and a handful of other state transitions. As a result, the kernel debugger interface has "hooks" built into various parts of the kernel for the purpose of notifying the kernel debugger of these events.

The far-reaching capabilities of the kernel debugger, in combination with the fact that the kernel debugger interface is (in general) present in a compatible fashion across all OS builds, provide an attractive mechanism that can be used to gain control of a system.
By subverting KiDebugRoutine to instead point to a custom callback function, it becomes possible to surreptitiously gain control at key moments (debug prints, exception dispatching, and kernel module loading are the primary candidates).

The architecture of the kernel debugger event notification interface can be summed up in terms of a global function pointer (KiDebugRoutine) in the kernel. A number of distinct pieces of code, such as the exception dispatcher, the module loader, and so on, are designed to call through KiDebugRoutine in order to notify the kernel debugger of events. To minimize overhead in scenarios where the kernel debugger is inactive, KiDebugRoutine is typically set to point to a dummy function, KdpStub, which performs almost no actions and, for the most part, simply returns immediately to the caller. However, when the system is booted with the kernel debugger enabled, KiDebugRoutine may be set to an alternate function, KdpTrap, which passes the information supplied by the caller to the remote debugger.

Although enabling or disabling the kernel debugger has traditionally been a boot-time-only decision, newer OS builds such as Windows Server 2003 and beyond have some support for transitioning a system from a "kernel debugger inactive" state to a "kernel debugger active" state. As a result, there is some additional logic now baked into the dummy routine (KdpStub) which can, under some circumstances, result in the debugger being activated on demand. This results in control being passed to the actual debugger communication routine (KdpTrap) after an on-demand kernel debugger initialization. Thus, in some circumstances, KdpStub will pass control through to KdpTrap. Additionally, in Windows Server 2003 and later, it is possible to disable the kernel debugger on the fly. This may result in KiDebugRoutine being changed to refer to KdpStub instead of the boot-time-assigned KdpTrap.
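KiDebugRoutine can legitimately move from KdpTrap to KdpStub at runtime. An integrity check therefore has to accept a small set of values rather than insisting on the boot-time value. The following sketch contrasts the two policies. It assumes the checker has already resolved the addresses of nt!KdpStub and nt!KdpTrap. Neither routine is exported, so in practice this would require symbols. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs: addresses of nt!KdpStub and nt!KdpTrap, resolved
   ahead of time (for example from symbols, since neither is exported). */
struct kd_expected {
    uintptr_t kdp_stub;
    uintptr_t kdp_trap;
};

/* Write-once policy: the value must never differ from its boot-time value.
   This falsely flags the legitimate KdpTrap -> KdpStub change that occurs
   when the debugger is disabled on the fly. */
static bool write_once_policy_ok(uintptr_t current, uintptr_t boot_value)
{
    return current == boot_value;
}

/* Allowed-set policy: in legitimate operation the only values are KdpStub
   and KdpTrap, so any other value indicates tampering. */
static bool allowed_set_policy_ok(uintptr_t current, const struct kd_expected *e)
{
    return current == e->kdp_stub || current == e->kdp_trap;
}
```

Neither policy would notice an attacker who leaves KiDebugRoutine alone and instead overwrites the code of KdpStub or KdpTrap. That case calls for verifying the integrity of read-only kernel code, as the covertness notes for this technique point out.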
These runtime transitions, combined with the previous points, show that even when a system is booted with the kernel debugger enabled, it may not be enough to simply enforce a policy that KiDebugRoutine must not change throughout the lifetime of the system.

Aside from exception dispatching notifications, most debug events find their way to KiDebugRoutine via interrupt 0x2d, otherwise known as "DebugService". This includes user-mode debug print events as well as kernel-mode originated events (such as kernel module load events). The trap handler for interrupt 0x2d packages the information supplied to the debug service interrupt into the format of a special exception that is then dispatched via KiExceptionDispatch (the normal exception dispatcher path for interrupt-generated exceptions). This in turn leads to KiDebugRoutine being called as a normal part of the exception dispatcher's operation.

Category: Type IIa, varies. Although on previous OS versions KiDebugRoutine was essentially write-once, recent versions allow limited changes to this value on the fly while the system is running.

Origin: At the time of this writing, the authors are not aware of existing malware using KiDebugRoutine.

Capabilities: Redirecting KiDebugRoutine to point to a caller-controlled location allows control to be gained during exception dispatching (a very common occurrence), as well as in certain other circumstances (such as module loading and debug print output). As an added bonus, because KiDebugRoutine is integral to the operation of the kernel debugger facility as a whole, it should be possible to "filter" the events received by the kernel debugger by manipulating which events are actually passed on to KdpTrap, if a kernel debugger is enabled.
However, it should be noted that other steps would need to be taken to prevent a kernel debugger from detecting the presence of code, such as intercepting the kernel debugger's read-memory facilities.

Considerations: Depending on how the system global flags (NtGlobalFlag) are configured, and whether the system was booted in such a way as to suppress notification of user-mode exceptions to the kernel debugger, exception events may not always be delivered to KiDebugRoutine. Also, as KiDebugRoutine is not exported, it would be necessary to locate it in order to intercept it. Furthermore, many of the debugger events occur in an arbitrary context, such that pointing KiDebugRoutine to user-mode (except within ntdll space) may be considered dangerous. Even when pointing KiDebugRoutine to ntdll, there is the risk that the system may be brought down, as some debugger events may be reported while the system cannot tolerate paging (e.g. debug prints). From a thread-safety perspective, an interlocked exchange on KiDebugRoutine should be a relatively synchronization-safe operation (however, the new callback routine may never be unmapped from the address space without some means of ensuring that no callbacks are active).

Covertness: As KiDebugRoutine is a non-exported, writable kernel global, it has some inherent defenses against simple detection techniques. However, in legitimate system operation, there are only two legal values for KiDebugRoutine: KdpStub and KdpTrap. Though neither of these routines is exported, a combination of detection techniques (such as verifying the integrity of read-only kernel code, and verifying that KiDebugRoutine refers to a location within an expected code region of the kernel memory image) may make it easier to locate blatant attacks on KiDebugRoutine.
For example, simply setting KiDebugRoutine to point to an out-of-kernel location could be detected with such an approach, as could pointing it elsewhere in the kernel and then writing to the target (either the target location would need to be outside the normal code region, which is easily detectable, or normally read-only code would have to be overwritten, which is also relatively easy to detect). Also, all versions of PatchGuard protect KiDebugRoutine in x64 versions of Windows. This means that effective long-term exploitation of KiDebugRoutine on such systems would require an attacker to deal with PatchGuard. The authors consider this a relatively minor difficulty.

2.5.3) KTHREAD's SuspendApc

To support thread suspension, the Windows kernel includes a KAPC field named SuspendApc in the KTHREAD structure that is associated with each thread running on a system. When thread suspension is requested, the kernel queues the SuspendApc structure to the thread's APC queue. When the APC queue is processed, the kernel invokes the APC's NormalRoutine from the SuspendApc structure, which is typically initialized to nt!KiSuspendThread, in the context of the thread that is being suspended. Once nt!KiSuspendThread completes, the thread is suspended. The following shows the values to which SuspendApc is typically initialized:

  kd> dt -r1 _KTHREAD 80558c20
  ...
     +0x16c SuspendApc      : _KAPC
        +0x000 Type           : 18
        +0x002 Size           : 48
        +0x004 Spare0         : 0
        +0x008 Thread         : 0x80558c20 _KTHREAD
        +0x00c ApcListEntry   : _LIST_ENTRY [ 0x0 - 0x0 ]
        +0x014 KernelRoutine  : 0x804fa8a1 nt!KiSuspendNop
        +0x018 RundownRoutine : 0x805139ed nt!PopAttribNop
        +0x01c NormalRoutine  : 0x804fa881 nt!KiSuspendThread
        +0x020 NormalContext  : (null)
        +0x024 SystemArgument1: (null)
        +0x028 SystemArgument2: (null)
        +0x02c ApcStateIndex  : 0 ''
        +0x02d ApcMode        : 0 ''
        +0x02e Inserted       : 0 ''

Since the SuspendApc structure is specific to a given KTHREAD, any modification made to a thread's SuspendApc.NormalRoutine will affect only that specific thread. By modifying the NormalRoutine of the SuspendApc associated with a given thread, a backdoor can gain arbitrary code execution in kernel-mode by simply attempting to suspend the thread. It is trivial for a user-mode application to trigger the backdoor. The following sample code illustrates how a thread might execute arbitrary code in kernel-mode if its SuspendApc has been modified:

  SuspendThread(GetCurrentThread());

The following code gives an example of assembly that implements the technique described above, taking into account the InitialStack insight described in the considerations below:

  public _RkSetSuspendApcNormalRoutine@4
  _RkSetSuspendApcNormalRoutine@4 proc
    assume fs:nothing
    push  edi
    push  esi
    ; Grab the current thread pointer
    xor   ecx, ecx
    inc   ch
    mov   esi, fs:[ecx+24h]
    ; Grab KTHREAD.InitialStack
    lea   esi, [esi+18h]
    lodsd
    xchg  esi, edi
    ; Find StackBase
    repne scasd
    ; Set KTHREAD->SuspendApc.NormalRoutine
    mov   eax, [esp+0ch]
    xchg  eax, [edi+1ch]
    pop   esi
    pop   edi
    ret
  _RkSetSuspendApcNormalRoutine@4 endp

Category: Type IIa

Origin: The authors believe this to be the first public description of this technique. Skywing is credited with the idea.
Greg Hoglund mentions abusing APC queues to execute code, but he does not explicitly call out SuspendApc[18].

Capabilities: Kernel-mode code execution.

Considerations: This technique is extremely effective. It provides a simple way of executing arbitrary code in kernel-mode by hijacking the mechanism used to suspend a specific thread. There are also some interesting side effects that are worth mentioning. Overwriting the SuspendApc's NormalRoutine makes it so that the thread can no longer be suspended. Even better, if the hook function that replaces the NormalRoutine never returns, it becomes impossible for the thread, and thus the owning process, to be killed, because the NormalRoutine is invoked at APC level. Both of these side effects are valuable in the context of a rootkit.

One consideration that must be made from the perspective of a backdoor is that it will be necessary to devise a technique that can be used to locate the SuspendApc field in the KTHREAD structure across multiple versions of Windows. Fortunately, there are heuristics that can be used to accomplish this. In all versions of Windows analyzed thus far, the SuspendApc field is preceded by the StackBase field. It has been confirmed on multiple operating systems that the StackBase field is equal to the InitialStack field. The InitialStack field is located at a reliable offset (0x18) on all versions of Windows checked by the authors. Using this knowledge, it is trivial to write some code that scans the KTHREAD structure on pointer-aligned offsets until it encounters a value that is equal to the InitialStack. Once a match is found, it is possible to assume that the SuspendApc immediately follows it.

Covertness: This technique involves overwriting a function pointer in a dynamically allocated region of memory that is associated with a specific thread.
This makes the technique fairly covert, but not impossible to detect. One method of detecting this technique would be to enumerate the threads in each process to see if the NormalRoutine of the SuspendApc is set to the expected value of nt!KiSuspendThread. It would be challenging for someone other than Microsoft to implement this safely. The authors are not aware of any tool that currently does this.

2.5.4) Create Thread Notify Routine

The Windows kernel provides drivers with the ability to register a callback that will be notified when threads are created and terminated. This ability is provided through the Windows Driver Model (WDM) export nt!PsSetCreateThreadNotifyRoutine. When a thread is created or terminated, the kernel enumerates the list of registered callbacks and notifies them of the event.

Category: Type II

Origin: The ability to register a callback that is notified when threads are created and terminated has been included since the first release of the WDM.

Capabilities: Kernel-mode code execution.

Considerations: This technique is useful because a user-mode process can control the invocation of the callback by simply creating or terminating a thread. Additionally, the callback will be notified in the context of the process that is creating or terminating the thread. This makes it possible to set the callback routine to an address that resides within ntdll.dll.

Covertness: This technique is covert in that it is possible for a backdoor to blend in with any other registered callbacks. Without having a known-good state to compare against, it would be challenging to conclusively state that a registered callback is associated with a backdoor.
There are some indicators that could suggest something is odd, such as the callback routine residing in ntdll.dll or in either the paged or non-paged pool.

2.5.5) Object Type Initializers

The Windows NT kernel uses an object-oriented approach to representing resources such as files, drivers, devices, processes, threads, and so on. Each object is categorized by an object type. Among other things, this categorization provides a way for the kernel to support common actions that should be applied to objects of the same type. Under this design, each object is associated with only one object type. For example, process objects are associated with the nt!PsProcessType object type. The structure used to represent an object type is the OBJECT_TYPE structure, which contains a nested structure named OBJECT_TYPE_INITIALIZER. It's this second structure that provides some particularly interesting fields that can be used in a backdoor.

As one might expect, the fields of most interest are function pointers. These function pointers, if non-null, are called by the kernel at certain points during the lifetime of an object that is associated with a particular object type. The following debugger output shows the function pointer fields:

  kd> dt nt!_OBJECT_TYPE_INITIALIZER
  ...
     +0x02c DumpProcedure    : Ptr32
     +0x030 OpenProcedure    : Ptr32
     +0x034 CloseProcedure   : Ptr32
     +0x038 DeleteProcedure  : Ptr32
     +0x03c ParseProcedure   : Ptr32
     +0x040 SecurityProcedure : Ptr32
     +0x044 QueryNameProcedure : Ptr32
     +0x048 OkayToCloseProcedure : Ptr32

Two fairly easy to understand procedures are OpenProcedure and CloseProcedure. These function pointers are called when an object of a given type is opened and closed, respectively. This gives the object type initializer a chance to perform some common operation on an instance of an object type.
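The following C model shows how such optional procedures are dispatched. It also shows the known-good comparison a detection tool could perform. The model is deliberately simplified and hypothetical. The real OBJECT_TYPE_INITIALIZER contains many more fields, and its procedures take different parameters. The object manager's open and close paths also do far more work than shown. All names below are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>

struct object;
typedef void (*object_proc)(struct object *obj);

/* Hypothetical stand-in for the procedure fields of an object type. */
struct object_type {
    const char *name;
    object_proc open_procedure;  /* optional: NULL means no callback */
    object_proc close_procedure; /* optional: NULL means no callback */
};

struct object {
    const struct object_type *type;
    int handle_count;
};

/* Every open of any object of this type runs the type's open procedure. */
static void ob_open(struct object *obj)
{
    obj->handle_count++;
    if (obj->type->open_procedure != NULL)
        obj->type->open_procedure(obj);
}

/* Likewise, every close runs the type's close procedure. */
static void ob_close(struct object *obj)
{
    if (obj->type->close_procedure != NULL)
        obj->type->close_procedure(obj);
    obj->handle_count--;
}

/* Detection side: compare the live procedure pointers with a snapshot
   taken from a known-good state. */
static bool procedures_match(const struct object_type *live,
                             const struct object_type *known_good)
{
    return live->open_procedure == known_good->open_procedure &&
           live->close_procedure == known_good->close_procedure;
}

/* Example procedure that simply counts its invocations. */
static int open_calls;
static void counting_open(struct object *obj)
{
    (void)obj;
    open_calls++;
}
```

Once a procedure is assigned, it runs on every subsequent open of every object of that type. Comparing against a snapshot flags the change even though no code bytes were modified.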
In the case of a backdoor, these procedures expose a mechanism through which arbitrary code could be executed in kernel-mode whenever an object of a given type is opened or closed.

Category: Type IIa

Origin: Matt Conover gave an excellent presentation at XCon 2005 on how object type initializers can be used to detect rootkits[8]. Conversely, they can also be used to backdoor the system. The authors are not aware of public examples prior to Conover's presentation. Greg Hoglund also mentions this type of approach[18] in June 2006.

Capabilities: Kernel-mode code execution.

Considerations: There are no unique considerations involved in the use of this technique.

Covertness: This technique can be detected by tools designed to validate the state of object type initializers against a known-good state. Currently, the authors are not aware of any tools that perform this type of check.

2.5.6) PsInvertedFunctionTable

With the introduction of Windows for x64, significant changes were made to how exceptions are processed compared with x86 versions of Windows. On x86 versions of Windows, exception handlers were essentially demand-registered at runtime by routines with exception handlers (more of a code-based exception registration mechanism). On x64 versions of Windows, exception registration is accomplished using a more data-driven model.
Specifically, exception handling (and especially unwind handling) is now driven by metadata attached to each PE image (known as the "exception directory"). This metadata describes the relationship between routines and their exception handlers, what the exception handler function pointer(s) for each region of a routine are, and how to unwind each routine's machine state in a completely data-driven fashion.

While there are significant advantages to having exception and unwind dispatching accomplished using a data-driven model, there is a potential performance penalty compared with the x86 method (which consisted of a linked list of exception and unwind handlers registered at a known location, on a per-thread basis). For example, all of the information needed by the operating system to locate and call the exception handler for purposes of exception or unwind processing was in one location on Windows for x86 (the linked list in the NT_TIB). On Windows for x64, that information is now scattered across all loaded modules. To locate an exception handler for a particular routine, it is necessary to search the loaded module list for the module that contains the instruction pointer corresponding to the function in question. After the module is located, it is then necessary to process the PE header of the module to locate the module's exception directory. Finally, it is necessary to search the exception directory of that module for the metadata corresponding to a location encompassing the requested instruction pointer. This process must be repeated for every function that an exception may traverse.

In an effort to improve the performance of exception dispatching on Windows for x64, Microsoft developed a multi-tier cache system. This cache speeds the resolution of the exception dispatching information used by the routine that looks up the metadata associated with a function. The routine responsible for this is named RtlLookupFunctionTable.
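The last step of the uncached lookup finds the entry in a module's exception directory that covers a given instruction pointer. It can be sketched as follows. The structure mirrors the x64 RUNTIME_FUNCTION layout: begin and end RVAs (end exclusive) plus the RVA of the unwind data. The lookup function is a hypothetical illustration, not the kernel's actual implementation. It relies on the entries being sorted by begin address, which permits a binary search. The caller is assumed to have already converted the instruction pointer to an RVA by subtracting the module's image base.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the x64 RUNTIME_FUNCTION entry: RVAs of a function's first byte,
   one past its last byte, and its unwind data. */
struct runtime_function {
    uint32_t begin_address;
    uint32_t end_address;
    uint32_t unwind_data;
};

/* Binary search over an exception directory sorted by begin_address.
   Returns the entry whose [begin, end) range contains rva, or NULL. */
static const struct runtime_function *
lookup_function_entry(const struct runtime_function *table, size_t count,
                      uint32_t rva)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (rva < table[mid].begin_address)
            hi = mid;           /* target lies in the lower half */
        else if (rva >= table[mid].end_address)
            lo = mid + 1;       /* target lies in the upper half */
        else
            return &table[mid]; /* begin <= rva < end */
    }
    return NULL;
}
```

Leaf functions, which need no unwind data, have no entry, so a NULL result is not by itself an error.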
When searching for unwind information (a pointer to a RUNTIME_FUNCTION entry structure), depending on the reason for the search request, an internal first-level cache (RtlpUnwindHistoryTable) of unwind information for commonly occurring functions may be searched. At the time of this writing, this table consists of RtlUnwindEx, _C_specific_handler, RtlpExecuteHandlerForException, RtlDispatchException, RtlRaiseStatus, KiDispatchException, and KiExceptionDispatch. Due to how exception dispatching operates on x64[39], many of these functions will commonly appear in any exception call stack. Because of this, it is beneficial to performance to have a first-level, quick reference for them.

After RtlpUnwindHistoryTable is searched, a second cache, known as PsInvertedFunctionTable (in kernel-mode) or LdrpInvertedFunctionTable (in user-mode), is scanned. This second-level cache contains a list of the first 0x200 (Windows Server 2008, Windows Vista) or 0xA0 (Windows Server 2003) loaded modules. The loaded module list contained within PsInvertedFunctionTable / LdrpInvertedFunctionTable is presented as a quickly searchable, unsorted linear array that maps the memory occupied by an entire loaded image to a given module's exception directory. The lookup through the inverted function table thus eliminates the costly linked list (loaded module list) traversal and executable header parsing steps otherwise necessary to locate the exception directory for a module. For modules which are referenced by PsInvertedFunctionTable / LdrpInvertedFunctionTable, the exception directory pointer and size information in the PE header of the module in question are unused after the module is loaded and the inverted function table is populated. Because the inverted function table has a fixed size, if enough modules are loaded simultaneously, some modules may need to be located via the loaded module list because all entries in the inverted function table were already in use when those modules were loaded.
However, this is a rare occurrence, and most of the interesting system modules (such as the HAL and the kernel memory image itself) are at a fixed-at-boot position within PsInvertedFunctionTable[37].

By redirecting the exception directory pointer in PsInvertedFunctionTable to refer to a ``shadow'' exception directory in caller-supplied memory (outside of the PE header of the actual module), it is possible to change the exception (or unwind) handling behavior of all code points within a module. For instance, it is possible to create an exception handler spanning every code byte within a module through manipulation of the exception directory information. Changing the inverted function table cache for a module, rather than the module itself, offers multiple benefits with respect to this goal. First, an arbitrarily large amount of space may be devoted to unwind metadata, as the patched unwind metadata need not fit within the confines of a particular image's exception directory (this is particularly important if one wishes to ``gift'' all functions within a module with an exception handler). Second, the memory image of the module in question need not be modified, improving the resiliency of the technique against naive detection systems.

Category: Type IIa, varies. Although the entries for always-loaded modules such as the HAL and the kernel in-memory image itself are essentially considered write-once, the array as a whole may be modified as the system is running when kernel modules are either loaded or unloaded.
As a result, while the first few entries of PsInvertedFunctionTable are comparatively easy to verify, the ``dynamic'' entries corresponding to demand-loaded (and possibly demand-unloaded) kernel modules may frequently change during the legitimate operation of the system. Interception of the exception directory pointers of individual drivers may therefore be much more difficult to detect than interception of the kernel's exception directory.

Origin: At the time of this writing, the authors are not aware of existing malware using PsInvertedFunctionTable. Hijacking of PsInvertedFunctionTable was proposed as a possible bypass avenue for PatchGuard version 2 by Skywing[37]. Its applicability as a possible attack vector with respect to hiding kernel-mode code was also briefly described in the same article.

Capabilities: The principal capability afforded by this technique is to establish an exception handler at arbitrary locations within a target module (even every code byte within a module if so desired). By virtue of creating such exception handlers, it is possible to gain control at any location within a module that may be traversed by an exception, even if the exception would normally be handled in a safe fashion by the module or a caller of the module.

Considerations: As PsInvertedFunctionTable is not exported, one must first locate it in order to patch it (this is considered possible as many exported routines, such as RtlLookupFunctionEntry, reference it in an obvious, patterned way). Also, although the structure is guarded by a non-exported synchronization mechanism (PsLoadedModuleSpinLock in Windows Server 2008), the first few entries corresponding to the HAL and the kernel in-memory image itself should be static and safely accessible without synchronization (after all, neither the HAL nor the kernel in-memory image may be unloaded after the system has booted).
It should be possible to perform an interlocked exchange to swap the exception directory pointer, provided that the exception directory is not modified after the exchange in a fashion that would require synchronization (e.g. it is only appended to). The size of the exception directory is supplied as a separate value in the inverted function table entry array and would need to be modified separately, which may pose a synchronization problem if alterations to the exception directory are not carefully planned to be safe against concurrent access in all possible contingencies. Additionally, due to the 32-bit RVA-based format of the unwind metadata, all exception handlers for a module must be within 4GB of that module's loaded base address. This means that custom exception handlers need to be located within a ``window'' of memory that is relatively near to a module. Allocating memory at a specific base address involves additional work, as the memory cannot be at an arbitrary point in the address space but must be within 4GB of the target. If a caller can query the address space and request allocations based at a particular region, however, this is not seen as an insurmountable problem.

Covertness: The principal advantage of this approach is that it allows a caller to gain control at any point within a module's execution where an exception is generated without modifying any code or data within the module in question (provided the module is cached within PsInvertedFunctionTable). Because the exception directory information in a module's PE header is unused after the cache is populated, integrity checks against the PE header are useless for detecting the alteration of exception handling behavior for a cached module. Additionally, PsInvertedFunctionTable is a non-exported, writable kernel-mode global, which affords it some intrinsic protection against simple detection techniques.
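The covertness argument above follows directly from the shape of the cache: the dispatcher consults the cached entry, so the PE header can remain pristine while dispatch behavior changes. The following is a simplified user-mode model under stated assumptions, not the kernel's actual structure. The INVERTED_ENTRY and MODULE_HEADER_INFO types and their field names are hypothetical. It shows a cache lookup that never touches the PE header, plus a cross-check that compares each cached exception directory pointer and size against the values recorded in the module's own header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one inverted function table entry; field names
   are assumptions for this sketch, not the kernel's declaration. */
typedef struct {
    uint64_t ImageBase;      /* load address of the module */
    uint32_t SizeOfImage;    /* bytes covered by the mapped image */
    uint64_t FunctionTable;  /* exception directory actually used for dispatch */
    uint32_t SizeOfTable;    /* size of that directory in bytes */
} INVERTED_ENTRY;

/* Hypothetical record of what a module's own PE header reports. */
typedef struct {
    uint64_t ImageBase;
    uint64_t ExceptionDirectory;      /* ImageBase + exception data directory RVA */
    uint32_t ExceptionDirectorySize;
} MODULE_HEADER_INFO;

/* Cache lookup: linear scan for the image whose range covers pc.
   Note that the PE header is never consulted here. */
const INVERTED_ENTRY *find_inverted_entry(const INVERTED_ENTRY *tbl, size_t n,
                                          uint64_t pc)
{
    for (size_t i = 0; i < n; i++)
        if (pc >= tbl[i].ImageBase && pc - tbl[i].ImageBase < tbl[i].SizeOfImage)
            return &tbl[i];
    return NULL;
}

/* Cross-check: count cached entries whose directory pointer or size
   disagrees with the module's PE header. The result is only as trustworthy
   as the module list and header data it is compared against. */
size_t count_mismatched_entries(const INVERTED_ENTRY *tbl, size_t n,
                                const MODULE_HEADER_INFO *mods, size_t m)
{
    size_t mismatches = 0;
    for (size_t i = 0; i < m; i++) {
        const INVERTED_ENTRY *e = find_inverted_entry(tbl, n, mods[i].ImageBase);
        if (e == NULL)
            continue;  /* not cached: dispatch would fall back to the PE header */
        if (e->FunctionTable != mods[i].ExceptionDirectory ||
            e->SizeOfTable != mods[i].ExceptionDirectorySize)
            mismatches++;
    }
    return mismatches;
}
```

In this model a redirected entry leaves every byte of the module untouched, which is why a header-only integrity check reports nothing while the comparison against the cache reports one mismatch.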
A scan of the loaded module list and comparison of exception directory pointers to those contained within PsInvertedFunctionTable could, however, reveal most attacks of this nature, provided that the loaded module list retains its integrity. Additionally, PatchGuard version 3 appears to guard key portions of PsInvertedFunctionTable (e.g. to block redirection of the kernel's exception directory), resulting in a need to bypass PatchGuard for long-term exploitation on Windows x64 based systems. This is considered a relatively minor difficulty by the authors.

2.5.7) Delayed Procedures

There are a number of features offered by the Windows kernel that allow device drivers to asynchronously execute code. Some examples of these features include asynchronous procedure calls (APCs), deferred procedure calls (DPCs), work items, threading, and so on. A backdoor can simply use the APIs the kernel exposes for any of these features to schedule a task that will run arbitrary code in kernel-mode. For example, a backdoor might queue a kernel-mode APC using the ntdll.dll trick described at the beginning of this section. When the APC executes, it runs code that has been altered in ntdll.dll in a kernel-mode context. This same basic concept would work for all other delayed procedures.

Category: Type II

Origin: This technique makes implicit use of features exposed by the operating system and therefore falls into the category of obvious. Greg Hoglund mentions these in particular in June, 2006[18].

Capabilities: Kernel-mode code execution.

Considerations: The important consideration here is that some of the methods that support running delayed procedures have restrictions about where the code pages reside. For example, a DPC is invoked at dispatch level and must therefore execute code that resides in non-paged memory.

Covertness: This technique is covert in the sense that the backdoor is always in a transient state of execution and therefore could be considered largely dormant.
Since the backdoor state is stored alongside other transient state in the operating system, this technique should prove more difficult to detect when compared to some of the other approaches described in this paper.

2.6) Asynchronous Read Loop

It's not always necessary to hook some portion of the kernel when attempting to implement a local kernel-mode backdoor. In some cases, it's easiest to just make use of features included in the target operating system to blend in with normal behavior. One particularly good candidate for this involves abusing some of the features offered by the Windows I/O (input/output) manager.

The I/O model used by Windows has many facets to it. For the purposes of this paper, it's only necessary to have an understanding of how it operates when reading data from a file. To support this, the kernel constructs an I/O Request Packet (IRP) with its MajorFunction set to IRP_MJ_READ. The kernel then passes the populated IRP down to the device object that is related to the file that is being read from. The target device object takes the steps needed to read data from the underlying device and then stores the acquired data in a buffer associated with the IRP. Once the read operation has completed, the kernel will call the IRP's completion routine if one has been set. This gives the original caller an opportunity to make forward progress with the data that has been read.

This very basic behavior can be effectively harnessed in the context of a backdoor in a fairly covert fashion. One interesting approach involves a user-mode process hosting a named pipe server and a blob of kernel-mode code reading data from the server and then executing it in the kernel-mode context. This general behavior would make it possible to run additional code in the kernel-mode context by simply shuttling it across a named pipe.
The specifics of how this can be made to work are almost as simple as the steps described in the previous paragraph.

The user-mode part is simple: create a named pipe server using CreateNamedPipe and then wait for a connection. The kernel-mode part is more interesting. One basic idea might involve having a kernel-mode routine that builds an asynchronous read IRP where the IRP's completion routine is defined as the kernel-mode routine itself. In this way, when data arrives from the user-mode process, the routine is notified and given an opportunity to execute the code that was supplied. After the code has been executed, it can simply re-use the code that was needed to pass the IRP to the underlying device associated with the named pipe that it's interacting with. The following pseudo-code illustrates how this could be accomplished:

    KernelRoutine(DeviceObject, ReadIrp, Context)
    {
      // First time called, ReadIrp == NULL
      if (ReadIrp == NULL)
      {
        FileObject = OpenNamedPipe(...)
      }
      // Otherwise, called during IRP completion
      else
      {
        FileObject = GetFileObjectFromIrp(ReadIrp)
        RunCodeFromIrpBuffer(ReadIrp)
      }

      DeviceObject = IoGetRelatedDeviceObject(FileObject)
      ReadIrp = IoBuildAsynchronousFsdRequest(...)
      IoSetCompletionRoutine(ReadIrp, KernelRoutine)
      IoCallDriver(DeviceObject, ReadIrp)
    }

Category: Type II

Origin: The authors believe this to be the first public description of this technique.

Capabilities: Kernel-mode code execution.

Covertness: The authors believe this technique to be fairly covert due to the fact that the kernel-mode code profile is extremely minimal. The only code that must be present at all times is the code needed to execute the read buffer and then post the next read IRP to the target device object. There are two main strategies that might be taken to detect this technique. The first could include identifying malicious instances of the target device, such as a malicious named pipe server.
The second might involve attempting to perform an in-memory fingerprint of the completion routine code, though this would be far from foolproof, especially if the kernel-mode code is encoded until invoked.

2.7) Leaking CS

With the introduction of protected mode into the x86 architecture, the concept of separate privilege levels, or rings, was born. Lesser privileged rings (such as ring 3) were designed to be restricted from accessing resources associated with more privileged rings (such as ring 0). To support this concept, segment descriptors are able to define access restrictions based on which rings should be allowed to access a given region of memory. The processor derives the Current Privilege Level (CPL) by looking at the low-order two bits of the CS segment selector when it is loaded. If both bits are cleared, the processor is running at ring 0, the most privileged ring. If both bits are set, then the processor is running at ring 3, the least privileged ring.

When certain events occur that require the operating system's kernel to take control, such as an interrupt, the processor automatically transitions from whatever ring it is currently executing at to ring 0 so that the request may be serviced by the kernel. As part of this transition, the processor saves the values of a number of different registers, including the previous value of CS, to the stack in order to make it possible to pick up execution where it left off after the request has been serviced. The following structure describes the order in which these registers are saved on the stack, from the lowest address upward (Esp and StackSelector are only saved when the transition involves a change in privilege level):

    typedef struct _SAVED_STATE
    {
        ULONG_PTR Eip;            // where the interrupted code resumes
        ULONG_PTR CodeSelector;   // saved CS; its low two bits are the old CPL
        ULONG     Eflags;         // saved flags register
        ULONG_PTR Esp;            // saved stack pointer (privilege change only)
        ULONG_PTR StackSelector;  // saved SS (privilege change only)
    } SAVED_STATE, *PSAVED_STATE;

Potential security implications may arise if there is a condition where some code can alter the saved execution state in such a way that the saved CS is modified from a lesser privileged CS to a more privileged CS by clearing the low-order bits.
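To make the selector arithmetic concrete, the following sketch decodes a 16-bit x86 segment selector into its descriptor index, table indicator, and requested privilege level (RPL), and reads the CPL from the low two bits of CS. The struct and function names are hypothetical; the bit layout follows the Intel and AMD system programming manuals[23, 1].

```c
#include <stdint.h>

/* Fields of a 16-bit x86 segment selector (hypothetical struct name). */
typedef struct {
    uint16_t index;  /* bits 15..3: descriptor index in the GDT or LDT */
    uint8_t  ti;     /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    uint8_t  rpl;    /* bits 1..0: requested privilege level */
} SELECTOR_FIELDS;

SELECTOR_FIELDS decode_selector(uint16_t selector)
{
    SELECTOR_FIELDS f;
    f.index = (uint16_t)(selector >> 3);
    f.ti    = (uint8_t)((selector >> 2) & 1u);
    f.rpl   = (uint8_t)(selector & 3u);
    return f;
}

/* The CPL is held in the low two bits of the CS register. */
unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}
```

For instance, 0x08 and 0x1B, the kernel and user code selectors used by 32-bit Windows, decode to GDT indices 1 and 3 with RPLs of 0 and 3 respectively.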
When the saved execution state is used to restore the active processor state, such as through an iret, the original caller immediately obtains ring 0 privileges.

Category: Undefined; this approach does not fit into any of the defined categories as it simply takes advantage of hardware behavior relating to how CS is used to determine the CPL of a processor. If code patching is used to be able to modify the saved CS, then the implementation is Type I.

Origin: Leaking CS to user-mode has been known to be dangerous since the introduction of protected mode (and thus rings) into the x86 architecture with the 80286 in 1982[22]. This approach therefore falls into the category of obvious due to the documented hardware implications of leaking a kernel-mode CS when transitioning back to user-mode.

Capabilities: Kernel-mode code execution.

Considerations: Leaking the kernel-mode CS to user-mode may have undesired consequences. Whatever code is to be called in user-mode must take into account that it will be running in a kernel-mode context. Furthermore, the kernel attempts to be as rigorous as possible about checking to ensure that a thread executing in user-mode is not allowed a kernel-mode CS.

Covertness: Depending on the method used to intercept and alter the saved execution state, this method has the potential to be fairly covert. If the method involves secondary hooking in order to modify the state, then it may be detected through some of the same techniques as described in the section on image patching.

3) Prevention & Mitigation

The primary purpose of this paper is not to explicitly identify approaches that could be taken to prevent or mitigate the different types of attacks described herein. However, it is worth taking some time to describe the virtues of certain approaches that could be extremely beneficial if one were to attempt to do so.
The subject of preventing backdoors from being installed and persisted is discussed in more detail in section 4 and therefore won't be considered in this section.

One of the more interesting ideas that could be applied to prevent a number of different types of backdoors would be immutable memory. Memory is immutable when it is not allowed to be modified. There are a few key regions of memory used by the Windows kernel that would benefit greatly from immutable memory, such as executable code segments and regions that are effectively write-once, such as the SSDT. While immutable memory may work in principle, there is currently no x86 or x64 hardware (that the authors are aware of) that permits this level of control.

Even though there appears to be no hardware support for this, it is still possible to implement immutable memory in a virtualized environment. This is especially true in hardware-assisted virtualization implementations that make use of a hypervisor in some form. In this model, a hypervisor can easily expose a hypercall (similar to a system call, but trapping into the hypervisor) that would allow an enlightened guest to mark a set of pages as being immutable. From that point forward, the hypervisor would reject all writes to the pages associated with the immutable region.

As mentioned previously, particularly good candidates for immutable memory are things like the SSDT, the Windows ALMOSTRO write-once segment, as well as other single-modification data elements that exist within the kernel. Enforcing immutable memory on these regions would effectively prevent backdoors from being able to establish certain types of hooks. The downside would be that the kernel would lose the ability to hot-patch itself. There are some instances where kernel-mode hot-patching is currently required, especially on x64. Still, the security upside would seem to outweigh the potential downside.
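The hypercall model described above can be illustrated with a toy write filter. This is a minimal user-mode simulation under stated assumptions: the GUEST_PAGE_STATE structure, the hc_mark_immutable hypercall, and the hv_allow_write check are hypothetical names. A real hypervisor would more likely enforce the policy by removing write permission for the affected guest-physical pages in its second-level page tables (such as Intel EPT or AMD nested paging) and handling the resulting faults. The property that matters is that the flag can be set but never cleared, so even ring 0 code inside the guest cannot undo it.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_PAGES 1024u  /* size of the toy guest-physical address space, in pages */

/* Hypothetical per-guest page state kept by the hypervisor. */
typedef struct {
    bool immutable[MAX_PAGES];
} GUEST_PAGE_STATE;

/* Hypothetical hypercall: mark pages [first, first + count) immutable.
   There is deliberately no inverse operation. Returns false on a bad range. */
bool hc_mark_immutable(GUEST_PAGE_STATE *s, uint32_t first, uint32_t count)
{
    if (first >= MAX_PAGES || count > MAX_PAGES - first)
        return false;
    for (uint32_t i = 0; i < count; i++)
        s->immutable[first + i] = true;
    return true;
}

/* Invoked when the guest attempts to write a page: allow or deny.
   Pages outside the modeled range are denied. */
bool hv_allow_write(const GUEST_PAGE_STATE *s, uint32_t page)
{
    return page < MAX_PAGES && !s->immutable[page];
}
```

In this model, a guest that marks the pages holding its SSDT immutable early in boot would see any later write to those pages refused, regardless of the privilege level of the code attempting it.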
On x64, the use of immutable memory would improve the resilience of PatchGuard by allowing it to actively prevent hot-patching rather than relying on detecting it with the use of a polling cycle.

4) Running Code in Kernel-Mode

There are many who might argue that it's not even necessary to write code that prevents or detects specific types of kernel-mode backdoors. This argument can be made on the grounds of two very specific points. The first point is that in order for one to backdoor the kernel, one must have some way of executing code in kernel-mode. Based on this line of reasoning, one might argue that the focus should instead be given to preventing untrusted code from running in kernel-mode. The second point in this argument is that in order for one to truly compromise the host, some form of data must be persisted. If this is assumed to be the case, then an obvious solution would be to identify ways of preventing or detecting the persistent data. While there may also be additional points, these two represent the common themes observed by the authors. Unfortunately, both of these points are, at the time of this writing, flawed.

It is currently not possible with present-day operating systems and x86/x64 hardware to guarantee that only specific code will run in the context of an operating system's kernel. Though Microsoft wishes it were possible, which is clearly illustrated by their efforts in Code Integrity and Trusted Boot, there is no real way to guarantee that kernel-mode code cannot be exploited in a manner that might lead to code execution[2]. There has been no shortage of Windows kernel-mode vulnerabilities to illustrate the feasibility of this type of vector[6, 10]. This matter is also not helped by the fact that the Windows kernel currently has very few exploit mitigations.
This makes the exploitation of kernel vulnerabilities trivial in comparison to user-mode exploitation, given the mitigations found in user-mode on Windows XP SP2 and, more recently, Windows Vista.

In addition to the exploitation vector, it is also important to consider alternative ways of executing code in kernel-mode that would be largely invisible to the kernel itself. John Heasman has provided some excellent research into the subject of using the BIOS, expansion ROMs, and the Extensible Firmware Interface (EFI) as a means of running arbitrary code in the context of the kernel without necessarily relying on any hooks directly visible to the kernel itself[16, 17]. Loïc Duflot described how to use the System Management Mode (SMM) of Intel processors as a method of subverting the operating system to bypass BSD's securelevel restrictions[9]. There has also been a lot of discussion around using DMA to directly interact with and modify physical memory without involving the operating system. However, this form of attack is of less concern due to the fact that physical access is required.

The idea of detecting or preventing a rootkit from persisting data is something that is worthy of thoughtful consideration. Indeed, it's true that in order for malware to survive across reboots, it must persist itself in some form or another. By preventing or detecting this persisted data, it would be possible to effectively prevent any form of sustained infection. On the surface, this idea is seemingly both simple and elegant, but the devil is in the details. The fact that this idea is fundamentally flawed can be plainly illustrated using the current state of Anti-Virus technology.

For the sake of argument, assume for the moment that there really is a way to deterministically prevent malware from persisting itself in any form. Now, consider a scenario where a web server at a financial institution is compromised and a memory-resident rootkit is used.
The point here should be obvious: no data associated with the rootkit touches the physical hardware. In this example, one might rightly think that the web server will not be rebooted for an extended period of time. In these circumstances, there is really no difference between a persistent and a non-persistent rootkit. Indeed, a memory-resident rootkit may not be ideal in certain situations, but it's important to understand the implications.

Based on the current state of the art, it is not possible to deterministically prevent malware from persisting itself. There are far too many methods of persisting data. This is further illustrated by John Heasman in his ACPI and expansion ROM work. To the authors' knowledge, modern tools focus their forensic analysis on the operating system and on file systems. This isn't sufficient, however, as rootkit data can be stored in locations that are largely invisible to the operating system. That said, there has been a significant push in recent years to provide the hardware necessary to implement a trusted system boot. This initiative is being driven by the Trusted Computing Group with involvement from companies such as Microsoft and Intel[42]. One of the major outcomes of this group has been the Trusted Platform Module (TPM), which strives to facilitate a trusted system boot, among other things[43]. At the time of this writing, the effectiveness of TPM is largely unknown, but it is expected that it will be a powerful and useful security feature as it matures.

The fact that there is really no way of preventing untrusted code from running in kernel-mode, combined with the fact that there is really no way to universally prevent untrusted code from persisting itself, helps to illustrate the need for thoughtful consideration of ways to both prevent and detect kernel-mode backdoors.

5) PatchGuard versus Rootkits

There has been some confusion centering around whether or not PatchGuard can be viewed as a deterrent to rootkits.
On the surface, it would appear that PatchGuard does indeed represent a formidable opponent to rootkit developers given the fact that it checks for many different types of hooks. Beneath the surface, it's clear that PatchGuard is fundamentally flawed with respect to its use as a rootkit deterrent. This flaw centers around the fact that PatchGuard, in its current implementation, runs at the same privilege level as other driver code. This opens PatchGuard up to attacks that are designed to prevent it from completing its checks. The authors have previously outlined many different approaches that can be used to disable PatchGuard[36, 37]. It is certainly possible that Microsoft could implement fixes for these attacks, and indeed they have implemented some in more recent versions, but the problem remains a cat-and-mouse game. In this particular cat-and-mouse game, rootkit authors will always have an advantage both in terms of time and in terms of vantage point.

In the future, PatchGuard could be improved to leverage features of a hypervisor in a virtualized environment that might allow it to be protected from malicious code running in the context of a guest. For example, the current version of PatchGuard makes extensive use of obfuscation, presumably in order to prevent malware from finding its code and context structures in memory. The presence of a hypervisor may permit PatchGuard to make more extensive use of immutable memory, or alternatively to run at a privilege level that is greater than that of an executing guest, such as within the hypervisor itself (though this could have severe security implications if done improperly).

Even if PatchGuard is improved to the point where it's no longer possible to disable its security checks, there will still be another fundamental flaw. This second flaw centers around the fact that PatchGuard, like any other code designed to perform explicit checks, is like a horse with blinders on.
It's only able to detect modifications to the specific structures that it knows about. While it may be true that these structures are the most likely candidates to be hooked, it is nevertheless true that many other structures exist that would make suitable candidates, such as the SuspendApc of a specific thread. These alternative candidates are meant to illustrate the challenges PatchGuard faces with regard to continually evolving its checks to keep up with rootkit authors. In this manner, PatchGuard will continue to be forced into a reactive mode rather than a proactive mode. If IDS products have illustrated one thing, it's that reactive security solutions are largely inadequate in the face of a skilled attacker.

PatchGuard is perhaps best regarded as a hall monitor. Its job is to make sure students are doing things according to the rules. Good students, such as ISVs, will inherently bend to the will of PatchGuard lest they find themselves in unsupported waters. Bad students, such as rootkits, fear not the wrath of PatchGuard and will have few qualms about sidestepping it, even if the technique used to sidestep it may not work in the future.

6) Acknowledgements

The authors would like to acknowledge all of the people, named or unnamed, whose prior research contributed to the content included in this paper.

7) Conclusion

At this point it should be clear that there is no shortage of techniques that can be used to expose a local kernel-mode backdoor on Windows. These techniques provide a subtle way of weakening the security guarantees of the Windows kernel by exposing restricted resources to user-mode processes. These resources might include access to kernel-mode data, disabling of security checks, or the execution of arbitrary code in kernel-mode. There are many different reasons why these types of backdoors would be useful in the context of a rootkit.

The most obvious reason these techniques are useful in rootkits is that they provide access to restricted resources.
A less obvious reason for their usefulness is that they can be used as a method of reducing a rootkit's kernel-mode code profile. Since many tools are designed to scan kernel-mode memory for the presence of backdoors[32, 14], any reduction of a rootkit's kernel-mode code profile can be useful. Rather than placing code in kernel-mode, techniques have been described for redirecting code execution to code stored in user-mode in a process-specific fashion. This is accomplished by redirecting code into a portion of the ntdll mapping which exists in every process, including the System process.

Understanding how different backdoor techniques work is necessary in order to consider approaches that might be taken to prevent or detect rootkits that employ them. For example, the presence of immutable memory may eliminate some of the common techniques used by many different types of rootkits. Likewise, when these techniques are eliminated, new ones will be developed, continuing the cycle that permeates most adversarial systems.

References

[1] AMD. AMD64 Architecture Programmer's Manual Volume 2: System Programming. Dec, 2005.
[2] Anonymous Hacker. Xbox 360 Hypervisor Privilege Escalation Vulnerability. Bugtraq. Feb, 2007. http://www.securityfocus.com/archive/1/461489
[3] Blanset, David et al. Dual operating system computer. Oct, 1985. http://www.freepatentsonline.com/4747040.html
[4] Brown, Ralf. Pentium Model-Specific Registers and What They Reveal. Oct, 1995. http://www.rcollins.org/articles/p5msr/PentiumMSRs.html
[5] Butler, James and Sherri Sparks. Windows Rootkits of 2005. Nov, 2005. http://www.securityfocus.com/infocus/1850
[6] Cerrudo, Cesar. Microsoft Windows Kernel GDI Local Privilege Escalation. Oct, 2004. http://projects.info-pull.com/mokb/MOKB-06-11-2006.html
[7] CIAC. E-34: Onehalf Virus (MS-DOS). Sep, 1994. http://www.ciac.org/ciac/bulletins/e-34.shtml
[8] Conover, Matt. Malware Profiling and Rootkit Detection on Windows. 2005. http://xcon.xfocus.org/xcon2005/archives/2005/Xcon2005_Shok.pdf
[9] Duflot, Loïc. Security Issues Related to Pentium System Management Mode. CanSecWest, 2006. http://www.cansecwest.com/slides06/csw06-duflot.ppt
[10] Ellch, John et al. Exploiting 802.11 Wireless Driver Vulnerabilities on Windows. Jan, 2007. http://www.uninformed.org/?v=6&a=2&t=sumry
[11] Firew0rker, the nobodies. Kernel-mode backdoors for Windows NT. Phrack 62. Jan, 2005. http://www.phrack.org/issues.html?issue=62&id=6#article
[12] fuzenop. SysEnterHook. Feb, 2005. http://www.rootkit.com/vault/fuzen_op/SysEnterHook.zip
[13] Garfinkel, Tal. Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools. http://www.stanford.edu/~talg/papers/traps/traps-ndss03.pdf
[14] Gassoway, Paul. Discovery of kernel rootkits with memory scan. Oct, 2005. http://www.freepatentsonline.com/20070078915.html
[15] Gulbrandsen, John. System Call Optimization with the SYSENTER Instruction. Oct, 2004. http://www.codeguru.com/Cpp/W-P/system/devicedriverdevelopment/article.php/c8223/
[16] Heasman, John. Implementing and Detecting an ACPI BIOS Rootkit. BlackHat Federal, 2006. https://www.blackhat.com/presentations/bh-federal-06/BH-Fed-06-Heasman.pdf
[17] Heasman, John. Implementing and Detecting a PCI Rootkit. Nov, 2006. http://www.ngssoftware.com/research/papers/Implementing_And_Detecting_A_PCI_Rootkit.pdf
[18] Hoglund, Greg. Kernel Object Hooking Rootkits (KOH Rootkits). Jun, 2006. http://www.rootkit.com/newsread.php?newsid=501
[19] Hoglund, Greg. A *REAL* NT Rootkit, patching the NT Kernel. Phrack 55. Sep, 1999. http://phrack.org/issues.html?issue=55&id=5
[20] Hoglund, Greg and James Butler. Rootkits: Subverting the Windows Kernel. Addison-Wesley. 2006.
[21] Hunt, Galen and Doug Brubacher. Detours: Binary Interception of Win32 Functions. Proceedings of the 3rd USENIX Windows NT Symposium, pp. 135-143. Seattle, WA. USENIX. Jul, 1999.
[22] Intel. 2.1.2 The Intel 286 Processor (1982). Intel 64 and IA-32 Architectures Software Developer's Manual. Denver, Colorado: Intel, 34. http://www.intel.com/products/processor/manuals/index.htm
[23] Intel. IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide. Sep, 2005.
[24] Jack, Barnaby. Remote Windows Kernel Exploitation: Step into the Ring 0. Aug, 2005. http://www.blackhat.com/presentations/bh-usa-05/BH_US_05-Jack_White_Paper.pdf
[25] Kasslin, Kimmo. Kernel Malware: The Attack from Within. 2006. http://www.f-secure.com/weblog/archives/kasslin_AVAR2006_KernelMalware_paper.pdf
[26] Kdm. NTIllusion: A portable Win32 userland rootkit [incomplete]. Phrack 62. Jan, 2005. http://www.phrack.org/issues.html?issue=62&id=12&mode=txt
[27] M. B. Jones. Interposition agents: Transparently interposing user code at the system interface. Symposium on Operating System Principles, pp. 80-93. 1993. http://www.scs.stanford.edu/nyu/04fa/sched/readings/interposition-agents.pdf
[28] Mythrandir. Protected mode programming and O/S development. Phrack 52. Jan, 1998. http://www.phrack.org/issues.html?issue=52&id=17#article
[29] PaX team. PAGEEXEC. Mar, 2003. http://pax.grsecurity.net/docs/pageexec.txt
[30] Plaguez. Weakening the Linux Kernel. Phrack 52. Jan, 1998. http://www.phrack.org/issues.html?issue=52&id=18#article
[31] Prasad Dabak, Milind Borate, and Sandeep Phadke. Hooking Software Interrupts. Oct, 1999. http://www.windowsitlibrary.com/Content/356/09/1.html
[32] Rutkowska, Joanna. System Virginity Verifier. http://invisiblethings.org/tools/svv/svv-2.3-src.zip
[33] Rutkowska, Joanna. Rootkit Hunting vs. Compromise Detection. BlackHat Europe, 2006. http://invisiblethings.org/papers/rutkowska_bheurope2006.ppt
[34] Rutkowska, Joanna. Introducing Stealth Malware Taxonomy. Nov, 2006. http://invisiblethings.org/papers/malware-taxonomy.pdf
[35] Silvio. Shared Library Call Redirection Via ELF PLT Infection. Phrack 56. Jan, 2000. http://www.phrack.org/issues.html?issue=56&id=7#article
[36] skape and Skywing. Bypassing PatchGuard on Windows x64. Uninformed Journal. Jan, 2006. http://www.uninformed.org/?v=3&a=3&t=sumry
[37] Skywing. Subverting PatchGuard version 2. Uninformed Journal. Jan, 2007. http://www.uninformed.org/?v=6&a=1&t=sumry
[38] Skywing. Anti-Virus Software Gone Wrong. Uninformed Journal. Jun, 2006. http://www.uninformed.org/?v=4&a=4&t=sumry
[39] Skywing. Programming against the x64 exception handling support. Feb, 2007. http://www.nynaeve.net/?p=113
[40] Soeder, Derek. Windows Expand-down Data Segment Local Privilege Escalation. Apr, 2004. http://research.eeye.com/html/advisories/published/AD20040413D.html
[41] Sparks, Sherri and James Butler. Raising the Bar for Windows Rootkit Detection. Phrack 63. Jan, 2005. http://www.phrack.org/issues.html?issue=63&id=8
[42] Trusted Computing Group. Trusted Computing Group: Home. https://www.trustedcomputinggroup.org/home
[43] Trusted Computing Group. TPM Specification. https://www.trustedcomputinggroup.org/specs/TPM/
[44] Welinder, Morten. modifyldt security holes. Mar, 1996. http://lkml.org/lkml/1996/3/6/13
[45] Wikipedia. Call gate. http://en.wikipedia.org/wiki/Call_gate