Writing Your Own Debugger: Software Breakpoints [Linux]

Source: Internet | Editor: 程序博客网 | Date: 2024/06/16 08:56

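I previously wrote a post, "Writing Your Own Debugger: Software Breakpoints", but it targeted 32-bit Windows. I have since added RAM to my laptop and upgraded it to 64-bit Ubuntu, so I ported the original code to x86_64 Linux.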

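On Linux, we use the ptrace system call to control the target process. To obtain the addresses of functions in system libraries, we also need dlsym and related functions from libdl.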

The prototype of ptrace is:

    long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
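The article's code calls libc.ptrace without telling ctypes anything about its argument or return types. A minimal sketch, assuming Python 3 on Linux, of declaring the man-page prototype above with ctypes so that pointers and the 64-bit return value are handled correctly. Loading symbols through CDLL(None) (the running process's global symbols, which include libc) is a substitute for the article's CDLL("libc.so.6"); the variable names are illustrative.

```python
import ctypes

# CDLL(None) resolves symbols from the running process, which includes libc.
# use_errno=True makes ctypes keep a copy of errno for ctypes.get_errno().
libc = ctypes.CDLL(None, use_errno=True)

ptrace = libc.ptrace
# long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
ptrace.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_void_p, ctypes.c_void_p]
# Signed result: -1 signals a possible error (check errno), other values are data.
ptrace.restype = ctypes.c_long
```

With argtypes declared, a plain Python int passed as addr is converted to void * automatically, so a call such as ptrace(1, pid, 0x400000, None) (request 1 is PTRACE_PEEKTEXT) needs no explicit c_void_p wrapper.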

Both the addr and data parameters are void *. This is a very flexible design: depending on request, ptrace can accept and return different kinds of arguments. Many Linux APIs are declared this way.

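We need the ptrace requests listed below. Since we are using Python, we cannot simply include <sys/ptrace.h>, so we locate the corresponding header on the system and transcribe the values into our Python file. On 64-bit Ubuntu the header is /usr/include/x86_64-linux-gnu/sys/ptrace.h.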

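In <sys/ptrace.h> the request argument is an enum. Enum values are plain integers, so in Python we can pass them directly without converting them to a ctypes type: ctypes passes a Python int as a C int.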

    import sys
    from ctypes import *

    # Indicate that the process making this request should be traced.
    # All signals received by this process can be intercepted by its
    # parent, and its parent can use the other `ptrace' requests.
    PTRACE_TRACEME = 0
    # Return the word in the process's text space at address ADDR.
    PTRACE_PEEKTEXT = 1
    # Return the word in the process's data space at address ADDR.
    PTRACE_PEEKDATA = 2
    # Return the word in the process's user area at offset ADDR.
    PTRACE_PEEKUSER = 3
    # Write the word DATA into the process's text space at address ADDR.
    PTRACE_POKETEXT = 4
    # Write the word DATA into the process's data space at address ADDR.
    PTRACE_POKEDATA = 5
    # Write the word DATA into the process's user area at offset ADDR.
    PTRACE_POKEUSER = 6
    # Continue the process.
    PTRACE_CONT = 7
    # Get all general purpose registers used by a process.
    # This is not supported on all machines.
    PTRACE_GETREGS = 12
    # Set all general purpose registers used by a process.
    # This is not supported on all machines.
    PTRACE_SETREGS = 13
    # Attach to a process that is already running.
    PTRACE_ATTACH = 16
    # Detach from a process attached to with PTRACE_ATTACH.
    PTRACE_DETACH = 17

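Next we attach to a running process. On Linux you need sufficient privileges over the target process; I usually just use sudo. (On Ubuntu, the Yama setting kernel.yama.ptrace_scope defaults to 1, which lets an unprivileged process attach only to its own descendants.)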
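On kernels built with the Yama security module, whether a non-root debugger may attach at all is controlled by /proc/sys/kernel/yama/ptrace_scope. A small sketch, assuming Python 3; read_ptrace_scope and SCOPE_MEANING are hypothetical helpers, and a missing file is treated as "Yama not present".

```python
YAMA_SCOPE_PATH = "/proc/sys/kernel/yama/ptrace_scope"

# Meaning of each Yama ptrace_scope value
SCOPE_MEANING = {
    0: "classic ptrace permission checks (e.g. same user)",
    1: "restricted: only an ancestor, or a tracer with CAP_SYS_PTRACE, may attach",
    2: "admin-only: CAP_SYS_PTRACE is required",
    3: "no attach: PTRACE_ATTACH is disabled",
}

def read_ptrace_scope(path=YAMA_SCOPE_PATH):
    """Return the Yama ptrace_scope value, or None if the file does not exist."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (IOError, OSError):
        return None

if __name__ == "__main__":
    scope = read_ptrace_scope()
    print(scope, SCOPE_MEANING.get(scope, "Yama not enabled"))
```

If the value is 1 or higher, running the debugger with sudo (as the article does) is the simplest way around the restriction.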

    def attachPID(self, pid):
        """Attach to process"""
        ec = self.c_ptrace(PTRACE_ATTACH, pid, None, None)
        if ec != 0:
            # perror() itself appends ": <error text>" and a newline
            self.c_perror(b"Attach to process error")
            return ec
        self.trace_pid = pid
        self.is_attached = True
        return 0

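When a process is attached with ptrace, the kernel sends it SIGSTOP and the process stops. The debugger must wait (with waitpid) until the process has actually stopped before doing anything else: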

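    def waitForPID(self):
        """Wait for a signal from the traced PID"""
        if self.is_attached:
            # Keep the c_int object: waitpid() writes the status through the
            # pointer, and byref(c_int(self.status)) would discard the result.
            status = c_int()
            self.c_waitpid(self.trace_pid, byref(status), 0)
            self.status = status.value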
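waitForPID only stores the raw status word. On Linux, a stopped status has 0x7f in its low byte and the stop signal in bits 8-15; the os module's W* macros decode it. A minimal sketch, assuming Python 3; describe_wait_status is a hypothetical helper.

```python
import os
import signal

def describe_wait_status(status):
    """Turn a raw waitpid() status into a short human-readable string."""
    if os.WIFSTOPPED(status):
        # The tracee is in a ptrace stop; WSTOPSIG tells us which signal caused it
        return "stopped by " + signal.Signals(os.WSTOPSIG(status)).name
    if os.WIFEXITED(status):
        return "exited with code %d" % os.WEXITSTATUS(status)
    if os.WIFSIGNALED(status):
        return "killed by " + signal.Signals(os.WTERMSIG(status)).name
    return "unknown status 0x%x" % status

# Typical use with the standard library instead of libc.waitpid:
#   _, status = os.waitpid(pid, 0)
#   print(describe_wait_status(status))
```

After PTRACE_ATTACH the expected result is "stopped by SIGSTOP"; after hitting a software breakpoint it is "stopped by SIGTRAP".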

Once the debugger has finished its work, we let the traced process continue:

    def continuePID(self):
        """Resume the traced PID"""
        # data=None (0) means: resume without delivering any signal
        self.c_ptrace(PTRACE_CONT, self.trace_pid, None, None)
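Because continuePID always passes None as data, any signal that stopped the tracee for its own reasons (for example a SIGINT from Ctrl-C) is swallowed. PTRACE_CONT interprets a nonzero data argument as a signal to deliver when the tracee resumes. A sketch of choosing that value, assuming Python 3; signal_to_forward is hypothetical, and treating every SIGTRAP as debugger-caused is a simplification.

```python
import signal

# Signals the debugger itself caused: SIGSTOP from PTRACE_ATTACH and
# SIGTRAP from an int3 breakpoint. These should not be re-delivered.
DEBUGGER_SIGNALS = (signal.SIGSTOP, signal.SIGTRAP)

def signal_to_forward(stop_sig):
    """Return the value for PTRACE_CONT's data argument after a signal stop.

    0 means "resume without a signal"; any other value is the signal number
    the kernel delivers to the tracee as it resumes.
    """
    if stop_sig in DEBUGGER_SIGNALS:
        return 0
    return int(stop_sig)

# Hypothetical use inside MyDebugger.continuePID():
#   sig = signal_to_forward(os.WSTOPSIG(self.status))
#   self.c_ptrace(PTRACE_CONT, self.trace_pid, None, c_void_p(sig))
```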

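The core feature of a debugger is the breakpoint. This article covers software breakpoints, which work by modifying Intel x86 instructions: we patch the target process's code, and later its saved register state (its thread context), so that the process stops at a chosen instruction. The debugger can then read the process's registers, stack, and other information for debugging.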

First, let's see how to modify the target's code with the ptrace system call.

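    def readMemData(self, addr):
        """Read one word (8 bytes on x86_64) from the tracee's memory"""
        data = 0
        if self.is_attached:
            # PEEKTEXT returns the word itself, so -1 can be valid data. Clear
            # errno first and check it afterwards (needs use_errno=True on libc).
            set_errno(0)
            data = self.c_ptrace(PTRACE_PEEKTEXT, self.trace_pid, c_void_p(addr), None)
            data &= 0xFFFFFFFFFFFFFFFF  # normalise to an unsigned 64-bit word
            if data == 0xFFFFFFFFFFFFFFFF and get_errno() != 0:
                self.c_perror(b"readMemData error")
        return data

    def writeMemData(self, addr, data):
        """Write one word to the tracee's memory"""
        if self.is_attached:
            status = self.c_ptrace(PTRACE_POKETEXT, self.trace_pid, c_void_p(addr), c_void_p(data))
            if status != 0:
                self.c_perror(b"writeMemData error")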
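PTRACE_PEEKTEXT returns exactly one word, so reading an instruction sequence longer than 8 bytes takes several calls. A sketch of a hypothetical read_bytes helper, assuming Python 3; it accepts any word-reading function (such as readMemData) so it can be exercised without a live tracee.

```python
import struct

WORD_SIZE = 8              # PTRACE_PEEKTEXT returns one 64-bit word on x86_64
WORD_MASK = (1 << 64) - 1  # turns a signed C long into an unsigned word

def read_bytes(peek, addr, length):
    """Read `length` bytes at `addr` with a word-sized `peek(address) -> int`."""
    chunks = []
    for offset in range(0, length, WORD_SIZE):
        word = peek(addr + offset) & WORD_MASK
        chunks.append(struct.pack("<Q", word))  # x86_64 is little-endian
    return b"".join(chunks)[:length]
```

The last peek may touch up to 7 bytes past the requested range; near the end of a mapping that read can fail even though the requested bytes exist.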

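As shown, we read and write the code with the ptrace requests PTRACE_PEEKTEXT and PTRACE_POKETEXT, one machine word (8 bytes on x86_64) at a time. The address passed in must be a virtual address in the target process. How do we obtain it?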

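One option is to use objdump to find the symbol's offset in the binary the process was loaded from, and then add the address at which that binary is mapped to get the virtual address.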

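If we want to set a breakpoint on a library function called by the target process, we can also compute its virtual address in that process; that is what this article does. On Linux, the same library is mapped into different processes at different virtual addresses, but a given function's offset from the library's base address is the same in every process (as long as they load the same library file). If that sounds confusing, the code makes it clear.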

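    def _getModuleBase(self, pid, module_name):
        """Get the base virtual address of a module (pid < 0 means this process)"""
        if pid < 0:
            maps_name = "/proc/self/maps"
        else:
            maps_name = "/proc/" + str(pid) + "/maps"
        addr = 0
        with open(maps_name, "r") as fp:
            for line in fp:
                fields = line.split()
                if len(fields) < 6:
                    continue  # anonymous mapping, no pathname
                path = fields[5]
                name = path.rsplit("/", 1)[-1]
                # Match on the file name so that "libc" finds libc.so.6 or
                # libc-2.19.so but not libcap.so.2
                if (path == module_name or name == module_name or
                        name.startswith(module_name + ".") or
                        name.startswith(module_name + "-")):
                    # maps is sorted by address, so the first match is the base
                    addr = int(fields[0].split("-")[0], 16)
                    if addr == 0x8000:
                        addr = 0
                    break
        return addr

    def _getRemoteAddr(self, pid, module_name, local_addr):
        """Translate a function address in this process into the PID's address space"""
        local_base = self._getModuleBase(-1, module_name)
        remote_base = self._getModuleBase(pid, module_name)
        # The function's offset inside the library is the same in both processes
        return int(local_addr - local_base + remote_base)

    def moduleResolve(self, mname, fname):
        """Find the virtual address of function fname from module mname in the traced PID"""
        if "libc" == mname:
            module_handle = self.c_handle
        else:
            # dlopen() takes a flags argument; RTLD_LAZY is 1 in glibc
            module_handle = self.c_dlopen(c_char_p(mname), 1)
            if not module_handle:
                print("dlopen module %s error %s" % (mname, self.c_dlerror()))
                return None
        func_addr = self.c_dlsym(c_void_p(module_handle), c_char_p(fname))
        if not func_addr:
            print("dlsym %s error %s" % (fname, self.c_dlerror()))
            return None
        return self._getRemoteAddr(self.trace_pid, mname, func_addr)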
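The arithmetic in _getRemoteAddr can be tried on its own. A sketch with made-up addresses, assuming Python 3; remote_address is a hypothetical stand-alone version that also rejects a module missing from either process, a case where the original formula silently returns a wrong address.

```python
def remote_address(local_addr, local_base, remote_base):
    """Translate an address inside a library from our process to the target.

    Both processes must map the same library file; only the base differs.
    """
    if local_base == 0 or remote_base == 0:
        raise LookupError("module is not mapped in one of the processes")
    offset = local_addr - local_base  # position of the symbol inside the library
    if offset < 0:
        raise ValueError("address lies below the module base")
    return remote_base + offset

# Made-up example values:
LOCAL_BASE = 0x7F0000000000    # libc base in the debugger (from /proc/self/maps)
LOCAL_PRINTF = 0x7F0000055800  # what dlsym returned for "printf" in the debugger
REMOTE_BASE = 0x7F72C9000000   # libc base in the target (from /proc/PID/maps)
```

Here printf sits 0x55800 bytes into libc, so in the target it is at 0x7F72C9000000 + 0x55800 = 0x7F72C9055800.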

The debugger first looks up the library function by name in its own address space. The prototypes of dlsym and the related functions are:

    #include <dlfcn.h>

    void *dlopen(const char *filename, int flag);
    char *dlerror(void);
    void *dlsym(void *handle, const char *symbol);
    int dlclose(void *handle);
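The local half of the lookup can be checked in isolation. A minimal Python 3 sketch that calls dlsym through ctypes and compares the result with the address ctypes resolved itself. It assumes a Linux libc where dlsym is reachable from the process's global symbols (glibc 2.34 and later put it in libc.so.6; on older glibc it lives in libdl.so.2, which is usually already loaded into the interpreter); local_symbol_address is a hypothetical helper.

```python
import ctypes

# Handle for the process's global symbol scope (dlopen(NULL, ...)).
libc = ctypes.CDLL(None)

dlsym = libc.dlsym
dlsym.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
dlsym.restype = ctypes.c_void_p  # keep the full 64-bit pointer, not a C int

def local_symbol_address(name, handle=libc._handle):
    """Return the address of `name` in this process, as the debugger's dlsym call does."""
    addr = dlsym(handle, name.encode())
    if not addr:
        raise LookupError("symbol not found: " + name)
    return addr

if __name__ == "__main__":
    print(hex(local_symbol_address("printf")))
```

Setting restype matters: without it ctypes assumes int and truncates the 64-bit address.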

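For each process, the debugger reads that process's maps file in the /proc filesystem. The maps file lists every memory region mapped into the process, including the virtual address ranges where each library is loaded. In the example below you can clearly see the process's heap range and the address ranges of its libraries. The code above uses the module name to find the library's base address in both processes, then relies on the fact that a function's offset within the same library does not change to compute the function's virtual address in the target process.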

    neilhhw@Hou-ThinkPad:~$ cat /proc/3993/maps
    00400000-006ba000 r-xp 00000000 08:02 1179902                            /usr/bin/python2.7
    008b9000-008ba000 r--p 002b9000 08:02 1179902                            /usr/bin/python2.7
    008ba000-0092f000 rw-p 002ba000 08:02 1179902                            /usr/bin/python2.7
    0092f000-00941000 rw-p 00000000 00:00 0
    01fa4000-02ae5000 rw-p 00000000 00:00 0                                  [heap]
    7f72c4000000-7f72c4021000 rw-p 00000000 00:00 0
    7f72c4021000-7f72c8000000 ---p 00000000 00:00 0
    7f72c95eb000-7f72c9624000 r-xp 00000000 08:02 134910                     /lib/x86_64-linux-gnu/libreadline.so.6.2
    7f72c9624000-7f72c9824000 ---p 00039000 08:02 134910                     /lib/x86_64-linux-gnu/libreadline.so.6.2
    7f72c9824000-7f72c9826000 r--p 00039000 08:02 134910                     /lib/x86_64-linux-gnu/libreadline.so.6.2
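To see how a module base is picked out of such a listing, here is a sketch that parses maps text directly, using the lines shown above as sample input. It assumes Python 3; module_base is hypothetical and, unlike a plain substring test, matches on the file name so that "libc" would not match libcap.so.2.

```python
SAMPLE_MAPS = """\
00400000-006ba000 r-xp 00000000 08:02 1179902 /usr/bin/python2.7
008b9000-008ba000 r--p 002b9000 08:02 1179902 /usr/bin/python2.7
008ba000-0092f000 rw-p 002ba000 08:02 1179902 /usr/bin/python2.7
0092f000-00941000 rw-p 00000000 00:00 0
01fa4000-02ae5000 rw-p 00000000 00:00 0 [heap]
7f72c4000000-7f72c4021000 rw-p 00000000 00:00 0
7f72c4021000-7f72c8000000 ---p 00000000 00:00 0
7f72c95eb000-7f72c9624000 r-xp 00000000 08:02 134910 /lib/x86_64-linux-gnu/libreadline.so.6.2
7f72c9624000-7f72c9824000 ---p 00039000 08:02 134910 /lib/x86_64-linux-gnu/libreadline.so.6.2
7f72c9824000-7f72c9826000 r--p 00039000 08:02 134910 /lib/x86_64-linux-gnu/libreadline.so.6.2
"""

def module_base(maps_text, module_name):
    """Lowest start address of any mapping whose file name matches module_name."""
    starts = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping: no pathname column
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if (path == module_name or name == module_name
                or name.startswith((module_name + ".", module_name + "-"))):
            starts.append(int(fields[0].split("-")[0], 16))
    return min(starts) if starts else None
```

For example, module_base(SAMPLE_MAPS, "libreadline") picks the first readline mapping, 0x7f72c95eb000, which is the base used for the offset calculation.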

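Now we have the address where the breakpoint should go, so let's set it. Remember how a software breakpoint works on Intel x86? The one-byte instruction int3 (opcode 0xCC) raises a breakpoint exception.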

    def bpSet(self, addr):
        """Set a breakpoint at the given virtual address"""
        data = self.readMemData(addr)
        self.bp_data = data  # save the original word for bpRemove()
        # x86_64 is little-endian: the lowest byte of the word is the byte at addr
        data = (data & 0xFFFFFFFFFFFFFF00) | 0xCC
        self.writeMemData(addr, data)

    def bpRemove(self, addr):
        """Remove the breakpoint and rewind RIP so the original instruction runs"""
        self.writeMemData(addr, self.bp_data)
        self._bpContinue()

    def _bpContinue(self):
        # After int3 executes, RIP points one byte past the 0xCC
        regs = user_regs_struct()
        self.readRegs(regs)
        regs.rip -= 1
        self.writeRegs(regs)
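The masking in bpSet only works because x86_64 is little-endian: the byte at addr is the least significant byte of the word that PEEKTEXT returns. A sketch that checks this on made-up code bytes, assuming Python 3; arm_breakpoint is a hypothetical stand-alone version of the bpSet calculation.

```python
import struct

WORD_MASK = (1 << 64) - 1
INT3 = 0xCC  # one-byte x86 breakpoint instruction

def arm_breakpoint(word):
    """Replace the lowest byte of a code word with int3 (0xCC)."""
    return ((word & WORD_MASK) & ~0xFF) | INT3

# Made-up code bytes: push %rbp (55); mov %rsp,%rbp (48 89 e5); four nops (90)
ORIGINAL_CODE = bytes([0x55, 0x48, 0x89, 0xE5, 0x90, 0x90, 0x90, 0x90])
original_word = struct.unpack("<Q", ORIGINAL_CODE)[0]  # what PEEKTEXT would return
patched_word = arm_breakpoint(original_word)           # what bpSet writes back
patched_code = struct.pack("<Q", patched_word)         # bytes the CPU now sees

if __name__ == "__main__":
    print(ORIGINAL_CODE.hex(), "->", patched_code.hex())
```

Only the first byte changes; the other seven are written back unchanged, and restoring the saved word removes the breakpoint.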

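Yes: we simply overwrite the first byte of the target instruction with 0xCC. When the program executes that instruction, the kernel sends SIGTRAP to the process, which stops and waits for the debugger. The debugger can then collect all kinds of debugging information from the process. Note that at this point RIP already points one byte past the 0xCC, which is why _bpContinue subtracts 1 from RIP before the original instruction is resumed.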
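The code above uses user_regs_struct, readRegs and writeRegs without defining them. One possible ctypes definition, assuming x86_64 and the register order of glibc's <sys/user.h>; read_regs, write_regs and rewind_after_int3 are hypothetical helpers, not the author's code. The `ptrace` argument is any ctypes binding of libc's ptrace, such as MyDebugger.c_ptrace.

```python
import ctypes

PTRACE_GETREGS = 12
PTRACE_SETREGS = 13

class user_regs_struct(ctypes.Structure):
    """x86_64 register layout from <sys/user.h>, in declaration order."""
    _fields_ = [(name, ctypes.c_ulonglong) for name in (
        "r15", "r14", "r13", "r12", "rbp", "rbx", "r11", "r10",
        "r9", "r8", "rax", "rcx", "rdx", "rsi", "rdi", "orig_rax",
        "rip", "cs", "eflags", "rsp", "ss", "fs_base", "gs_base",
        "ds", "es", "fs", "gs",
    )]

def read_regs(ptrace, pid, regs):
    """Fill `regs` with the tracee's registers (PTRACE_GETREGS)."""
    if ptrace(PTRACE_GETREGS, pid, None, ctypes.byref(regs)) != 0:
        # get_errno() is only meaningful if libc was loaded with use_errno=True
        raise OSError(ctypes.get_errno(), "PTRACE_GETREGS failed")

def write_regs(ptrace, pid, regs):
    """Write `regs` back to the tracee (PTRACE_SETREGS)."""
    if ptrace(PTRACE_SETREGS, pid, None, ctypes.byref(regs)) != 0:
        raise OSError(ctypes.get_errno(), "PTRACE_SETREGS failed")

def rewind_after_int3(regs, bp_addr):
    """After the 0xCC at bp_addr executes, RIP is bp_addr + 1; move it back."""
    if regs.rip != bp_addr + 1:
        raise ValueError("tracee did not stop at this breakpoint")
    regs.rip = bp_addr
```

Checking that RIP equals bp_addr + 1 before rewinding guards against adjusting RIP after a SIGTRAP that came from somewhere else.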

When we are done debugging, we must restore the original state: remove the breakpoint and let the process continue running.

Here is a test case.

    # main function here for test purposes
    def main():
        """This is main function for debugger"""
        if len(sys.argv) > 1:
            trace_pid = int(sys.argv[1])
        else:
            print("%s parameter needs one process ID" % sys.argv[0])
            sys.exit(-1)
        my_debugger = MyDebugger()
        my_debugger.attachPID(trace_pid)
        my_debugger.waitForPID()  # wait for the SIGSTOP caused by attaching
        addr = my_debugger.moduleResolve("libc", "printf")
        print("printf address: 0x%X" % addr)
        data = my_debugger.readMemData(addr)
        print("printf in libc data 0x%X" % data)
        my_debugger.bpSet(addr)
        my_debugger.continuePID()
        my_debugger.waitForPID()  # wait for the SIGTRAP raised by int3
        regs = user_regs_struct()
        my_debugger.readRegs(regs)
        print("Original RAX: 0x%X RAX: 0x%X RBX: 0x%X RCX: 0x%X RDX: 0x%X RBP: 0x%X"
              % (regs.orig_rax, regs.rax, regs.rbx, regs.rcx, regs.rdx, regs.rbp))
        data = my_debugger.readMemData(regs.rip - 1)  # lowest byte should be 0xCC
        print("Data in RIP: 0x%X" % data)
        my_debugger.bpRemove(addr)
        my_debugger.continuePID()

    if __name__ == '__main__':
        main()
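To try the test case you need a process that actually calls printf from libc. A hypothetical target program, assuming Python 3 on Linux (the module and function names are illustrative): run it with an iteration count, then start the debugger with the printed PID as its argument.

```python
import ctypes
import os
import sys
import time

libc = ctypes.CDLL(None)  # the same libc in which the debugger resolves printf

def tick(iterations, delay=1.0):
    """Call libc's printf `iterations` times; return the total characters printed."""
    total = 0
    for _ in range(iterations):
        total += libc.printf(b"tick\n")
        libc.fflush(None)  # flush C stdio so each line appears immediately
        time.sleep(delay)
    return total

if __name__ == "__main__" and len(sys.argv) > 1:
    print("attach to pid %d" % os.getpid(), flush=True)
    tick(int(sys.argv[1]))
```

Each loop iteration enters printf, so the breakpoint is hit within about one second of continuing the target.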

Finally, here is the class constructor:

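    def __init__(self):
        """init of MyDebugger"""
        # use_errno=True lets get_errno() see the errno set by ptrace()
        libc = CDLL("libc.so.6", use_errno=True)
        libdl = CDLL("libdl.so.2")
        self.c_ptrace = libc.ptrace
        # ptrace returns a signed long; -1 signals a (possible) error
        self.c_ptrace.restype = c_long
        self.c_waitpid = libc.waitpid
        self.c_perror = libc.perror
        self.trace_pid = 0
        self.is_attached = False
        self.status = 0
        self.c_dlopen = libdl.dlopen
        # Pointer-returning functions need an explicit restype, otherwise
        # ctypes truncates the 64-bit result to a C int
        self.c_dlopen.restype = c_void_p
        self.c_dlerror = libdl.dlerror
        self.c_dlerror.restype = c_char_p
        self.c_dlsym = libdl.dlsym
        # Return the symbol address as a 64-bit unsigned integer
        self.c_dlsym.restype = c_ulong
        self.c_handle = libc._handle
        self.dl_handle = libdl._handle
        self.bp_data = 0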
