Tombstone Analysis


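A coredump is a powerful tool for analyzing Android native exceptions and kernel exceptions. A coredump (core dump) can be understood as follows: when a process hits an unrecoverable exception, the OS captures the affected memory and packages it into a core dump for offline analysis. With a coredump you can not only pinpoint the file and line where the exception occurred, but also debug offline and reconstruct the failure step by step to find the real culprit. In many cases, however, the system goes down too abruptly (or something else goes wrong) and the coredump is never written, so all that remains is the residual information in a tombstone. You then have to analyze and fix the problem from limited debug information. This is where addr2line from the GNU tools family comes in: given an address and the matching symbol library file, addr2line "translates" it into the location in the source code. (In many cases the location the tool reports is only a reference and not necessarily the root cause, especially when memory has been corrupted.)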

The word tombstone literally means a gravestone; here it is a vivid name for the debugging clues a process leaves behind after it dies.

Below is the tombstone left behind by a crashed process, including its backtrace:

Revision: '0'
ABI: 'arm64'
pid: 24377, tid: 24377, name: gx_fpd  >>> /system/bin/gx_fpd <<<
signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
    x0   0000000000000000  x1   0000000000005f39  x2   0000000000000006  x3   0000000000000000
    x4   0000000000000000  x5   0000000000000001  x6   0000000000000000  x7   0000000000000000
    x8   0000000000000083  x9   0000007fb4eec110  x10  0000000000000002  x11  0000000000000003
    x12  0000000000000000  x13  0000000000000043  x14  0000007fcc97a768  x15  0000000000000000
    x16  0000007fb4b866a8  x17  0000007fb4b48b6c  x18  0000000000000002  x19  0000007fb4f670a8
    x20  0000007fb4f66fe8  x21  000000000000000b  x22  0000000000000006  x23  0000005582219f90
    x24  0000007fcc97ac90  x25  0000007fb4e04d18  x26  0000000000000000  x27  0000000000000000
    x28  0000000000000000  x29  0000007fcc97ab60  x30  0000007fb4b46308
    sp   0000007fcc97ab60  pc   0000007fb4b48b74  pstate 0000000020000000
    v0   2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e2e  v1   006370692e67756265642e6f6e6e6974
    v2   636f69203a4457540000000000000031  v3   80000000000000000000000000000000
    v4   00000000000000008020080280200800  v5   00000000400000000000040000000000
    v6   00000000000000000000000000000000  v7   80200802802008028020080280200802
    v8   00000000000000000000000000000000  v9   00000000000000000000000000000000
    v10  00000000000000000000000000000000  v11  00000000000000000000000000000000
    v12  00000000000000000000000000000000  v13  00000000000000000000000000000000
    v14  00000000000000000000000000000000  v15  00000000000000000000000000000000
    v16  40100401401004014010040140100401  v17  00000000a00aa0080000aaa880400400
    v18  00000000000000008020080280200800  v19  0833083a082f08240828083c082e0832
    v20  0c950c920c9a0c950c960c950c970c9a  v21  000000000000000000000055822a6c18
    v22  083a083e083408380834083b084f084b  v23  0c950c960c960c930c970c8d0c930c9a
    v24  000000000000000000000055822a6c08  v25  085908470837083f083e083f08410843
    v26  0c950c930c920c940c950c960c920c97  v27  000000000000000000000055822a6bf8
    v28  0862084c084e083b084608350826082e  v29  0c920c960c930c950c920c970c900c98
    v30  000000000000000000000055822a6be8  v31  0838083c0850085a08410851082f0846
    fpsr 00000000  fpcr 00000000

backtrace:
    #00 pc 000000000006ab74  /system/lib64/libc.so (tgkill+8)
    #01 pc 0000000000068304  /system/lib64/libc.so (pthread_kill+68)
    #02 pc 00000000000212f8  /system/lib64/libc.so (raise+28)
    #03 pc 000000000001ba98  /system/lib64/libc.so (abort+60)
    #04 pc 000000000002e104  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+216)
    #05 pc 0000000000004c5c  /system/bin/gx_fpd (main+236)
    #06 pc 0000000000019794  /system/lib64/libc.so (__libc_init+100)
    #07 pc 0000000000004d78  /system/bin/gx_fpd
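It may help to see how the register dump and the backtrace relate. The following is a minimal sketch, assuming (as is the case for Android tombstones) that the "pc" printed for each backtrace frame is an offset into the mapped library rather than an absolute address. The constant and function names are made up for illustration; the values are copied from the dump above.

```python
# Values copied from the tombstone; names are illustrative only.
PC_REGISTER = 0x0000007FB4B48B74  # absolute pc when the signal was raised
LR_REGISTER = 0x0000007FB4B46308  # x30, the return address of the current call
FRAME0_PC = 0x000000000006AB74    # "#00 pc" from the backtrace (tgkill+8)
FRAME1_PC = 0x0000000000068304    # "#01 pc" from the backtrace (pthread_kill+68)


def load_base(absolute_pc: int, frame_pc: int) -> int:
    """Address at which the library is mapped, assuming frame_pc is an offset."""
    return absolute_pc - frame_pc


libc_base = load_base(PC_REGISTER, FRAME0_PC)
print(hex(libc_base))                # 0x7fb4ade000, page aligned
print(hex(LR_REGISTER - libc_base))  # 0x68308, 4 bytes past frame #01
```

The 4-byte difference for frame #01 is consistent with the unwinder reporting the call instruction rather than the return address. This is also why the offsets from the backtrace, not the raw register values, are the addresses to hand to addr2line.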

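At first glance, the signal might be mistaken for an invalid memory access, such as a NULL dereference blocked by the MMU and reported as an ARM data abort. That kind of fault is normally delivered as SIGSEGV, though. Here the signal is signal 6 (SIGABRT) with code SI_TKILL and no fault address, and the top frames are tgkill, pthread_kill, raise and abort, which means the process aborted itself. The important step is to find out which source code each backtrace entry corresponds to, that is, to translate the backtrace into concrete source locations. The addr2line tool does exactly that.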
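A first triage of the signal line can be scripted. The following is a rough sketch, not a complete classifier: the function name and the returned hint strings are invented for illustration, and the rule that SIGABRT with SI_TKILL is "self-inflicted" is a heuristic that holds in the typical case where abort() appears in the backtrace.

```python
import re

# Matches e.g. "signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------"
SIGNAL_LINE = re.compile(
    r"signal (?P<signo>\d+) \((?P<signame>\w+)\), "
    r"code (?P<code>-?\d+) \((?P<codename>\w+)\)"
)


def describe_signal(line: str) -> str:
    """Return a short triage hint for a tombstone signal line."""
    m = SIGNAL_LINE.search(line)
    if m is None:
        raise ValueError("no signal line found")
    signame, codename = m["signame"], m["codename"]
    if signame == "SIGABRT" and codename == "SI_TKILL":
        # The process most likely raised the signal itself via abort().
        return "self-inflicted abort: look for abort() or an assert in the backtrace"
    if signame in ("SIGSEGV", "SIGBUS"):
        # The kernel delivered the signal after a faulting memory access.
        return "memory fault: inspect the fault addr and the faulting frame"
    return f"{signame} ({codename}): needs case-by-case analysis"


print(describe_signal("signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------"))
```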

Usage is as follows (be sure to use the unstripped libraries under the symbols directory):

addr2line -e symbols/system/lib64/xxx.so -f -C <addr>

./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 000000000006ab74
bionic/libc/arch-arm64/syscalls/tgkill.S:9

./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 0000000000068304
bionic/libc/bionic/pthread_kill.cpp:45 (discriminator 1)

./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 00000000000212f8
bionic/libc/bionic/raise.cpp:34 (discriminator 1)

./aarch64-linux-android-addr2line -e symbols/system/lib64/libc.so 000000000001ba98
bionic/libc/bionic/abort.cpp:47

./aarch64-linux-android-addr2line -e symbols/system/lib64/libbinder.so 000000000002e104
frameworks/native/libs/binder/IPCThreadState.cpp:608
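Typing one command per frame gets tedious on longer backtraces. The sketch below only builds the command lines from the backtrace text and does not run them. It assumes the unstripped libraries live under a `symbols` directory that mirrors the device paths and that the tool is at `./aarch64-linux-android-addr2line`, as in the commands above. The function name and defaults are illustrative.

```python
import re
import shlex

# One backtrace frame: "#NN pc <hex offset>  <absolute path on device> ..."
FRAME = re.compile(r"#(?P<index>\d+)\s+pc\s+(?P<pc>[0-9a-fA-F]+)\s+(?P<path>/\S+)")


def addr2line_commands(backtrace, symbols_dir="symbols",
                       tool="./aarch64-linux-android-addr2line"):
    """Build one addr2line argument list per frame found in the backtrace text."""
    commands = []
    for m in FRAME.finditer(backtrace):
        unstripped = symbols_dir + m["path"]  # e.g. symbols/system/lib64/libc.so
        commands.append([tool, "-f", "-C", "-e", unstripped, m["pc"]])
    return commands


SAMPLE = """\
    #00 pc 000000000006ab74  /system/lib64/libc.so (tgkill+8)
    #04 pc 000000000002e104  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+216)
"""

for cmd in addr2line_commands(SAMPLE):
    print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run` on a machine that has the toolchain and the matching symbols. Frames from binaries without symbols will still yield a command, but addr2line will not be able to resolve them.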

After translation, the backtrace looks like this:

backtrace:
    #00 pc 000000000006ab74  /system/lib64/libc.so (tgkill+8)    tgkill.S:9
    #01 pc 0000000000068304  /system/lib64/libc.so (pthread_kill+68)    pthread_kill.cpp:45
    #02 pc 00000000000212f8  /system/lib64/libc.so (raise+28)    raise.cpp:34
    #03 pc 000000000001ba98  /system/lib64/libc.so (abort+60)    abort.cpp:47
    #04 pc 000000000002e104  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+216)    IPCThreadState.cpp:608
    #05 pc 0000000000004c5c  /system/bin/gx_fpd (main+236)
    #06 pc 0000000000019794  /system/lib64/libc.so (__libc_init+100)
    #07 pc 0000000000004d78  /system/bin/gx_fpd

Note that gx_fpd is a third-party binary shipped without symbols, so its frames cannot be resolved to source locations.

Next, look at the code where the abort was raised, IPCThreadState.cpp:608:

void IPCThreadState::joinThreadPool(bool isMain)
{
    LOG_THREADPOOL("**** THREAD %p (PID %d) IS JOINING THE THREAD POOL\n", (void*)pthread_self(), getpid());

    mOut.writeInt32(isMain ? BC_ENTER_LOOPER : BC_REGISTER_LOOPER);

    // This thread may have been spawned by a thread that was in the background
    // scheduling group, so first we will make sure it is in the foreground
    // one to avoid performing an initial transaction in the background.
    set_sched_policy(mMyThreadId, SP_FOREGROUND);

    status_t result;
    do {
        processPendingDerefs();
        // now get the next command to be processed, waiting if necessary
        result = getAndExecuteCommand();

        if (result < NO_ERROR && result != TIMED_OUT && result != -ECONNREFUSED && result != -EBADF) {
            ALOGE("getAndExecuteCommand(fd=%d) returned unexpected error %d, aborting",
                  mProcess->mDriverFD, result);
            abort();   // <======= LINE 608
        }

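The code above shows that this abort was not caused by a NULL pointer. It was added deliberately to trap behavior the program does not expect. The next step is to work out why result ended up with an unexpected value and fell into this trap. This is core binder IPC code, so further debugging requires a solid understanding of how binder works and close familiarity with its source.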
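To make the trap condition concrete, here is a small Python restatement of the `if` in the excerpt. It is only a sketch for reasoning about which return values reach abort(): it assumes TIMED_OUT equals -ETIMEDOUT (as in Android's utils/Errors.h) and takes errno values from the host running the script, which may differ from the device. The function name is invented.

```python
import errno

NO_ERROR = 0
TIMED_OUT = -errno.ETIMEDOUT  # assumption: mirrors Android's status_t definition


def would_abort(result: int) -> bool:
    """Mirror of the condition that guards abort() in joinThreadPool."""
    return (
        result < NO_ERROR
        and result != TIMED_OUT
        and result != -errno.ECONNREFUSED
        and result != -errno.EBADF
    )


# Any negative status other than the three tolerated ones reaches abort().
for status in (0, TIMED_OUT, -errno.EBADF, -errno.EINVAL, -errno.EFAULT):
    print(status, would_abort(status))
```

The unexpected error value itself is printed by the ALOGE call just before abort(), so the log line "getAndExecuteCommand(fd=%d) returned unexpected error %d, aborting" is the first thing to look for in logcat.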
