Generating a core file when a Linux program crashes


When a program crashes, Linux provides a mechanism for producing a core file. It is roughly the counterpart of the minidump crash file on Windows.

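A while ago our program would crash after running for a long time, and everyone was adding file logging all over the place.
I already knew that core files existed back then, but I had never experimented with them.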

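Drawbacks of log files:
* Even with a lot of logging, the logs may still miss the problem, and when they miss, the logging has to be made finer-grained. For a bug that only appears after a very long run, locating it through file logs costs far too much time. If the test cycle is unaffordable (e.g. the bug shows up once per month of running, so there are only 12 chances a year to refine the logs, which can sink a large project), that is real trouble.
* Logs normally record the flow points we care about. Once bug-hunting logs are added, the project gets ugly. Whether those logs should stay after the bug is fixed is open to debate, and if they are removed they have to be added again the next time something goes wrong.
* Even with an efficient logging class, the I/O involved (writing files) has a large impact on runtime performance, especially for log statements that sit directly or indirectly inside loops.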

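Today I was reading "The Art of Debugging with GDB, DDD, and Eclipse", which covers how to produce a core file when a Linux program crashes. I ran an experiment to verify it, and it works very well. No more hunting bugs with file logs :)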

Steps to produce a core file

The project itself does not need to change.
Write a .sh script that launches the compiled target program and, before launching it, runs ulimit -c unlimited

#!/bin/sh
# @file run_gen_core.sh
chmod 777 ./test_gdb_no_source_debug
ulimit -c unlimited
./test_gdb_no_source_debug
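The wrapper script is the approach used in this experiment. As an alternative sketch, a process can also raise its own core-size soft limit through the POSIX getrlimit/setrlimit calls, which is roughly what ulimit -c unlimited asks the shell to do. The helper name enable_core_dumps is hypothetical and is not part of the demo project. It only raises the soft limit up to the existing hard limit, so it cannot help when the hard limit is already 0.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE
#include <cstdio>          // perror

// Hypothetical helper (not part of the demo project): raise the soft
// core-file size limit of the current process to its hard limit.
// Returns true on success.
bool enable_core_dumps() {
    struct rlimit lim;
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("getrlimit");
        return false;
    }
    // An unprivileged process may not go above the hard limit,
    // so the soft limit is set to exactly that value.
    lim.rlim_cur = lim.rlim_max;
    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        std::perror("setrlimit");
        return false;
    }
    return true;
}
```

If this route were taken, the call would go at the very top of main, before any code that might crash. The script approach has the advantage that the project stays untouched.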

Project download

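When the Makefile is copied from the blog and pasted into a newly created Makefile, it no longer builds (most likely because the tab characters that recipe lines require get lost along the way).
So I saved a copy of the experiment project on CSDN first.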
case_generate_linux_crash_core_file.zip

A demo that simulates a segmentation fault

The steps of the experiment are recorded in the source as well, so it looks a bit messy.

// @file main.cpp
// @brief test debugging an ELF with gdb when no source code is available

// if gdb is not installed on Debian, run the command below to install it
// apt-get install gdb

// run the command below to rebuild the project
// make rebuild

// run the command below to keep only the output file
// make only_stay_output

// ================================================================================
// gdb
// ================================================================================
// now gdb can debug ./test_gdb_no_source_debug without source code
// gdb ./test_gdb_no_source_debug

// gdb documentation
// http://www.gnu.org/software/gdb/documentation/

// if you remember only part of a command, press the Tab key to complete it

// view the address of the main function (main is exported by this Makefile's build)
// (gdb) info line main
// Line 14 of "main.cpp" starts at address 0x4006f6 <main(int, char**)> and ends at 0x400705 <main(int, char**)+15>.

// view the disassembly of main
// (gdb) disassemble 0x4006f6

/**
(gdb) disassemble 0x4006f6
Dump of assembler code for function main(int, char**):
   0x00000000004006f6 <+0>:     push   %rbp
   0x00000000004006f7 <+1>:     mov    %rsp,%rbp
   0x00000000004006fa <+4>:     sub    $0x20,%rsp
   0x00000000004006fe <+8>:     mov    %edi,-0x14(%rbp)
   0x0000000000400701 <+11>:    mov    %rsi,-0x20(%rbp)
   0x0000000000400705 <+15>:    movq   $0x0,-0x8(%rbp)
   0x000000000040070d <+23>:    movl   $0x0,-0xc(%rbp)
   0x0000000000400714 <+30>:    mov    $0x0,%edi
   0x0000000000400719 <+35>:    callq  0x4005f0 <time@plt>
   0x000000000040071e <+40>:    mov    %eax,%edi
   0x0000000000400720 <+42>:    callq  0x4005e0 <srand@plt>
   0x0000000000400725 <+47>:    movl   $0xa,-0xc(%rbp)
   0x000000000040072c <+54>:    mov    -0xc(%rbp),%eax
   0x000000000040072f <+57>:    mov    %eax,%edi
   0x0000000000400731 <+59>:    callq  0x400768 <Add1ToN(int)>
   0x0000000000400736 <+64>:    mov    %rax,-0x8(%rbp)
   0x000000000040073a <+68>:    mov    -0x8(%rbp),%rdx
   0x000000000040073e <+72>:    mov    -0xc(%rbp),%eax
   0x0000000000400741 <+75>:    mov    %eax,%esi
   0x0000000000400743 <+77>:    mov    $0x400824,%edi
   0x0000000000400748 <+82>:    mov    $0x0,%eax
   0x000000000040074d <+87>:    callq  0x400590 <printf@plt>
   0x0000000000400752 <+92>:    mov    $0x400837,%edi
   0x0000000000400757 <+97>:    callq  0x4005b0 <puts@plt>
   0x000000000040075c <+102>:   callq  0x4005d0 <getchar@plt>
   0x0000000000400761 <+107>:   mov    $0x0,%eax
   0x0000000000400766 <+112>:   leaveq
   0x0000000000400767 <+113>:   retq
End of assembler dump.
*/

// set a breakpoint on the first code line of main
// (gdb) break main

// set a breakpoint on an address
// (gdb) break *xxxxxxxx
// e.g. (gdb) break *0x00000000004006F6

// view the breakpoints that were set
// (gdb) info breakpoints

// run until a breakpoint is hit
// (gdb) run

// when stopped at the breakpoint, view the parameters of main
/**
(gdb) info args
argc = 1
argv = 0x7fffffffebc8
*/

// view the current disassembly
// (gdb) disassemble

// step one machine instruction
// (gdb) nexti

// view registers
// (gdb) info registers eax
// (gdb) info registers
// (gdb) info all-registers

// set a register
// (gdb) set $rax = 1
// (gdb) set $rax += 1
// (gdb) set $rax -= 1

// view the stack
// (gdb) x/8x $rsp

// execute one C code line
// (gdb) next
// (gdb) step 1

// quit gdb
// (gdb) quit

// that covers the common commands; look up anything else as needed

// ================================================================================
// DDD
// ================================================================================
// append one line to /etc/apt/sources.list
// deb http://ftp.hk.debian.org/debian sid main

// then run
// apt-get update

// then run
// apt-get install ddd

// Media change: please insert the disc labeled
// “Debian GNU/Linux 7.5.0 _Wheezy_ - Official amd64 DVD Binary-1 20140426-13:37
// quite a lot gets upgraded; I will try ddd next time and stick to gdb for now

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

long Add1ToN(int iN);

#define MY_DEBUG
#define NUM_N 10

int main(int argc, char* argv[])
{
    long lSum = 0;
    int iN = 0;

    srand((int)time(NULL));

#ifdef MY_DEBUG
    iN = NUM_N;
#else
    iN = rand();
#endif

    lSum = Add1ToN(iN);
    printf("Add1ToN(%d) = %ld\n", iN, lSum);

    printf("END\n");
    return 0;
}

long Add1ToN(int iN)
{
    char* pTmp = NULL;
    long lRc = 1;

    if (iN > 1) {
        do {
            *pTmp = 0; // Segmentation fault
            // lRc /= 0; // Floating point exception
            lRc += iN;
        } while (--iN > 1);
    }

    return lRc;
}
# ==============================================================================
# makefile
#   lostspeed 2017-06-24
# ==============================================================================
#
# to get a core file when the program crashes, first run
# ulimit -c unlimited
# when the program crashes, the core file is created.
# view the core file with gdb as follows
# gdb object_file core_file
# gdb ./test_gdb_no_source_debug ./core
# if the source and object file layout is the same as when the project was built,
# the crash point is shown as below
# ==============================================================================
# root@debian750:/home/lostspeed/test# gdb ./test_gdb_no_source_debug ./core
# GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
# Copyright (C) 2014 Free Software Foundation, Inc.
# License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
# This is free software: you are free to change and redistribute it.
# There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
# and "show warranty" for details.
# This GDB was configured as "x86_64-linux-gnu".
# Type "show configuration" for configuration details.
# For bug reporting instructions, please see:
# <http://www.gnu.org/software/gdb/bugs/>.
# Find the GDB manual and other documentation resources online at:
# <http://www.gnu.org/software/gdb/documentation/>.
# For help, type "help".
# Type "apropos word" to search for commands related to "word"...
# Reading symbols from ./test_gdb_no_source_debug...done.
# warning: core file may not match specified executable file.
# [New LWP 11452]
# [Thread debugging using libthread_db enabled]
# Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
# Core was generated by `./test_gdb_no_source_debug'.
# Program terminated with signal SIGSEGV, Segmentation fault.
# #0  0x0000000000400734 in Add1ToN (iN=10) at main.cpp:165
# warning: Source file is more recent than executable.
# 165               *pTmp = 0; // Segmentation fault
# (gdb)
# ==============================================================================
# now we know the crash point and can patch the source code to fix the crash :)
# with only ./test_gdb_no_source_debug and ./core but no source code, gdb can
# still show the segmentation fault "at main.cpp:165", so we know where to look
# in the project to fix the crash

CC = g++

#   -W
#   -Wall

CFLAGS = -std=c++11 \
    -W \
    -Wall \
    -g

BIN = test_gdb_no_source_debug
INC = -I.
LIBS = -lstdc++ \
  -pthread
LIBPATH = /usr/local/lib

SUBDIR = $(shell ls -d */)

ROOTSRC = $(wildcard *.cpp)
ROOTOBJ = $(ROOTSRC:.cpp=.o)

SUBSRC = $(shell find $(SUBDIR) -name '*.cpp')
SUBOBJ = $(SUBSRC:.cpp=.o)

help:
	clear
	@echo make help
	@echo command list:
	@echo make clean
	@echo make all
	@echo make rebuild
	@echo make only_stay_output

clean:
	clear
	@echo make clean
	rm -f $(BIN) $(ROOTOBJ) $(SUBOBJ)
	ls -l

all:$(BIN)
	@echo make all
	ls -l

$(BIN) : $(ROOTOBJ) $(SUBOBJ)
	$(CC) $(CFLAGS) -o $@ $^ -L$(LIBPATH) $(LIBS)

.cpp.o:
	$(CC) -c $(CFLAGS) $^ -o $@ $(INC)

rebuild:
	make clean
	make all
	chmod 777 ./run_gen_core.sh

only_stay_output:
	rm *.o
	rm *.cpp
	rm ./Makefile
	ls -l

Simulating the crash and inspecting the core file to locate the crash point

Upload the project to the Debian server; it consists of 3 files

root@debian750:/home/lostspeed/test# ls -l
total 16
-rw-r--r-- 1 root root 4459 Jun 27 16:55 main.cpp
-rw-r--r-- 1 root root 3124 Jun 27 23:04 Makefile
-rw-r--r-- 1 root root  118 Jun 27 23:24 run_gen_core.sh

Build the project

make rebuild

Remove the source files, keeping only the run script and the generated ELF binary

make only_stay_output

Run the script so that the target program runs and produces a core file

root@debian750:/home/lostspeed/test# ./run_gen_core.sh
Segmentation fault (core dumped)
root@debian750:/home/lostspeed/test# ls -l
total 216
-rw------- 1 root root 434176 Jun 27 23:37 core
-rwxrwxrwx 1 root root    118 Jun 27 23:24 run_gen_core.sh
-rwxrwxrwx 1 root root   7576 Jun 27 23:34 test_gdb_no_source_debug
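In this run the core file landed in the working directory under the name core. On Linux the file name and location are governed by /proc/sys/kernel/core_pattern, so on another machine the file may end up elsewhere or be piped to a helper program. The following is a minimal sketch for checking this; the names CoreTarget, classify_core_pattern and read_core_pattern are made up for illustration, and the classification looks only at the first character of the pattern.

```cpp
#include <fstream>
#include <string>

// Where a core dump goes, judged from the first character of the pattern.
enum class CoreTarget { Piped, AbsolutePath, RelativeToCwd };

// A leading '|' means the dump is piped to a helper program, a leading '/'
// means an absolute path, and anything else is treated here as a file name
// relative to the working directory of the crashing process.
CoreTarget classify_core_pattern(const std::string& pattern) {
    if (!pattern.empty() && pattern[0] == '|') return CoreTarget::Piped;
    if (!pattern.empty() && pattern[0] == '/') return CoreTarget::AbsolutePath;
    return CoreTarget::RelativeToCwd;
}

// Read the current pattern from procfs.
// Returns an empty string if the file cannot be read.
std::string read_core_pattern() {
    std::ifstream in("/proc/sys/kernel/core_pattern");
    std::string line;
    std::getline(in, line);
    return line;
}
```

Calling classify_core_pattern(read_core_pattern()) before a long test run gives an early hint about whether a core file should be expected next to the binary, as it was here.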

Open the binary and the core file with GDB to locate the crash point of the segmentation fault

root@debian750:/home/lostspeed/test# gdb ./test_gdb_no_source_debug ./core
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./test_gdb_no_source_debug...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New LWP 5517]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `./test_gdb_no_source_debug'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000000000400734 in Add1ToN(int) ()
(gdb)

Analysing the crash point of the segmentation fault

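How precisely the crash can be located may vary with the server environment.
In every case, though, the crashing function can be identified.
On the virtual machine at work, the line that caused the crash could be located as well.
Even when only the crashing function is known, the amount of debug logging needed shrinks considerably (and those few debug logs are quick to remove once debugging is done). If you wrote the program yourself, a look at the offending function is usually enough to see what is wrong.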

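If the bug really does take a month of running to show up, the offending function can be extracted into a test case that reproduces the problem immediately. That makes fixing the bug much easier too.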
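One possible shape for such a test case, as a sketch under assumptions: run the extracted function in a forked child process and let the parent check which signal, if any, killed it. The names run_in_child, suspect_null_write and harmless are hypothetical. suspect_null_write merely stands in for whatever function the core file pointed at; here it repeats the null-pointer write from the demo.

```cpp
#include <sys/types.h>
#include <sys/wait.h>  // waitpid, WIFSIGNALED, WTERMSIG
#include <unistd.h>    // fork, _exit
#include <cassert>
#include <csignal>     // SIGSEGV

// Run fn in a forked child so that a crash does not take down the test itself.
// Returns the number of the signal that killed the child, 0 if the child
// exited on its own, or -1 if fork or waitpid failed.
int run_in_child(void (*fn)()) {
    assert(fn != nullptr);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        fn();
        _exit(0);  // reached only if fn did not crash
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

// Stand-in for the extracted suspect function: writes through a null pointer.
void suspect_null_write() {
    // Reading the pointer through a volatile variable keeps the compiler
    // from reasoning about the null value at compile time.
    char* volatile hidden_null = nullptr;
    char* p = hidden_null;
    *p = 0;
}

// A function that does nothing, used as the non-crashing control case.
void harmless() {}
```

A test would then expect run_in_child(suspect_null_write) to report SIGSEGV before the fix and 0 after it, with run_in_child(harmless) reporting 0 throughout.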

In this demo, Add1ToN assigns through a null pointer, and that is what causes the segmentation fault.
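A patched version might look like the sketch below. It assumes that the write through pTmp existed only to trigger the crash, so the fix is simply to drop it; the rest of the loop is kept exactly as in the demo, including the fact that any iN of 1 or less returns 1.

```cpp
// Same loop as the demo's Add1ToN, minus the write through the null pointer.
// Like the original, it returns 1 for any iN <= 1.
long Add1ToN(int iN) {
    long lRc = 1;

    if (iN > 1) {
        do {
            lRc += iN;  // adds iN, iN-1, ..., 2 on top of the initial 1
        } while (--iN > 1);
    }

    return lRc;
}
```

With iN = 10, the value used when MY_DEBUG is defined, this returns 55.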