Core Dumps on Linux

Source: Internet | Published by: 多得美工学院 | Editor: 程序博客网 | Time: 2024/04/30 05:38

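1. A brief introduction to core dumps

A core dump file stores the memory image of a program at the moment it crashes. The core file's name and output path can be configured through sysctl or through /proc. Core dump files are normally in ELF format and contain the program's memory, register state, stack pointer, memory-management information, and so on. Many programs and operating systems automatically produce a core file when they fail. Core dumps are useful in many situations: when a Linux system is running stress tests or is under heavy load, it may hang or simply panic, and a core dump is then often the only thing that can help you analyze and fix the problem. Usually, when a process or the kernel receives…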

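2. Commands related to core files

The ulimit command displays and changes the shell's resource limits. The "core file size" entry is the maximum size of a core dump file, measured in blocks (1024 bytes in bash). A program's behavior when it crashes cannot be estimated from its normal behavior: errors such as buffer overflows can corrupt the stack, so a variable often ends up holding a garbage value, and if the program then uses that value as the size of a memory allocation it may occupy far more memory than usual. So however little memory the program uses in normal operation, it is best to set the limit to unlimited to make sure a complete core file is written. The usual command is ulimit -c unlimited. If the limit is too small and the core file is truncated, gdb reports errors when loading it.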

    ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 129159
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 129159
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited

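For a running program, cat /proc/pid/limits shows the resource limits of that process. The soft limit of "Max core file size" is the largest core dump file the process may write; 0 means no core dump file is generated. The example below uses pid=1.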

    # cat /proc/1/limits
    Limit                     Soft Limit           Hard Limit           Units
    Max cpu time              unlimited            unlimited            seconds
    Max file size             unlimited            unlimited            bytes
    Max data size             unlimited            unlimited            bytes
    Max stack size            8388608              unlimited            bytes
    Max core file size        0                    unlimited            bytes
    Max resident set          unlimited            unlimited            bytes
    Max processes             129159               129159               processes
    Max open files            1024                 4096                 files
    Max locked memory         65536                65536                bytes
    Max address space         unlimited            unlimited            bytes
    Max file locks            unlimited            unlimited            locks
    Max pending signals       129159               129159               signals
    Max msgqueue size         819200               819200               bytes
    Max nice priority         0                    0
    Max realtime priority     0                    0
    Max realtime timeout      unlimited            unlimited            us

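If a program calls seteuid() or setegid() to change its effective user or group, the kernel by default clears the process's dumpable flag and no core dump is generated for it (see PR_SET_DUMPABLE in man prctl and the fs.suid_dumpable sysctl). Many server programs call seteuid() or daemon(). A daemon also does not pick up a limit that was set with ulimit in some interactive shell, so for such a process to produce a core dump it should call getrlimit and setrlimit itself to change the core size limit. On Linux, man setrlimit describes both functions: RLIMIT_CORE selects the core dump limit, and the man page explains the other resources. Their use is shown later in this article.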
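After a call such as seteuid(), Linux resets the process's "dumpable" attribute, which is one reason a core file may never appear even when the size limit allows it. The following is a minimal sketch of restoring the attribute from inside the process, assuming Linux and the prctl(2) interface; the helper name restore_dumpable is made up for this illustration.

```c
#include <sys/prctl.h>

/* Hypothetical helper: re-enable core dumps after a credential change.
 * Returns the dumpable flag reported by the kernel afterwards
 * (1 means a core file may be written), or -1 on error. */
int restore_dumpable(void)
{
    /* Ask the kernel to mark this process as dumpable again. */
    if (prctl(PR_SET_DUMPABLE, 1UL, 0UL, 0UL, 0UL) == -1)
        return -1;
    /* Read the flag back so the caller can verify the result. */
    return prctl(PR_GET_DUMPABLE, 0UL, 0UL, 0UL, 0UL);
}
```

A daemon would call this right after dropping privileges. The RLIMIT_CORE limit still applies independently of this flag.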
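Once core dumps are enabled, you can also set the file name format and the output path. /proc/sys/kernel/core_uses_pid controls whether the pid is included in the file name, and /proc/sys/kernel/core_pattern holds a format string for the location and name of the core file. For example, echo "/corefile/core-%e-%p-%t" > /proc/sys/kernel/core_pattern stores core files in the /corefile directory with names of the form core-<command name>-<pid>-<timestamp>.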
Parameter list:
%p - insert pid into filename
%u - insert current uid into filename
%g - insert current gid into filename
%s - insert signal that caused the coredump into the filename
%t - insert UNIX time that the coredump occurred into filename
%h - insert hostname where the coredump happened into filename
%e - insert coredumping executable name into filename
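To make the specifiers concrete, here is a user-space sketch of how such a pattern could be expanded. It is not the kernel's format_corename; the struct core_info type and the expand_pattern function are hypothetical, and only %e, %p, %t, %s, and %% are handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical inputs for the specifiers handled in this sketch. */
struct core_info {
    const char *exe;   /* %e: executable name */
    int pid;           /* %p: process id */
    long when;         /* %t: UNIX time of the dump */
    int sig;           /* %s: signal number */
};

/* Expand a core_pattern-style template into out (capacity n).
 * Unsupported specifiers and a trailing '%' are dropped.
 * Returns 0 on success, -1 if out is too small. */
int expand_pattern(const char *pat, const struct core_info *ci,
                   char *out, size_t n)
{
    size_t used = 0;

    if (n == 0)
        return -1;
    out[0] = '\0';
    while (*pat != '\0') {
        char piece[64] = "";
        size_t len;

        if (*pat != '%') {
            piece[0] = *pat++;           /* ordinary character: copy as-is */
        } else {
            char spec = *++pat;          /* character after '%' */
            if (spec == '\0')
                break;                   /* trailing '%' is dropped */
            pat++;
            switch (spec) {
            case '%': snprintf(piece, sizeof piece, "%%"); break;
            case 'e': snprintf(piece, sizeof piece, "%s", ci->exe); break;
            case 'p': snprintf(piece, sizeof piece, "%d", ci->pid); break;
            case 't': snprintf(piece, sizeof piece, "%ld", ci->when); break;
            case 's': snprintf(piece, sizeof piece, "%d", ci->sig); break;
            default: break;              /* not supported in this sketch */
            }
        }
        len = strlen(piece);
        if (used + len >= n)
            return -1;                   /* no room for piece plus NUL */
        memcpy(out + used, piece, len + 1);
        used += len;
    }
    return 0;
}
```

With exe "myapp", pid 1234, and time 1700000000, the pattern /corefile/core-%e-%p-%t expands to /corefile/core-myapp-1234-1700000000.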
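The core dump function is do_coredump() in fs/exec.c of the kernel source tree. If core dump generation fails, you can add printk calls inside do_coredump to find out where it gives up. The source code of do_coredump is shown below.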

    void do_coredump(long signr, int exit_code, struct pt_regs *regs)
    {
        struct core_state core_state;
        char corename[CORENAME_MAX_SIZE + 1];
        struct mm_struct *mm = current->mm;
        struct linux_binfmt * binfmt;
        const struct cred *old_cred;
        struct cred *cred;
        int retval = 0;
        int flag = 0;
        int ispipe;
        static atomic_t core_dump_count = ATOMIC_INIT(0);
        struct coredump_params cprm = {
            .signr = signr,
            .regs = regs,
            .limit = rlimit(RLIMIT_CORE),
            /*
             * We must use the same mm->flags while dumping core to avoid
             * inconsistency of bit flags, since this flag is not protected
             * by any locks.
             */
            .mm_flags = mm->flags,
        };

        audit_core_dumps(signr);

        binfmt = mm->binfmt;
        // binfmt->core_dump is set to the core_dump function when the kernel
        // config macro is enabled; it is NULL when the macro is off
        if (!binfmt || !binfmt->core_dump)
            goto fail;
        if (!__get_dumpable(cprm.mm_flags))
            goto fail;

        cred = prepare_creds();
        if (!cred)
            goto fail;
        /*
         * We cannot trust fsuid as being the "true" uid of the
         * process nor do we know its entire history. We only know it
         * was tainted so we dump it as root in mode 2.
         */
        if (__get_dumpable(cprm.mm_flags) == 2) {
            /* Setuid core dump mode */
            flag = O_EXCL;      /* Stop rewrite attacks */
            cred->fsuid = 0;    /* Dump root private */
        }

        retval = coredump_wait(exit_code, &core_state);
        if (retval < 0)
            goto fail_creds;

        old_cred = override_creds(cred);

        /*
         * Clear any false indication of pending signals that might
         * be seen by the filesystem code called to write the core file.
         */
        clear_thread_flag(TIF_SIGPENDING);

        // build the core file name from the value of /proc/sys/kernel/core_pattern
        ispipe = format_corename(corename, signr);

        if (ispipe) {
            int dump_count;
            char **helper_argv;

            if (cprm.limit == 1) {
                /*
                 * Normally core limits are irrelevant to pipes, since
                 * we're not writing to the file system, but we use
                 * cprm.limit of 1 here as a speacial value. Any
                 * non-1 limit gets set to RLIM_INFINITY below, but
                 * a limit of 0 skips the dump.  This is a consistent
                 * way to catch recursive crashes.  We can still crash
                 * if the core_pattern binary sets RLIM_CORE =  !1
                 * but it runs as root, and can do lots of stupid things
                 * Note that we use task_tgid_vnr here to grab the pid
                 * of the process group leader.  That way we get the
                 * right pid if a thread in a multi-threaded
                 * core_pattern process dies.
                 */
                printk(KERN_WARNING
                    "Process %d(%s) has RLIMIT_CORE set to 1\n",
                    task_tgid_vnr(current), current->comm);
                printk(KERN_WARNING "Aborting core\n");
                goto fail_unlock;
            }
            cprm.limit = RLIM_INFINITY;

            dump_count = atomic_inc_return(&core_dump_count);
            if (core_pipe_limit && (core_pipe_limit < dump_count)) {
                printk(KERN_WARNING "Pid %d(%s) over core_pipe_limit\n",
                       task_tgid_vnr(current), current->comm);
                printk(KERN_WARNING "Skipping core dump\n");
                goto fail_dropcount;
            }

            helper_argv = argv_split(GFP_KERNEL, corename+1, NULL);
            if (!helper_argv) {
                printk(KERN_WARNING "%s failed to allocate memory\n",
                       __func__);
                goto fail_dropcount;
            }

            retval = call_usermodehelper_fns(helper_argv[0], helper_argv,
                        NULL, UMH_WAIT_EXEC, umh_pipe_setup,
                        NULL, &cprm);
            argv_free(helper_argv);
            if (retval) {
                printk(KERN_INFO "Core dump to %s pipe failed\n",
                       corename);
                goto close_fail;
            }
        } else {
            struct inode *inode;

            // the process's soft limit must be at least the minimum core dump
            // size set for this binary format (min_coredump = PAGE_SIZE)
            if (cprm.limit < binfmt->min_coredump)
                goto fail_unlock;

            cprm.file = filp_open(corename,
                     O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag,
                     0600);
            if (IS_ERR(cprm.file))
                goto fail_unlock;

            inode = cprm.file->f_path.dentry->d_inode;
            if (inode->i_nlink > 1)
                goto close_fail;
            if (d_unhashed(cprm.file->f_path.dentry))
                goto close_fail;
            /*
             * AK: actually i see no reason to not allow this for named
             * pipes etc, but keep the previous behaviour for now.
             */
            if (!S_ISREG(inode->i_mode))
                goto close_fail;
            /*
             * Dont allow local users get cute and trick others to coredump
             * into their pre-created files.
             */
            if (inode->i_uid != current_fsuid())
                goto close_fail;
            if (!cprm.file->f_op || !cprm.file->f_op->write)
                goto close_fail;
            if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
                goto close_fail;
        }

        // call core_dump to write registers and other state into the core file
        retval = binfmt->core_dump(&cprm);

        if (retval)
            current->signal->group_exit_code |= 0x80;

        if (ispipe && core_pipe_limit)
            wait_for_dump_helpers(cprm.file);
    close_fail:
        if (cprm.file)
            filp_close(cprm.file, NULL);
    fail_dropcount:
        if (ispipe)
            atomic_dec(&core_dump_count);
    fail_unlock:
        coredump_finish(mm);
        revert_creds(old_cred);
    fail_creds:
        put_cred(cred);
    fail:
        return;
    }

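The steps to enable core dumps on a system are:

Enable the kernel configuration macro that provides the core dump function (for ELF binaries this is typically CONFIG_ELF_CORE). Also rebuild the process you want to debug, and compile it with -g so that gdb can display useful information.
Set the core file size limit from the command line: ulimit -c unlimited removes the limit, and ulimit -c 1024 allows files of up to 1024 blocks (1024 KB in bash). If the process is detached from the terminal, use getrlimit and setrlimit as follows: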
```
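#include <sys/resource.h>

struct rlimit rlp;

/* Read the current soft and hard limits for core file size. */
getrlimit(RLIMIT_CORE, &rlp);
/* Raise the soft limit to 4 MiB; it must not exceed rlp.rlim_max. */
rlp.rlim_cur = 4 * 1024 * 1024;
/* Apply the new limit to the calling process. */
setrlimit(RLIMIT_CORE, &rlp);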
```
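Set the output path and file name, either by writing to /proc/sys/kernel/core_pattern and /proc/sys/kernel/core_uses_pid, or with sysctl: sysctl -w kernel.core_uses_pid=0 and sysctl -w kernel.core_pattern=/var/core.%e.%p (sysctl -w does not accept spaces around the = sign).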

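3. Debugging

If the steps above succeed, a core file is generated. On a large server the core file can be debugged with gdb directly; this section only covers how to debug with gdb on an embedded system. Copy the generated core file off the device, for example with tftp, ftp, or a USB drive. If the core dump file cannot be copied off the embedded device, all the earlier setup is wasted.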

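Once you have the core file, place it in the project directory of the corresponding process, next to the binary built for that process. Then cd into that directory and run the cross-toolchain gdb:
XXX-XXX-XX-gdb bin core
This starts gdb. Debugging may need some shared libraries, so you also have to give gdb the absolute path prefix under which it can find them. On embedded Linux this is normally the generated root file system.
Initially you will see a lot of error messages; they can be ignored. Now, at the gdb prompt, type: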
(gdb) set solib-absolute-prefix SRCPATH/targets/PROFILE/fs.install
(gdb) bt
After bt, gdb prints the stack trace.
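To try this workflow it helps to have a program that crashes on demand, and a way to confirm that the kernel really wrote a core file. The sketch below assumes a POSIX system; run_and_report and crash_with_abort are names invented for this example.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn in a child process and report how it ended.
 * Returns the terminating signal number, 0 if the child exited
 * normally, or -1 if fork/waitpid failed. *dumped is set to 1 when
 * the kernel reports that a core file was written. */
int run_and_report(void (*fn)(void), int *dumped)
{
    int status;
    pid_t pid = fork();

    *dumped = 0;
    if (pid < 0)
        return -1;
    if (pid == 0) {
        fn();          /* child: expected to crash inside fn */
        _exit(0);      /* reached only if fn returns */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFSIGNALED(status))
        return 0;
#ifdef WCOREDUMP
    *dumped = WCOREDUMP(status) ? 1 : 0;
#endif
    return WTERMSIG(status);
}

/* Crash deliberately: abort() raises SIGABRT, whose default action
 * is to terminate the process and dump core. */
void crash_with_abort(void)
{
    abort();
}
```

If run_and_report(crash_with_abort, &dumped) returns SIGABRT but dumped stays 0, the usual suspects are the RLIMIT_CORE soft limit, the dumpable flag, or a core_pattern that points at an unwritable directory.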

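There are many reasons a program may dump core. Here is a summary based on past experience:

1. Out-of-bounds memory access

Using a wrong index, so that an array is accessed out of bounds.
Searching a string and relying on its terminator to detect the end, when the string was never properly terminated.
Using string functions such as strcpy, strcat, sprintf, strcmp, and strcasecmp in a way that reads or writes past the end of the target string. Use strncpy, strlcpy, strncat, strlcat, snprintf, strncmp, strncasecmp, and similar functions to prevent out-of-bounds reads and writes. (Note that strncpy does not NUL-terminate the destination when the source is too long.)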
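As an illustration of the bounded variants, the following sketch copies a string with snprintf so that the destination is always terminated and the caller learns about truncation. The helper name copy_bounded is made up for this example.

```c
#include <stdio.h>

/* Copy src into dst (capacity cap), always NUL-terminating.
 * Returns 1 if src had to be truncated (or on error), 0 otherwise. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    int needed;

    if (cap == 0)
        return 1;                       /* nothing can be stored */
    /* snprintf writes at most cap-1 characters plus the terminator
     * and returns the length the full string would have had. */
    needed = snprintf(dst, cap, "%s", src);
    return needed < 0 || (size_t)needed >= cap;
}
```

Copying "hello world" into an 8-byte buffer leaves "hello w" in the buffer and returns 1, where strcpy would have written past the end.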

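2. A multithreaded program uses functions that are not thread-safe.
3. Data read and written by multiple threads is not protected by a lock. Global data that several threads access at the same time should be protected by a lock; otherwise a core dump is quite likely.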
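A minimal sketch of lock protection with POSIX threads follows. The shared counter stands in for any global data structure; all names here (shared_total, run_workers, and so on) are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16

static long shared_total = 0;   /* data shared by all threads */
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker: add 1 to the shared total n times, holding the lock
 * around every update. */
static void *add_many(void *arg)
{
    long n = *(const long *)arg;

    for (long i = 0; i < n; i++) {
        pthread_mutex_lock(&total_lock);
        shared_total++;
        pthread_mutex_unlock(&total_lock);
    }
    return NULL;
}

/* Start nthreads workers, wait for all of them, and return the
 * final total, or -1 on bad arguments or thread creation failure. */
long run_workers(int nthreads, long per_thread)
{
    pthread_t tid[MAX_WORKERS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_WORKERS)
        return -1;
    shared_total = 0;
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, add_many, &per_thread) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return started == nthreads ? shared_total : -1;
}
```

Build with -pthread. Without the lock, a lost update on a counter merely gives a wrong number, but the same race on a pointer or a linked list can leave the structure corrupted and crash the process.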
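4. Invalid pointers

Dereferencing a null pointer.
Casting pointers carelessly. Unless you are sure that a block of memory was originally allocated as a given structure or type, or as an array of it, do not cast a pointer to that memory into a pointer to the structure or type. Instead, copy the memory into an object of that structure or type and access the copy. The reason is that if the start address of the memory is not aligned as the structure or type requires, accessing it can easily cause a bus error and a core dump.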
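The copy-instead-of-cast advice can be sketched as follows. The struct record layout and the read_record helper are hypothetical, and the value is assumed to be stored in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record embedded somewhere in a raw byte buffer. */
struct record {
    uint32_t value;
};

/* Read a struct record that starts at an arbitrary, possibly
 * unaligned, offset in buf. */
struct record read_record(const unsigned char *buf, size_t offset)
{
    struct record r;

    /* memcpy copies byte by byte, so it has no alignment requirement,
     * unlike dereferencing (const struct record *)(buf + offset). */
    memcpy(&r, buf + offset, sizeof r);
    return r;
}
```

The caller remains responsible for making sure that offset + sizeof(struct record) does not run past the end of the buffer.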

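5. Stack overflow
Do not use large local variables, because local variables are allocated on the stack. Large ones can easily overflow the stack, corrupt the stack and heap structures, and cause baffling errors.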
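One common alternative is to put large scratch buffers on the heap. The sketch below uses a hypothetical function, sum_filled, whose only purpose is to need a big temporary buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Fill an n-byte scratch buffer with the byte `fill` and return the
 * sum of its bytes. The buffer lives on the heap, so n may exceed
 * the stack size limit. Returns -1 if the allocation fails. */
long sum_filled(size_t n, unsigned char fill)
{
    unsigned char *scratch;
    long total = 0;

    if (n == 0)
        return 0;
    scratch = malloc(n);            /* heap, not "unsigned char scratch[n]" */
    if (scratch == NULL)
        return -1;                  /* failure is reported, not a crash */
    memset(scratch, fill, n);
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    free(scratch);
    return total;
}
```

A 16 MiB buffer is fine here, whereas a local array of that size would exceed the 8192 KB stack size shown in the ulimit -a output earlier.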
