Debugging a Kernel Oops

Source: Internet    Editor: 程序博客网    Date: 2024/06/06 19:15

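A kernel Oops is somewhat like a segmentation fault (segfault) in user space. When one occurs, the kernel usually dumps the CPU registers and the call stack, and this information is enough to track down the code that caused the problem.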

The following example walks through the process.

1. First, write a simple kernel module:

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

static void create_oops() {
        *(int *)0 = 0;
}

static int __init my_oops_init(void) {
        printk("oops from the module\n");
        create_oops();          /* writes to address 0 */
        return (0);
}

static void __exit my_oops_exit(void) {
        printk("Goodbye world\n");
}

module_init(my_oops_init);
module_exit(my_oops_exit);
/* No MODULE_LICENSE() here, hence "module license 'unspecified' taints kernel" on load. */
Clearly, this module will fault as soon as it is loaded, because create_oops() writes to address 0.

Save the code as oops.c in a directory named oops.

Then build it:

export ARCH=arm
export CROSS_COMPILE=arm-linux-gnueabi-
make -C /home/charles/code/linux-3.2 M=`pwd` modules

Alternatively, write a Makefile like this:

obj-m := oops.o
ARCH = arm
CROSS_COMPILE = arm-linux-gnueabi-
EXTRA_CFLAGS = -g -O0

all:
	$(MAKE) ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C $(HOME)/code/linux-3.10.28 M=$(PWD) modules
clean:
	$(MAKE) ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C $(HOME)/code/linux-3.10.28 M=$(PWD) clean



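Whichever way you build, -C must point to the source tree of the kernel that actually runs on the target (3.2.0 here, whereas the Makefile above points at linux-3.10.28); a module built against a different kernel version is normally rejected by modprobe. The build produces the following files: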

:~/code/oops$ ls
Makefile       Module.symvers  oops.ko     oops.mod.o
modules.order  oops.c          oops.mod.c  oops.o
:~/code/oops$ cat Makefile
obj-m := oops.o

Next, copy oops.ko to /lib/modules/3.2.0/ on the target machine (here, a QEMU virtual machine):

~ # ls /lib/modules/3.2.0/
oops.ko
Then load the module:

~ # modprobe oops
Disabling lock debugging due to kernel taint
oops: module license 'unspecified' taints kernel.
oops from the module
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 8738c000
[00000000] *pgd=673c6831, *pte=00000000, *ppte=00000000
Internal error: Oops: 817 [#1] SMP
Modules linked in: oops(P+)
CPU: 0    Tainted: P           O  (3.2.0 #1)
PC is at my_oops_init+0x10/0x1c [oops]
LR is at my_oops_init+0xc/0x1c [oops]
pc : [<7f002010>]    lr : [<7f00200c>]    psr: 60000013
sp : 873c5eb0  ip : 88820000  fp : 7f002000
r10: 873c4000  r9 : 8046d100  r8 : 0000001c
r7 : 00000001  r6 : 873f7a80  r5 : 7f000074  r4 : 7f000074
r3 : 804554ac  r2 : 804554ac  r1 : 60000093  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 10c53c7d  Table: 6738c06a  DAC: 00000015
Process modprobe (pid: 474, stack limit = 0x873c42f0)
Stack: (0x873c5eb0 to 0x873c6000)
5ea0:                                     00000000 80008678 80572140 8009cc3c
5ec0: 00000000 00000000 870834a0 88890000 00000001 80058ec0 7f0000bc 7f000074
5ee0: 7f000074 873f7a80 00000001 0000001c 7f0000bc 00000024 000000b2 80058588
5f00: 7f000080 000b70ca 0001d9c1 00000000 800567d4 000a2974 7f0001b0 873c4000
5f20: 00000068 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5f40: 88890000 00005d02 888942a0 8889410e 88895c50 870834e0 000001c4 00000214
5f60: 00000000 00000000 00000025 00000026 0000000f 00000000 0000000d 00000000
5f80: 00000004 000b70ca 000c30e8 00000000 00000080 8000e2a8 873c4000 00000000
5fa0: 0001d9c1 8000e100 000b70ca 000c30e8 000c30e8 00005d02 000a2974 00000000
5fc0: 000b70ca 000c30e8 00000000 00000080 000b70d8 7ec5ff80 000b70ca 0001d9c1
5fe0: 2acc76a0 7ec5f990 0001d359 2acc76b0 800d0010 000c30e8 00000000 00000000
[<7f002010>] (my_oops_init+0x10/0x1c [oops]) from [<80008678>] (do_one_initcall+0xfc/0x164)
[<80008678>] (do_one_initcall+0xfc/0x164) from [<80058588>] (sys_init_module+0xd10/0x1a60)
[<80058588>] (sys_init_module+0xd10/0x1a60) from [<8000e100>] (ret_fast_syscall+0x0/0x30)
Code: e92d4008 e59f000c eb4c4240 e3a00000 (e5800000)
---[ end trace a9cf7df06d0f6920 ]---
Segmentation fault

The output shows the values of the pc, lr (link register) and sp registers, along with the call stack (backtrace).
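When several oopses have to be compared, it can help to pull pc and lr out of the register dump with a small tool instead of reading them by eye. The following is a minimal C sketch; parse_pc_lr is a hypothetical helper, and it assumes the exact ARM register-line layout of the log above (other architectures and kernel versions print registers differently).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse the ARM register line of an oops, e.g.
 *   "pc : [<7f002010>]    lr : [<7f00200c>]    psr: 60000013"
 * Returns 1 if all three values were found, 0 otherwise.
 * The layout is taken from the single log above; other architectures
 * and kernel versions print their registers differently.
 */
static int parse_pc_lr(const char *line, unsigned long *pc,
                       unsigned long *lr, unsigned long *psr)
{
    assert(line != NULL && pc != NULL && lr != NULL && psr != NULL);
    /* A space in a scanf format matches any amount of whitespace,
     * so the variable padding between the fields is tolerated. */
    return sscanf(line, "pc : [<%lx>] lr : [<%lx>] psr: %lx",
                  pc, lr, psr) == 3;
}
```

For this oops it yields pc = 0x7f002010 and lr = 0x7f00200c. The lr value (my_oops_init+0xc) is consistent with lr still holding the return address of the bl to printk at my_oops_init+0x8, the last call made before the fault.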

my_oops_init+0x10/0x1c
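means symbol+offset/length: the faulting instruction is 0x10 bytes into my_oops_init, a function that is 0x1c bytes long.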
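The same symbol+offset/length notation appears in every backtrace line. A minimal C sketch of splitting it into its parts, assuming the token looks exactly like the ones in this log, optionally followed by a module name in brackets; struct oops_loc and parse_oops_loc are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One "symbol+offset/length [module]" location from an oops line. */
struct oops_loc {
    char sym[64];        /* function name */
    unsigned long off;   /* bytes from the start of the function */
    unsigned long len;   /* size of the function in bytes */
    char mod[64];        /* module name, empty if none was printed */
};

/*
 * Hypothetical helper: parse e.g. "my_oops_init+0x10/0x1c [oops]".
 * Returns 1 if symbol, offset and length were all found and the
 * offset lies inside the function, 0 otherwise.
 */
static int parse_oops_loc(const char *s, struct oops_loc *loc)
{
    assert(s != NULL && loc != NULL);
    memset(loc, 0, sizeof *loc);
    /* %lx accepts the "0x" prefix; "[^]]" reads up to the closing ']'. */
    int n = sscanf(s, "%63[^+]+%lx/%lx [%63[^]]]",
                   loc->sym, &loc->off, &loc->len, loc->mod);
    return n >= 3 && loc->off < loc->len;
}
```

For the PC line this gives offset 0x10 inside a 0x1c-byte function, i.e. the fifth of the seven 4-byte words that make up my_oops_init in the disassembly further down.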

2. Now start debugging.

First, on the host machine, load the module into gdb:

$ arm-linux-gnueabi-gdb oops.ko
GNU gdb (crosstool-NG linaro-1.13.1-2012.04-20120426 - Linaro GCC 2012.04) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-build_pc-linux-gnu --target=arm-linux-gnueabi".
For bug reporting instructions, please see:
<https://bugs.launchpad.net/gcc-linaro>...
Reading symbols from /home/charles/code/oops/oops.ko...done.

Then add the symbol file, giving the address at which the module's code was loaded:

(gdb) add-symbol-file oops.ko 0x7f002000
add symbol table from file "oops.ko" at
	.text_addr = 0x7f002000
(y or n) y
Reading symbols from /home/charles/code/oops/oops.ko...done.
0x7f002000 is the address of the module's .init.text section, where my_oops_init lives. It can be read on the target like this:

~ # cat /sys/module/oops/sections/.init.text
0x7f002000
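The same address can also be read programmatically on the target. A minimal C sketch, assuming the sysfs file holds a single hexadecimal number as shown above; read_section_addr and read_section_addr_fp are hypothetical helpers, and the directory only exists while the module is loaded.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one section address in the format used by
 * /sys/module/<module>/sections/<section>, e.g. "0x7f002000\n".
 * Returns 1 on success, 0 if no hexadecimal number could be read.
 */
static int read_section_addr_fp(FILE *fp, unsigned long *addr)
{
    assert(fp != NULL && addr != NULL);
    return fscanf(fp, "%lx", addr) == 1;
}

/* Path-based wrapper, e.g. for "/sys/module/oops/sections/.init.text". */
static int read_section_addr(const char *path, unsigned long *addr)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    int ok = read_section_addr_fp(fp, addr);
    fclose(fp);
    return ok;
}
```

Cross-compiled with the same toolchain (arm-linux-gnueabi-gcc) and run on the target after the load above, read_section_addr("/sys/module/oops/sections/.init.text", &addr) would be expected to store 0x7f002000.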
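The PC line (PC is at my_oops_init+0x10/0x1c) tells us which function faulted, so disassemble it. The addresses in the listing still start at 0x00000000 rather than 0x7f002000 because add-symbol-file applies its address argument to .text, while my_oops_init lives in .init.text: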

(gdb) disassemble  my_oops_init
Dump of assembler code for function my_oops_init:
   0x00000000 <+0>:	push	{r3, lr}
   0x00000004 <+4>:	ldr	r0, [pc, #12]	; 0x18 <my_oops_init+24>
   0x00000008 <+8>:	bl	0x8 <my_oops_init+8>
   0x0000000c <+12>:	mov	r0, #0
   0x00000010 <+16>:	str	r0, [r0]
   0x00000014 <+20>:	pop	{r3, pc}
   0x00000018 <+24>:	andeq	r0, r0, r0
End of assembler dump.
Given the offset 0x10 from the oops, the instruction that was executing when the fault occurred is at:

0x00000000 + 0x10 = 0x00000010, i.e. str r0, [r0]: a store of r0 to the address held in r0, which the preceding mov r0, #0 has just set to 0.
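The same 0x10 can be derived from runtime values. The PC line places pc at my_oops_init+0x10, so my_oops_init was loaded at 0x7f002010 - 0x10 = 0x7f002000, which equals the .init.text base read from sysfs; adding the distance from that start to my_oops_init's address in the unrelocated .ko (0x0 in the disassembly) gives the address to list. A minimal C sketch of this arithmetic; pc_to_file_addr is a hypothetical helper.

```c
#include <assert.h>

/*
 * Hypothetical helper: map a runtime pc back to the address gdb shows
 * when oops.ko is loaded without relocation.
 *   pc          - faulting pc from the oops (0x7f002010 here)
 *   runtime_sym - where the function was loaded (0x7f002000 here)
 *   file_sym    - the function's address inside the .ko (0x0 here)
 */
static unsigned long pc_to_file_addr(unsigned long pc,
                                     unsigned long runtime_sym,
                                     unsigned long file_sym)
{
    assert(pc >= runtime_sym);
    /* The distance from the function start is the same before and
     * after relocation. */
    return file_sym + (pc - runtime_sym);
}
```

Here pc_to_file_addr(0x7f002010, 0x7f002000, 0x0) returns 0x10, the address passed to l * below.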

(gdb) l *0x00000010
0x10 is in my_oops_init (/home/charles/code/oops/oops.c:6).
1	#include <linux/kernel.h>
2	#include <linux/module.h>
3	#include <linux/init.h>
4	
5	static void create_oops() {
6	        *(int *)0 = 0;
7	}
8	
9	static int __init my_oops_init(void) {
10	        printk("oops from the module\n");

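That is line 6 of oops.c. The oops reports the fault in my_oops_init rather than in create_oops because the compiler inlined create_oops() into my_oops_init: the disassembly contains no call to it.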

This approach actually overcomplicates things: there is no need to know where the oops module was loaded in the kernel.

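The symbol+offset from the oops alone,

my_oops_init+0x10/0x1c

is enough to find the faulting line in my_oops_init, because the offset is relative to the start of the function rather than to the module's load address: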

$ arm-linux-gnueabi-gdb oops.ko
GNU gdb (crosstool-NG linaro-1.13.1-2012.04-20120426 - Linaro GCC 2012.04) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-build_pc-linux-gnu --target=arm-linux-gnueabi".
For bug reporting instructions, please see:
<https://bugs.launchpad.net/gcc-linaro>...
Reading symbols from /home/charles/code/oops/oops.ko...done.
(gdb) disassemble  my_oops_
my_oops_exit  my_oops_init  
(gdb) disassemble  my_oops_init 
Dump of assembler code for function my_oops_init:
   0x00000000 <+0>:	push	{r3, lr}
   0x00000004 <+4>:	ldr	r0, [pc, #12]	; 0x18 <my_oops_init+24>
   0x00000008 <+8>:	bl	0x8 <my_oops_init+8>
   0x0000000c <+12>:	mov	r0, #0
   0x00000010 <+16>:	str	r0, [r0]
   0x00000014 <+20>:	pop	{r3, pc}
   0x00000018 <+24>:	andeq	r0, r0, r0
End of assembler dump.
(gdb) print /x  0x00000000+0x10
$1 = 0x10
(gdb) list *0x10
0x10 is in my_oops_init (/home/charles/code/oops/oops.c:6).
1	#include <linux/kernel.h>
2	#include <linux/module.h>
3	#include <linux/init.h>
4	
5	static void create_oops() {
6	        *(int *)0 = 0;
7	}
8	
9	static int __init my_oops_init(void) {
10	        printk("oops from the module\n");
(gdb) 
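The session above adds 0x0 + 0x10 by hand before calling list. gdb also accepts an expression after list *, so list *(my_oops_init+0x10) should reach the same oops.c:6 in one step. A small C sketch that formats such a command from a symbol and offset taken from the oops; gdb_list_cmd is a hypothetical helper, and its output is meant to be typed into a gdb session on oops.ko.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format a gdb command that lists the source line
 * at symbol+offset, e.g. ("my_oops_init", 0x10) -> "list *(my_oops_init+0x10)".
 * Returns 1 if the whole command fit into buf, 0 otherwise.
 */
static int gdb_list_cmd(char *buf, size_t size, const char *sym,
                        unsigned long off)
{
    assert(buf != NULL && sym != NULL);
    if (strlen(sym) == 0 || size == 0)
        return 0;
    int n = snprintf(buf, size, "list *(%s+0x%lx)", sym, off);
    /* snprintf returns the length it wanted to write; a value >= size
     * means the output was truncated. */
    return n >= 0 && (size_t)n < size;
}
```

With a 64-byte buffer, gdb_list_cmd(buf, sizeof buf, "my_oops_init", 0x10) writes list *(my_oops_init+0x10). Working from the symbol instead of a raw address keeps the command valid no matter where the module was loaded.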


Reference:

1. http://www.linuxforu.com/2011/01/understanding-a-kernel-oops/