Debugging Multithreaded Programs with GDB

Source: Internet. Editor: 程序博客网. Time: 2024/06/12 18:34

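First, the basic GDB commands for debugging multithreaded programs.

info threads  Lists all threads that can currently be debugged. GDB assigns each thread an ID, which the other thread commands take as an argument. The thread marked with * is the current thread.

thread ID  Makes the thread with the given ID the current thread.

break thread_test.c:123 thread ID  Sets a breakpoint on that line for the thread with the given GDB ID only. A breakpoint set without the thread qualifier (break thread_test.c:123) already applies to every thread. Some tutorials write "thread all" here, but GDB's documented syntax expects a thread ID after the thread keyword.

thread apply ID1 ID2 command  Runs the GDB command command on one or more threads.

thread apply all command  Runs the GDB command command on every thread being debugged.

set scheduler-locking off|on|step  Anyone who has actually debugged a multithreaded program will have noticed that when you step or continue in the current thread, the other threads run at the same time. This command lets you run only the thread you are debugging. off: no thread is locked, so all threads run; this is the traditional default (recent GDB releases default to replay, which behaves like off during normal execution). on: only the current thread runs. step: only the current thread runs while you single-step, except when next steps over a function call (which is effectively a temporary breakpoint followed by continue); commands such as continue, until, and finish let the other threads run freely.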
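To try the commands above, a small target program helps. The following is a minimal sketch and not part of the original post: two threads each bump their own counter, so you can break inside worker, compare the threads with info threads and thread apply all bt, and check that the other thread stops advancing once set scheduler-locking on is in effect. The names worker, run_workers, progress, STEPS, and NWORKERS are illustrative assumptions.

```c
#include <pthread.h>

#define STEPS    5   /* hypothetical iteration count per worker */
#define NWORKERS 2   /* hypothetical number of worker threads */

/* One slot per worker, so the threads never write to the same element. */
static int progress[NWORKERS];

static void *worker(void *arg)
{
    int idx = *(int *)arg;

    for (int i = 0; i < STEPS; i++) {
        progress[idx]++;        /* convenient line for a breakpoint */
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the total number of
 * steps taken, or -1 if a thread could not be created. */
int run_workers(void)
{
    pthread_t tid[NWORKERS];
    int idx[NWORKERS];
    int started = 0;
    int failed = 0;
    int total = 0;

    for (int i = 0; i < NWORKERS; i++) {
        progress[i] = 0;
    }

    for (int i = 0; i < NWORKERS; i++) {
        idx[i] = i;
        if (pthread_create(&tid[i], NULL, worker, &idx[i]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }

    /* Join only the threads that were actually created. */
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
    }
    if (failed) {
        return -1;
    }

    for (int i = 0; i < NWORKERS; i++) {
        total += progress[i];
    }
    return total;
}
```

Call run_workers() from a main function of your own and build with gcc -g -pthread. Inside GDB, break worker followed by run stops the first worker to arrive; print progress before and after a few next commands shows whether the other worker moved in the meantime.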

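GDB supports debugging multithreaded programs in the following ways:

Thread creation notices: whenever a new thread is created, GDB prints a message.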

(gdb) r
Starting program: /root/thread
[New Thread 1073951360 (LWP 12900)]
[New Thread 1082342592 (LWP 12907)]   <-- this line and the next two are the newly created threads
[New Thread 1090731072 (LWP 12908)]
[New Thread 1099119552 (LWP 12909)]

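Viewing threads: use info threads to list the running threads.

Note that the number at the start of each line is the thread number assigned by GDB. Use this number when switching threads, not the system thread identifier or LWP number that appears later on the line.

The asterisk at the start of a line marks the currently active thread.

Switching threads: use thread THREADNUMBER, where THREADNUMBER is the GDB thread number described above. The example below switches the active thread from 1 to 4.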

(gdb) info threads
   4 Thread 1099119552 (LWP 12940)   0xffffe002 in ?? ()
   3 Thread 1090731072 (LWP 12939)   0xffffe002 in ?? ()
   2 Thread 1082342592 (LWP 12938)   0xffffe002 in ?? ()
* 1 Thread 1073951360 (LWP 12931)   main (argc=1, argv=0xbfffda04) at thread.c:21
(gdb) thread 4
[Switching to thread 4 (Thread 1099119552 (LWP 12940))]#0   0xffffe002 in ?? ()
(gdb) info threads
* 4 Thread 1099119552 (LWP 12940)   0xffffe002 in ?? ()
   3 Thread 1090731072 (LWP 12939)   0xffffe002 in ?? ()
   2 Thread 1082342592 (LWP 12938)   0xffffe002 in ?? ()
   1 Thread 1073951360 (LWP 12931)   main (argc=1, argv=0xbfffda04) at thread.c:21
(gdb)

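After that, set a breakpoint directly in your thread function and continue to it. Because the threads of a multithreaded program normally run concurrently, it is usually best to run set scheduler-locking on first.

That way only the current thread is debugged.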

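GDB is a widely used debugger on *nix systems. It is very powerful and supports extremely sophisticated debugging. This part shows the usual way to debug multithreaded programs with it.
Common commands:
info threads: lists all threads that can currently be debugged.
thread IDx: replace IDx with a thread ID from the listing above. This command switches the thread being debugged. Note that GDB works on one execution sequence at a time (what was traditionally a single process), so thread-specific commands apply to the selected thread.
break file.c:20 thread IDx: sets a breakpoint on line 20 of file.c for thread IDx only. Leave off the thread qualifier (break file.c:20) to stop every thread that reaches the line; GDB's documented syntax expects a thread ID after the thread keyword rather than "all".
set scheduler-locking off|on|step: threads run in parallel, so commands such as step let every thread run. This command restricts execution to a single thread. off locks no thread and is the traditional default. on makes commands affect only the current thread. step restricts execution to the current thread while single-stepping, except when next steps over a function call.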

Usage examples:

Example 1: the code below produces a core dump. Let's debug it.


#include <stdio.h>
#include <unistd.h>   /* sleep() */
#include <pthread.h>

int a(void){
    sleep(2);
    return 0;
}
int b(void){
    a();
    return 0;
}
int c(void){
    b();
    return 0;
}
/* Thread entry points must have the signature void *(*)(void *). */
void *myThread1(void *arg)
{
    int i=9;
    (void)arg;   /* unused */
    while(i>0)
    {
        printf("Our 1st pthread,created by chn89.\n");
        sleep(2);
        i--;
        c();
    }
    pthread_exit(NULL);
}
void *myThread2(void *arg)
{
    int i=5;
    (void)arg;   /* unused */
    while(i>0)
    {
        printf("Our 2nd pthread,created by chn89.\n");
        sleep(2);
        i--;
    }
    pthread_exit(NULL);
}
int main(void)
{
    int ret=0;
    pthread_t thread_id1,thread_id2;
    /* The post's original code had a typo in this call; &thread_id1, as
       shown here, is the corrected form. That typo is the bug tracked
       down below. */
    ret = pthread_create(&thread_id1, NULL, myThread1, NULL);
    if (ret)
    {
        printf("Create pthread error!\n");
        return 1;
    }
    ret = pthread_create(&thread_id2, NULL, myThread2, NULL);
    if (ret)
    {
        printf("Create pthread error!\n");
        return 1;
    }
    /* Wait for both worker threads to finish. */
    pthread_join(thread_id1, NULL);
    pthread_join(thread_id2, NULL);
    return 0;
}


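Compile it with gcc -g pthread_gdb.c -lpthread
Running the program reports a segmentation fault and says a core dump was produced, but no core file actually appears. Run ulimit -c unlimited first, then run the program again, and this time the core file really is written.
Run gdb a.out corefile and use bt to view the backtrace. It points to a failure at line 49 (line numbers here refer to the author's original file), so the first suspicion is that thread 1 is misbehaving.
Then run:
gdb a.out
(gdb) b 49
(gdb) thread 2 // the main thread is thread 1
(gdb) c
Thread 2 turns out to work correctly, and checking the other worker thread the same way shows that it is fine as well. pthread_join() is a library function and should not be at fault, so the next step is a careful re-read of the source, which reveals a silly typo. The point of this example is the technique of debugging one thread at a time; give it a try.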
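The typo described above meant that a pthread_t handle was joined without ever having been filled in by pthread_create. One defensive pattern, sketched here on the assumption that the handles are kept in an array, is to count how many pthread_create calls succeeded and join only those. The names spawn_and_join, task, and MAX_THREADS are hypothetical and do not come from the original program.

```c
#include <pthread.h>

#define MAX_THREADS 8   /* hypothetical upper bound for this sketch */

static void *task(void *arg)
{
    int *done = arg;

    *done = 1;          /* each thread writes only its own flag */
    return NULL;
}

/* Creates up to n threads (capped at MAX_THREADS) and joins exactly the
 * ones that were started. Returns how many threads ran to completion. */
int spawn_and_join(int n)
{
    pthread_t tid[MAX_THREADS];
    int done[MAX_THREADS] = {0};
    int started = 0;
    int finished = 0;

    if (n > MAX_THREADS) {
        n = MAX_THREADS;
    }

    for (int i = 0; i < n; i++) {
        /* Each handle is written by the call that creates its thread. */
        if (pthread_create(&tid[i], NULL, task, &done[i]) != 0) {
            break;
        }
        started++;
    }

    /* Never join a handle that pthread_create did not fill in. */
    for (int i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) == 0) {
            finished += done[i];
        }
    }
    return finished;
}
```

Indexing the handle, the argument, and the join with the same loop variable removes the opportunity to mix up two similarly named variables such as thread_id1 and thread_id2.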

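Example 2: the code is the program above with the typo fixed. Assume it is already running but has gone astray, and we need to find out where each thread is currently executing.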


gdb a.out pid    -- pid is the process ID of the running program
(gdb) thread 2
(gdb) bt
#0 0x00855416 in __kernel_vsyscall ()
#1 0x00bf1086 in nanosleep () from /lib/libc.so.6
#2 0x00bf0ea4 in sleep () from /lib/libc.so.6
#3 0x08048516 in a () at pthread_gdb.c:5
#4 0x08048528 in b () at pthread_gdb.c:10
#5 0x0804853a in c () at pthread_gdb.c:15
#6 0x08048571 in myThread1 () at pthread_gdb.c:26
#7 0x00cede99 in start_thread () from /lib/libpthread.so.0
#8 0x00c2cd2e in clone () from /lib/libc.so.6
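This is the backtrace of the thread selected with thread 2, which is running myThread1. It is currently inside sleep, called from function a.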
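When attaching to a running process like this, it can be hard to tell which GDB thread number belongs to which role. On Linux, GDB typically shows each thread's kernel-level name in the info threads listing, and a thread can set that name for itself. The following is a minimal Linux-only sketch and not part of the original program; named_worker, name_roundtrip_ok, and the name "worker-1" are illustrative assumptions. It uses prctl with PR_SET_NAME, which limits names to 15 characters plus the terminating NUL.

```c
#include <pthread.h>
#include <string.h>
#include <sys/prctl.h>   /* Linux-specific: PR_SET_NAME, PR_GET_NAME */

static void *named_worker(void *arg)
{
    char *out = arg;     /* caller-provided buffer of at least 16 bytes */

    /* Name the calling thread, then read the name back to confirm it. */
    prctl(PR_SET_NAME, (unsigned long)"worker-1", 0UL, 0UL, 0UL);
    prctl(PR_GET_NAME, (unsigned long)out, 0UL, 0UL, 0UL);
    return NULL;
}

/* Returns 1 if the worker saw the name it set for itself, else 0. */
int name_roundtrip_ok(void)
{
    pthread_t tid;
    char name[16] = "";

    if (pthread_create(&tid, NULL, named_worker, name) != 0) {
        return 0;
    }
    pthread_join(tid, NULL);
    return strcmp(name, "worker-1") == 0;
}
```

In a long-running program the name would be set once at the top of each thread function, so that a later attach can match the listing to the code without reading every backtrace.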



--------------------------------------------------------------------------------------------------------------

II. Hands-on

The program being debugged:


#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <unistd.h>
void * fun1 (void *arg1)
{
    printf ("[pthread1] -- start\n");
    sleep (2);
    printf ("[pthread1] -- end\n");
    pthread_exit ((void *) NULL);
}
void * fun2 (void *arg1)
{
    printf ("[pthread2] -- start\n");
    sleep (2);
    printf ("[pthread2] -- end\n");
    pthread_exit ((void *) NULL);
}
int main(void)
{
    pthread_t pid1, pid2;
    void *tmp;
    printf ("[main] -- start\n");
    /* pthread functions return an error number instead of setting errno,
       so report failures with fprintf rather than perror. */
    if (pthread_create (&pid1, NULL, fun1, NULL)) {
        fprintf (stderr, "create pthread1 error\n");
        exit (1);
    }
    if (pthread_create (&pid2, NULL, fun2, NULL)) {
        fprintf (stderr, "create pthread2 error\n");
        exit (1);
    }
    if (pthread_join (pid1, &tmp)) {
        fprintf (stderr, "join pthread1 error\n");
        exit (1);
    }
    if (pthread_join (pid2, &tmp)) {
        fprintf (stderr, "join pthread2 error\n");
        exit (1);
    }
    sleep (2);
    printf ("[main] -- end\n");
    return 0;
}

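The program has three threads. The main thread runs first and creates two worker threads, then waits for both of them to finish before it exits.

The order in which the threads of a multithreaded program execute is not known in advance, but under GDB we can choose the order ourselves. In this example, after the main thread has created thread 2 and thread 3 (GDB's numbers for the threads running fun1 and fun2), there is no telling which of them runs first or which finishes first. In this session, once the main thread has created thread 2, we run and debug thread 2 first. When it has finished we debug thread 3, and finally we return to the main thread.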

(gdb) b 28    # first set a breakpoint in the main thread, at the point where thread 2 is created
Breakpoint 1 at 0x4007dc: file d.c, line 28.
30 if (pthread_create (&pid1, NULL, fun1, NULL)) {

(gdb) set scheduler-locking on    # once thread 2 exists the execution order is no longer predictable, so switch to single-thread debugging first

(gdb) n    # after this step thread 2 has been created and the main thread is stopped where thread 3 will be created; because only the current thread may run, thread 2 exists but has not executed yet
[New Thread 0x7ffff7fe5700 (LWP 25412)]
34 if (pthread_create (&pid2, NULL, fun2, NULL)) {

(gdb) info thread    # list the threads: main has created thread 2, so there are now two; we are on thread 1 (the main thread; the asterisk marks the current thread)
  Id   Target Id         Frame
  2    Thread 0x7ffff7fe5700 (LWP 25412) "a.out" 0x0000003bd880dd9c in __lll_lock_wait_private () from /lib64/libpthread.so.0
* 1    Thread 0x7ffff7fe6740 (LWP 25409) "a.out" main () at d.c:34

(gdb) thread 2    # leave the main thread stopped here and go debug thread 2
[Switching to thread 2 (Thread 0x7ffff7fe5700 (LWP 25412))]

(gdb) b 8    # thread 2 executes a function, so set a breakpoint inside that function
Breakpoint 2 at 0x400778: file d.c, line 8.

(gdb) c    # run thread 2 up to our breakpoint
Continuing.

Keep entering n until thread 2 has been stepped through to the end.

(gdb) n    # thread 2 has finished
0x0000003bd8807cd7 in start_thread () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install libgcc-4.7.0-4.fc17.x86_6

(gdb) thread 1    # now switch back to the main thread to create thread 3
[Switching to thread 1 (Thread 0x7ffff7fe6740 (LWP 25409))]

(gdb) n
[New Thread 0x7ffff77e4700 (LWP 25424)]
39 if (pthread_join (pid1, &tmp)) {
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff77e4700 (LWP 25424))]

As before, step thread 3 through to the end, then switch back to the main thread.

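(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fe6740 (LWP 25409))]
#0 main () at d.c:39
39 if (pthread_join (pid1, &tmp)) {

At this point the main thread is stopped at pthread_join. If you keep stepping from here, the main thread blocks. The original author was not sure why; the likely cause is that with scheduler-locking on only the current thread is allowed to run, so a worker that has not fully exited can never finish and pthread_join keeps waiting. Run set scheduler-locking off to leave single-thread mode and then continue, and the main thread no longer blocks.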
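The blocking at pthread_join is consistent with how the call works: it returns only after the target thread has terminated, and a thread that GDB is holding still cannot terminate. The sketch below is an illustration rather than part of the original program, and the names finisher and join_after_finish are hypothetical. It isolates that dependency in a few lines.

```c
#include <pthread.h>

static void *finisher(void *arg)
{
    int *finished = arg;

    *finished = 1;      /* executes only when this thread is scheduled */
    return NULL;
}

/* Returns 1 once the worker has run to completion and been joined,
 * or -1 if the thread could not be created or joined. */
int join_after_finish(void)
{
    pthread_t tid;
    int finished = 0;

    if (pthread_create(&tid, NULL, finisher, &finished) != 0) {
        return -1;
    }

    /* Under GDB with scheduler-locking on and this thread selected,
     * stepping over the next call would be expected to hang, because
     * finisher is never given a chance to run and exit. */
    if (pthread_join(tid, NULL) != 0) {
        return -1;
    }
    return finished;
}
```

Run normally, the function returns 1. Under GDB, switching to set scheduler-locking off (or step) before stepping over pthread_join lets the worker run, which matches the fix described above.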

0 0