[Repost] GDBINT: GDB Internals Notes

Source: the Internet. Editor: 程序博客网. Time: 2024/06/03 16:44

GDB Overall Structure
1) Components of GDB:
user interface
symbol handling (the symbol side)
object file readers, debugging info interpreters, symbol table management, source language expression parsing, type and value printing.
target system handling (the target side)
execution control, stack frame analysis, and physical target manipulation.
The line between the target side and the symbol side is not always sharp, but keeping the distinction in mind makes GDB much easier to understand.

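2) GDB Configurations
Host / host support / host dependent: the host is the system GDB runs on. The information needed to make GDB run on the host (#include files, macro definitions, ...) is called host support.
Target / target support / target dependent: everything about the target machine or target process, such as its stack layout, instruction set, and registers.
Native / native support / native dependent: when host and target are the same system, the support needed is called native dependent. On Unix, for example, this covers child process support, debugging a process through ptrace or procfs, and reading the target's registers and memory in that setting.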

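3) Directory Layout and File Naming Conventions
*read.c : reads object files and symbol tables
*-thread.c : code for debugging threads
inf*.c : code that deals with the inferior program (GDB's humorous name for the program being debugged)
*-tdep.c : target dependent code
*-nat.c : native support code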


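Algorithms
The algorithms GDB uses are not very complicated. The hard part is not getting lost in the details and special cases (much the same situation an OS faces).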

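Frame
To support the Call Frame notion of the DWARF standard, GDB redefined its own frame structure. A GDB frame tracks the calling and called functions; walking the frames is a back trace of the call stack. A GDB frame is more than a call frame: each level holds (or can obtain) a snapshot of the CPU state at that level (FIXME).

sentinel frame: the frame for the current instruction, at the top of the call stack. Its level is -1 and its type is SENTINEL_FRAME. The frame of the current function (FIXME) has level 0.
unwind operation: the term comes from the DWARF standard. frame_register_unwind unwinds one level, that is, it goes back to the previous (calling) frame.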
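The frame levels described above can be pictured with a small sketch. Everything here is an assumption made for illustration: struct toy_frame and the toy_ functions are hypothetical and far simpler than GDB's real frame code. The sketch only shows a sentinel at level -1 whose chain leads outward to the calling frames.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified frame record (not GDB's real frame structure).  */
struct toy_frame {
  int level;               /* -1 for the sentinel, 0 for the innermost real frame */
  unsigned long pc;        /* program counter in this frame */
  unsigned long sp;        /* stack pointer in this frame */
  struct toy_frame *prev;  /* the caller's frame, or NULL at the outermost frame */
};

/* Walk outward from the sentinel and count the real frames (level >= 0).  */
static int toy_backtrace_depth(const struct toy_frame *sentinel)
{
  assert(sentinel != NULL && sentinel->level == -1);
  int depth = 0;
  for (const struct toy_frame *f = sentinel->prev; f != NULL; f = f->prev)
    depth++;
  return depth;
}

/* Unwind until the requested level is reached; NULL if the stack is shallower.  */
static const struct toy_frame *toy_frame_at_level(const struct toy_frame *sentinel,
                                                  int level)
{
  assert(sentinel != NULL && sentinel->level == -1);
  const struct toy_frame *f = sentinel;
  while (f != NULL && f->level < level)
    f = f->prev;
  return (f != NULL && f->level == level) ? f : NULL;
}
```

Asking for level 0 returns the frame of the current function, and each further level is one more step along prev, which is the role unwinding plays in a back trace.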

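Prologue Analysis
CFI: DWARF Call Frame Information. Current versions of GCC generate this call frame information.
Prologue analysis finds the size of a frame and the base address of the older frame. CFI makes this simpler, but CFI is not always available, and prologue analysis as a technique predates CFI. This kind of back trace also lets GDB modify arguments or the values of some variables. Because callee-saved registers exist, the frame pointer of the process changes from frame to frame, and some variables may be scattered irregularly through the youngest frame (comments requested).
The basic idea of prologue analysis is to examine the actual machine instructions and work out from them the frame size and the register values saved in the stack frame. prologue-value.h and prologue-value.c provide a framework for prologue analysis: start at the function's entry instruction, analyze up to the current PC, and then:
1) Check whether the value of sp is known. If it is, we have the frame size.
2) Check where the previous frame pointer was saved.
For the details see GDBINT and the related code (comments requested).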
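The two checks above can be sketched on a made-up instruction set. This is not the prologue-value framework: the toy_ types, the four pseudo-instructions, and the 8-byte word size are all assumptions chosen to keep the example small. The sketch scans from the function entry up to the current PC, tracking how far sp has moved and where the caller's frame pointer was saved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pseudo-instructions standing in for real machine code.  */
enum toy_op { TOY_PUSH_FP, TOY_MOV_FP_SP, TOY_SUB_SP, TOY_SUB_SP_REG, TOY_OTHER };

struct toy_insn { enum toy_op op; int imm; };

struct toy_prologue {
  int sp_known;         /* non-zero while sp is still a known offset from entry */
  int frame_size;       /* bytes sp has moved down since function entry */
  int fp_saved;         /* non-zero once the caller's fp has been pushed */
  int saved_fp_offset;  /* where it was saved, relative to sp at entry */
};

/* Analyze code[start..pc) and report what the prologue has done so far.  */
static struct toy_prologue toy_analyze_prologue(const struct toy_insn *code,
                                                size_t start, size_t pc)
{
  assert(code != NULL && start <= pc);
  struct toy_prologue r = { 1, 0, 0, 0 };
  for (size_t i = start; i < pc; i++) {
    switch (code[i].op) {
    case TOY_PUSH_FP:            /* assumed 8-byte push of the old fp */
      r.frame_size += 8;
      r.fp_saved = 1;
      r.saved_fp_offset = -r.frame_size;
      break;
    case TOY_MOV_FP_SP:          /* fp = sp; sp itself does not move */
      break;
    case TOY_SUB_SP:             /* sp -= constant */
      r.frame_size += code[i].imm;
      break;
    case TOY_SUB_SP_REG:         /* sp -= register; size no longer known */
      r.sp_known = 0;
      return r;
    default:                     /* first non-prologue instruction ends the scan */
      return r;
    }
  }
  return r;
}
```

Stopping the scan at the current PC matters: if the program is halted in the middle of the prologue, only the instructions already executed have changed sp.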

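Breakpoint Handling
Hardware breakpoint: requires CPU support. When execution reaches the specified PC the CPU breaks out (through an interrupt or some similar mechanism).
Software breakpoint: GDB replaces the instruction at the specified address with a special instruction (on x86 this is int3; on other targets it may be a trap or an illegal divide). When the resulting exception fires, GDB regains control. When the user asks to continue, GDB restores the original instruction, single-steps over it, re-inserts the trap, and then continues.

The macro that defines a software breakpoint: BREAKPOINT
Most breakpoint handling lives in `breakpoint.c' and `infrun.c'.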
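A minimal sketch of the software breakpoint mechanism, assuming the inferior's memory is just a local byte array. The toy_ names are hypothetical and are not GDB functions; the only real detail is 0xCC, the one-byte x86 int3 opcode. The point is the shadow copy: the overwritten byte has to be saved so that it can be restored later.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_TRAP 0xCC  /* x86 int3 opcode */

/* Hypothetical breakpoint record: where it is and what it overwrote.  */
struct toy_breakpoint {
  size_t addr;           /* offset of the patched instruction byte */
  unsigned char shadow;  /* original byte saved at insert time */
  int inserted;          /* non-zero while the trap is in memory */
};

/* Save the original byte and write the trap.  Returns 0 on success.  */
static int toy_insert_breakpoint(unsigned char *mem, size_t len,
                                 struct toy_breakpoint *bp)
{
  assert(mem != NULL && bp != NULL);
  if (bp->addr >= len || bp->inserted)
    return -1;
  bp->shadow = mem[bp->addr];
  mem[bp->addr] = TOY_TRAP;
  bp->inserted = 1;
  return 0;
}

/* Put the original byte back.  Returns 0 on success.  */
static int toy_remove_breakpoint(unsigned char *mem, size_t len,
                                 struct toy_breakpoint *bp)
{
  assert(mem != NULL && bp != NULL);
  if (bp->addr >= len || !bp->inserted)
    return -1;
  mem[bp->addr] = bp->shadow;
  bp->inserted = 0;
  return 0;
}
```

Continuing past a hit breakpoint is then remove, single-step, insert again, and resume.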
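Interface functions:

target_remove_breakpoint (bp_tgt)
target_insert_breakpoint (bp_tgt)

target_remove_hw_breakpoint (bp_tgt)
target_insert_hw_breakpoint (bp_tgt)

Longjmp Support
When the program does a longjmp, GDB can break at the target address of the longjmp (see "maint info breakpoint"). For this to work, gdbarch_get_longjmp_target must be implemented. Because jmp_buf is system specific, it should be defined in tm-target.h; see tm-sun4os4.h and sparc-tdep.c for examples.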

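Watchpoints
A watchpoint is a breakpoint on data access. GDB always tries to use hardware-supported watchpoints, but not every system has them, the hardware resources may run out, or the memory region to watch may be too large...
Software watchpoints are very slow: GDB single-steps the program and checks the watched address after every step. For a write watchpoint, GDB simply compares the value at the watched address with its old value. For a read watchpoint, the target has to provide target_stopped_data_address, which returns the address the program was accessing when it stopped.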
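The single-step-and-compare loop can be sketched on a toy "program" in which every step is one store into a small memory array. The toy_step type and toy_run_until_write are hypothetical names for this illustration and are not part of GDB.

```c
#include <assert.h>
#include <stddef.h>

/* One hypothetical "instruction": store value into mem[addr].  */
struct toy_step { size_t addr; int value; };

/* Execute one step at a time and compare the watched cell after each step.
   Returns the index of the step that changed mem[watch], or -1 if none did.  */
static long toy_run_until_write(int *mem, const struct toy_step *steps,
                                size_t nsteps, size_t watch)
{
  assert(mem != NULL && steps != NULL);
  int old = mem[watch];
  for (size_t i = 0; i < nsteps; i++) {
    mem[steps[i].addr] = steps[i].value;  /* single-step */
    if (mem[watch] != old)                /* value comparison */
      return (long) i;
  }
  return -1;
}
```

Two properties of the approach show up even in this sketch: the cost is one check per executed step, and a store that writes the same value back is not noticed, because only the value is compared.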
The following resources support hardware watchpoints:
TARGET_HAS_HARDWARE_WATCHPOINTS
If defined, the target supports hardware watchpoints.

TARGET_CAN_USE_HARDWARE_WATCHPOINT (type, count, other)
Return the number of hardware watchpoints of type type that are possible to be set. The value is positive if count watchpoints of this type can be set, zero if setting watchpoints of this type is not supported, and negative if count is more than the maximum number of watchpoints of type type that can be set. other is non-zero if other types of watchpoints are currently enabled (there are architectures which cannot set watchpoints of different types at the same time).

TARGET_REGION_OK_FOR_HW_WATCHPOINT (addr, len)
Return non-zero if hardware watchpoints can be used to watch a region whose address is addr and whose length in bytes is len.

target_insert_watchpoint (addr, len, type)
target_remove_watchpoint (addr, len, type)
Insert or remove a hardware watchpoint starting at addr, for len bytes. type is the watchpoint type, one of the possible values of the enumerated data type target_hw_bp_type, defined by `breakpoint.h' as follows:
enum target_hw_bp_type
{
  hw_write = 0,
  hw_read = 1,
  hw_access = 2,
  hw_execute = 3
};

These two macros should return 0 for success, non-zero for failure.

target_stopped_data_address (addr_p)
If the inferior has some watchpoint that triggered, place the address associated with the watchpoint at the location pointed to by addr_p and return non-zero. Otherwise, return zero. Note that this primitive is used by GDB only on targets that support data-read or data-access type watchpoints, so targets that have support only for data-write watchpoints need not implement these primitives.

HAVE_STEPPABLE_WATCHPOINT
If defined to a non-zero value, it is not necessary to disable a watchpoint to step over it.

int gdbarch_have_nonsteppable_watchpoint (gdbarch)
If it returns a non-zero value, GDB should disable a watchpoint to step the inferior over it.

HAVE_CONTINUABLE_WATCHPOINT
If defined to a non-zero value, it is possible to continue the inferior after a watchpoint has been hit.

CANNOT_STEP_HW_WATCHPOINTS
If this is defined to a non-zero value, GDB will remove all watchpoints before stepping the inferior.

STOPPED_BY_WATCHPOINT (wait_status)
Return non-zero if stopped by a watchpoint. wait_status is of the type struct target_waitstatus, defined by `target.h'. Normally, this macro is defined to invoke the function pointed to by the to_stopped_by_watchpoint member of the structure (of the type target_ops, defined on `target.h') that describes the target-specific operations; to_stopped_by_watchpoint ignores the wait_status argument.

GDB does not require the non-zero value returned by STOPPED_BY_WATCHPOINT to be 100% correct, so if a target cannot determine for sure whether the inferior stopped due to a watchpoint, it could return non-zero "just in case".
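The return-value conventions of TARGET_CAN_USE_HARDWARE_WATCHPOINT and TARGET_REGION_OK_FOR_HW_WATCHPOINT can be made concrete with a sketch for an imaginary target. The limits here are assumptions, not facts about any real architecture: the toy target has four debug registers, no read-only watchpoints, and can only watch naturally aligned regions of 1, 2, 4 or 8 bytes. The toy_ function names are hypothetical.

```c
#include <assert.h>

enum target_hw_bp_type { hw_write = 0, hw_read = 1, hw_access = 2, hw_execute = 3 };

#define TOY_MAX_HW_WATCHPOINTS 4  /* assumed number of debug registers */

/* Positive: count watchpoints of this type fit.  Zero: type unsupported.
   Negative: count exceeds what the toy hardware can hold.  */
static int toy_can_use_hw_watchpoint(enum target_hw_bp_type type, int count, int other)
{
  assert(count >= 0);
  (void) other;            /* this toy target can mix watchpoint types */
  if (type == hw_read)
    return 0;              /* assumed: no read-only watchpoints */
  if (count > TOY_MAX_HW_WATCHPOINTS)
    return -1;
  return 1;
}

/* Non-zero if a single debug register of the toy target can cover the region.  */
static int toy_region_ok_for_hw_watchpoint(unsigned long addr, int len)
{
  if (len != 1 && len != 2 && len != 4 && len != 8)
    return 0;
  return addr % (unsigned long) len == 0;
}
```

When either check fails, falling back to a software watchpoint is the remaining option, at the cost of single-stepping.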

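x86 Watchpoints: see the original English GDBINT.

Checkpoints
A checkpoint is a copy of the running state of a program, from which execution can later be restarted. It can be implemented by forking a child process, by saving a core file, and so on. Either way, everything about the program state has to be saved: registers, memory, ...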
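The idea of saving everything and restarting from it can be shown with a deliberately tiny model. This is neither the fork approach nor the core file approach: struct toy_inferior is a hypothetical stand-in whose whole state is four registers and sixteen bytes of memory, so a checkpoint is simply a copy of it.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical program state: a few registers and a little memory.  */
struct toy_inferior {
  unsigned long regs[4];
  unsigned char mem[16];
};

struct toy_checkpoint { struct toy_inferior saved; };

/* Record the complete state of the inferior.  */
static void toy_checkpoint_save(struct toy_checkpoint *cp,
                                const struct toy_inferior *inf)
{
  assert(cp != NULL && inf != NULL);
  memcpy(&cp->saved, inf, sizeof *inf);
}

/* Roll the inferior back to the recorded state.  */
static void toy_checkpoint_restart(const struct toy_checkpoint *cp,
                                   struct toy_inferior *inf)
{
  assert(cp != NULL && inf != NULL);
  memcpy(inf, &cp->saved, sizeof *inf);
}
```

A real implementation has far more state to capture, which is why forking the process or writing a core file is attractive: the operating system does the copying.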
Observing changes in GDB internals (I could not work out what this part is about)


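User Interface

... Skipped; there is not much need to read this part. libgdb is skipped as well: it has little detailed documentation. It is GDB's standard library, used for building graphical user interfaces and the like.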

 

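Symbol Handling

This is a key module. Symbols include functions, variables, and types.

Symbol Reading
symfile.c contains the code that opens a symbol file (see the GDB command symbol-file; the symbols usually live in the program being debugged). GDB also uses BFD to read the symbol table: see find_sym_fns.
Symbol-reading modules register themselves with GDB through add_symtab_fns. Its argument is a struct sym_fns holding the name of the symbol format, the length of the name prefix, and four function pointers.
Each symbol-reading module provides the four interface functions below (see GDBINT or the code for details; this is not yet fully clear to me, comments requested).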

 


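xyz_symfile_init (struct sym_fns *sf)

symbol_file_add calls this function when a symbol table needs to be read. The argument is a newly allocated sym_fns whose bfd field is the BFD of the new symbol table.

xyz_new_init ()

symbol_file_add calls this function when the current symbols are discarded.

xyz_symfile_read (struct sym_fns *sf, CORE_ADDR addr, int mainline)

symbol_file_add calls this function to read the actual symbol table: psymtabs or symtabs. sf is the same sym_fns that was passed to the initialization function.

xyz_psymtab_to_symtab (struct partial_symtab *pst)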
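The registration scheme can be sketched as a linked list of reader descriptions searched by name prefix. This is a simplified model, not GDB's code: struct toy_sym_fns and the toy_ functions are hypothetical, the callbacks take no arguments, and matching on a format name string with strncmp is an assumption about how a lookup like find_sym_fns might choose a reader.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down description of one symbol reader.  */
struct toy_sym_fns {
  const char *sym_name;        /* name (or name prefix) of the symbol format */
  size_t sym_namelen;          /* length of that prefix */
  void (*new_init)(void);      /* called when existing symbols are discarded */
  void (*symfile_init)(void);  /* called before a new symbol file is read */
  void (*symfile_read)(void);  /* called to read the symbols */
  struct toy_sym_fns *next;    /* next registered reader */
};

static struct toy_sym_fns *toy_symtab_fns = NULL;

/* Register one reader, in the spirit of add_symtab_fns.  */
static void toy_add_symtab_fns(struct toy_sym_fns *sf)
{
  assert(sf != NULL && sf->sym_name != NULL);
  sf->next = toy_symtab_fns;
  toy_symtab_fns = sf;
}

/* Pick the first reader whose prefix matches the file's format name.  */
static struct toy_sym_fns *toy_find_sym_fns(const char *format_name)
{
  assert(format_name != NULL);
  for (struct toy_sym_fns *sf = toy_symtab_fns; sf != NULL; sf = sf->next)
    if (strncmp(format_name, sf->sym_name, sf->sym_namelen) == 0)
      return sf;
  return NULL;
}
```

Storing a prefix length lets one reader serve a family of related format names, for example every name that begins with "elf".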

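Partial Symbol Tables
GDB has three kinds of symbol tables:

  • Full symbol tables (symtabs): contain the main information about symbols and addresses.

  • Partial symbol tables (psymtabs): contain enough information to read the full symbol table.

  • Minimal symbol tables (msymtabs): non-debugging symbols.

The purpose of a psymtab is to get a program's symbol information into GDB quickly: external symbols, types, static symbols and types, and enum values declared at file scope. A psymtab also records a range of addresses.
A psymtab is used in two ways:
1) By instruction address: the address falls within the address range of some psymtab, and the full symbol table can then be read. Examples are find_pc_function, find_pc_line, and the other find_pc_... functions.
2) By name: lookup_symbol finds the full symbol that corresponds to a name.
A psymtab holds no type information about its symbols. See GDBINT for details.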
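Lookup by address followed by on-demand expansion can be sketched as follows. struct toy_psymtab is a hypothetical record for this illustration: it keeps only an address range, a file name, and a flag saying whether the full table has been read. The real structures hold much more.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical partial symbol table: an address range plus a read-in flag.  */
struct toy_psymtab {
  unsigned long textlow;   /* first address covered */
  unsigned long texthigh;  /* one past the last address covered */
  const char *filename;    /* source file this table describes */
  int readin;              /* non-zero once the full symtab has been read */
};

/* Find the psymtab whose address range contains pc, or NULL.  */
static struct toy_psymtab *toy_find_pc_psymtab(struct toy_psymtab *tabs,
                                               size_t n, unsigned long pc)
{
  assert(tabs != NULL);
  for (size_t i = 0; i < n; i++)
    if (pc >= tabs[i].textlow && pc < tabs[i].texthigh)
      return &tabs[i];
  return NULL;
}

/* Expand to a full symtab only on first use.  Returns 1 if work was done.  */
static int toy_psymtab_to_symtab(struct toy_psymtab *pst)
{
  assert(pst != NULL);
  if (pst->readin)
    return 0;
  pst->readin = 1;  /* a real reader would parse the debug info here */
  return 1;
}
```

Only the table that contains the address is expanded, so files the user never touches never pay the cost of a full read.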

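Types
Fundamental Types (e.g., FT_VOID, FT_BOOLEAN).

The types GDB uses internally.

Type Codes (e.g., TYPE_CODE_PTR, TYPE_CODE_ARRAY).

Each is either a fundamental type or a derived type. Typically several fundamental FT_* types map to one TYPE_CODE_*, and they are told apart by attributes such as bit length and signedness.

Builtin Types (e.g., builtin_type_void, builtin_type_char).

These exist for historical reasons and correspond to the fundamental types. (The GDB maintainers actually intend to get rid of these internal types.) builtin_type_int (gdbtypes.c) is basically the same as a TYPE_CODE_INT type (c-lang.c), which corresponds to FT_INTEGER. The difference is that a builtin_type is not associated with any objfile, whereas `c-lang.c' creates many TYPE_CODE_INT types, each associated with a particular objfile.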



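Object File Formats
a.out  : the original Unix object file format. It has almost no symbol table. The reader is dbxread.c.

COFF   : System V Release 3 (SVR3) Unix. Its symbol table has limitations (for example, code that comes from an included header file cannot be represented). The reader is coffread.c.

ECOFF  : an extended version of COFF, used on Mips and Alpha workstations. The reader is mipsread.c.

XCOFF  : IBM RS/6000 running AIX.

PE     : used by Windows 95 and NT; basically COFF.

ELF    : System V Release 4 (SVR4) Unix. ELF is similar to COFF but fixes many of COFF's shortcomings. The reader is elfread.c.

SOM    : HP's format (not to be confused with IBM's SOM, which is a cross-language ABI). The reader is `somread.c'.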

 

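Debugging File Formats

Debugging information that is independent of the object file format.

Stabs

stabs started out as the debugging information in a.out, but COFF, ELF, and other object files can carry it too. dbxread.c holds the basic stabs handling and wrappers; stabsread.c does the real work.

COFF: COFF files also carry their own private debugging information. It is not commonly used and does not extend well.

Mips debug (Third Eye): the special debugging information carried by ECOFF. The reader is mdebugread.c.

DWARF 2: the successor to DWARF 1, but not compatible with it. The reader is dwarf2read.c.

SOM: similar to COFF.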


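Adding a New Symbol Reader to GDB

If you use an existing object file format, this is much simpler. Otherwise you first have to add support for the new object file format to BFD. GDB uses the specific BFD interfaces through a set of swapping functions (comments requested). Some targets (such as COFF) may need an extra layer of wrapping, because the data layout can differ between platforms. These interfaces should be described in bfd/libxyz.h.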

 

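Memory Management for Symbol Files

The symbol information of a symbol file is stored on its objfile_obstack (comments requested), and that memory is released automatically when the objfile is unloaded. For the same reason, one objfile must not reference symbols that belong to another objfile. Some user-related data (comments requested) and its types also live on this obstack, but they are copied to global memory when the objfile is unloaded, so they are not lost.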
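The ownership rule above follows from how this kind of allocator behaves. The sketch below is a toy arena written for this note. It is not the GNU obstack API, and the toy_obstack names are hypothetical. What it shares with the real thing is the lifetime model: many small allocations, no individual frees, and one release for everything when the owner goes away.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-objfile arena: one block, bump-allocated.  */
struct toy_obstack {
  unsigned char *base;  /* start of the block, or NULL when released */
  size_t used;          /* bytes handed out so far */
  size_t cap;           /* total size of the block */
};

/* Reserve the block.  Returns 0 on success, -1 if out of memory.  */
static int toy_obstack_init(struct toy_obstack *o, size_t cap)
{
  assert(o != NULL);
  o->base = malloc(cap);
  o->used = 0;
  o->cap = o->base != NULL ? cap : 0;
  return o->base != NULL ? 0 : -1;
}

/* Hand out n bytes, rounded up to 8; NULL when the arena is full.  */
static void *toy_obstack_alloc(struct toy_obstack *o, size_t n)
{
  assert(o != NULL);
  size_t aligned = (n + 7) & ~(size_t) 7;
  if (aligned < n || aligned > o->cap - o->used)
    return NULL;
  void *p = o->base + o->used;
  o->used += aligned;
  return p;
}

/* Release every allocation at once, as happens when an objfile is unloaded.  */
static void toy_obstack_free_all(struct toy_obstack *o)
{
  assert(o != NULL);
  free(o->base);
  o->base = NULL;
  o->used = 0;
  o->cap = 0;
}
```

After toy_obstack_free_all, every pointer that was handed out is dangling. That is exactly why data owned by one objfile must not be referenced from another, and why anything that has to outlive the objfile must be copied out first.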


 


 

Language Support

We did not want to get into this, and fortunately GDBINT says little about it too; it only lists the steps:
1. Create the expression parser: lang-exp.y. The parser is usually generated with YACC.
2. Add any evaluation routines, if necessary
3. Update some existing code
4. Add a place of call
5. Use macros to trim code
6. Edit `Makefile.in'
(Only the steps are listed here; see GDBINT for the details. I do not care much about this part, and do not understand it either.)



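Host Definition
Adding a New Host
This should now be done with autoconf (comments requested). Older hosts use the configuration files below.

`gdb/config/arch/xyz.mh'
Contains the host and native configuration. Host configuration is now handled by Autoconf. The host information includes definitions such as XM_FILE=xm-xyz.h, and possibly CC, SYSV_DEFINE, XM_CFLAGS, XM_ADD_FILES, XM_CLIBS, XM_CDEPS; see `Makefile.in'.

`gdb/config/arch/xm-xyz.h'
This file used to hold the definitions and information needed to run GDB on machine xyz. That is now done through Autoconf, and new host and native configurations do not need this file.

Host Conditionals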

Once GDB has been configured, a large number of macros need to be defined. Some of them are listed here:

GDBINIT_FILENAME : the name of the GDB initialization file, normally .gdbinit

NO_STD_REGS : This macro is deprecated.

SIGWINCH_HANDLER : If your host defines SIGWINCH, you can define this to be the name of a function to be called if SIGWINCH is received.

SIGWINCH_HANDLER_BODY : Define this to expand into code that will define the function named by the expansion of SIGWINCH_HANDLER.

ALIGN_STACK_ON_STARTUP : Define this if your system is of a sort that will crash in tgetent if the stack happens not to be longword-aligned when main is called. This is a rare situation, but is known to occur on several different types of systems.

CRLF_SOURCE_FILES : Define this if host files use \r\n rather than \n as a line terminator. This will cause source file listings to omit \r characters when printing and it will allow \r\n line endings of files which are "sourced" by gdb. It must be possible to open files in binary mode using O_BINARY or, for fopen, "rb".

DEFAULT_PROMPT : The default value of the prompt string (normally "(gdb) ").

DEV_TTY : The name of the generic TTY device, defaults to "/dev/tty".

FOPEN_RB : Define this if binary files are opened the same way as text files.

HAVE_MMAP : In some cases, use the system call mmap for reading symbol tables. For some machines this allows for sharing and quick updates.

HAVE_TERMIO : Define this if the host system has termio.h.

INT_MAX INT_MIN LONG_MAX UINT_MAX ULONG_MAX : Values for host-side constants.

ISATTY : Substitute for isatty, if not available.

LONGEST : This is the longest integer type available on the host. If not defined, it will default to long long or long, depending on CC_HAS_LONG_LONG.

CC_HAS_LONG_LONG : Define this if the host C compiler supports long long. This is set by the configure script.

PRINTF_HAS_LONG_LONG : Define this if the host can handle printing of long long integers via the printf format conversion specifier ll. This is set by the configure script.

HAVE_LONG_DOUBLE : Define this if the host C compiler supports long double. This is set by the configure script.

PRINTF_HAS_LONG_DOUBLE : Define this if the host can handle printing of long double float-point numbers via the printf format conversion specifier Lg. This is set by the configure script.

SCANF_HAS_LONG_DOUBLE : Define this if the host can handle the parsing of long double float-point numbers via the scanf format conversion specifier Lg. This is set by the configure script.

LSEEK_NOT_LINEAR : Define this if lseek (n) does not necessarily move to byte number n in the file. This is only used when reading source files. It is normally faster to define CRLF_SOURCE_FILES when possible.

L_SET : This macro is used as the argument to lseek (or, most commonly, bfd_seek). FIXME, should be replaced by SEEK_SET instead, which is the POSIX equivalent.

NORETURN : If defined, this should be one or more tokens, such as volatile, that can be used in both the declaration and definition of functions to indicate that they never return. The default is already set correctly if compiling with GCC. This will almost never need to be defined.

ATTR_NORETURN : If defined, this should be one or more tokens, such as __attribute__ ((noreturn)), that can be used in the declarations of functions to indicate that they never return. The default is already set correctly if compiling with GCC. This will almost never need to be defined.

SEEK_CUR SEEK_SET : Define these to appropriate value for the system lseek, if not already defined.

STOP_SIGNAL : This is the signal for stopping GDB. Defaults to SIGTSTP. (Only redefined for the Convex.)

USG : Means that System V (prior to SVR4) include files are in use. (FIXME: This symbol is abused in `infrun.c', `regex.c', and `utils.c' for other things, at the moment.)

lint : Define this to help placate lint in some situations.

volatile : Define this to override the defaults of __volatile__ or /**/.


Reposted from: http://hi.baidu.com/systemsoftware/blog/item/b227fb2ce6d0e6e28b139977.html