Mach-O Format Analysis

Source: Internet. Editor: 程序博客网. Time: 2024/05/16 07:01

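0x00 Abstract

Life has no root or stem; it drifts like dust on the road. Scattered and turned by the wind, this body is no longer its constant self.

Tao Yuanming, "Miscellaneous Poems" (杂诗)

Mach-O is the executable file format on OS X, comparable to PE on Windows and ELF on Linux. Doing other research on the platform without a solid grasp of the Mach-O format and the concepts around it is like building a castle in the air.

Every Mach-O file consists of a Mach-O header, followed by the load commands (Load Commands), and finally the data blocks (Data).

The rest of this article analyzes the Mach-O format in detail.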

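0x01 A Brief Introduction to the Mach-O Format

The layout of a Mach-O file is shown in the figure below:

It consists of the following parts:

  • Header: holds basic information about the Mach-O file, including the platform, the file type, the number of load commands, and so on.
  • LoadCommands: immediately follows the Header; this data is used when the Mach-O file is loaded to determine the memory layout.
  • Data: holds the actual contents of every segment, that is, the code, the data, and so on.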

0x02 Headers

2.1 Data structures

The header definitions can be found in the open-source kernel code.

    /*
     * The 32-bit mach header appears at the very beginning of the object file for
     * 32-bit architectures.
     */
    struct mach_header {
        uint32_t      magic;        /* mach magic number identifier */
        cpu_type_t    cputype;      /* cpu specifier */
        cpu_subtype_t cpusubtype;   /* machine specifier */
        uint32_t      filetype;     /* type of file */
        uint32_t      ncmds;        /* number of load commands */
        uint32_t      sizeofcmds;   /* the size of all the load commands */
        uint32_t      flags;        /* flags */
    };

    /* Constant for the magic field of the mach_header (32-bit architectures) */
    #define MH_MAGIC    0xfeedface  /* the mach magic number */
    #define MH_CIGAM    0xcefaedfe  /* NXSwapInt(MH_MAGIC) */

    /*
     * The 64-bit mach header appears at the very beginning of object files for
     * 64-bit architectures.
     */
    struct mach_header_64 {
        uint32_t      magic;        /* mach magic number identifier */
        cpu_type_t    cputype;      /* cpu specifier */
        cpu_subtype_t cpusubtype;   /* machine specifier */
        uint32_t      filetype;     /* type of file */
        uint32_t      ncmds;        /* number of load commands */
        uint32_t      sizeofcmds;   /* the size of all the load commands */
        uint32_t      flags;        /* flags */
        uint32_t      reserved;     /* reserved */
    };

    /* Constant for the magic field of the mach_header_64 (64-bit architectures) */
    #define MH_MAGIC_64 0xfeedfacf  /* the 64-bit mach magic number */
    #define MH_CIGAM_64 0xcffaedfe  /* NXSwapInt(MH_MAGIC_64) */

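As the definitions of mach_header and mach_header_64 show, the main job of the header is to let the system quickly determine the environment a Mach-O file is meant to run in (CPU type and subtype) and what type of file it is.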

2.2 Example

Let's use a tool to analyze a Mach-O file and look at a concrete Mach-O header.

otool prints the Mach header, although the output is not very readable.

    ➜  bin otool -h git
    git:
    Mach header
          magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
     0xfeedfacf 16777223          3  0x80           2    17       1432 0x00200085

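Another tool, MachOView, shows the same information more clearly.

  • The Magic Number is 0xFEEDFACF, so this is a file for a 64-bit platform.
  • CPU Type and CPU SubType are easy to read as well: the file runs on the x86_64 CPU platform.
  • File Type indicates that the file is an executable; it is analyzed in detail later.
  • Flags indicates four properties of this Mach-O file; they are analyzed in detail later.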

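2.3 Field details

2.3.1 FileType

Mach-O is used not only for executables but also for other kinds of files:

  • kernel extensions
  • libraries
  • core dumps

The definitions in the source are as follows: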

    #define MH_OBJECT       0x1     /* relocatable object file */
    #define MH_EXECUTE      0x2     /* demand paged executable file */
    #define MH_FVMLIB       0x3     /* fixed VM shared library file */
    #define MH_CORE         0x4     /* core file */
    #define MH_PRELOAD      0x5     /* preloaded executable file */
    #define MH_DYLIB        0x6     /* dynamically bound shared library */
    #define MH_DYLINKER     0x7     /* dynamic link editor */
    #define MH_BUNDLE       0x8     /* dynamically bound bundle file */
    #define MH_DYLIB_STUB   0x9     /* shared library stub for static */
                                    /*  linking only, no section contents */
    #define MH_DSYM         0xa     /* companion file with only debug */
                                    /*  sections */
    #define MH_KEXT_BUNDLE  0xb     /* x86_64 kexts */

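Here are some of the commonly encountered file types.

    File Type        Purpose                                    Example
    MH_OBJECT        object file produced during compilation    gcc -c xxx.c produces xxx.o
    MH_EXECUTE       executable binary                          /usr/bin/git
    MH_CORE          core dump                                  the dump file written on a crash
    MH_DYLIB         dynamic library                            the library files under /usr/lib/
    MH_DYLINKER      dynamic linker                             /usr/lib/dyld
    MH_KEXT_BUNDLE   kernel extension                           a simple kernel module you write yourself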

2.3.2 flags

The Mach-O header also carries some important loading parameters for dyld. They are defined in the code as follows:

    #define MH_INCRLINK     0x2     /* the object file is the output of an
                                       incremental link against a base file
                                       and can't be link edited again */
    #define MH_DYLDLINK     0x4     /* the object file is input for the
                                       dynamic linker and can't be staticly
                                       link edited again */
    #define MH_BINDATLOAD   0x8     /* the object file's undefined
                                       references are bound by the dynamic
                                       linker when loaded. */
    #define MH_PREBOUND     0x10    /* the file has its dynamic undefined
                                       references prebound. */
    #define MH_SPLIT_SEGS   0x20    /* the file has its read-only and
                                       read-write segments split */
    #define MH_LAZY_INIT    0x40    /* the shared library init routine is
                                       to be run lazily via catching memory
                                       faults to its writeable segments
                                       (obsolete) */
    #define MH_TWOLEVEL     0x80    /* the image is using two-level name
                                       space bindings */
    ...
    // The list is long; read the source if you are interested
    // EXTERNAL_HEADERS/mach-o/x86_64/loader.h

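Again, here is a brief look at a few of the more important ones.

    Flag                       Meaning
    MH_NOUNDEFS                the object has no undefined symbols, so it has no outstanding link dependencies
    MH_DYLDLINK                the object file is input for dyld and cannot be statically linked again
    MH_PIE                     the image may be loaded at a randomized address
    MH_ALLOW_STACK_EXECUTION   stack memory may execute code; normally off by default
    MH_NO_HEAP_EXECUTION       heap memory cannot execute code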

2.4 Headers summary

0x03 Load Commands

This is the load_command data structure:

    struct load_command {
        uint32_t cmd;       /* type of load command */
        uint32_t cmdsize;   /* total size of command in bytes */
    };

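The load commands follow the header directly, and the Mach-O header already gives the total size of all the commands (sizeofcmds). Once the header has been read, the rest of the file is loaded by parsing the load commands. I took a quick look at how the kernel parses Mach-O data; setting aside the implementation details, the logic is quite simple.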

    static
    load_return_t
    parse_machfile(
        struct vnode        *vp,
        vm_map_t            map,
        thread_t            thread,
        struct mach_header  *header,
        off_t               file_offset,
        off_t               macho_size,
        int                 depth,
        int64_t             aslr_offset,
        int64_t             dyld_aslr_offset,
        load_result_t       *result
    )
    {
        [...] // a large amount of initialization and checking is omitted here

            /*
             * Loop through each of the load_commands indicated by the
             * Mach-O header; if an absurd value is provided, we just
             * run off the end of the reserved section by incrementing
             * the offset too far, so we are implicitly fail-safe.
             */
            offset = mach_header_sz;
            ncmds = header->ncmds;

            while (ncmds--) {
                /*
                 *  Get a pointer to the command.
                 */
                lcp = (struct load_command *)(addr + offset);
                // lcp points at the command about to be parsed
                oldoffset = offset;
                // oldoffset is the offset from the start of the Mach-O image in memory to the current command
                offset += lcp->cmdsize;
                // advance by the length of the current command; offset is now the
                // offset from the start of the image to the next command

                /*
                 * Perform prevalidation of the struct load_command
                 * before we attempt to use its contents.  Invalid
                 * values are ones which result in an overflow, or
                 * which can not possibly be valid commands, or which
                 * straddle or exist past the reserved section at the
                 * start of the image.
                 */
                if (oldoffset > offset ||
                    lcp->cmdsize < sizeof(struct load_command) ||
                    offset > header->sizeofcmds + mach_header_sz) {
                    ret = LOAD_BADMACHO;
                    break;
                }
                // a validity check; it has nothing to do with how the file is loaded into memory

                /*
                 * Act on struct load_command's for which kernel
                 * intervention is required.
                 */
                switch(lcp->cmd) {
                case LC_SEGMENT:
                    [...]
                    ret = load_segment(lcp,
                                       header->filetype,
                                       control,
                                       file_offset,
                                       macho_size,
                                       vp,
                                       map,
                                       slide,
                                       result);
                    break;
                case LC_SEGMENT_64:
                    [...]
                    ret = load_segment(lcp,
                                       header->filetype,
                                       control,
                                       file_offset,
                                       macho_size,
                                       vp,
                                       map,
                                       slide,
                                       result);
                    break;
                case LC_UNIXTHREAD:
                    if (pass != 1)
                        break;
                    ret = load_unixthread(
                             (struct thread_command *) lcp,
                             thread,
                             slide,
                             result);
                    break;
                case LC_MAIN:
                    if (pass != 1)
                        break;
                    if (depth != 1)
                        break;
                    ret = load_main(
                             (struct entry_point_command *) lcp,
                             thread,
                             slide,
                             result);
                    break;
                case LC_LOAD_DYLINKER:
                    if (pass != 3)
                        break;
                    if ((depth == 1) && (dlp == 0)) {
                        dlp = (struct dylinker_command *)lcp;
                        dlarchbits = (header->cputype & CPU_ARCH_MASK);
                    } else {
                        ret = LOAD_FAILURE;
                    }
                    break;
                case LC_UUID:
                    if (pass == 1 && depth == 1) {
                        ret = load_uuid((struct uuid_command *) lcp,
                                (char *)addr + mach_header_sz + header->sizeofcmds,
                                result);
                    }
                    break;
                case LC_CODE_SIGNATURE:
                    [...]
                    ret = load_code_signature(
                        (struct linkedit_data_command *) lcp,
                        vp,
                        file_offset,
                        macho_size,
                        header->cputype,
                        result);
                    [...]
                    break;
    #if CONFIG_CODE_DECRYPTION
                case LC_ENCRYPTION_INFO:
                case LC_ENCRYPTION_INFO_64:
                    if (pass != 3)
                        break;
                    ret = set_code_unprotect(
                        (struct encryption_info_command *) lcp,
                        addr, map, slide, vp, file_offset,
                        header->cputype, header->cpusubtype);
                    if (ret != LOAD_SUCCESS) {
                        printf("proc %d: set_code_unprotect() error %d "
                               "for file \"%s\"\n",
                               p->p_pid, ret, vp->v_name);
                        /*
                         * Don't let the app run if it's
                         * encrypted but we failed to set up the
                         * decrypter. If the keys are missing it will
                         * return LOAD_DECRYPTFAIL.
                         */
                         if (ret == LOAD_DECRYPTFAIL) {
                            /* failed to load due to missing FP keys */
                            proc_lock(p);
                            p->p_lflag |= P_LTERM_DECRYPTFAIL;
                            proc_unlock(p);
                         }
                         psignal(p, SIGKILL);
                    }
                    break;
    #endif
                default:
                    /* Other commands are ignored by the kernel */
                    ret = LOAD_SUCCESS;
                    break;
                }
                if (ret != LOAD_SUCCESS)
                    break;
            }
            if (ret != LOAD_SUCCESS)
                break;
        }
        [...] // the handling that follows loading is omitted here
    }

3.1 The cmdsize field

The first few lines inside the while loop show how the cmdsize field of load_command is used to step through the data of a Mach-O file.

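    ...
    lcp = (struct load_command *)(addr + offset);
    // lcp points at the command about to be parsed
    oldoffset = offset;
    // oldoffset is the offset from the start of the Mach-O image in memory to the current command
    offset += lcp->cmdsize;
    // advance by the length of the current command; offset is now the
    // offset from the start of the image to the next command
    ...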

3.2 The cmd field

    switch(lcp->cmd) {
    case LC_SEGMENT:
        [...]
        ret = load_segment(lcp,
                           header->filetype,
                           control,
                           file_offset,
                           macho_size,
                           vp,
                           map,
                           slide,
                           result);
        break;
    case LC_SEGMENT_64:
        [...]
        ret = load_segment(lcp,
                           header->filetype,
                           control,
                           file_offset,
                           macho_size,
                           vp,
                           map,
                           slide,
                           result);
        break;
    case LC_UNIXTHREAD:
        if (pass != 1)
            break;
        ret = load_unixthread(
                 (struct thread_command *) lcp,
                 thread,
                 slide,
                 result);
        break;
    case LC_MAIN:
        if (pass != 1)
            break;
        if (depth != 1)
            break;
        ret = load_main(
                 (struct entry_point_command *) lcp,
                 thread,
                 slide,
                 result);
        break;
    case LC_LOAD_DYLINKER:
        if (pass != 3)
            break;
        if ((depth == 1) && (dlp == 0)) {
            dlp = (struct dylinker_command *)lcp;
            dlarchbits = (header->cputype & CPU_ARCH_MASK);
        } else {
            ret = LOAD_FAILURE;
        }
        break;
    case LC_UUID:
        if (pass == 1 && depth == 1) {
            ret = load_uuid((struct uuid_command *) lcp,
                    (char *)addr + mach_header_sz + header->sizeofcmds,
                    result);
        }
        break;
    case LC_CODE_SIGNATURE:
        [...]
        ret = load_code_signature(
            (struct linkedit_data_command *) lcp,
            vp,
            file_offset,
            macho_size,
            header->cputype,
            result);
        [...]
        break;
    #if CONFIG_CODE_DECRYPTION
    case LC_ENCRYPTION_INFO:
    case LC_ENCRYPTION_INFO_64:
        if (pass != 3)
            break;
        ret = set_code_unprotect(
            (struct encryption_info_command *) lcp,
            addr, map, slide, vp, file_offset,
            header->cputype, header->cpusubtype);
        if (ret != LOAD_SUCCESS) {
            printf("proc %d: set_code_unprotect() error %d "
                   "for file \"%s\"\n",
                   p->p_pid, ret, vp->v_name);
            /*
             * Don't let the app run if it's
             * encrypted but we failed to set up the
             * decrypter. If the keys are missing it will
             * return LOAD_DECRYPTFAIL.
             */
             if (ret == LOAD_DECRYPTFAIL) {
                /* failed to load due to missing FP keys */
                proc_lock(p);
                p->p_lflag |= P_LTERM_DECRYPTFAIL;
                proc_unlock(p);
             }
             psignal(p, SIGKILL);
        }
        break;
    #endif
    default:
        /* Other commands are ignored by the kernel */
        ret = LOAD_SUCCESS;
        break;
    }

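As this code shows, a different function is used for loading depending on the type in the cmd field. The table below lists what the different command types do in the kernel code.

    Command type                Handler               Purpose
    LC_SEGMENT; LC_SEGMENT_64   load_segment          loads the segment's data and maps it into the process's address space
    LC_LOAD_DYLINKER            load_dylinker         invokes the /usr/lib/dyld program
    LC_UUID                     load_uuid             loads the 128-bit unique ID
    LC_THREAD                   load_thread           starts a Mach thread, but does not allocate stack space
    LC_UNIXTHREAD               load_unixthread       starts a UNIX thread
    LC_CODE_SIGNATURE           load_code_signature   loads the code signature
    LC_ENCRYPTION_INFO          set_code_unprotect    sets up decryption of an encrypted binary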

0x04 Segment&Section

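When data is loaded, what mainly gets loaded is LC_SEGMENT or LC_SEGMENT_64. The purposes of the other load commands were introduced briefly in the previous section and are not examined further here.

The data structures of LC_SEGMENT and LC_SEGMENT_64 look like this: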

    struct segment_command { /* for 32-bit architectures */
        uint32_t    cmd;            /* LC_SEGMENT */
        uint32_t    cmdsize;        /* includes sizeof section structs */
        char        segname[16];    /* segment name */
        uint32_t    vmaddr;         /* memory address of this segment */
        uint32_t    vmsize;         /* memory size of this segment */
        uint32_t    fileoff;        /* file offset of this segment */
        uint32_t    filesize;       /* amount to map from the file */
        vm_prot_t   maxprot;        /* maximum VM protection */
        vm_prot_t   initprot;       /* initial VM protection */
        uint32_t    nsects;         /* number of sections in segment */
        uint32_t    flags;          /* flags */
    };

    struct segment_command_64 { /* for 64-bit architectures */
        uint32_t    cmd;            /* LC_SEGMENT_64 */
        uint32_t    cmdsize;        /* includes sizeof section_64 structs */
        char        segname[16];    /* segment name */
        uint64_t    vmaddr;         /* memory address of this segment */
        uint64_t    vmsize;         /* memory size of this segment */
        uint64_t    fileoff;        /* file offset of this segment */
        uint64_t    filesize;       /* amount to map from the file */
        vm_prot_t   maxprot;        /* maximum VM protection */
        vm_prot_t   initprot;       /* initial VM protection */
        uint32_t    nsects;         /* number of sections in segment */
        uint32_t    flags;          /* flags */
    };

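As you can see, most of these fields help the kernel map the segment into virtual memory. The field to pay attention to is nsects, which gives the number of sections in the segment. Sections are where the actual useful data is stored.

The section data structures are as follows: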

    struct section { /* for 32-bit architectures */
        char        sectname[16];   /* name of this section */
        char        segname[16];    /* segment this section goes in */
        uint32_t    addr;           /* memory address of this section */
        uint32_t    size;           /* size in bytes of this section */
        uint32_t    offset;         /* file offset of this section */
        uint32_t    align;          /* section alignment (power of 2) */
        uint32_t    reloff;         /* file offset of relocation entries */
        uint32_t    nreloc;         /* number of relocation entries */
        uint32_t    flags;          /* flags (section type and attributes)*/
        uint32_t    reserved1;      /* reserved (for offset or index) */
        uint32_t    reserved2;      /* reserved (for count or sizeof) */
    };

    struct section_64 { /* for 64-bit architectures */
        char        sectname[16];   /* name of this section */
        char        segname[16];    /* segment this section goes in */
        uint64_t    addr;           /* memory address of this section */
        uint64_t    size;           /* size in bytes of this section */
        uint32_t    offset;         /* file offset of this section */
        uint32_t    align;          /* section alignment (power of 2) */
        uint32_t    reloff;         /* file offset of relocation entries */
        uint32_t    nreloc;         /* number of relocation entries */
        uint32_t    flags;          /* flags (section type and attributes)*/
        uint32_t    reserved1;      /* reserved (for offset or index) */
        uint32_t    reserved2;      /* reserved (for count or sizeof) */
        uint32_t    reserved3;      /* reserved */
    };

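Besides fields that likewise help with memory mapping, all you need to know when learning the Mach-O format is that different sections serve different purposes.

    Section        Purpose
    __text         code
    __cstring      hard-coded strings
    __const        variables declared with the const keyword
    __DATA.__bss   the bss section

Sections are already the finest-grained unit, and there are many more specialized sections than can be listed one by one here. When you run into an unfamiliar section type, look it up in Apple's documentation.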

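0x05 Summary

A careful analysis of the Mach-O format gives a better understanding of how Mach-O files are loaded, and lays the groundwork for studying dyld or other modules on OS X.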


Original article: http://turingh.github.io/2016/03/07/mach-o%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E5%88%86%E6%9E%90/
