Mach-O Format Analysis
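0x00 Abstract
Human life has no root or stem; it drifts like dust on the road. Scattered, it tumbles with the wind; this is no longer the constant self.
(Tao Yuanming, "Miscellaneous Poems")
Mach-O is the executable file format on OS X, analogous to PE on Windows and ELF on Linux. Without a thorough grasp of the Mach-O format and the knowledge around it, any further research is like building a castle in the air.
Every Mach-O file contains a Mach-O header, followed by the load commands (Load Commands), and finally the data blocks (Data).
The rest of this article analyzes the Mach-O format in detail.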
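0x01 A Brief Introduction to the Mach-O Format
The figure below shows the layout of a Mach-O file:
It is made up of the following parts:
- Header: stores basic information about the Mach-O file, including the platform, the file type, and the number of load commands.
- Load Commands: this region immediately follows the header; when a Mach-O file is loaded, the data here determines how the file is laid out in memory.
- Data: the actual contents of every segment are stored here, including the code and the data themselves.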
0x02 Headers
2.1 Data Structures
The header definitions can be found in the open-source kernel code.
    /*
     * The 32-bit mach header appears at the very beginning of the object file for
     * 32-bit architectures.
     */
    struct mach_header {
        uint32_t        magic;          /* mach magic number identifier */
        cpu_type_t      cputype;        /* cpu specifier */
        cpu_subtype_t   cpusubtype;     /* machine specifier */
        uint32_t        filetype;       /* type of file */
        uint32_t        ncmds;          /* number of load commands */
        uint32_t        sizeofcmds;     /* the size of all the load commands */
        uint32_t        flags;          /* flags */
    };

    /* Constant for the magic field of the mach_header (32-bit architectures) */
    #define MH_MAGIC    0xfeedface      /* the mach magic number */
    #define MH_CIGAM    0xcefaedfe      /* NXSwapInt(MH_MAGIC) */

    /*
     * The 64-bit mach header appears at the very beginning of object files for
     * 64-bit architectures.
     */
    struct mach_header_64 {
        uint32_t        magic;          /* mach magic number identifier */
        cpu_type_t      cputype;        /* cpu specifier */
        cpu_subtype_t   cpusubtype;     /* machine specifier */
        uint32_t        filetype;       /* type of file */
        uint32_t        ncmds;          /* number of load commands */
        uint32_t        sizeofcmds;     /* the size of all the load commands */
        uint32_t        flags;          /* flags */
        uint32_t        reserved;       /* reserved */
    };

    /* Constant for the magic field of the mach_header_64 (64-bit architectures) */
    #define MH_MAGIC_64 0xfeedfacf      /* the 64-bit mach magic number */
    #define MH_CIGAM_64 0xcffaedfe      /* NXSwapInt(MH_MAGIC_64) */
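From the definitions of mach_header and mach_header_64 it is clear that the main job of the header is to let the system quickly determine the environment a Mach-O file is meant to run in and what type of file it is.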
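To make the role of the magic field concrete, here is a minimal sketch of how a tool might classify a file from its first four bytes. The four constants are the ones quoted above; the macho_kind struct and the classify_magic and mach_header_size helpers are hypothetical names introduced only for this illustration, and the header sizes (28 and 32 bytes) follow from counting the uint32_t-sized fields in the two structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Magic values copied from the header definitions quoted above. */
#define MH_MAGIC    0xfeedfaceu
#define MH_CIGAM    0xcefaedfeu
#define MH_MAGIC_64 0xfeedfacfu
#define MH_CIGAM_64 0xcffaedfeu

/* Hypothetical result type for this sketch; not part of loader.h. */
struct macho_kind {
    int is_macho;   /* 1 if the magic is one of the four values above */
    int is_64bit;   /* 1 for mach_header_64, 0 for mach_header */
    int swapped;    /* 1 if the file's byte order is opposite to the host's */
};

/* Classify a file from its first four bytes, read in host byte order. */
struct macho_kind classify_magic(const unsigned char first4[4])
{
    struct macho_kind k = { 0, 0, 0 };
    uint32_t magic;

    assert(first4 != NULL);
    memcpy(&magic, first4, sizeof magic);   /* avoids unaligned reads */
    switch (magic) {
    case MH_MAGIC:    k.is_macho = 1; break;
    case MH_CIGAM:    k.is_macho = 1; k.swapped = 1; break;
    case MH_MAGIC_64: k.is_macho = 1; k.is_64bit = 1; break;
    case MH_CIGAM_64: k.is_macho = 1; k.is_64bit = 1; k.swapped = 1; break;
    default: break;                         /* not a thin Mach-O file */
    }
    return k;
}

/* Size of the header that precedes the load commands: 7 or 8 uint32_t fields. */
size_t mach_header_size(struct macho_kind k)
{
    return k.is_64bit ? 32 : 28;
}
```

A caller would read four bytes from the start of the file, pass them to classify_magic, and use mach_header_size to find where the load commands begin. Fat (multi-architecture) files use a different magic and are deliberately not handled here.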
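2.2 Example
Let's analyze a Mach-O file with a couple of tools to get a concrete look at the Mach-O header.
otool prints the details of the Mach header, although the output is not very readable.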
    ➜  bin otool -h git
    git:
    Mach header
          magic cputype cpusubtype  caps    filetype ncmds sizeofcmds      flags
     0xfeedfacf 16777223          3  0x80           2    17       1432 0x00200085
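Another tool, MachOView, shows the same information more clearly.
- The magic number is 0xFEEDFACF, so this is a file for a 64-bit platform.
- CPU Type and CPU SubType are also easy to read: the file runs on x86_64 CPUs.
- File Type marks the file as an executable; it is analyzed in detail below.
- Flags marks four properties of this Mach-O file; it is analyzed in detail below.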
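2.3 Field Details
2.3.1 FileType
Mach-O is used not only for executables but for other kinds of files as well:
- Kernel extensions
- Libraries
- Core dumps
- …
Its definitions in the source are as follows: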
    #define MH_OBJECT       0x1     /* relocatable object file */
    #define MH_EXECUTE      0x2     /* demand paged executable file */
    #define MH_FVMLIB       0x3     /* fixed VM shared library file */
    #define MH_CORE         0x4     /* core file */
    #define MH_PRELOAD      0x5     /* preloaded executable file */
    #define MH_DYLIB        0x6     /* dynamically bound shared library */
    #define MH_DYLINKER     0x7     /* dynamic link editor */
    #define MH_BUNDLE       0x8     /* dynamically bound bundle file */
    #define MH_DYLIB_STUB   0x9     /* shared library stub for static */
                                    /*  linking only, no section contents */
    #define MH_DSYM         0xa     /* companion file with only debug */
                                    /*  sections */
    #define MH_KEXT_BUNDLE  0xb     /* x86_64 kexts */
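The file types encountered most often are MH_OBJECT (relocatable object files), MH_EXECUTE (executables, such as the git binary above with filetype 2), MH_DYLIB (dynamic libraries), MH_DYLINKER (the dynamic linker itself), MH_BUNDLE (loadable bundles), MH_DSYM (companion files that hold only debug sections), and MH_KEXT_BUNDLE (kernel extensions).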
2.3.2 flags
The Mach-O header also carries several important load parameters for dyld. They are defined in the code as follows:
    #define MH_INCRLINK     0x2     /* the object file is the output of an
                                       incremental link against a base file
                                       and can't be link edited again */
    #define MH_DYLDLINK     0x4     /* the object file is input for the
                                       dynamic linker and can't be staticly
                                       link edited again */
    #define MH_BINDATLOAD   0x8     /* the object file's undefined
                                       references are bound by the dynamic
                                       linker when loaded. */
    #define MH_PREBOUND     0x10    /* the file has its dynamic undefined
                                       references prebound. */
    #define MH_SPLIT_SEGS   0x20    /* the file has its read-only and
                                       read-write segments split */
    #define MH_LAZY_INIT    0x40    /* the shared library init routine is
                                       to be run lazily via catching memory
                                       faults to its writeable segments
                                       (obsolete) */
    #define MH_TWOLEVEL     0x80    /* the image is using two-level name
                                       space bindings */
    ...
    // The list is too long to quote in full; see the source if you are interested:
    // EXTERNAL_HEADERS/mach-o/x86_64/loader.h
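Again, only a few of the more important ones are worth singling out. MH_DYLDLINK marks a file that is input for the dynamic linker and cannot be statically link edited again, and MH_TWOLEVEL marks an image that uses two-level namespace bindings; both bits are set in the flags value 0x00200085 of the git binary above.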
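Since flags is a bit mask, the value 0x00200085 from the otool output can be decomposed mechanically. The following is a small sketch of that decomposition, not code from the kernel: flag_name, FLAG_NAMES, and describe_flags are hypothetical names, and MH_NOUNDEFS (0x1) and MH_PIE (0x200000) are taken from loader.h because they fall outside the excerpt quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct flag_name {
    uint32_t    bit;
    const char *name;
};

/* Bits from the excerpt above, plus two that the excerpt does not show. */
static const struct flag_name FLAG_NAMES[] = {
    { 0x1u,      "MH_NOUNDEFS"   },  /* from loader.h, not in the excerpt */
    { 0x2u,      "MH_INCRLINK"   },
    { 0x4u,      "MH_DYLDLINK"   },
    { 0x8u,      "MH_BINDATLOAD" },
    { 0x10u,     "MH_PREBOUND"   },
    { 0x20u,     "MH_SPLIT_SEGS" },
    { 0x40u,     "MH_LAZY_INIT"  },
    { 0x80u,     "MH_TWOLEVEL"   },
    { 0x200000u, "MH_PIE"        },  /* from loader.h, not in the excerpt */
};

/*
 * Write the names of the set bits into out, separated by spaces, and
 * return the bits that were not named (unknown to the table above, or
 * left out because out was too small).
 */
uint32_t describe_flags(uint32_t flags, char *out, size_t outsz)
{
    size_t i, used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < sizeof FLAG_NAMES / sizeof FLAG_NAMES[0]; i++) {
        size_t len, sep;

        if (!(flags & FLAG_NAMES[i].bit))
            continue;
        len = strlen(FLAG_NAMES[i].name);
        sep = used ? 1 : 0;
        if (used + sep + len >= outsz)
            break;                      /* no room left: stop appending */
        if (sep)
            out[used++] = ' ';
        memcpy(out + used, FLAG_NAMES[i].name, len + 1);  /* copies the NUL too */
        used += len;
        flags &= ~FLAG_NAMES[i].bit;
    }
    return flags;
}
```

Applied to 0x00200085, this yields four names (0x1 + 0x4 + 0x80 + 0x200000), which matches the "four properties" mentioned for the git binary in section 2.2.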
2.4 Header Summary
0x03 Load Commands
This is the load_command data structure:
    struct load_command {
        uint32_t cmd;       /* type of load command */
        uint32_t cmdsize;   /* total size of command in bytes */
    };
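The load commands follow the header directly, and the total number of bytes occupied by all commands is already given in the Mach-O header (sizeofcmds). Once the header has been read, the rest of the file is loaded by parsing the load commands. I took a brief look at how the kernel parses Mach-O data; kernel implementation details aside, the logic is quite simple.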
    static
    load_return_t
    parse_machfile(
        struct vnode        *vp,
        vm_map_t            map,
        thread_t            thread,
        struct mach_header  *header,
        off_t               file_offset,
        off_t               macho_size,
        int                 depth,
        int64_t             aslr_offset,
        int64_t             dyld_aslr_offset,
        load_result_t       *result
    )
    {
        [...] // a large amount of initialization and checking is omitted here

        /*
         * Loop through each of the load_commands indicated by the
         * Mach-O header; if an absurd value is provided, we just
         * run off the end of the reserved section by incrementing
         * the offset too far, so we are implicitly fail-safe.
         */
        offset = mach_header_sz;
        ncmds = header->ncmds;

        while (ncmds--) {
            /*
             * Get a pointer to the command.
             */
            lcp = (struct load_command *)(addr + offset);
            // lcp now points at the command about to be parsed
            oldoffset = offset;
            // oldoffset is the offset from the start of the in-memory Mach-O file to the current command
            offset += lcp->cmdsize;
            // add the length of the current command; offset is now the offset
            // from the start of the in-memory file to the next command

            /*
             * Perform prevalidation of the struct load_command
             * before we attempt to use its contents.  Invalid
             * values are ones which result in an overflow, or
             * which can not possibly be valid commands, or which
             * straddle or exist past the reserved section at the
             * start of the image.
             */
            if (oldoffset > offset ||
                lcp->cmdsize < sizeof(struct load_command) ||
                offset > header->sizeofcmds + mach_header_sz) {
                ret = LOAD_BADMACHO;
                break;
            }
            // a sanity check; it has nothing to do with how data is loaded into memory

            /*
             * Act on struct load_command's for which kernel
             * intervention is required.
             */
            switch(lcp->cmd) {
            case LC_SEGMENT:
                [...]
                ret = load_segment(lcp,
                                   header->filetype,
                                   control,
                                   file_offset,
                                   macho_size,
                                   vp,
                                   map,
                                   slide,
                                   result);
                break;
            case LC_SEGMENT_64:
                [...]
                ret = load_segment(lcp,
                                   header->filetype,
                                   control,
                                   file_offset,
                                   macho_size,
                                   vp,
                                   map,
                                   slide,
                                   result);
                break;
            case LC_UNIXTHREAD:
                if (pass != 1)
                    break;
                ret = load_unixthread(
                         (struct thread_command *) lcp,
                         thread,
                         slide,
                         result);
                break;
            case LC_MAIN:
                if (pass != 1)
                    break;
                if (depth != 1)
                    break;
                ret = load_main(
                         (struct entry_point_command *) lcp,
                         thread,
                         slide,
                         result);
                break;
            case LC_LOAD_DYLINKER:
                if (pass != 3)
                    break;
                if ((depth == 1) && (dlp == 0)) {
                    dlp = (struct dylinker_command *)lcp;
                    dlarchbits = (header->cputype & CPU_ARCH_MASK);
                } else {
                    ret = LOAD_FAILURE;
                }
                break;
            case LC_UUID:
                if (pass == 1 && depth == 1) {
                    ret = load_uuid((struct uuid_command *) lcp,
                                    (char *)addr + mach_header_sz + header->sizeofcmds,
                                    result);
                }
                break;
            case LC_CODE_SIGNATURE:
                [...]
                ret = load_code_signature(
                    (struct linkedit_data_command *) lcp,
                    vp,
                    file_offset,
                    macho_size,
                    header->cputype,
                    result);
                [...]
                break;
    #if CONFIG_CODE_DECRYPTION
            case LC_ENCRYPTION_INFO:
            case LC_ENCRYPTION_INFO_64:
                if (pass != 3)
                    break;
                ret = set_code_unprotect(
                    (struct encryption_info_command *) lcp,
                    addr, map, slide, vp, file_offset,
                    header->cputype, header->cpusubtype);
                if (ret != LOAD_SUCCESS) {
                    printf("proc %d: set_code_unprotect() error %d "
                           "for file \"%s\"\n",
                           p->p_pid, ret, vp->v_name);
                    /*
                     * Don't let the app run if it's
                     * encrypted but we failed to set up the
                     * decrypter. If the keys are missing it will
                     * return LOAD_DECRYPTFAIL.
                     */
                    if (ret == LOAD_DECRYPTFAIL) {
                        /* failed to load due to missing FP keys */
                        proc_lock(p);
                        p->p_lflag |= P_LTERM_DECRYPTFAIL;
                        proc_unlock(p);
                    }
                    psignal(p, SIGKILL);
                }
                break;
    #endif
            default:
                /* Other commands are ignored by the kernel */
                ret = LOAD_SUCCESS;
                break;
            }
            if (ret != LOAD_SUCCESS)
                break;
        }
        if (ret != LOAD_SUCCESS)
            break;
        }   // closes the enclosing per-pass loop, whose opening is part of the omitted code above
        [...] // the handling that follows loading is omitted here
    }
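3.1 The cmdsize Field
The lines to look at here are the first few inside the while loop; they show how the cmdsize field of load_command is used to step through the data of a Mach-O file.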
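    ...
    lcp = (struct load_command *)(addr + offset);
    // lcp now points at the command about to be parsed
    oldoffset = offset;
    // oldoffset is the offset from the start of the in-memory Mach-O file to the current command
    offset += lcp->cmdsize;
    // add the length of the current command; offset is now the offset
    // from the start of the in-memory file to the next command
    ...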
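The same stepping logic can be tried outside the kernel. Below is a minimal user-space sketch, assuming the whole file has been read into a buffer in host byte order; walk_load_commands and its parameters are hypothetical names for this illustration, and the bounds checks are written in the spirit of the kernel's prevalidation rather than copied from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct load_command {
    uint32_t cmd;       /* type of load command */
    uint32_t cmdsize;   /* total size of command in bytes */
};

/*
 * Step through ncmds load commands that start at header_size and occupy
 * sizeofcmds bytes. Records up to max_out cmd values in cmds_out (which
 * may be NULL) and returns the number of commands visited, or -1 if a
 * command is too small or runs past the load-command area.
 */
int walk_load_commands(const unsigned char *image, size_t image_len,
                       size_t header_size, uint32_t ncmds, uint32_t sizeofcmds,
                       uint32_t *cmds_out, size_t max_out)
{
    size_t offset = header_size;
    size_t end;
    int visited = 0;

    assert(image != NULL);
    if (header_size > image_len || sizeofcmds > image_len - header_size)
        return -1;                          /* area does not fit in the buffer */
    end = header_size + sizeofcmds;

    while (ncmds--) {
        struct load_command lc;

        if (end - offset < sizeof lc)
            return -1;                      /* no room for another command */
        memcpy(&lc, image + offset, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > end - offset)
            return -1;                      /* same idea as LOAD_BADMACHO */
        if (cmds_out != NULL && (size_t)visited < max_out)
            cmds_out[visited] = lc.cmd;
        offset += lc.cmdsize;               /* jump to the next command */
        visited++;
    }
    return visited;
}
```

For a 64-bit file, header_size would be 32 and ncmds and sizeofcmds come straight from the header; for the git binary above that means 17 commands spread over 1432 bytes.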
3.2 The cmd Field
    switch(lcp->cmd) {
    case LC_SEGMENT:
        [...]
        ret = load_segment(lcp,
                           header->filetype,
                           control,
                           file_offset,
                           macho_size,
                           vp,
                           map,
                           slide,
                           result);
        break;
    case LC_SEGMENT_64:
        [...]
        ret = load_segment(lcp,
                           header->filetype,
                           control,
                           file_offset,
                           macho_size,
                           vp,
                           map,
                           slide,
                           result);
        break;
    case LC_UNIXTHREAD:
        if (pass != 1)
            break;
        ret = load_unixthread(
                 (struct thread_command *) lcp,
                 thread,
                 slide,
                 result);
        break;
    case LC_MAIN:
        if (pass != 1)
            break;
        if (depth != 1)
            break;
        ret = load_main(
                 (struct entry_point_command *) lcp,
                 thread,
                 slide,
                 result);
        break;
    case LC_LOAD_DYLINKER:
        if (pass != 3)
            break;
        if ((depth == 1) && (dlp == 0)) {
            dlp = (struct dylinker_command *)lcp;
            dlarchbits = (header->cputype & CPU_ARCH_MASK);
        } else {
            ret = LOAD_FAILURE;
        }
        break;
    case LC_UUID:
        if (pass == 1 && depth == 1) {
            ret = load_uuid((struct uuid_command *) lcp,
                            (char *)addr + mach_header_sz + header->sizeofcmds,
                            result);
        }
        break;
    case LC_CODE_SIGNATURE:
        [...]
        ret = load_code_signature(
            (struct linkedit_data_command *) lcp,
            vp,
            file_offset,
            macho_size,
            header->cputype,
            result);
        [...]
        break;
    #if CONFIG_CODE_DECRYPTION
    case LC_ENCRYPTION_INFO:
    case LC_ENCRYPTION_INFO_64:
        if (pass != 3)
            break;
        ret = set_code_unprotect(
            (struct encryption_info_command *) lcp,
            addr, map, slide, vp, file_offset,
            header->cputype, header->cpusubtype);
        if (ret != LOAD_SUCCESS) {
            printf("proc %d: set_code_unprotect() error %d "
                   "for file \"%s\"\n",
                   p->p_pid, ret, vp->v_name);
            /*
             * Don't let the app run if it's
             * encrypted but we failed to set up the
             * decrypter. If the keys are missing it will
             * return LOAD_DECRYPTFAIL.
             */
            if (ret == LOAD_DECRYPTFAIL) {
                /* failed to load due to missing FP keys */
                proc_lock(p);
                p->p_lflag |= P_LTERM_DECRYPTFAIL;
                proc_unlock(p);
            }
            psignal(p, SIGKILL);
        }
        break;
    #endif
    default:
        /* Other commands are ignored by the kernel */
        ret = LOAD_SUCCESS;
        break;
    }
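As this code shows, a different function does the loading depending on the value of the cmd field. The roles of the command types handled in the kernel code above can be summarized as follows:
- LC_SEGMENT / LC_SEGMENT_64: handled by load_segment, which maps the segment into the process's memory.
- LC_UNIXTHREAD: handled by load_unixthread on pass 1; carries the initial thread state.
- LC_MAIN: handled by load_main on pass 1 and only at depth 1; describes the entry point.
- LC_LOAD_DYLINKER: on pass 3 at depth 1 the command is saved in dlp for later use; any other depth, or a second such command, is a LOAD_FAILURE.
- LC_UUID: handled by load_uuid on pass 1 at depth 1.
- LC_CODE_SIGNATURE: handled by load_code_signature.
- LC_ENCRYPTION_INFO / LC_ENCRYPTION_INFO_64: handled by set_code_unprotect on pass 3 when CONFIG_CODE_DECRYPTION is enabled; if that fails, the process is sent SIGKILL.
- All other commands are ignored by the kernel.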
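The pass and depth conditions in that switch are easy to lose track of, so here is a small model of them. It is a sketch of the visible conditions only: the checks hidden behind [...] and the dlp == 0 test are not modeled, the kernel_action enum and kernel_action_for function are hypothetical names, and the LC_* numeric values are taken from loader.h rather than from this article.

```c
#include <assert.h>
#include <stdint.h>

/* Command values as defined in loader.h (not quoted in this article). */
#define LC_REQ_DYLD           0x80000000u
#define LC_SEGMENT            0x1u
#define LC_UNIXTHREAD         0x5u
#define LC_LOAD_DYLINKER      0xeu
#define LC_SEGMENT_64         0x19u
#define LC_UUID               0x1bu
#define LC_CODE_SIGNATURE     0x1du
#define LC_ENCRYPTION_INFO    0x21u
#define LC_MAIN               (0x28u | LC_REQ_DYLD)
#define LC_ENCRYPTION_INFO_64 0x2cu

/* Hypothetical summary of what the quoted switch does with a command. */
enum kernel_action {
    KA_IGNORED,             /* default branch: ignored by the kernel */
    KA_SKIPPED,             /* known command, but wrong pass or depth */
    KA_FAILURE,             /* ret = LOAD_FAILURE */
    KA_LOAD_SEGMENT,
    KA_LOAD_UNIXTHREAD,
    KA_LOAD_MAIN,
    KA_SAVE_DYLINKER,       /* lcp is remembered in dlp */
    KA_LOAD_UUID,
    KA_LOAD_CODE_SIGNATURE,
    KA_SET_CODE_UNPROTECT   /* only when CONFIG_CODE_DECRYPTION is set */
};

/* Model of the visible pass/depth conditions; both are 1-based here. */
enum kernel_action kernel_action_for(uint32_t cmd, int pass, int depth)
{
    assert(pass >= 1 && depth >= 1);
    switch (cmd) {
    case LC_SEGMENT:
    case LC_SEGMENT_64:
        return KA_LOAD_SEGMENT;             /* elided checks not modeled */
    case LC_UNIXTHREAD:
        return pass == 1 ? KA_LOAD_UNIXTHREAD : KA_SKIPPED;
    case LC_MAIN:
        return (pass == 1 && depth == 1) ? KA_LOAD_MAIN : KA_SKIPPED;
    case LC_LOAD_DYLINKER:
        if (pass != 3)
            return KA_SKIPPED;
        return depth == 1 ? KA_SAVE_DYLINKER : KA_FAILURE;
    case LC_UUID:
        return (pass == 1 && depth == 1) ? KA_LOAD_UUID : KA_SKIPPED;
    case LC_CODE_SIGNATURE:
        return KA_LOAD_CODE_SIGNATURE;      /* elided checks not modeled */
    case LC_ENCRYPTION_INFO:
    case LC_ENCRYPTION_INFO_64:
        return pass == 3 ? KA_SET_CODE_UNPROTECT : KA_SKIPPED;
    default:
        return KA_IGNORED;
    }
}
```

Returning an enum instead of calling anything keeps the sketch free of kernel dependencies; a tool that dumps load commands could print these values next to each command to show which ones the kernel itself acts on and which are left for user space.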
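0x04 Segment & Section
When data is loaded, what mainly gets loaded is LC_SEGMENT or LC_SEGMENT_64. The purposes of the other load commands were outlined in the previous section and are not explored further here.
The data structures for LC_SEGMENT and LC_SEGMENT_64 look like this: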
    struct segment_command { /* for 32-bit architectures */
        uint32_t    cmd;            /* LC_SEGMENT */
        uint32_t    cmdsize;        /* includes sizeof section structs */
        char        segname[16];    /* segment name */
        uint32_t    vmaddr;         /* memory address of this segment */
        uint32_t    vmsize;         /* memory size of this segment */
        uint32_t    fileoff;        /* file offset of this segment */
        uint32_t    filesize;       /* amount to map from the file */
        vm_prot_t   maxprot;        /* maximum VM protection */
        vm_prot_t   initprot;       /* initial VM protection */
        uint32_t    nsects;         /* number of sections in segment */
        uint32_t    flags;          /* flags */
    };

    struct segment_command_64 { /* for 64-bit architectures */
        uint32_t    cmd;            /* LC_SEGMENT_64 */
        uint32_t    cmdsize;        /* includes sizeof section_64 structs */
        char        segname[16];    /* segment name */
        uint64_t    vmaddr;         /* memory address of this segment */
        uint64_t    vmsize;         /* memory size of this segment */
        uint64_t    fileoff;        /* file offset of this segment */
        uint64_t    filesize;       /* amount to map from the file */
        vm_prot_t   maxprot;        /* maximum VM protection */
        vm_prot_t   initprot;       /* initial VM protection */
        uint32_t    nsects;         /* number of sections in segment */
        uint32_t    flags;          /* flags */
    };
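As you can see, most of these fields exist to help the kernel map the segment into virtual memory. The field to pay particular attention to is nsects, which gives the number of sections in the segment. Sections are where the actually useful data is stored.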
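Because cmdsize "includes sizeof section_64 structs", the section headers of a segment sit directly after the segment command, nsects of them in a row. The sketch below reads the n-th one from a buffer under that layout. read_section64 is a hypothetical helper, vm_prot_t is replaced by int32_t so the block compiles without Mach headers, and the section_64 layout is the one from loader.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct segment_command_64 {
    uint32_t cmd;           /* LC_SEGMENT_64 */
    uint32_t cmdsize;       /* includes sizeof section_64 structs */
    char     segname[16];   /* segment name */
    uint64_t vmaddr;        /* memory address of this segment */
    uint64_t vmsize;        /* memory size of this segment */
    uint64_t fileoff;       /* file offset of this segment */
    uint64_t filesize;      /* amount to map from the file */
    int32_t  maxprot;       /* vm_prot_t in the original */
    int32_t  initprot;      /* vm_prot_t in the original */
    uint32_t nsects;        /* number of sections in segment */
    uint32_t flags;         /* flags */
};

struct section_64 {
    char     sectname[16];  /* name of this section */
    char     segname[16];   /* segment this section goes in */
    uint64_t addr;          /* memory address of this section */
    uint64_t size;          /* size in bytes of this section */
    uint32_t offset;        /* file offset of this section */
    uint32_t align;         /* section alignment (power of 2) */
    uint32_t reloff;        /* file offset of relocation entries */
    uint32_t nreloc;        /* number of relocation entries */
    uint32_t flags;         /* flags (section type and attributes) */
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

/*
 * Copy the index-th section header of the LC_SEGMENT_64 command at lc
 * into out. Returns 0 on success, -1 if index is out of range or if
 * cmdsize is too small to hold nsects section headers.
 */
int read_section64(const unsigned char *lc, uint32_t index,
                   struct section_64 *out)
{
    struct segment_command_64 seg;

    assert(lc != NULL && out != NULL);
    memcpy(&seg, lc, sizeof seg);           /* avoids unaligned reads */
    if (index >= seg.nsects)
        return -1;
    if (seg.cmdsize < sizeof seg ||
        (seg.cmdsize - sizeof seg) / sizeof *out < seg.nsects)
        return -1;
    memcpy(out, lc + sizeof seg + (size_t)index * sizeof *out, sizeof *out);
    return 0;
}
```

With these layouts a segment command is 72 bytes and each section header is 80 bytes, so a segment with two sections has a cmdsize of at least 72 + 2 * 80 = 232.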
The section data structures are as follows:
    struct section { /* for 32-bit architectures */
        char        sectname[16];   /* name of this section */
        char        segname[16];    /* segment this section goes in */
        uint32_t    addr;           /* memory address of this section */
        uint32_t    size;           /* size in bytes of this section */
        uint32_t    offset;         /* file offset of this section */
        uint32_t    align;          /* section alignment (power of 2) */
        uint32_t    reloff;         /* file offset of relocation entries */
        uint32_t    nreloc;         /* number of relocation entries */
        uint32_t    flags;          /* flags (section type and attributes)*/
        uint32_t    reserved1;      /* reserved (for offset or index) */
        uint32_t    reserved2;      /* reserved (for count or sizeof) */
    };

    struct section_64 { /* for 64-bit architectures */
        char        sectname[16];   /* name of this section */
        char        segname[16];    /* segment this section goes in */
        uint64_t    addr;           /* memory address of this section */
        uint64_t    size;           /* size in bytes of this section */
        uint32_t    offset;         /* file offset of this section */
        uint32_t    align;          /* section alignment (power of 2) */
        uint32_t    reloff;         /* file offset of relocation entries */
        uint32_t    nreloc;         /* number of relocation entries */
        uint32_t    flags;          /* flags (section type and attributes)*/
        uint32_t    reserved1;      /* reserved (for offset or index) */
        uint32_t    reserved2;      /* reserved (for count or sizeof) */
        uint32_t    reserved3;      /* reserved */
    };
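Apart from the fields that likewise assist memory mapping, all you need to know when getting to grips with the Mach-O format is that different sections serve different purposes.
Sections are already the finest-grained classification, and there are many more complex section types than can be listed one by one here; when you meet a section type you have not seen before, look it up in Apple's documentation.
0x05 Summary
A careful analysis of the Mach-O format gives a better understanding of how Mach-O files are loaded, and lays the groundwork for studying dyld or other components of OS X.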
Original article: http://turingh.github.io/2016/03/07/mach-o%E6%96%87%E4%BB%B6%E6%A0%BC%E5%BC%8F%E5%88%86%E6%9E%90/