Android ART Hook: A Technical Approach

Reposted from http://blog.csdn.net/L173864930/article/details/45035521

by 低端码农 at 2015.4.13
www.im-boy.net

0x1 Getting Started

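ART has been the default runtime on Android since 5.0, which shows how important it has become. Plenty of articles cover Dalvik hooking, but I could not find any on hooking ART; even the well-known Xposed and Cydia Substrate do not support ART hooking so far. I am sure they already have a technical approach and are probably held up by device compatibility work.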

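Since I could find nothing online, I decided to spend some time on the problem myself. The effort paid off: I found a workable method, which is the one this article describes.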

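I should say up front that this method is certainly not the best one. If the article helps readers come up with a better way to hook ART, it will have served its purpose as a starting point. Enough preamble, let's begin.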

Runtime environment: emulator running Android 4.4.2 in ART mode
Development environment: Mac OS X 10.10.3

0x2 Loading and Executing Class Methods in ART

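Executing a class method is considerably more complicated in ART than in Dalvik. Leaving the JIT aside, Dalvik can be thought of as an interpreting virtual machine, whereas ART supports two modes: native-code execution and interpretation. The oat files it generates also come in two types, portable and quick. The main difference between them is how methods are loaded: quick makes heavy use of lazy loading, so applications start faster, but the loading flow is more complex. Quick is the default, so all of the analysis in this article is based on the quick type.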

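Because ART has both modes, class methods cannot simply jump to one another. Calls go through a set of predefined bridge functions that switch state and context. The blog of 老罗 (Lao Luo) has a diagram illustrating this.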

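When a method is invoked while the caller is in native-code mode, the function that ArtMethod::GetEntryPointFromCompiledCode() points to is executed; otherwise the function that ArtMethod::GetEntryPointFromInterpreter() points to is executed. Every method therefore has two entry points, stored in ArtMethod::entry_point_from_compiled_code_ and ArtMethod::entry_point_from_interpreter_. This point matters a great deal, because the rest of the article works almost entirely on these two entry points.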
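The idea of two entry points per method can be pictured with a toy model. This is only a sketch under stated assumptions: ToyMethod, CallerMode, Invoke, RunOatCode and RunViaBridge are hypothetical names made up for illustration, and real ART entry points do not have this simple function signature.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for mirror::ArtMethod: only the two entry-point slots.
struct ToyMethod {
    using Entry = std::string (*)();
    Entry entry_point_from_compiled_code;  // used when the caller runs native code
    Entry entry_point_from_interpreter;    // used when the caller is interpreted
};

enum class CallerMode { kCompiledCode, kInterpreter };

// Picks the slot that matches the caller's current execution mode.
inline std::string Invoke(const ToyMethod& m, CallerMode mode) {
    ToyMethod::Entry entry = (mode == CallerMode::kCompiledCode)
                                 ? m.entry_point_from_compiled_code
                                 : m.entry_point_from_interpreter;
    assert(entry != nullptr);
    return entry();
}

// Two illustrative targets: direct oat code and a bridge function.
inline std::string RunOatCode() { return "oat code"; }
inline std::string RunViaBridge() { return "interpreter-to-compiled-code bridge"; }
```

Hooking one slot therefore only intercepts callers that arrive through that slot, which is why the choice of entry point matters later on.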

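Before getting to the hooking principle, two flows need to be understood. The full story is very large, so I describe only the points that matter for hooking, but I strongly recommend reading the ART articles on Lao Luo's blog.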

ArtMethod Loading Flow

This happens when the oat file is loaded into memory and its class methods are linked. The linking code is LinkCode in art/runtime/class_linker.cc:

static void LinkCode(SirtRef<mirror::ArtMethod>& method, const OatFile::OatClass* oat_class,
                     uint32_t method_index)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  // Method shouldn't have already been linked.
  DCHECK(method->GetEntryPointFromCompiledCode() == NULL);
  // Every kind of method should at least get an invoke stub from the oat_method.
  // non-abstract methods also get their code pointers.
  const OatFile::OatMethod oat_method = oat_class->GetOatMethod(method_index);

  // By default this sets method::entry_point_from_compiled_code_ to the OatMethod's code.
  oat_method.LinkMethod(method.get());

  // Install entry point from interpreter.
  Runtime* runtime = Runtime::Current();
  // Decide whether the method has to be interpreted.
  bool enter_interpreter = NeedsInterpreter(method.get(), method->GetEntryPointFromCompiledCode());

  // Set the entry point used by the interpreter.
  if (enter_interpreter) {
    method->SetEntryPointFromInterpreter(interpreter::artInterpreterToInterpreterBridge);
  } else {
    method->SetEntryPointFromInterpreter(artInterpreterToCompiledCodeBridge);
  }

  // From here on, set the entry point used by native code.
  if (method->IsAbstract()) {
    method->SetEntryPointFromCompiledCode(GetCompiledCodeToInterpreterBridge());
    return;
  }

  // This part is harder to follow: for a static method that is not <clinit>,
  // entry_point_from_compiled_code_ is set to the value returned by GetResolutionTrampoline.
  if (method->IsStatic() && !method->IsConstructor()) {
    // For static methods excluding the class initializer, install the trampoline.
    // It will be replaced by the proper entry point by ClassLinker::FixupStaticTrampolines
    // after initializing class (see ClassLinker::InitializeClass method).
    method->SetEntryPointFromCompiledCode(GetResolutionTrampoline(runtime->GetClassLinker()));
  } else if (enter_interpreter) {
    // Set entry point from compiled code if there's no code or in interpreter only mode.
    method->SetEntryPointFromCompiledCode(GetCompiledCodeToInterpreterBridge());
  }

  if (method->IsNative()) {
    // Unregistering restores the dlsym lookup stub.
    method->UnregisterNative(Thread::Current());
  }

  // Allow instrumentation its chance to hijack code.
  runtime->GetInstrumentation()->UpdateMethodsCode(method.get(),
                                                   method->GetEntryPointFromCompiledCode());
}

From the code above, an ArtMethod's entry point is one of the following:

1. Interpreter2Interpreter: artInterpreterToInterpreterBridge (art/runtime/interpreter/interpreter.cc);
2. Interpreter2CompiledCode: artInterpreterToCompiledCodeBridge (/art/runtime/entrypoints/interpreter/interpreter_entrypoints.cc);
3. CompiledCode2Interpreter: art_quick_to_interpreter_bridge (art/runtime/arch/arm/quick_entrypoints_arm.S);
4. CompiledCode2ResolutionTrampoline: art_quick_resolution_trampoline (art/runtime/arch/arm/quick_entrypoints_arm.S);
5. CompiledCode2CompiledCode: this entry points directly at the instructions in the oat file; see OatMethod::LinkMethod for details.
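The branch order in LinkCode can be condensed into two small decision functions. This is a simplified sketch rather than ART code: MethodTraits and both enums are hypothetical names, and the native-method and instrumentation steps at the end of LinkCode are left out.

```cpp
#include <cassert>

// Hypothetical flags describing a method; the real checks live on mirror::ArtMethod.
struct MethodTraits {
    bool is_abstract;
    bool is_static;
    bool is_constructor;     // for a static method this means the class initializer
    bool needs_interpreter;  // models the result of NeedsInterpreter(...)
};

enum class InterpreterEntry { kToInterpreterBridge, kToCompiledCodeBridge };
enum class CompiledEntry { kOatCode, kToInterpreterBridge, kResolutionTrampoline };

// Mirrors the branch that sets entry_point_from_interpreter_.
inline InterpreterEntry ChooseInterpreterEntry(const MethodTraits& t) {
    return t.needs_interpreter ? InterpreterEntry::kToInterpreterBridge
                               : InterpreterEntry::kToCompiledCodeBridge;
}

// Mirrors the order of the checks that set entry_point_from_compiled_code_.
inline CompiledEntry ChooseCompiledEntry(const MethodTraits& t) {
    if (t.is_abstract) return CompiledEntry::kToInterpreterBridge;
    if (t.is_static && !t.is_constructor) return CompiledEntry::kResolutionTrampoline;
    if (t.needs_interpreter) return CompiledEntry::kToInterpreterBridge;
    return CompiledEntry::kOatCode;  // left as set by OatMethod::LinkMethod
}
```

Note that the order of the checks matters: a static method with compiled code still starts out pointing at the resolution trampoline.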

There are two main calling conventions:

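typedef void (EntryPointFromInterpreter)(Thread* self, MethodHelper& mh, const DexFile::CodeItem* code_item, ShadowFrame* shadow_frame, JValue* result), which applies to entries 1 and 2 above, the two values that LinkCode stores through SetEntryPointFromInterpreter;

The remaining three entries, 3, 4 and 5, are compiled-code entry points. The code does not spell out their convention, but it can be worked out from how ArtMethod::Invoke calls a method. Invoke calls art_quick_invoke_stub (/art/runtime/arch/arm/quick_entrypoints_arm.S), shown below: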

    /*
     * Quick invocation stub.
     * On entry:
     *   r0 = method pointer
     *   r1 = argument array or NULL for no argument methods
     *   r2 = size of argument array in bytes
     *   r3 = (managed) thread pointer
     *   [sp] = JValue* result
     *   [sp + 4] = result type char
     */
ENTRY art_quick_invoke_stub
    push   {r0, r4, r5, r9, r11, lr}       @ spill regs
    .save  {r0, r4, r5, r9, r11, lr}
    .pad #24
    .cfi_adjust_cfa_offset 24
    .cfi_rel_offset r0, 0
    .cfi_rel_offset r4, 4
    .cfi_rel_offset r5, 8
    .cfi_rel_offset r9, 12
    .cfi_rel_offset r11, 16
    .cfi_rel_offset lr, 20
    mov    r11, sp                         @ save the stack pointer
    .cfi_def_cfa_register r11
    mov    r9, r3                          @ move managed thread pointer into r9
    mov    r4, #SUSPEND_CHECK_INTERVAL     @ reset r4 to suspend check interval
    add    r5, r2, #16                     @ create space for method pointer in frame
    and    r5, #0xFFFFFFF0                 @ align frame size to 16 bytes
    sub    sp, r5                          @ reserve stack space for argument array
    add    r0, sp, #4                      @ pass stack pointer + method ptr as dest for memcpy
    bl     memcpy                          @ memcpy (dest, src, bytes)
    ldr    r0, [r11]                       @ restore method*
    ldr    r1, [sp, #4]                    @ copy arg value for r1
    ldr    r2, [sp, #8]                    @ copy arg value for r2
    ldr    r3, [sp, #12]                   @ copy arg value for r3
    mov    ip, #0                          @ set ip to 0
    str    ip, [sp]                        @ store NULL for method* at bottom of frame
    ldr    ip, [r0, #METHOD_CODE_OFFSET]   @ get pointer to the code
    blx    ip                              @ call the method
    mov    sp, r11                         @ restore the stack pointer
    ldr    ip, [sp, #24]                   @ load the result pointer
    strd   r0, [ip]                        @ store r0/r1 into result pointer
    pop    {r0, r4, r5, r9, r11, lr}       @ restore spill regs
    .cfi_adjust_cfa_offset -24
    bx     lr
END art_quick_invoke_stub

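The instruction "ldr ip, [r0, #METHOD_CODE_OFFSET]" loads ArtMethod::entry_point_from_compiled_code_ into ip, which is then called directly with blx. From this short piece of assembly we can derive the following stack layout: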

   -(low)
   | caller(Method *)   | <- sp
   | arg1               | <- r1
   | arg2               | <- r2
   | arg3               | <- r3
   | ...                |
   | argN               |
   | callee(Method *)   | <- r0
   +(high)

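This is not the calling convention we usually see. The main difference is that when there are more than four arguments, they are stored starting at sp + 20 rather than at sp. That is why entry_point_from_compiled_code_ is typed void * in the code: the convention cannot be expressed as an ordinary function type.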
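The frame that art_quick_invoke_stub builds can be simulated with plain arrays. A minimal sketch, assuming 32-bit ARM with 4-byte argument words; QuickFrame and BuildQuickFrame are hypothetical names, and the code only mirrors the arithmetic and copies visible in the assembly above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical model of the stack and registers just before "blx ip".
struct QuickFrame {
    std::vector<uint32_t> stack;  // stack[0] is the word at sp
    uint32_t r1, r2, r3;          // first three argument words, also kept on the stack
};

inline QuickFrame BuildQuickFrame(const std::vector<uint32_t>& args) {
    const uint32_t arg_bytes = static_cast<uint32_t>(args.size() * sizeof(uint32_t));
    // add r5, r2, #16 ; and r5, #0xFFFFFFF0
    const uint32_t frame_bytes = (arg_bytes + 16u) & 0xFFFFFFF0u;

    QuickFrame f;
    // Zero-filled, so stack[0] already holds the NULL method* stored by "str ip, [sp]".
    f.stack.assign(frame_bytes / sizeof(uint32_t), 0u);
    // memcpy(sp + 4, args, arg_bytes)
    if (!args.empty()) {
        std::memcpy(f.stack.data() + 1, args.data(), arg_bytes);
    }
    // ldr r1, [sp, #4] ; ldr r2, [sp, #8] ; ldr r3, [sp, #12]
    f.r1 = f.stack[1];
    f.r2 = f.stack[2];
    f.r3 = f.stack[3];
    return f;
}
```

With five argument words, the frame is 32 bytes: one slot for the method pointer, five argument slots, and padding up to the 16-byte boundary.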

Understanding this calling convention is essential to implementing the approach.

ArtMethod Execution Flow

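The section above covered how class methods are loaded and linked. At run time, however, the entry points of an ArtMethod (for interpretation and for native-code execution) are not called directly. To speed up execution, ART creates a DexCache (art/runtime/mirror/dex_cache.h) for each dex in the oat file. This structure builds a set of arrays that follow the structure of the dex; here we look only at its methods field. DexCache is initialized by Init, implemented as follows: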

void DexCache::Init(const DexFile* dex_file,
                    String* location,
                    ObjectArray<String>* strings,
                    ObjectArray<Class>* resolved_types,
                    ObjectArray<ArtMethod>* resolved_methods,
                    ObjectArray<ArtField>* resolved_fields,
                    ObjectArray<StaticStorageBase>* initialized_static_storage) {
  //...
  //...
  Runtime* runtime = Runtime::Current();
  if (runtime->HasResolutionMethod()) {
    // Initialize the resolve methods array to contain trampolines for resolution.
    ArtMethod* trampoline = runtime->GetResolutionMethod();
    size_t length = resolved_methods->GetLength();
    for (size_t i = 0; i < length; i++) {
      resolved_methods->SetWithoutChecks(i, trampoline);
    }
  }
}

A resolved_methods array is created with one slot per method in the dex, and every slot is filled with the result of Runtime::GetResolutionMethod(). That method is produced by Runtime::CreateResolutionMethod:

mirror::ArtMethod* Runtime::CreateResolutionMethod() {
  mirror::Class* method_class = mirror::ArtMethod::GetJavaLangReflectArtMethod();
  Thread* self = Thread::Current();
  SirtRef<mirror::ArtMethod>
      method(self, down_cast<mirror::ArtMethod*>(method_class->AllocObject(self)));
  method->SetDeclaringClass(method_class);
  // TODO: use a special method for resolution method saves
  method->SetDexMethodIndex(DexFile::kDexNoIndex);
  // When compiling, the code pointer will get set later when the image is loaded.
  Runtime* r = Runtime::Current();
  ClassLinker* cl = r->GetClassLinker();
  method->SetEntryPointFromCompiledCode(r->IsCompiler() ? NULL : GetResolutionTrampoline(cl));
  return method.get();
}

The line method->SetDexMethodIndex(DexFile::kDexNoIndex) tells us that every ResolutionMethod has DexFile::kDexNoIndex as its method index. The entry point of the ResolutionMethod is case 4 in the entry-point list above: GetResolutionTrampoline ultimately returns art_quick_resolution_trampoline (art/runtime/arch/arm/quick_entrypoints_arm.S). Its implementation is:

    .extern artQuickResolutionTrampoline
ENTRY art_quick_resolution_trampoline
    SETUP_REF_AND_ARGS_CALLEE_SAVE_FRAME
    mov     r2, r9                 @ pass Thread::Current
    mov     r3, sp                 @ pass SP
    blx     artQuickResolutionTrampoline  @ (Method* called, receiver, Thread*, SP)
    cbz     r0, 1f                 @ is code pointer null? goto exception
    mov     r12, r0
    ldr  r0, [sp, #0]              @ load resolved method in r0
    ldr  r1, [sp, #8]              @ restore non-callee save r1
    ldrd r2, [sp, #12]             @ restore non-callee saves r2-r3
    ldr  lr, [sp, #44]             @ restore lr
    add  sp, #48                   @ rewind sp
    .cfi_adjust_cfa_offset -48
    bx      r12                    @ tail-call into actual code
1:
    RESTORE_REF_AND_ARGS_CALLEE_SAVE_FRAME
    DELIVER_PENDING_EXCEPTION
END art_quick_resolution_trampoline

After setting up the registers, it calls artQuickResolutionTrampoline (art/runtime/entrypoints/quick/quick_trampoline_entrypoints.cc). Let's look at that function next (bear with me; I have removed the code that does not matter here):

// Lazily resolve a method for quick. Called by stub code.
extern "C" const void* artQuickResolutionTrampoline(mirror::ArtMethod* called,
                                                    mirror::Object* receiver,
                                                    Thread* thread, mirror::ArtMethod** sp)
    SHARED_LOCKS_REQUIRED(Locks::mutator_lock_) {
  FinishCalleeSaveFrameSetup(thread, sp, Runtime::kRefsAndArgs);
  // Start new JNI local reference state
  JNIEnvExt* env = thread->GetJniEnv();
  ScopedObjectAccessUnchecked soa(env);
  ScopedJniEnvLocalRefState env_state(env);
  const char* old_cause = thread->StartAssertNoThreadSuspension("Quick method resolution set up");

  // Compute details about the called method (avoid GCs)
  ClassLinker* linker = Runtime::Current()->GetClassLinker();
  mirror::ArtMethod* caller = QuickArgumentVisitor::GetCallingMethod(sp);
  InvokeType invoke_type;
  const DexFile* dex_file;
  uint32_t dex_method_idx;
  if (called->IsRuntimeMethod()) {
    //...
    //...
  } else {
    invoke_type = kStatic;
    dex_file = &MethodHelper(called).GetDexFile();
    dex_method_idx = called->GetDexMethodIndex();
  }
  //...

  // Resolve method filling in dex cache.
  if (called->IsRuntimeMethod()) {
    called = linker->ResolveMethod(dex_method_idx, caller, invoke_type);
  }

  const void* code = NULL;
  if (LIKELY(!thread->IsExceptionPending())) {
    //...
    linker->EnsureInitialized(called_class, true, true);
    //...
  }
  // ...
  return code;
}

inline bool ArtMethod::IsRuntimeMethod() const {
  return GetDexMethodIndex() == DexFile::kDexNoIndex;
}

called->IsRuntimeMethod() checks whether the current method is the ResolutionMethod. If it is, ClassLinker::ResolveMethod is used to obtain the real method:

mirror::ArtMethod* ClassLinker::ResolveMethod(const DexFile& dex_file,
                                              uint32_t method_idx,
                                              mirror::DexCache* dex_cache,
                                              mirror::ClassLoader* class_loader,
                                              const mirror::ArtMethod* referrer,
                                              InvokeType type) {
  DCHECK(dex_cache != NULL);
  // Check for hit in the dex cache.
  mirror::ArtMethod* resolved = dex_cache->GetResolvedMethod(method_idx);
  if (resolved != NULL) {
    return resolved;
  }
  // Fail, get the declaring class.
  const DexFile::MethodId& method_id = dex_file.GetMethodId(method_idx);
  mirror::Class* klass = ResolveType(dex_file, method_id.class_idx_, dex_cache, class_loader);
  if (klass == NULL) {
    DCHECK(Thread::Current()->IsExceptionPending());
    return NULL;
  }
  // Scan using method_idx, this saves string compares but will only hit for matching dex
  // caches/files.
  switch (type) {
    case kDirect:  // Fall-through.
    case kStatic:
      resolved = klass->FindDirectMethod(dex_cache, method_idx);
      break;
    case kInterface:
      resolved = klass->FindInterfaceMethod(dex_cache, method_idx);
      DCHECK(resolved == NULL || resolved->GetDeclaringClass()->IsInterface());
      break;
    case kSuper:  // Fall-through.
    case kVirtual:
      resolved = klass->FindVirtualMethod(dex_cache, method_idx);
      break;
    default:
      LOG(FATAL) << "Unreachable - invocation type: " << type;
  }
  if (resolved == NULL) {
    // Search by name, which works across dex files.
    const char* name = dex_file.StringDataByIdx(method_id.name_idx_);
    std::string signature(dex_file.CreateMethodSignature(method_id.proto_idx_, NULL));
    switch (type) {
      case kDirect:  // Fall-through.
      case kStatic:
        resolved = klass->FindDirectMethod(name, signature);
        break;
      case kInterface:
        resolved = klass->FindInterfaceMethod(name, signature);
        DCHECK(resolved == NULL || resolved->GetDeclaringClass()->IsInterface());
        break;
      case kSuper:  // Fall-through.
      case kVirtual:
        resolved = klass->FindVirtualMethod(name, signature);
        break;
    }
  }
  if (resolved != NULL) {
    // Be a good citizen and update the dex cache to speed subsequent calls.
    dex_cache->SetResolvedMethod(method_idx, resolved);
    return resolved;
  } else {
    // ...
  }
}

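A chain reaction takes place here: ClassLinker::ResolveType follows a flow very similar to that of ResolveMethod, and interested readers can trace it themselves.
Once the resolved klass is found, a further round of searching locates the method, and the resulting resolved value is written through DexCache::SetResolvedMethod over the earlier placeholder. The next time ResolveMethod is asked for this method, it returns it directly without resolving again.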

Let's replay the whole process. The first time a class method is called, the steps are:

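1. The entry point of the ResolutionMethod is called, which enters art_quick_resolution_trampoline;
2. art_quick_resolution_trampoline calls artQuickResolutionTrampoline;
3. artQuickResolutionTrampoline calls ClassLinker::ResolveMethod to resolve the class method;
4. ClassLinker::ResolveMethod calls ClassLinker::ResolveType to resolve the class, then looks up the real method in the resolved class;
5. DexCache::SetResolvedMethod is called to overwrite the placeholder method with the real one;
6. The entry point code of the real method is called.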
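The steps above amount to a cache whose slots start out holding a shared placeholder. A minimal sketch, assuming nothing about ART's real data structures: CachedMethod and ToyDexCache are hypothetical names, kNoIndex only plays the role of DexFile::kDexNoIndex, and the counter stands in for the expensive ResolveType and Find*Method work.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kNoIndex = 0xFFFFFFFFu;  // plays the role of DexFile::kDexNoIndex

struct CachedMethod {
    uint32_t dex_method_index;
    // Same test as ArtMethod::IsRuntimeMethod(): the placeholder has no dex index.
    bool IsRuntimeMethod() const { return dex_method_index == kNoIndex; }
};

class ToyDexCache {
public:
    // Every slot initially points at the shared placeholder, as in DexCache::Init.
    explicit ToyDexCache(uint32_t method_count) : resolved_(method_count, &placeholder_) {
        for (uint32_t i = 0; i < method_count; ++i) real_.push_back(CachedMethod{i});
    }
    ToyDexCache(const ToyDexCache&) = delete;
    ToyDexCache& operator=(const ToyDexCache&) = delete;

    // The first call for an index takes the slow path and overwrites the placeholder;
    // later calls return the cached pointer directly.
    const CachedMethod* Resolve(uint32_t method_idx) {
        assert(method_idx < resolved_.size());
        if (resolved_[method_idx]->IsRuntimeMethod()) {
            ++slow_path_count_;                          // stands in for the real lookup
            resolved_[method_idx] = &real_[method_idx];  // like DexCache::SetResolvedMethod
        }
        return resolved_[method_idx];
    }
    int slow_path_count() const { return slow_path_count_; }

private:
    CachedMethod placeholder_{kNoIndex};
    std::vector<CachedMethod> real_;
    std::vector<const CachedMethod*> resolved_;
    int slow_path_count_ = 0;
};
```

Resolving the same index twice takes the slow path only once, which is the whole point of the placeholder.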

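Why make the process so roundabout? It is all for lazy loading, which improves startup speed. The process closely resembles PLT/GOT symbol relocation in the ELF linker. Techniques carry over from one area to another, and understanding one helps with the rest.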

0x3 Hook ArtMethod

From the analysis of the ArtMethod loading and execution flows, I came up with two ways to hook an ArtMethod:

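1. Modify the methods in the DexCache, replacing the entry point there with our own and dispatching through it;
2. Modify the entry point of the loaded ArtMethod directly, again dispatching through our own code.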

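Both are feasible. Because I wanted the whole project to build in the NDK environment rather than inside the source tree, I chose option 2: the resolved ArtMethod can be obtained directly through the JNI interface, which removes many file dependencies.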

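Returning to the calling conventions: every ArtMethod has two, so in principle we should prepare two dispatch functions. Here we do not consider forced interpreter mode, so handling the dispatch for entry_point_from_compiled_code is enough.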

First we look up the target method and save its entry point, then overwrite it with our dispatch function art_quick_dispatcher:

extern int __attribute__ ((visibility ("hidden"))) art_java_method_hook(JNIEnv* env, HookInfo *info) {
    const char* classDesc = info->classDesc;
    const char* methodName = info->methodName;
    const char* methodSig = info->methodSig;
    const bool isStaticMethod = info->isStaticMethod;

    // TODO: look the class up through a specific class loader, as is done for dvm
    jclass claxx = env->FindClass(classDesc);
    if(claxx == NULL){
        LOGE("[-] %s class not found", classDesc);
        return -1;
    }

    jmethodID methid = isStaticMethod ?
            env->GetStaticMethodID(claxx, methodName, methodSig) :
            env->GetMethodID(claxx, methodName, methodSig);

    if(methid == NULL){
        LOGE("[-] %s->%s method not found", classDesc, methodName);
        return -1;
    }

    // In this ART version the jmethodID is the ArtMethod pointer itself
    ArtMethod *artmeth = reinterpret_cast<ArtMethod *>(methid);

    if(art_quick_dispatcher != artmeth->GetEntryPointFromCompiledCode()){
        uint64_t (*entrypoint)(ArtMethod* method, Object *thiz, u4 *arg1, u4 *arg2);
        entrypoint = (uint64_t (*)(ArtMethod*, Object *, u4 *, u4 *))artmeth->GetEntryPointFromCompiledCode();

        // save the original values so the dispatcher can call through later
        info->entrypoint = (const void *)entrypoint;
        info->nativecode = artmeth->GetNativeMethod();

        artmeth->SetEntryPointFromCompiledCode((const void *)art_quick_dispatcher);

        // save info to nativecode :)
        artmeth->SetNativeMethod((const void *)info);

        LOGI("[+] %s->%s was hooked\n", classDesc, methodName);
    }else{
        LOGW("[*] %s->%s method had been hooked", classDesc, methodName);
    }

    return 0;
}

The key information is stashed via ArtMethod::SetNativeMethod.
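Stripped of JNI and logging, the install step is a save-and-swap on two pointer slots. The following is a sketch under stated assumptions: MiniMethod, MiniHookInfo, DispatcherAddress and InstallHook are hypothetical names, and the dispatcher is represented only by a unique address.

```cpp
#include <cassert>

// Hypothetical miniature of the two ArtMethod slots the hook touches.
struct MiniMethod {
    const void* entry_point_from_compiled_code;
    const void* native_method;  // slot normally used for the JNI function pointer
};

struct MiniHookInfo {
    const void* entrypoint;  // saved original entry point
    const void* nativecode;  // saved original native_method value
};

// Stand-in for the address of the assembly dispatcher; only identity matters here.
inline const void* DispatcherAddress() {
    static const char tag = 0;
    return &tag;
}

// Returns true if the hook was installed, false if it was already in place.
inline bool InstallHook(MiniMethod* method, MiniHookInfo* info) {
    assert(method != nullptr && info != nullptr);
    if (method->entry_point_from_compiled_code == DispatcherAddress()) {
        return false;  // already hooked; keep the values saved the first time
    }
    info->entrypoint = method->entry_point_from_compiled_code;
    info->nativecode = method->native_method;
    method->entry_point_from_compiled_code = DispatcherAddress();
    method->native_method = info;  // the dispatcher finds its info through this slot
    return true;
}
```

The early return matters: hooking twice without it would save the dispatcher's own address as the "original" entry point and recurse forever on the first call.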

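Given ART's special calling convention, art_quick_dispatcher has to be written in assembly. It adjusts the registers and then calls another function, artQuickToDispatcher, where the arguments can be accessed conveniently from C/C++.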

Here is the implementation of art_quick_dispatcher:

/*
 * Art Quick Dispatcher.
 * On entry:
 *   r0 = method pointer
 *   r1 = arg1
 *   r2 = arg2
 *   r3 = arg3
 *   [sp] = method pointer
 *   [sp + 4] = addr of thiz
 *   [sp + 8] = addr of arg1
 *   [sp + 12] = addr of arg2
 *   [sp + 16] = addr of arg3
 * and so on
 */
    .extern artQuickToDispatcher
ENTRY art_quick_dispatcher
    push    {r4, r5, lr}           @ sp - 12
    mov     r0, r0                 @ pass r0 to method
    str     r1, [sp, #(12 + 4)]
    str     r2, [sp, #(12 + 8)]
    str     r3, [sp, #(12 + 12)]
    mov     r1, r9                 @ pass r1 to thread
    add     r2, sp, #(12 + 4)      @ pass r2 to args array
    add     r3, sp, #12            @ pass r3 to old SP
    blx     artQuickToDispatcher   @ (Method* method, Thread*, u4 **, u4 **)
    pop     {r4, r5, pc}           @ return on success, r0 and r1 hold the result
END art_quick_dispatcher

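r2 points to the argument array, which makes all the arguments easy to reach. r3 holds the old sp, in preparation for calling the original entry point later. Now the implementation of artQuickToDispatcher: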

extern "C" uint64_t artQuickToDispatcher(ArtMethod* method, Thread *self, u4 **args, u4 **old_sp){
    HookInfo *info = (HookInfo *)method->GetNativeMethod();
    LOGI("[+] entry ArtHandler %s->%s", info->classDesc, info->methodName);

    // For a non-static method, args[0] points to this
    if(!info->isStaticMethod){
        Object *thiz = reinterpret_cast<Object *>(args[0]);
        if(thiz != NULL){
            char *bytes = get_chars_from_utf16(thiz->GetClass()->GetName());
            LOGI("[+] thiz class is %s", bytes);
            free(bytes); // released with free(), matching the other call sites below
        }
    }

    const void *entrypoint = info->entrypoint;
    method->SetNativeMethod(info->nativecode); // restore nativecode for JNI method
    uint64_t res = art_quick_call_entrypoint(method, self, args, old_sp, entrypoint);

    // This demo treats the 64-bit result as an object reference
    JValue* result = (JValue* )&res;
    if(result != NULL){
        Object *obj = result->l;
        char *raw_class_name = get_chars_from_utf16(obj->GetClass()->GetName());

        if(strcmp(raw_class_name, "java.lang.String") == 0){
            char *raw_string_value = get_chars_from_utf16((String *)obj);
            LOGI("result-class %s, result-value \"%s\"", raw_class_name, raw_string_value);
            free(raw_string_value);
        }else{
            LOGI("result-class %s", raw_class_name);
        }

        free(raw_class_name);
    }

    // The entrypoint may be replaced by the trampoline, only once.
//  if(method->IsStatic() && !method->IsConstructor()){

    entrypoint = method->GetEntryPointFromCompiledCode();
    if(entrypoint != (const void *)art_quick_dispatcher){
        LOGW("[*] entrypoint was replaced. %s->%s", info->classDesc, info->methodName);

        // re-install the hook and remember the new original entrypoint
        method->SetEntryPointFromCompiledCode((const void *)art_quick_dispatcher);
        info->entrypoint = entrypoint;
        info->nativecode = method->GetNativeMethod();
    }

    method->SetNativeMethod((const void *)info);

//  }

    return res;
}
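The control flow of the dispatcher (call through, then re-install the hook if the callee replaced the entry point) can be modeled with ordinary function pointers. This is a sketch, not the project's code: HookedMethod, Dispatcher, RealCode and Trampoline are hypothetical names, and Trampoline only imitates the way a static method's trampoline entry is swapped for the real code after class initialization.

```cpp
#include <cassert>
#include <cstdint>

struct HookedMethod;
using EntryFn = uint64_t (*)(HookedMethod*);

struct HookedMethod {
    EntryFn entry;        // models entry_point_from_compiled_code_
    EntryFn saved_entry;  // models HookInfo::entrypoint
    int hook_hits;        // counts how often the hook ran
};

inline uint64_t Dispatcher(HookedMethod* m) {
    assert(m != nullptr && m->saved_entry != nullptr);
    ++m->hook_hits;                    // the hook's own work (logging in the article)
    uint64_t res = m->saved_entry(m);  // call through to the original entry point
    if (m->entry != &Dispatcher) {     // the callee replaced the entry point
        m->saved_entry = m->entry;     // remember the new target
        m->entry = &Dispatcher;        // and put the hook back
    }
    return res;
}

// Models the method's real compiled code.
inline uint64_t RealCode(HookedMethod*) { return 42; }

// Models the resolution trampoline: it patches the entry point to the real
// code and then runs it, so the hook silently disappears unless re-installed.
inline uint64_t Trampoline(HookedMethod* m) {
    m->entry = &RealCode;
    return RealCode(m);
}
```

After the first call the saved entry is the real code rather than the trampoline, and the hook still fires on the second call.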

I will not go into argument parsing here. Next comes the thorniest problem: how to call back into the original entry point.

The key is to restore the previous stack layout. art_quick_call_entrypoint does this job:

/*
 *
 * Art Quick Call Entrypoint
 * On entry:
 *  r0 = method pointer
 *  r1 = thread pointer
 *  r2 = args arrays pointer
 *  r3 = old_sp
 *  [sp] = entrypoint
 */
ENTRY art_quick_call_entrypoint
    push    {r4, r5, lr}           @ sp - 12
    sub     sp, #(40 + 20)         @ sp - 40 - 20
    str     r0, [sp, #(40 + 0)]    @ var_40_0 = method_pointer
    str     r1, [sp, #(40 + 4)]    @ var_40_4 = thread_pointer
    str     r2, [sp, #(40 + 8)]    @ var_40_8 = args_array
    str     r3, [sp, #(40 + 12)]   @ var_40_12 = old_sp
    mov     r0, sp
    mov     r1, r3
    ldr     r2, =40
    blx     memcpy                 @ memcpy(dest, src, size_of_byte)
    ldr     r0, [sp, #(40 + 0)]    @ restore method to r0
    ldr     r1, [sp, #(40 + 4)]
    mov     r9, r1                 @ restore thread to r9
    ldr     r5, [sp, #(40 + 8)]    @ pass r5 to args_array
    ldr     r1, [r5]               @ restore arg1
    ldr     r2, [r5, #4]           @ restore arg2
    ldr     r3, [r5, #8]           @ restore arg3
    ldr     r5, [sp, #(40 + 20 + 12)] @ pass ip to entrypoint
    blx     r5
    add     sp, #(40 + 20)
    pop     {r4, r5, pc}           @ return on success, r0 and r1 hold the result
END art_quick_call_entrypoint

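I took a shortcut here: the code reserves space for 10 arguments and restores it from the old_sp passed in, copying 40 bytes with memcpy. After that it restores r0, r1, r2, r3 and r9. When the entry point returns, the result is in r0 and r1 and is passed back to artQuickToDispatcher.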
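The frame rebuild can be expressed as a fixed-size copy plus three register loads. A minimal sketch, assuming 4-byte stack words; RebuiltCall, RebuildCall and kCopiedWords are hypothetical names, and the caller must guarantee that at least ten words are readable at old_sp.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kCopiedWords = 10;  // 40 bytes, the fixed size used by the assembly

struct RebuiltCall {
    uint32_t frame[kCopiedWords];  // new stack contents starting at the new sp
    uint32_t r1, r2, r3;           // argument registers reloaded from the args array
};

// old_sp points at the hooked call's original frame, whose first word is the
// method pointer slot; args points at the saved argument words (old_sp + 4
// in art_quick_dispatcher).
inline RebuiltCall RebuildCall(const uint32_t* old_sp, const uint32_t* args) {
    assert(old_sp != nullptr && args != nullptr);
    RebuiltCall c;
    std::memcpy(c.frame, old_sp, sizeof(c.frame));  // memcpy(sp, old_sp, 40)
    c.r1 = args[0];  // ldr r1, [r5]
    c.r2 = args[1];  // ldr r2, [r5, #4]
    c.r3 = args[2];  // ldr r3, [r5, #8]
    return c;
}
```

Because the copy is fixed at ten words starting at old_sp, anything the hooked method expects beyond that range would not be carried over to the rebuilt frame.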

That completes the analysis of the ART hook.

0x4 Differences Between the 4.4 and 5.X Implementations

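I tested the whole approach on 4.4, mainly because I only have the 4.4 source and not enough disk space for the 5.x source. The overall idea carries over to 5.X. The 5.X implementation is much more complex than the 4.4 one, though, and I do not know whether it can be built under the NDK the way this project is.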

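A stock 4.4 emulator boots with Dalvik, so ART has to be selected in Settings. That prompts a reboot, which usually has no effect; shutting the emulator down manually and starting it again works, but it takes a while to come up.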

0x5 Conclusion

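Although this article only presents a technical approach to ART hooking, the underlying principles are also instructive for code hardening on ART, dynamic code restoration, and similar work.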

As usual, the code for the whole project is at https://github.com/boyliang/AllHookInOne. Questions and feedback are welcome.

Also, please patch your NDK with https://github.com/boyliang/ndk-patch.
