Framework Watchdog Source Code Analysis


1. Introduction to the framework watchdog

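The Android platform implements a software watchdog to guard SystemServer. SystemServer is arguably the most important process on the platform, since it hosts the vast majority of the system services. Close to 50 threads run inside it, and the failure of any one of them can bring down the whole system. SystemServer exiting is actually not a big problem, because the init process restarts it. A deadlock is far worse, because the whole system then stops responding.

Among the services running in SystemServer, the most important are probably ActivityManager, WindowManager and PowerManager. The main job of the software watchdog is to make sure that, if these services deadlock, the SystemServer process exits so that init can restart it and bring the system back to a usable state.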


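2. First, the principle behind a watchdog. It is simple and the same on every platform: an endless loop guards a timer, and the monitored thread has to be signalled periodically and respond in time ("feeding the dog"). If the monitored object does not return before the timeout, the next round cannot proceed, the watchdog "bites" the system, and the framework restarts.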
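The principle can be sketched independently of Android. The following is a minimal model, not the framework's code: the class name SimpleWatchdog and its methods are made up for illustration, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
// Hypothetical, simplified model of a software watchdog; not the Android implementation.
final class SimpleWatchdog {
    private final long timeoutMillis;
    private long lastFedAt;

    SimpleWatchdog(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastFedAt = nowMillis;
    }

    /** Called on behalf of the monitored party to prove it is still making progress. */
    void feed(long nowMillis) {
        lastFedAt = nowMillis;
    }

    /** Called by the guarding loop; true means the monitored party missed its deadline. */
    boolean hasTimedOut(long nowMillis) {
        return nowMillis - lastFedAt > timeoutMillis;
    }
}
```

A guarding loop would call hasTimedOut() at a fixed interval and take a recovery action (here: restarting the framework) once it returns true.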

3. I drew an extremely ugly diagram; ugly, but detailed. All of the code below revolves around this diagram, so study it carefully.


1> First, the watchdog is initialized and started by system server, in three small steps:

1.1. Step one: startOtherServices

private void startOtherServices() {
    ......
    traceBeginAndSlog("InitWatchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    Trace.traceEnd(Trace.TRACE_TAG_SYSTEM_SERVER);
    ......
}

Correspondingly, this line shows up in the boot log:
01-26 16:42:25.984  1596  1596 I SystemServer: InitWatchdog

1.2. Step into the getInstance function of Watchdog

public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }
    return sWatchdog;
}

1.3. Now the Watchdog constructor, Watchdog()

In short, it adds a few important threads to the set of monitored objects (see the upper-right part of the diagram above). The default timeout is 60s.

private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check.  Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker.  It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
}

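Now we need to understand the object mHandlerCheckers. Every thread that has to be monitored is stored in it, which makes it an extremely important List.

The relationship between mHandlerCheckers and mMonitorChecker is as follows (see the upper-right part of the diagram above):

① mHandlerCheckers is a list that stores 5 HandlerChecker objects, one each for the fg, main, ui, io and display threads.
② mMonitorChecker is a single HandlerChecker object, and it is the very same object as the first element of mHandlerCheckers. In other words, mMonitorChecker and the fg thread entry of mHandlerCheckers are two references to one object.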
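A small stand-alone sketch makes the sharing visible. CheckerModel and SharedCheckerDemo are hypothetical names used only for this illustration; they reduce HandlerChecker to a name plus a list of monitor names and follow the order of operations in the Watchdog constructor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HandlerChecker, reduced to a name and a monitor list.
final class CheckerModel {
    final String name;
    final List<String> monitors = new ArrayList<>();

    CheckerModel(String name) {
        this.name = name;
    }
}

final class SharedCheckerDemo {
    final List<CheckerModel> handlerCheckers = new ArrayList<>();
    final CheckerModel monitorChecker;

    SharedCheckerDemo() {
        // Same order as the Watchdog constructor: the foreground checker is created
        // once, kept in a field, and the same reference is added to the list.
        monitorChecker = new CheckerModel("foreground thread");
        handlerCheckers.add(monitorChecker);
        handlerCheckers.add(new CheckerModel("main thread"));
    }

    public static void main(String[] args) {
        SharedCheckerDemo demo = new SharedCheckerDemo();
        demo.monitorChecker.monitors.add("BinderThreadMonitor");
        // The monitor added through the field is visible through the list entry,
        // because both refer to one object.
        System.out.println(demo.handlerCheckers.get(0).monitors);
        System.out.println(demo.handlerCheckers.get(0) == demo.monitorChecker);
    }
}
```

Anything added through monitorChecker therefore ends up on the foreground checker that the list iteration sees, while the other checkers keep an empty monitor list.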

final ArrayList<HandlerChecker> mHandlerCheckers = new ArrayList<>();
final HandlerChecker mMonitorChecker;

The default timeout is 60s:

static final boolean DB = false;
static final long DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000;


1.4. Feeling a little dizzy? No problem: a look at the HandlerChecker class clears things up. First, the part that declares its fields.

/**
 * Used for checking status of handle threads and scheduling monitor callbacks.
 */
public final class HandlerChecker implements Runnable {
    private final Handler mHandler;
    private final String mName;
    private final long mWaitMax;
    // Pay special attention to the mMonitors list, which manages the monitor objects.
    private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    private boolean mCompleted;
    private Monitor mCurrentMonitor;
    private long mStartTime;

    // Constructor: mWaitMax is the timeout passed in for the thread. As mentioned
    // above, the default is 60s.
    HandlerChecker(Handler handler, String name, long waitMaxMillis) {
        mHandler = handler;
        mName = name;
        mWaitMax = waitMaxMillis;
        mCompleted = true;
    }

    // HandlerChecker holds a list of Monitor objects, mMonitors. Every monitor
    // that needs to be checked is added to this list.
    public void addMonitor(Monitor monitor) {
        mMonitors.add(monitor);
    }
    ......

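1.5. Also note the interface function addMonitor that Watchdog provides

After the 5 threads to check have been initialized, the constructor calls addMonitor to register the Binder monitor.
① addMonitor is the interface function that Watchdog exposes to us. It calls addMonitor on mMonitorChecker and passes the monitor along. So if we want our own service to be monitored, we have to implement our own monitor function and call addMonitor to add ourselves to mMonitorChecker.

public void addMonitor(Monitor monitor) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Monitors can't be added once the Watchdog is running");
        }
        mMonitorChecker.addMonitor(monitor);
    }
}

② The addMonitor member function of the HandlerChecker class
It adds the monitor argument to the mMonitors list. HandlerChecker only provides this one entry point.

public void addMonitor(Monitor monitor) {
    mMonitors.add(monitor);
}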
The services that implement the Monitor interface are:
ActivityManagerService
InputManagerService (used as the example below)
MountService
NativeDaemonConnector
NetworkManagementService
PowerManagerService
WindowManagerService

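③ An example: InputManagerService

It implements the Monitor interface. The body simply acquires its own lock to see whether a deadlock or a block has occurred.

// Called by the heartbeat to ensure locks are not held indefinitely (for deadlock detection).
@Override
public void monitor() {
    synchronized (mInputFilterLock) { }
    nativeMonitor(mPtr);
}
......

A note on the synchronized keyword: it can be applied to a block inside a method, in which case only the code in that block is subject to mutual exclusion.
Usage: private final Object mLock = new Object(); ........... synchronized (mLock) { /* block */ }
The lock belongs to the object given in the parentheses, which can be a class instance or a class.
If a thread is deadlocked or blocked while holding that lock, monitor() cannot acquire it and therefore cannot return normally.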
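The effect of such an empty synchronized block can be tried outside Android. The sketch below is an illustration under stated assumptions: LockProbeDemo, monitorReturnsWithin and holdLockOnAnotherThread are hypothetical names, and a helper thread with a timeout stands in for the watchdog's own timing logic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical service used only for illustration; monitor() has the same shape as above.
final class LockProbeDemo {
    private final Object mLock = new Object();

    /** Returns only once mLock can be acquired, i.e. nobody holds it indefinitely. */
    void monitor() {
        synchronized (mLock) { }
    }

    /** Runs monitor() on a helper thread and reports whether it returned in time. */
    boolean monitorReturnsWithin(long timeoutMillis) {
        ExecutorService probe = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "probe");
            t.setDaemon(true); // a stuck probe must not keep the JVM alive
            return t;
        });
        try {
            Runnable task = this::monitor;
            Future<?> result = probe.submit(task);
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            probe.shutdown();
        }
    }

    /** Simulates a stuck service: a thread holds mLock until the returned Runnable is run. */
    Runnable holdLockOnAnotherThread() {
        CountDownLatch acquired = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            synchronized (mLock) {
                acquired.countDown();
                try {
                    release.await();
                } catch (InterruptedException ignored) { }
            }
        }, "holder");
        holder.setDaemon(true);
        holder.start();
        try {
            acquired.await(); // wait until the lock is really held
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return release::countDown;
    }
}
```

While the holder thread owns the lock, monitorReturnsWithin() reports false; as soon as the lock is released, the same call succeeds. That is the signal the watchdog relies on.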

In its start function, it calls Watchdog's addMonitor interface function to add itself to the check list:

public void start() {
    Slog.i(TAG, "Starting input manager");
    nativeStart(mPtr);

    // Add ourself to the Watchdog monitors.
    Watchdog.getInstance().addMonitor(this);
    ......

Besides addMonitor, Watchdog provides another interface function, addThread.
As the names suggest, addMonitor adds an object to mMonitorChecker, that is, to the fg member of mHandlerCheckers, while addThread adds an object to the mHandlerCheckers list itself.

2> At last we reach the second small step, which is the simplest: watchdog.init

watchdog.init(context, mActivityManagerService);

It registers a broadcast receiver that handles reboot requests from inside the system and reboots the system.

public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;

    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
    mUEventObserver.startObserving(LOG_STATE_MATCH);
}

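3> Third small step: Watchdog.getInstance().start()

Because Watchdog extends Thread, start() launches a new thread that executes its run function. The run function is the functional core of the watchdog; the previous two steps were only preparation.

Let's look at the watchdog's detection mechanism first.

public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            // CHECK_INTERVAL = DEFAULT_TIMEOUT / 2, i.e. 30s
            long timeout = CHECK_INTERVAL;
            // Make sure we (re)spin the checkers that have become idle within
            // this wait-and-check interval
            // Take each member of mHandlerCheckers and call its scheduleCheckLocked
            // function. Every member monitored by the watchdog has to feed the dog
            // periodically, and this is the feeding action.
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            ......

3.1.1. Feeding the dog: scheduleCheckLocked (see the feeding part of the watchdog detection mechanism in the diagram above)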

public void scheduleCheckLocked() {
    // mMonitors.size() == 0 selects the checkers that have no monitors, i.e. every
    // member of the mHandlerCheckers list except the fg thread. If such a thread is
    // in polling mode, it is not blocked: set mCompleted to true and return.
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        // If the target looper has recently been polling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked.  This avoid having
        // to do a context switch to check the thread.  Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }

    // One concept to keep clear: mMonitors and mCompleted are both members of
    // HandlerChecker, so all objects in mMonitors share a single mCompleted flag.
    // If the previous check is still running and has not returned, mCompleted is
    // still false, and in that case we return immediately.
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }

    // The actual feeding action:
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis(); // record when this feeding started
    mHandler.postAtFrontOfQueue(this);       // post ourselves to mHandler
}

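3.1.2. postAtFrontOfQueue

postAtFrontOfQueue(this) ==> run()
This method takes a Runnable as its argument. Through the message mechanism, the run method of HandlerChecker is eventually called back. That method loops over all registered Monitor objects and calls the monitor() method that each service implements.

public void run() {
    // In practice, the contents of mMonitors all belong to the fg thread checker.
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        // Call the monitor interface function of every monitored service
        // (registered on the fg checker).
        mCurrentMonitor.monitor();
    }

    // If the monitor functions run and return normally, set mCompleted to true,
    // which means the feeding is finished.
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}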
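The interplay of scheduleCheckLocked() and run() can be modelled without Android classes. In the sketch below, which is only an illustration, a Deque stands in for the target thread's message queue and draining it plays the role of the Looper. CheckerSketch and its members are hypothetical names, and the isPolling() shortcut as well as the locking are left out.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical single-threaded model of HandlerChecker, for illustration only.
final class CheckerSketch implements Runnable {
    private final Deque<Runnable> queue;          // stand-in for the message queue
    private final List<Runnable> monitors = new ArrayList<>();
    private boolean completed = true;
    private long startTime;

    CheckerSketch(Deque<Runnable> queue) {
        this.queue = queue;
    }

    void addMonitor(Runnable monitor) {
        monitors.add(monitor);
    }

    boolean isCompleted() {
        return completed;
    }

    long startTime() {
        return startTime;
    }

    /** Mirrors scheduleCheckLocked(): post at most one check at a time. */
    void scheduleCheck(long nowMillis) {
        if (!completed) {
            return;               // a check is already in flight
        }
        completed = false;
        startTime = nowMillis;    // the timeout is measured from here
        queue.addFirst(this);     // stand-in for postAtFrontOfQueue(this)
    }

    /** Mirrors HandlerChecker.run(): call every monitor, then mark the check done. */
    @Override
    public void run() {
        for (Runnable monitor : monitors) {
            monitor.run();        // blocks here if a monitored lock is stuck
        }
        completed = true;
    }
}
```

If the queue is never drained, or a monitor never returns, completed stays false and startTime keeps its old value, so the measured latency grows with every 30s round.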

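Every 30 seconds the watchdog checks several important locks in system_server (including those of WindowManagerService, ActivityManagerService, PowerManagerService, NetworkManagementService, MountService and InputManagerService). It also checks whether the message queues of the 7 most important threads are idle (WindowManagerService, PowerManagerService, PackageManagerService, ActivityManagerService, UiThread, IOThread, MainThread). Finally it uses the values of mCompleted and mStartTime to decide whether a check has been blocked for more than the 60s timeout. If a timeout occurs, it writes the trace log and the kernel trace log, and then kills SystemServer so that it restarts.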


3.1.3. evaluateCheckerCompletionLocked: finding the hungriest dog

This function is simple: it walks through the members of mHandlerCheckers and returns the largest wait state. First, the definitions of all the state values:

static final int COMPLETED = 0;
static final int WAITING = 1;
static final int WAITED_HALF = 2;
static final int OVERDUE = 3;

private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}

public int getCompletionStateLocked() {
    // If this member of mHandlerCheckers has returned and set mCompleted to true,
    // there is neither a deadlock nor a block, so COMPLETED can be returned.
    if (mCompleted) {
        return COMPLETED;
    } else {
        // mWaitMax is the timeout, i.e. 60s. If mCompleted is false and the wait is
        // shorter than 30s, return WAITING and all is well. If the wait has reached
        // 30s but not yet 60s, return WAITED_HALF.
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax/2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE; // otherwise the wait has reached 60s: return OVERDUE
}
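The thresholds are easy to check with concrete numbers. The following stand-alone sketch re-implements the same comparison as a pure function; CompletionState, of and worst are hypothetical names, and the latency is passed in rather than read from the system clock.

```java
// Hypothetical standalone version of the state computation; the constants use
// the same values as the listing above.
final class CompletionState {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** State of one checker, given how long its current check has been pending. */
    static int of(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** The overall state is the worst (largest) state among all checkers. */
    static int worst(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }
}
```

With the default 60s timeout, a pending check that is 10s old is WAITING, one that is 30s or 45s old is WAITED_HALF, and one that is 60s old is OVERDUE. A single slow checker is enough to raise the overall state.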


3.1.4. Watchdog.run, result: analysing the outcome of the previous step

            // If even the largest wait state is 0, all monitored threads are fine:
            // set waitedHalf to false and end this round of the loop. Note the
            // waitedHalf variable: it is declared in Watchdog.run() before the endless
            // while loop starts and records the state across rounds.
            if (waitState == COMPLETED) {
                // The monitors have returned; reset
                waitedHalf = false;
                continue;
            // If the wait state is WAITING, i.e. the wait is shorter than 30s, end this
            // round and recheck. waitedHalf is not cleared this time, so it still
            // holds the state from the previous round.
            } else if (waitState == WAITING) {
                // still waiting but within their configured intervals; back off and recheck
                continue;
            // If the wait exceeds 30s and waitedHalf is false, i.e. this is the first
            // time the wait exceeds 30s, create a pids list and dump the stack traces,
            // then set waitedHalf to true.
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    // We've waited half the deadlock-detection interval.  Pull a stack
                    // trace and wait another half.
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                }
                continue;
            }

            // evaluateCheckerCompletionLocked returned OVERDUE, which means a timeout.
            // something is overdue!
            blockedCheckers = getBlockedCheckersLocked();       // collect all overdue members into the blockedCheckers list
            subject = describeCheckersLocked(blockedCheckers);  // describe the blocked threads in a string for the event log
            allowRestart = mAllowRestart;                       // copy the mAllowRestart flag (true by default)
        }
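The role of waitedHalf is easier to see as a small state machine. The sketch below is a simplified model with hypothetical names (WaitStateHandler, Action, onWaitState). It returns what the loop would do for each evaluated wait state instead of actually dumping stacks or killing a process.

```java
// Hypothetical model of how the loop reacts to the evaluated wait state.
final class WaitStateHandler {
    enum Action { NONE, DUMP_HALFWAY_STACKS, HANDLE_TIMEOUT }

    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Survives across rounds, like the local variable declared before while (true).
    private boolean waitedHalf = false;

    Action onWaitState(int waitState) {
        switch (waitState) {
            case COMPLETED:
                waitedHalf = false;   // everything returned; reset
                return Action.NONE;
            case WAITING:
                return Action.NONE;   // waitedHalf keeps its previous value
            case WAITED_HALF:
                if (!waitedHalf) {
                    waitedHalf = true;
                    return Action.DUMP_HALFWAY_STACKS; // only on the first half-timeout
                }
                return Action.NONE;
            default:
                // OVERDUE: the timeout path runs; the loop resets the flag at its end.
                waitedHalf = false;
                return Action.HANDLE_TIMEOUT;
        }
    }
}
```

Feeding it WAITED_HALF twice in a row yields one stack dump followed by no action, which matches the intent of dumping the halfway stacks only once per hang.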

3.1.5. Timed out: getBlockedCheckersLocked

private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
    ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        // Use isOverdueLocked to find every overdue member and add it to the checkers list.
        if (hc.isOverdueLocked()) {
            checkers.add(hc);
        }
    }
    return checkers;
}

isOverdueLocked() is simple: it uses mCompleted and mStartTime, with mWaitMax as the limit, to decide whether the check has timed out.

public boolean isOverdueLocked() {
    return (!mCompleted) && (SystemClock.uptimeMillis() > mStartTime + mWaitMax);
}

3.1.6. describeCheckersLocked

private String describeCheckersLocked(ArrayList<HandlerChecker> checkers) {
    StringBuilder builder = new StringBuilder(128);
    for (int i=0; i<checkers.size(); i++) {
        if (builder.length() > 0) {
            builder.append(", ");
        }
        builder.append(checkers.get(i).describeBlockedStateLocked());
    }
    return builder.toString();
}

public String describeBlockedStateLocked() {
    // Note that mCurrentMonitor is used here to tell whether a monitor or the
    // handler is at fault. mCurrentMonitor is a field of HandlerChecker that is
    // only set while the mMonitors check is running, and it is reset to null once
    // the monitors have returned. So if mCurrentMonitor is null, no monitor is
    // stuck, and the problem must be in the handler.
    if (mCurrentMonitor == null) {
        return "Blocked in handler on " + mName + " (" + getThread().getName() + ")";
    // Otherwise mCurrentMonitor is not null, which means a monitor in mMonitors is stuck.
    } else {
        return "Blocked in monitor " + mCurrentMonitor.getClass().getName()
                + " on " + mName + " (" + getThread().getName() + ")";
    }
}
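To get a feeling for what the resulting subject string looks like, the two formatting rules can be reproduced in isolation. BlockedDescription is a hypothetical helper written for this illustration, and the checker, thread and class names passed to it in the usage note are example values only, not output captured from a device.

```java
import java.util.List;

// Hypothetical helper that reproduces the two message formats shown above.
final class BlockedDescription {
    /** A null monitorClassName means no monitor was running, so the handler is stuck. */
    static String describe(String checkerName, String threadName, String monitorClassName) {
        if (monitorClassName == null) {
            return "Blocked in handler on " + checkerName + " (" + threadName + ")";
        }
        return "Blocked in monitor " + monitorClassName
                + " on " + checkerName + " (" + threadName + ")";
    }

    /** Joins the per-checker descriptions the same way describeCheckersLocked does. */
    static String join(List<String> descriptions) {
        return String.join(", ", descriptions);
    }
}
```

For example, describe("main thread", "main", null) produces "Blocked in handler on main thread (main)", and several blocked checkers are joined with ", " into the single subject that is written to the event log.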

3.2. Watchdog.run(): the handling mechanism

At this point the timeout has happened.

        // If we got here, that means that the system is most likely hung.
        // First collect stack traces from all threads of the system process.
        // Then kill this process so that the system will restart.
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

        ArrayList<Integer> pids = new ArrayList<Integer>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Pass !waitedHalf so that just in case we somehow wind up here without having
        // dumped the halfway stacks, we properly re-initialize the trace file.
        // Dump the stacks of system server and of the native processes.
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, NATIVE_STACKS_OF_INTEREST);

        // Give some extra time to make sure the stack traces get written.
        // The system's been hanging for a minute, another second or two won't hurt much.
        SystemClock.sleep(2000);

NATIVE_STACKS_OF_INTEREST is a String array that names the native processes whose stacks we want to see in the trace:

public static final String[] NATIVE_STACKS_OF_INTEREST = new String[] {
    "/system/bin/audioserver",
    "/system/bin/cameraserver",
    "/system/bin/drmserver",
    "/system/bin/mediadrmserver",
    "/system/bin/gx_fpd",
    "/system/bin/fingerprintd",
    "/system/bin/mediaserver",
    "/system/bin/sdcard",
    "/system/bin/surfaceflinger",
    "media.codec",     // system/bin/mediacodec
    "media.extractor", // system/bin/mediaextractor
    "com.android.bluetooth",  // Bluetooth service
};

3.2.1. dumpKernelStackTraces: dumping the kernel stack information

// Set this to true to have the watchdog record kernel thread stacks when it fires
static final boolean RECORD_KERNEL_THREADS = true;

        // Pull our own kernel thread stacks as well if we're configured for that
        if (RECORD_KERNEL_THREADS) {
            dumpKernelStackTraces();
        }

private File dumpKernelStackTraces() {
    // The value of this property is /data/anr/traces.txt
    String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
    if (tracesPath == null || tracesPath.length() == 0) {
        return null;
    }

    native_dumpKernelStacks(tracesPath);
    return new File(tracesPath);
}

private native void native_dumpKernelStacks(String tracesPath);
}

This calls through JNI into android_server_watchdog.cpp:

namespace android {

static const JNINativeMethod g_methods[] = {
    { "native_dumpKernelStacks", "(Ljava/lang/String;)V", (void*)dumpKernelStacks },
};

int register_android_server_Watchdog(JNIEnv* env) {
    return RegisterMethodsOrDie(env, "com/android/server/Watchdog", g_methods, NELEM(g_methods));
}

}

3.2.2. dumpKernelStacks (android_server_watchdog.cpp)

static void dumpKernelStacks(JNIEnv* env, jobject clazz, jstring pathStr) {
    char buf[128];
    DIR* taskdir;

    ALOGI("dumpKernelStacks");
    if (!pathStr) {
        jniThrowException(env, "java/lang/IllegalArgumentException", "Null path");
        return;
    }

    const char *path = env->GetStringUTFChars(pathStr, NULL);

    // Open the /data/anr/traces.txt file
    int outFd = open(path, O_WRONLY | O_APPEND | O_CREAT,
        S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH);
    if (outFd < 0) {
        ALOGE("Unable to open stack dump file: %d (%s)", errno, strerror(errno));
        goto done;
    }

    // Write this header line into the trace file
    snprintf(buf, sizeof(buf), "\n----- begin pid %d kernel stacks -----\n", getpid());
    write(outFd, buf, strlen(buf));

    // Find all threads of the current process by reading the /proc/pid/task directory
    // look up the list of all threads in this process
    snprintf(buf, sizeof(buf), "/proc/%d/task", getpid());
    taskdir = opendir(buf);
    if (taskdir != NULL) {
        struct dirent * ent;
        // Dump the stack of every thread
        while ((ent = readdir(taskdir)) != NULL) {
            int tid = atoi(ent->d_name);
            if (tid > 0 && tid <= 65535) {
                // dump each stack trace
                dumpOneStack(tid, outFd);
            }
        }
        closedir(taskdir);
    }

    snprintf(buf, sizeof(buf), "----- end pid %d kernel stacks -----\n", getpid());
    write(outFd, buf, strlen(buf));

    close(outFd);
done:
    env->ReleaseStringUTFChars(pathStr, path);
}

3.2.3. Add a timestamp to the generated trace file

The trace file generated so far is renamed to include a timestamp, so that it is not overwritten later. Note that traces.txt is generated first and renamed afterwards.

        String tracesPath = SystemProperties.get("dalvik.vm.stack-trace-file", null);
        String traceFileNameAmendment = "_SystemServer_WDT" + mTraceDateFormat.format(new Date());

        if (tracesPath != null && tracesPath.length() != 0) {
            File traceRenameFile = new File(tracesPath);
            String newTracesPath;
            int lpos = tracesPath.lastIndexOf (".");
            if (-1 != lpos)
                newTracesPath = tracesPath.substring (0, lpos) + traceFileNameAmendment + tracesPath.substring (lpos);
            else
                newTracesPath = tracesPath + traceFileNameAmendment;
            Slog.d(TAG, "Watchdog File:2 " + traceRenameFile + " rename to " + newTracesPath);
            traceRenameFile.renameTo(new File(newTracesPath));
            tracesPath = newTracesPath;
        }

        final File newFd = new File(tracesPath);

        // Try to add the error to the dropbox, but assuming that the ActivityManager
        // itself may be deadlocked.  (which has happened, causing this statement to
        // deadlock and the watchdog as a whole to be ineffective)
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
                public void run() {
                    mActivity.addErrorToDropBox(
                            "watchdog", null, "system_server", null, null,
                            subject, null, newFd, null);
                }
            };
        dropboxThread.start();
        try {
            dropboxThread.join(2000);  // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}
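The renaming rule is plain string manipulation and can be looked at on its own. TracePathSketch.amend is a hypothetical helper that mirrors the lastIndexOf logic above. The pattern used by mTraceDateFormat is not shown in the excerpt, so the usage note uses a literal placeholder instead of a real timestamp.

```java
// Hypothetical helper that mirrors the path computation shown above.
final class TracePathSketch {
    /** Inserts the amendment before the last '.', or appends it if there is no '.'. */
    static String amend(String tracesPath, String amendment) {
        int lpos = tracesPath.lastIndexOf('.');
        if (lpos != -1) {
            return tracesPath.substring(0, lpos) + amendment + tracesPath.substring(lpos);
        }
        return tracesPath + amendment;
    }
}
```

For example, amend("/data/anr/traces.txt", "_SystemServer_WDT_STAMP") gives "/data/anr/traces_SystemServer_WDT_STAMP.txt". Because the search covers the whole path, a path with no extension but with a dot in a directory name would be split at that directory dot.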

3.2.4. Decide from a property whether a triggered watchdog should enter ramdump: persist.sys.crashOnWatchdog

The value of the persist.sys.crashOnWatchdog property determines whether the device should enter ramdump when the watchdog fires. This is done through the /proc/sysrq-trigger node.

        // At times, when user space watchdog traces don't give an indication on
        // which component held a lock, because of which other threads are blocked,
        // (thereby causing Watchdog), crash the device to analyze RAM dumps
        boolean crashOnWatchdog = SystemProperties
                                    .getBoolean("persist.sys.crashOnWatchdog", false);
        if (crashOnWatchdog) {
            // Trigger the kernel to dump all blocked threads, and backtraces
            // on all CPUs to the kernel log
            Slog.e(TAG, "Triggering SysRq for system_server watchdog");
            doSysRq('w');
            doSysRq('l');

            // wait until the above blocked threads be dumped into kernel log
            SystemClock.sleep(3000);

            // now try to crash the target
            doSysRq('c');
        }
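The body of doSysRq is not part of the excerpt. A plausible minimal sketch, assuming it simply writes the command character to the trigger node, is shown below. SysRqSketch and writeCommand are hypothetical names, and the path is a parameter so that the code can be tried against an ordinary file instead of the real /proc/sysrq-trigger.

```java
import java.io.FileWriter;
import java.io.IOException;

// Hypothetical sketch; the real doSysRq implementation is not shown in this article.
final class SysRqSketch {
    /** Writes one SysRq command character to the given trigger file. */
    static boolean writeCommand(String triggerPath, char command) {
        try (FileWriter trigger = new FileWriter(triggerPath)) {
            trigger.write(command);
            return true;
        } catch (IOException e) {
            // On a real device this fails without sufficient privileges.
            return false;
        }
    }
}
```

On a Linux kernel with SysRq enabled, writing 'w' to the real node dumps blocked tasks, 'l' dumps backtraces of the active CPUs, and 'c' deliberately crashes the system, so the last one should never be tried on a machine you care about.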


3.2.5. Finally, the part where monkey intercepts the watchdog

Check the value of mController:

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

mController is assigned in the setActivityController function; its value is the function's argument, a controller of type IActivityController.

public void setActivityController(IActivityController controller) {
    synchronized (this) {
        mController = controller;
    }
}

3.2.5.1. A closer look at how monkey implements the interception

The setActivityController function of ActivityManagerService, which runs inside system server, is the externally visible interface. It wraps the setActivityController function of Watchdog.

public void setActivityController(IActivityController controller, boolean imAMonkey) {
    enforceCallingPermission(android.Manifest.permission.SET_ACTIVITY_WATCHER,
            "setActivityController()");
    synchronized (this) {
        mController = controller;
        mControllerIsAMonkey = imAMonkey;
        Watchdog.getInstance().setActivityController(controller);
    }
}

<Monkey.java> (cmds\monkey\src\com\android\commands\monkey)
Monkey calls the setActivityController interface and passes in its own IActivityController implementation.

        try {
            mAm.setActivityController(new ActivityController(), true);
            mNetworkMonitor.register(mAm);
        } catch (RemoteException e) {
            System.err.println("** Failed talking with activity manager!");
            return false;
        }

private class ActivityController extends IActivityController.Stub {
    public boolean activityStarting(Intent intent, String pkg) {
        boolean allow = MonkeyUtils.getPackageFilter().checkEnteringPackage(pkg)
                || (DEBUG_ALLOW_ANY_STARTS != 0);
        if (mVerbose > 0) {
        ......

3.2.5.2. Monkey.systemNotResponding: after the interception, what gets called is naturally monkey's systemNotResponding function

    public int systemNotResponding(String message) {
        StrictMode.ThreadPolicy savedPolicy = StrictMode.allowThreadDiskWrites();
        System.err.println("// WATCHDOG: " + message);
        StrictMode.setThreadPolicy(savedPolicy);

        synchronized (Monkey.this) {
            if (!mIgnoreCrashes) {
                mAbort = true;
            }
            if (mRequestBugreport) {
                mRequestWatchdogBugreport = true;
            }
            mWatchdogWaiting = true;
        }
        synchronized (Monkey.this) {
            while (mWatchdogWaiting) {
                try {
                    Monkey.this.wait();
                } catch (InterruptedException e) {
                }
            }
        }
        return (mKillProcessAfterError) ? -1 : 1;
    }

The return value depends on mKillProcessAfterError. It is false by default and is only set to true when monkey is started with the --kill-process-after-error option. By default, therefore, systemNotResponding returns 1, and the watchdog keeps waiting: it continues with the next round of the loop instead of killing system server and restarting the framework.

            } else if (opt.equals("--kill-process-after-error")) {
                mKillProcessAfterError = true;

Watchdog and monkey communicate over Binder. If the Binder call fails, the current transaction is abandoned and the RemoteException is swallowed, so the watchdog goes on to kill system server and restart the framework.


3.2.6. Last of all, if monkey does not intercept, the framework is restarted: kill system_server and reboot the framework

Kill system server so that the system restarts:

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            for (int i=0; i<blockedCheckers.size(); i++) {
                Slog.w(TAG, blockedCheckers.get(i).getName() + " stack trace:");
                StackTraceElement[] stackTrace
                        = blockedCheckers.get(i).getThread().getStackTrace();
                for (StackTraceElement element: stackTrace) {
                    Slog.w(TAG, "    at " + element);
                }
            }
            Slog.w(TAG, "*** GOODBYE!");
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;