Chrome's Crash Report Service (Part 3)


How does Chrome catch program exceptions?

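   When a C++ program hits a fault such as a memory access violation, the CPU detects the problem and raises an exception (you can think of it as an interrupt), then switches control to the exception handling service routine. The operating system's service routine checks whether the current process is being debugged. If it is, the debugger is notified of the exception. If it is not, the operating system checks whether the current thread has installed a chain of exception frames. If SEH handlers are installed (__try ... __except, which MSVC's C++ try ... catch is built on), they are called, and their return value decides whether a global or a local unwind takes place. If no SEH handler in the chain handles the exception and the process is being debugged, the operating system notifies the debugger again (the second-chance exception). If the exception is still unhandled, the operating system's default handler, UnhandledExceptionFilter, is called. The operating system lets you hook this step by installing your own filter with SetUnhandledExceptionFilter. Most exceptions can be caught this way.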
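The dispatch order described above can be modeled in a few lines. The sketch below is a toy simulation in portable C++, not Windows code: ProcessModel, Dispatch and the trace labels are made-up names, and the order simply follows this article's simplified description.

```cpp
#include <functional>
#include <string>
#include <vector>

// Toy model only. No real Windows API is called; all names are hypothetical.
struct ProcessModel {
  bool debugger_attached = false;
  // SEH frames, innermost first. A frame returns true if it handles the fault.
  std::vector<std::function<bool()>> seh_frames;
  // Stand-in for a filter installed with SetUnhandledExceptionFilter.
  std::function<void()> unhandled_filter;
};

// Returns the sequence of parties that saw the exception, in order.
std::vector<std::string> Dispatch(const ProcessModel& p) {
  std::vector<std::string> trace;
  if (p.debugger_attached) {
    trace.push_back("debugger:first-chance");
  }
  for (const auto& frame : p.seh_frames) {
    trace.push_back("seh-frame");
    if (frame()) {
      trace.push_back("handled-by-seh");
      return trace;
    }
  }
  if (p.debugger_attached) {
    // With a debugger attached, the custom filter is never reached.
    trace.push_back("debugger:second-chance");
    return trace;
  }
  if (p.unhandled_filter) {
    p.unhandled_filter();
    trace.push_back("custom-unhandled-filter");
  } else {
    trace.push_back("default-unhandled-filter");
  }
  return trace;
}
```

In this model a crash reporter's filter only runs when no SEH frame handles the fault and no debugger is attached, which is the situation the rest of the article deals with.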

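  Starting with Visual C++ 2005, however, Microsoft changed some security-related code in the CRT (the C runtime library), most notably by adding buffer overrun checks. When such an error occurs, the new CRT forcibly hands the exception to the default debugger (Dr. Watson if none is configured) and no longer notifies the exception filter installed by the application. This mainly happens in two cases:
  (1) An _invalid_parameter error occurs and the application has not called _set_invalid_parameter_handler to install a handler.
  (2) A pure virtual function call error occurs and the application has not called _set_purecall_handler to install a handler.
Chrome handles both cases specially by installing two dedicated callbacks to catch them.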
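As a rough sketch of what installing these two callbacks looks like, the snippet below calls the MSVC CRT functions _set_invalid_parameter_handler and _set_purecall_handler. OnInvalidParameter, OnPureCall, InstallCrtHandlers and kHasMsvcCrtHandlers are hypothetical names, and the handlers merely abort where a real reporter would write a dump. On any other compiler the function is a no-op.

```cpp
#include <cstdint>
#include <cstdlib>

#if defined(_MSC_VER) && _MSC_VER >= 1400
constexpr bool kHasMsvcCrtHandlers = true;

// Hypothetical handlers: a real crash reporter would write a dump here.
static void OnInvalidParameter(const wchar_t* /*expression*/,
                               const wchar_t* /*function*/,
                               const wchar_t* /*file*/,
                               unsigned int /*line*/,
                               uintptr_t /*reserved*/) {
  std::abort();
}

static void OnPureCall() { std::abort(); }
#else
constexpr bool kHasMsvcCrtHandlers = false;
#endif

// Installs both CRT callbacks. Returns true only when built against the
// MSVC CRT, where these two functions exist.
bool InstallCrtHandlers() {
#if defined(_MSC_VER) && _MSC_VER >= 1400
  _set_invalid_parameter_handler(OnInvalidParameter);
  _set_purecall_handler(OnPureCall);
  return true;
#else
  return false;
#endif
}
```

Both CRT functions return the previously installed handler, so a caller that wants to restore it later can keep the return values.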

The main flow of Chrome's Crash Report

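Chrome supports two dump modes.
Out-of-process dump: a separate Crash Handle Process generates the dump. When the main process hits an exception, it notifies the Crash Handle Process over IPC, and the crash_generation_server inside the Crash Handle Process writes the dump file. The rough flow is:

Figure: out-of-process capture
In the figure above, crash_generation_client and crash_generation_server talk to each other over inter-process communication (IPC). crash_report_sender sends the dump to Google's crash report server (https://clients2.google.com/cr/report).
In-process dump: similar to the out-of-process mode, except that a crash_handle_thread thread is added inside the Browser process and that thread writes the dump. The basic flow is:
Figure: in-process capture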

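Implementation of crash_generation_client

Key synchronization handles


HANDLE server_alive_;
Indicates whether the crash_handle_process is alive.

HANDLE crash_event_;
An event that signals that an exception has occurred in crash_generation_client. Once the IPC channel between crash_generation_client and crash_generation_server is established, crash_generation_server waits on this event.

HANDLE crash_generated_;
An event that signals that crash_generation_server has finished writing the dump file. crash_generation_server sets it once the dump file is written.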

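Key variables

CustomClientInfo custom_info_;
Describes the process in which the exception occurred, which may be the Browser process or a Render process.

EXCEPTION_POINTERS* exception_pointers_;
When an exception occurs, all exception information is stored in the memory this pointer refers to.

MDRawAssertionInfo assert_info_;
Holds assertion-failure information.

When crash_generation_client is initialized, it registers with crash_generation_server, sets up the IPC channel, and sends the addresses of the variables above to crash_generation_server. When an exception later occurs in crash_generation_client, crash_generation_server reads from these addresses to generate the dump file. (This is the out-of-process mode. In in-process mode a dedicated thread inside the browser process does this work.)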

A key function

The function is shown below:
bool CrashGenerationClient::SignalCrashEventAndWait() {
  assert(crash_event_);
  assert(crash_generated_);
  assert(server_alive_);

  // Reset the dump generated event before signaling the crash
  // event so that the server can set the dump generated event
  // once it is done generating the event.
  if (!ResetEvent(crash_generated_)) {
    return false;
  }

  if (!SetEvent(crash_event_)) {
    return false;
  }

  HANDLE wait_handles[kWaitEventCount] = {crash_generated_, server_alive_};

  DWORD result = WaitForMultipleObjects(kWaitEventCount,
                                        wait_handles,
                                        FALSE,
                                        kWaitForServerTimeoutMs);

  // Crash dump was successfully generated only if the server
  // signaled the crash generated event.
  return result == WAIT_OBJECT_0;
}

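This function shows how crash_generation_client interacts with the server when an exception occurs: it resets crash_generated_, sets crash_event_, then waits until the server signals crash_generated_, the server goes away, or the timeout expires. Most of this was already covered when the variables were introduced above.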
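The reset, signal, wait order can be tried out without Windows. The sketch below is a portable approximation that uses a condition variable in place of Win32 events and a thread in place of the handler process. Event, SignalCrashAndWait and RunHandshake are hypothetical names, the timeouts are arbitrary, and the server_alive_ handle is left out for brevity.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Portable stand-in for a Win32 manual-reset event (hypothetical helper).
class Event {
 public:
  void Set() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      signaled_ = true;
    }
    cv_.notify_all();
  }
  void Reset() {
    std::lock_guard<std::mutex> lock(mu_);
    signaled_ = false;
  }
  // Returns true if the event was signaled before the timeout expired.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, timeout, [this] { return signaled_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};

// Client side: same order as SignalCrashEventAndWait (reset, signal, wait).
bool SignalCrashAndWait(Event& crash_event, Event& crash_generated,
                        std::chrono::milliseconds timeout) {
  crash_generated.Reset();
  crash_event.Set();
  return crash_generated.WaitFor(timeout);
}

// Runs one handshake. With server_responds == false no server thread exists,
// which models a dead or hung handler process.
bool RunHandshake(bool server_responds) {
  Event crash_event;
  Event crash_generated;
  std::thread server;
  if (server_responds) {
    server = std::thread([&crash_event, &crash_generated] {
      if (crash_event.WaitFor(std::chrono::seconds(5))) {
        // A real server would write the minidump at this point.
        crash_generated.Set();
      }
    });
  }
  const auto timeout = std::chrono::milliseconds(server_responds ? 2000 : 50);
  const bool ok = SignalCrashAndWait(crash_event, crash_generated, timeout);
  if (server.joinable()) server.join();
  return ok;
}
```

Resetting crash_generated before setting crash_event matters: the server can only signal completion after it has seen the crash event, so a stale "done" signal from an earlier dump cannot be mistaken for the current one.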
How crash_generation_client catches exceptions
The principle was described at the start of this article. Here is the implementation.

  1. void ExceptionHandler::Initialize(const wstring& dump_path,
  2.                                   FilterCallback filter,
  3.                                   MinidumpCallback callback,
  4.                                   void* callback_context,
  5.                                   int handler_types,
  6.                                   MINIDUMP_TYPE dump_type,
  7.                                   const wchar_t* pipe_name,
  8.                                   const CustomClientInfo* custom_info) {
  9.   LONG instance_count = InterlockedIncrement(&instance_count_);
  10.   filter_ = filter;
  11.   callback_ = callback;
  12.   callback_context_ = callback_context;
  13.   dump_path_c_ = NULL;
  14.   next_minidump_id_c_ = NULL;
  15.   next_minidump_path_c_ = NULL;
  16.   dbghelp_module_ = NULL;
  17.   minidump_write_dump_ = NULL;
  18.   dump_type_ = dump_type;
  19.   rpcrt4_module_ = NULL;
  20.   uuid_create_ = NULL;
  21.   handler_types_ = handler_types;
  22.   previous_filter_ = NULL;
  23. #if _MSC_VER >= 1400  // MSVC 2005/8
  24.   previous_iph_ = NULL;
  25. #endif  // _MSC_VER >= 1400
  26.   previous_pch_ = NULL;
  27.   handler_thread_ = NULL;
  28.   is_shutdown_ = false;
  29.   handler_start_semaphore_ = NULL;
  30.   handler_finish_semaphore_ = NULL;
  31.   requesting_thread_id_ = 0;
  32.   exception_info_ = NULL;
  33.   assertion_ = NULL;
  34.   handler_return_value_ = false;
  35.   handle_debug_exceptions_ = false;
  36.  
  37.   // Attempt to use out-of-process if user has specified pipe name.
  38.   if (pipe_name != NULL) {
  39.     scoped_ptr<CrashGenerationClient> client(
  40.         new CrashGenerationClient(pipe_name,
  41.                                   dump_type_,
  42.                                   custom_info));
  43.  
  44.     // If successful in registering with the monitoring process,
  45.     // there is no need to setup in-process crash generation.
  46.     if (client->Register()) {
  47.       crash_generation_client_.reset(client.release());
  48.     }
  49.   }
  50.  
  51.   if (!IsOutOfProcess()) {
  52.     // Either client did not ask for out-of-process crash generation
  53.     // or registration with the server process failed. In either case,
  54.     // setup to do in-process crash generation.
  55.  
  56.     // Set synchronization primitives and the handler thread.  Each
  57.     // ExceptionHandler object gets its own handler thread because that's the
  58.     // only way to reliably guarantee sufficient stack space in an exception,
  59.     // and it allows an easy way to get a snapshot of the requesting thread's
  60.     // context outside of an exception.
  61.     InitializeCriticalSection(&handler_critical_section_);
  62.     handler_start_semaphore_ = CreateSemaphore(NULL, 0, 1, NULL);
  63.     assert(handler_start_semaphore_ != NULL);
  64.  
  65.     handler_finish_semaphore_ = CreateSemaphore(NULL, 0, 1, NULL);
  66.     assert(handler_finish_semaphore_ != NULL);
  67.  
  68.     // Don't attempt to create the thread if we could not create the semaphores.
  69.     if (handler_finish_semaphore_ != NULL && handler_start_semaphore_ != NULL) {
  70.       DWORD thread_id;
  71.       handler_thread_ = CreateThread(NULL,         // lpThreadAttributes
  72.                                      kExceptionHandlerThreadInitialStackSize,
  73.                                      ExceptionHandlerThreadMain,
  74.                                      this,         // lpParameter
  75.                                      0,            // dwCreationFlags
  76.                                      &thread_id);
  77.       assert(handler_thread_ != NULL);
  78.     }
  79.  
  80.     dbghelp_module_ = LoadLibrary(L"dbghelp.dll");
  81.     if (dbghelp_module_) {
  82.       minidump_write_dump_ = reinterpret_cast<MiniDumpWriteDump_type>(
  83.           GetProcAddress(dbghelp_module_, "MiniDumpWriteDump"));
  84.     }
  85.  
  86.     // Load this library dynamically to not affect existing projects.  Most
  87.     // projects don't link against this directly, it's usually dynamically
  88.     // loaded by dependent code.
  89.     rpcrt4_module_ = LoadLibrary(L"rpcrt4.dll");
  90.     if (rpcrt4_module_) {
  91.       uuid_create_ = reinterpret_cast<UuidCreate_type>(
  92.           GetProcAddress(rpcrt4_module_, "UuidCreate"));
  93.     }
  94.  
  95.     // set_dump_path calls UpdateNextID.  This sets up all of the path and id
  96.     // strings, and their equivalent c_str pointers.
  97.     set_dump_path(dump_path);
  98.   }
  99.  
  100.   // There is a race condition here. If the first instance has not yet
  101.   // initialized the critical section, the second (and later) instances may
  102.   // try to use uninitialized critical section object. The feature of multiple
  103.   // instances in one module is not used much, so leave it as is for now.
  104.   // One way to solve this in the current design (that is, keeping the static
  105.   // handler stack) is to use spin locks with volatile bools to synchronize
  106.   // the handler stack. This works only if the compiler guarantees to generate
  107.   // cache coherent code for volatile.
  108.   // TODO(munjal): Fix this in a better way by changing the design if possible.
  109.  
  110.   // Lazy initialization of the handler_stack_critical_section_
  111.   if (instance_count == 1) {
  112.     InitializeCriticalSection(&handler_stack_critical_section_);
  113.   }
  114.  
  115.   if (handler_types != HANDLER_NONE) {
  116.     EnterCriticalSection(&handler_stack_critical_section_);
  117.  
  118.     // The first time an ExceptionHandler that installs a handler is
  119.     // created, set up the handler stack.
  120.     if (!handler_stack_) {
  121.       handler_stack_ = new vector<ExceptionHandler*>();
  122.     }
  123.     handler_stack_->push_back(this);
  124.  
  125.     if (handler_types & HANDLER_EXCEPTION)
  126.       previous_filter_ = SetUnhandledExceptionFilter(HandleException);
  127.  
  128. #if _MSC_VER >= 1400  // MSVC 2005/8
  129.     if (handler_types & HANDLER_INVALID_PARAMETER)
  130.       previous_iph_ = _set_invalid_parameter_handler(HandleInvalidParameter);
  131. #endif  // _MSC_VER >= 1400
  132.  
  133.     if (handler_types & HANDLER_PURECALL)
  134.       previous_pch_ = _set_purecall_handler(HandlePureVirtualCall);
  135.  
  136.     LeaveCriticalSection(&handler_stack_critical_section_);
  137.   }
  138. }


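Line 126 of this function calls SetUnhandledExceptionFilter to install the callback that will handle exceptions.
In addition, the invalid parameter and purecall cases, which the VC2005 CRT does not deliver to that filter, get special handling through _set_invalid_parameter_handler and _set_purecall_handler.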
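Each of these installer functions returns the handler it replaces, and Initialize stores those values in previous_filter_, previous_iph_ and previous_pch_. A plausible use for them is restoring the old handler later. The sketch below shows that save-and-restore pattern with the portable std::set_terminate, which behaves the same way in this respect. It is an analogy rather than Breakpad code, and ScopedTerminateHandler and ReportAndAbort are hypothetical names.

```cpp
#include <cstdlib>
#include <exception>

// Installs a terminate handler for the lifetime of the object and puts the
// previous one back on destruction.
class ScopedTerminateHandler {
 public:
  explicit ScopedTerminateHandler(std::terminate_handler handler)
      : previous_(std::set_terminate(handler)) {}
  ~ScopedTerminateHandler() { std::set_terminate(previous_); }

  ScopedTerminateHandler(const ScopedTerminateHandler&) = delete;
  ScopedTerminateHandler& operator=(const ScopedTerminateHandler&) = delete;

  // The handler that was active before this object was created.
  std::terminate_handler previous() const { return previous_; }

 private:
  std::terminate_handler previous_;
};

// Hypothetical handler: a crash reporter would write a dump before aborting.
void ReportAndAbort() { std::abort(); }
```

Keeping the previous handler also makes it possible to forward to it when the new handler decides not to deal with a failure.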

Implementation of crash_generation_server

crash_generation_server is essentially an IPC server that listens for requests from each crash_generation_client.
Its key function is a simple state machine:

void CrashGenerationServer::HandleConnectionRequest() {
  // If we are shutting down then get into ERROR state, reset the event so more
  // workers don't run and return immediately.
  if (shutting_down_) {
    server_state_ = IPC_SERVER_STATE_ERROR;
    ResetEvent(overlapped_.hEvent);
    return;
  }

  switch (server_state_) {
    case IPC_SERVER_STATE_ERROR:
      HandleErrorState();
      break;

    case IPC_SERVER_STATE_INITIAL:
      HandleInitialState();
      break;

    case IPC_SERVER_STATE_CONNECTING:
      HandleConnectingState();
      break;

    case IPC_SERVER_STATE_CONNECTED:
      HandleConnectedState();
      break;

    case IPC_SERVER_STATE_READING:
      HandleReadingState();
      break;

    case IPC_SERVER_STATE_READ_DONE:
      HandleReadDoneState();
      break;

    case IPC_SERVER_STATE_WRITING:
      HandleWritingState();
      break;

    case IPC_SERVER_STATE_WRITE_DONE:
      HandleWriteDoneState();
      break;

    case IPC_SERVER_STATE_READING_ACK:
      HandleReadingAckState();
      break;

    case IPC_SERVER_STATE_DISCONNECTING:
      HandleDisconnectingState();
      break;

    default:
      assert(false);
      // This indicates that we added one more state without
      // adding handling code.
      server_state_ = IPC_SERVER_STATE_ERROR;
      break;
  }
}

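This function tracks the state of the IPC connection and dispatches to a separate handler for each state. It is straightforward and needs no further explanation.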
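To make the cycle concrete, here is a minimal sketch of the transitions between those states. The listing above only shows the dispatch, so the order used here is an assumption inferred from the state names (connect, read the request, write the reply, read the acknowledgement, disconnect), and IpcState, NextOnSuccess and StepsToServeOneClient are hypothetical names.

```cpp
enum class IpcState {
  Error, Initial, Connecting, Connected, Reading,
  ReadDone, Writing, WriteDone, ReadingAck, Disconnecting
};

// ASSUMPTION: happy-path order guessed from the state names, not taken from
// the source. Error is assumed to fall back to Initial.
IpcState NextOnSuccess(IpcState s) {
  switch (s) {
    case IpcState::Initial:       return IpcState::Connecting;
    case IpcState::Connecting:    return IpcState::Connected;
    case IpcState::Connected:     return IpcState::Reading;
    case IpcState::Reading:       return IpcState::ReadDone;
    case IpcState::ReadDone:      return IpcState::Writing;
    case IpcState::Writing:       return IpcState::WriteDone;
    case IpcState::WriteDone:     return IpcState::ReadingAck;
    case IpcState::ReadingAck:    return IpcState::Disconnecting;
    case IpcState::Disconnecting: return IpcState::Initial;
    case IpcState::Error:         return IpcState::Initial;
  }
  return IpcState::Error;  // unknown state, mirrors the default branch above
}

// Counts the transitions needed to serve one client and return to Initial.
int StepsToServeOneClient() {
  IpcState s = IpcState::Initial;
  int steps = 0;
  do {
    s = NextOnSuccess(s);
    ++steps;
  } while (s != IpcState::Initial);
  return steps;
}
```

Because every handler only advances the state by one step, the server can be driven entirely by completion events on the pipe without blocking on any single client.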

Implementation of crash_report_sender

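The implementation is very simple: it simulates a form submission, packaging the minidump as a MIME payload and submitting it to the server over HTTP. Google's crash report server (https://clients2.google.com/cr/report) is presumably just a simple web handler script, and the upload can be treated as data submitted through a form.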
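As an illustration of what such a form submission looks like on the wire, the sketch below assembles a multipart/form-data body by hand. It is not the crash_report_sender code: BuildMultipartBody, the field names and the boundary in the usage note are placeholders, and the actual HTTP POST is left out.

```cpp
#include <map>
#include <string>

// Builds a multipart/form-data body with plain text fields plus one file
// part. The caller must pick a boundary that does not occur in the data.
std::string BuildMultipartBody(const std::map<std::string, std::string>& fields,
                               const std::string& file_field,
                               const std::string& file_name,
                               const std::string& file_bytes,
                               const std::string& boundary) {
  std::string body;
  for (const auto& kv : fields) {
    body += "--" + boundary + "\r\n";
    body += "Content-Disposition: form-data; name=\"" + kv.first + "\"\r\n\r\n";
    body += kv.second + "\r\n";
  }
  // The minidump travels as an opaque binary part.
  body += "--" + boundary + "\r\n";
  body += "Content-Disposition: form-data; name=\"" + file_field +
          "\"; filename=\"" + file_name + "\"\r\n";
  body += "Content-Type: application/octet-stream\r\n\r\n";
  body += file_bytes + "\r\n";
  // The closing delimiter carries two extra dashes.
  body += "--" + boundary + "--\r\n";
  return body;
}
```

A caller would send the result with the request header Content-Type: multipart/form-data; boundary=<the same boundary>, for example BuildMultipartBody({{"prod", "Demo"}}, "dump", "a.dmp", dump_bytes, "XyZ") with dump_bytes holding the file contents.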

How the Browser uses the crash report service

First, the crash_handle process is a standalone program that listens for requests from chrome processes.

Figure: crash handle process

Second, when the Browser initializes, it creates a crash_generation_client instance. The chrome main entry point contains the line

  // Initialize the crash reporter.
  InitCrashReporterWithDllPath(dll_full_path);

and this function creates a global variable:

  g_breakpad = new google_breakpad::ExceptionHandler(temp_dir, NULL, callback,
                   NULL, google_breakpad::ExceptionHandler::HANDLER_ALL,
                   dump_type, pipe_name.c_str(), info->custom_info);

The ExceptionHandler class contains the CrashGenerationClient instance.

Because the Crash Report service should start as early as possible, chrome initializes this variable very early in startup.

Summary

Key points of Google's crash_report service:
1. Customizable minidump handling.
2. Out-of-process dump writing.
3. How chrome catches exceptions.