A multithreading troubleshooting experience


The new SDK, which cost several engineers months of effort, crashed in the alpha stage, as reported by the product team. It seemed that two threads were still running after the DLL had been unloaded. The whole team turned to this defect immediately, because we can never tolerate a crash. We found that there were two asynchronous threads that weren't actually used; if we disabled them, the crash could not be reproduced. So we had found where the defect lived. But why did it happen?

So we wrote a unit test program for the task class. The code looked reasonable and the logic was clear. The sequence of operations is: task.run(int i); task.execute(boost::function0); task.stop(); task.wait(). The task class has a threadqueue member.

  • run() creates i asynchronous threads, which fetch functors from the queue. If the queue is empty, a thread stops working and waits; if the functor is empty, the thread exits. Each new thread is created with boost::thread_group.create_thread, and its thread function is do_run(), which first increments a thread counter by 1 and decrements it by 1 when the thread exits.
  • execute() puts a functor into the queue.
  • stop() puts an empty functor into the queue.
  • wait() checks the thread counter and, if it is greater than 0, calls boost::thread_group.join_all() to wait until all threads have exited properly.
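The queue protocol described in the list can be sketched roughly as follows. This is only an illustration under assumptions, not the SDK's code: it uses std::thread and std::function in place of Boost, and the names FunctorQueue, worker_loop and run_jobs are hypothetical. It also assumes one empty functor is posted per worker, since the description does not say how a single stop() reaches every thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the article's threadqueue: a blocking FIFO of functors.
class FunctorQueue {
public:
    void push(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(f));
        }
        ready_.notify_one();
    }

    // Blocks while the queue is empty, then returns the oldest functor.
    std::function<void()> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        std::function<void()> f = std::move(items_.front());
        items_.pop_front();
        return f;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> items_;
};

// Worker loop in the spirit of do_run(): run functors until an empty one arrives.
inline void worker_loop(FunctorQueue& queue) {
    for (;;) {
        std::function<void()> f = queue.pop();
        if (!f) {
            return;  // an empty functor is the stop signal
        }
        f();
    }
}

// Hypothetical helper: start `workers` threads, run `jobs` increments, stop, join.
inline int run_jobs(int workers, int jobs) {
    assert(workers > 0);
    FunctorQueue queue;
    std::mutex sum_mutex;
    int sum = 0;

    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back(worker_loop, std::ref(queue));
    }
    for (int j = 0; j < jobs; ++j) {
        queue.push([&sum, &sum_mutex] {
            std::lock_guard<std::mutex> lock(sum_mutex);
            ++sum;
        });
    }
    // Assumption: one empty functor per worker, each worker consumes exactly one.
    for (int i = 0; i < workers; ++i) {
        queue.push(std::function<void()>());
    }
    for (std::thread& t : threads) {
        t.join();  // unconditional join: no counter is consulted here
    }
    return sum;
}
```

Calling run_jobs(2, 100) starts two workers, queues one hundred increments and returns the number that ran. Because the queue is FIFO, every job is taken before any stop signal.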

OK, that's all. The logic is simple and clear, but the program crashes. Sometimes the system reported "Thread deadlock" and hung in WaitForSingleObject(), sometimes it reported "thread can't be joinable", and sometimes it hung while getting a functor from the threadqueue. I even doubted the stability of the threadqueue, so I wrote another unit test program for it, but it worked. There was no way forward but to debug, debug and debug.

Finally I found that threads were still active when the class was destroyed. How could this happen? The truth is that a thread does not necessarily start running immediately after it is created, so the thread counter can still be 0 when another thread calls wait(). In that case wait() skips join_all() entirely, and after that the class is destroyed while its threads are about to run.
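The timing problem can be reproduced deterministically with a small sketch. It is not the original code: std::thread replaces Boost, and a std::promise acts as a gate that simulates the scheduler delaying the new thread. The names Observation and observe_counter_lag are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>

struct Observation {
    int seen_by_wait;     // counter value observed right after thread creation
    bool thread_existed;  // whether a joinable thread already existed at that moment
};

// Hypothetical reproduction of the race: the gate holds the new thread back,
// just as the scheduler may do in the real program.
inline Observation observe_counter_lag() {
    std::atomic<int> running{0};
    std::promise<void> gate;
    std::shared_future<void> released = gate.get_future().share();

    std::thread worker([&running, released] {
        released.wait();  // simulate "created but not yet scheduled"
        ++running;        // the counter is incremented inside the thread ...
        --running;        // ... and decremented when the thread exits
    });

    // This is the moment wait() ran in the article:
    // the thread exists, but the counter still reads 0.
    Observation result{running.load(), worker.joinable()};

    gate.set_value();  // let the thread proceed
    worker.join();     // always join, so nothing outlives this function
    assert(running.load() == 0);
    return result;
}
```

A wait() that tests the counter at this point would skip the join even though a thread is pending, which matches the destroyed-while-active symptom.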

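The proper way is to use boost::thread_group.size() to get the current number of threads, since it counts the threads that have been created rather than the ones that have started running. Finally, I think we would have found the truth more quickly if the prompts from the system had been accurate and useful; the "deadlock" and "thread can't be joinable" messages misdirected us all.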
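A minimal sketch of a wait() that relies on the number of created threads might look like this. It is an assumption-laden illustration, not the SDK's class: std::vector<std::thread> stands in for boost::thread_group, the Task and demo names are hypothetical, stop() is assumed to post one empty functor per thread, and run(), stop() and wait() are assumed to be called from a single controlling thread.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical Task class; std::vector<std::thread> stands in for boost::thread_group.
class Task {
public:
    ~Task() {
        // Safety net: never destroy the object while its threads may still use it.
        stop();
        wait();
    }

    void run(int count) {
        for (int i = 0; i < count; ++i) {
            threads_.emplace_back([this] { do_run(); });
        }
    }

    void execute(std::function<void()> f) { push(std::move(f)); }

    void stop() {
        // Assumption: one empty functor per created thread.
        for (std::size_t i = 0; i < threads_.size(); ++i) {
            push(std::function<void()>());
        }
    }

    // Joins every created thread and returns how many were joined.
    // The decision uses the container size, not a counter updated by the threads.
    std::size_t wait() {
        const std::size_t created = threads_.size();
        for (std::thread& t : threads_) {
            t.join();
        }
        threads_.clear();
        return created;
    }

private:
    void push(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(f));
        }
        ready_.notify_one();
    }

    void do_run() {
        for (;;) {
            std::function<void()> f;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return !queue_.empty(); });
                f = std::move(queue_.front());
                queue_.pop_front();
            }
            if (!f) {
                return;  // empty functor: exit the thread
            }
            f();
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::function<void()>> queue_;
    std::vector<std::thread> threads_;
};

// Hypothetical usage helper: returns {threads joined, jobs executed}.
inline std::pair<std::size_t, int> demo(int workers, int jobs) {
    std::atomic<int> done{0};
    Task task;
    task.run(workers);
    for (int j = 0; j < jobs; ++j) {
        task.execute([&done] { ++done; });
    }
    task.stop();
    const std::size_t joined = task.wait();  // called right away, as in the failing test
    return {joined, done.load()};
}
```

Because wait() looks at how many threads were created, it joins them even if none has been scheduled yet. The destructor repeats stop() and wait() as a guard, which is a design choice of this sketch rather than something the original describes.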
