Debugging a System Hang


After three weeks of debugging, the root cause has finally been found. No overtime tonight :-).

Problem Description

In one Qualcomm project we deliberately trigger a crash on the modem side, and the crash notification is passed to the Linux side through a pipe. When Linux receives the message, it wraps the restart task in a work item and queues it onto system_wq via schedule_work(), leaving a worker thread to carry out the restart. After repeated restart tests, however, we occasionally saw the system hang and fail to restart.
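As a reference, here is a minimal sketch of how such a restart work item can be queued onto system_wq. The handler name device_restart_work_hdlr matches the pending entry seen later in the ramdump; the work variable and the notification callback are hypothetical names used only for illustration.

#include <linux/workqueue.h>

/* Hypothetical restart work item; the handler name matches the
 * pending entry that shows up later in the workqueue dump. */
static void device_restart_work_hdlr(struct work_struct *work)
{
        /* ... tear down state and restart the subsystem ... */
}

static DECLARE_WORK(device_restart_work, device_restart_work_hdlr);

/* Hypothetical entry point, called when the modem-crash
 * notification arrives on the Linux side. */
static void on_modem_crash_notification(void)
{
        /* schedule_work() queues onto system_wq, the normal-priority
         * bound workqueue; a kworker from pool 0 will pick it up. */
        schedule_work(&device_restart_work);
}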

Locating the Problem

For system hang issues, Qualcomm provides a tool called linux-ramdump-parser-v2: capture a ramdump of the device and then parse it with this tool.

Log Analysis

Below is the parsed workqueue state. The platform has only one CPU. For bound worker pools, each CPU has two: a normal-priority pool and a high-priority pool. Threads in the high-priority pool have names like kworker/0:0H. In the output below, pool 0 is the normal-priority pool and pool 1 is the high-priority pool.

CPU 0
pool 0
BUSY Workqueue worker: kworker/0:8 current_work: (None)
BUSY Workqueue worker: kworker/0:9 current_work: (None)
BUSY Workqueue worker: kworker/0:10 current_work: (None)
BUSY Workqueue worker: kworker/0:11 current_work: (None)
BUSY Workqueue worker: kworker/0:12 current_work: (None)
BUSY Workqueue worker: kworker/0:13 current_work: (None)
BUSY Workqueue worker: kworker/0:2 current_work: (None)
BUSY Workqueue worker: kworker/0:14 current_work: (None)
BUSY Workqueue worker: kworker/0:5 current_work: (None)
BUSY Workqueue worker: kworker/0:4 current_work: (None)
BUSY Workqueue worker: kworker/0:3 current_work: (None)
BUSY Workqueue worker: kworker/0:0 current_work: (None)
BUSY Workqueue worker: kworker/0:6 current_work: (None)
BUSY Workqueue worker: kworker/0:7 current_work: (None)
IDLE Workqueue worker: kworker/0:1 current_work: (None)
IDLE Workqueue worker: kworker/0:15 current_work: (None)
Pending entry: device_restart_work_hdlr
Pending entry: ubiblock_do_work
Pending entry: ubiblock_do_work
Pending entry: ubiblock_do_work
Pending entry: ubiblock_do_work
Pending entry: ubiblock_do_work
Pending entry: pm_runtime_work
Pending entry: neigh_periodic_work
Pending entry: do_cache_clean
Pending entry: neigh_periodic_work
Pending entry: push_to_pool
Pending entry: push_to_pool
Pending entry: addrconf_verify_work
Pending entry: check_lifetime
Pending entry: release_one_tty
Pending entry: console_callback
pool 1
IDLE Workqueue worker: kworker/0:0H current_work: (None)

The log above shows that the normal-priority pool has many BUSY threads alongside two IDLE ones. In theory, as long as an IDLE thread exists, our restart work device_restart_work_hdlr should get picked up and executed by it. By the design of CMWQ (Concurrency Managed Workqueue), only one thread in a pool is allowed to run at any given moment. So we can guess that one thread in the pool really is running, and that it holds some lock on which the other threads block and go to sleep. We need the backtraces of these kworkers to see what they are doing. Only two examples are shown here: kworker/0:10 and kworker/0:14.

=====================================================
Process: kworker/0:10, cpu: 0 pid: 292 start: 0xddea3600
=====================================================
Task name: kworker/0:10 pid: 292 cpu: 0
state: 0x2 exit_state: 0x0 stack base: 0xdd9dc000
Stack:
[<c0828bd4>] __schedule+0x2c8
[<c038a4bc>] msm_nand_read_oob+0xec
[<c038b290>] msm_nand_read+0x22c
[<c03838b8>] part_read+0x44
[<c03818c0>] mtd_read+0x70
[<c03966b0>] ubi_io_read+0x1b0
[<c03944a0>] ubi_eba_read_leb+0x254
[<c0394674>] ubi_eba_read_leb_sg+0xcc
[<c03937c0>] ubi_leb_read_sg+0x68
[<c039e25c>] ubiblock_do_work+0x8c
[<c002fc00>] process_one_work+0x1e4
[<c00302f4>] worker_thread+0x2f0
[<c00339f8>] kthread+0xc4
[<c000dc40>] ret_from_fork+0x14

=====================================================
Process: kworker/0:14, cpu: 0 pid: 296 start: 0xdd076880
=====================================================
Task name: kworker/0:14 pid: 296 cpu: 0
state: 0x0 exit_state: 0x0 stack base: 0xde3d4000
Stack:
[<c0828bd4>] __schedule+0x2c8
[<c0828ee4>] preempt_schedule+0x3c
[<c055ab00>] sps_get_iovec+0x20c
[<c0388d40>] msm_nand_sps_get_iovec+0x1c
[<c038a898>] msm_nand_read_oob+0x4c8
[<c038b290>] msm_nand_read+0x22c
[<c03838b8>] part_read+0x44
[<c03818c0>] mtd_read+0x70
[<c03966b0>] ubi_io_read+0x1b0
[<c03944a0>] ubi_eba_read_leb+0x254
[<c0394674>] ubi_eba_read_leb_sg+0xcc
[<c03937c0>] ubi_leb_read_sg+0x68
[<c039e25c>] ubiblock_do_work+0x8c
[<c002fc00>] process_one_work+0x1e4
[<c00302f4>] worker_thread+0x2f0
[<c00339f8>] kthread+0xc4
[<c000dc40>] ret_from_fork+0x14

From the code below it is easy to see that msm_nand_read_oob() takes a mutex_lock. kworker/0:14 acquired this lock and kept going, while the other kworkers in pool 0 could not get the lock and called schedule().

        for (n = 0; n < (cmd_list->count - 1); n++) {
                iovec->addr = msm_virt_to_dma(chip,
                                &cmd_list->cw_desc[n].ce[0]);
                iovec->size = sizeof(struct sps_command_element) *
                                cmd_list->cw_desc[n].num_ce;
                iovec->flags = cmd_list->cw_desc[n].flags;
                iovec++;
        }
        mutex_lock(&info->lock);
        err = msm_nand_get_device(chip->dev);
        if (err)
                goto unlock_mutex;
        /* Submit data descriptors */
        for (n = rw_params.start_sector; n < cwperpage; n++) {
                err = msm_nand_submit_rw_data_desc(ops,
                                &rw_params, info, n, 0);
                if (err) {
                        pr_err("Failed to submit data descs %d\n", err);
                        panic("error in nand driver\n");
                        goto put_dev;
                }
        }

Going back to the backtrace of kworker/0:14: this kworker also ends up in __schedule, and at first glance it looks as if it never got scheduled back in. But is that really the case? CFS gives us the answer: the kworker has been sitting in the CFS pending (runnable) queue the whole time, which means it will get a chance to run.

======================= RUNQUEUE STATE ============================
CPU0 2 process is running
 |--curr: fwupdateDaemon(567)
 |--idle: swapper(0)
 |--stop: None(0)
CFS 2 process is pending
 |--curr: 0 process is grouping
 |   |--curr: None(0)
 |   |--next: None(0)
 |   |--last: None(0)
 |   |--skip: None(0)
 |--next: None(0)
 |--last: None(0)
 |--skip: None(0)
 |--pend: kworker/0:14(296)

That leaves the other functions called by kworker/0:14. sps_get_iovec() re-enables preemption when it does spin_unlock(), which in turn can call __schedule; nothing wrong there. In the end, msm_nand_sps_get_iovec() looked suspicious: it contains a while loop, and we guessed that its condition might stay true forever so the function never returns. We added a pr_err() inside the loop and ran the test again. Sure enough, when the problem reproduced, the while loop never exited. The SPS device is shared between the modem and Linux, so when the modem crashes, SPS may stop working properly and the Linux side can no longer read any data. Because this thread keeps running, the normal-priority pool never wakes another IDLE thread to handle our restart task.

static int msm_nand_sps_get_iovec(struct sps_pipe *pipe, uint32_t indx,
                unsigned int cnt, struct sps_iovec *iovec)
{
        int ret = 0;

        do {
                do {
                        ret = sps_get_iovec((pipe), (iovec));
                } while (((iovec)->addr == 0x0) && ((iovec)->size == 0x0));
                if (ret)
                        return ret;
        } while (--(cnt));
        return ret;
}
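The diagnostic mentioned above was nothing more than a print inside the loop. Below is a sketch of that instrumentation; the exact message text is illustrative, not the patch we actually applied.

static int msm_nand_sps_get_iovec(struct sps_pipe *pipe, uint32_t indx,
                unsigned int cnt, struct sps_iovec *iovec)
{
        int ret = 0;

        do {
                do {
                        ret = sps_get_iovec((pipe), (iovec));
                        /* Diagnostic print: if this floods the log, the
                         * inner loop is spinning without ever seeing a
                         * valid iovec from the SPS pipe. */
                        pr_err("%s: pipe %u still waiting for iovec\n",
                               __func__, indx);
                } while (((iovec)->addr == 0x0) && ((iovec)->size == 0x0));
                if (ret)
                        return ret;
        } while (--(cnt));
        return ret;
}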

Solutions

  1. Since the restart task is urgent by nature, the work can be queued on pool 1, the high-priority pool. Even if every thread in pool 0 is stuck, pool 1 can still wake an IDLE worker to handle the restart task (see the sketch after this list).
  2. Add a retry counter to the loop above and bail out after a bounded number of retries (also shown in the sketch below).
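Below is a sketch of both fixes under stated assumptions: fix 1 reuses the kernel's built-in system_highpri_wq to reach the high-priority pool, and fix 2 adds a bounded retry to msm_nand_sps_get_iovec(). The retry limit MAX_IOVEC_RETRIES and the error code returned are made up for illustration, not the exact patch that shipped.

/* Fix 1: queue the restart work on the high-priority pool (pool 1),
 * e.g. via the built-in system_highpri_wq, so a stuck normal-priority
 * pool cannot starve it. */
queue_work(system_highpri_wq, &device_restart_work);

/* Fix 2: bound the retries instead of spinning forever when the SPS
 * pipe stops producing data after a modem crash.
 * MAX_IOVEC_RETRIES is an illustrative limit. */
#define MAX_IOVEC_RETRIES 10000

static int msm_nand_sps_get_iovec(struct sps_pipe *pipe, uint32_t indx,
                unsigned int cnt, struct sps_iovec *iovec)
{
        int ret = 0;

        do {
                unsigned int retries = 0;

                do {
                        ret = sps_get_iovec((pipe), (iovec));
                        if (++retries > MAX_IOVEC_RETRIES) {
                                pr_err("%s: pipe %u gave no iovec after %u tries\n",
                                       __func__, indx, retries);
                                return -ETIMEDOUT;
                        }
                } while (((iovec)->addr == 0x0) && ((iovec)->size == 0x0));
                if (ret)
                        return ret;
        } while (--(cnt));
        return ret;
}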

Workqueue References

http://kernel.meizu.com/linux-workqueue.html
http://www.wowotech.net/irq_subsystem/cmwq-intro.html