lockup watchdogs & rcu stall detector

在嵌入式开发中,出现freeze的问题很让人头疼,就是那种不发生kernel panic,什么log不输出,串口终端死掉,无法输入输出。这时候,肯定是哪里出现了死锁或者死循环,长时间占用CPU资源,导致其他进程无法运行。
比如说,在底层的USB Host控制器驱动中,设置完某个控制寄存器,然后用while来检查状态寄存器的某一位是否满足条件。如果该控制器本身出现了问题,那么,将会一直卡在while中无法退出,此时就会出现freeze。
还有,我们有时候会看到watchdog/0进程的CPU占用率非常高,此时也有可能是哪个循环语句占用太长的时间导致的。
以下的内容参照:http://blog.csdn.net/luckyapple1028/article/details/51932414
这篇文章帮助很大,在此表示感谢。


lockup watchdogs detector简介

关于lockup watchdogs detector,可以参照:Documentation/lockup-watchdogs.txt文章。Linux Kernel中有个看门狗,用于检测死锁问题,包括softlockuphardlockup
softlockup的介绍如下:

A softlockup is defined as a bug that causes the kernel to loop in kernel mode for more than 20 seconds (see Implementation below fordetails), without giving other tasks a chance to run.
The current stack trace is displayed upon detection and, by default, the system will stay locked up. Alternatively, the kernel can be configured to panic;
a sysctl, kernel.softlockup_panic, a kernel parameter,softlockup_panic (see Documentation/kernel-parameters.txt for details), and a compile option, BOOTPARAM_HARDLOCKUP_PANIC, are provided for this.

hardlockup介绍如下:

A hardlockup is defined as a bug that causes the CPU to loop in kernel mode for more than 10 seconds (see Implementation below fordetails), without letting other interrupts have a chance to run.
Similarly to the softlockup case, the current stack trace is displayedupon detection and the system will stay locked up unless the default behavior is changed, which can be done through a compile time knob,
BOOTPARAM_HARDLOCKUP_PANIC, and a kernel parameter, nmi_watchdog (see Documentation/kernel-parameters.txt for details).

softlockup是进程上下文出现的死锁,hardlockup是中断上下文出现的死锁。

R状态死锁指的是某一任务一直处于TASK_RUNNING态且一直占用着CPU,从而导致其他进程得不到调度而饿死的情况。一般情况下,R状态死锁较可能是由于程序出现死循环导致的,可以出现在内核态的进程上下文中(内核配置为非抢占式,soft lockup),也可以出现在中断上下文中的中断处理程序中(hard lockup)。
异常的程序一直运行,CPU无法调度到其他的任务运行,对于单CPU的设备,则直接的表现就是“死机”。这种死锁现象较难定位,内核也同样提供了一种检测手段来检测这种死锁并向用户发出告警–LOCKUP_DETECTOR,它可支持监测进程上下文和中断上下文中的R状态死锁(SOFTLOCKUP_DETECTORHARDLOCKUP_DETECTOR),由于HARDLOCKUP_DETECTOR需要nmi中断的支持且目前的arm32环境并不支持,本文仅分析其中SOFTLOCKUP_DETECTOR中的原理及实现方式,并给出一个示例。


lockup watchdogs detectors实现机制

该机制是基于hrtimerperf子系统,介绍如下:

A periodic hrtimer runs to generate interrupts and kick the watchdog task. An NMI perf event is generated every watchdog_thresh(compile-time initialized to 10 and configurable through sysctl of the same name) seconds to check for hardlockups. If any CPU in the system does not receive any hrtimer interrupt during that time the hardlockup detector (the handler for the NMI perf event) will generate a kernel warning or call panic, depending on theconfiguration.

The watchdog task is a high priority kernel thread that updates a timestamp every time it is scheduled. If that timestamp is not updatedfor 2*watchdog_thresh seconds (the softlockup threshold) the softlockup detector (coded inside the hrtimer callback function)will dump useful debug information to the system log, after which it will call panic if it was instructed to do so or resume execution of other kernel code.

lockup detector机制在内核代码的kernel/watchdog.c中实现,本文以Linux 4.1.15版本源码为例进行分析。首先了解其背后的设计原理:

利用进程上下文、中断、nmi中断的不同优先级实现死锁监测。它们3者的优先级关系为“进程上下文 < 中断 < nmi中断”,其中进程上下文优先级最低,可通过中断来进行监测进程的运行状态,nmi中断的优先级最高,它是一种不可屏蔽的中断,在中断上下文中发生死锁时,nmi中断处理也可正常进入,因此可用来监测中断中的死锁。

关于这部分的流程分析,我也不是很懂。我只想知道,多久喂一次狗?多久会超时?超时之后会做怎样的动作?
更具体的流程,请参照:
http://blog.csdn.net/luckyapple1028/article/details/51932414


lockup watchdogs detectors常见参数

主要参数在<Kernel_Dir>/Kernel/watchdog.c文件中设定。

watchdog_thresh

设置看门狗超时时间,hard lockup的时间默认是10s,soft lockup的时间是20s,该值可通过/proc/sys/kernel/watchdog_thresh去配置。

sample_period

该值定义喂狗时间,默认是watchdog_thresh的五分一。

1
2
3
4
5
6
7
8
9
10
11
static void set_sample_period(void)
{
/*
* convert watchdog_thresh from seconds to ns
* the divide by 5 is to give hrtimer several chances (two
* or three with the current relation between the soft
* and hard thresholds) to increment before the
* hardlockup detector generates a warning
*/
sample_period = get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
}

watchdog_timer_fn()

看门狗超时执行的函数,在这个函数里,会打印堆栈信息,用来判断是谁lockup了。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# insmod watchdog_test.ko                               
[ 2977.578690] in watchdog_test_init, line = 16 starttttt
[ 3004.000899] BUG: soft lockup - CPU#0 stuck for 22s! [insmod:391]
[ 3004.006866] Modules linked in: watchdog_test(+) wlan(O) ehci_hcd usb_f_ss_lb usb_f_ncm u_ether libcompo)
[ 3004.025606]
[ 3004.027092] CPU: 0 PID: 391 Comm: insmod Tainted: P O 3.14.19 #1
[ 3004.033930] task: cd0ecc00 ti: ce1d8000 task.ti: ce1d8000
[ 3004.039304] PC is at watchdog_test_init+0x1c/0x20 [watchdog_test]
[ 3004.045364] LR is at watchdog_test_init+0x1c/0x20 [watchdog_test]
[ 3004.051428] pc : [<bf44b01c>] lr : [<bf44b01c>] psr: 600f0013
[ 3004.051428] sp : ce1d9e20 ip : 00000000 fp : bf44b0dc
[ 3004.062845] r10: bf44b000 r9 : 00000000 r8 : ce1d8028
[ 3004.068040] r7 : 00000001 r6 : bf44b094 r5 : bf44b0a0 r4 : ce1d9f58
[ 3004.074529] r3 : 00000752 r2 : 1374d000 r1 : d3d6e340 r0 : 0000002a
[ 3004.081020] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
[ 3004.088110] Control: 10c53c7d Table: b34e404a DAC: 00000015
[ 3004.093823] CPU: 0 PID: 391 Comm: insmod Tainted: P O 3.14.19 #1
[ 3004.100689] [<c0015f68>] (unwind_backtrace) from [<c0012288>] (show_stack+0x10/0x14)
[ 3004.108406] [<c0012288>] (show_stack) from [<c047dc20>] (dump_stack+0x80/0x90)
[ 3004.115600] [<c047dc20>] (dump_stack) from [<c0082e08>] (watchdog_timer_fn+0x144/0x17c)
[ 3004.123573] [<c0082e08>] (watchdog_timer_fn) from [<c003c3dc>] (__run_hrtimer.isra.32+0x54/0xe4)
[ 3004.132321] [<c003c3dc>] (__run_hrtimer.isra.32) from [<c003cca0>] (hrtimer_interrupt+0x114/0x2c8)
[ 3004.141236] [<c003cca0>] (hrtimer_interrupt) from [<c0014b90>] (twd_handler+0x34/0x44)
[ 3004.149119] [<c0014b90>] (twd_handler) from [<c005f650>] (handle_percpu_devid_irq+0x6c/0x84)
[ 3004.157514] [<c005f650>] (handle_percpu_devid_irq) from [<c005bb80>] (generic_handle_irq+0x2c/0x3c)
[ 3004.166517] [<c005bb80>] (generic_handle_irq) from [<c000fbf4>] (handle_IRQ+0x74/0xe8)
[ 3004.174396] [<c000fbf4>] (handle_IRQ) from [<c0008554>] (gic_handle_irq+0x2c/0x5c)
[ 3004.181930] [<c0008554>] (gic_handle_irq) from [<c0012dc0>] (__irq_svc+0x40/0x50)
[ 3004.189371] Exception stack(0xce1d9dd8 to 0xce1d9e20)
[ 3004.194392] 9dc0: 0000002a d3d6e340
[ 3004.202526] 9de0: 1374d000 00000752 ce1d9f58 bf44b0a0 bf44b094 00000001 ce1d8028 00000000
[ 3004.210664] 9e00: bf44b000 bf44b0dc 00000000 ce1d9e20 bf44b01c bf44b01c 600f0013 ffffffff
[ 3004.218801] [<c0012dc0>] (__irq_svc) from [<bf44b01c>] (watchdog_test_init+0x1c/0x20 [watchdog_test])
[ 3004.227984] [<bf44b01c>] (watchdog_test_init [watchdog_test]) from [<c0008908>] (do_one_initcall+0xd4/0x17c)
[ 3004.237766] [<c0008908>] (do_one_initcall) from [<c00765ec>] (load_module+0x1bec/0x2140)
[ 3004.245815] [<c00765ec>] (load_module) from [<c0076be8>] (SyS_init_module+0xa8/0x110)
[ 3004.253608] [<c0076be8>] (SyS_init_module) from [<c000eca0>] (ret_fast_syscall+0x0/0x30)

注意,在做上述测试softlock watchdog的时候,要确保rcu stall detector机制是关闭的或者设置rcu stall detector的超时更长,否则会使rcu stall detector先检测出来死锁。
设置时间要大于softlock watchdog_thresh的值,方法如下:

1
# echo 30 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout

或者将rcu_schedstall warning输出关闭,方法如下:

1
# echo 1 > /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress

这里的堆栈信息,可以使用反汇编技术,确定是在哪一条语句出现问题的。
我这里使用的测试程序如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
#include <linux/module.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/device.h>
#include <asm/uaccess.h>
#include <linux/delay.h>

static int watchdog_test_init(void)
{
int count = 3;
printk("in %s, line = %d starttttt\n", __func__, __LINE__);
while(1);
//mdelay(3000);

printk("in %s, line = %d endddddddddd\n", __func__, __LINE__);
return 0;
}

static void watchdog_test_exit(void)
{

}

module_init(watchdog_test_init);
module_exit(watchdog_test_exit);

MODULE_AUTHOR("XXX Project");
MODULE_DESCRIPTION("watchdog test driver");
MODULE_LICENSE("GPL");

如果有打开softlockup_panic参数,那么还会调用panic("softlockup: hung tasks");以此输出panic信息供调试。

softlockup_panic

该参数用来选择当看门狗超时的时候,是否触发panic。有以下两种方式来设置该值。非0,表示输出panic信息。该值默认为0,

1
2
unsigned int __read_mostly softlockup_panic =
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE;

第1种方法通过在defconfig中设置CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC=y即可,因为BOOTPARAM_SOFTLOCKUP_PANIC_VALUE的值依赖于CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE.

1
2
3
4
5
6
config BOOTPARAM_SOFTLOCKUP_PANIC_VALUE
int
depends on LOCKUP_DETECTOR
range 0 1
default 0 if !BOOTPARAM_SOFTLOCKUP_PANIC
default 1 if BOOTPARAM_SOFTLOCKUP_PANIC

第2种方法是在启动命令中加入softlockup_panic=的参数进行选择。

1
2
3
4
5
6
7
static int __init softlockup_panic_setup(char *str)
{
softlockup_panic = simple_strtoul(str, NULL, 0);

return 1;
}
__setup("softlockup_panic=", softlockup_panic_setup);

watchdog_user_enabled

是否打开lockup watchdogs功能,默认为1,可通过/proc/sys/kernel/watchdog去设置,对应的接口函数为:proc_dowatchdog()


watchdog/0 进程CPU占用率高调试方法

有时候会发现watchdog/0(这里的0表示CPU0,下面以CPU0为例)或者其他CPU核上的watchdog进程的CPU占用率很高,有可能是内核中某个驱动长时间占用CPU,但是没有被锁住触发到lockup watchdog detector,所以就表现为watchdog/0的CPU占用率非常高。
比如说在上面的watchdog_test.c的驱动程序中,将while(1)语句改成mdelay(6000),延时6s,加载该驱动后,打印出来的watchdog/0CPU占用率非常高。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
xxx_project:/data # top -m 5 &

User 2%, System 7%, IOW 0%, IRQ 4%
User 8 + Nice 0 + Sys 21 + Idle 253 + IOW 0 + IRQ 12 + SIRQ 2 = 296

PID PR NI CPU% S #THR VSS RSS PCY UID Name
57 20 0 2% S 18 108060K 2584K fg 0 /bin/klogd
66 20 0 1% S 12 75152K 2400K fg 0 /system/bin/testmanager
77 20 0 1% S 9 49408K 2120K 0 /system/bin/ilog
440 20 0 1% R 2 12388K 1176K 0 top
67 20 0 1% S 27 306280K 4596K fg 0 /system/bin/updatemanager
xxx_project:/data # insmod watchdog_test.ko
[ 145.431829] in watchdog_test_init, line = 12 starttttt
[ 151.504804] in watchdog_test_init, line = 16 endddddddddd
xxx_project:/data #


User 0%, System 86%, IOW 0%, IRQ 1%
User 4 + Nice 0 + Sys 612 + Idle 81 + IOW 0 + IRQ 12 + SIRQ 1 = 710

PID PR NI CPU% S #THR VSS RSS PCY UID Name
10 -100 0 84% S 1 0K 0K 0 watchdog/0
440 20 0 0% R 2 12676K 1368K 0 top
57 20 0 0% S 18 108060K 2584K fg 0 /bin/klogd
7 20 0 0% S 1 0K 0K 0 rcu_sched
60 20 0 0% S 13 70220K 3712K fg 0 /system/bin/settingmanager



User 6%, System 14%, IOW 0%, IRQ 6%
User 12 + Nice 0 + Sys 25 + Idle 123 + IOW 0 + IRQ 11 + SIRQ 1 = 172

PID PR NI CPU% S #THR VSS RSS PCY UID Name
57 20 0 4% S 18 108060K 2584K fg 0 /bin/klogd
440 20 0 2% R 2 12676K 1368K 0 top
66 20 0 2% S 12 75152K 2400K fg 0 /system/bin/testmanager
67 20 0 2% S 27 306280K 4596K fg 0 /system/bin/updatemanager
77 20 0 1% S 9 49408K 2120K 0 /system/bin/ilog

上面只能看到watchdog的CPU占用率高的情况,没有看到更多的信息,不好确定驱动中的哪里有长时间占用CPU的情况。
如果要调试这种情况,我能想到将看门狗的触发时间watchdog_thresh设置短一点,刚好让其触发超时。比如说我这里将watchdog_thresh设置为2,也就是softlock的看门狗时间为4s,小于测试代码中的6s,看其是否能输出调试信息。
测试下来,这种方法是可行的。当然,仅限于死锁时间大于4s的情况,除非自己修改驱动代码。


rcu stall detector简介

在上面验证lockup watchdogs detector机制,加载watchdog_test.ko驱动之前,要先将rcu_cpu_stall_timeout的时间设置比2*watchdog_thresh长,否则加载watchdog_test.ko之后,出现的log是rcu cpu stall相关的log,log如下:

1
2
3
[  208.383733] in watchdog_test_init, line = 16 starttttt
[ 229.389303] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=5252 jiffies, g=51235, c=51234, q=13)
[ 229.399881] INFO: Stall ended before state dump start

关于RCU stall detector的详细内容,可以参照:<Kernel_Dir>/Documentation/RCU/stallwarn.txt文件,里面分别讲了什么情况下会有RCU CPU Stall Warnings,如何修改RCU CPU Stall Detector的参数,以及如何去解析该warnings。

下内容转自下面几篇文章,关于RCU的内容不是很熟悉,以后要专门学习一下。
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
http://blog.csdn.net/wdsfup/article/details/76087670
https://www.ibm.com/developerworks/cn/linux/l-rcu/
https://lwn.net/Articles/301910/

什么情况下触发rcu cpu stall warning

RCU是基于其原理命名的,Read-Copy Update

  • [Read]指的是对于被RCU保护的共享数据,reader可以直接访问,不需要获得任何锁;
  • [Copy Update]指的是writer修改数据前首先拷贝一个副本,然后在副本上进行修改,修改完毕后向reclaimer(垃圾回收器)注册一个回调函数(callback),在适当的时机完成真正的修改操作:把原数据的指针重新指向新的被修改的数据,这里所说的适当的时机就是当既有的reader全都退出临界区的时候,而等待恰当时机的过程被称为grace period
  • 在RCU机制中,writer不需要和reader竞争任何锁,只在有多个writer的情况下它们之间需要某种锁进行同步作,如果写操作频繁的话RCU的性能会严重下降,所以RCU只适用于读多写少的情况。RCU CPU Stall Detector它有助于检测导致 grace period 过度延迟的因素,因为grace period的长短是RCU性能的重要因素。

rcu stall detector相关参数

在 sys 下,提供了两个接口来配置RCU的参数。

  • /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout 来配置timerout,默认值为21。
  • /sys/module/rcupdate/parameters/rcu_cpu_stall_suppress 来配置是否输出warning,默认值为0,表示输出。

具体的参数详见:<Kernel_Dir>/lib/Kconfig.debug文件

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
config RCU_CPU_STALL_TIMEOUT
int "RCU CPU stall timeout in seconds"
depends on RCU_STALL_COMMON
range 3 300
default 60
help
If a given RCU grace period extends more than the specified
number of seconds, a CPU stall warning is printed. If the
RCU grace period persists, additional CPU stall warnings are
printed at more widely spaced intervals.

config RCU_CPU_STALL_VERBOSE
bool "Print additional per-task information for RCU_CPU_STALL_DETECTOR"
depends on TREE_PREEMPT_RCU
default y
help
This option causes RCU to printk detailed per-task information
for any tasks that are stalling the current RCU grace period.

Say N if you are unsure.

Say Y if you want to enable such checks.

config RCU_CPU_STALL_INFO
bool "Print additional diagnostics on RCU CPU stall"
depends on (TREE_RCU || TREE_PREEMPT_RCU) && DEBUG_KERNEL
default n
help
For each stalled CPU that is aware of the current RCU grace
period, print out additional per-CPU diagnostic information
regarding scheduling-clock ticks, idle state, and,
for RCU_FAST_NO_HZ kernels, idle-entry state.

Say N if you are unsure.

Say Y if you want to enable such diagnostics.

解析rcu cpu stall warning

常见的一个warning如下:

1
2
[  229.389303] INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 0, t=5252 jiffies, g=51235, c=51234, q=13)
[ 229.399881] INFO: Stall ended before state dump start

这里的打印在<Kernel_Dir>/kernel/rcu/tree.c中的check_cpu_stall() -> print_other_cpu_stall()函数中打印:

The detected by line indicates which CPU detected the stall (in thiscase, CPU 0), how many jiffies have elapsed since the start of thegrace period (in this case 5252), the number of the last grace periodto start and to complete (51235 and 51234, respectively), and an estimateof the total number of RCU callbacks queued across all CPUs (13 inthis case).


jiffies与HZ和tick的关系

系统运行时间(以秒为单位):system_time = (jiffies)/HZ

  • HZ:Linux核心每隔固定周期会发出timer interrupt (IRQ 0)HZ是用来定义每一秒有几次timer interrupts;使用CONFIG_HZ定义;
  • tick:是HZ的倒数,意即timer interrupt每发生一次中断的时间。如HZ为250时,tick为4毫秒(millisecond)。
  • jiffies:为Linux核心变数(32位元变数,unsigned long),它被用来纪录系统自开机以来,已经过多少的tick。每发生一次timer interrupt,Jiffies变数会被加一。

以上内容参照:https://my.oschina.net/armsky/blog/36983

Title:lockup watchdogs & rcu stall detector

Author:Victor Huang

Time:2019-03-17 / 16:03

Link:http://wowothink.com/ca07e9c0/

License: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)