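The basic principles of KASAN have been covered in the earlier articles. This one walks through a common kind of bug that KASAN locates, as a piece of hands-on practice.
The log for this problem is as follows: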
[ 9.523897] ==================================================================
[ 9.523913] BUG: KASAN: global-out-of-bounds in of_match_node+0x120/0x13c
[ 9.523922] Read of size 1 at addr ffffffd00bd92908 by task swapper/0/1
[ 9.523938] CPU: 6 PID: 1 Comm: swapper/0 Not tainted 5.10.198 #91
[ 9.523946] Hardware name: Firefly ROC-RK3588S-PC V13 MIPI(Linux) (DT)
[ 9.523953] Call trace:
[ 9.523964]  dump_backtrace+0x0/0x3bc
[ 9.523973]  show_stack+0x1c/0x24
[ 9.523983]  dump_stack_lvl+0x130/0x168
[ 9.523994]  print_address_description.constprop.0+0x74/0x2b8
[ 9.524003]  kasan_report+0x1e8/0x200
[ 9.524012]  __asan_report_load1_noabort+0x30/0x54
[ 9.524020]  of_match_node+0x120/0x13c
[ 9.524029]  of_match_device+0x44/0x80
[ 9.524038]  platform_match+0xa0/0x23c
[ 9.524047]  __driver_attach+0x68/0x25c
[ 9.524056]  bus_for_each_dev+0x10c/0x1a0
[ 9.524065]  driver_attach+0x40/0x60
[ 9.524073]  bus_add_driver+0x2d4/0x540
[ 9.524081]  driver_register+0x1a0/0x3d0
[ 9.524089]  __platform_driver_register+0xd0/0x110
[ 9.524099]  ds1820_init+0x20/0x28
[ 9.524107]  do_one_initcall+0xb0/0x4e0
[ 9.524118]  kernel_init_freeable+0x47c/0x4e4
[ 9.524126]  kernel_init+0x18/0x13c
[ 9.524134]  ret_from_fork+0x10/0x18
[ 9.524146] The buggy address belongs to the variable:
[ 9.524156]  of_ds1820_match+0xc8/0x260
[ 9.524167] Memory state around the buggy address:
[ 9.524176]  ffffffd00bd92800: 00 00 00 07 f9 f9 f9 f9 00 00 00 00 00 00 00 00
[ 9.524184]  ffffffd00bd92880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 9.524191] >ffffffd00bd92900: 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 03 f9 f9 f9
[ 9.524197]                       ^
[ 9.524204]  ffffffd00bd92980: f9 f9 f9 f9 00 00 00 00 04 f9 f9 f9 f9 f9 f9 f9
[ 9.524211]  ffffffd00bd92a00: 00 00 00 00 00 01 f9 f9 f9 f9 f9 f9 00 00 05 f9
[ 9.524217] ==================================================================
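As the log shows, KASAN detected an out-of-bounds (OOB) access on a global variable. The access is at address ffffffd00bd92908, and the shadow (poison) value for that address is 0xf9. Looking up the poison-value macros listed in 《KASAN(1)-简单实践》 (KASAN (1): Simple Practice) shows that 0xf9 is KASAN_GLOBAL_REDZONE. The variable involved is of_ds1820_match, and the error was triggered by a 1-byte read (load) at of_match_node+0x120/0x13c.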
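As a cross-check on how the "Memory state" dump is read: in generic KASAN each shadow byte describes 8 bytes of memory, so a row of 16 shadow bytes covers 0x80 bytes, and the faulting address selects one byte in the marked row. The following is only a minimal user-space sketch; the helper name `shadow_index_in_row` is made up for illustration and is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN: one shadow byte describes 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/*
 * Hypothetical helper: given a faulting address and the address printed
 * at the start of a dump row, return the index (0..15) of the shadow
 * byte in that row that describes the faulting address.
 */
static unsigned int shadow_index_in_row(uint64_t addr, uint64_t row_base)
{
    return (unsigned int)((addr - row_base) >> SHADOW_SCALE_SHIFT);
}
```

For the address in this log, the row starts at ffffffd00bd92900, so the selected byte is the one right after the leading 00 in the marked row, which is the first f9.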
Let's pick out the key pieces of information.
First, here is the current definition of the of_ds1820_match variable:
static const struct of_device_id of_ds1820_match[] = {
    { .compatible = "firefly,ds1820" },
};
At first glance, nothing looks wrong.
Next, let's look at the location in of_match_node where the error was reported:
(gdb): disassemble of_match_node
Dump of assembler code for function of_match_node:
   0xffffffd00ab4b21c <+0>:     stp   x29, x30, [sp, #-96]!
   0xffffffd00ab4b220 <+4>:     mov   x29, sp
   0xffffffd00ab4b224 <+8>:     stp   x23, x24, [sp, #48]
   0xffffffd00ab4b228 <+12>:    adrp  x24, 0xffffffd00e6a1000
   0xffffffd00ab4b22c <+16>:    add   x24, x24, #0xb00
   0xffffffd00ab4b230 <+20>:    stp   x19, x20, [sp, #16]
   0xffffffd00ab4b234 <+24>:    mov   x19, x0
   0xffffffd00ab4b238 <+28>:    mov   x0, x24
   0xffffffd00ab4b23c <+32>:    stp   x21, x22, [sp, #32]
   0xffffffd00ab4b240 <+36>:    mov   x22, x1
   0xffffffd00ab4b244 <+40>:    stp   x25, x26, [sp, #64]
   0xffffffd00ab4b248 <+44>:    stp   x27, x28, [sp, #80]
   0xffffffd00ab4b24c <+48>:    bl    0xffffffd00b7efa30 <_raw_spin_lock_irqsave>
   0xffffffd00ab4b250 <+52>:    mov   x25, x0
   0xffffffd00ab4b254 <+56>:    cbz   x19, 0xffffffd00ab4b324 <of_match_node+264>
   0xffffffd00ab4b258 <+60>:    mov   x27, #0xffd000000000          // #281268818280448
   0xffffffd00ab4b25c <+64>:    add   x21, x19, #0x40
   0xffffffd00ab4b260 <+68>:    add   x20, x19, #0x20
   0xffffffd00ab4b264 <+72>:    mov   w26, #0x0                     // #0
   0xffffffd00ab4b268 <+76>:    mov   x28, #0x0                     // #0
   0xffffffd00ab4b26c <+80>:    movk  x27, #0xdfff, lsl #48
   0xffffffd00ab4b270 <+84>:    mov   w23, #0x0                     // #0
   0xffffffd00ab4b274 <+88>:    b     0xffffffd00ab4b2a4 <of_match_node+136>
   0xffffffd00ab4b278 <+92>:    mov   x3, x19
   0xffffffd00ab4b27c <+96>:    mov   x2, x20
   0xffffffd00ab4b280 <+100>:   mov   x1, x21
   0xffffffd00ab4b284 <+104>:   mov   x0, x22
   0xffffffd00ab4b288 <+108>:   bl    0xffffffd00ab4aa50 <__of_device_is_compatible>
   0xffffffd00ab4b28c <+112>:   cmp   w0, w26
   0xffffffd00ab4b290 <+116>:   csel  x28, x28, x19, le
   0xffffffd00ab4b294 <+120>:   add   x20, x20, #0xc8
   0xffffffd00ab4b298 <+124>:   add   x21, x21, #0xc8
   0xffffffd00ab4b29c <+128>:   csel  w26, w26, w0, le
   0xffffffd00ab4b2a0 <+132>:   add   x19, x19, #0xc8
   0xffffffd00ab4b2a4 <+136>:   lsr   x0, x19, #3
   0xffffffd00ab4b2a8 <+140>:   ldrsb w2, [x0, x27]
   0xffffffd00ab4b2ac <+144>:   cmp   w2, #0x0
   0xffffffd00ab4b2b0 <+148>:   ccmp  w23, w2, #0x1, ne  // ne = any
   0xffffffd00ab4b2b4 <+152>:   b.ge  0xffffffd00ab4b32c <of_match_node+272>  // b.tcont
   0xffffffd00ab4b2b8 <+156>:   ldrb  w0, [x19]
   0xffffffd00ab4b2bc <+160>:   lsr   x1, x20, #3
   0xffffffd00ab4b2c0 <+164>:   cbnz  w0, 0xffffffd00ab4b278 <of_match_node+92>
   0xffffffd00ab4b2c4 <+168>:   ldrsb w1, [x1, x27]
   0xffffffd00ab4b2c8 <+172>:   cmp   w1, #0x0
   0xffffffd00ab4b2cc <+176>:   ccmp  w0, w1, #0x1, ne  // ne = any
   0xffffffd00ab4b2d0 <+180>:   b.ge  0xffffffd00ab4b338 <of_match_node+284>  // b.tcont
   0xffffffd00ab4b2d4 <+184>:   ldrb  w0, [x19, #32]
   0xffffffd00ab4b2d8 <+188>:   lsr   x1, x21, #3
   0xffffffd00ab4b2dc <+192>:   cbnz  w0, 0xffffffd00ab4b278 <of_match_node+92>
   0xffffffd00ab4b2e0 <+196>:   ldrsb w1, [x1, x27]
   0xffffffd00ab4b2e4 <+200>:   cmp   w1, #0x0
   0xffffffd00ab4b2e8 <+204>:   ccmp  w0, w1, #0x1, ne  // ne = any
   0xffffffd00ab4b2ec <+208>:   b.ge  0xffffffd00ab4b344 <of_match_node+296>  // b.tcont
   0xffffffd00ab4b2f0 <+212>:   ldrb  w0, [x19, #64]
   0xffffffd00ab4b2f4 <+216>:   cbnz  w0, 0xffffffd00ab4b278 <of_match_node+92>
   0xffffffd00ab4b2f8 <+220>:   mov   x1, x25
   0xffffffd00ab4b2fc <+224>:   mov   x0, x24
   0xffffffd00ab4b300 <+228>:   bl    0xffffffd00b7efa90 <_raw_spin_unlock_irqrestore>
   0xffffffd00ab4b304 <+232>:   mov   x0, x28
   0xffffffd00ab4b308 <+236>:   ldp   x19, x20, [sp, #16]
   0xffffffd00ab4b30c <+240>:   ldp   x21, x22, [sp, #32]
   0xffffffd00ab4b310 <+244>:   ldp   x23, x24, [sp, #48]
   0xffffffd00ab4b314 <+248>:   ldp   x25, x26, [sp, #64]
   0xffffffd00ab4b318 <+252>:   ldp   x27, x28, [sp, #80]
   0xffffffd00ab4b31c <+256>:   ldp   x29, x30, [sp], #96
   0xffffffd00ab4b320 <+260>:   ret
   0xffffffd00ab4b324 <+264>:   mov   x28, #0x0                     // #0
   0xffffffd00ab4b328 <+268>:   b     0xffffffd00ab4b2f8 <of_match_node+220>
   0xffffffd00ab4b32c <+272>:   mov   x0, x19
   0xffffffd00ab4b330 <+276>:   bl    0xffffffd0086579c0 <__asan_report_load1_noabort>
   0xffffffd00ab4b334 <+280>:   b     0xffffffd00ab4b2b8 <of_match_node+156>
   0xffffffd00ab4b338 <+284>:   mov   x0, x20
   0xffffffd00ab4b33c <+288>:   bl    0xffffffd0086579c0 <__asan_report_load1_noabort>
   0xffffffd00ab4b340 <+292>:   b     0xffffffd00ab4b2d4 <of_match_node+184>
   0xffffffd00ab4b344 <+296>:   mov   x0, x21
   0xffffffd00ab4b348 <+300>:   bl    0xffffffd0086579c0 <__asan_report_load1_noabort>
   0xffffffd00ab4b34c <+304>:   b     0xffffffd00ab4b2f0 <of_match_node+212>
0x120 is 288 in decimal, and offset +288 is a bl to __asan_report_load1_noabort. So what sends execution there? The following instructions:
   0xffffffd00ab4b2c8 <+172>:   cmp   w1, #0x0
   0xffffffd00ab4b2cc <+176>:   ccmp  w0, w1, #0x1, ne  // ne = any
   0xffffffd00ab4b2d0 <+180>:   b.ge  0xffffffd00ab4b338 <of_match_node+284>  // b.tcont
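Here w1, the shadow byte just loaded with ldrsb, is compared with 0. If it is non-zero, ccmp goes on to compare w0 with w1 and sets the flags from that comparison. If it is zero, ccmp instead loads the immediate #0x1 into NZCV (only the V flag set), which makes the following b.ge fall through. Since w0 is 0 at this point (the cbnz just before did not branch), b.ge jumps to the report path only when the shadow byte is non-zero and 0 >= shadow as a signed comparison, which is the case for a redzone value such as 0xf9 (-7 when sign-extended).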
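The instrumented sequence can be written out in C to make the check easier to follow. This is a hedged user-space sketch, not kernel code: `shadow_addr` and `load1_is_bad` are illustrative names, the offset constant is simply the value the mov/movk pair builds in x27 in the listing above, and the check is the usual generic-ASan rule for a 1-byte access (in the listing the compiler appears to have already folded `addr & 7` to 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value assembled in x27: mov #0xffd000000000 ; movk #0xdfff, lsl #48. */
#define SHADOW_OFFSET 0xdfffffd000000000ULL

/* lsr xN, addr, #3 ; ldrsb wM, [xN, x27]  ->  address of the shadow byte. */
static uint64_t shadow_addr(uint64_t addr)
{
    return (addr >> 3) + SHADOW_OFFSET; /* unsigned add wraps modulo 2^64 */
}

/*
 * cmp / ccmp / b.ge for a 1-byte load:
 *   shadow == 0      -> all 8 bytes are accessible, no report
 *   shadow in 1..7   -> only the first 'shadow' bytes are accessible
 *   shadow negative  -> poisoned (0xf9 read as signed is -7), report
 */
static bool load1_is_bad(uint64_t addr, int8_t shadow)
{
    if (shadow == 0)
        return false;
    return (int8_t)(addr & 7) >= shadow;
}
```

With a shadow value of -7 (0xf9) the function reports the access regardless of the low address bits, which matches the branch to __asan_report_load1_noabort.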
Explaining this properly needs some context, so let's line the assembly up against the source:
(gdb): disassemble /m of_match_node
1118        for (; matches->name[0] || matches->type[0] || matches->compatible[0]; matches++) {
   0xffffffd00ab4b294 <+120>:   add   x20, x20, #0xc8
   0xffffffd00ab4b298 <+124>:   add   x21, x21, #0xc8
   0xffffffd00ab4b29c <+128>:   csel  w26, w26, w0, le
   0xffffffd00ab4b2a0 <+132>:   add   x19, x19, #0xc8
   0xffffffd00ab4b2a4 <+136>:   lsr   x0, x19, #3
   0xffffffd00ab4b2a8 <+140>:   ldrsb w2, [x0, x27]
   0xffffffd00ab4b2ac <+144>:   cmp   w2, #0x0
   0xffffffd00ab4b2b0 <+148>:   ccmp  w23, w2, #0x1, ne  // ne = any
   0xffffffd00ab4b2b4 <+152>:   b.ge  0xffffffd00ab4b32c <of_match_node+272>  // b.tcont
   0xffffffd00ab4b2b8 <+156>:   ldrb  w0, [x19]
   0xffffffd00ab4b2bc <+160>:   lsr   x1, x20, #3
   0xffffffd00ab4b2c0 <+164>:   cbnz  w0, 0xffffffd00ab4b278 <of_match_node+92>
   0xffffffd00ab4b2c4 <+168>:   ldrsb w1, [x1, x27]
   0xffffffd00ab4b2c8 <+172>:   cmp   w1, #0x0
   0xffffffd00ab4b2cc <+176>:   ccmp  w0, w1, #0x1, ne  // ne = any
   0xffffffd00ab4b2d0 <+180>:   b.ge  0xffffffd00ab4b338 <of_match_node+284>  // b.tcont
   0xffffffd00ab4b2d4 <+184>:   ldrb  w0, [x19, #32]
   0xffffffd00ab4b2d8 <+188>:   lsr   x1, x21, #3
   0xffffffd00ab4b2dc <+192>:   cbnz  w0, 0xffffffd00ab4b278 <of_match_node+92>
   0xffffffd00ab4b2e0 <+196>:   ldrsb w1, [x1, x27]
   0xffffffd00ab4b2e4 <+200>:   cmp   w1, #0x0
   0xffffffd00ab4b2e8 <+204>:   ccmp  w0, w1, #0x1, ne  // ne = any
   0xffffffd00ab4b2ec <+208>:   b.ge  0xffffffd00ab4b344 <of_match_node+296>  // b.tcont
   0xffffffd00ab4b2f0 <+212>:   ldrb  w0, [x19, #64]
   0xffffffd00ab4b2f4 <+216>:   cbnz  w0, 0xffffffd00ab4b278 <of_match_node+92>
So the instructions in question come from this line of C source:
for (; matches->name[0] || matches->type[0] || matches->compatible[0]; matches++) {
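The constants in the disassembly line up with the layout of struct of_device_id, which is a useful sanity check before reading further. The sketch below is a user-space copy of the structure, assumed to match include/linux/mod_devicetable.h in 5.10 (three character arrays of 32, 32, and 128 bytes followed by a pointer); the size assertion holds only on a 64-bit build.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed copy of the kernel definition; not pulled from kernel headers. */
struct of_device_id {
    char name[32];
    char type[32];
    char compatible[128];
    const void *data;
};

/* Field offsets seen as ldrb w0, [x19] / [x19, #32] / [x19, #64]. */
static_assert(offsetof(struct of_device_id, name) == 0, "name offset");
static_assert(offsetof(struct of_device_id, type) == 32, "type offset");
static_assert(offsetof(struct of_device_id, compatible) == 64, "compatible offset");

/* matches++ shows up as add x19, x19, #0xc8 (200 bytes, 64-bit pointers). */
static_assert(sizeof(struct of_device_id) == 0xc8, "element stride");
```

The same 0xc8 appears in the report as of_ds1820_match+0xc8, so the faulting read is exactly one element past the start of the array.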
Now let's put the two key pieces of information side by side and compare them again:
static const struct of_device_id of_ds1820_match[] = {
    { .compatible = "firefly,ds1820" },
};
for (; matches->name[0] || matches->type[0] || matches->compatible[0]; matches++) {
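The for loop walks the entries of of_ds1820_match, starting at entry 0. If the first byte of any of name, type, or compatible is non-zero, the loop body runs, and then matches is incremented.
For entry 0, compatible holds "firefly,ds1820", so matches->compatible[0] is 'f' and the loop body runs. After matches++, the condition is evaluated again on the next entry, of_ds1820_match[1], reading its name[0], type[0], and compatible[0]. But of_ds1820_match has no entry at index 1, so the read goes past the end of the global variable.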
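So an of_device_id array needs a sentinel entry. When the for loop reaches the sentinel, the condition is false and the loop exits on its own instead of reading out of bounds.
The code is therefore changed to: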
static const struct of_device_id of_ds1820_match[] = {
    { .compatible = "firefly,ds1820" },
    {},
};
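To see the sentinel doing its job outside the kernel, the same loop condition can be run over the fixed table in user space. This is only an illustrative sketch: the struct is an assumed copy of the kernel definition, and `count_entries` is a made-up helper that reuses the loop condition from of_match_node without any of the matching logic. The empty `{}` initializer follows the kernel idiom and relies on GCC/Clang accepting it.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed copy of the kernel definition; not pulled from kernel headers. */
struct of_device_id {
    char name[32];
    char type[32];
    char compatible[128];
    const void *data;
};

static const struct of_device_id of_ds1820_match[] = {
    { .compatible = "firefly,ds1820" },
    {}, /* sentinel: all fields zero, so the loop condition is false */
};

/* Hypothetical helper: count entries until the all-zero sentinel. */
static size_t count_entries(const struct of_device_id *matches)
{
    size_t n = 0;

    for (; matches->name[0] || matches->type[0] || matches->compatible[0]; matches++)
        n++;
    return n;
}
```

Calling count_entries(of_ds1820_match) stops at the second element, and every byte it reads is inside the array. Without the sentinel, the same loop would read whatever follows the array in memory, which is the access KASAN flagged.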
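This article used a real example to show how KASAN locates an out-of-bounds access on a global variable. The problem looks simple and is easy to overlook.
Whether an of_device_id array in a driver needs a sentinel entry is such a small detail that bugs like this can linger in the kernel for a long time without being noticed.
As for KASAN itself, I have read most of its implementation in the kernel. It is a solid tool for hardening code, and it helps improve the quality of the kernel code we write.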