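Memory errors in the kernel are usually hard to track down. The common kinds of memory errors are as follows:

The kernel generally offers three ways to detect memory errors: slub_debug, kmemleak, and KASAN.

The following sections discuss the five kinds of memory errors above in terms of these three methods.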
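As we know, the kernel handles small memory allocations through the slab/slub allocator. With SLUB, ``slub_debug`` can be used to detect the following errors:

First, enable the SLUB configuration options as follows:

Second, add the ``slub_debug`` string to the boot arguments (bootargs), as follows: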
Parameters may be given to ``slub_debug``. If none is specified then full
debugging is enabled. Format:

slub_debug=<Debug-Options>
    Enable options for all slabs

slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
    Enable options only for select slabs (no spaces
    after a comma)

Multiple blocks of options for all slabs or selected slabs can be given, with
blocks of options delimited by ';'. The last of "all slabs" blocks is applied
to all slabs except those that match one of the "select slabs" block. Options
of the first "select slabs" blocks that matches the slab's name are applied.

Possible debug options are::

    F       Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
            Sorry SLAB legacy issues)
    Z       Red zoning
    P       Poisoning (object and padding)
    U       User tracking (free and alloc)
    T       Trace (please only use on single slabs)
    A       Enable failslab filter mark for the cache
    O       Switch debugging off for caches that would have
            caused higher minimum slab orders
    -       Switch all debugging off (useful if the kernel is
            configured with CONFIG_SLUB_DEBUG_ON)

F.e. in order to boot just with sanity checks and red zoning one would specify::

    slub_debug=FZ

Trying to find an issue in the dentry cache? Try::

    slub_debug=,dentry

to only enable debugging on the dentry cache. You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix. For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::

    slub_debug=P,kmalloc-*,dentry

Red zoning and tracking may realign the slab. We can just apply sanity checks
to the dentry cache with::

    slub_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes). This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory. To
switch off debugging for such caches by default, use::

    slub_debug=O

You can apply different options to different list of slab names, using blocks
of options. This will enable red zoning for dentry and user tracking for
kmalloc. All other slabs will not get any debugging enabled::

    slub_debug=Z,dentry;U,kmalloc-*

You can also enable options (e.g. sanity checks and poisoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

    slub_debug=FZ;-,zs_handle,zspage

The state of each debug option for a slab can be found in the respective files
under::

    /sys/kernel/slab/<slab name>/

If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slub_debug`` parameter translate to the following files::

    F       sanity_checks
    Z       red_zone
    P       poison
    U       store_user
    T       trace
    A       failslab

Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.
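The option-to-file table above can be written as a small lookup. The following is a minimal userspace sketch; the function names ``slub_option_file`` and ``slub_option_path`` are hypothetical helpers invented for illustration and are not part of the kernel or of ``slabinfo``.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Map one slub_debug option letter to its file name under
 * /sys/kernel/slab/<slab name>/, following the table in the text.
 * Returns NULL for letters that have no per-slab file ('O', '-', ...). */
const char *slub_option_file(char opt)
{
    switch (opt) {
    case 'F': return "sanity_checks";
    case 'Z': return "red_zone";
    case 'P': return "poison";
    case 'U': return "store_user";
    case 'T': return "trace";
    case 'A': return "failslab";
    default:  return NULL;
    }
}

/* Build "/sys/kernel/slab/<slab>/<file>" into buf.
 * Returns 0 on success, -1 for an unknown option, an empty slab
 * name, or a buffer that is too small. */
int slub_option_path(const char *slab, char opt, char *buf, size_t len)
{
    const char *file = slub_option_file(opt);
    int n;

    assert(slab != NULL && buf != NULL);
    if (file == NULL || strlen(slab) == 0)
        return -1;
    n = snprintf(buf, len, "/sys/kernel/slab/%s/%s", slab, file);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling ``slub_option_path("dentry", 'Z', buf, sizeof(buf))`` fills ``buf`` with the path of the file that reports whether red zoning is enabled for the dentry cache.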
Next, build the ``slabinfo`` program and copy it to the target::

    # cd tools/vm/
    # scp slabinfo xxx@xxx:destination/
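With this in place, an out-of-bounds access is reported as ``Redzone overwritten``, for example::

    BUG kmalloc-32 (Tainted: G O ): Redzone overwritten

A double free is reported as ``Object already free``, for example::

    BUG kmalloc-128 (Tainted: G B O ): Object already free

Accessing memory that has already been freed, in practice writing to it, is reported as ``Poison overwritten``, for example::

    BUG kmalloc-128 (Tainted: G B O ): Poison overwritten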
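The three reports above come from byte patterns written next to and into each object. The following userspace sketch only simulates that idea and is not the kernel's implementation. The pattern values match ``include/linux/poison.h``; the object layout, the red zone width, and all ``sim_*`` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE     32    /* payload size, as in kmalloc-32 */
#define REDZONE_SIZE 8     /* assumed red zone width for this sketch */
#define RED_ACTIVE   0xcc  /* SLUB_RED_ACTIVE */
#define POISON_FREE  0x6b  /* POISON_FREE: fills a freed object */
#define POISON_END   0xa5  /* POISON_END: last byte of a freed object */

enum sim_result {
    CHECK_OK,
    REDZONE_OVERWRITTEN,
    OBJECT_ALREADY_FREE,
    POISON_OVERWRITTEN
};

/* Hypothetical object: payload followed directly by its red zone. */
struct sim_object {
    unsigned char mem[OBJ_SIZE + REDZONE_SIZE];
    int is_free;
};

/* "Allocate": clear the payload and arm the red zone. */
void sim_alloc(struct sim_object *o)
{
    assert(o != NULL);
    memset(o->mem, 0, OBJ_SIZE);
    memset(o->mem + OBJ_SIZE, RED_ACTIVE, REDZONE_SIZE);
    o->is_free = 0;
}

/* "Free": detect double free and red zone damage, then poison. */
enum sim_result sim_free(struct sim_object *o)
{
    size_t i;

    assert(o != NULL);
    if (o->is_free)
        return OBJECT_ALREADY_FREE;
    for (i = 0; i < REDZONE_SIZE; i++)
        if (o->mem[OBJ_SIZE + i] != RED_ACTIVE)
            return REDZONE_OVERWRITTEN;
    memset(o->mem, POISON_FREE, OBJ_SIZE - 1);
    o->mem[OBJ_SIZE - 1] = POISON_END;
    o->is_free = 1;
    return CHECK_OK;
}

/* Verify that a freed object still carries its poison pattern. */
enum sim_result sim_check_free(const struct sim_object *o)
{
    size_t i;

    assert(o != NULL);
    if (!o->is_free)
        return CHECK_OK;
    for (i = 0; i < OBJ_SIZE - 1; i++)
        if (o->mem[i] != POISON_FREE)
            return POISON_OVERWRITTEN;
    return o->mem[OBJ_SIZE - 1] == POISON_END ? CHECK_OK
                                               : POISON_OVERWRITTEN;
}
```

Writing ``o.mem[OBJ_SIZE]`` before ``sim_free()`` models a one-byte overflow, and writing into ``o.mem`` after ``sim_free()`` models a use after free. A read after free leaves the pattern intact, so a check of this kind only catches writes.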
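kmemleak starts a dedicated kernel thread that scans memory and prints the number of newly found unreferenced objects. Because it only reports objects that appear to be unreferenced, kmemleak can produce false positives, so its output should be treated as a reference only.

kmemleak requires the following configuration options::

    CONFIG_HAVE_DEBUG_KMEMLEAK=y
    CONFIG_DEBUG_KMEMLEAK=y
    CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
    CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=4096

Then add the following parameter to bootargs::

    kmemleak=on

After booting, trigger a scan manually before reproducing the problem::

    echo scan > /sys/kernel/debug/kmemleak

Once the problem has occurred, read the same node to see the result::

    cat /sys/kernel/debug/kmemleak

If a problem exists, output like the following appears::

    unreferenced object 0xede22dc0 (size 128):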
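The "unreferenced object" idea can be sketched in userspace: treat a memory region as an array of pointer-sized values and mark every tracked allocation that some value points into. This is a simplified illustration, not kmemleak's actual code. The ``tracked_object`` type and both function names are hypothetical, and the addresses used with it are plain numbers rather than real pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one tracked allocation. */
struct tracked_object {
    uintptr_t start;    /* start address of the block */
    size_t size;        /* size of the block in bytes */
    int referenced;     /* set once a scan finds a pointer to it */
};

/* Scan a region word by word and mark every object that a word
 * points to, either at its start or anywhere inside it. */
void scan_region(const uintptr_t *words, size_t nwords,
                 struct tracked_object *objs, size_t nobjs)
{
    size_t i, j;

    assert(words != NULL && objs != NULL);
    for (i = 0; i < nwords; i++)
        for (j = 0; j < nobjs; j++)
            if (words[i] >= objs[j].start &&
                words[i] - objs[j].start < objs[j].size)
                objs[j].referenced = 1;
}

/* Count the objects that no scanned word referred to. */
size_t count_unreferenced(const struct tracked_object *objs, size_t nobjs)
{
    size_t i, n = 0;

    assert(objs != NULL);
    for (i = 0; i < nobjs; i++)
        if (!objs[i].referenced)
            n++;
    return n;
}
```

Because the scan cannot tell a real pointer from an integer that happens to look like one, and cannot see pointers stored outside the scanned regions, the result is only an estimate. That is consistent with the advice above to treat the report as a reference.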
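KASAN is a dynamic memory error detector. It can detect the following memory problems:

KASAN can be enabled in the kernel with the following options. ``CONFIG_KASAN_OUTLINE`` and ``CONFIG_KASAN_INLINE`` are alternative instrumentation types, so only one of the last two lines can actually be set::

    CONFIG_HAVE_ARCH_KASAN=y
    CONFIG_KASAN=y
    CONFIG_KASAN_OUTLINE=y
    CONFIG_KASAN_INLINE=y

The kernel provides a test program for KASAN at the following location::

    mm/kasan/kasan_test.c

It can be used to exercise detection of the following errors:

If a slab out-of-bounds access is detected, a log like the following appears::

    BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa4/0xe0 [kasan] at addr ffff800066539c7b

If a use after free is detected, a log like the following appears::

    BUG: KASAN: use-after-free in kmalloc_uaf+0xac/0xe0 [kasan] at addr ffff800066539e08

If a stack out-of-bounds access is detected, a log like the following appears::

    BUG: KASAN: stack-out-of-bounds in kasan_stack_oob+0xa8/0xf0 [kasan] at addr ffff800066acb95a

If a global out-of-bounds access is detected, a log like the following appears::

    BUG: KASAN: global-out-of-bounds in kasan_global_oob+0x9c/0xe8 [kasan] at addr ffff7ffffc001c8d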
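Reports like these rest on shadow memory: in generic KASAN, one shadow byte describes eight bytes of real memory. A value of 0 means all eight bytes are accessible, 1 to 7 means only that many leading bytes are, and a negative value means the granule is inaccessible. The userspace sketch below simulates that encoding for single-byte accesses. The arena, the ``sim_*`` names, and the two negative marker values are assumptions for illustration and differ from the kernel's own constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE        8             /* bytes covered by one shadow byte */
#define ARENA_SIZE     256           /* simulated memory size */
#define SHADOW_REDZONE ((int8_t)-1)  /* illustrative marker value */
#define SHADOW_FREED   ((int8_t)-2)  /* illustrative marker value */

enum sim_report { ACCESS_OK, OUT_OF_BOUNDS, USE_AFTER_FREE };

struct sim_kasan {
    int8_t shadow[ARENA_SIZE / GRANULE];
};

/* Start with the whole arena inaccessible. */
void sim_init(struct sim_kasan *k)
{
    assert(k != NULL);
    memset(k->shadow, SHADOW_REDZONE, sizeof(k->shadow));
}

/* Mark [off, off + size) accessible; off must be granule aligned. */
void sim_unpoison(struct sim_kasan *k, size_t off, size_t size)
{
    size_t g = off / GRANULE;
    size_t full = size / GRANULE;

    assert(k != NULL && off % GRANULE == 0 && off + size <= ARENA_SIZE);
    memset(&k->shadow[g], 0, full);
    if (size % GRANULE)
        k->shadow[g + full] = (int8_t)(size % GRANULE);
}

/* Mark every granule touched by [off, off + size) as freed. */
void sim_poison_freed(struct sim_kasan *k, size_t off, size_t size)
{
    size_t granules = (size + GRANULE - 1) / GRANULE;

    assert(k != NULL && off % GRANULE == 0 && off + size <= ARENA_SIZE);
    memset(&k->shadow[off / GRANULE], SHADOW_FREED, granules);
}

/* Check a one-byte access at offset off against the shadow. */
enum sim_report sim_check_byte(const struct sim_kasan *k, size_t off)
{
    int8_t s;

    assert(k != NULL && off < ARENA_SIZE);
    s = k->shadow[off / GRANULE];
    if (s == 0)
        return ACCESS_OK;
    if (s > 0)
        return (int)(off % GRANULE) < s ? ACCESS_OK : OUT_OF_BOUNDS;
    return s == SHADOW_FREED ? USE_AFTER_FREE : OUT_OF_BOUNDS;
}
```

With a 123-byte object unpoisoned at offset 64 (a size chosen for this sketch), byte 122 of the object passes the check and byte 123 is flagged as out of bounds because the last granule's shadow value is 3. After ``sim_poison_freed()`` any byte of the object is flagged as a use after free.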
Overall, KASAN is more efficient than slub_debug. Where possible, errors that KASAN can detect do not need to be chased with slub_debug.