2025-02-13

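Common memory errors in the Linux kernel

Memory errors inside the kernel are usually hard to track down. The common ones are:

Out-of-bounds access
Use after free
Double free
Memory leak
Stack overflow

The kernel offers three main tools for detecting memory errors:

slub_debug
kmemleak
kasan

The sections below look at how each tool handles the errors listed above.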

slub_debug

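Small allocations in the kernel are served by the slab/slub allocator. With SLUB, slub_debug can detect the following errors:

Use after free
Out-of-bounds access
Double free

First, enable the following SLUB config options: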

CONFIG_SLUB=y
CONFIG_SLUB_DEBUG=y
CONFIG_SLUB_DEBUG_ON=y
CONFIG_SLUB_STATS=y

Next, add the slub_debug parameter to the boot arguments (bootargs). The kernel documentation describes its syntax as follows:


Parameters may be given to ``slub_debug``. If none is specified then full
debugging is enabled. Format:

slub_debug=<Debug-Options>
        Enable options for all slabs

slub_debug=<Debug-Options>,<slab name1>,<slab name2>,...
        Enable options only for select slabs (no spaces
        after a comma)

Multiple blocks of options for all slabs or selected slabs can be given, with
blocks of options delimited by ';'. The last of "all slabs" blocks is applied
to all slabs except those that match one of the "select slabs" block. Options
of the first "select slabs" blocks that matches the slab's name are applied.

Possible debug options are::

        F               Sanity checks on (enables SLAB_DEBUG_CONSISTENCY_CHECKS
                        Sorry SLAB legacy issues)
        Z               Red zoning
        P               Poisoning (object and padding)
        U               User tracking (free and alloc)
        T               Trace (please only use on single slabs)
        A               Enable failslab filter mark for the cache
        O               Switch debugging off for caches that would have
                        caused higher minimum slab orders
        -               Switch all debugging off (useful if the kernel is
                        configured with CONFIG_SLUB_DEBUG_ON)

F.e. in order to boot just with sanity checks and red zoning one would specify::

        slub_debug=FZ

Trying to find an issue in the dentry cache? Try::

        slub_debug=,dentry

to only enable debugging on the dentry cache.  You may use an asterisk at the
end of the slab name, in order to cover all slabs with the same prefix.  For
example, here's how you can poison the dentry cache as well as all kmalloc
slabs::

        slub_debug=P,kmalloc-*,dentry

Red zoning and tracking may realign the slab.  We can just apply sanity checks
to the dentry cache with::

        slub_debug=F,dentry

Debugging options may require the minimum possible slab order to increase as
a result of storing the metadata (for example, caches with PAGE_SIZE object
sizes).  This has a higher likelihood of resulting in slab allocation errors
in low memory situations or if there's high fragmentation of memory.  To
switch off debugging for such caches by default, use::

        slub_debug=O

You can apply different options to different list of slab names, using blocks
of options. This will enable red zoning for dentry and user tracking for
kmalloc. All other slabs will not get any debugging enabled::

        slub_debug=Z,dentry;U,kmalloc-*

You can also enable options (e.g. sanity checks and poisoning) for all caches
except some that are deemed too performance critical and don't need to be
debugged by specifying global debug options followed by a list of slab names
with "-" as options::

        slub_debug=FZ;-,zs_handle,zspage

The state of each debug option for a slab can be found in the respective files
under::

        /sys/kernel/slab/<slab name>/

If the file contains 1, the option is enabled, 0 means disabled. The debug
options from the ``slub_debug`` parameter translate to the following files::

        F       sanity_checks
        Z       red_zone
        P       poison
        U       store_user
        T       trace
        A       failslab

Careful with tracing: It may spew out lots of information and never stop if
used on the wrong slab.

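Then build the slabinfo tool from the kernel source tree and copy it to the target:


# cd tools/vm/
# make slabinfo
# scp slabinfo xxx@xxx:destination/

With this setup, an out-of-bounds write is reported as "Redzone overwritten":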


 BUG kmalloc-32 (Tainted: G           O     ): Redzone overwritten

A double free is reported as "Object already free":


 BUG kmalloc-128 (Tainted: G B O ): Object already free

A write to memory that has already been freed is reported as "Poison overwritten":


 BUG kmalloc-128 (Tainted: G B O ): Poison overwritten
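
The three reports above can be mimicked in user space to see what slub_debug is checking. The following is only a minimal sketch, not the kernel's implementation: the `dbg_*` names, the `struct dbg_obj` layout and the fixed `REDZONE_SIZE` are assumptions made for illustration, and the sketch keeps the red zone pattern unchanged after free, which real SLUB does not. Only the two byte patterns are taken from include/linux/poison.h.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Byte patterns as defined in include/linux/poison.h. */
#define POISON_FREE     0x6b  /* fills the payload of a freed object */
#define SLUB_RED_ACTIVE 0xcc  /* fills the red zone of an allocated object */

#define REDZONE_SIZE 8        /* assumption: fixed-size red zone for this sketch */

/* Hypothetical debug object: payload followed by a red zone. */
struct dbg_obj {
    size_t size;              /* payload size requested by the caller */
    int freed;                /* set to 1 by dbg_free() */
    unsigned char *mem;       /* size payload bytes + REDZONE_SIZE guard bytes */
};

/* Allocate a payload and paint the trailing red zone. Returns 0 on success. */
int dbg_alloc(struct dbg_obj *o, size_t size)
{
    o->mem = malloc(size + REDZONE_SIZE);
    if (!o->mem)
        return -1;
    o->size = size;
    o->freed = 0;
    memset(o->mem, 0, size);
    memset(o->mem + size, SLUB_RED_ACTIVE, REDZONE_SIZE);
    return 0;
}

/* Mark the object free and poison its payload (memory is kept for checking).
 * Returns -1 on a double free. */
int dbg_free(struct dbg_obj *o)
{
    if (o->freed) {
        printf("BUG: Object already free\n");
        return -1;
    }
    memset(o->mem, POISON_FREE, o->size);
    o->freed = 1;
    return 0;
}

/* Return 0 if intact, -1 if the red zone was hit, -2 if the poison was hit. */
int dbg_check(const struct dbg_obj *o)
{
    size_t i;

    for (i = 0; i < REDZONE_SIZE; i++) {
        if (o->mem[o->size + i] != SLUB_RED_ACTIVE) {
            printf("BUG: Redzone overwritten\n");
            return -1;
        }
    }
    if (o->freed) {
        for (i = 0; i < o->size; i++) {
            if (o->mem[i] != POISON_FREE) {
                printf("BUG: Poison overwritten\n");
                return -2;
            }
        }
    }
    return 0;
}

/* Release the backing memory once the demonstration is done. */
void dbg_destroy(struct dbg_obj *o)
{
    free(o->mem);
    o->mem = NULL;
}
```

Writing one byte past the payload and then calling `dbg_check()` yields the red zone report, and writing into the payload after `dbg_free()` yields the poison report. This also shows why poisoning only catches writes after free: a read leaves the pattern untouched.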

kmemleak

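kmemleak starts a dedicated kernel thread that periodically scans memory and reports how many new unreferenced objects it found. Because it only reports objects to which it found no reference, kmemleak can produce false positives, so treat its output as a hint rather than proof.
To use kmemleak, enable the following config options: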


CONFIG_HAVE_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK=y
CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF=y
CONFIG_DEBUG_KMEMLEAK_EARLY_LOG_SIZE=4096

Because CONFIG_DEBUG_KMEMLEAK_DEFAULT_OFF is set, kmemleak must be switched on in bootargs:


kmemleak=on

After boot, trigger a scan manually before reproducing the problem:


echo scan > /sys/kernel/debug/kmemleak

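Once the problem has occurred, read the report from the debugfs node (writing scan again first forces an immediate rescan instead of waiting for the periodic one):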


cat /sys/kernel/debug/kmemleak

A suspected leak shows up as output like this:


unreferenced object 0xede22dc0 (size 128):
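
The idea of reporting unreferenced objects can be sketched in a few lines of C. This is a simplified illustration under stated assumptions, not kmemleak's actual code: `struct tracked`, `scan_words()` and `report_unreferenced()` are hypothetical names, and the scanned memory is just an array of words supplied by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record for one tracked allocation. */
struct tracked {
    uintptr_t start;     /* first byte of the object */
    size_t size;         /* object size in bytes */
    int referenced;      /* set to 1 when some scanned word points into it */
};

/* Treat every scanned word as a possible pointer and mark the objects hit. */
void scan_words(const uintptr_t *mem, size_t words,
                struct tracked *objs, size_t n)
{
    size_t w, i;

    for (w = 0; w < words; w++) {
        for (i = 0; i < n; i++) {
            if (mem[w] >= objs[i].start &&
                mem[w] < objs[i].start + objs[i].size)
                objs[i].referenced = 1;
        }
    }
}

/* Print and count the objects that no scanned word pointed to. */
size_t report_unreferenced(const struct tracked *objs, size_t n)
{
    size_t i, leaks = 0;

    for (i = 0; i < n; i++) {
        if (!objs[i].referenced) {
            printf("unreferenced object 0x%lx (size %lu)\n",
                   (unsigned long)objs[i].start,
                   (unsigned long)objs[i].size);
            leaks++;
        }
    }
    return leaks;
}
```

The sketch makes the limits of the approach visible: a pointer kept somewhere that is not scanned produces a false report, and a stray integer that happens to look like an address hides a real leak.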

kasan

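kasan is a dynamic memory error detector. It can detect the following problems:

Out-of-bounds access
Use after free
Double free

Enable kasan in the kernel with the config options below. KASAN_OUTLINE and KASAN_INLINE are two alternatives of the same Kconfig choice, so only one of them can be enabled at a time:


CONFIG_HAVE_ARCH_KASAN=y
CONFIG_KASAN=y
CONFIG_KASAN_OUTLINE=y
# CONFIG_KASAN_INLINE is not set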

The kernel ships a test module for kasan, located at:


mm/kasan/kasan_test.c

It can be used to trigger the following error types.

Heap (slab) out-of-bounds access

When it occurs, the log looks like this:


BUG: KASAN: slab-out-of-bounds in kmalloc_oob_right+0xa4/0xe0 [kasan] at addr ffff800066539c7b

Use after free

When it occurs, the log looks like this:


BUG: KASAN: use-after-free in kmalloc_uaf+0xac/0xe0 [kasan] at addr ffff800066539e08

Stack out-of-bounds access

When it occurs, the log looks like this:


BUG: KASAN: stack-out-of-bounds in kasan_stack_oob+0xa8/0xf0 [kasan] at addr ffff800066acb95a

Global variable out-of-bounds access

When it occurs, the log looks like this:


BUG: KASAN: global-out-of-bounds in kasan_global_oob+0x9c/0xe8 [kasan] at addr ffff7ffffc001c8d

Summary

kasan is generally more efficient than slub_debug. Where possible, an error that kasan can detect does not need slub_debug.

2025-02-11

Introduction to cache structure

From the figure above, the terms to keep in mind are:

offset
line
index
way
set
tag

offset

For a cache, the offset is the byte offset within a cache line. If the offset field is 4 bits wide, the cache line size is 2^4 = 16 bytes.

line

The line size follows from the offset: with the offset taking the low 4 bits of the PA, each cache line is 16 bytes.

index

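The index selects a cache line within a way, so the index width gives the number of cache lines per way. If the index is 8 bits wide, one way holds 256 cache lines, and the way size is 16 * 256 = 4096 bytes.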

way

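The line size multiplied by the number of index values gives 4096 bytes, the size of one way. The number of ways is the number of such 4096-byte blocks in the whole cache. If the total cache size is 16KB, then 16 / 4 = 4, so this is a 4-way cache.
Putting it together: for a 16KB cache with 4 ways, each way is 4KB, and with 16-byte cache lines there are 256 index values.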

set

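Following the calculation above, the cache lines that share the same index across all ways form a set. In the example above, each set contains 4 cache lines because there are 4 ways, and there are 256 sets in total, one per index value.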

tag

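The bits of the physical address (PA) that remain after removing the offset and index form the tag. The tag is compared against the tag stored with a cache line to decide whether that line holds the data the processor wants.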
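
The field split described above can be written down as a small C sketch. It assumes the example geometry from this post (4 offset bits, 8 index bits, 4 ways) and a 32-bit physical address; `struct cache_addr`, `split_pa()` and `cache_size_bytes()` are names made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4   /* 16-byte cache line */
#define INDEX_BITS  8   /* 256 cache lines per way, i.e. 256 sets */
#define NUM_WAYS    4   /* 4-way set associative */

/* The three fields of a physical address as seen by the cache. */
struct cache_addr {
    uint32_t tag;
    uint32_t index;
    uint32_t offset;
};

/* Split a 32-bit physical address into tag, index and offset. */
struct cache_addr split_pa(uint32_t pa)
{
    struct cache_addr a;

    a.offset = pa & ((1u << OFFSET_BITS) - 1);
    a.index = (pa >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    a.tag = pa >> (OFFSET_BITS + INDEX_BITS);
    return a;
}

/* Total size = line size * lines per way * number of ways. */
uint32_t cache_size_bytes(void)
{
    return (1u << OFFSET_BITS) * (1u << INDEX_BITS) * NUM_WAYS;
}
```

For the address 0x12345678 this gives offset 0x8, index 0x67 and tag 0x12345, and `cache_size_bytes()` returns 16384, matching the 16KB example.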

2025-02-10

atexit

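A colleague once asked how to run some cleanup when a program exits while debugging it, and was not sure where in the program to add that logic. I could not answer at the time. I recently came across the standard library function atexit, and this post briefly describes what it does, as a note to myself.

Example code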


#include <stdlib.h>
#include <stdio.h>

void atexit_func(void)
{
    printf("atexit_func is called\n");
}

int main(void)
{
    /* Register atexit_func to run at normal program termination. */
    atexit(atexit_func);
    return 0;
}

This code uses atexit to register an exit callback, atexit_func, which prints one line.

Running the code

To show that atexit has been available for a long time, build with the C89 standard:


gcc test.c -std=c89 -o test

Then run it:


# ./test
atexit_func is called

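Summary

As shown, atexit can be used for cleanup at program exit. When debugging a very large community C program on an operating system, where adding cleanup logic to the community code is inconvenient, atexit gets the job done.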
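
When more than one cleanup step is needed, it helps to know that handlers run in reverse order of registration. The sketch below shows this; `close_log`, `flush_cache` and `register_cleanup` are hypothetical names chosen for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Registered first, so it runs last. */
static void close_log(void)
{
    printf("close_log: runs last\n");
}

/* Registered second, so it runs first. */
static void flush_cache(void)
{
    printf("flush_cache: runs first\n");
}

/* Register both handlers. Returns 0 if both registrations succeed. */
int register_cleanup(void)
{
    if (atexit(close_log) != 0)
        return -1;
    if (atexit(flush_cache) != 0)
        return -1;
    return 0;
}
```

Calling `register_cleanup()` early in `main` is enough. The handlers run when `main` returns or `exit()` is called, but not when the process ends through `_exit()`, `abort()` or a fatal signal.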

2025-01-22

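Following the post on integrity verification (防破解之-完整性校验), we have a fixed digest. This digest itself must be encrypted; otherwise someone could modify it and place the digest of an already-cracked file into the designated section.

1. What is asymmetric encryption

Before looking at asymmetric encryption, we need to know what symmetric encryption is. In symmetric encryption the same key is used for both encryption and decryption, which is what makes it "symmetric".

Symmetric encryption does provide encryption, but once the symmetric key leaks, the encryption is broken. The open problem is how to hand the key to the other party safely, and this is where asymmetric encryption comes in.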

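Asymmetric encryption uses two keys, a public key and a private key. The two keys are different, hence "asymmetric". The public key can be given to anyone, while the private key must be kept strictly secret.

The two keys are one-way: either can be used to encrypt, but data encrypted with the public key can only be decrypted with the private key, and data encrypted with the private key can only be decrypted with the public key.

This addresses the key leakage problem. All of our content is encrypted with the private key, and the public key I publish is only used to decrypt what my private key encrypted. Because the private key stays in my hands and is never released, nobody else can produce data that decrypts correctly with my public key, so the encrypted digest cannot be forged.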

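2. What is RSA

RSA is a well-known asymmetric algorithm. Its security rests on the hardness of integer factorization: the product of two very large primes is used as the material for generating keys, and deriving the private key from the public key is extremely hard. For details on RSA, see this article: RSA

We can try it out as follows:

2.1 Generating the public/private key pair

For better security, a 2048-bit key is used here.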


openssl genrsa -out rsa_private_key.pem 2048
openssl rsa -in rsa_private_key.pem -pubout -out rsa_public_key.pem

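2.2 Encrypting with the public key

The file plaintext.bin is the input to encrypt, and ciphertext.bin is the resulting ciphertext. rsautl encrypts a single RSA block, so with a 2048-bit key and the default PKCS#1 v1.5 padding plaintext.bin can be at most 245 bytes, which is plenty for a 32-byte sha256 digest.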


openssl rsautl -encrypt -pubin -inkey rsa_public_key.pem -in plaintext.bin -out ciphertext.bin

2.3 Decrypting with the private key


openssl rsautl -decrypt -inkey rsa_private_key.pem -in ciphertext.bin -out out_plaintext.bin

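"plaintext.bin" and "out_plaintext.bin" are now identical.

So, on top of sha256, the digest can be protected with asymmetric encryption. An attacker cannot forge the encrypted digest, which guards the digest against tampering.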
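
The private-key-encrypts, public-key-decrypts direction can be illustrated with a toy RSA key in C. This is a teaching sketch only, under loud assumptions: it uses the textbook key n = 3233 (61 * 53), e = 17, d = 2753, no padding, and a "digest" that must be smaller than n. The function names are hypothetical; a real scheme should rely on openssl with a 2048-bit key as shown above.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy textbook key pair: n = 61 * 53. Far too small for real use. */
#define RSA_N 3233u
#define RSA_E 17u     /* public exponent */
#define RSA_D 2753u   /* private exponent */

/* Square-and-multiply modular exponentiation: (base ^ exp) mod mod. */
uint32_t mod_pow(uint32_t base, uint32_t exp, uint32_t mod)
{
    uint64_t result = 1;
    uint64_t b = base % mod;

    while (exp > 0) {
        if (exp & 1)
            result = (result * b) % mod;
        b = (b * b) % mod;
        exp >>= 1;
    }
    return (uint32_t)result;
}

/* "Encrypt" a small digest with the private key. */
uint32_t rsa_sign(uint32_t digest)
{
    return mod_pow(digest, RSA_D, RSA_N);
}

/* "Decrypt" with the public key to get the digest back. */
uint32_t rsa_recover(uint32_t sig)
{
    return mod_pow(sig, RSA_E, RSA_N);
}
```

A verifier holding only the public key calls `rsa_recover()` and compares the result with the digest it computed itself. Anyone can do that check, but producing a value that passes it requires the private exponent.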

2025-01-22

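The previous post covered the ELF file format. By parsing the ELF file we learned which parts of a critical file need protection. This post looks at integrity checking for such critical files.

1. Integrity computation

From the post on the ELF file format (防破解之-elf文件格式) we know where the data comes from. To make critical files tamper-proof, we need to compute an integrity value over that data, and the computation should satisfy these three requirements:

Easy to compute in the forward direction, practically impossible to invert
The result has a fixed length
Collisions are hard to find

Given these requirements, a hash fits, specifically a one-way hash. The one-way hashes tried in this post are md5, sha256 and sm3.

To try them out, take the .text section as the data source: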


objcopy --dump-section .text=text.bin libhelloworld.so

text.bin is now extracted. First compute its digest with md5:


# md5sum text.bin
2d662e596919c294d7e3f274d75549b6 text.bin

Compute the digest with sha256:


# openssl dgst -sha256 text.bin
SHA256(text.bin)= 6aaf37f9ef03aa1f06dab8083784a4b50acca524f9cf7476acc52bd23dc118f2

Compute the digest with sm3, also through openssl:


# openssl dgst -sm3 text.bin
SM3(text.bin)= d0e43ea849949decc4cf8deb557f4cb46f0f560563dc2f0a63f6cbfa5e27de18

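This completes the demonstration of integrity computation with the three algorithms.

2. md5 collisions

md5 produces a 128-bit digest. With current techniques md5 collisions can be constructed deliberately, even though two random inputs collide only with probability about 2^-128. Here is a collision example:

Data 1: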


STR1=d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f8955ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5bd8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70

Data 2:


STR2=d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f8955ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5bd8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70

Convert the hex text to binary files:


echo $STR1 | xxd -r -p > str1
echo $STR2 | xxd -r -p > str2

Now compare the md5 digests:


# md5sum str1 str2
79054025255fb1a26e4bc422aef54eb4 str1
79054025255fb1a26e4bc422aef54eb4 str2

As shown above, STR1 and STR2 collide.

Dump both files as hex with hexdump:


hexdump -C str1 > 1
hexdump -C str2 > 2

Comparing the two dump files 1 and 2 shows where the two inputs differ.


Reference: https://www.mscs.dal.ca/~selinger/md5collision/
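
The comparison of the two inputs can also be done directly on the hex strings. The following is a small sketch with hypothetical helper names (`hex_val`, `count_diff_bytes`); it assumes both strings are plain hex of equal length, like STR1 and STR2 above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Convert one hex digit to its value, or return -1 if it is not a hex digit. */
int hex_val(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Print the offset of every differing byte and return how many differ.
 * Returns -1 if the strings are malformed or have different lengths. */
int count_diff_bytes(const char *a, const char *b)
{
    size_t len = strlen(a);
    size_t i;
    int diffs = 0;

    if (len != strlen(b) || len % 2 != 0)
        return -1;
    for (i = 0; i < len; i += 2) {
        int ah = hex_val(a[i]), al = hex_val(a[i + 1]);
        int bh = hex_val(b[i]), bl = hex_val(b[i + 1]);

        if (ah < 0 || al < 0 || bh < 0 || bl < 0)
            return -1;
        if (ah != bh || al != bl) {
            printf("offset 0x%02zx: %02x != %02x\n", i / 2,
                   (unsigned)(ah * 16 + al), (unsigned)(bh * 16 + bl));
            diffs++;
        }
    }
    return diffs;
}
```

Passing STR1 and STR2 to `count_diff_bytes()` lists the few byte positions at which the colliding inputs differ, which is the same information the hexdump comparison gives.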

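So md5 has a collision problem, and sha or sm3 is a better choice. For sha, the widely used sha256 is the usual pick.

3. Final choice

From the above: if md5 collisions are not a concern, md5 is an option; if collisions matter, choose the more widely used sha256.

openssl also provides other message digest algorithms: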


# openssl help
Message Digest commands (see the `dgst' command for more details)
blake2b512 blake2s256 gost md4
md5 rmd160 sha1 sha224
sha256 sha3-224 sha3-256 sha3-384
sha3-512 sha384 sha512 sha512-224
sha512-256 shake128 shake256 sm3

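I will not demonstrate each of these digest algorithms. For the current scheme it is enough to know that sha256 was chosen. If sm3 is needed later, it will be adopted during a refactor.

I do not fully understand how sha256 works internally yet; that will take time. A document is provided here for readers who are interested.

For a demonstration, this web page is also worth a look: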

https://sha256algorithm.com/
