A previous post, 《使用crash查看内核结构体》, covered inspecting kernel structures from crash, which is one of the most common ways to examine kernel state. Most of the time, though, what we actually need to look at is the stack. As a demonstration, this post takes the bt output from crash and walks the call stack by hand, as a way of getting familiar with how crash unwinds the stack.
Here is the bt of a ping process, taken directly:
crash> bt 49105
PID: 49105  TASK: ffffff805e2ee580  CPU: 0  COMMAND: "ping"
 #0 [ffffffc0149b3830] __switch_to at ffffffc008017540
 #1 [ffffffc0149b3850] __schedule at ffffffc0095162d0
 #2 [ffffffc0149b38f0] schedule at ffffffc009516900
 #3 [ffffffc0149b3910] schedule_timeout at ffffffc00951ac64
 #4 [ffffffc0149b39a0] __skb_wait_for_more_packets at ffffffc0091a0248
 #5 [ffffffc0149b3a20] __skb_recv_datagram at ffffffc0091a0e00
 #6 [ffffffc0149b3a90] skb_recv_datagram at ffffffc0091a0ea0
 #7 [ffffffc0149b3ac0] ping_recvmsg at ffffffc00928a7ec
 #8 [ffffffc0149b3b20] inet_recvmsg at ffffffc009279340
 #9 [ffffffc0149b3b70] sock_recvmsg at ffffffc009188084
#10 [ffffffc0149b3ba0] ____sys_recvmsg at ffffffc009188d3c
#11 [ffffffc0149b3c90] ___sys_recvmsg at ffffffc00918c410
#12 [ffffffc0149b3d80] __sys_recvmsg at ffffffc00918c7e4
#13 [ffffffc0149b3e20] __arm64_sys_recvmsg at ffffffc00918c864
#14 [ffffffc0149b3e30] el0_svc_common at ffffffc008025508
#15 [ffffffc0149b3e70] do_el0_svc at ffffffc008025690
#16 [ffffffc0149b3e80] el0_svc at ffffffc009513510
#17 [ffffffc0149b3ea0] el0_sync_handler at ffffffc009513d54
#18 [ffffffc0149b3fe0] el0_sync at ffffffc008011e14
     PC: 0000007f86ec2994   LR: 000000558e2a9f6c   SP: 0000007fcdff7a20
    X29: 0000007fcdff7a20  X28: 00000000000000c0  X27: 0000007fcdff7ae0
    X26: 0000007fcdff7bb8  X25: 000000558e2c1000  X24: 0000007fcdff7b38
    X23: 0000000000000000  X22: 000000558e2c2078  X21: 0000007f870ca710
    X20: 0000007fcdff7b00  X19: 0000000000000003  X18: 0000000000000001
    X17: 0000007f86ec2960  X16: 000000558e2c1b48  X15: 000000007fffffde
    X14: 0000000000000001  X13: 0000000000000037  X12: 000000007fffffff
    X11: 00000012b9b749a3  X10: 0014d207f0963169   X9: 0000000000000018
     X8: 00000000000000d4   X7: 00000000001d1c32   X6: 0000000029aaaaf1
     X5: 0000000000000080   X4: 0000000000000001   X3: 0000007f870c9f10
     X2: 0000000000000000   X1: 0000007fcdff7b00   X0: 0000000000000003
    ORIG_X0: 0000000000000003  SYSCALLNO: d4  PSTATE: 60001000
crash is mostly used for kernel problem analysis; while running ping the kernel is neither deadlocked nor crashed, so the registers shown above are user-space register values and are not very useful here. That does not stop us from unwinding the kernel stack, so let's do exactly that based on the bt output above.
On aarch64, the FP and LR registers work as follows: FP (x29) points to the current frame record on the stack, whose first word is the caller's saved x29, and LR (x30) holds the return address.
We also know that aarch64 instructions are 32 bits wide, i.e. 4 bytes each. With these two facts we can demonstrate the points above.
From the backtrace we know the task was switched out at ffffffc008017540 inside __switch_to, and that __switch_to's x29 is ffffffc0149b3830, so we read the value stored there:
crash> rd ffffffc0149b3830
ffffffc0149b3830:  ffffffc0149b3850
As you can see, ffffffc0149b3850 is __schedule's x29. In other words the saved x29 values chain together like a linked list, so instead of reading them one by one we can borrow the list command to dump the whole chain at once:
crash> list ffffffc0149b3830
ffffffc0149b3830
ffffffc0149b3850
ffffffc0149b38f0
ffffffc0149b3910
ffffffc0149b39a0
ffffffc0149b3a20
ffffffc0149b3a90
ffffffc0149b3ac0
ffffffc0149b3b20
ffffffc0149b3b70
ffffffc0149b3ba0
ffffffc0149b3c90
ffffffc0149b3d80
ffffffc0149b3e20
ffffffc0149b3e30
ffffffc0149b3e70
ffffffc0149b3e80
ffffffc0149b3ea0
ffffffc0149b3fe0
That gives us all the saved x29 values in one go. The saved x30 sits at x29 + 8, and since "address + 8" is exactly the layout of list_head.prev, we can abuse list once more to read them all:
crash> list -s list_head.prev ffffffc0149b3830
ffffffc0149b3830
  prev = 0xffffffc0095162d4 <__schedule+692>
ffffffc0149b3850
  prev = 0xffffffc009516904 <schedule+68>
ffffffc0149b38f0
  prev = 0xffffffc00951ac68 <schedule_timeout+376>
ffffffc0149b3910
  prev = 0xffffffc0091a024c <__skb_wait_for_more_packets+276>
ffffffc0149b39a0
  prev = 0xffffffc0091a0e04 <__skb_recv_datagram+124>
ffffffc0149b3a20
  prev = 0xffffffc0091a0ea4 <skb_recv_datagram+60>
ffffffc0149b3a90
  prev = 0xffffffc00928a7f0 <ping_recvmsg+112>
ffffffc0149b3ac0
  prev = 0xffffffc009279344 <inet_recvmsg+76>
ffffffc0149b3b20
  prev = 0xffffffc009188088 <sock_recvmsg+72>
ffffffc0149b3b70
  prev = 0xffffffc009188d40 <____sys_recvmsg+128>
ffffffc0149b3ba0
  prev = 0xffffffc00918c414 <___sys_recvmsg+124>
ffffffc0149b3c90
  prev = 0xffffffc00918c7e8 <__sys_recvmsg+96>
ffffffc0149b3d80
  prev = 0xffffffc00918c868 <__arm64_sys_recvmsg+32>
ffffffc0149b3e20
  prev = 0xffffffc00802550c <el0_svc_common+108>
ffffffc0149b3e30
  prev = 0xffffffc008025694 <do_el0_svc+28>
ffffffc0149b3e70
  prev = 0xffffffc009513514 <el0_svc+28>
ffffffc0149b3e80
  prev = 0xffffffc009513d58 <el0_sync_handler+168>
ffffffc0149b3ea0
  prev = 0xffffffc008011e18 <el0_sync+344>
ffffffc0149b3fe0
  prev = 0x0
As you can see, each saved x30 holds the return address, i.e. the instruction right after the call; subtracting 4 gives the address of the call instruction itself, which is what bt prints for each frame.
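The same arithmetic can be written down as code. This is only a sketch to illustrate the frame-record layout and the "-4" step; the two addresses are copied from the list output above and the struct is written out for illustration, not taken from any kernel header.

#include <stdio.h>
#include <stdint.h>

/* An AArch64 frame record as laid out on the stack:
 * [x29]     -> caller's saved x29 (next, older frame record)
 * [x29 + 8] -> saved x30 (return address)
 */
struct frame_record {
    uint64_t fp;
    uint64_t lr;
};

int main(void)
{
    /* Two saved x30 values copied from the `list -s list_head.prev` output above. */
    uint64_t saved_lr[] = { 0xffffffc0095162d4ULL, 0xffffffc009516904ULL };

    for (size_t i = 0; i < sizeof(saved_lr) / sizeof(saved_lr[0]); ++i) {
        /* Instructions are 4 bytes, so the call site is one instruction back. */
        printf("call site: %#llx\n", (unsigned long long)(saved_lr[i] - 4));
    }
    return 0;
}

Running it prints 0xffffffc0095162d0 and 0xffffffc009516900, matching the __schedule and schedule frames in the bt output.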
That completes the example: we have reproduced crash's bt stack unwinding by hand.
crash is used a lot when debugging the kernel; it helps with deadlocks, hangs and similar problems. 《RK平台上使用crash进行live debug》 already covered installing crash and its basic usage. This post is a deeper exercise: read the tasks field of task_struct and use it to reach the task_struct of every process in the system.
With ps we can list the processes in the system; here are the first entries:
   PID  PPID  CPU       TASK        ST  %MEM    VSZ   RSS  COMM
>    0     0   0  ffffffc00a6d23c0  RU   0.0      0     0  [swapper/0]
>    0     0   1  ffffff81f0856580  RU   0.0      0     0  [swapper/1]
>    0     0   2  ffffff81f0898000  RU   0.0      0     0  [swapper/2]
     0     0   3  ffffff81f0898e80  RU   0.0      0     0  [swapper/3]
     0     0   4  ffffff81f0899d00  RU   0.0      0     0  [swapper/4]
>    0     0   5  ffffff81f089ab80  RU   0.0      0     0  [swapper/5]
>    0     0   6  ffffff81f089ba00  RU   0.0      0     0  [swapper/6]
>    0     0   7  ffffff81f089c880  RU   0.0      0     0  [swapper/7]
     1     0   2  ffffff81f0808000  IN   0.1  245056  6372  systemd
     2     0   6  ffffff81f0808e80  IN   0.0      0     0  [kthreadd]
     3     2   0  ffffff81f0809d00  ID   0.0      0     0  [rcu_gp]
     4     2   0  ffffff81f080ab80  ID   0.0      0     0  [rcu_par_gp]
     8     2   0  ffffff81f080e580  ID   0.0      0     0  [mm_percpu_wq]
     9     2   0  ffffff81f0850000  IN   0.0      0     0  [rcu_tasks_rude_]
From this output we can see that CPUs 0, 1, 2, 5, 6 and 7 are idle (their swapper task is the current task), while CPUs 3 and 4 are running real work.
The TASK column gives us the address of each struct task_struct; that is what we will practice on with crash. For reference, init_task is the root task_struct, and the swapper/N names of the idle tasks come from init_idle:
struct task_struct init_task

start_kernel
  sched_init
    init_idle
      sprintf(idle->comm, "%s/%d", INIT_TASK_COMM, cpu);

#define INIT_TASK_COMM "swapper"
For this structure we care about the tasks list, so we want pid, comm and tasks. Take swapper/0 as an example:
crash> struct task_struct.pid,comm,tasks ffffffc00a6d23c0
  pid = 0
  comm = "swapper/0\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f0808438,
    prev = 0xffffff80b2a82fb8
  }
From the code above we know each CPU has its own idle task, so the information for CPUs 1-7 is:
crash> struct task_struct.pid,comm,tasks ffffff81f0856580
  pid = 0
  comm = "swapper/1\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks ffffff81f0898000
  pid = 0
  comm = "swapper/2\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks ffffff81f0898e80
  pid = 0
  comm = "swapper/3\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks ffffff81f0899d00
  pid = 0
  comm = "swapper/4\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks ffffff81f089ab80
  pid = 0
  comm = "swapper/5\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks ffffff81f089ba00
  pid = 0
  comm = "swapper/6\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks ffffff81f089c880
  pid = 0
  comm = "swapper/7\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
All tasks are chained together through tasks, so first locate the offset of tasks inside task_struct:
crash> struct task_struct.tasks -o -x
struct task_struct {
  [0x438] struct list_head tasks;
}
So tasks lives at offset 0x438 inside task_struct. Now print the tasks list, starting with swapper/0:
crash> struct task_struct.pid,comm,tasks ffffffc00a6d23c0
  pid = 0
  comm = "swapper/0\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f0808438,
    prev = 0xffffff81f09e8438
  }
Its next pointer is 0xffffff81f0808438, which points at another process's task_struct.tasks, so subtracting the offset gives us that task_struct:
>>> hex(0xffffff81f0808438-0x438)
'0xffffff81f0808000'
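This manual subtraction is the same trick the kernel's own list_entry()/container_of() macros perform; a minimal sketch of the idea (the shape of it, not the kernel's exact definition):

#include <stddef.h>

/* Step back from a pointer to an embedded member to its enclosing structure;
 * container_of(next, struct task_struct, tasks) is exactly the
 * 0xffffff81f0808438 - 0x438 computed above. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))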
Now that we have the task_struct pointer for next, print it:
crash> struct task_struct.pid,comm,tasks 0xffffff81f0808000
  pid = 1
  comm = "systemd\000\000\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
Similarly, for the idle tasks of the other CPUs we can follow their next pointer; by default it leads to kthreadd:
crash> struct task_struct.pid,comm,tasks ffffff81f0856580
  pid = 0
  comm = "swapper/1\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
Compute the address:
>>> hex(0xffffff81f08092b8-0x438)
'0xffffff81f0808e80'
Print the task_struct:
crash> struct task_struct.pid,comm,tasks 0xffffff81f0808e80
  pid = 2
  comm = "kthreadd\000\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f080a138,
    prev = 0xffffff81f0808438
  }
So we can walk from one task_struct to the next through tasks by hand. Next, let's use the list command. For swapper/0, list gives all members of the chain:
crash> list -h 0xffffff81f0808438
ffffff81f0808438
ffffff81f08092b8
ffffff81f080a138
This lets us compute every task_struct directly, for example with:
# addresses as printed by the list command above (truncated here)
addresses = ["ffffff81f0808438", "ffffff81f08092b8", "ffffff81f080a138"]
for addr in addresses:
    print("struct task_struct.pid,comm,tasks", hex(int(addr, 16) - 0x438))
The first five results are:
crash> struct task_struct.pid,comm,tasks 0xffffff81f0808000
  pid = 1
  comm = "systemd\000\000\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f08092b8,
    prev = 0xffffffc00a6d27f8 <init_task+1080>
  }
crash> struct task_struct.pid,comm,tasks 0xffffff81f0808e80
  pid = 2
  comm = "kthreadd\000\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f080a138,
    prev = 0xffffff81f0808438
  }
crash> struct task_struct.pid,comm,tasks 0xffffff81f0809d00
  pid = 3
  comm = "rcu_gp\000d\000\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f080afb8,
    prev = 0xffffff81f08092b8
  }
crash> struct task_struct.pid,comm,tasks 0xffffff81f080ab80
  pid = 4
  comm = "rcu_par_gp\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f080e9b8,
    prev = 0xffffff81f080a138
  }
crash> struct task_struct.pid,comm,tasks 0xffffff81f080e580
  pid = 8
  comm = "mm_percpu_wq\000\000\000"
  tasks = {
    next = 0xffffff81f0850438,
    prev = 0xffffff81f080afb8
  }
For the chains starting from swapper/1-7, here are three entries as examples:
crash> struct task_struct.pid,comm,tasks 0xffffff81f0809d00
  pid = 3
  comm = "rcu_gp\000d\000\000\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f080afb8,
    prev = 0xffffff81f08092b8
  }
crash> struct task_struct.pid,comm,tasks 0xffffff81f080ab80
  pid = 4
  comm = "rcu_par_gp\000\000\000\000\000"
  tasks = {
    next = 0xffffff81f080e9b8,
    prev = 0xffffff81f080a138
  }
crash> struct task_struct.pid,comm,tasks 0xffffff81f080e580
  pid = 8
  comm = "mm_percpu_wq\000\000\000"
  tasks = {
    next = 0xffffff81f0850438,
    prev = 0xffffff81f080afb8
  }
As you can see, this matches the process list we got from ps at the top exactly.
That wraps up this small crash experiment: walking task_struct.tasks to list every pid and comm. crash makes it easy to inspect live kernel structures like this, which is useful both for learning the kernel and for debugging it.
The previous post explained the exception vector table; RTEMS only implements the IRQ vectors for sp0 and spx. This post uses the timer interrupt's entry handler to explain how an interrupt is actually triggered and dispatched.
The operating system's tick is driven by a hardware timer, which makes the timer a very convenient example for understanding interrupt delivery: after the system boots, every recorded tick corresponds to one timer expiry.
On arm64 the timer frequency is read from the cntfrq_el0 register by default.
The code that does this is:
void arm_generic_timer_get_config(
  uint32_t *frequency,
  uint32_t *irq
)
{
  uint64_t val;

  __asm__ volatile (
    "mrs %[val], cntfrq_el0" : [val] "=&r" (val)
  );
  *frequency = val;

#ifdef ARM_GENERIC_TIMER_USE_VIRTUAL
  *irq = BSP_TIMER_VIRT_PPI;
#elif defined(AARCH64_GENERIC_TIMER_USE_PHYSICAL_SECURE)
  *irq = BSP_TIMER_PHYS_S_PPI;
#else
  *irq = BSP_TIMER_PHYS_NS_PPI;
#endif
}
By default the non-secure physical timer is used, so the interrupt number is 30:
#define BSP_TIMER_PHYS_NS_PPI 30
Using the timer then boils down to a few steps: read the frequency and IRQ number (the function above), program the compare value, enable the timer, and install and enable the ISR for IRQ 30 at the GIC; the individual pieces are shown below.
Once armed, the timer counts towards the compare value and raises its interrupt. How that interrupt is triggered and dispatched is walked through step by step in the rest of this post.
Interrupt delivery is covered in a moment; first let's finish describing the timer itself.
Setting cntp_ctl is done by:
void arm_gt_clock_set_control(uint32_t ctl)
{
  __asm__ volatile (
#ifdef AARCH64_GENERIC_TIMER_USE_VIRTUAL
    "msr cntv_ctl_el0, %[ctl]"
#elif defined(AARCH64_GENERIC_TIMER_USE_PHYSICAL_SECURE)
    "msr cntps_ctl_el1, %[ctl]"
#else
    "msr cntp_ctl_el0, %[ctl]"
#endif
    : : [ctl] "r" (ctl)
  );
}
Setting the timer compare value is done by:
void arm_gt_clock_set_compare_value(uint64_t cval)
{
  __asm__ volatile (
#ifdef AARCH64_GENERIC_TIMER_USE_VIRTUAL
    "msr cntv_cval_el0, %[cval]"
#elif defined(AARCH64_GENERIC_TIMER_USE_PHYSICAL_SECURE)
    "msr cntps_cval_el1, %[cval]"
#else
    "msr cntp_cval_el0, %[cval]"
#endif
    : : [cval] "r" (cval)
  );
}
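Putting the two helpers together, re-arming the timer for the next tick looks roughly like the sketch below. arm_gt_clock_get_count and ARM_GT_CTL_ENABLE are assumptions made for the sketch (a wrapper around reading the counter, and bit 0 of cntp_ctl), not code quoted from RTEMS.

#include <stdint.h>

void arm_gt_clock_set_control(uint32_t ctl);
void arm_gt_clock_set_compare_value(uint64_t cval);

#define ARM_GT_CTL_ENABLE 0x1  /* assumed: bit 0 of cntp_ctl_el0 enables the timer */

/* Assumed helper: read the current physical counter value. */
static inline uint64_t arm_gt_clock_get_count(void)
{
  uint64_t cnt;
  __asm__ volatile ( "mrs %0, cntpct_el0" : "=r" (cnt) );
  return cnt;
}

/* Arm the timer to fire one interval from now and enable it; when the counter
 * reaches the compare value, the timer asserts its PPI (IRQ 30 here). */
static void arm_gt_clock_arm_next_tick(uint64_t interval_ticks)
{
  arm_gt_clock_set_compare_value(arm_gt_clock_get_count() + interval_ticks);
  arm_gt_clock_set_control(ARM_GT_CTL_ENABLE);
}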
With that, the timer itself is covered.
For an interrupt to do anything useful we have to provide its ISR, so to make the interrupt jump into our own handler we need to register it. In RTEMS that is done like this:
static void arm_gt_clock_handler_install(rtems_interrupt_handler handler)
{
  rtems_status_code sc;

  rtems_interrupt_entry_initialize(
    &arm_gt_interrupt_entry,
    handler,
    &arm_gt_clock_instance,
    "Clock"
  );
  sc = rtems_interrupt_entry_install(
    arm_gt_clock_instance.irq,
    RTEMS_INTERRUPT_UNIQUE,
    &arm_gt_interrupt_entry
  );
  if (sc != RTEMS_SUCCESSFUL) {
    bsp_fatal(BSP_ARM_FATAL_GENERIC_TIMER_CLOCK_IRQ_INSTALL);
  }
}
Within this install step we care about the following call chain:
rtems_interrupt_entry_install
  bsp_interrupt_entry_install
    bsp_interrupt_entry_install_first
The code of that last function is:
static rtems_status_code bsp_interrupt_entry_install_first(
  rtems_vector_number    vector,
  rtems_option           options,
  rtems_interrupt_entry *entry
)
{
  rtems_vector_number index;

  index = vector;
  bsp_interrupt_entry_store_release(
    bsp_interrupt_get_dispatch_table_slot( index ),
    entry
  );
  bsp_interrupt_set_handler_unique( index, RTEMS_INTERRUPT_IS_UNIQUE( options ) );
  bsp_interrupt_vector_enable( vector );

  return RTEMS_SUCCESSFUL;
}
It does a few things: it stores the entry into the dispatch-table slot for this vector (bsp_interrupt_entry_store_release on bsp_interrupt_get_dispatch_table_slot), records whether the handler is unique (bsp_interrupt_set_handler_unique), and enables the vector (bsp_interrupt_vector_enable).
After this the interrupt is registered in the dispatch table, i.e.:
&_Record_Interrupt_dispatch_table[ 30 ];
bsp_interrupt_dispatch_table[ 30 ]
The two addresses above are equal, because:
bsp_interrupt_dispatch_table[ i ] = &_Record_Interrupt_entry_table[ i ];
So when an interrupt fires, the corresponding ISR can be found through this table.
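Conceptually, install and dispatch reduce to an array of entry pointers indexed by vector number. The sketch below uses made-up names (isr_entry, dispatch_table, VECTOR_COUNT) purely to illustrate that shape; the real type is rtems_interrupt_entry and the real declarations differ.

#include <stddef.h>

#define VECTOR_COUNT 256                /* assumed table size */

typedef struct isr_entry {
  void (*handler)(void *arg);
  void *arg;
  struct isr_entry *next;               /* chained, shared handlers */
} isr_entry;

static isr_entry *dispatch_table[VECTOR_COUNT];

/* install: dispatch_table[30] = &clock_entry; */
static void dispatch(unsigned vector)
{
  /* What the handler dispatch boils down to: walk the chain for this vector. */
  for (isr_entry *e = dispatch_table[vector]; e != NULL; e = e->next) {
    e->handler(e->arg);
  }
}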
The timer interrupt arrives through curr_el_spx_irq. The vector table itself was covered before, so here we only care about the behavior: the first thing executed is JUMP_HANDLER.
.macro JUMP_HANDLER
/* Mask to use in BIC, lower 7 bits */
	mov x0, #0x7f
/* LR contains PC, mask off to the base of the current vector */
	bic x0, lr, x0
/* Load address from the last word in the vector */
	ldr x0, [x0, #0x78]
/*
 * Branch and link to the address in x0. There is no reason to save the current
 * LR since it has already been saved and the current contents are junk.
 */
	blr x0
/* Pop x0,lr from stack */
	ldp x0, lr, [sp], #0x10
/* Return from exception */
	eret
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
.endm
The instruction that enters the interrupt handler is:
ldr x0, [x0, #0x78]
According to the analysis in 《RTEMS的中断管理之中断向量表》, the entry function loaded here is _AArch64_Exception_interrupt_no_nest.
The implementation of _AArch64_Exception_interrupt_no_nest is as follows:
_AArch64_Exception_interrupt_no_nest:
/* Execution template:
Save volatile registers on thread stack(some x, all q, ELR, etc.)
Switch to interrupt stack
Execute interrupt handler
Switch to thread stack
Call thread dispatch
Restore volatile registers from thread stack
Return to embedded exception vector code
*/
/* Push interrupt context */
	push_interrupt_context
/*
 * Switch to interrupt stack, interrupt dispatch may enable interrupts causing
 * nesting
 */
	msr spsel, #0
/* Jump into the handler */
	bl .AArch64_Interrupt_Handler
/*
 * Switch back to thread stack, interrupt dispatch should disable interrupts
 * before returning
 */
	msr spsel, #1
/*
 * Check thread dispatch necessary, ISR dispatch disable and thread dispatch
 * disable level.
 */
	cmp x0, #0
	bne .Lno_need_thread_dispatch
	bl _AArch64_Exception_thread_dispatch
.Lno_need_thread_dispatch:
/*
 * SP should be where it was pre-handler (pointing at the exception frame)
 * or something has leaked stack space
 */
/* Pop interrupt context */
	pop_interrupt_context
/* Return to vector for final cleanup */
	ret
As you can see, the interrupt context is saved by push_interrupt_context, which is implemented as follows:
.macro push_interrupt_context
/*
 * Push x1-x21 on to the stack, need 19-21 because they're modified without
 * obeying PCS
 */
	stp lr,  x1,  [sp, #-0x10]!
	stp x2,  x3,  [sp, #-0x10]!
	stp x4,  x5,  [sp, #-0x10]!
	stp x6,  x7,  [sp, #-0x10]!
	stp x8,  x9,  [sp, #-0x10]!
	stp x10, x11, [sp, #-0x10]!
	stp x12, x13, [sp, #-0x10]!
	stp x14, x15, [sp, #-0x10]!
	stp x16, x17, [sp, #-0x10]!
	stp x18, x19, [sp, #-0x10]!
	stp x20, x21, [sp, #-0x10]!
/*
 * Push q0-q31 on to the stack, need everything because parts of every register
 * are volatile/corruptible
 */
	stp q0,  q1,  [sp, #-0x20]!
	stp q2,  q3,  [sp, #-0x20]!
	stp q4,  q5,  [sp, #-0x20]!
	stp q6,  q7,  [sp, #-0x20]!
	stp q8,  q9,  [sp, #-0x20]!
	stp q10, q11, [sp, #-0x20]!
	stp q12, q13, [sp, #-0x20]!
	stp q14, q15, [sp, #-0x20]!
	stp q16, q17, [sp, #-0x20]!
	stp q18, q19, [sp, #-0x20]!
	stp q20, q21, [sp, #-0x20]!
	stp q22, q23, [sp, #-0x20]!
	stp q24, q25, [sp, #-0x20]!
	stp q26, q27, [sp, #-0x20]!
	stp q28, q29, [sp, #-0x20]!
	stp q30, q31, [sp, #-0x20]!
/* Get exception LR for PC and spsr */
	mrs x0, ELR_EL1
	mrs x1, SPSR_EL1
/* Push pc and spsr */
	stp x0, x1, [sp, #-0x10]!
/* Get fpcr and fpsr */
	mrs x0, FPSR
	mrs x1, FPCR
/* Push fpcr and fpsr */
	stp x0, x1, [sp, #-0x10]!
.endm
Its operations are: push LR and x1-x21 onto the thread stack (x19-x21 are needed because they are modified without following the PCS), push q0-q31 (parts of every SIMD register are volatile), read and push ELR_EL1 and SPSR_EL1 (the interrupted PC and PSTATE), and finally read and push FPSR and FPCR.
Next, spsel is set to 0, which switches the stack to sp_el0, and .AArch64_Interrupt_Handler is called:
	msr spsel, #0
	bl .AArch64_Interrupt_Handler
Let's first look at what .AArch64_Interrupt_Handler does:
.AArch64_Interrupt_Handler:
/* Get per-CPU control of current processor */
	GET_SELF_CPU_CONTROL	SELF_CPU_CONTROL_GET_REG
/* Increment interrupt nest and thread dispatch disable level */
	ldr w2, [SELF_CPU_CONTROL, #PER_CPU_ISR_NEST_LEVEL]
	ldr w3, [SELF_CPU_CONTROL, #PER_CPU_THREAD_DISPATCH_DISABLE_LEVEL]
	add w2, w2, #1
	add w3, w3, #1
	str w2, [SELF_CPU_CONTROL, #PER_CPU_ISR_NEST_LEVEL]
	str w3, [SELF_CPU_CONTROL, #PER_CPU_THREAD_DISPATCH_DISABLE_LEVEL]
/* Save LR */
	mov x21, LR
/* Call BSP dependent interrupt dispatcher */
	bl bsp_interrupt_dispatch
/* Restore LR */
	mov LR, x21
/* Load some per-CPU variables */
	ldr w0, [SELF_CPU_CONTROL, #PER_CPU_THREAD_DISPATCH_DISABLE_LEVEL]
	ldrb w1, [SELF_CPU_CONTROL, #PER_CPU_DISPATCH_NEEDED]
	ldr w2, [SELF_CPU_CONTROL, #PER_CPU_ISR_DISPATCH_DISABLE]
	ldr w3, [SELF_CPU_CONTROL, #PER_CPU_ISR_NEST_LEVEL]
/* Decrement levels and determine thread dispatch state */
	eor w1, w1, w0
	sub w0, w0, #1
	orr w1, w1, w0
	orr w1, w1, w2
	sub w3, w3, #1
/* Store thread dispatch disable and ISR nest levels */
	str w0, [SELF_CPU_CONTROL, #PER_CPU_THREAD_DISPATCH_DISABLE_LEVEL]
	str w3, [SELF_CPU_CONTROL, #PER_CPU_ISR_NEST_LEVEL]
/* Return should_skip_thread_dispatch in x0 */
	mov x0, x1
/* Return from handler */
	ret
Its work, which the inline comments already make quite readable, is: get the per-CPU control structure, increment isr_nest_level and thread_dispatch_disable_level, save LR and call bsp_interrupt_dispatch, then reload the per-CPU variables, decrement both levels, and fold dispatch_necessary, the thread dispatch disable level and isr_dispatch_disable into x0 as the "can thread dispatch be skipped" return value.
The four per-CPU fields involved are:
  /**
   * This contains the current interrupt nesting level on this
   * CPU.
   */
  uint32_t isr_nest_level;

  /**
   * @brief Indicates if an ISR thread dispatch is disabled.
   *
   * This flag is context switched with each thread. It indicates that this
   * thread has an interrupt stack frame on its stack. By using this flag, we
   * can avoid nesting more interrupt dispatching attempts on a previously
   * interrupted thread's stack.
   */
  uint32_t isr_dispatch_disable;

  /**
   * @brief The thread dispatch critical section nesting counter which is used
   * to prevent context switches at inopportune moments.
   */
  volatile uint32_t thread_dispatch_disable_level;

  /**
   * @brief This is set to true when this processor needs to run the thread
   * dispatcher.
   *
   * It is volatile since interrupts may alter this flag.
   *
   * This member is not protected by a lock and must be accessed only by this
   * processor. Code (e.g. scheduler and post-switch action requests) running
   * on another processors must use an inter-processor interrupt to set the
   * thread dispatch necessary indicator to true.
   *
   * @see _Thread_Get_heir_and_make_it_executing().
   */
  volatile bool dispatch_necessary;
As analyzed above, the interrupt now lands in bsp_interrupt_dispatch, which looks up the ISR to run; its code is as follows:
void bsp_interrupt_dispatch(void)
{
  while (true) {
    uint32_t icciar = READ_SR(ICC_IAR1);
    rtems_vector_number vector = GIC_CPUIF_ICCIAR_ACKINTID_GET(icciar);
    uint32_t status;

    if (!bsp_interrupt_is_valid_vector(vector)) {
      break;
    }

    status = arm_interrupt_enable_interrupts();
    bsp_interrupt_handler_dispatch_unchecked(vector);
    arm_interrupt_restore_interrupts(status);

    WRITE_SR(ICC_EOIR1, icciar);
  }
}
Its steps are: read ICC_IAR1 to acknowledge the interrupt, extract the vector number from the acknowledge value, stop the loop if the vector is not valid, re-enable interrupts, dispatch to the registered handlers via bsp_interrupt_handler_dispatch_unchecked, restore the previous interrupt mask, and finally write ICC_EOIR1 to signal end of interrupt.
Running the handlers looks like this:
static inline void bsp_interrupt_dispatch_entries(
  const rtems_interrupt_entry *entry
)
{
  do {
    ( *entry->handler )( entry->arg );
    entry = bsp_interrupt_entry_load_acquire( &entry->next );
  } while ( RTEMS_PREDICT_FALSE( entry != NULL ) );
}
Note that the handler here is the timer ISR we registered earlier:
Clock_driver_support_install_isr( Clock_isr );
That is, the Clock_isr function, which accounts the tick and re-arms the timer. Since this post is about interrupt delivery, let's move on to the interrupt return path.
Part of the return path was already visible in _AArch64_Exception_interrupt_no_nest above; to recap: after .AArch64_Interrupt_Handler returns, execution is back in _AArch64_Exception_interrupt_no_nest, the function installed through the vector table, which performs the remaining steps: switch back to the thread stack, run thread dispatch if needed, and restore the interrupt context by popping the registers saved on entry. The restore is done by pop_interrupt_context, implemented as follows:
.macro pop_interrupt_context
/* Pop fpcr and fpsr */
	ldp x0, x1, [sp], #0x10
/* Restore fpcr and fpsr */
	msr FPCR, x1
	msr FPSR, x0
/* Pop pc and spsr */
	ldp x0, x1, [sp], #0x10
/* Restore exception LR for PC and spsr */
	msr SPSR_EL1, x1
	msr ELR_EL1, x0
/* Pop q0-q31 */
	ldp q30, q31, [sp], #0x20
	ldp q28, q29, [sp], #0x20
	ldp q26, q27, [sp], #0x20
	ldp q24, q25, [sp], #0x20
	ldp q22, q23, [sp], #0x20
	ldp q20, q21, [sp], #0x20
	ldp q18, q19, [sp], #0x20
	ldp q16, q17, [sp], #0x20
	ldp q14, q15, [sp], #0x20
	ldp q12, q13, [sp], #0x20
	ldp q10, q11, [sp], #0x20
	ldp q8,  q9,  [sp], #0x20
	ldp q6,  q7,  [sp], #0x20
	ldp q4,  q5,  [sp], #0x20
	ldp q2,  q3,  [sp], #0x20
	ldp q0,  q1,  [sp], #0x20
/* Pop x1-x21 */
	ldp x20, x21, [sp], #0x10
	ldp x18, x19, [sp], #0x10
	ldp x16, x17, [sp], #0x10
	ldp x14, x15, [sp], #0x10
	ldp x12, x13, [sp], #0x10
	ldp x10, x11, [sp], #0x10
	ldp x8,  x9,  [sp], #0x10
	ldp x6,  x7,  [sp], #0x10
	ldp x4,  x5,  [sp], #0x10
	ldp x2,  x3,  [sp], #0x10
	ldp lr,  x1,  [sp], #0x10
/* Must clear reservations here to ensure consistency with atomic operations */
	clrex
.endm
Restoring mirrors saving: pop and restore FPCR/FPSR, pop and restore SPSR_EL1/ELR_EL1, pop q0-q31, pop x1-x21 and LR, and finally clrex to clear any exclusive reservation so atomic operations stay consistent.
When _AArch64_Exception_interrupt_no_nest returns (the ret at its end), execution resumes in the JUMP_HANDLER macro right after the blr we saw earlier, and the remaining actions are:
	blr x0
/* Pop x0,lr from stack */
	ldp x0, lr, [sp], #0x10
/* Return from exception */
	eret
This pops 16 bytes from the stack back into x0 and lr, and then eret returns to the pre-interrupt state by reloading the PC and PSTATE from ELR_EL1 and SPSR_EL1. Since JUMP_HANDLER is a macro, we have to go back up to curr_el_spx_irq to see where that ldp's data came from:
curr_el_spx_irq:
	stp x0, lr, [sp, #-0x10]!	/* Push x0,lr on to the stack */
	bl curr_el_spx_irq_get_pc	/* Get current execution address */
curr_el_spx_irq_get_pc:			/* The current PC is now in LR */
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
As you can see, the earlier stp stored x0 and lr at sp-0x10 and adjusted sp, so what happens right before eret is simply restoring the x0 and lr values from before the interrupt.
At this point one timer interrupt has run to completion and the CPU is back to the state it was in before the interrupt.
The function that actively triggers thread scheduling is _AArch64_Exception_thread_dispatch; its code is:
_AArch64_Exception_thread_dispatch:
/* Get per-CPU control of current processor */
	GET_SELF_CPU_CONTROL	SELF_CPU_CONTROL_GET_REG
/* Thread dispatch */
	mrs NON_VOLATILE_SCRATCH, DAIF

.Ldo_thread_dispatch:
/* Set ISR dispatch disable and thread dispatch disable level to one */
	mov w0, #1
	str w0, [SELF_CPU_CONTROL, #PER_CPU_ISR_DISPATCH_DISABLE]
	str w0, [SELF_CPU_CONTROL, #PER_CPU_THREAD_DISPATCH_DISABLE_LEVEL]
/* Save LR */
	mov x21, LR
/* Call _Thread_Do_dispatch(), this function will enable interrupts */
	mov x0, SELF_CPU_CONTROL
	mov x1, NON_VOLATILE_SCRATCH
	mov x2, #0x80
	bic x1, x1, x2
	bl _Thread_Do_dispatch
/* Restore LR */
	mov LR, x21
/* Disable interrupts */
	msr DAIF, NON_VOLATILE_SCRATCH
#ifdef RTEMS_SMP
	GET_SELF_CPU_CONTROL	SELF_CPU_CONTROL_GET_REG
#endif
/* Check if we have to do the thread dispatch again */
	ldrb w0, [SELF_CPU_CONTROL, #PER_CPU_DISPATCH_NEEDED]
	cmp w0, #0
	bne .Ldo_thread_dispatch
/* We are done with thread dispatching */
	mov w0, #0
	str w0, [SELF_CPU_CONTROL, #PER_CPU_ISR_DISPATCH_DISABLE]
/* Return from thread dispatch */
	ret
There is not much left to explain here: it calls _Thread_Do_dispatch, letting the scheduler switch to the next highest-priority task.
To sum up, this post walked through a complete timer interrupt in detail, which should help in understanding how RTEMS on aarch64, together with GICv3, manages interrupts.
《RTEMS初始化-bootcard调用流程》 already touched on initializing the exception vector table, but its focus was the bootcard call flow. This post looks at RTEMS's interrupt management logic itself, to get a clearer picture of how RTEMS manages interrupts.
The vector table is populated in aarch64-exception-default.S, which defines the global symbol bsp_start_vector_table_begin. Per the aarch64 architecture, the exception vector table has a few basic properties: 16 entries in four groups of four (current EL with SP_EL0, current EL with SP_ELx, lower EL using AArch64, lower EL using AArch32), each group containing Synchronous/IRQ/FIQ/SError entries, and every entry aligned to and sized 0x80 bytes.
So RTEMS fills in the vectors like this:
curr_el_sp0_sync:
	.dword _AArch64_Exception_default
.balign 0x80
curr_el_sp0_irq:
	JUMP_HANDLER
	JUMP_TARGET_SP0
.balign 0x80
curr_el_sp0_fiq:
	JUMP_HANDLER
	JUMP_TARGET_SP0
.balign 0x80
curr_el_sp0_serror:
	JUMP_HANDLER
	JUMP_TARGET_SP0
.balign 0x80
curr_el_spx_sync:
	.dword _AArch64_Exception_default
.balign 0x80
curr_el_spx_irq:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
curr_el_spx_fiq:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
curr_el_spx_serror:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch64_sync:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch64_irq:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch64_fiq:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch64_serror:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch32_sync:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch32_irq:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch32_fiq:
	JUMP_HANDLER
	JUMP_TARGET_SPx
.balign 0x80
lower_el_aarch32_serror:
	JUMP_HANDLER
	JUMP_TARGET_SPx
As you can see, the code above matches the arm64 spec exactly.
This part was already mentioned in 《RTEMS初始化-bootcard调用流程》; here is a brief recap from the GICv3 angle.
bsp_start
  bsp_interrupt_initialize
    bsp_interrupt_facility_initialize
      arm_interrupt_facility_set_exception_handler
        AArch64_set_exception_handler
          AArch64_get_vector_base_address
            char *vbar = VBAR_EL1
            char *cvector_address = vbar + VECTOR_ENTRY_SIZE * exception + VECTOR_POINTER_OFFSET;
GIC initialization consists of two functions, which we go through in turn:
gicv3_init_dist(ARM_GIC_DIST);
gicv3_init_cpu_interface(_SMP_Get_current_processor());
The distributor initialization looks like this:
static void gicv3_init_dist(volatile gic_dist *dist)
{
  uint32_t id_count = gicv3_get_id_count(dist);
  uint32_t id;

  dist->icddcr = GIC_DIST_ICDDCR_ARE_NS | GIC_DIST_ICDDCR_ARE_S
               | GIC_DIST_ICDDCR_ENABLE_GRP1S
               | GIC_DIST_ICDDCR_ENABLE_GRP1NS
               | GIC_DIST_ICDDCR_ENABLE_GRP0;

  for (id = 0; id < id_count; id += 32) {
    /* Disable all interrupts */
    dist->icdicer[id / 32] = 0xffffffff;

    /* Set G1NS */
    dist->icdigr[id / 32] = 0xffffffff;
    dist->icdigmr[id / 32] = 0;
  }

  for (id = 0; id < id_count; ++id) {
    gic_id_set_priority(dist, id, PRIORITY_DEFAULT);
  }

  for (id = 32; id < id_count; ++id) {
    gic_id_set_targets(dist, id, 0x01);
  }
}
The icddcr field corresponds to the GICD_CTLR register and is set as follows:
  dist->icddcr = GIC_DIST_ICDDCR_ARE_NS | GIC_DIST_ICDDCR_ARE_S
               | GIC_DIST_ICDDCR_ENABLE_GRP1S
               | GIC_DIST_ICDDCR_ENABLE_GRP1NS
               | GIC_DIST_ICDDCR_ENABLE_GRP0;
The bits set here are: ARE_NS and ARE_S, which enable affinity routing for the Non-secure and Secure states, and ENABLE_GRP0 / ENABLE_GRP1NS / ENABLE_GRP1S, which enable forwarding of Group 0, Non-secure Group 1 and Secure Group 1 interrupts. In short, this distributor register turns on affinity-based interrupt routing and interrupt distribution.
dist->icdicer[id / 32] = 0xffffffff corresponds to GICD_ICENABLER (the clear-enable register); writing all 1s disables every interrupt first.
dist->icdipr[id] = priority (via gic_id_set_priority) corresponds to GICD_IPRIORITYR, which holds the actual interrupt priority.
dist->icdiptr[id] = targets; corresponds to GICD_ITARGETSR, the interrupt target register (which CPU the interrupt is routed to).
Other registers involved include:
volatile gic_redist *redist = gicv3_get_redist(cpu_index);
which gives the redistributor register base address for this CPU.
In gdb we can check that the vector table base address is bsp_start_vector_table_begin:
0x6d000 <bsp_start_vector_table_begin>
The handler pointer of each vector entry lives at entry + 0x78, because the pointer offset inside a vector entry is 0x78:
#define VECTOR_POINTER_OFFSET 0x78
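A quick sanity check of that arithmetic, using the addresses from the gdb session below (curr_el_sp0_irq is the second 0x80-byte entry, i.e. index 1); the snippet just writes the address calculation out:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
  uintptr_t vbar  = 0x6d000;              /* bsp_start_vector_table_begin   */
  uintptr_t entry = vbar + 1 * 0x80;      /* curr_el_sp0_irq, entry index 1 */
  uintptr_t slot  = entry + 0x78;         /* VECTOR_POINTER_OFFSET          */
  printf("%#lx\n", (unsigned long)slot);  /* 0x6d0f8, matching gdb below    */
  return 0;
}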
For sp0_irq, that slot (0x6d0f8) holds 0x6dc98, which is the entry point _AArch64_Exception_interrupt_nest:
(gdb) x curr_el_sp0_irq + 0x78
0x6d0f8 <curr_el_sp0_irq_get_pc+112>:	0x000000000006dc98
(gdb) x 0x000000000006dc98
0x6dc98 <_AArch64_Exception_interrupt_nest>:
For spx_irq, the slot holds 0x6ddac, which is the entry point _AArch64_Exception_interrupt_no_nest:
(gdb) x curr_el_spx_irq + 0x78
0x6d2f8 <curr_el_spx_irq_get_pc+112>:	0x000000000006ddac
(gdb) x 0x000000000006ddac
0x6ddac <_AArch64_Exception_interrupt_no_nest>:
For the SP0 vectors that have no dedicated entry installed, the macro is:
.macro JUMP_TARGET_SP0
	.dword .print_exception_dump_sp0
.endm
Taking curr_el_sp0_fiq as an example, the target is .print_exception_dump_sp0; the others are similar:
(gdb) x curr_el_sp0_fiq + 0x78
0x6d178 <curr_el_sp0_fiq_get_pc+112>:	0x000000000006d844
(gdb) x 0x000000000006d844
0x6d844 <.print_exception_dump_sp0>:
For the SPx vectors with no dedicated entry, the macro is:
.macro JUMP_TARGET_SPx
	.dword .print_exception_dump_spx
.endm
Taking curr_el_spx_fiq as an example, the target is .print_exception_dump_spx; note that the address of .print_exception_dump_spx equals bsp_start_vector_table_end. The others are similar:
(gdb) x curr_el_spx_fiq + 0x78
0x6d378 <curr_el_spx_fiq_get_pc+112>:	0x000000000006d800
(gdb) x 0x000000000006d800
0x6d800 <bsp_start_vector_table_end>:
This covers the RTEMS exception vector table and the GICv3 interrupt initialization; the next post continues from the angle of interrupt triggering to look further at interrupt management.
I won't rehash what a memory leak is here. In practice, much of what we debug is not actually a leak: the allocated memory really is needed, or some mechanism allocates memory and only releases it when the owning object is destroyed. To understand that kind of memory usage — and then reduce it — the massif tool can profile where the memory goes, so the problem can be solved together with the code.
Installation is simple; we need the valgrind and massif-visualizer tools:
apt install massif-visualizer valgrind
massif is loaded through valgrind, with a command like:
valgrind --tool=massif ./ukui-tablet-desktop
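If you want to try massif on something smaller first, a toy program like the one below (purely hypothetical, unrelated to ukui-tablet-desktop) produces an easy-to-read profile:

#include <stdlib.h>
#include <string.h>

/* Allocate a batch of blocks and free only half of them, so the massif graph
 * shows a clear peak followed by a drop. */
int main(void)
{
  enum { N = 64, SZ = 1 << 20 };        /* 64 blocks of 1 MiB */
  char *blocks[N];

  for (int i = 0; i < N; ++i) {
    blocks[i] = malloc(SZ);
    if (blocks[i] == NULL) {
      return 1;
    }
    memset(blocks[i], 0, SZ);           /* touch the pages */
  }
  for (int i = 0; i < N; i += 2) {      /* free every other block */
    free(blocks[i]);
  }
  return 0;
}

Compile it and run valgrind --tool=massif on the binary in exactly the same way.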
After the program has run, exit it; a massif.out.<pid> file is written to the current directory, and that file is what we analyze.
Analysis is done with massif-visualizer, for example:
massif-visualizer massif.out.31446
The visualizer then shows a window like this:
Two pieces of information matter here: the memory-usage-over-time curve with its peak, and the per-snapshot allocation breakdown on the right.
To make things easier to read I widened the right-hand pane, which makes it obvious exactly where the memory is going:
Take the 72 MB peak as an example: at that snapshot, rendering uses 24 MB, QImage image loading 11 MB and QImage construction 2 MB. These are all real memory uses, and expanding each entry shows the call chain.
By default massif attributes memory usage to the application's own functions. If you want it attributed to the low-level allocation entry points (malloc, calloc, realloc, memalign, new, ...) at page granularity, add the --pages-as-heap=yes option; the focus then shifts to glibc, or to your own memory pool.
Why look at it this way? At this point the problem is no longer a leak but plain memory consumption; without understanding the underlying memory management, the only ways to reduce usage are to reuse memory, use a pool, or implement your own allocator. That is exactly what --pages-as-heap=yes is for.
Here is a demonstration.
First collect the log:
valgrind --tool=massif --pages-as-heap=yes ./ukui-tablet-desktop
After the program exits, open the massif log in the viewer; here I expand the peak snapshot directly:
Now the glibc memory-management details are visible: memory attributed to low-level functions such as mmap, arena, tcache, malloc, free and dl_map_segments. With this information we can tune memory use to match how the allocator behaves, for example keeping hot allocations in fastbin-sized chunks or letting large allocations go straight to mmap, to reduce fragmentation.
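glibc also exposes a few knobs for exactly this kind of tuning; the sketch below only illustrates the mechanism, and the threshold values are made up for the example:

#include <malloc.h>

int main(void)
{
  /* Allocations above 64 KiB bypass the heap and go straight to mmap, and
   * freed memory above 128 KiB at the top of the heap is returned to the
   * kernel more eagerly. Values are illustrative only. */
  mallopt(M_MMAP_THRESHOLD, 64 * 1024);
  mallopt(M_TRIM_THRESHOLD, 128 * 1024);

  /* ... application code ... */
  return 0;
}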
For a quick overview of how glibc's allocator works, see my earlier articles.
Not every memory problem is a leak; very often the memory usage is simply large even though the memory is genuinely needed. In that case massif is the tool for examining the memory layout, and it can be used from two angles: from the application side, using the report together with the code for a first round of tuning; and from the memory-management side, using the report together with glibc's strategy (or your own allocator's) to improve small-allocation utilization, avoid repeatedly allocating large blocks, and reduce fragmentation.
That is massif in a nutshell; it will come in very handy later when the problem turns out not to be a leak.
For further massif reference material see the articles below; I won't second-hand the underlying theory here.