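Every machine depends on memory to run. Normally we have no reason to suspect the memory hardware, but in the RAS (Reliability, Availability, Serviceability) field that assumption cannot be made: memory is in fact prone to many kinds of faults, from large-scale damage to single-bit flips. This article does not cover large-scale memory damage, which is an unrepairable defect. Instead it focuses on one case: data corrupted by a single-bit flip in memory, and the hardware and software measures that deal with it on aarch64 chips.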

I. Hardware error correction schemes

1.1 Parity

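Parity means parity checking. It has long been used on simple serial links in early microcontroller systems, such as UART, so most readers will already be familiar with it; here is a brief recap.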

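Parity checking adds one check bit to a group of data bits. The check bit reflects the number of 1s in the data: it is 1 if the count is odd and 0 if the count is even. This is even parity: the data bits plus the check bit always contain an even number of 1s.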

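If a single bit flips during transmission, the number of 1s no longer matches the check bit, so the error is detected. Parity cannot tell which bit flipped, however, and if two bits flip the count stays consistent and the error goes unnoticed.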

1.2 ECC

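ECC stands for Error-Correcting Code (as in "ECC memory"). Parity can flag some errors in simple data transfers but cannot repair them; ECC is a hardware technique that can also correct bit-flip errors, and it is widely used in modern memory, especially server-grade memory. There are several ECC algorithms; a few are listed below: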

SECDED (Single Error Correction, Double Error Detection)
SSCDSD (Single Symbol Correction, Double Symbol Detection)
CRC (Cyclic Redundancy Check)
Chipkill

II. Software handling

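With the common hardware error correction schemes for memory covered, we also need to understand how software handles these ECC errors in a standardized way.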

On the Arm architecture, software handling centers on two mechanisms:

ESB: Error Synchronization Barrier
SDEI: Software Delegated Exception Interface

2.1 ESB

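For this class of errors Arm defines a dedicated instruction, ESB (Error Synchronization Barrier). It is a barrier that synchronizes errors which would otherwise be reported asynchronously as SError interrupts.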

2.1.1 What ESB does

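According to the Arm RAS specification, ESB acts as an error synchronization barrier: if an SError is pending and masked when ESB executes, the SError is deferred and its syndrome is recorded in a special register, DISR_EL1 (Deferred Interrupt Status Register), which can be read at EL1 and EL2.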

ESB only has this effect when the RAS extension is implemented; otherwise it executes as a NOP.

2.1.2 Code walkthrough

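On arm64 Linux, ECC/parity errors hit during memory accesses are received by default through the memory-abort (mm fault) path. The flow is as follows.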

Start with the exception vector table:


SYM_CODE_START(vectors)
        kernel_ventry   1, sync_invalid                 // Synchronous EL1t
        kernel_ventry   1, irq_invalid                  // IRQ EL1t
        kernel_ventry   1, fiq_invalid                  // FIQ EL1t
        kernel_ventry   1, error_invalid                // Error EL1t

        kernel_ventry   1, sync                         // Synchronous EL1h
        kernel_ventry   1, irq                          // IRQ EL1h
        kernel_ventry   1, fiq_invalid                  // FIQ EL1h
        kernel_ventry   1, error                        // Error EL1h

        kernel_ventry   0, sync                         // Synchronous 64-bit EL0
        kernel_ventry   0, irq                          // IRQ 64-bit EL0
        kernel_ventry   0, fiq_invalid                  // FIQ 64-bit EL0
        kernel_ventry   0, error                        // Error 64-bit EL0

#ifdef CONFIG_COMPAT
        kernel_ventry   0, sync_compat, 32              // Synchronous 32-bit EL0
        kernel_ventry   0, irq_compat, 32               // IRQ 32-bit EL0
        kernel_ventry   0, fiq_invalid_compat, 32       // FIQ 32-bit EL0
        kernel_ventry   0, error_compat, 32             // Error 32-bit EL0
#else
        kernel_ventry   0, sync_invalid, 32             // Synchronous 32-bit EL0
        kernel_ventry   0, irq_invalid, 32              // IRQ 32-bit EL0
        kernel_ventry   0, fiq_invalid, 32              // FIQ 32-bit EL0
        kernel_ventry   0, error_invalid, 32            // Error 32-bit EL0
#endif
SYM_CODE_END(vectors)

We take the EL0 synchronous exception as the example, because synchronous memory errors are delivered as synchronous (sync) exceptions:


kernel_ventry   0, sync                         // Synchronous 64-bit EL0

The corresponding entry code is:


SYM_CODE_START_LOCAL_NOALIGN(el0_sync)
 kernel_entry 0
 mov x0, sp
 bl el0_sync_handler
 b ret_to_user
SYM_CODE_END(el0_sync)

It calls el0_sync_handler, implemented as follows:


asmlinkage void noinstr el0_sync_handler(struct pt_regs *regs)
{
        unsigned long esr = read_sysreg(esr_el1);
 
        switch (ESR_ELx_EC(esr)) {
        case ESR_ELx_EC_SVC64:
                el0_svc(regs);
                break;
        case ESR_ELx_EC_DABT_LOW:
                el0_da(regs, esr);
                break;
        case ESR_ELx_EC_IABT_LOW:
                el0_ia(regs, esr);
                break;
        case ESR_ELx_EC_FP_ASIMD:
                el0_fpsimd_acc(regs, esr);
                break;
        case ESR_ELx_EC_SVE:
                el0_sve_acc(regs, esr);
                break;
        case ESR_ELx_EC_FP_EXC64:
                el0_fpsimd_exc(regs, esr);
                break;
        case ESR_ELx_EC_SYS64:
        case ESR_ELx_EC_WFx:
                el0_sys(regs, esr);
                break;
        case ESR_ELx_EC_SP_ALIGN:
                el0_sp(regs, esr);
                break;
        case ESR_ELx_EC_PC_ALIGN:
                el0_pc(regs, esr);
                break;
        case ESR_ELx_EC_UNKNOWN:
                el0_undef(regs);
                break;
        case ESR_ELx_EC_BTI:
                el0_bti(regs);
                break;
        case ESR_ELx_EC_BREAKPT_LOW:
        case ESR_ELx_EC_SOFTSTP_LOW:
        case ESR_ELx_EC_WATCHPT_LOW:
        case ESR_ELx_EC_BRK64:
                el0_dbg(regs, esr);
                break;
        case ESR_ELx_EC_FPAC:
                el0_fpac(regs, esr);
                break;
        default:
                el0_inv(regs, esr);
        }
}

We are interested in data aborts, so the relevant case is:


case ESR_ELx_EC_DABT_LOW:
 el0_da(regs, esr);
 break;

It is implemented as:


static void noinstr el0_da(struct pt_regs *regs, unsigned long esr)
{
        unsigned long far = read_sysreg(far_el1);
 
        enter_from_user_mode();
        local_daif_restore(DAIF_PROCCTX);
        do_mem_abort(far, esr, regs);
}

Next, the implementation of do_mem_abort, which el0_da calls:


void do_mem_abort(unsigned long far, unsigned int esr, struct pt_regs *regs)
{
        const struct fault_info *inf = esr_to_fault_info(esr);
        unsigned long addr = untagged_addr(far);
 
        if (!inf->fn(far, esr, regs))
                return;
 
        if (!user_mode(regs)) {
                pr_alert("Unhandled fault at 0x%016lx\n", addr);
                trace_android_rvh_do_mem_abort(regs, esr, addr, inf->name);
                mem_abort_decode(esr);
                show_pte(addr);
        }
 
        /*
         * At this point we have an unrecognized fault type whose tag bits may
         * have been defined as UNKNOWN. Therefore we only expose the untagged
         * address to the signal handler.
         */
        arm64_notify_die(inf->name, regs, inf->sig, inf->code, addr, esr);
}

Note esr_to_fault_info, which uses the fault status code (FSC) in the low bits of ESR as an index into a table:


static inline const struct fault_info *esr_to_fault_info(unsigned int esr)
{
        return fault_info + (esr & ESR_ELx_FSC);
}

So the core of the dispatch is the fault_info array:


static const struct fault_info fault_info[] = {
        { do_bad,               SIGKILL, SI_KERNEL,     "ttbr address size fault"       },
        { do_bad,               SIGKILL, SI_KERNEL,     "level 1 address size fault"    },
        { do_bad,               SIGKILL, SI_KERNEL,     "level 2 address size fault"    },
        { do_bad,               SIGKILL, SI_KERNEL,     "level 3 address size fault"    },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 0 translation fault"     },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 1 translation fault"     },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 2 translation fault"     },
        { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 3 translation fault"     },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 8"                     },
        { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 1 access flag fault"     },
        { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault"     },
        { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 3 access flag fault"     },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 12"                    },
        { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 1 permission fault"      },
        { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 2 permission fault"      },
        { do_page_fault,        SIGSEGV, SEGV_ACCERR,   "level 3 permission fault"      },
        { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous external abort"    },
        { do_tag_check_fault,   SIGSEGV, SEGV_MTESERR,  "synchronous tag check fault"   },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 18"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 19"                    },
        { do_sea,               SIGKILL, SI_KERNEL,     "level 0 (translation table walk)"      },
        { do_sea,               SIGKILL, SI_KERNEL,     "level 1 (translation table walk)"      },
        { do_sea,               SIGKILL, SI_KERNEL,     "level 2 (translation table walk)"      },
        { do_sea,               SIGKILL, SI_KERNEL,     "level 3 (translation table walk)"      },
        { do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 25"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 26"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 27"                    },
        { do_sea,               SIGKILL, SI_KERNEL,     "level 0 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
        { do_sea,               SIGKILL, SI_KERNEL,     "level 1 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
        { do_sea,               SIGKILL, SI_KERNEL,     "level 2 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
        { do_sea,               SIGKILL, SI_KERNEL,     "level 3 synchronous parity error (translation table walk)"     },      // Reserved when RAS is implemented
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 32"                    },
        { do_alignment_fault,   SIGBUS,  BUS_ADRALN,    "alignment fault"               },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 34"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 35"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 36"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 37"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 38"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 39"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 40"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 41"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 42"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 43"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 44"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 45"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 46"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 47"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "TLB conflict abort"            },
        { do_bad,               SIGKILL, SI_KERNEL,     "Unsupported atomic hardware update fault"      },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 50"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 51"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "implementation fault (lockdown abort)" },
        { do_bad,               SIGBUS,  BUS_OBJERR,    "implementation fault (unsupported exclusive)" },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 54"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 55"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 56"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 57"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 58"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 59"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 60"                    },
        { do_bad,               SIGKILL, SI_KERNEL,     "section domain fault"          },
        { do_bad,               SIGKILL, SI_KERNEL,     "page domain fault"             },
        { do_bad,               SIGKILL, SI_KERNEL,     "unknown 63"                    },
};

The entry for ECC and parity errors is:


{ do_sea,               SIGBUS,  BUS_OBJERR,    "synchronous parity or ECC error" },    // Reserved when RAS is implemented

So a typical ECC/parity error is dispatched to do_sea. To see how software behaves once it receives the error, look at arm64_notify_die, through which do_sea eventually reports it:


void arm64_notify_die(const char *str, struct pt_regs *regs,
                      int signo, int sicode, unsigned long far,
                      int err)
{
        if (user_mode(regs)) {
                WARN_ON(regs != current_pt_regs());
                current->thread.fault_address = 0;
                current->thread.fault_code = err;
 
                arm64_force_sig_fault(signo, sicode, far, str);
        } else {
                die(str, regs, err);
        }
}

It distinguishes between user space and kernel space.

For user space it calls arm64_force_sig_fault, which delivers the signal taken from the fault_info entry, SIGBUS in this case:


void arm64_force_sig_fault(int signo, int code, unsigned long far,
                           const char *str)
{
        arm64_show_signal(signo, str);
        if (signo == SIGKILL)
                force_sig(SIGKILL);
        else
                force_sig_fault(signo, code, (void __user *)far);
}

force_sig_fault belongs to the core signal implementation and is not analyzed here.

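For kernel space it calls die(), which prints an oops; if the fault happened in interrupt context, or if panic_on_oops is set, the kernel panics. An abridged version of die():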


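void die(const char *str, struct pt_regs *regs, int err)
{
        /* ... abridged: take die_lock, oops_enter(), print the oops via __die() ... */

        oops_exit();

        if (in_interrupt())
                panic("%s: Fatal exception in interrupt", str);
        if (panic_on_oops)
                panic("%s: Fatal exception", str);

        /* ... abridged: release die_lock, then kill the current task ... */
}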

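In summary: when an ECC error is hit, it is reported to the aarch64 core as a synchronous exception. Taking EL0 as the example, the exception vector table routes it to do_sea, which checks where the error occurred. If it occurred in user space, the program is terminated with SIGBUS (bus error); if it occurred in the kernel, the kernel oopses.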

2.2 SDEI

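SDEI is a software interface defined by Arm. As its full name, Software Delegated Exception Interface, suggests, firmware delegates exceptions to software: the non-secure world (the OS or hypervisor) registers callbacks for events, and firmware invokes those callbacks when an event, such as a RAS error, occurs.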

2.2.1 How SDEI works

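The SDEI specification places the dispatcher in the secure world (firmware). The client first registers a handler for an event with SDEI_EVENT_REGISTER, enables the event with SDEI_EVENT_ENABLE, and unmasks the PE with SDEI_PE_UNMASK. When the event fires, the dispatcher enters the registered handler, passing the event number, the registration argument, and the interrupted PC and PSTATE in x0 to x3. The handler finishes by calling SDEI_EVENT_COMPLETE to return to the interrupted context, or SDEI_EVENT_COMPLETE_AND_RESUME to resume at a different address.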

2.2.2 Code walkthrough

Start with the entry trampoline:


SYM_CODE_START(__sdei_asm_entry_trampoline)
        mrs     x4, ttbr1_el1
        tbz     x4, #USER_ASID_BIT, 1f

        tramp_map_kernel tmp=x4
        isb
        mov     x4, xzr

        /*
         * Use reg->interrupted_regs.addr_limit to remember whether to unmap
         * the kernel on exit.
         */
1:      str     x4, [x1, #(SDEI_EVENT_INTREGS + S_ORIG_ADDR_LIMIT)]

        tramp_data_read_var     x4, __sdei_asm_handler
        br      x4
SYM_CODE_END(__sdei_asm_entry_trampoline)

It branches to __sdei_asm_handler, implemented as follows:


/*
 * Software Delegated Exception entry point.
 *
 * x0: Event number
 * x1: struct sdei_registered_event argument from registration time.
 * x2: interrupted PC
 * x3: interrupted PSTATE
 * x4: maybe clobbered by the trampoline
 *
 * Firmware has preserved x0->x17 for us, we must save/restore the rest to
 * follow SMC-CC. We save (or retrieve) all the registers as the handler may
 * want them.
 */
 
 
SYM_CODE_START(__sdei_asm_handler)
        stp     x2, x3, [x1, #SDEI_EVENT_INTREGS + S_PC]
        stp     x4, x5, [x1, #SDEI_EVENT_INTREGS + 16 * 2]
        stp     x6, x7, [x1, #SDEI_EVENT_INTREGS + 16 * 3]
        stp     x8, x9, [x1, #SDEI_EVENT_INTREGS + 16 * 4]
        stp     x10, x11, [x1, #SDEI_EVENT_INTREGS + 16 * 5]
        stp     x12, x13, [x1, #SDEI_EVENT_INTREGS + 16 * 6]
        stp     x14, x15, [x1, #SDEI_EVENT_INTREGS + 16 * 7]
        stp     x16, x17, [x1, #SDEI_EVENT_INTREGS + 16 * 8]
        stp     x18, x19, [x1, #SDEI_EVENT_INTREGS + 16 * 9]
        stp     x20, x21, [x1, #SDEI_EVENT_INTREGS + 16 * 10]
        stp     x22, x23, [x1, #SDEI_EVENT_INTREGS + 16 * 11]
        stp     x24, x25, [x1, #SDEI_EVENT_INTREGS + 16 * 12]
        stp     x26, x27, [x1, #SDEI_EVENT_INTREGS + 16 * 13]
        stp     x28, x29, [x1, #SDEI_EVENT_INTREGS + 16 * 14]
        mov     x4, sp
        stp     lr, x4, [x1, #SDEI_EVENT_INTREGS + S_LR]
 
        mov     x19, x1
 
        /* Store the registered-event for crash_smp_send_stop() */
        ldrb    w4, [x19, #SDEI_EVENT_PRIORITY]
        cbnz    w4, 1f
        adr_this_cpu dst=x5, sym=sdei_active_normal_event, tmp=x6
        b       2f
1:      adr_this_cpu dst=x5, sym=sdei_active_critical_event, tmp=x6
2:      str     x19, [x5]
#ifdef CONFIG_VMAP_STACK
        /*
         * entry.S may have been using sp as a scratch register, find whether
         * this is a normal or critical event and switch to the appropriate
         * stack for this CPU.
         */
        cbnz    w4, 1f
        ldr_this_cpu dst=x5, sym=sdei_stack_normal_ptr, tmp=x6
        b       2f
1:      ldr_this_cpu dst=x5, sym=sdei_stack_critical_ptr, tmp=x6
2:      mov     x6, #SDEI_STACK_SIZE
        add     x5, x5, x6
        mov     sp, x5
#endif
 
#ifdef CONFIG_SHADOW_CALL_STACK
        /* Use a separate shadow call stack for normal and critical events */
        cbnz    w4, 3f
        ldr_this_cpu dst=scs_sp, sym=sdei_shadow_call_stack_normal_ptr, tmp=x6
        b       4f
3:      ldr_this_cpu dst=scs_sp, sym=sdei_shadow_call_stack_critical_ptr, tmp=x6
4:
#endif
 
        /*
         * We may have interrupted userspace, or a guest, or exit-from or
         * return-to either of these. We can't trust sp_el0, restore it.
         */
        mrs     x28, sp_el0
        ldr_this_cpu    dst=x0, sym=__entry_task, tmp=x1
        msr     sp_el0, x0
 
        /* If we interrupted the kernel point to the previous stack/frame. */
        and     x0, x3, #0xc
        mrs     x1, CurrentEL
        cmp     x0, x1
        csel    x29, x29, xzr, eq       // fp, or zero
        csel    x4, x2, xzr, eq         // elr, or zero
 
        stp     x29, x4, [sp, #-16]!
        mov     x29, sp
 
        add     x0, x19, #SDEI_EVENT_INTREGS
        mov     x1, x19
        bl      __sdei_handler
 
        msr     sp_el0, x28
        /* restore regs >x17 that we clobbered */
        mov     x4, x19         // keep x4 for __sdei_asm_exit_trampoline
        ldp     x28, x29, [x4, #SDEI_EVENT_INTREGS + 16 * 14]
        ldp     x18, x19, [x4, #SDEI_EVENT_INTREGS + 16 * 9]
        ldp     lr, x1, [x4, #SDEI_EVENT_INTREGS + S_LR]
        mov     sp, x1
 
        mov     x1, x0                  // address to complete_and_resume
        /* x0 = (x0 <= 1) ? EVENT_COMPLETE:EVENT_COMPLETE_AND_RESUME */
        cmp     x0, #1
        mov_q   x2, SDEI_1_0_FN_SDEI_EVENT_COMPLETE
        mov_q   x3, SDEI_1_0_FN_SDEI_EVENT_COMPLETE_AND_RESUME
        csel    x0, x2, x3, ls
 
        ldr_l   x2, sdei_exit_mode
 
        /* Clear the registered-event seen by crash_smp_send_stop() */
        ldrb    w3, [x4, #SDEI_EVENT_PRIORITY]
        cbnz    w3, 1f
        adr_this_cpu dst=x5, sym=sdei_active_normal_event, tmp=x6
        b       2f
1:      adr_this_cpu dst=x5, sym=sdei_active_critical_event, tmp=x6
2:      str     xzr, [x5]
 
alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0
        sdei_handler_exit exit_mode=x2
alternative_else_nop_endif
 
#ifdef CONFIG_UNMAP_KERNEL_AT_EL0
        tramp_alias     dst=x5, sym=__sdei_asm_exit_trampoline, tmp=x3
        br      x5
#endif
SYM_CODE_END(__sdei_asm_handler)
NOKPROBE(__sdei_asm_handler)

The key step here is the call:


bl __sdei_handler

It is implemented as:


asmlinkage noinstr unsigned long
__sdei_handler(struct pt_regs *regs, struct sdei_registered_event *arg)
{
        unsigned long ret;
 
        arm64_enter_nmi(regs);
 
        ret = _sdei_handler(regs, arg);
 
        arm64_exit_nmi(regs);
 
        return ret;
}

_sdei_handler processes the event following the SDEI event-handler model:


static __kprobes unsigned long _sdei_handler(struct pt_regs *regs,
                                             struct sdei_registered_event *arg)
{
        u32 mode;
        int i, err = 0;
        int clobbered_registers = 4;
        u64 elr = read_sysreg(elr_el1);
        u32 kernel_mode = read_sysreg(CurrentEL) | 1;   /* +SPSel */
        unsigned long vbar = read_sysreg(vbar_el1);
 
        if (arm64_kernel_unmapped_at_el0())
                clobbered_registers++;
 
        /* Retrieve the missing registers values */
        for (i = 0; i < clobbered_registers; i++) {
                /* from within the handler, this call always succeeds */
                sdei_api_event_context(i, &regs->regs[i]);
        }
 
        /*
         * We didn't take an exception to get here, set PAN. UAO will be cleared
         * by sdei_event_handler()s force_uaccess_begin() call.
         */
        __uaccess_enable_hw_pan();
 
        err = sdei_event_handler(regs, arg);
        if (err)
                return SDEI_EV_FAILED;
 
        if (elr != read_sysreg(elr_el1)) {
                /*
                 * We took a synchronous exception from the SDEI handler.
                 * This could deadlock, and if you interrupt KVM it will
                 * hyp-panic instead.
                 */
                pr_warn("unsafe: exception during handler\n");
        }
 
        mode = regs->pstate & (PSR_MODE32_BIT | PSR_MODE_MASK);
 
        /*
         * If we interrupted the kernel with interrupts masked, we always go
         * back to wherever we came from.
         */
        if (mode == kernel_mode && !interrupts_enabled(regs))
                return SDEI_EV_HANDLED;
 
        /*
         * Otherwise, we pretend this was an IRQ. This lets user space tasks
         * receive signals before we return to them, and KVM to invoke it's
         * world switch to do the same.
         *
         * See DDI0487B.a Table D1-7 'Vector offsets from vector table base
         * address'.
         */
        if (mode == kernel_mode)
                return vbar + 0x280;
        else if (mode & PSR_MODE32_BIT)
                return vbar + 0x680;
 
        return vbar + 0x480;
}

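The key function here is sdei_event_handler, which lives in the SDEI firmware driver (drivers/firmware/arm_sdei.c), used on both ACPI and device-tree (FDT) systems: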


int sdei_event_handler(struct pt_regs *regs,
                       struct sdei_registered_event *arg)
{
        int err;
        mm_segment_t orig_addr_limit;
        u32 event_num = arg->event_num;
 
        /*
         * Save restore 'fs'.
         * The architecture's entry code save/restores 'fs' when taking an
         * exception from the kernel. This ensures addr_limit isn't inherited
         * if you interrupted something that allowed the uaccess routines to
         * access kernel memory.
         * Do the same here because this doesn't come via the same entry code.
        */
        orig_addr_limit = force_uaccess_begin();
 
        err = arg->callback(event_num, regs, arg->callback_arg);
        if (err)
                pr_err_ratelimited("event %u on CPU %u failed with error: %d\n",
                                   event_num, smp_processor_id(), err);
 
        force_uaccess_end(orig_addr_limit);
 
        return err;
}
NOKPROBE_SYMBOL(sdei_event_handler);

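From here on, the flow follows the interaction defined by SDEI: the registered callback runs, and the assembly handler then completes the event with SDEI_EVENT_COMPLETE or SDEI_EVENT_COMPLETE_AND_RESUME.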

III. rasdaemon

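While reviewing this code and studying SDEI, I came across the rasdaemon project. Its goal is to capture common RAS errors with a user-space daemon, including the single-bit memory flips discussed here.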

rasdaemon can count memory errors and make the statistics available to the user.

Unfortunately, I did not find equivalent decoding for aarch64 in rasdaemon, but the AMD implementation looks like this:


parse_amd_smca_event--->decode_smca_error

Take one of the description tables in smca_mce_descs as an example:


static const char * const smca_smu2_mce_desc[] = {
    "High SRAM ECC or parity error",
    "Low SRAM ECC or parity error",
    "Data Cache Bank A ECC or parity error",
    "Data Cache Bank B ECC or parity error",
    "Data Tag Cache Bank A ECC or parity error",
    "Data Tag Cache Bank B ECC or parity error",
    "Instruction Cache Bank A ECC or parity error",
    "Instruction Cache Bank B ECC or parity error",
    "Instruction Tag Cache Bank A ECC or parity error",
    "Instruction Tag Cache Bank B ECC or parity error",
    "System Hub Read Buffer ECC or parity error",
    "PHY RAS ECC Error",
    [12 ... 57] = "Reserved",
    "A correctable error from a GFX Sub-IP",
    "A fatal error from a GFX Sub-IP",
    "Reserved",
    "Reserved",
    "A poison error from a GFX Sub-IP",
    "Reserved",
};

As the table shows, these descriptions cover ECC and parity errors.

Since I have no such hardware to test on, I have not tried to verify rasdaemon.

IV. References

memoryerrors-asplos15.pdf

ARM_DDI_0587C_a_RAS.pdf

ARM_DEN0054C_Software_Delegated_Exception_Interface.pdf
