2024-01-16

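[jailhouse] Documentation overview

CPU partitioning is becoming more and more popular. To learn Jailhouse, I started with its documentation; this post records what I picked up along the way.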

Publicly Announced

https://lwn.net/Articles/574273/ https://lwn.net/Articles/574274/

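Jailhouse is a partitioning hypervisor that can run "bare-metal applications or non-Linux OSes aside a standard Linux kernel", that is, bare-metal programs or non-Linux operating systems next to a standard Linux.
"we expect to see in non-Linux cells are applications with highly demanding real-time, safety or security requirements"
Its goal is to run hard real-time, high-security, high-reliability programs on the non-Linux side.
If development goes well, I personally feel this vision should be within reach in the next couple of years, although a full ten years have already passed.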

Understanding Jailhouse, part 1

https://lwn.net/Articles/578295/

"Jailhouse is different. First of all, it is a partitioning hypervisor that is more concerned with isolation than virtualization"
Jailhouse cares more about isolation than about virtualization.
"Each cell runs one guest and has a set of assigned resources (CPUs, memory regions, PCI devices) that it fully controls"
A cell has full control over its CPUs, memory regions, and PCI devices.
"the Linux cell doesn't assert full control over hardware resources as dom0 does"
"when a new cell is created, the Linux cell cedes control over some of its CPU, device, and memory resources to that new cell"
Unlike the privileged domain of other hypervisors, the Linux cell does not keep control of all hardware; it hands resources over to the new cell, which then controls them fully.

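Jailhouse data structures

struct jailhouse_system: descriptor of the running system
1. hypervisor_memory: the memory used by Jailhouse itself
2. config_memory: hardware configuration information
3. system: the cell descriptor with the initial configuration of the Linux side

struct jailhouse_cell_desc has the following members:
1. the cell name
2. cpu_set: the cell's CPU set (its size)
3. mem_regions: memory regions
4. irq_lines: IRQ numbers
5. I/O bitmap
6. pci_devices: PCI devices

Jailhouse has no human-friendly configuration files. The configuration is a raw binary produced with objcopy from compiled C structures; XML may be used in the future.

struct per_cpu describes an assigned CPU (cpu_data) and contains:
cpu_id, apic_id, the stack, a reference to the cell structure, registers, and the CPU mode.

struct jailhouse_header describes the hypervisor as a whole and contains:
the start of the hypervisor binary, the memory size, the page offset, and the number of CPUs.

The three structures (jailhouse_system, jailhouse_cell_desc, jailhouse_header) deserve a detailed analysis, which I will add later.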
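To make the relationship between these structures easier to picture, here is a minimal C sketch. Every type and field name prefixed with sketch_ is hypothetical, and the layout is an assumption based only on the notes above (a descriptor followed by its variable-length data in one raw binary blob); it is not copied from the Jailhouse headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout only; NOT the real Jailhouse definitions. */
struct sketch_memory {
    uint64_t phys_start;
    uint64_t size;
};

struct sketch_cell_desc {
    char name[32];
    uint32_t cpu_set_size;        /* bytes of CPU bitmap trailing the descriptor */
    uint32_t num_memory_regions;
    uint32_t num_irq_lines;
    uint32_t pio_bitmap_size;     /* bytes of I/O bitmap trailing the descriptor */
    uint32_t num_pci_devices;
};

struct sketch_system {
    struct sketch_memory hypervisor_memory;
    struct sketch_memory config_memory;
    struct sketch_cell_desc system;   /* describes the Linux cell */
};

/*
 * Assumed blob layout: descriptor, then CPU set, then memory regions,
 * then the I/O bitmap. Returns the number of bytes covered by those parts.
 */
static size_t sketch_cell_blob_size(const struct sketch_cell_desc *d)
{
    return sizeof(*d)
         + d->cpu_set_size
         + (size_t)d->num_memory_regions * sizeof(struct sketch_memory)
         + d->pio_bitmap_size;
}
```

A flat, pointer-free layout like this is what makes it possible to ship the configuration as a raw binary and copy it around without fixing up addresses.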

Enabling Jailhouse

"Jailhouse operates in a physically continuous memory region"
Jailhouse needs a physically contiguous block of memory.

"using the 'memmap=' kernel command-line parameter"
This block is reserved through the kernel command line.

"future versions may use the contiguous memory allocator (CMA)"
CMA may be used directly in the future.

"the loader linearly maps this memory into the kernel's virtual address space. Its offset from the memory region's base address is stored in the page_offset field of the header"
The relationship between physical and virtual addresses is kept in page_offset (struct jailhouse_header).

"For these tasks, the jailhouse user-space tool issues ioctl() commands to /dev/jailhouse"
The user-space tool issues ioctl() commands to initialize the hypervisor; /dev/jailhouse is provided by the kernel module (.ko) built from main.c.

"the jailhouse tool is used to issue a JAILHOUSE_ENABLE ioctl() which causes a call to jailhouse_enable()"
JAILHOUSE_ENABLE is sent first, and the kernel responds by calling jailhouse_enable().

"It loads the hypervisor code into the reserved memory region via a request_firmware() call"
The hypervisor binary is loaded into the reserved memory with request_firmware().

"Then jailhouse_enable() maps Jailhouse's reserved memory region into kernel space using ioremap() and marks its pages as executable"
The reserved region is mapped into kernel space with ioremap() and made executable.

"The hypervisor and a system configuration (struct jailhouse_system) copied from user space are laid out in the reserved region"
struct jailhouse_system, the descriptor of the running system, is also stored in this region.

"Finally, jailhouse_enable() calls enter_hypervisor() on each CPU, passing it the header, and waits until all these calls return"
Finally, enter_hypervisor() is called on every CPU.

"This code locates the per_cpu region for a given cpu_id, stores the Linux stack pointer and cpu_id in it, sets the Jailhouse stack, and calls the architecture-independent entry() function, passing it a pointer to cpu_data. When this function returns, the Linux stack pointer is restored."
The entry code reached through enter_hypervisor() uses cpu_id to locate the per_cpu region, saves the Linux stack pointer and cpu_id there, switches to the Jailhouse stack, and calls entry() with a pointer to cpu_data. When entry() returns, the Linux stack is restored.

"The entry() function is what actually enables Jailhouse"
entry() lives in the hypervisor binary. The memory was marked executable earlier, and this is where it actually gets run.
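Because the mapping is linear, translating between a physical address in the reserved region and its kernel virtual address is a single addition or subtraction. The following is a minimal sketch under that assumption; struct sketch_header and both helper functions are hypothetical names, and only the page_offset field comes from the text above.

```c
#include <stdint.h>

/* Hypothetical stand-in for the one header field discussed above. */
struct sketch_header {
    uint64_t page_offset;   /* virtual base minus physical base of the region */
};

/* Linear mapping: every address in the region is shifted by the same offset. */
static uint64_t sketch_phys_to_virt(const struct sketch_header *h, uint64_t phys)
{
    return phys + h->page_offset;
}

static uint64_t sketch_virt_to_phys(const struct sketch_header *h, uint64_t virt)
{
    return virt - h->page_offset;
}
```

For example, if the region starts at physical 0x3b000000 and is mapped at virtual 0xffffc90000000000 (both values made up for illustration), page_offset is their difference, and physical 0x3b001000 translates to virtual 0xffffc90000001000.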

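My overall understanding is as follows:

Linux boots ---> subsystems are initialized ---> ioctl is issued ---> the hypervisor is loaded ---> the entry function is set up ---> the hypervisor runs
1. A memmap item is added to the kernel command line, so the kernel sets this memory aside at boot.
2. The Linux cell is created, virtualization is enabled, and Linux is moved into the cell as a guest.
3. User space issues the JAILHOUSE_ENABLE ioctl, and the kernel calls jailhouse_enable().
4. request_firmware() loads the hypervisor into the reserved memory, the pages are marked executable, the hypervisor's descriptor structures are filled in, and the entry function is called.
5. On each CPU the current context is saved, the stack and related state are set up for the entry function, and the context is restored when the entry function returns.
6. For the Linux cell, paging, the IDT, and I/O virtualization are configured; a non-root cell only needs to run its guest program.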

CPU initialization

1. "CPU initialization is a lengthy process that begins in the cpu_init() function"
2. "if it is on the system CPU set, it is added to the Linux cell" "The rest of the procedure is architecture-specific and continues in arch_cpu_init()"
3. "Then Jailhouse swaps the IDT (interrupt handlers), the Global Descriptor Table (GDT) that contains segment descriptors, and CR3 (page directory pointer) register with its own values"
4. "Finally, arch_cpu_init() fills the cpu_data->apic_id field (see apic_cpu_init()) and configures Virtual Machine Extensions (VMX) for the CPU. Then it prepares the Virtual Machine Control Structure (VMCS) which is located in cpu_data, and enables VMX on the CPU. The VMCS region is configured in vmcs_setup() so that on every VM entry or exit:"
5. "When all CPUs are initialized, entry() calls arch_cpu_activate_vmm(). it sets the RAX register to zero, loads all the general-purpose registers left and issues a VMLAUNCH instruction to enter the guest"

In my own words:
1. Initialization starts in cpu_init(), which registers the CPU.
2. If the CPU is in the system CPU set, it is added to the Linux cell. The rest continues in arch_cpu_init(), which saves the current registers so that they can be restored on VM entry.
3. Jailhouse replaces the IDT, the GDT, and CR3 with its own values.
4. Finally, arch_cpu_init() fills cpu_data->apic_id, configures VMX, and enables VMX on the CPU.
5. Once all CPUs are initialized, entry() calls arch_cpu_activate_vmm(), which zeroes the return value (RAX), loads the remaining general-purpose registers, and issues VMLAUNCH.

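Some additional notes on VM entry and exit:

Host:
1. Jailhouse takes the control registers (CR0-15) and segment registers (CS/SS/DS/ES/FS/GS).
2. The IA32_EFER MSR is set so that the processor is in 64-bit mode.
3. The stack pointer is set to the end of cpu_data->stack.
4. RIP (PC on ARM) is set to the entry of the vm_exit function.
5. IF is cleared in RFLAGS (CPSR on ARM), which disables interrupts.
6. vm_exit calls vmx_handle_exit() and, when it returns, resumes the VM with the VMRESUME instruction; interrupts are therefore disabled after every VM exit.
7. Jailhouse has no system calls, so the SYSENTER MSRs are cleared.

Guest:
1. The control registers (CR0-15) and segment registers (CS/SS/DS/ES/FS/GS) are taken from cpu_data->linux_*.
2. RSP (stack pointer) and RIP (PC) are taken from arch_entry.
3. Once RIP is loaded, the guest is effectively running.
4. The IA32_EFER MSR is set to the value it originally had under Linux.
5. Once the guest program runs, it overwrites all CPU-related registers.

My understanding of CPU initialization: the CPU set decides whether a CPU belongs to the Linux cell, and initialization then continues in arch_cpu_init(). The Jailhouse driver saves and then changes the current CPU registers, the interrupt table, the global descriptor table, and the page directory pointer, sets IA32_EFER for 64-bit mode, sets up a new stack, sets RIP, disables interrupts, and then starts virtualization by issuing VMLAUNCH. On the guest side, once VMX is enabled, the guest gets its control and segment registers, stack pointer, and next-instruction pointer, IA32_EFER is restored to the Linux value, and the guest program runs. Because the guest is an operating system, it overwrites all CPU-related registers again. When the hypervisor is left, the Jailhouse driver restores the registers it saved.

The details still have to come from the code; for now this gives a framework-level picture.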

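Understanding Jailhouse, part 2

Part 1 covered the components of Jailhouse, how it is enabled, and CPU initialization. Part 2 covers interrupt handling, cell creation, and disabling the hypervisor.

Interrupt handling

Concepts

APIC (Advanced Programmable Interrupt Controller): split into the I/O APIC and the LAPIC (Local APIC). The LAPIC belongs to each processor, while the I/O APIC, as the name suggests, serves I/O devices.
MMIO (Memory-mapped I/O): I/O devices are placed in the memory address space instead of the I/O port space, as is common for PCI devices. From the processor's point of view, such devices are accessed just like memory.
MSR (Model-specific registers): a set of 64-bit CPU registers, read with RDMSR and written with WRMSR.
IDT (Interrupt Descriptor Table): associates each exception or interrupt vector with its handler.
ICR (Interrupt control register): the register used to send interrupts to other processors.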

How Jailhouse virtualizes the LAPIC, in outline

"Jailhouse virtualizes the LAPIC only; the I/O APIC is simply mapped into the Linux cell"
Only the LAPIC is virtualized; the I/O APIC is mapped directly into the Linux side, that is, the root cell.

"When Jailhouse's apic_init() function initializes the LAPIC, it checks to see if x2APIC mode is enabled and sets up its apic_ops access methods appropriately. Internally, Jailhouse refers to all APIC registers by their MSR addresses. For xAPIC, these values are transparently converted to the corresponding MMIO offsets"
When apic_init() initializes the LAPIC, it checks whether x2APIC is enabled and fills in apic_ops accordingly. All APIC registers are referred to by their x2APIC MSR addresses; for xAPIC, these are converted to MMIO offsets.
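The MSR-to-MMIO conversion mentioned above follows from how the x2APIC register space is defined by the architecture: the MSRs start at 0x800, and each MSR corresponds to one 16-byte-aligned register in the xAPIC MMIO page. A minimal sketch of that arithmetic is shown below; the function and macro names are hypothetical and are not Jailhouse's own helpers.

```c
#include <stdint.h>

#define SKETCH_X2APIC_MSR_BASE 0x800u   /* first x2APIC MSR address */

/* x2APIC MSR address -> byte offset inside the xAPIC MMIO page. */
static uint32_t sketch_msr_to_mmio_offset(uint32_t msr)
{
    return (msr - SKETCH_X2APIC_MSR_BASE) << 4;
}

/* xAPIC MMIO byte offset -> x2APIC MSR address. */
static uint32_t sketch_mmio_offset_to_msr(uint32_t offset)
{
    return SKETCH_X2APIC_MSR_BASE + (offset >> 4);
}
```

For instance, the ICR is MSR 0x830 in x2APIC mode and sits at offset 0x300 in the xAPIC page. One caveat the sketch ignores: in x2APIC mode the ICR is a single 64-bit register, while xAPIC splits it across offsets 0x300 and 0x310.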

xAPIC and x2APIC in detail

"For xAPIC mode, a special LAPIC access page (apic_access_page[PAGE_SIZE] defined in vmx.c) is mapped into the guest's physical address space at XAPIC_BASE (0xfee00000); this happens in vmx_cell_init(). Later, in vmcs_setup(), LAPIC virtualization is enabled; this way, every time a guest tries to access the virtual LAPIC MMIO region, a trap back to the hypervisor (a "VM exit") occurs. No data is really read from the virtual LAPIC MMIO page or written to it, so CPUs can share this page."
xAPIC is accessed through MMIO, so vmx_cell_init() maps apic_access_page at the guest's XAPIC_BASE (0xfee00000), and vmcs_setup() then enables LAPIC virtualization. Every guest access to the LAPIC MMIO region causes a VM exit, so the guest never really reads or writes the virtual LAPIC MMIO page; the work is done inside Jailhouse after the trap. That is also why all CPUs can share this one page.

"For x2APIC, instead, normal MSR bitmaps are used. By default, Jailhouse traps access to all LAPIC registers; however, if apic_init() detects that host LAPIC is in x2APIC mode, the bitmap is changed so that only ICR (interrupt control register) access is trapped. This happens when the master CPU executes vmx_init()."
x2APIC uses the regular MSR bitmaps. By default Jailhouse traps all LAPIC registers, as it does for the xAPIC MMIO region. If apic_init() detects that the host LAPIC is in x2APIC mode, vmx_init() changes the MSR bitmap so that only ICR accesses are trapped.

"There is a special case when a guest tries to access a virtual x2APIC on a system where x2APIC is not enabled. In this case, the MSR bitmap remains unmodified. Jailhouse intercepts accesses to all LAPIC registers and passes incoming requests to xAPIC using the apic_ops access methods, effectively emulating an x2APIC on top of xAPIC. Since LAPIC registers are referred to in apic.c by their MSR addresses regardless the mode, this emulation has very little overhead."
If a guest uses x2APIC register accesses on a machine where x2APIC is not enabled, the MSR bitmap is left unchanged. Jailhouse intercepts every LAPIC register access and forwards it to the xAPIC through apic_ops, which effectively emulates an x2APIC on top of the xAPIC. Since the registers are already referred to by their MSR addresses, the overhead is very small.

In short, apic_init() detects the host LAPIC mode: with xAPIC everything goes through MMIO, and with x2APIC the bitmap is changed so that only ICR accesses are trapped.

Why Jailhouse traps the ICR

"The main reason behind Jailhouse's trapping of ICR (and few other registers) access is isolation: a cell shouldn't be able to send an IPI to a CPU that is not in its own CPU set, and the ICR is what defines an interrupt's destination. To achieve this isolation, apic_cpu_init() is called by the master CPU during initialization; it stores the mapping from the apic_id to the associated cpu_id in an array called, appropriately, apic_to_cpu_id. When a CPU is assigned a logical LAPIC ID, Jailhouse ensures that it is equal to cpu_id. This way, when an IPI is sent to a physical or logical destination, the hypervisor is able to map it to cpu_id and check if the CPU is in the cell's set."
Jailhouse traps the ICR for isolation: a cell must not send an inter-processor interrupt to a CPU outside its own CPU set, and the ICR is what defines an interrupt's destination. The flow is:
1. During initialization apic_cpu_init() is called, and the mapping from apic_id to cpu_id is stored in the apic_to_cpu_id array.
2. When a CPU is assigned a logical LAPIC ID, Jailhouse makes sure it equals cpu_id.
3. When an IPI is sent to a destination, the hypervisor can map it to a cpu_id and check whether that CPU belongs to the cell.
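The three steps above can be sketched as a small lookup-and-check routine. This is only an illustration of the idea: the sketch_ names, the 256-entry table, and the 64-bit cpu_mask standing in for the cell's CPU set are all assumptions, not Jailhouse's actual data layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_APIC_ID 255u
#define SKETCH_CPU_INVALID 0xffffffffu

/* Hypothetical counterpart of the apic_to_cpu_id array described above. */
static uint32_t sketch_apic_to_cpu_id[SKETCH_MAX_APIC_ID + 1];

/* Hypothetical cell: bit n of cpu_mask set means CPU n belongs to the cell. */
struct sketch_cell {
    uint64_t cpu_mask;
};

static void sketch_apic_map_reset(void)
{
    for (uint32_t i = 0; i <= SKETCH_MAX_APIC_ID; i++)
        sketch_apic_to_cpu_id[i] = SKETCH_CPU_INVALID;
}

/* Step 1: record which cpu_id sits behind a given APIC ID. */
static void sketch_apic_cpu_init(uint32_t apic_id, uint32_t cpu_id)
{
    if (apic_id <= SKETCH_MAX_APIC_ID)
        sketch_apic_to_cpu_id[apic_id] = cpu_id;
}

/* Step 3: on a trapped ICR write, allow the IPI only inside the cell. */
static bool sketch_ipi_allowed(const struct sketch_cell *cell,
                               uint32_t dest_apic_id)
{
    uint32_t cpu_id;

    if (dest_apic_id > SKETCH_MAX_APIC_ID)
        return false;
    cpu_id = sketch_apic_to_cpu_id[dest_apic_id];
    if (cpu_id >= 64)            /* unknown CPU (covers SKETCH_CPU_INVALID) */
        return false;
    return (cell->cpu_mask >> cpu_id) & 1u;
}
```

With APIC ID 4 mapped to CPU 2 and a cell that owns only CPU 2, an IPI to APIC ID 4 passes the check, while an IPI to any other or unknown APIC ID is rejected.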

Jailhouse only takes NMIs

"In vmcs_setup(), Jailhouse does not enable traps to the hypervisor on external interrupts and sets the exception bitmaps to all zeroes. This means that the only interrupt that results in a VM exit is a non-maskable interrupt (NMI);"
In vmcs_setup(), Jailhouse does not trap external interrupts destined for the guest and sets the exception bitmap to all zeroes, so only a non-maskable interrupt (NMI) can cause a VM exit.
From the guest's point of view, a cell is therefore in full control of its own interrupt resources.

The role of the NMI

"NMIs can only come from the hypervisor itself, which uses them to control guest CPUs (arch_suspend_cpu() in apic.c is an example). When an NMI occurs in a guest, that guest exits VM mode and Jailhouse re-throws the NMI in host mode. The CPU dispatches it through the host IDT and jumps to apic_nmi_handler(). It schedules another VM exit using a virtual machines extensions (VMX) feature known as a "preemption timer." vmcs_setup() sets this timer to zero, so, if it is enabled, a VM exit occurs immediately after VM entry. The reason behind this indirection is serialization: this way, NMIs (which are asynchronous by nature) are always delivered after entry into the guest system and cannot interfere with the host-to-guest transition."
NMIs come only from Jailhouse itself and are used to manage guest CPUs.
When an NMI occurs in a guest, the guest leaves VM mode, Jailhouse re-throws the NMI in host mode, and the CPU jumps to the handler apic_nmi_handler().
The handler schedules another VM exit using the VMX preemption timer.
vmcs_setup() sets this timer to zero, so if the timer is enabled, a VM exit happens immediately after VM entry. The purpose is serialization.
Interrupts are asynchronous by nature. With this design the NMI first arrives in the guest, the host then re-throws it, and the zero timer forces an immediate exit from the guest, so NMIs never interfere with the host-to-guest transition.

Creating a cell

Cell creation from user space

"This process starts in the Linux cell with the JAILHOUSE_CELL_CREATE ioctl() command, leading to a jailhouse_cell_create() function call in the kernel" "This function copies the cell configuration and guest image from user space (the jailhouse user-space tool reads these from files and stores them in memory). Then, the cell's physical memory region is mapped and the guest image is moved to the target (physical) address specified by the user."
Creating a cell starts with the JAILHOUSE_CELL_CREATE ioctl, which makes the kernel call jailhouse_cell_create(). That function copies the cell configuration and the guest binary from user space, maps the cell's physical memory, and moves the binary to the target physical address.

Jailhouse first exits from the Linux cell

"After that, jailhouse_cell_create() calls the standard Linux cpu_down() function to offline each CPU assigned to the new cell;" "Finally, the loader issues a hypercall (JAILHOUSE_HC_CELL_CREATE) using the VMCALL instruction and passes a pointer to a struct jailhouse_cell_desc that describes the new cell. This causes a VM exit from the Linux cell to the hypervisor; vmx_handle_exit() dispatches the call to the cell_create() function defined in hypervisor/control.c"
jailhouse_cell_create() uses cpu_down() to take the CPUs assigned to the new cell offline.
Finally, the loader issues the JAILHOUSE_HC_CELL_CREATE hypercall through VMCALL and passes a pointer to the new cell's struct jailhouse_cell_desc. This causes a VM exit from the Linux cell into Jailhouse, where vmx_handle_exit() dispatches to cell_create().

Mapping addresses to prepare for running the guest

"cell_create() suspends all CPUs assigned to the cell except the one executing the function (if it is in the cell's CPU set) to prevent races."
cell_create() suspends the CPUs assigned to the cell, apart from the one running the function, to avoid races between cores.

"This is done in cell_suspend(), which indirectly signals an NMI (as described above) to each CPU and waits for the cpu_stopped flag to be set on the target's cpu_data."
cell_suspend() sends an NMI to each CPU and waits for its cpu_stopped flag.

"Then, the cell configuration is mapped from the Linux cell to a per-CPU region above FOREIGN_MAPPING_BASE in the host's virtual address space (the loader copies this structure into kernel space)."
The cell configuration is then mapped from the Linux cell into a per-CPU region above FOREIGN_MAPPING_BASE (the loader has already copied this structure into kernel space).

"the new cell's I/O resources have their bits set in the Linux cell's io_bitmap, so accessing them will result in VM exit (and panic)"
The new cell's I/O resources get their bits set in the Linux cell's io_bitmap. As with the APIC access page, an access from Linux to these resources then causes a VM exit (and a panic).

"Finally, the new cell is added to the list of cells (which is a singly linked list having linux_cell as its head) and each CPU in the cell is reset using arch_cpu_reset()."
Finally, the new cell is added to the linked list headed by linux_cell, and each of its CPUs is reset with arch_cpu_reset().
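The io_bitmap step can be illustrated with plain bit operations. A VMX I/O bitmap has one bit per port for all 65536 ports, and a set bit means that access to the port causes a VM exit. The sketch below assumes, hypothetically, that the new cell's bitmap uses the same convention (clear bit means the cell may use the port), so every port granted to the new cell becomes a trapped port for Linux. The function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IO_BITMAP_BYTES 8192   /* 65536 ports, one bit each */

/* A set bit means: accessing this port causes a VM exit. */
static bool sketch_port_traps(const uint8_t *bitmap, uint16_t port)
{
    return (bitmap[port / 8] >> (port % 8)) & 1u;
}

/*
 * Shrink the Linux cell: every port the new cell may access (bit clear in
 * its bitmap) gets its bit set in the Linux bitmap, so Linux traps on it.
 */
static void sketch_shrink_linux_io(uint8_t *linux_bitmap,
                                   const uint8_t *new_cell_bitmap)
{
    for (size_t i = 0; i < SKETCH_IO_BITMAP_BYTES; i++)
        linux_bitmap[i] |= (uint8_t)~new_cell_bitmap[i];
}
```

If the new cell is granted ports 0x3f8 to 0x3ff, those eight ports start trapping in Linux after the call, while every other port keeps its previous state.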

The guest starts executing at 0x000ffff0

"On the next VM entry, the CPU will start executing code located at 0x000ffff0 in real mode."
On the next VM entry, the CPU executes the code at 0x000ffff0 in real mode.

"The address 0x000ffff0 is different from the normal x86 reset vector (0xfffffff0), and there is a reason: Jailhouse is not designed to run unmodified guests and has no BIOS emulation, so it can simplify the boot process and skip much of the work required for a real reset vector to work"
An x86 CPU normally fetches its first instruction from 0xfffffff0, but a Jailhouse guest starts at 0x000ffff0. Jailhouse is not designed to run unmodified guests and has no BIOS emulation, so it can simplify the boot process and skip much of the work a real reset vector would require.
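The two addresses are easier to compare once the real-mode address arithmetic is written out. In real mode the linear address is the segment base plus the instruction pointer, and the base is normally the segment selector shifted left by four bits. The sketch below assumes the selector/offset pair 0xf000:0xfff0 for the Jailhouse case, which is one natural way to reach 0x000ffff0 but is not confirmed by the text; both helper names are hypothetical.

```c
#include <stdint.h>

/* Real-mode rule: linear address = (segment selector << 4) + offset. */
static uint32_t sketch_real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* General form with an explicit segment base, as used right after reset. */
static uint32_t sketch_linear_from_base(uint32_t base, uint16_t offset)
{
    return base + offset;
}
```

With 0xf000:0xfff0 the first function yields 0x000ffff0. A hardware reset uses the same offset 0xfff0 but a hidden segment base of 0xffff0000, which is how the normal reset vector ends up at 0xfffffff0.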

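Cell initialization and destruction

The cell structure

"This structure contains the page table directories for use with the VMX and VT-d virtualization extensions, the io_bitmap for VMX, cpu_set, and other fields"
struct cell holds the page table directories used with the VMX and VT-d extensions, the VMX io_bitmap, cpu_set, and other fields.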

Initialization steps

"First, cell_init() copies a name for the cell from a descriptor and allocates cpu_data->cpu_set if needed (sets less than 64 CPUs in size are stored within struct cell in the small_cpu_set field)."
cell_init() first copies the cell name from the descriptor passed in by the user and allocates cpu_data->cpu_set if needed.

"Then, arch_cell_create(), the same function that shrinks the Linux cell, calls vmx_cell_init() for the new cell;"
arch_cell_create() then calls vmx_cell_init().

"it allocates VMX and VT-d resources (page directories and I/O bitmap), creates EPT mappings for the guest's physical address ranges (as per struct jailhouse_cell_desc), maps the LAPIC access page described above, and copies the I/O bitmap to struct cell from the cell descriptor (struct jailhouse_cell_desc)"
vmx_cell_init() mainly does the following:
1. Allocates VMX and VT-d resources (page directories and io_bitmap).
2. Creates EPT mappings for the guest's physical addresses. EPT is the hardware feature Intel added specifically for memory virtualization.
3. Maps the LAPIC access page.
4. Copies the io_bitmap into struct cell.
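The "allocates cpu_set if needed" remark describes a common small-buffer pattern: a small CPU set lives inside the cell structure itself, and only larger sets need separate memory. Below is a minimal sketch of that pattern. All names except small_cpu_set are hypothetical, malloc() stands in for whatever allocator the hypervisor really uses, and the 8-byte inline size is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct sketch_cell {
    uint8_t *cpu_bitmap;        /* points at small_cpu_set or at heap memory */
    size_t cpu_bitmap_bytes;
    uint8_t small_cpu_set[8];   /* inline storage for small CPU sets */
};

/* Copy the CPU bitmap from the descriptor, allocating only when it is large. */
static int sketch_cell_init_cpus(struct sketch_cell *cell,
                                 const uint8_t *src, size_t bytes)
{
    if (bytes <= sizeof(cell->small_cpu_set)) {
        cell->cpu_bitmap = cell->small_cpu_set;
    } else {
        cell->cpu_bitmap = malloc(bytes);
        if (!cell->cpu_bitmap)
            return -1;
    }
    memcpy(cell->cpu_bitmap, src, bytes);
    cell->cpu_bitmap_bytes = bytes;
    return 0;
}

static bool sketch_cell_owns_cpu(const struct sketch_cell *cell, unsigned int cpu)
{
    if (cpu / 8 >= cell->cpu_bitmap_bytes)
        return false;
    return (cell->cpu_bitmap[cpu / 8] >> (cpu % 8)) & 1u;
}

/* Release heap storage, if any was used. */
static void sketch_cell_free_cpus(struct sketch_cell *cell)
{
    if (cell->cpu_bitmap != cell->small_cpu_set)
        free(cell->cpu_bitmap);
    cell->cpu_bitmap = NULL;
    cell->cpu_bitmap_bytes = 0;
}
```

The benefit of the inline case is that creating a typical small cell needs no extra allocation for its CPU set.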

Memory is kept separate and Jailhouse does not interfere with it; I/O resources are exclusive to one cell.

"When the Linux cell is shrunk, jailhouse_cell_create() has already put the detached CPUs offline"
By the time the Linux cell is shrunk, jailhouse_cell_create() has already taken the detached CPUs offline.

"Linux never uses guest memory pages since they are taken from the region reserved at boot"
Because the memory is reserved at boot with memmap=, Linux never uses the pages given to a guest.

"Jailhouse currently takes no action to detach I/O resources or devices in general. If they were attached to the Linux cell, they will remain attached, and it may cause a panic if a Linux driver tries to use an I/O port that has been moved to another cell"
Jailhouse does not detach I/O resources or devices. If they were attached to the Linux cell they stay attached, and a Linux driver that touches an I/O port that has been moved to another cell may cause a panic.
In other words, Jailhouse does not detach I/O from Linux for you, so an I/O resource has to be used exclusively by one cell.

Disabling Jailhouse

Shutdown from user space

"To disable Jailhouse, the user-space tool issues the JAILHOUSE_DISABLE ioctl() command, causing a call to jailhouse_disable()"
User space issues JAILHOUSE_DISABLE, and the kernel calls jailhouse_disable().

Teardown in outline

"This function calls leave_hypervisor() (found in main.c) on each CPU in the Linux cell and waits for these calls to complete. Then the hypervisor_mem mapping created in jailhouse_enable() is destroyed, the function brings up all offlined CPUs (which were presumably moved to other cells), and exits. From this point, Linux kernel will be running on bare metal again."
jailhouse_disable() calls leave_hypervisor() on every CPU of the Linux cell. The hypervisor_mem mapping created in jailhouse_enable() is then destroyed, the offlined CPUs are brought back up, and the function exits. From then on, the former Linux cell runs directly on bare metal.

leave_hypervisor() in detail

"The leave_hypervisor() call issues a JAILHOUSE_HC_DISABLE hypercall, causing a VM exit at the given CPU, after which vmx_handle_exit() calls shutdown()"
leave_hypervisor() issues the JAILHOUSE_HC_DISABLE hypercall, which causes a VM exit on the given CPU; vmx_handle_exit() then calls shutdown().

"For the first Linux CPU that called it, this function iterates over CPUs in all cells other than Linux cell and calls arch_shutdown_cpu() for each of these CPUs"
For the first Linux CPU that calls shutdown(), the function walks over the CPUs of all cells other than the Linux cell and calls arch_shutdown_cpu() for each of them.

"A call to arch_shutdown_cpu() is equivalent to suspending the CPU, setting cpu_data->shutdown_cpu to true, then resuming the CPU"
arch_shutdown_cpu() amounts to suspending the CPU, setting cpu_data->shutdown_cpu to true, and resuming the CPU.

"As described above, this sequence transfers the control to apic_handle_events(), but this time this function detects that the CPU is shutting down. It disables the LAPIC and effectively executes a VMXOFF; HLT sequence to disable VMX on the CPU and halt it. This way, the hypervisor is disabled on all CPUs outside of the Linux cell."
apic_handle_events() detects that the CPU is shutting down, disables the LAPIC, and executes a VMXOFF; HLT sequence, which turns off VMX and halts the CPU. In this way Jailhouse is disabled on all CPUs outside the Linux cell.
HLT: an instruction that stops CPU execution until an interrupt or a reset signal arrives.

Restoring Linux

"When shutdown() returns, VT-d is disabled and the hypervisor restores the Linux environment for the CPU."
When shutdown() returns, VT-d has been disabled and Jailhouse starts restoring the Linux environment on the CPU.

"First, the cpu_data->linux_* fields are copied from VMCS guest area"
First, the cpu_data->linux_* fields are copied out of the VMCS guest area.

"Then, arch_cpu_restore() is called to disable VMX (without halting the CPU this time) and restore various register values from cpu_data->linux_*."
Then arch_cpu_restore() turns off VMX (without HLT this time) and restores the registers from cpu_data->linux_*.

"Afterward, the general-purpose registers are popped from the hypervisor stack, the Linux stack is restored, the RAX register is zeroed and a RET instruction is issued"
The general-purpose registers are popped from the hypervisor stack, the original Linux stack is restored, RAX is zeroed (a return value of 0), and a RET is issued.

"After that, any offlined CPUs (likely halted by arch_shutdown_cpu()) are brought back to the active state"
Finally, all offlined CPUs (the ones halted by HLT) are brought back to the active state.

Documentation in the source tree

The documents are located at:

Documentation/articles/LJ-article-04-2015.txt
Documentation/articles/LWN.net-article-01-2014.txt

Jailhouse memory layout

"An overall (physical) layout of Jailhouse memory region is shown at Fig. 1:"

+-----------+-------------------+-----------------+------------+---------------+------+
| Guest RAM | hypervisor_header | Hypervisor Code | cpu_data[] | system_config | Heap |
+-----------+-------------------+-----------------+------------+---------------+------+
            |__start                              |__page_pool                        |hypervisor_header.size

"mem_pool to manage free physical pages, and remap_pool for [host] virtual address space regions"
mem_pool manages physical memory, and remap_pool manages virtual address space.

"mem_pool begins at __page_pool, so paging_init() marks per_cpu area (cpu_data[]) and system_config as already allocated."
mem_pool starts at __page_pool, so cpu_data[] and system_config are marked as already allocated.

"remap_pool appears in VT-d and xAPIC code, and maps config_memory pages. It also includes the foreign mapping region (see below)"
remap_pool is used in the VT-d and xAPIC code and maps the config_memory pages; it also includes the foreign mapping region.

"As the last step, paging_init() copies the Linux mappings for the hypervisor memory (addresses between __start and hypervisor_header.size) into hv_page_table global variable."
paging_init() copies the Linux mappings for the memory from __start to hypervisor_header.size into hv_page_table.

"page_map_create() function creates page mappings. It is mainly useful when initializing Extended Page Tables (EPT) for the cell."
page_map_create() creates page mappings, mainly when setting up the EPT for a new cell.
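The behavior described for mem_pool, where some pages are marked as allocated up front and the rest are handed out on demand, matches a simple bitmap page allocator. The sketch below is a minimal illustration under that assumption; struct sketch_page_pool and both functions are hypothetical and do not reproduce Jailhouse's paging code.

```c
#include <stdint.h>

struct sketch_page_pool {
    uint8_t *used_bitmap;       /* one bit per page, 1 = allocated */
    unsigned long pages;        /* total pages managed by the pool */
    unsigned long used_pages;
};

/* Mark a page range as taken, e.g. for cpu_data[] and system_config. */
static void sketch_pool_reserve(struct sketch_page_pool *pool,
                                unsigned long first, unsigned long count)
{
    for (unsigned long p = first; p < first + count && p < pool->pages; p++) {
        uint8_t bit = (uint8_t)(1u << (p % 8));

        if (!(pool->used_bitmap[p / 8] & bit)) {
            pool->used_bitmap[p / 8] |= bit;
            pool->used_pages++;
        }
    }
}

/* Return the index of the first free page and mark it used, or -1 if full. */
static long sketch_pool_alloc(struct sketch_page_pool *pool)
{
    for (unsigned long p = 0; p < pool->pages; p++) {
        uint8_t bit = (uint8_t)(1u << (p % 8));

        if (!(pool->used_bitmap[p / 8] & bit)) {
            pool->used_bitmap[p / 8] |= bit;
            pool->used_pages++;
            return (long)p;
        }
    }
    return -1;
}
```

Reserving the first pages before any allocation happens mirrors what the text says paging_init() does: later allocations can only return pages from the heap part of the region.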

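About article 1: LWN.net-article-01-2014.txt is a revision of the two original LWN articles. The meaning is largely the same, with some additions. Since my focus is arm64 and the changes do not look large, I will not go through it word by word.
About article 2: LJ-article-04-2015.txt describes where Jailhouse is meant to be used. Unlike the popular KVM, Jailhouse can play a role in embedded systems, for example where real-time behavior and safety matter. In terms of industries, it is relevant to industrial automation, medical devices, automotive, and high-performance computing.

Now that robotics is taking off and real-time behavior matters more and more, it is no surprise that Jailhouse is getting attention again.
I believe that once industrial automation, robotics, and automotive have matured, medical applications will follow.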