Windows ships with a built-in firewall that is enabled by default. Many services need firewall rules, and for personal development work, opening ports one at a time gets genuinely tedious. This is a personal machine rather than an enterprise deployment, so the attack surface is small. This note documents how to open ports on Windows so that other ports can be used conveniently.
Configure it as follows.
Note: once the ports are open, be mindful that they can be attacked.
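A minimal sketch of the commands involved, for an elevated Command Prompt; the rule name "Dev 8080" and port 8080 are placeholders:

```shell
:: Open inbound TCP port 8080 (rule name and port are examples)
netsh advfirewall firewall add rule name="Dev 8080" dir=in action=allow protocol=TCP localport=8080

:: Remove the rule again when it is no longer needed
netsh advfirewall firewall delete rule name="Dev 8080"
```

The same rule can also be created in the GUI under Windows Defender Firewall, "Advanced settings", "Inbound Rules".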
Now, on to building the Jailhouse driver and the kernel.
First install the build dependencies and set up the cross-compilation environment:

apt install checkinstall build-essential qemu python3-mako
export ARCH=arm64
export PATH=$PATH:/root/sdk/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin
export CROSS_COMPILE=aarch64-none-linux-gnu-

The platform is RK3588, with the following kernel version:

VERSION = 5
PATCHLEVEL = 10
SUBLEVEL = 160

The patch set is as follows:
0001-jailhouse-config-jailhouse.ko-need-those-config.patch
0002-jailhouse-Add-simple-debug-console-via-the-hyperviso.patch
0003-jailhouse-arm-Export-__boot_cpu_mode-for-use-in-Jail.patch
0004-jailhouse-mm-Re-export-ioremap_page_range.patch
0005-jailhouse-arm-arm64-export-__hyp_stub_vectors.patch
0006-jailhouse-uio-Enable-read-only-mappings.patch
0007-jailhouse-ivshmem-Add-header-file.patch
0008-jailhouse-uio-Add-driver-for-inter-VM-shared-memory-.patch
0009-Revert-jailhouse-ivshmem-Add-header-file.patch
0010-jailhouse-ivshmem-Add-header-file.patch
0011-jailhouse-WIP-virtio-Add-virtio-over-ivshmem-transpo.patch
0012-jailhouse-virtio-ivshmem-check-peer_state-early.patch
0013-jailhouse-WIP-tools-Add-virtio-ivshmem-console-demo.patch
0014-jailhouse-WIP-tools-Add-virtio-ivshmem-block-demo.patch
0015-jailhouse-mm-vmalloc-Export-__get_vm_area_caller.patch
0016-jailhouse-x86-Export-lapic_timer_period.patch
0017-jailhouse-arm64-dts-marvell-armada-37xx-Set-pci-doma.patch
0018-jailhouse-arm64-dts-marvell-armada-8030-mcbin-Set-pc.patch
0019-jailhouse-PCI-portdrv-Do-not-setup-up-IRQs-if-there-.patch
0020-jailhouse-ivshmem-net-virtual-network-device-for-Jai.patch
0021-jailhouse-ivshmem-net-Map-shmem-region-as-RAM.patch
0022-jailhouse-ivshmem-net-fix-race-in-state-machine.patch
0023-jailhouse-ivshmem-net-Remove-unused-variable.patch
0024-jailhouse-ivshmem-net-Enable-INTx.patch
0025-jailhouse-ivshmem-net-Improve-identification-of-reso.patch
0026-jailhouse-ivshmem-net-Switch-to-reset-state-on-each-.patch
0027-jailhouse-ivshmem-net-Add-ethtool-register-dump.patch
0028-jailhouse-ivshmem-net-Fix-stuck-state-machine-during.patch
0029-jailhouse-ivshmem-net-Switch-to-relative-descriptor-.patch
0030-jailhouse-ivshmem-net-Switch-to-pci_alloc_irq_vector.patch
0031-jailhouse-ivshmem-net-fill-in-and-check-used-descrip.patch
0032-jailhouse-ivshmem-net-slightly-improve-debug-output.patch
0033-jailhouse-ivshmem-net-set-and-check-descriptor-flags.patch
0034-jailhouse-ivshmem-net-add-MAC-changing-interface.patch
0035-jailhouse-ivshmem-net-Silence-compiler-warning.patch
0036-jailhouse-ivshmem-net-Fix-bogus-transition-to-RESET-.patch
0037-jailhouse-ivshmem-net-Refactor-and-comment.patch
0038-jailhouse-ivshmem-net-Switch-to-netdev_xmit_more-hel.patch
0039-jailhouse-ivshmem-net-Adjust-to-reworked-version-of-.patch
0040-jailhouse-ivshmem-net-Fix-and-rework-MTU-configurati.patch
0041-jailhouse-ivshmem-net-Mark-vring_used_event-access-R.patch
0042-jailhouse-ivshmem-net-Simplify-interface-of-ivshm_ne.patch
0043-jailhouse-ivshmem-net-Fix-TX-queue-locking-and-plug-.patch
0044-ivshmem-net-Synchronize-ivshm_net_state_change-again.patch
0045-jailhouse-ivshmem-net-Fix-and-rework-carrier-managem.patch
0046-jailhouse-Revert-mm-vmalloc-Export-__get_vm_area_cal.patch
0047-jailhouse-config-open-jailhouse-feature.patch
Apply the patches:

for i in jailhouse-patch/*; do patch -p1 < "$i"; done
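A loop like the one above can be made a little safer by dry-running each patch first. This is a self-contained sketch in a temp directory (the file names are demo assumptions, not files from the kernel tree), showing the --dry-run check before the real apply:

```shell
# Self-contained demo: verify a patch applies cleanly before applying it.
tmp=$(mktemp -d)
cd "$tmp"
printf 'hello\n' > a.txt                 # file to be patched
printf 'hello\nworld\n' > new.txt        # desired content
mkdir patches
diff -u a.txt new.txt > patches/demo.patch || true   # diff exits 1 when files differ
rm new.txt
for p in patches/*; do
    # only apply the patch when the dry run succeeds
    patch -p0 --dry-run < "$p" && patch -p0 < "$p"
done
cat a.txt
```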
The main files touched are:

modified: arch/arm64/kernel/hyp-stub.S
modified: drivers/net/Kconfig
modified: drivers/net/Makefile
modified: drivers/pci/pcie/portdrv_core.c
modified: drivers/uio/Kconfig
modified: drivers/uio/Makefile
modified: drivers/uio/uio.c
modified: drivers/virt/Kconfig
modified: drivers/virt/Makefile
modified: drivers/virtio/Kconfig
modified: drivers/virtio/Makefile
modified: include/linux/pci_ids.h
modified: include/linux/uio_driver.h
modified: mm/ioremap.c
modified: tools/virtio/Makefile

Files added by the patches:

drivers/net/ivshmem-net.c
drivers/uio/uio_ivshmem.c
drivers/virt/jailhouse_dbgcon.c
drivers/virtio/virtio_ivshmem.c
include/linux/ivshmem.h
tools/virtio/virtio-ivshmem-block.c
tools/virtio/virtio-ivshmem-console.c
Build the kernel:
make ARCH=arm64 -j24
Configuration options that need to be enabled:

CONFIG_KALLSYMS_ALL=y
CONFIG_KPROBES=y
CONFIG_IVSHMEM_NET=y
CONFIG_UIO_IVSHMEM=y
CONFIG_JAILHOUSE_DBGCON=y
CONFIG_VIRTIO_IVSHMEM=y
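These options can be flipped from the shell with the kernel's own scripts/config helper instead of editing .config by hand. A sketch, assuming it is run from the kernel source root with the cross-compile environment from above still exported:

```shell
# Enable the options Jailhouse needs, then let Kconfig resolve dependencies.
./scripts/config --enable KALLSYMS_ALL \
                 --enable KPROBES \
                 --enable IVSHMEM_NET \
                 --enable UIO_IVSHMEM \
                 --enable JAILHOUSE_DBGCON \
                 --enable VIRTIO_IVSHMEM
make ARCH=arm64 olddefconfig
```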
Then, in the Jailhouse source tree, apply the following patch:
0001-driver-main-add-kprobe-for-kallsyms_lookup_name.patch
Build Jailhouse:
make KDIR=../kernel/ DESTDIR=jailhouse-bin install
Copy the build artifacts to the target board:

scp -r ./jailhouse-bin root@172.25.80.124:/
scp -r ./tools/jailhouse-bin root@172.25.80.124:/
scp -r ../kernel/jailhouse-bin/ root@172.25.80.124:/

Run it on the target:
modprobe jailhouse
With that, the Jailhouse port is complete.
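Once the module loads, bringing up the hypervisor and a cell follows the usual jailhouse CLI flow. A sketch only: the .cell configuration files and demo.bin are placeholders that must be generated for this particular RK3588 board:

```shell
modprobe jailhouse
jailhouse enable rk3588.cell                      # start the hypervisor with the root-cell config
jailhouse cell create rk3588-inmate-demo.cell     # create a non-root cell
jailhouse cell load inmate-demo demo.bin -a 0x0   # load a bare-metal image at guest address 0
jailhouse cell start inmate-demo
jailhouse cell list                               # check cell states
jailhouse disable                                 # hand the hardware back to Linux
```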
CPU partitioning is getting more and more popular, so to learn Jailhouse I am starting from its documentation; these notes record that process.

https://lwn.net/Articles/574273/
https://lwn.net/Articles/574274/
Jailhouse is a partitioning hypervisor.
It can run "bare-metal applications or non-Linux OSes" alongside a standard Linux kernel.
"What we expect to see in non-Linux cells are applications with highly demanding real-time, safety or security requirements."
In other words, its purpose is to run highly real-time, safety-critical, or security-critical programs on the non-Linux side.
If development keeps up, that vision feels within reach in the next couple of years; then again, it has already been a full decade.
"Jailhouse is different. First of all, it is a partitioning hypervisor that is more concerned with isolation than virtualization."
Jailhouse cares more about isolation.
"Each cell runs one guest and has a set of assigned resources (CPUs, memory regions, PCI devices) that it fully controls."
Each cell fully controls its assigned CPUs, memory regions, and PCI devices.
"The Linux cell doesn't assert full control over hardware resources as dom0 does"; "when a new cell is created, the Linux cell cedes control over some of its CPU, device, and memory resources to that new cell."
Unlike other hypervisors' control domains, the Linux cell does not keep control of all hardware; it cedes parts of it to each new cell, which then controls them fully.
struct jailhouse_system, the descriptor of the running system:
1. hypervisor_memory: the memory reserved for Jailhouse
2. config_memory: hardware configuration information
3. system: the initial cell descriptor for the Linux (root) side

struct jailhouse_cell_desc, the cell descriptor, with members:
1. the cell name
2. cpu_set: the CPUs assigned to the cell
3. mem_regions: memory regions
4. irq_lines: IRQ lines
5. I/O bitmap: the I/O port bitmap
6. pci_devices: PCI devices

Jailhouse has no human-friendly configuration format; configs are raw binaries produced with objcopy, though XML may be used in the future.

struct per_cpu, the per-CPU data (cpu_data), includes: cpu_id, apic_id, the stack, a reference to the owning cell structure, saved registers, and the CPU mode.

struct jailhouse_header describes the hypervisor image as a whole: the hypervisor binary's start location, memory size, page offset, and number of CPUs.

The three structures (jailhouse_system, jailhouse_cell_desc, jailhouse_header) deserve a detailed analysis; I will fill that in later.
Key points from the enabling path (quotes from the article, glosses mine):
1. "Jailhouse operates in a physically contiguous memory region", reserved "using the 'memmap=' kernel command-line parameter"; "future versions may use the contiguous memory allocator (CMA)" instead.
2. "The loader linearly maps this memory into the kernel's virtual address space. Its offset from the memory region's base address is stored in the page_offset field of the header" (struct jailhouse_header), which records the physical-to-virtual mapping.
3. "The jailhouse user-space tool issues ioctl() commands to /dev/jailhouse" to drive initialization; /dev/jailhouse is provided by the kernel module built from main.c.
4. "The jailhouse tool is used to issue a JAILHOUSE_ENABLE ioctl(), which causes a call to jailhouse_enable()" in the kernel.
5. "It loads the hypervisor code into the reserved memory region via a request_firmware() call."
6. "Then jailhouse_enable() maps Jailhouse's reserved memory region into kernel space using ioremap() and marks its pages as executable."
7. "The hypervisor and a system configuration (struct jailhouse_system) copied from user space are laid out in the reserved region"; jailhouse_system, the descriptor of the running system, also lives in this region.
8. "Finally, jailhouse_enable() calls enter_hypervisor() on each CPU, passing it the header, and waits until all these calls return."
9. "This code locates the per_cpu region for a given cpu_id, stores the Linux stack pointer and cpu_id in it, sets the Jailhouse stack, and calls the architecture-independent entry() function, passing it a pointer to cpu_data. When this function returns, the Linux stack pointer is restored." In effect the Linux context is pushed, the hypervisor runs on its own stack, and the context is popped on return.
10. "The entry() function is what actually enables Jailhouse": it is code in the hypervisor binary, running from the pages that were just marked executable.
My overall understanding:

Linux boots ---> subsystems initialize ---> ioctl is issued ---> hypervisor binary is loaded ---> entry point is set up ---> hypervisor runs
1. Add a memmap= entry to the kernel command line so the reserved region is set aside at boot.
2. Create the Linux cell, enable virtualization, and migrate Linux into the cell as a guest.
3. User space issues JAILHOUSE_ENABLE via ioctl(); the kernel calls jailhouse_enable().
4. request_firmware() loads the hypervisor binary into the reserved memory, which is marked executable; the system configuration structure is filled in and the entry function is called.
5. For each CPU, the current context is saved, a stack and related state are set up for the entry function, and the context is restored when the entry function returns.
6. For the Linux cell, paging, the interrupt IDT, and I/O virtualization are configured; non-root cells simply run their guest images.
CPU initialization:
1. CPU initialization is a lengthy process that begins in cpu_init(), which registers the Linux CPU.
2. If the CPU is in the system CPU set, it is added to the Linux cell; the rest of the procedure is architecture-specific and continues in arch_cpu_init(), which saves the current register state so it can be restored on VM entry.
3. Jailhouse then swaps the IDT (interrupt handlers), the GDT (segment descriptors), and the CR3 (page directory pointer) register with its own values.
4. Finally, arch_cpu_init() fills in cpu_data->apic_id (see apic_cpu_init()), configures Virtual Machine Extensions (VMX) for the CPU, prepares the Virtual Machine Control Structure (VMCS) located in cpu_data, and enables VMX on the CPU. The VMCS region is configured in vmcs_setup(), which controls what happens on every VM entry and exit.
5. When all CPUs are initialized, entry() calls arch_cpu_activate_vmm(), which sets the RAX register to zero, loads the remaining general-purpose registers, and issues a VMLAUNCH instruction to enter the guest.
Some additional notes on VM entry and exit:

Host side:
1. Jailhouse captures the control registers (CR0/CR3/CR4) and segment registers (CS/SS/DS/ES/FS/GS).
2. The IA32_EFER MSR is set to switch the processor into 64-bit mode.
3. The stack pointer is set to the end of cpu_data->stack.
4. RIP (PC on ARM) is set to the entry point of the vm_exit function.
5. The IF flag in RFLAGS (CPSR on ARM) is cleared, disabling interrupts.
6. vm_exit calls vmx_handle_exit() and, on return, resumes VM execution with the VMRESUME instruction; interrupts are therefore disabled on every VM exit.
7. Because Jailhouse has no system calls, the SYSENTER MSRs are cleared.

Guest side:
1. The control registers (CR0/CR3/CR4) and segment registers (CS/SS/DS/ES/FS/GS) are taken from cpu_data->linux_*.
2. RSP (the stack pointer) and RIP (the PC) are taken from arch_entry.
3. Once RIP is loaded, the guest program is effectively running.
4. The IA32_EFER MSR is set back to its original Linux value.
5. Once the guest runs, it overwrites all CPU-related registers anyway.
My take on CPU initialization: the CPU set decides whether a CPU belongs to the Linux cell. Each CPU runs through arch_cpu_init(), where the Jailhouse driver saves and then replaces the CPU's registers, interrupt table, global descriptor table, and page-directory pointer; it sets IA32_EFER for 64-bit mode, sets up a fresh stack and RIP, disables interrupts, and finally starts virtualization by issuing VMLAUNCH. On the guest side, after VMX starts, the guest receives the control and segment registers, the stack pointer, and the instruction pointer; IA32_EFER is restored to the Linux value, and the guest image runs. Since the guest is an operating system, it overwrites all CPU-related registers. When the VM exits, the Jailhouse driver restores the registers it saved earlier.
The details will have to come from the code; for now this is a framework-level picture.
Part one covered Jailhouse's composition, its startup path, and CPU initialization; part two covers interrupt handling, cell creation, and disabling the hypervisor.
Terminology

APIC (Advanced Programmable Interrupt Controller): split into the I/O APIC and the LAPIC (Local APIC); the LAPIC belongs to the processor itself, while the I/O APIC, as the name suggests, serves I/O devices.
MMIO (Memory-mapped I/O): part of the PCI specification; I/O devices are placed in the memory address space instead of a separate I/O space, so from the processor's perspective accessing a device looks just like accessing memory.
MSR (Model-specific register): one of a set of 64-bit CPU registers, read and written with the RDMSR and WRMSR instructions respectively.
IDT (Interrupt Descriptor Table): associates each exception or interrupt vector with its handler.
ICR (Interrupt Command Register): the LAPIC register used to dispatch inter-processor interrupts.
Roughly how Jailhouse virtualizes the LAPIC

"Jailhouse virtualizes the LAPIC only; the I/O APIC is simply mapped into the Linux cell" (that is, the root cell). When apic_init() initializes the LAPIC, it checks whether x2APIC mode is enabled and fills in the apic_ops access methods accordingly. Internally, Jailhouse refers to all APIC registers by their MSR addresses; for xAPIC, these values are transparently converted to the corresponding MMIO offsets.
How xAPIC and x2APIC are actually handled

For xAPIC mode, a special LAPIC access page (apic_access_page[PAGE_SIZE], defined in vmx.c) is mapped into the guest's physical address space at XAPIC_BASE (0xfee00000); this happens in vmx_cell_init(). Later, in vmcs_setup(), LAPIC virtualization is enabled, so every guest access to the virtual LAPIC MMIO region traps back to the hypervisor (a "VM exit"). No data is actually read from or written to the virtual LAPIC MMIO page, so CPUs can share this page.

For x2APIC, normal MSR bitmaps are used instead. By default, Jailhouse traps access to all LAPIC registers; however, if apic_init() detects that the host LAPIC is in x2APIC mode, the bitmap is changed (when the master CPU executes vmx_init()) so that only ICR (interrupt command register) accesses are trapped.

There is a special case when a guest tries to access a virtual x2APIC on a system where x2APIC is not enabled: the MSR bitmap then remains unmodified, Jailhouse intercepts accesses to all LAPIC registers and passes the incoming requests on to the xAPIC via the apic_ops methods, effectively emulating an x2APIC on top of the xAPIC. Since LAPIC registers are referred to by their MSR addresses in apic.c regardless of mode, this emulation has very little overhead.

In short: apic_init() detects the host's LAPIC mode; for xAPIC everything goes through the trapped MMIO page, while for x2APIC the MSR bitmap is narrowed so that only ICR accesses trap.
Why Jailhouse traps the ICR

The main reason Jailhouse traps ICR (and a few other registers) accesses is isolation: a cell must not be able to send an IPI to a CPU outside its own CPU set, and the ICR is what defines an interrupt's destination. The flow:
1. During initialization, the master CPU calls apic_cpu_init(), which stores the apic_id-to-cpu_id mapping in an array fittingly named apic_to_cpu_id.
2. When a CPU is assigned a logical LAPIC ID, Jailhouse ensures that it equals the cpu_id.
3. When an IPI is sent to a physical or logical destination, the hypervisor can therefore map it to a cpu_id and check whether that CPU belongs to the sender's cell.
Only NMIs reach the hypervisor

In vmcs_setup(), Jailhouse does not enable traps on external interrupts and sets the exception bitmap to all zeroes, so the only interrupt that causes a VM exit is a non-maskable interrupt (NMI). In other words, a cell fully controls its own interrupt resources.
What NMIs are used for

NMIs can only come from the hypervisor itself, which uses them to control guest CPUs (arch_suspend_cpu() in apic.c is an example). When an NMI occurs in a guest, the guest exits VM mode and Jailhouse re-throws the NMI in host mode; the CPU dispatches it through the host IDT and jumps to apic_nmi_handler(), which schedules another VM exit using the VMX feature known as the "preemption timer". vmcs_setup() sets this timer to zero, so if it is enabled, a VM exit occurs immediately after VM entry. The reason for this indirection is serialization: NMIs, asynchronous by nature, are thereby always delivered after entry into the guest system and cannot interfere with the host-to-guest transition.
Creating a cell from user space

Cell creation starts in the Linux cell with the JAILHOUSE_CELL_CREATE ioctl() command, leading to a jailhouse_cell_create() call in the kernel. That function copies the cell configuration and guest image from user space (the jailhouse tool reads them from files and keeps them in memory), maps the cell's physical memory region, and moves the guest image to the target physical address specified by the user.
Jailhouse first shrinks the Linux cell

jailhouse_cell_create() then calls the standard Linux cpu_down() function to offline each CPU assigned to the new cell. Finally, the loader issues a hypercall (JAILHOUSE_HC_CELL_CREATE) using the VMCALL instruction, passing a pointer to the struct jailhouse_cell_desc that describes the new cell. This causes a VM exit from the Linux cell into the hypervisor, where vmx_handle_exit() dispatches the call to cell_create(), defined in hypervisor/control.c.
Mappings are prepared for running the guest

cell_create() suspends all CPUs assigned to the cell except the one executing the function (if it is in the cell's CPU set), to prevent races. This happens in cell_suspend(), which indirectly signals an NMI to each CPU (as described above) and waits for the cpu_stopped flag to be set in the target's cpu_data. The cell configuration is then mapped from the Linux cell into a per-CPU region above FOREIGN_MAPPING_BASE in the host's virtual address space (the loader had copied this structure into kernel space). The new cell's I/O resources get their bits set in the Linux cell's io_bitmap, so any Linux access to them will cause a VM exit (and a panic). Finally, the new cell is added to the list of cells (a singly linked list with linux_cell at its head), and each CPU in the cell is reset via arch_cpu_reset().
The guest starts executing at 0x000ffff0

On the next VM entry, the CPU starts executing code located at 0x000ffff0 in real mode. That address differs from the normal x86 reset vector (0xfffffff0) for a reason: Jailhouse is not designed to run unmodified guests and has no BIOS emulation, so it can simplify the boot process and skip much of the work required to make a real reset vector function.
The cell structure

struct cell contains the page table directories used with the VMX and VT-d virtualization extensions, the io_bitmap for VMX, the cpu_set, and other fields.
Initialization steps

First, cell_init() copies the cell's name from the (user-supplied) descriptor and allocates cpu_data->cpu_set if needed (sets smaller than 64 CPUs are stored inside struct cell, in the small_cpu_set field). Then arch_cell_create(), the same function that shrinks the Linux cell, calls vmx_cell_init() for the new cell, which:
1. allocates VMX and VT-d resources (page directories and the I/O bitmap);
2. creates EPT mappings for the guest's physical address ranges (as given by struct jailhouse_cell_desc); EPT is the hardware feature Intel added specifically for memory virtualization;
3. maps the LAPIC access page described above;
4. copies the I/O bitmap from the cell descriptor (struct jailhouse_cell_desc) into struct cell.
Memory stays partitioned with no interference from Jailhouse; I/O is exclusively owned by one cell

By the time the Linux cell is shrunk, jailhouse_cell_create() has already put the detached CPUs offline. Linux never uses the guest's memory pages, because they come from the region reserved at boot via memmap=. Jailhouse currently takes no action to detach I/O resources or devices in general: whatever was attached to the Linux cell stays attached, and a Linux driver touching an I/O port that has been moved to another cell may cause a panic. In effect, since Jailhouse does not detach I/O, each I/O resource must be owned exclusively by a single cell.
Disabling Jailhouse from user space

To disable Jailhouse, the user-space tool issues the JAILHOUSE_DISABLE ioctl() command, causing a call to jailhouse_disable().

The rough teardown sequence

jailhouse_disable() calls leave_hypervisor() (found in main.c) on each CPU in the Linux cell and waits for those calls to complete. Then the hypervisor_mem mapping created in jailhouse_enable() is destroyed, all offlined CPUs (presumably those that had been moved to other cells) are brought back up, and the function exits. From this point on, the Linux kernel is running on bare metal again.
leave_hypervisor() in detail

leave_hypervisor() issues a JAILHOUSE_HC_DISABLE hypercall, causing a VM exit on the given CPU, after which vmx_handle_exit() calls shutdown(). For the first Linux CPU that calls it, shutdown() iterates over the CPUs in all cells other than the Linux cell and calls arch_shutdown_cpu() on each of them. A call to arch_shutdown_cpu() is equivalent to suspending the CPU, setting cpu_data->shutdown_cpu to true, and then resuming the CPU. As described earlier, this sequence transfers control to apic_handle_events(), but this time that function detects that the CPU is shutting down: it disables the LAPIC and executes a VMXOFF; HLT sequence, disabling VMX on the CPU and halting it. This way the hypervisor is disabled on all CPUs outside the Linux cell. (HLT halts the CPU until an interrupt or reset arrives.)
Restoring Linux

When shutdown() returns, VT-d is disabled and the hypervisor restores the Linux environment for each CPU. First, the cpu_data->linux_* fields are copied out of the VMCS guest area. Then arch_cpu_restore() is called to disable VMX (without halting the CPU this time) and to restore the various register values from cpu_data->linux_*. Afterwards, the general-purpose registers are popped from the hypervisor stack, the Linux stack is restored, the RAX register is zeroed (so the hypercall returns 0), and a RET instruction is issued. Finally, any offlined CPUs (those halted by arch_shutdown_cpu()) are brought back to the active state.
The in-tree versions of these articles are at:

Documentation/articles/LJ-article-04-2015.txt
Documentation/articles/LWN.net-article-01-2014.txt
Jailhouse memory layout

An overall (physical) layout of the Jailhouse memory region:

+-----------+-------------------+-----------------+------------+---------------+------+
| Guest RAM | hypervisor_header | Hypervisor Code | cpu_data[] | system_config | Heap |
+-----------+-------------------+-----------------+------------+---------------+------+
            ^__start                              ^__page_pool                        ^__start + hypervisor_header.size
mem_pool manages free physical pages, and remap_pool manages [host] virtual address space regions. mem_pool begins at __page_pool, so paging_init() marks the per_cpu area (cpu_data[]) and system_config as already allocated. remap_pool appears in the VT-d and xAPIC code and maps the config_memory pages; it also includes the foreign mapping region (see below). As its last step, paging_init() copies the Linux mappings for the hypervisor memory (addresses between __start and hypervisor_header.size) into the hv_page_table global variable. The page_map_create() function creates page mappings and is mainly used when initializing the Extended Page Tables (EPT) for a cell.
About article 1 (LWN.net-article-01-2014.txt): it is a revised version of the two original LWN articles, largely identical in meaning with some additions; since I focus on arm64 and the changes are small, I will not go through it word by word.
About article 2 (LJ-article-04-2015.txt): it describes Jailhouse's application domains. Unlike the now-ubiquitous KVM, Jailhouse can shine in embedded scenarios such as real-time and safety-critical systems; industry-wise, industrial automation, medical devices, automotive, and high-performance computing can all benefit.
Now that robotics is taking off and real-time matters more and more, no wonder Jailhouse is on the rise again.
Once industrial automation, robotics, and automotive mature, medical applications will surely follow.
Our company's virtualization stack has been rewritten in Rust, so to set up the build environment I had to pick Rust back up. Alas, everything I once learned has leaked out of my head, so here is a note on configuring a Rust environment. One caveat: do not use the Rust packages from the Ubuntu archive; install the latest version the way the official documentation describes.
apt install curl rsync gdb-multiarch openocd cargo doxygen qemu-user-static \ build-essential libncurses5-dev libssl-dev libgtk2.0-dev libglib2.0-dev
vim ~/.bash_profile

export RUSTUP_DIST_SERVER=https://mirrors.ustc.edu.cn/rust-static
export RUSTUP_UPDATE_ROOT=https://mirrors.ustc.edu.cn/rust-static/rustup

curl -L https://static.rust-lang.org/rustup.sh -O
sh rustup.sh

Current installation options:

   default host triple: x86_64-unknown-linux-gnu
     default toolchain: stable (default)
               profile: default
  modify PATH variable: yes

1) Proceed with installation (default)
2) Customize installation
3) Cancel installation
Choose "Customize installation"; the final configuration should look like this:

   default host triple: x86_64-unknown-linux-gnu
     default toolchain: nightly
               profile: complete
  modify PATH variable: yes

When the installation finishes, it reports:
nightly-x86_64-unknown-linux-gnu installed - rustc 1.77.0-nightly (30dfb9e04 2024-01-14)
source ~/.bashrc
rustc --version
rustc 1.59.0
rustup --version
rustup 1.26.0 (5af9b9484 2023-04-05)
info: This is the version for the rustup toolchain manager, not the rustc compiler.
info: The currently active `rustc` version is `rustc 1.77.0-nightly (30dfb9e04 2024-01-14)`

With that, Rust is installed.
Rust installs the stable channel by default, but nightly and beta channels are also available; Rust follows a "release train" model. Stable deliberately refuses functionality gated behind feature flags, and our repository depends on such features, so I had to choose nightly.
The release channels are described in detail at the link below:
https://rustwiki.org/zh-CN/book/appendix-07-nightly-rust.html
Otherwise you will run into the following error:
`#![feature]` may not be used on the stable release channel
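A quick way to reproduce this; featdemo is a throwaway project name, and both toolchains are assumed to be installed via rustup:

```shell
# Create a crate whose main.rs uses an unstable feature gate.
cargo new featdemo && cd featdemo
printf '#![feature(test)]\nfn main() {}\n' > src/main.rs
cargo +stable build    # rejected: `#![feature]` may not be used on the stable release channel
cargo +nightly build   # accepted
```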
To switch between stable, nightly, and beta:
rustup default stable/nightly/beta
List the available targets and add the aarch64 target:

rustc --print target-list
rustup target add aarch64-unknown-linux-gnu
Download the cross toolchain:

wget https://developer.arm.com/-/media/Files/downloads/gnu-a/9.2-2019.12/binrel/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar.xz
xz -d gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar.xz
tar xvf gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu.tar
Configure the toolchain for cargo:

vim ~/.cargo/config

[source.crates-io]
registry = "https://github.com/rust-lang/crates.io-index"
replace-with = 'ustc'

[source.ustc]
registry = "git://mirrors.ustc.edu.cn/crates.io-index"

[build]
target = "aarch64-unknown-linux-gnu"

[target.aarch64-unknown-linux-gnu]
linker = "aarch64-none-linux-gnu-gcc"

Then create and build a test project:

cargo new hello --bin
cd hello && cargo build

To build for x86 instead, simply:
cargo build --target=x86_64-unknown-linux-gnu
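Since qemu-user-static was installed at the beginning, the aarch64 build can also be smoke-tested on the x86 host. The -L sysroot path below is an assumption based on where the Arm toolchain from the previous section was unpacked:

```shell
cd hello
cargo build    # default target is aarch64-unknown-linux-gnu per ~/.cargo/config
# -L points qemu at the toolchain's aarch64 sysroot for the dynamic linker and libc
qemu-aarch64-static \
    -L /root/sdk/gcc-arm-9.2-2019.12-x86_64-aarch64-none-linux-gnu/aarch64-none-linux-gnu/libc \
    target/aarch64-unknown-linux-gnu/debug/hello
```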
The Rust documentation is quite complete; see:
https://www.kancloud.cn/thinkphp/rust/36040
https://kaisery.github.io/trpl-zh-cn/ch01-01-installation.html
https://forge.rust-lang.org/index.html
https://doc.rust-lang.org/book/second-edition/foreword.html
Debian-family systems install software through apt and dpkg, but dependency problems come up regularly. The root cause is that dpkg-level commands are too low-level and direct: they take little account of the system-wide dependency graph, and once things break, only higher-level tools such as apt or aptitude can untangle them. This section shows how to resolve dependency problems with apt and aptitude.
To manufacture a dependency problem, dpkg can forcibly remove a package that other packages depend on.

dpkg -P --force-all xserver-xorg-core

xserver-xorg-core is tightly coupled to the system's display stack, and many applications depend on it. The command above forcibly purges it; afterwards, installing anything through apt fails on unmet dependencies, and apt refuses to work normally:
root@kylin:~# apt install
Reading package lists... Done
Building dependency tree
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 xorgxrdp : Depends: xorg-input-abi-24
            Depends: xorg-video-abi-24
            Depends: xserver-xorg-core (>= 2:1.18.99.901) but it is not installed
 xserver-xorg : Depends: xserver-xorg-core (>= 2:1.17.2-2) but it is not installed
 xserver-xorg-input-libinput : Depends: xorg-input-abi-24
                               Depends: xserver-xorg-core (>= 2:1.18.99.901) but it is not installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

When a dependency problem appears, the most direct fix is to follow the hint:

apt --fix-broken install

This normally repairs the damage (provided the sources and the network are healthy):

root@kylin:~# apt --fix-broken install
Reading package lists... Done
Building dependency tree
Reading state information... Done
Correcting dependencies... Done
The following additional packages will be installed:
  xserver-xorg-core
Suggested packages:
  xfonts-100dpi | xfonts-75dpi xfonts-scalable
The following NEW packages will be installed:
  xserver-xorg-core
0 upgraded, 1 newly installed, 0 to remove and 546 not upgraded.
Need to get 1,343 kB of archives.
After this operation, 3,946 kB of additional disk space will be used.
Do you want to continue? [Y/n]
To trace how the dependency problem arose, look for whoever touched the dpkg database.
A broken package usually means someone inadvertently modified dpkg's package list; the dpkg log gives a first clue:

root@kylin:~# grep "remove" /var/log/dpkg.log
2024-01-15 11:10:53 remove xserver-xorg-core:arm64 3:1.20.8-2rk7 <none>

So the cause is that the xserver-xorg-core package was removed at 2024-01-15 11:10:53, breaking its dependents; the next step is to find version 3:1.20.8-2rk7 of it again.
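The dpkg.log fields are whitespace-separated (date, time, action, package:arch, versions), so such removals can also be pulled out with awk. A self-contained sketch, with the log line inlined to mirror the entry found above:

```shell
# Extract the package name from a dpkg.log "remove" entry.
# Field 3 is the action, field 4 the package:arch pair.
line='2024-01-15 11:10:53 remove xserver-xorg-core:arm64 3:1.20.8-2rk7 <none>'
pkg=$(printf '%s\n' "$line" | awk '$3 == "remove" { print $4 }')
echo "$pkg"
```

Against the real log, replace the inlined line with `grep " remove " /var/log/dpkg.log`.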
First, search the configured sources:
apt-cache show xserver-xorg-core | grep 3:1.20.8-2rk7
Version: 3:1.20.8-2rk7
Now download and install it by hand:

apt download xserver-xorg-core=3:1.20.8-2rk7
dpkg -i xserver-xorg-core*.deb

More generally, whenever a dependency problem appears, the first thing to try is whether -f can fix it:
apt install -f
If the result is not what you expected, the aptitude tool can also repair it:
aptitude install
The following packages will be installed:
1)     xserver-xorg-core [3:1.20.8-2rk7 (v101)]

Accept this solution? [Y/n/q/?]

Accept whichever solution fits your situation.
If none of the above resolves it, be on your guard: the package you are trying to install is incompatible with the current system or sources. Be mentally prepared for the worst case, in which the package still misbehaves even after it has been forced in. Some ways of judging the situation follow.
In this situation, the versions the package depends on differ from what the system provides. "aptitude install pkgname" searches for the best solution, upgrading and downgrading system packages so that the package to be installed fits.
If aptitude cannot solve it either, the next resort is to edit the dependency metadata of the package itself:

dpkg -x pkgname.deb test    (unpack pkgname.deb into the test directory)
cd test
dpkg -e ../pkgname.deb      (extract the control information of pkgname.deb)

Now edit the dependency metadata so that it matches what the current system can satisfy.
(Note: this only changes the declared version relationships so that the package installs cleanly; it does not guarantee the package will run correctly afterwards.)
vim DEBIAN/control
Adjust the package's dependencies to the actual situation (relax version requirements, delete entries, or add entries):
Depends: xserver-common (>= 3:1.20.8-2rk7), keyboard-configuration, udev (>= 149), libegl1, libaudit1 (>= 1:2.2.1), libbsd0 (>= 0.7.0), libc6 (>= 2.29), libdbus-1-3 (>= 1.9.14), libdrm2 (>= 2.4.66), libepoxy0 (>= 1.5.4), libgcrypt20 (>= 1.8.0), libgl1, libpciaccess0 (>= 0.12.902), libpixman-1-0 (>= 0.30.0), librga2, libselinux1 (>= 2.0.82), libsystemd0, libudev1 (>= 183), libunwind8, libxau6, libxdmcp6, libxfont2 (>= 1:2.0.1), libxshmfence1
Then repackage:

cd .. && dpkg -b test/ pkgname.deb    (rebuild the package)
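For scripted use, a version constraint in the control file's Depends line can also be relaxed with sed instead of an editor. A sketch operating on an inlined copy of the line; the target version 3:1.20.4 is just an example:

```shell
# Relax the xserver-common version requirement in a control-style Depends line.
dep='Depends: xserver-common (>= 3:1.20.8-2rk7), keyboard-configuration, udev (>= 149)'
relaxed=$(printf '%s\n' "$dep" | sed 's/xserver-common (>= [^)]*)/xserver-common (>= 3:1.20.4)/')
echo "$relaxed"
```

Run the same sed with -i against DEBIAN/control to edit the extracted file in place.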
When aptitude install cannot resolve the dependencies and all you need is for the package to install, a forced install may succeed, and occasionally even makes the dependency problem go away.
(This operation can damage the system; whoever runs it bears the consequences.)
dpkg -i --force-all pkgname.deb
After the forced install, directly verify whether the programs inside the package work.
If the application works, you can adjust the package's recorded metadata to match the system, as follows:

vim /var/lib/dpkg/status    (find the package's entry and edit its dependency information to match the system)
vim /usr/share/doc/pkgname/changelog.gz    (amend the changelog appropriately, to make explicit that this package was modified locally)

If the application does not work or misbehaves after installation, try other approaches such as the dynamic loader (ld) or patchelf; the packaging route can no longer help.
Finally, restore the package to a pristine state:

dpkg -P pkgname
apt download pkgname
dpkg -i pkgname*.deb
apt install -f

Some related commands for investigating packages:

apt-cache depends xserver-xorg-core    # forward dependencies of a package
apt-cache rdepends xserver-xorg-core   # reverse dependencies of a package
apt-cache show xserver-xorg-core       # detailed information about a package
apt-cache policy xserver-xorg-core     # priority of a package and which source provides it
apt-cache showpkg xserver-xorg-core    # dependency relationships in more detail
echo "pkgname hold" | sudo dpkg --set-selections   # hold a package so it cannot be upgraded or removed
dpkg --get-selections | grep hold      # list the held packages
dpkg -S /usr/lib/xorg/Xorg             # find which package owns a file
dpkg-deb -c pkgname.deb                # list the contents of a .deb file
dpkg -i --force-overwrite B.deb        # force-overwrite files owned by another package
apt install pkgname --download-only    # download a package without installing it
To put every installed package on hold in one go:

#!/bin/bash
# Hold every installed package so that nothing can be upgraded or removed.
dpkg --get-selections | awk '{print $1}' | while read -r name
do
    echo "$name"
    echo "$name hold" | dpkg --set-selections
done

Source pinning can also steer which repository wins (higher pin priorities are installed first):

vim /etc/apt/preferences.d/my.pref

Package: pkgname
Pin: origin "<source host>"
Pin-Priority: 1600

If dpkg's own info database is damaged, move it aside, reconfigure, and reinstall the affected packages:

mv /var/lib/dpkg/info/* /tmp/
dpkg --configure -a
apt update
apt-get install --reinstall linux-base
apt install -f

To batch-reinstall every package whose files list dpkg complains about:

for package in $(apt-get install -f 2>&1 |
    grep "warning: files list file for package '" |
    grep -Po "[^'\n ]+'" | grep -Po "[^']+"); do
    apt-get install --reinstall "$package"
done

In the end, if a package still cannot be installed after trying apt and aptitude, the dependency is very likely unresolvable. At that point, change the approach rather than insisting on this exact package; that may be what actually solves your problem.