2024-02-28
Work notes

PCIe topology

A PCIe topology consists mainly of the Root Complex (RC), Endpoints (EP), Switches, and PCIe to PCI/PCI-X bridges, as shown in the figure below.

image.png

root complex

The RC usually connects directly to the CPU and memory. It is the root of the PCIe hierarchy, that is, the root of the bus domain.

endpoints

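An EP is a terminal device, the one that actually implements a function. There are three kinds of EP:

  • Legacy Endpoint
  • PCI Express Endpoint
  • Root Complex Integrated Endpoint (RCiEP), an endpoint implemented inside the Root Complex
  1. A Legacy Endpoint exists mainly for compatibility with earlier designs such as PCI-X, so it may generate I/O requests.
  2. A PCI Express Endpoint must not generate I/O requests.
  3. An RCiEP mostly follows the PCI Express Endpoint requirements, but it lives on the Root Complex's internal logic. It must not appear in any hierarchy exposed by the Root Complex, so it cannot sit behind a Root Port or a Switch.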

switch

A switch presents itself as multiple virtual PCI-to-PCI bridges, as shown in the figure below.

image.png

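A switch therefore has the following two properties:

  • It arbitrates when traffic from its internal bridges contends for the same port.
  • Its internal bus carries only bridges; an EP must not appear on it.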

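The topology can also contain the following devices:

  • Root Complex Event Collector (RCEC)
  • PCI Express to PCI/PCI-X Bridge
  1. An RCEC is a special device integrated in the Root Complex (class code 0x08, sub-class 0x07). It collects error and PME messages from Root Complex Integrated Endpoints.
  2. A PCIe to PCI/PCI-X bridge attaches a PCI/PCI-X hierarchy to the PCIe fabric.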
2024-02-27
Work notes

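Reading the size of a BAR from PCIe configuration space

PCIe enumeration starts by reading basic device information from configuration space, and lspci decodes this information in user space. A customer reported that BAR0 of their Xilinx device was only 512K and asked us to confirm it. The check went as follows.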

Type 0 and Type 1

For device Functions with Type 0 headers (all types of Endpoints)
For device Functions with Type 1 headers (Root Ports, Switches and Bridges)
  • All EPs use Type 0 headers
  • Root Ports, Switches and bridges all use Type 1 headers

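Header type flag

Whether a device uses a Type 0 or a Type 1 header can be read from the Header Type register at offset 0x0e of configuration space, as shown below:

image.png

For Functions that implement a Type 0 Configuration Space header the encoding 000 0000b must be used. For Functions that implement a Type 1 Configuration Space header the encoding 000 0001b must be used.

In other words, if the low 7 bits at offset 0x0e are 0x1 the device has a Type 1 header (usually a Root Port or bridge), and if they are 0x0 it has a Type 0 header (usually an EP). Bit 7 of the same byte is the multi-function flag and is not part of the header layout.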

Reading type 0/1 with lspci

root@kylin:~# lspci -x
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd Device 3588 (rev 01)
00: 87 1d 88 35 07 05 10 00 01 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 f0 00 00 00
20: 00 f0 00 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 70 01 02 00

01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 01 00 00

lspci -x shows the byte at offset 0x0e. Here 00:00.0 has a Type 1 header (0x01) and 01:00.0 has a Type 0 header (0x00).
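As a cross-check of the reading above, here is a minimal C sketch that decodes the Header Type byte from the first 16 bytes of each device in the lspci -x dump. The function and array names are illustrative and not taken from any tool; only the offsets (0x00 for the vendor ID, 0x0e for the header type) and the bit layout come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_OFFSET   0x00u
#define PCI_HEADER_TYPE_OFFSET 0x0eu

/* Vendor ID is a little-endian 16-bit value at offset 0x00. */
uint16_t cfg_vendor_id(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL && len > PCI_VENDOR_ID_OFFSET + 1);
    return (uint16_t)(cfg[PCI_VENDOR_ID_OFFSET] |
                      (cfg[PCI_VENDOR_ID_OFFSET + 1] << 8));
}

/* Header layout: low 7 bits of the Header Type byte (0 = Type 0, 1 = Type 1). */
unsigned cfg_header_layout(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL && len > PCI_HEADER_TYPE_OFFSET);
    return cfg[PCI_HEADER_TYPE_OFFSET] & 0x7fu;
}

/* Bit 7 of the same byte flags a multi-function device. */
int cfg_is_multifunction(const uint8_t *cfg, size_t len)
{
    assert(cfg != NULL && len > PCI_HEADER_TYPE_OFFSET);
    return (cfg[PCI_HEADER_TYPE_OFFSET] & 0x80u) != 0;
}

/* First 16 bytes of each device, copied from the lspci -x dump above. */
const uint8_t rockchip_bridge_cfg[16] = {
    0x87, 0x1d, 0x88, 0x35, 0x07, 0x05, 0x10, 0x00,
    0x01, 0x00, 0x04, 0x06, 0x00, 0x00, 0x01, 0x00,
};
const uint8_t xilinx_ep_cfg[16] = {
    0xee, 0x10, 0x14, 0x70, 0x00, 0x00, 0x10, 0x00,
    0x00, 0x00, 0x80, 0x05, 0x00, 0x00, 0x00, 0x00,
};
```

Calling cfg_header_layout() on the two arrays gives 1 for 00:00.0 and 0 for 01:00.0, which matches the manual reading of the dump.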

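Type 0/1 configuration space

The common configuration space layout is shown below:

image.png

The Type 0 configuration space layout is shown below:

image.png

The Type 1 configuration space layout is shown below:

image.png

Either way, the BAR0 register sits at offset 0x10 for both header types, so the rest of this post focuses on the value at 0x10.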

BAR0 address

The value at 0x10 is normally the address that Linux assigned to the BAR; it is used for reads and writes in the PCIe address domain. This can be confirmed as follows:

root@kylin:~# lspci -x -s 01:00.0
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00

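The dword at 0x10 is 0xf0000000 (the bytes in the dump are little-endian). This 0xf0000000 is an accessible address that Linux mapped according to the device tree.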
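To make the byte order and the flag bits explicit, the following C sketch assembles the BAR0 dword from the 0x10 row of the dump and decodes its low bits. The helper names are hypothetical; the bit meanings (bit 0 selects I/O or memory, bits 2:1 give the memory type, bit 3 is prefetchable) are the standard BAR layout and agree with the "32-bit, non-prefetchable" that lspci -v reports for this device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit register from a raw config-space dump. */
uint32_t cfg_read32(const uint8_t *cfg, size_t len, size_t off)
{
    assert(cfg != NULL && off + 4 <= len);
    return (uint32_t)cfg[off] |
           ((uint32_t)cfg[off + 1] << 8) |
           ((uint32_t)cfg[off + 2] << 16) |
           ((uint32_t)cfg[off + 3] << 24);
}

/* Bit 0: 1 = I/O space BAR, 0 = memory space BAR. */
int bar_is_io(uint32_t bar)
{
    return (bar & 0x1u) != 0;
}

/* Bits 2:1 of a memory BAR: 00b = 32-bit, 10b = 64-bit. */
int bar_is_64bit(uint32_t bar)
{
    return !bar_is_io(bar) && ((bar >> 1) & 0x3u) == 0x2u;
}

/* Bit 3 of a memory BAR: prefetchable. */
int bar_is_prefetchable(uint32_t bar)
{
    return !bar_is_io(bar) && (bar & 0x8u) != 0;
}

/* Base address of a memory BAR: everything above the 4 flag bits. */
uint32_t bar_mem_base(uint32_t bar)
{
    return bar & ~UINT32_C(0xf);
}

/* The "10:" row of the dump above; index 0 here is config offset 0x10. */
const uint8_t cfg_row_0x10[16] = {
    0x00, 0x00, 0x00, 0xf0, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```

With this row, cfg_read32(cfg_row_0x10, 16, 0) yields 0xf0000000, a 32-bit non-prefetchable memory BAR.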

BAR0 size

Reading with lspci

lspci already reports the size of BAR0 as 512K:

root@kylin:~# lspci -s 01:00.0 -v
01:00.0 Memory controller: Xilinx Corporation Device 7014
        Subsystem: Xilinx Corporation Device 0007
        Flags: fast devsel, IRQ 255
        Memory at f0200000 (32-bit, non-prefetchable) [disabled] [size=512K]
        Capabilities: [80] Power Management version 3
        Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [c0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
lspci: Unable to load libkmod resources: error -2

Reading with setpci

setpci --dumpregs
setpci -s 01:00.0 0x10.L=0xffffffff
root@kylin:~# lspci -s 01:00.0 -x
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00

The read-back value is 0xfff80000. Its lowest set bit is 0x80000, which is exactly 512K. Writing all ones overwrites the BAR, so the original address has to be written back afterwards.

Reading with the io command

io -4 0xf0000000 -r -l 64
io -4 -w 0xf0000010 0xffffffff
io -4 0xf0000010 -r

This also reads back 0xfff80000, which again works out to 512K.

Kernel code

The kernel implements this mainly in the following two functions:

  • __pci_read_base
  • pci_size

The key statements are:

__pci_read_base
    pci_read_config_dword(dev, pos, &l);
    pci_write_config_dword(dev, pos, l | mask);
    pci_read_config_dword(dev, pos, &sz);
pci_size
    u64 size = mask & maxbase;
    size = size & ~(size-1);
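The lowest-set-bit step can be tried outside the kernel. The sketch below is not the kernel's pci_size; it is a simplified user-space function (mem_bar_size is a made-up name) that assumes a 32-bit memory BAR, masks off the four flag bits, and applies the same size & ~(size - 1) expression to the value read back after writing 0xffffffff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size of a 32-bit memory BAR from the value read back after writing
 * 0xffffffff. Returns 0 if no address bit is writable (BAR not implemented).
 */
uint32_t mem_bar_size(uint32_t readback)
{
    uint32_t size = readback & ~UINT32_C(0xf);   /* drop the flag bits */
    uint32_t lowest;

    if (size == 0)
        return 0;
    lowest = size & ~(size - 1);                 /* isolate the lowest set bit */
    assert((lowest & (lowest - 1)) == 0);        /* result is a power of two */
    return lowest;
}
```

For the read-back value seen above, mem_bar_size(0xfff80000) returns 0x80000, which is 512 * 1024 bytes and agrees with the size=512K printed by lspci.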
2024-01-25
Work notes

The hvc privileged instruction

While reading the Jailhouse code I came across the hvc privileged instruction, so here is a closer look at it.

c
static inline __jh_arg jailhouse_call_arg2(__jh_arg num, __jh_arg arg1,
                                           __jh_arg arg2)
{
    register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;
    register __jh_arg __arg1 asm(JAILHOUSE_CALL_ARG1) = arg1;
    register __jh_arg __arg2 asm(JAILHOUSE_CALL_ARG2) = arg2;

    asm volatile(
        JAILHOUSE_CALL_INS
        : "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
        : : "memory", JAILHOUSE_CALL_CLOBBERED);
    return num_result;
}

#define JAILHOUSE_CALL_INS          "hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT   "x0"
#define JAILHOUSE_CALL_ARG1         "x1"
#define JAILHOUSE_CALL_ARG2         "x2"
#define JAILHOUSE_CALL_CLOBBERED    "x3"

Parsing the inline assembly

asm asm-qualifiers ( AssemblerTemplate
                     : OutputOperands
                     : InputOperands
                     : Clobbers
                     : GotoLabels)

After macro expansion, the inline assembly above is roughly the following pseudocode:

asm volatile(
    hvc #0x4a48
    : "+r" (x0), "+r" (x1), "+r" (x2)
    :
    : "memory", x3);

The operand separator ':'

Colons separate the operand parameters.

Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:

The volatile qualifier

It prevents GCC from optimizing the asm statement away or moving it, for example hoisting it out of a loop.

GCC’s optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands and asm goto statements, are implicitly volatile.

The constraints '+r' and "memory"

'+' means the operand is both read and written.

Means that this operand is both read and written by the instruction.

'r' means the operand must live in a general-purpose register.

A register operand is allowed provided that it is in a general register.

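"memory" tells the compiler that the asm may read or write memory beyond its listed operands, so values cached in registers must not be assumed valid across the asm. It acts as a compiler-level memory barrier.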

The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.

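General-purpose registers x0/x1/x2

x1 and x2 are general-purpose registers used to pass arguments; x0 carries the hypercall number in and the return value out.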

hvc

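The hvc instruction raises a Hypervisor Call exception: the processor enters Hyp mode, saves the CPSR into the Hyp-mode SPSR, and branches to the HVC vector (the entry of the hypervisor call exception handler). The processor itself ignores the immediate imm, but the handler can retrieve it to work out which service is being requested. The quote below describes AArch32; in AArch64, which the x0 to x3 registers above imply, the exception is taken to EL2 and the immediate is reported in ESR_EL2.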

HVC #imm imm is an expression evaluating to an integer in the range 0-65535. In a processor that implements the Virtualization Extensions, the HVC instruction causes a Hypervisor Call exception. This means that the processor enters Hyp mode, the CPSR value is saved to the Hyp mode SPSR, and execution branches to the HVC vector. imm is ignored by the processor. However, it can be retrieved by the exception handler to determine what service is being requested.
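To illustrate how a handler can retrieve the immediate, here is a small C sketch for the AArch64 case. It assumes the architectural ESR_EL2 layout (exception class in bits 31:26, the 16-bit HVC immediate in the low 16 bits of the ISS, class 0x16 for an HVC executed in AArch64 state). The function names are invented for this example and are not Jailhouse's own code. As a side note, 0x4a48 is the ASCII pair 'J' 'H'.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_EC_SHIFT      26
#define ESR_EC_MASK       0x3fu
#define ESR_EC_HVC64      0x16u     /* HVC executed in AArch64 state */
#define ESR_IL_BIT        25        /* instruction length: 1 = 32-bit */
#define ESR_HVC_IMM_MASK  0xffffu
#define JAILHOUSE_HVC_IMM 0x4a48u   /* same value as in JAILHOUSE_CALL_INS */

/* Exception class: why the exception was taken. */
unsigned esr_exception_class(uint64_t esr)
{
    return (unsigned)((esr >> ESR_EC_SHIFT) & ESR_EC_MASK);
}

/* For an HVC exception, the low 16 bits hold the instruction's immediate. */
unsigned esr_hvc_imm(uint64_t esr)
{
    return (unsigned)(esr & ESR_HVC_IMM_MASK);
}

/* True if the syndrome describes "hvc #0x4a48" from AArch64 state. */
int is_jailhouse_hvc(uint64_t esr)
{
    return esr_exception_class(esr) == ESR_EC_HVC64 &&
           esr_hvc_imm(esr) == JAILHOUSE_HVC_IMM;
}

/* Build the syndrome value an AArch64 HVC with this immediate would produce. */
uint64_t make_hvc64_esr(unsigned imm)
{
    assert(imm <= ESR_HVC_IMM_MASK);
    return ((uint64_t)ESR_EC_HVC64 << ESR_EC_SHIFT) |
           (UINT64_C(1) << ESR_IL_BIT) | imm;
}
```

A handler built along these lines would first check the exception class, then compare the immediate, and only after that look at x0 to pick the hypercall.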

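Overall explanation

Instruction: hvc #0x4a48
Hypercall code: x0
1. argument: x1
2. argument: x2
Return code: x0

The virtualization instruction hvc is issued with the immediate #0x4a48, which only identifies the service (Jailhouse). On entry x0 holds the hypercall code, and x1 and x2 hold the arguments passed to the handler selected by that code; on return x0 holds the result. The trailing "memory", JAILHOUSE_CALL_CLOBBERED is the clobber list: it tells the compiler that the hypercall may modify memory and x3. Variants that take fewer arguments move the unused argument registers (x1, x2) into the clobber list as well, so the compiler treats them as scratch registers.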

While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is a string constant enclosed in double quotes and separated by commas

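In other words, a computation may need extra registers, or the processor may overwrite some registers as a side effect of a particular instruction. Listing those registers in the clobber list lets the compiler treat them as scratch registers.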

Related calls

#define JAILHOUSE_HC_DISABLE                    0
#define JAILHOUSE_HC_CELL_CREATE                1
#define JAILHOUSE_HC_CELL_START                 2
#define JAILHOUSE_HC_CELL_SET_LOADABLE          3
#define JAILHOUSE_HC_CELL_DESTROY               4
#define JAILHOUSE_HC_HYPERVISOR_GET_INFO        5
#define JAILHOUSE_HC_CELL_GET_STATE             6
#define JAILHOUSE_HC_CPU_GET_INFO               7
#define JAILHOUSE_HC_DEBUG_CONSOLE_PUTC         8

/* Hypervisor information type */
#define JAILHOUSE_INFO_MEM_POOL_SIZE            0
#define JAILHOUSE_INFO_MEM_POOL_USED            1
#define JAILHOUSE_INFO_REMAP_POOL_SIZE          2
#define JAILHOUSE_INFO_REMAP_POOL_USED          3
#define JAILHOUSE_INFO_NUM_CELLS                4

/* Hypervisor information type */
#define JAILHOUSE_CPU_INFO_STATE                0
#define JAILHOUSE_CPU_INFO_STAT_BASE            1000

/* CPU state */
#define JAILHOUSE_CPU_RUNNING                   0
#define JAILHOUSE_CPU_FAILED                    2 /* terminal state */

/* CPU statistics */
#define JAILHOUSE_CPU_STAT_VMEXITS_TOTAL        0
#define JAILHOUSE_CPU_STAT_VMEXITS_MMIO         1
#define JAILHOUSE_CPU_STAT_VMEXITS_MANAGEMENT   2
#define JAILHOUSE_CPU_STAT_VMEXITS_HYPERCALL    3
#define JAILHOUSE_GENERIC_CPU_STATS             4

These codes are passed in x0 to the hypervisor call, which issues hvc #0x4a48 to manage the virtual machines (cells). For example:

C
err = jailhouse_call(JAILHOUSE_HC_DISABLE);

static inline __jh_arg jailhouse_call(__jh_arg num)
{
    register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;

    asm volatile(
        JAILHOUSE_CALL_INS
        : "+r" (num_result)
        : : "memory", JAILHOUSE_CALL_ARG1, JAILHOUSE_CALL_ARG2,
            JAILHOUSE_CALL_CLOBBERED);
    return num_result;
}

That covers the hvc inline assembly. Next, I continue tracing the Jailhouse driver source.


2024-01-25
Work notes

Jailhouse startup analysis

The Jailhouse kernel module (.ko) has been built, so this post walks through the code to analyze how it is loaded.

init

Initialization consists of the following steps:

  1. jailhouse_sysfs_init
  2. misc_register
  3. jailhouse_pci_register
  4. register_reboot_notifier

sysfs

The sysfs entries that get created are documented as follows:

/sys/devices/jailhouse
|- console                     - hypervisor console (see [1])
|- enabled                     - 1 if Jailhouse is enabled, 0 otherwise
|- mem_pool_size               - number of pages in hypervisor memory pool
|- mem_pool_used               - used pages of hypervisor memory pool
|- remap_pool_size             - number of pages in hypervisor remapping pool
|- remap_pool_used             - used pages of hypervisor remapping pool
`- cells
   |- <id>                     - unique numerical ID
   |  |- name                  - cell name
   |  |- state                 - "running", "running/locked", "shut down", or
   |  |                          "failed"
   |  |- cpus_assigned         - bitmask of assigned logical CPUs
   |  |- cpus_assigned_list    - human readable list of assigned logical CPUs
   |  |- cpus_failed           - bitmask of logical CPUs that caused a failure
   |  |- cpus_failed_list      - human readable list of logical CPUs that
   |  |                          caused a failure
   |  `- statistics
   |     |- cpu<n>
   |     |  |- vmexits_total    - Total number of VM exits on CPU <n>
   |     |  `- vmexits_<reason> - VM exits due to <reason> on CPU <n>
   |     |- vmexits_total       - Total number of VM exits on all cell CPUs
   |     `- vmexits_<reason>    - VM exits due to <reason> on all cell CPUs
   `- ...

The entries directly under /sys/devices/jailhouse are created by:

C
static struct attribute *jailhouse_sysfs_entries[] = {
    &dev_attr_console.attr,
    &dev_attr_enabled.attr,
    &dev_attr_mem_pool_size.attr,
    &dev_attr_mem_pool_used.attr,
    &dev_attr_remap_pool_size.attr,
    &dev_attr_remap_pool_used.attr,
    NULL
};

kobject_create_and_add("cells", &dev->kobj);

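The main files are:

  1. console: hypervisor console output
  2. enabled: whether Jailhouse is enabled
  3. mem_pool_size: size of the hypervisor memory pool, in pages
  4. mem_pool_used: used pages of the hypervisor memory pool
  5. remap_pool_size: size of the hypervisor remapping pool, in pages
  6. remap_pool_used: used pages of the hypervisor remapping pool
  7. cells directory

The contents of the cells directory come from: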

C
kobject_init_and_add(&cell->kobj, &cell_type, cells_dir, "%d", cell->id);

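So the cells directory contains only per-id directories. They are created on demand while Jailhouse is enabled or a cell is created, not when the module is loaded.

The contents of an id directory come from: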

C
static struct attribute *cell_attrs[] = {
    &cell_name_attr.attr,
    &cell_state_attr.attr,
    &cell_cpus_assigned_attr.attr,
    &cell_cpus_assigned_list_attr.attr,
    &cell_cpus_failed_attr.attr,
    &cell_cpus_failed_list_attr.attr,
    NULL,
};

kobject_init_and_add(&cell->stats_kobj, &cell_stats_type, &cell->kobj,
                     "%s", "statistics");

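The main files are:

  1. name: name of the cell
  2. state: state of the cell
  3. cpus_assigned: assigned CPU mask, printed with %*pb, e.g. ffff
  4. cpus_assigned_list: assigned CPU mask, printed with %*pbl, e.g. 0-7
  5. cpus_failed: mask of failed CPUs, printed with %*pb
  6. cpus_failed_list: mask of failed CPUs, printed with %*pbl
  7. statistics directory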
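The difference between the bitmask form and the list form is easiest to see side by side. The sketch below is a user-space approximation of the list form for masks of up to 64 CPUs; cpumask_to_list is a hypothetical helper written for this post, not the kernel's printk implementation of %*pbl.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a CPU bitmask as a range list, for example 0xff -> "0-7" and
 * 0x15 -> "0,2,4". Output is truncated if buf is too small.
 */
void cpumask_to_list(uint64_t mask, char *buf, size_t len)
{
    size_t used = 0;
    int bit = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    while (bit < 64) {
        int start, end, n;

        if (!(mask & (UINT64_C(1) << bit))) {
            bit++;
            continue;
        }

        /* Extend the run while consecutive bits are set. */
        start = bit;
        while (bit < 64 && (mask & (UINT64_C(1) << bit)))
            bit++;
        end = bit - 1;

        if (start == end)
            n = snprintf(buf + used, len - used, "%s%d",
                         used ? "," : "", start);
        else
            n = snprintf(buf + used, len - used, "%s%d-%d",
                         used ? "," : "", start, end);
        if (n < 0 || (size_t)n >= len - used)
            break;              /* out of space: stop at what fits */
        used += (size_t)n;
    }
}
```

For instance, a mask of ff corresponds to the list 0-7, and a mask of ffff to 0-15.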

The statistics directory is populated by:

C
static struct attribute *cell_stats_attrs[] = {
    &vmexits_total_cell_attr.kattr.attr,
    &vmexits_mmio_cell_attr.kattr.attr,
    &vmexits_management_cell_attr.kattr.attr,
    &vmexits_hypercall_cell_attr.kattr.attr,
    &vmexits_maintenance_cell_attr.kattr.attr,
    &vmexits_virt_irq_cell_attr.kattr.attr,
    &vmexits_virt_sgi_cell_attr.kattr.attr,
    &vmexits_psci_cell_attr.kattr.attr,
    &vmexits_smccc_cell_attr.kattr.attr,
    NULL
};

kobject_init_and_add(&cell_cpu->kobj, &cell_cpu_type, &cell->stats_kobj,
                     "cpu%u", cpu);

The main files are:

  1. vmexits_total: total number of VM exits
  2. vmexits_mmio (memory-mapped I/O)
  3. vmexits_management (hypervisor management events)
  4. vmexits_hypercall (Hypercall)
  5. vmexits_maintenance (interrupt controller maintenance interrupts)
  6. vmexits_virt_irq (IRQ)
  7. vmexits_virt_sgi (Software Generated Interrupt)
  8. vmexits_psci (Power State Coordination Interface)
  9. vmexits_smccc (SMC Calling Convention)
  10. cpu%u: per-CPU directory

The cpu%u directory is populated by:

C
static struct attribute *cpu_stats_attrs[] = {
    &vmexits_total_cpu_attr.kattr.attr,
    &vmexits_mmio_cpu_attr.kattr.attr,
    &vmexits_management_cpu_attr.kattr.attr,
    &vmexits_hypercall_cpu_attr.kattr.attr,
    &vmexits_maintenance_cpu_attr.kattr.attr,
    &vmexits_virt_irq_cpu_attr.kattr.attr,
    &vmexits_virt_sgi_cpu_attr.kattr.attr,
    &vmexits_psci_cpu_attr.kattr.attr,
    &vmexits_smccc_cpu_attr.kattr.attr,
    NULL
};

The files are:

  1. vmexits_total
  2. vmexits_mmio
  3. vmexits_management
  4. vmexits_hypercall
  5. vmexits_maintenance
  6. vmexits_virt_irq
  7. vmexits_virt_sgi
  8. vmexits_psci
  9. vmexits_smccc

One thing to note here is the hypercall these statistics are fetched with:

C
asm volatile(
    JAILHOUSE_CALL_INS
    : "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
    : : "memory", JAILHOUSE_CALL_CLOBBERED);

#define JAILHOUSE_CALL_INS          "hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT   "x0"
#define JAILHOUSE_CALL_ARG1         "x1"
#define JAILHOUSE_CALL_ARG2         "x2"
#define JAILHOUSE_CALL_CLOBBERED    "x3"

misc

The misc device simply creates the file /dev/jailhouse. The main code is:

C
static const struct file_operations jailhouse_fops = {
    .owner = THIS_MODULE,
    .unlocked_ioctl = jailhouse_ioctl,
    .compat_ioctl = jailhouse_ioctl,
    .llseek = noop_llseek,
    .open = jailhouse_console_open,
    .release = jailhouse_console_release,
    .read = jailhouse_console_read,
};

static struct miscdevice jailhouse_misc_dev = {
    .minor = MISC_DYNAMIC_MINOR,
    .name = "jailhouse",
    .fops = &jailhouse_fops,
};

The ioctls provided are:

JAILHOUSE_ENABLE
JAILHOUSE_DISABLE
JAILHOUSE_CELL_CREATE
JAILHOUSE_CELL_LOAD
JAILHOUSE_CELL_START
JAILHOUSE_CELL_DESTROY

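These ioctls provide the basic operations of Jailhouse: enable, disable, and cell create, load, start and destroy.
The three file operations open, release and read do the following:

  1. open: allocates the console_state structure
  2. release: frees the console_state structure
  3. read: dumps the hypervisor console output

What the Jailhouse enable ioctl actually does is left for a later post.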

pci

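This registers a PCI driver. Its probe function walks the list of claimed devices and logs every PCI device that has been claimed for a non-root cell.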

C
list_for_each_entry(claimed_dev, &claimed_devs, list) {
    if (claimed_dev->dev == dev) {
        dev_info(&dev->dev, "claimed for use in non-root cell\n");
        ret = 0;
        break;
    }
}

How devices get onto this list during the Jailhouse enable ioctl is left for a later post.

notifier

This simply sends the disable command to Jailhouse when the system reboots.

C
static int jailhouse_shutdown_notify(struct notifier_block *unused1,
                                     unsigned long unused2, void *unused3)
{
    int err;

    err = jailhouse_cmd_disable();
    if (err && err != -EINVAL)
        pr_emerg("jailhouse: ordered shutdown failed!\n");

    return NOTIFY_DONE;
}

Summary

That completes the loading process of the Jailhouse driver. The next step is the flow triggered by sending the enable ioctl to Jailhouse.

2024-01-19
Work notes

Using a vhdx virtual disk with wsl2

When WSL stores files directly on a Windows drive letter, I/O is very slow, as shown below:

dd if=/dev/zero of=test.img status=progress
4970496 bytes (5.0 MB, 4.7 MiB) copied, 1 s, 5.0 MB/s^C
16147+0 records in
16147+0 records out
8267264 bytes (8.3 MB, 7.9 MiB) copied, 1.66701 s, 5.0 MB/s

The speed is about 5 MB/s, which is unacceptable. There are two ways to make WSL faster:

  1. Mount an ext4 img file directly in WSL
  2. Mount a vhdx virtual disk in WSL

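Mounting an ext4 img

  • Create the ext4 image
dd if=/dev/zero of=wsl_code.img count=10240

This produces a wsl_code.img of about 5 MiB (10240 blocks of 512 bytes). Next, format it as ext4 and grow the filesystem to 100G: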

mkfs.ext4 wsl_code.img
resize2fs wsl_code.img 100G

WSL then only needs to mount this img at startup:

mount /mnt/k/wsl_code.img ~/wsl_code/
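The size of the image created by the dd command in this section follows from the block count and dd's default block size of 512 bytes. A small C sketch of that arithmetic (dd_total_bytes is an illustrative name only) also reproduces the byte count printed by the slow dd run at the top of this post:

```c
#include <assert.h>
#include <stdint.h>

#define DD_DEFAULT_BLOCK_SIZE 512u   /* dd's block size when bs= is not given */

/* Total bytes written by dd for a given record count and block size. */
uint64_t dd_total_bytes(uint64_t count, uint64_t block_size)
{
    assert(block_size > 0);
    return count * block_size;
}
```

With count=10240 this gives 5242880 bytes, that is 5 MiB; with the 16147 records from the first test it gives 8267264 bytes, the figure dd reported.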

Mounting a vhdx in WSL

The official WSL documentation recommends the vhdx format. The main steps are:

  • Computer Management ---> Disk Management ---> Action ---> Create VHD ---> Location ---> Virtual hard disk size ---> VHDX
  • Computer Management ---> Disk Management ---> Action ---> Attach VHD ---> Location ---> OK
    The vhdx virtual disk is now created. Next, mount it in WSL:

  1. Open PowerShell as administrator
  2. Run the command: GET-CimInstance -query "SELECT * from Win32_DiskDrive"

PS C:> GET-CimInstance -query "SELECT * from Win32_DiskDrive"

DeviceID            Caption                                 Partitions  Size           Model
--------            -------                                 ----------  ----           -----
\\.\PHYSICALDRIVE1  UNITEK USB3.0 TO SATA SCSI Disk Device  1           1000202273280  UNITEK USB3.0 TO SATA SCSI Disk D...
\\.\PHYSICALDRIVE0  KIOXIA-EXCERIA SSD                      4           500105249280   KIOXIA-EXCERIA SSD
\\.\PHYSICALDRIVE2  Microsoft 虚拟磁盘                       1           966363471360   Microsoft 虚拟磁盘

     Find the Windows disk identifier of the virtual disk you created (listed as "Microsoft 虚拟磁盘", i.e. Microsoft Virtual Disk): \\.\PHYSICALDRIVE2
  3. Shut down WSL
wsl --shutdown
  4. Attach the virtual disk
wsl --mount \\.\PHYSICALDRIVE2 --bare
  5. Start WSL
  6. Create the filesystem
mkfs.ext4 /dev/sdc1
  7. Mount the filesystem
mount /dev/sdc1 wsl_code/
  8. Set the label
e2label /dev/sdc1 wsl_code
  9. Mount automatically
vim /etc/fstab
LABEL=wsl_code /root/wsl_code ext4 defaults,nofail 0 1
  10. Detach the virtual disk from WSL
wsl --unmount \\.\PHYSICALDRIVE2

With that, the virtual disk can be used from WSL at full speed:

root@kylin:~/wsl_code# dd if=/dev/zero of=test.img status=progress
2114826752 bytes (2.1 GB, 2.0 GiB) copied, 4 s, 529 MB/s