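PCIe is mainly made up of the RC (Root Complex), EPs (Endpoints), switches, and PCIe-to-PCI/PCI-X bridges, as shown in the figure below.
The RC usually connects directly to the CPU and memory and is normally the root device of the PCIe hierarchy, i.e. the root of the bus domain.
An EP is an endpoint device, the device that actually implements a function. There are the following special kinds of EP:
A switch can provide multiple virtual PCI-to-PCI bridge devices, as shown in the figure below.
A switch therefore has the following two properties:
As for the devices contained in a switch: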
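Enumeration starts by reading basic PCIe information from the configuration space, and lspci in user space parses it correctly. A customer reported that BAR0 of their Xilinx device was only 512K and asked us to confirm this, so we checked as follows.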
For device Functions with Type 0 headers (all types of Endpoints)
For device Functions with Type 1 headers (Root Ports, Switches and Bridges)
To tell whether a device uses a Type 0 or a Type 1 header, check the Header Type register at offset 0x0E of the configuration space:
For Functions that implement a Type 0 Configuration Space header the encoding 000 0000b must be used. For Functions that implement a Type 1 Configuration Space header the encoding 000 0001b must be used
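In other words, if the value at 0x0E is 0x1 the device has a Type 1 header (typically an RC or a bridge); if it is 0x0 it has a Type 0 header (typically an EP). Strictly, only bits 6:0 encode the header layout; bit 7 flags a multi-function device.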
root@kylin:~# lspci -x
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd Device 3588 (rev 01)
00: 87 1d 88 35 07 05 10 00 01 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 f0 00 00 00
20: 00 f0 00 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 70 01 02 00

01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 01 00 00
lspci -x shows the value at 0x0E: 00:00.0 has a Type 1 header, while 01:00.0 has a Type 0 header.
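The general configuration space layout is shown in the figure below.
The Type 0 configuration space layout is shown in the figure below.
The Type 1 configuration space layout is shown in the figure below.
So regardless of the header type, the BAR0 register is at offset 0x10; the rest of this section focuses on that register.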
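By default, the value at 0x10 is the address Linux assigned to the BAR, used for reads and writes in the PCIe domain. This can be confirmed as follows: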
root@kylin:~# lspci -x -s 01:00.0
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00
So the value at 0x10 is 0xf0000000 (the bytes are stored little-endian); this is the accessible address that Linux mapped based on the device tree.
lspci has already determined the size of BAR0, 512K:
root@kylin:~# lspci -s 01:00.0 -v
01:00.0 Memory controller: Xilinx Corporation Device 7014
	Subsystem: Xilinx Corporation Device 0007
	Flags: fast devsel, IRQ 255
	Memory at f0200000 (32-bit, non-prefetchable) [disabled] [size=512K]
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
lspci: Unable to load libkmod resources: error -2
Next, write all 1s to BAR0 with setpci and read it back (setpci --dumpregs lists the register names setpci knows):
setpci --dumpregs
setpci -s 01:00.0 0x10.L=0xffffffff
root@kylin:~# lspci -s 01:00.0 -x
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00
The register now reads back as 0xfff80000. Its lowest set bit is 0x80000, which is exactly 512K.
The same check can also be done by accessing the registers directly with the io tool:
io -4 0xf0000000 -r -l 64
io -4 -w 0xf0000010 0xffffffff
io -4 0xf0000010 -r
The value read back is again 0xfff80000, which also corresponds to 512K.
In the kernel, this logic is mainly implemented in the following two functions:
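__pci_read_base():
    pci_read_config_dword(dev, pos, &l);        /* save the current BAR value */
    pci_write_config_dword(dev, pos, l | mask); /* mask is ~0 for a normal BAR: write all 1s */
    pci_read_config_dword(dev, pos, &sz);       /* read back the size mask */
pci_size():
    u64 size = mask & maxbase;  /* keep only the address bits */
    size = size & ~(size-1);    /* lowest set bit = BAR size */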
While reading the Jailhouse code I came across the privileged hvc instruction, so let's study it in detail here.
static inline __jh_arg jailhouse_call_arg2(__jh_arg num, __jh_arg arg1,
__jh_arg arg2)
{
register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;
register __jh_arg __arg1 asm(JAILHOUSE_CALL_ARG1) = arg1;
register __jh_arg __arg2 asm(JAILHOUSE_CALL_ARG2) = arg2;
asm volatile(
JAILHOUSE_CALL_INS
: "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
: : "memory", JAILHOUSE_CALL_CLOBBERED);
return num_result;
}
#define JAILHOUSE_CALL_INS "hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT "x0"
#define JAILHOUSE_CALL_ARG1 "x1"
#define JAILHOUSE_CALL_ARG2 "x2"
#define JAILHOUSE_CALL_CLOBBERED "x3"
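GCC extended asm has the general form:
asm asm-qualifiers ( AssemblerTemplate : OutputOperands : InputOperands : Clobbers : GotoLabels)
(GotoLabels is used only by asm goto; the Jailhouse code uses the template, output operands, an empty input list, and clobbers.)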
For the inline assembly above, the pseudocode after macro expansion is:
asm volatile( hvc #0x4a48 : "+r" (x0), "+r" (x1), "+r" (x2) : : "memory", x3);
Colons separate the operand parameters:
Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
volatile disables certain GCC optimizations (for example, discarding the asm or moving code out of loops):
GCC’s optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands and asm goto statements, are implicitly volatile.
'+' means the operand is both read and written:
Means that this operand is both read and written by the instruction.
'r' means the operand must be placed in a general-purpose register:
A register operand is allowed provided that it is in a general register.
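"memory" tells the compiler that the asm may read or write memory beyond the listed operands, so it must not keep memory values cached in registers across the asm or move memory accesses across it. In effect it is a compiler-level memory barrier (it emits no hardware barrier instruction):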
The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.
x1 and x2 are general-purpose registers used to pass arguments, and x0 carries the return value.
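The hvc instruction enters Hyp mode: the CPSR value is saved to the Hyp-mode SPSR and execution branches to the HVC vector (the entry point of the hypervisor-call exception handler). The immediate imm is ignored by the processor, but the exception handler can retrieve it to determine which service is being requested. (This description, like the quoted Arm documentation below, is for AArch32. On AArch64, which the x0-x3 registers imply, hvc traps to EL2 and the handler reads the immediate from the syndrome register ESR_EL2.)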
HVC #imm imm is an expression evaluating to an integer in the range 0-65535. In a processor that implements the Virtualization Extensions, the HVC instruction causes a Hypervisor Call exception. This means that the processor enters Hyp mode, the CPSR value is saved to the Hyp mode SPSR, and execution branches to the HVC vector. imm is ignored by the processor. However, it can be retrieved by the exception handler to determine what service is being requested.
Instruction: hvc #0x4a48
Hypercall code: x0
1. argument: x1
2. argument: x2
Return code: x0
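Here the virtualization instruction hvc is issued with the immediate #0x4a48. The immediate only identifies which service is being called (Jailhouse). Arguments are passed in x1 and x2 and the return value comes back in x0; when x1 or x2 is not needed it is simply omitted. x0 first holds the hypercall code and is then overwritten with the return value; x1 and x2 are the arguments handed to the handler selected by the code in x0. The trailing "memory", JAILHOUSE_CALL_CLOBBERED covers what is not passed as an operand: memory and x3 are declared as clobbered (scratch), and in the variants that take fewer arguments, such as jailhouse_call(), the unused x1 and x2 are added to the clobber list as well.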
While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is a string constant enclosed in double quotes and separated by commas
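In other words, a computation may need extra registers, or the processor may overwrite a register as a side effect of a particular instruction. To let the compiler know, list such registers in the clobber list, which marks them as scratch registers.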
#define JAILHOUSE_HC_DISABLE 0
#define JAILHOUSE_HC_CELL_CREATE 1
#define JAILHOUSE_HC_CELL_START 2
#define JAILHOUSE_HC_CELL_SET_LOADABLE 3
#define JAILHOUSE_HC_CELL_DESTROY 4
#define JAILHOUSE_HC_HYPERVISOR_GET_INFO 5
#define JAILHOUSE_HC_CELL_GET_STATE 6
#define JAILHOUSE_HC_CPU_GET_INFO 7
#define JAILHOUSE_HC_DEBUG_CONSOLE_PUTC 8

/* Hypervisor information type */
#define JAILHOUSE_INFO_MEM_POOL_SIZE 0
#define JAILHOUSE_INFO_MEM_POOL_USED 1
#define JAILHOUSE_INFO_REMAP_POOL_SIZE 2
#define JAILHOUSE_INFO_REMAP_POOL_USED 3
#define JAILHOUSE_INFO_NUM_CELLS 4

/* Hypervisor information type */
#define JAILHOUSE_CPU_INFO_STATE 0
#define JAILHOUSE_CPU_INFO_STAT_BASE 1000

/* CPU state */
#define JAILHOUSE_CPU_RUNNING 0
#define JAILHOUSE_CPU_FAILED 2 /* terminal state */

/* CPU statistics */
#define JAILHOUSE_CPU_STAT_VMEXITS_TOTAL 0
#define JAILHOUSE_CPU_STAT_VMEXITS_MMIO 1
#define JAILHOUSE_CPU_STAT_VMEXITS_MANAGEMENT 2
#define JAILHOUSE_CPU_STAT_VMEXITS_HYPERCALL 3
#define JAILHOUSE_GENERIC_CPU_STATS 4
The hypercall codes above are passed in x0 when hvc #0x4a48 is issued, which is how the virtual machines (cells) are managed. For example:
err = jailhouse_call(JAILHOUSE_HC_DISABLE);
static inline __jh_arg jailhouse_call(__jh_arg num)
{
register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;
asm volatile(
JAILHOUSE_CALL_INS
: "+r" (num_result)
: : "memory", JAILHOUSE_CALL_ARG1, JAILHOUSE_CALL_ARG2,
JAILHOUSE_CALL_CLOBBERED);
return num_result;
}
This completes the analysis of the hvc inline assembly; next we continue tracing the Jailhouse driver source.
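The Jailhouse kernel module (.ko) has already been built; now let's analyze how Jailhouse is loaded, starting from the code.
Initialization consists mainly of the following steps:
The sysfs entries that are created are documented as follows: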
/sys/devices/jailhouse
|- console            - hypervisor console (see [1])
|- enabled            - 1 if Jailhouse is enabled, 0 otherwise
|- mem_pool_size      - number of pages in hypervisor memory pool
|- mem_pool_used      - used pages of hypervisor memory pool
|- remap_pool_size    - number of pages in hypervisor remapping pool
|- remap_pool_used    - used pages of hypervisor remapping pool
`- cells
   |- <id>                     - unique numerical ID
   |  |- name                  - cell name
   |  |- state                 - "running", "running/locked", "shut down", or
   |  |                          "failed"
   |  |- cpus_assigned         - bitmask of assigned logical CPUs
   |  |- cpus_assigned_list    - human readable list of assigned logical CPUs
   |  |- cpus_failed           - bitmask of logical CPUs that caused a failure
   |  |- cpus_failed_list      - human readable list of logical CPUs that
   |  |                          caused a failure
   |  `- statistics
   |     |- cpu<n>
   |     |  |- vmexits_total     - Total number of VM exits on CPU <n>
   |     |  `- vmexits_<reason>  - VM exits due to <reason> on CPU <n>
   |     |- vmexits_total        - Total number of VM exits on all cell CPUs
   |     `- vmexits_<reason>     - VM exits due to <reason> on all cell CPUs
   `- ...
Under the /sys/devices/jailhouse directory:
static struct attribute *jailhouse_sysfs_entries[] = {
&dev_attr_console.attr,
&dev_attr_enabled.attr,
&dev_attr_mem_pool_size.attr,
&dev_attr_mem_pool_used.attr,
&dev_attr_remap_pool_size.attr,
&dev_attr_remap_pool_used.attr,
NULL
};
kobject_create_and_add("cells", &dev->kobj);
The main files are:
The contents of the cells directory are created as follows:
kobject_init_and_add(&cell->kobj, &cell_type, cells_dir, "%d",
cell->id);
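So the cells directory contains only <id> subdirectories. They are created on demand while Jailhouse is enabled or a cell is created, not when the module is loaded.
The contents of each <id> directory are as follows: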
static struct attribute *cell_attrs[] = {
&cell_name_attr.attr,
&cell_state_attr.attr,
&cell_cpus_assigned_attr.attr,
&cell_cpus_assigned_list_attr.attr,
&cell_cpus_failed_attr.attr,
&cell_cpus_failed_list_attr.attr,
NULL,
};
kobject_init_and_add(&cell->stats_kobj, &cell_stats_type,
&cell->kobj, "%s", "statistics");
The main files are:
The statistics directory mainly contains the following:
static struct attribute *cell_stats_attrs[] = {
&vmexits_total_cell_attr.kattr.attr,
&vmexits_mmio_cell_attr.kattr.attr,
&vmexits_management_cell_attr.kattr.attr,
&vmexits_hypercall_cell_attr.kattr.attr,
&vmexits_maintenance_cell_attr.kattr.attr,
&vmexits_virt_irq_cell_attr.kattr.attr,
&vmexits_virt_sgi_cell_attr.kattr.attr,
&vmexits_psci_cell_attr.kattr.attr,
&vmexits_smccc_cell_attr.kattr.attr,
NULL
};
kobject_init_and_add(&cell_cpu->kobj, &cell_cpu_type, &cell->stats_kobj, "cpu%u", cpu);
The main files are:
The cpu%u directory mainly contains the following:
static struct attribute *cpu_stats_attrs[] = {
&vmexits_total_cpu_attr.kattr.attr,
&vmexits_mmio_cpu_attr.kattr.attr,
&vmexits_management_cpu_attr.kattr.attr,
&vmexits_hypercall_cpu_attr.kattr.attr,
&vmexits_maintenance_cpu_attr.kattr.attr,
&vmexits_virt_irq_cpu_attr.kattr.attr,
&vmexits_virt_sgi_cpu_attr.kattr.attr,
&vmexits_psci_cpu_attr.kattr.attr,
&vmexits_smccc_cpu_attr.kattr.attr,
NULL
};
The files are as follows:
The following needs attention here:
asm volatile(
JAILHOUSE_CALL_INS
: "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
:
: "memory", JAILHOUSE_CALL_CLOBBERED);
#define JAILHOUSE_CALL_INS "hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT "x0"
#define JAILHOUSE_CALL_ARG1 "x1"
#define JAILHOUSE_CALL_ARG2 "x2"
#define JAILHOUSE_CALL_CLOBBERED "x3"
The misc device simply creates the jailhouse node under /dev:
static const struct file_operations jailhouse_fops = {
.owner = THIS_MODULE,
.unlocked_ioctl = jailhouse_ioctl,
.compat_ioctl = jailhouse_ioctl,
.llseek = noop_llseek,
.open = jailhouse_console_open,
.release = jailhouse_console_release,
.read = jailhouse_console_read,
};
static struct miscdevice jailhouse_misc_dev = {
.minor = MISC_DYNAMIC_MINOR,
.name = "jailhouse",
.fops = &jailhouse_fops,
};
The ioctls provided are mainly:
JAILHOUSE_ENABLE
JAILHOUSE_DISABLE
JAILHOUSE_CELL_CREATE
JAILHOUSE_CELL_LOAD
JAILHOUSE_CELL_START
JAILHOUSE_CELL_DESTROY
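These ioctls provide the basic operations: enabling and disabling Jailhouse, and creating, loading, starting and destroying cells.
As for the three file operations open, release and read:
What the JAILHOUSE_ENABLE ioctl does will be analyzed later.
A PCI driver is registered; its probe walks the list of claimed devices and prints the PCI devices claimed for use in a non-root cell: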
list_for_each_entry(claimed_dev, &claimed_devs, list) {
if (claimed_dev->dev == dev) {
dev_info(&dev->dev,
"claimed for use in non-root cell\n");
ret = 0;
break;
}
}
On reboot, the driver simply sends the disable command to Jailhouse:
static int jailhouse_shutdown_notify(struct notifier_block *unused1,
unsigned long unused2, void *unused3)
{
int err;
err = jailhouse_cmd_disable();
if (err && err != -EINVAL)
pr_emerg("jailhouse: ordered shutdown failed!\n");
return NOTIFY_DONE;
}
This completes the loading of the Jailhouse driver; next comes the flow of sending the enable ioctl to Jailhouse.
With WSL on Windows, storing files directly on a Windows drive (accessed through its drive letter) turns out to be very slow, for example:
dd if=/dev/zero of=test.img status=progress
4970496 bytes (5.0 MB, 4.7 MiB) copied, 1 s, 5.0 MB/s^C
16147+0 records in
16147+0 records out
8267264 bytes (8.3 MB, 7.9 MiB) copied, 1.66701 s, 5.0 MB/s
That is only about 5 MB/s, which is intolerable. There are two ways to make WSL faster:
dd if=/dev/zero of=wsl_code.img count=10240
This produces a 5 MiB wsl_code.img (10240 blocks × 512 bytes = 5,242,880 bytes). Next, format it as ext4 and grow the filesystem to 100G:
mkfs.ext4 wsl_code.img
resize2fs wsl_code.img 100G
After that, WSL only needs to mount this image at startup:
mount /mnt/k/wsl_code.img ~/wsl_code/
The official WSL documentation recommends the VHDX format. The main steps are:
1. Open PowerShell with administrator privileges.
2. Run the following command:
GET-CimInstance -query "SELECT * from Win32_DiskDrive"
For example:
PS C:\> GET-CimInstance -query "SELECT * from Win32_DiskDrive"

DeviceID           Caption                                Partitions Size          Model
--------           -------                                ---------- ----          -----
\\.\PHYSICALDRIVE1 UNITEK USB3.0 TO SATA SCSI Disk Device 1          1000202273280 UNITEK USB3.0 TO SATA SCSI Disk D...
\\.\PHYSICALDRIVE0 KIOXIA-EXCERIA SSD                     4          500105249280  KIOXIA-EXCERIA SSD
\\.\PHYSICALDRIVE2 Microsoft 虚拟磁盘                     1          966363471360  Microsoft 虚拟磁盘

Find the Windows disk identifier of the virtual disk you created: \\.\PHYSICALDRIVE2
3. Shut down WSL:
wsl --shutdown
4. Attach the virtual disk:
wsl --mount \\.\PHYSICALDRIVE2 --bare
5. Start WSL.
6. Create the filesystem:
mkfs.ext4 /dev/sdc1
7. Mount the filesystem:
mount /dev/sdc1 wsl_code/
8. Set the label:
e2label /dev/sdc1 wsl_code
9. Mount it automatically by adding this line to /etc/fstab (vim /etc/fstab):
LABEL=wsl_code /root/wsl_code ext4 defaults,nofail 0 1
10. Detach the virtual disk from WSL:
wsl --unmount \\.\PHYSICALDRIVE2
That's it: you can now happily use the WSL virtual disk. The same dd test now runs at about 529 MB/s:
root@kylin:~/wsl_code# dd if=/dev/zero of=test.img status=progress
2114826752 bytes (2.1 GB, 2.0 GiB) copied, 4 s, 529 MB/s