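The Transaction Layer is mainly responsible for generating and receiving TLPs. It also handles other tasks such as exchanging flow-control information and supporting software- and hardware-initiated power management. This section focuses on TLPs.
TLPs fall into four main types: Memory, I/O, Configuration, and Message, as shown in the figure below.
A TLP consists of four parts: prefix, header, data payload, and digest, as shown in the figure below.
The detailed TLP format is shown in the figure below.
A TLP prefix contains the following fields.
By default a prefix is only 1 DW long and consists of exactly the fields above. Note that the Fmt field of a prefix is always 100b, and the low 4 bits of the Type field identify the prefix type. For a Local TLP Prefix, the Type field is defined as follows.
Combined, the encodings are as follows.
For an End-End TLP Prefix, the Type field is defined as follows.
Combined, the encodings are as follows.
Here every VendPrefix entry is a vendor-defined TLP prefix.
As with the prefix, the first DW of the header has a fixed format.
The Fmt field, however, is not fixed at 100b as it is for a prefix; it is defined as follows.
The Fmt value indicates how many DWs the header contains (3 or 4) and whether a data payload follows.
The Type field that comes next determines the TLP type, so Fmt and Type together identify the kind of TLP, as shown in the figure below.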
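As a rough illustration of the Fmt/Type split described above, the following C sketch decodes the first header DW from its four bytes in wire order. The struct, enum and function names are hypothetical and not taken from any driver. The bit positions assumed here are Fmt in bits 7:5 and Type in bits 4:0 of byte 0, TD in bit 7 of byte 2, and Length in the low 10 bits of the DW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Fmt encodings (bits 7:5 of header byte 0). */
enum tlp_fmt {
	TLP_FMT_3DW_NO_DATA = 0x0,
	TLP_FMT_4DW_NO_DATA = 0x1,
	TLP_FMT_3DW_DATA    = 0x2,
	TLP_FMT_4DW_DATA    = 0x3,
	TLP_FMT_PREFIX      = 0x4,
};

/* Hypothetical container for the fields of the first header DW. */
struct tlp_dw0 {
	uint8_t  fmt;    /* Fmt[2:0] */
	uint8_t  type;   /* Type[4:0] */
	bool     td;     /* TLP Digest present */
	uint16_t length; /* raw Length[9:0]; 0 encodes 1024 DW */
};

/* Decode the first DW given as bytes 0..3 in wire order. */
static struct tlp_dw0 tlp_decode_dw0(const uint8_t b[4])
{
	struct tlp_dw0 h;

	assert(b != NULL);
	h.fmt    = b[0] >> 5;
	h.type   = b[0] & 0x1f;
	h.td     = (b[2] & 0x80) != 0;
	h.length = (uint16_t)(((b[2] & 0x03) << 8) | b[3]);
	return h;
}

/* Header size in DWs; 0 for a prefix or a reserved Fmt value. */
static int tlp_header_dwords(uint8_t fmt)
{
	if (fmt > TLP_FMT_4DW_DATA)
		return 0;
	return (fmt & 0x1) ? 4 : 3;
}

/* True when the Fmt value says a data payload follows the header. */
static bool tlp_has_data(uint8_t fmt)
{
	return fmt <= TLP_FMT_4DW_DATA && (fmt & 0x2) != 0;
}
```

For example, a first byte of 0x40 decodes to Fmt 010b and Type 00000b, which is a 32-bit Memory Write with a 3 DW header and a data payload.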
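The remaining fields are explained below.
The Length field is described as follows.
For the data payload, note the following.
Take the data 76543210 (byte 7 down to byte 0). For anything other than an AtomicOp Request or AtomicOp Completion, address 0x100 holds byte 0 and 0x107 holds byte 7 (the second point). For an AtomicOp Request or AtomicOp Completion, if the target memory architecture is little-endian, 0x100 holds byte 0 and 0x107 holds byte 7 (little-endian stores the low-order byte at the low address). If it is big-endian, 0x100 holds byte 7 and 0x107 holds byte 0.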
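The two byte orders described above can be sketched in C as follows. This is only an illustration of where each byte lands in memory. The helper names are hypothetical, and the buffer stands in for the addresses 0x100 to 0x107.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian: byte 0 (least significant) goes to the lowest address. */
static void store_le64(uint8_t *dst, uint64_t v)
{
	assert(dst != NULL);
	for (size_t i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/* Big-endian: byte 7 (most significant) goes to the lowest address. */
static void store_be64(uint8_t *dst, uint64_t v)
{
	assert(dst != NULL);
	for (size_t i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * (7 - i)));
}
```

With the value 0x0706050403020100, `store_le64` leaves 0x00 at offset 0 and 0x07 at offset 7, while `store_be64` leaves 0x07 at offset 0 and 0x00 at offset 7.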
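As noted above, when the TD bit is 1 the TLP carries a digest, and the digest field holds the ECRC value. If the receiver does not support ECRC checking, it must ignore the contents of the digest field.
ECRC stands for end-to-end CRC. LCRC is checked and regenerated on each link, so it does not cover a TLP while it is being forwarded inside a device. ECRC travels unchanged from end to end, so an error introduced during forwarding can still be detected.
A PCIe topology mainly consists of the Root Complex (RC), Endpoints (EP), Switches, and PCIe to PCI/PCI-X Bridges, as shown in the figure below.
The RC usually connects directly to the CPU and memory. It is normally the root of the PCIe hierarchy, that is, the root of the bus domain.
An EP is a terminal device, the one that actually implements a function. The following special kinds of EP exist:
A switch presents multiple virtual PCI-to-PCI bridges, as shown in the figure below.
A switch therefore has the following two properties.
A switch contains the following devices.
PCIe enumeration starts by reading a device's basic information from its configuration space, and lspci in user space can parse that information. A customer reported that BAR0 of their Xilinx device is only 512K and asked us to confirm it. The confirmation went as follows.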
For device Functions with Type 0 headers (all types of Endpoints) For device Functions with Type 1 headers (Root Ports, Switches and Bridges)
To find out whether a device uses a Type 0 or a Type 1 header, check the value at offset 0x0e of the configuration space:
For Functions that implement a Type 0 Configuration Space header the encoding 000 0000b must be used. For Functions that implement a Type 1 Configuration Space header the encoding 000 0001b must be used
In other words, looking at the low 7 bits of the byte at 0x0e: if the value is 0x1 the device has a Type 1 header and is usually an RC or a bridge; if it is 0x0 the device has a Type 0 header and is usually an EP.
root@kylin:~# lspci -x
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd Device 3588 (rev 01)
00: 87 1d 88 35 07 05 10 00 01 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 f0 00 00 00
20: 00 f0 00 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 70 01 02 00
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 01 00 00
lspci -x shows the byte at 0x0e: 00:00.0 has a Type 1 header, while 01:00.0 has a Type 0 header.
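The check can also be written as a few lines of C over a raw configuration-space dump. This is a minimal sketch: the macro and function names are hypothetical, and it assumes that bit 7 of the byte at 0x0e is the multi-function flag while the low 7 bits carry the header layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_HEADER_TYPE_OFFSET 0x0e /* Header Type register */
#define CFG_HEADER_LAYOUT_MASK 0x7f /* low 7 bits: header layout */
#define CFG_HEADER_MULTI_FUNC  0x80 /* bit 7: multi-function device */

/* Returns 0 for a Type 0 header, 1 for a Type 1 header. */
static uint8_t cfg_header_layout(const uint8_t *cfg, size_t len)
{
	assert(cfg != NULL && len > CFG_HEADER_TYPE_OFFSET);
	return cfg[CFG_HEADER_TYPE_OFFSET] & CFG_HEADER_LAYOUT_MASK;
}

/* True when the device reports more than one function. */
static bool cfg_is_multifunction(const uint8_t *cfg, size_t len)
{
	assert(cfg != NULL && len > CFG_HEADER_TYPE_OFFSET);
	return (cfg[CFG_HEADER_TYPE_OFFSET] & CFG_HEADER_MULTI_FUNC) != 0;
}
```

Feeding it the first 16 bytes of the two dumps gives 1 for 00:00.0 and 0 for 01:00.0, matching the reading done by eye.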
The common configuration space layout is shown in the figure below.
The Type 0 configuration space layout is shown in the figure below.
The Type 1 configuration space layout is shown in the figure below.
From these layouts, the BAR0 register sits at offset 0x10 for both header types, so the rest of this section focuses on offset 0x10.
By default the value at 0x10 is the mapped address assigned by Linux, used for reads and writes into the PCIe domain. This can be confirmed as follows.
root@kylin:~# lspci -x -s 01:00.0
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00
Here the value at 0x10 is 0xf0000000, an address that Linux makes accessible through the mapping described in the device tree.
lspci has already worked out the size of BAR0 as 512K:
root@kylin:~# lspci -s 01:00.0 -v
01:00.0 Memory controller: Xilinx Corporation Device 7014
	Subsystem: Xilinx Corporation Device 0007
	Flags: fast devsel, IRQ 255
	Memory at f0200000 (32-bit, non-prefetchable) [disabled] [size=512K]
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
lspci: Unable to load libkmod resources: error -2
setpci --dumpregs
setpci -s 01:00.0 0x10.L=0xffffffff
root@kylin:~# lspci -s 01:00.0 -x
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00
After writing all ones, the value read back is 0xfff80000. Its lowest set bit is 0x80000, which is exactly 512K.
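The arithmetic behind that reading can be sketched in C. This is a minimal illustration for a 32-bit memory BAR only, with hypothetical function names. It assumes the low 4 bits of the readback are flag bits rather than address bits, and it shows both the invert-and-add-one form and the lowest-set-bit form.

```c
#include <assert.h>
#include <stdint.h>

/* Size of a 32-bit memory BAR from the value read back after writing all ones. */
static uint64_t bar32_mem_size(uint32_t readback)
{
	uint32_t mask = readback & ~(uint32_t)0xf; /* drop flag bits [3:0] */

	if (mask == 0)
		return 0; /* BAR not implemented */
	return (uint64_t)(uint32_t)~mask + 1;
}

/* Same result via the lowest set bit of the masked readback. */
static uint32_t bar32_lowest_set_bit(uint32_t readback)
{
	uint32_t mask = readback & ~(uint32_t)0xf;

	return mask & ~(mask - 1);
}
```

For the readback 0xfff80000 both helpers return 0x80000, that is 524288 bytes or 512K.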
io -4 0xf0000000 -r -l 64
io -4 -w 0xf0000010 0xffffffff
io -4 0xf0000010 -r
The value read back is again 0xfff80000, which also works out to 512K.
In the kernel, the logic is implemented mainly in the following two functions.
The key code is:
__pci_read_base
	pci_read_config_dword(dev, pos, &l);
	pci_write_config_dword(dev, pos, l | mask);
	pci_read_config_dword(dev, pos, &sz);
pci_size
	u64 size = mask & maxbase;
	size = size & ~(size-1);
While reading the Jailhouse code I came across the hvc instruction, so here is a closer look at it.
static inline __jh_arg jailhouse_call_arg2(__jh_arg num, __jh_arg arg1,
					   __jh_arg arg2)
{
	register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;
	register __jh_arg __arg1 asm(JAILHOUSE_CALL_ARG1) = arg1;
	register __jh_arg __arg2 asm(JAILHOUSE_CALL_ARG2) = arg2;

	asm volatile(
		JAILHOUSE_CALL_INS
		: "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
		: : "memory", JAILHOUSE_CALL_CLOBBERED);
	return num_result;
}
#define JAILHOUSE_CALL_INS		"hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT	"x0"
#define JAILHOUSE_CALL_ARG1		"x1"
#define JAILHOUSE_CALL_ARG2		"x2"
#define JAILHOUSE_CALL_CLOBBERED	"x3"
asm asm-qualifiers ( AssemblerTemplate : OutputOperands : InputOperands : Clobbers : GotoLabels)
For the inline assembly above, the pseudocode after macro expansion is:
asm volatile( hvc #0x4a48 : "+r" (x0), "+r" (x1), "+r" (x2) : : "memory", x3);
Colons delimit the operand parameters:
Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
The volatile qualifier disables GCC optimizations such as discarding the asm statement or moving it out of loops:
GCC’s optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands and asm goto statements, are implicitly volatile.
'+' means the operand is both read and written:
Means that this operand is both read and written by the instruction.
'r' means the operand may be placed in a general-purpose register:
A register operand is allowed provided that it is in a general register.
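'memory' tells the compiler that the asm may read or write memory, so values must not be kept cached in registers across it. It acts as a compiler-level memory barrier: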
The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.
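x1 and x2 are general-purpose registers used to pass arguments; x0 carries the return value.
The hvc instruction enters Hyp mode: the CPSR value is saved to the Hyp-mode SPSR and execution branches to the HVC vector (the entry address of the Hypervisor Call exception handler). The immediate imm is ignored by the processor, but the handler can retrieve it to determine which service is being requested.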
HVC #imm imm is an expression evaluating to an integer in the range 0-65535. In a processor that implements the Virtualization Extensions, the HVC instruction causes a Hypervisor Call exception. This means that the processor enters Hyp mode, the CPSR value is saved to the Hyp mode SPSR, and execution branches to the HVC vector. imm is ignored by the processor. However, it can be retrieved by the exception handler to determine what service is being requested.
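Instruction: hvc #0x4a48
Hypercall code: x0
1. argument: x1
2. argument: x2
Return code: x0
Here the hvc virtualization instruction is issued with the immediate #0x4a48 (ASCII "JH"), which only identifies the service (Jailhouse). Arguments go in x1 and x2 and the result comes back in x0; x1 and x2 are simply left out when a call takes fewer arguments. x0 first carries the hypercall code into the hypervisor and then carries the return value back out, while x1 and x2 are the arguments passed to the handler selected by that code. The trailing "memory", JAILHOUSE_CALL_CLOBBERED clobber list tells the compiler that memory and x3 may be modified by the call. In the variants that take fewer arguments, the unused argument registers x1 and x2 are listed as clobbers as well, so together with x3 they are treated as scratch registers.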
While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is a string constant enclosed in double quotes and separated by commas
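In other words, a computation may need extra registers, or the processor may overwrite registers as a side effect of a particular instruction. Listing those registers in the clobber list tells the compiler to treat them as scratch registers.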
#define JAILHOUSE_HC_DISABLE			0
#define JAILHOUSE_HC_CELL_CREATE		1
#define JAILHOUSE_HC_CELL_START			2
#define JAILHOUSE_HC_CELL_SET_LOADABLE		3
#define JAILHOUSE_HC_CELL_DESTROY		4
#define JAILHOUSE_HC_HYPERVISOR_GET_INFO	5
#define JAILHOUSE_HC_CELL_GET_STATE		6
#define JAILHOUSE_HC_CPU_GET_INFO		7
#define JAILHOUSE_HC_DEBUG_CONSOLE_PUTC		8
/* Hypervisor information type */
#define JAILHOUSE_INFO_MEM_POOL_SIZE		0
#define JAILHOUSE_INFO_MEM_POOL_USED		1
#define JAILHOUSE_INFO_REMAP_POOL_SIZE		2
#define JAILHOUSE_INFO_REMAP_POOL_USED		3
#define JAILHOUSE_INFO_NUM_CELLS		4
/* Hypervisor information type */
#define JAILHOUSE_CPU_INFO_STATE		0
#define JAILHOUSE_CPU_INFO_STAT_BASE		1000
/* CPU state */
#define JAILHOUSE_CPU_RUNNING			0
#define JAILHOUSE_CPU_FAILED			2 /* terminal state */
/* CPU statistics */
#define JAILHOUSE_CPU_STAT_VMEXITS_TOTAL	0
#define JAILHOUSE_CPU_STAT_VMEXITS_MMIO		1
#define JAILHOUSE_CPU_STAT_VMEXITS_MANAGEMENT	2
#define JAILHOUSE_CPU_STAT_VMEXITS_HYPERCALL	3
#define JAILHOUSE_GENERIC_CPU_STATS		4
The codes above are passed to the hypervisor call in x0; issuing hvc #0x4a48 with them is how the virtual machines (cells) are managed. For example:
err = jailhouse_call(JAILHOUSE_HC_DISABLE);

static inline __jh_arg jailhouse_call(__jh_arg num)
{
	register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;

	asm volatile(
		JAILHOUSE_CALL_INS
		: "+r" (num_result)
		: : "memory", JAILHOUSE_CALL_ARG1, JAILHOUSE_CALL_ARG2,
		    JAILHOUSE_CALL_CLOBBERED);
	return num_result;
}
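The register convention (hypercall code in x0, arguments in x1 and x2, result returned in x0) can be modelled in portable C without executing a real hvc. The sketch below is a toy stand-in and not Jailhouse code. `mock_hvc`, `mock_call_arg1`, the fixed cell count and the error values it returns are all assumptions made for illustration; only the three JAILHOUSE_* constants are taken from the list above.

```c
#include <assert.h>
#include <errno.h>

#define JAILHOUSE_HC_DISABLE			0
#define JAILHOUSE_HC_HYPERVISOR_GET_INFO	5
#define JAILHOUSE_INFO_NUM_CELLS		4

/* Hypothetical state of the pretend hypervisor: only the root cell exists. */
#define MOCK_NUM_CELLS	1

/*
 * Hypothetical handler: x0 selects the service, x1/x2 are its arguments,
 * and the return value is what the caller would find in x0 afterwards.
 */
static long mock_hvc(long x0, long x1, long x2)
{
	(void)x2; /* unused by the services modelled here */

	switch (x0) {
	case JAILHOUSE_HC_DISABLE:
		return 0;
	case JAILHOUSE_HC_HYPERVISOR_GET_INFO:
		if (x1 == JAILHOUSE_INFO_NUM_CELLS)
			return MOCK_NUM_CELLS;
		return -EINVAL; /* assumed error for an unsupported info type */
	default:
		return -ENOSYS; /* assumed error for an unknown hypercall code */
	}
}

/* One-argument wrapper: x2 is not part of the call and is passed as 0. */
static long mock_call_arg1(long num, long arg1)
{
	long num_result = num; /* plays the role of x0: code in, result out */

	assert(num >= 0);
	num_result = mock_hvc(num_result, arg1, 0);
	return num_result;
}
```

The point of the model is that a single variable plays both roles of x0, which is why the real wrappers declare it with the "+r" constraint.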
That covers the hvc inline assembly. Next, we continue tracing the Jailhouse driver source.
The Jailhouse .ko has already been built, so we now walk through the code to see how the module is loaded.
Initialization consists mainly of the following steps.
The sysfs entries it creates are documented as follows:
/sys/devices/jailhouse
|- console                  - hypervisor console (see [1])
|- enabled                  - 1 if Jailhouse is enabled, 0 otherwise
|- mem_pool_size            - number of pages in hypervisor memory pool
|- mem_pool_used            - used pages of hypervisor memory pool
|- remap_pool_size          - number of pages in hypervisor remapping pool
|- remap_pool_used          - used pages of hypervisor remapping pool
`- cells
   |- <id>                  - unique numerical ID
   |  |- name               - cell name
   |  |- state              - "running", "running/locked", "shut down", or
   |  |                       "failed"
   |  |- cpus_assigned      - bitmask of assigned logical CPUs
   |  |- cpus_assigned_list - human readable list of assigned logical CPUs
   |  |- cpus_failed        - bitmask of logical CPUs that caused a failure
   |  |- cpus_failed_list   - human readable list of logical CPUs that
   |  |                       caused a failure
   |  `- statistics
   |     |- cpu<n>
   |     |  |- vmexits_total    - Total number of VM exits on CPU <n>
   |     |  `- vmexits_<reason> - VM exits due to <reason> on CPU <n>
   |     |- vmexits_total   - Total number of VM exits on all cell CPUs
   |     `- vmexits_<reason> - VM exits due to <reason> on all cell CPUs
   `- ...
Under the directory /sys/devices/jailhouse:
static struct attribute *jailhouse_sysfs_entries[] = {
	&dev_attr_console.attr,
	&dev_attr_enabled.attr,
	&dev_attr_mem_pool_size.attr,
	&dev_attr_mem_pool_used.attr,
	&dev_attr_remap_pool_size.attr,
	&dev_attr_remap_pool_used.attr,
	NULL
};
kobject_create_and_add("cells", &dev->kobj);
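The resulting files are:
The contents of the cells directory are created as follows:
kobject_init_and_add(&cell->kobj, &cell_type, cells_dir, "%d",
		     cell->id);
So the cells directory contains only <id> directories. They are created on demand during enable and cell create; nothing is created at module load time.
The contents of an <id> directory are as follows: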
static struct attribute *cell_attrs[] = {
	&cell_name_attr.attr,
	&cell_state_attr.attr,
	&cell_cpus_assigned_attr.attr,
	&cell_cpus_assigned_list_attr.attr,
	&cell_cpus_failed_attr.attr,
	&cell_cpus_failed_list_attr.attr,
	NULL,
};
kobject_init_and_add(&cell->stats_kobj, &cell_stats_type,
		     &cell->kobj, "%s", "statistics");
The resulting files are:
The statistics directory contains the following:
static struct attribute *cell_stats_attrs[] = {
	&vmexits_total_cell_attr.kattr.attr,
	&vmexits_mmio_cell_attr.kattr.attr,
	&vmexits_management_cell_attr.kattr.attr,
	&vmexits_hypercall_cell_attr.kattr.attr,
	&vmexits_maintenance_cell_attr.kattr.attr,
	&vmexits_virt_irq_cell_attr.kattr.attr,
	&vmexits_virt_sgi_cell_attr.kattr.attr,
	&vmexits_psci_cell_attr.kattr.attr,
	&vmexits_smccc_cell_attr.kattr.attr,
	NULL
};
kobject_init_and_add(&cell_cpu->kobj, &cell_cpu_type, &cell->stats_kobj, "cpu%u", cpu);
The resulting files are:
The contents of a cpu%u directory are as follows:
static struct attribute *cpu_stats_attrs[] = {
	&vmexits_total_cpu_attr.kattr.attr,
	&vmexits_mmio_cpu_attr.kattr.attr,
	&vmexits_management_cpu_attr.kattr.attr,
	&vmexits_hypercall_cpu_attr.kattr.attr,
	&vmexits_maintenance_cpu_attr.kattr.attr,
	&vmexits_virt_irq_cpu_attr.kattr.attr,
	&vmexits_virt_sgi_cpu_attr.kattr.attr,
	&vmexits_psci_cpu_attr.kattr.attr,
	&vmexits_smccc_cpu_attr.kattr.attr,
	NULL
};
The resulting files are:
The part worth noting is this:
	asm volatile(
		JAILHOUSE_CALL_INS
		: "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
		:
		: "memory", JAILHOUSE_CALL_CLOBBERED);
#define JAILHOUSE_CALL_INS		"hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT	"x0"
#define JAILHOUSE_CALL_ARG1		"x1"
#define JAILHOUSE_CALL_ARG2		"x2"
#define JAILHOUSE_CALL_CLOBBERED	"x3"
The misc device simply creates the jailhouse node under /dev:
static const struct file_operations jailhouse_fops = {
	.owner = THIS_MODULE,
	.unlocked_ioctl = jailhouse_ioctl,
	.compat_ioctl = jailhouse_ioctl,
	.llseek = noop_llseek,
	.open = jailhouse_console_open,
	.release = jailhouse_console_release,
	.read = jailhouse_console_read,
};
static struct miscdevice jailhouse_misc_dev = {
	.minor = MISC_DYNAMIC_MINOR,
	.name = "jailhouse",
	.fops = &jailhouse_fops,
};
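The ioctl interface provides the following commands:
JAILHOUSE_ENABLE
JAILHOUSE_DISABLE
JAILHOUSE_CELL_CREATE
JAILHOUSE_CELL_LOAD
JAILHOUSE_CELL_START
JAILHOUSE_CELL_DESTROY
These ioctls provide Jailhouse's basic operations: enable, disable, and cell create, load, start and destroy.
The open, release and read file operations are the jailhouse_console_* handlers, so they serve the hypervisor console.
What the JAILHOUSE_ENABLE ioctl actually does is analyzed later.
A PCI driver is registered; its probe function walks the list of claimed devices and logs each PCI device that has been claimed for use in a non-root cell.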
	list_for_each_entry(claimed_dev, &claimed_devs, list) {
		if (claimed_dev->dev == dev) {
			dev_info(&dev->dev,
				 "claimed for use in non-root cell\n");
			ret = 0;
			break;
		}
	}
What the JAILHOUSE_ENABLE ioctl actually does is analyzed later.
The reboot notifier simply sends the disable command to Jailhouse when the system reboots:
static int jailhouse_shutdown_notify(struct notifier_block *unused1,
				     unsigned long unused2, void *unused3)
{
	int err;

	err = jailhouse_cmd_disable();
	if (err && err != -EINVAL)
		pr_emerg("jailhouse: ordered shutdown failed!\n");
	return NOTIFY_DONE;
}
That completes the loading of the Jailhouse driver. The next step is the flow of sending the enable ioctl to Jailhouse.