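TLPs can be routed in three ways: address routing, ID routing, and implicit routing. Address and ID routing are the common methods. Implicit routing is used only for Message Request TLPs. This section describes address routing and ID routing.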
Address routing is used for Memory and I/O requests. With a 64-bit address the header is 4 DW long; with a 32-bit address it is 3 DW, as shown below:

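With address routing, the AT field in the first DW of the header is meaningful only for Memory Read/Write and AtomicOp requests. For all other TLP types the AT field is reserved, as shown below: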

That is: 00b = untranslated address, 01b = translation request, 10b = translated address, 11b = reserved.
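If the address has been translated, the translated address is what the address field carries. That field occupies header bytes 8-15 (bytes 8-11 for a 3 DW header), as shown below: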

Note that for addresses below 4 GB, requests must use the 32-bit (3 DW) format.
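The following minimal C sketch shows the 3 DW / 4 DW rule and where the address field goes. The helper names `mem_req_header_dw`, `mem_req_fmt`, and `mem_req_put_addr` are hypothetical and do not come from any driver. The bit positions follow the PCIe header layout: Fmt bit 1 means a data payload is present, Fmt bit 0 means a 4 DW header, and the address is stored most significant byte first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Header size for a memory request: addresses below 4 GB must use 3 DW. */
static unsigned mem_req_header_dw(uint64_t addr)
{
    return addr > 0xFFFFFFFFull ? 4u : 3u;
}

/* Fmt[2:0]: bit 1 = data payload present, bit 0 = 4 DW header.
 * 000b 3DW/no data, 001b 4DW/no data, 010b 3DW/data, 011b 4DW/data. */
static uint8_t mem_req_fmt(uint64_t addr, bool has_data)
{
    return (uint8_t)((has_data ? 0x2u : 0x0u) |
                     (mem_req_header_dw(addr) == 4 ? 0x1u : 0x0u));
}

/* Writes the address field, most significant byte first, into bytes 8-11
 * (3 DW) or 8-15 (4 DW). Addresses are DW aligned, so the two lowest bits
 * of the last byte are not address bits and are cleared here. */
static unsigned mem_req_put_addr(uint8_t hdr[16], uint64_t addr)
{
    unsigned dw = mem_req_header_dw(addr);
    unsigned nbytes = (dw == 4) ? 8u : 4u;
    for (unsigned i = 0; i < nbytes; i++)
        hdr[8 + i] = (uint8_t)(addr >> (8 * (nbytes - 1 - i)));
    hdr[8 + nbytes - 1] &= 0xFC;
    return dw;
}
```

For example, `mem_req_put_addr(h, 0xf0200004)` fills bytes 8-11 with `f0 20 00 04` and returns 3, because the address is below 4 GB.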
ID routing is typically used for Configuration Requests, ID-Routed Messages, and Completions. The specification also defines Vendor_Defined Messages that use ID routing.
ID routing identifies the destination of a TLP by its BDF (Bus, Device, and Function numbers).
For ARI and non-ARI devices, the ID field in the header is interpreted as follows:

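The following C sketch shows the two interpretations of the 16-bit ID. A non-ARI ID is Bus[15:8], Device[7:3], Function[2:0]. An ARI ID is Bus[15:8], Function[7:0]. The names `rid_non_ari`, `rid_ari`, and `rid_to_bytes` are illustrative and are not an existing API.

```c
#include <stdint.h>

/* Non-ARI: Bus[15:8] Device[7:3] Function[2:0] */
static uint16_t rid_non_ari(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1F) << 3) | (fn & 0x7));
}

/* ARI: Bus[15:8] Function[7:0]; there is no separate device number. */
static uint16_t rid_ari(uint8_t bus, uint8_t fn)
{
    return (uint16_t)((bus << 8) | fn);
}

/* Splits an ID into the two header bytes it occupies (first byte = bus). */
static void rid_to_bytes(uint16_t rid, uint8_t *hi, uint8_t *lo)
{
    *hi = (uint8_t)(rid >> 8);
    *lo = (uint8_t)rid;
}
```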
With a 4 DW TLP header, the ID-routing header layout is as follows (the contents of the remaining header bytes depend on the TLP type):

With a 3 DW TLP header, the ID-routing header layout is as follows:

In both cases, bytes 8 and 9 carry the BDF that ID routing uses to reach the destination.
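Byte 7 of the header holds two fields: Last DW BE (bits [7:4]) and 1st DW BE (bits [3:0]). Their allowed values depend on the Length[9:0] field in bytes 2 and 3. If Length is 1 DW, Last DW BE must be 0000b. If Length is greater than 1 DW, neither 1st DW BE nor Last DW BE may be 0000b.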
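The following C sketch expresses the two byte-enable rules above as a check. `be_valid` is a hypothetical helper. It does not check the other byte-enable restrictions that the specification imposes.

```c
#include <stdbool.h>
#include <stdint.h>

/* byte7: header byte 7, Last DW BE in [7:4], 1st DW BE in [3:0].
 * length_dw: decoded request length in DW (1..1024). */
static bool be_valid(uint8_t byte7, unsigned length_dw)
{
    uint8_t last_be  = byte7 >> 4;
    uint8_t first_be = byte7 & 0xF;

    if (length_dw == 1)
        return last_be == 0;                /* 1 DW: Last DW BE must be 0000b */
    return first_be != 0 && last_be != 0;   /* > 1 DW: neither may be 0000b */
}
```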
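The Transaction Layer mainly generates and receives TLPs. It also exchanges flow-control information and supports software- and hardware-initiated power management. This section focuses on TLPs.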

There are four main transaction types: Memory, I/O, Configuration, and Message, as shown below:

A TLP consists of up to four parts: prefix, header, data payload, and digest, as shown below:

The detailed TLP format is shown below:

A TLP prefix contains the following fields:

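Each prefix is 1 DW long and consists of the fields above. The Fmt field of a prefix is always 100b. Bit 4 of the Type field gives the prefix class (0 = Local, 1 = End-End). The Type field values for Local TLP Prefixes are: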

Combined, this gives:
The Type field values for End-End TLP Prefixes are:

Combined, this gives:
The VendPrefix entries are vendor-defined TLP prefixes.
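The following C sketch decodes byte 0 of a prefix to make the encoding concrete. The subtype names are those defined in PCIe 3.x/4.0-era specifications (MR-IOV, ExtTPH, PASID, VendPrefix*). Newer revisions may define more, and the sketch reports those as "reserved/other". `prefix_name` is a hypothetical helper.

```c
#include <stdint.h>

/* byte0[7:5] = Fmt (must be 100b), byte0[4] = Type[4] (0 Local, 1 End-End),
 * byte0[3:0] = Type[3:0] (subtype). */
static const char *prefix_name(uint8_t byte0)
{
    if ((byte0 >> 5) != 0x4)
        return "not a prefix";

    unsigned end_end = (byte0 >> 4) & 1u;
    unsigned sub = byte0 & 0xFu;

    if (!end_end) {
        switch (sub) {
        case 0x0: return "Local: MR-IOV";
        case 0xE: return "Local: VendPrefixL0";
        case 0xF: return "Local: VendPrefixL1";
        default:  return "Local: reserved/other";
        }
    }
    switch (sub) {
    case 0x0: return "End-End: ExtTPH";
    case 0x1: return "End-End: PASID";
    case 0xE: return "End-End: VendPrefixE0";
    case 0xF: return "End-End: VendPrefixE1";
    default:  return "End-End: reserved/other";
    }
}
```

For example, byte 0 = 0x91 (100 1 0001b) decodes as the End-End PASID prefix.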
Like the prefix, the first DW of the header has a fixed layout:

Its Fmt field, however, is not fixed at 100b as in a prefix. It is defined as follows:

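Fmt encodes the header size (3 or 4 DW) and whether a data payload follows. The Type field after it selects the transaction type. Together, Fmt and Type determine the TLP type, as shown below: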


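The following rough C decoder covers the Fmt/Type combinations of the most common TLPs: memory, I/O, configuration, message, and completion. AtomicOps and other encodings fall into "other/reserved". The helper names are hypothetical.

```c
#include <stdint.h>

/* byte0 = first header byte: Fmt = byte0[7:5], Type = byte0[4:0]. */
static const char *tlp_kind(uint8_t byte0)
{
    uint8_t fmt = byte0 >> 5, type = byte0 & 0x1F;

    if (fmt == 0x4)
        return "TLP prefix";
    if ((type & 0x18) == 0x10)               /* Type 10rrr: message, rrr = routing */
        return (fmt & 0x2) ? "MsgD" : "Msg";
    switch (type) {
    case 0x00: return (fmt & 0x2) ? "MWr" : "MRd";
    case 0x01: return "MRdLk";
    case 0x02: return (fmt & 0x2) ? "IOWr" : "IORd";
    case 0x04: return (fmt & 0x2) ? "CfgWr0" : "CfgRd0";
    case 0x05: return (fmt & 0x2) ? "CfgWr1" : "CfgRd1";
    case 0x0A: return (fmt & 0x2) ? "CplD" : "Cpl";
    default:   return "other/reserved";
    }
}

/* Valid for non-prefix bytes only: Fmt bit 0 = 4 DW, Fmt bit 1 = data present. */
static unsigned tlp_header_dw(uint8_t byte0) { return ((byte0 >> 5) & 1u) ? 4u : 3u; }
static int tlp_has_data(uint8_t byte0) { return ((byte0 >> 5) & 2u) != 0; }
```

With this decoder, 0x60 (Fmt 011b, Type 00000b) decodes as a 4 DW memory write with data, and 0x4A decodes as CplD.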
Some further fields are explained below:
The Length field works as follows:

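Length[9:0] spans bytes 2 and 3 and counts DWs, with the encoding 0 meaning 1024 DW. The following C sketch reads and writes the field with hypothetical helpers and leaves the other bits of byte 2 untouched.

```c
#include <stdint.h>

/* Length[9:8] = byte 2 bits [1:0], Length[7:0] = byte 3. Unit: DW (4 bytes). */
static unsigned tlp_length_dw(const uint8_t hdr[4])
{
    unsigned len = ((unsigned)(hdr[2] & 0x3) << 8) | hdr[3];
    return len ? len : 1024u;               /* 0 encodes 1024 DW (4 KB) */
}

/* dw must be in 1..1024; 1024 is encoded as 0. Other byte-2 bits are kept. */
static void tlp_set_length(uint8_t hdr[4], unsigned dw)
{
    unsigned enc = dw & 0x3FFu;
    hdr[2] = (uint8_t)((hdr[2] & ~0x3u) | (enc >> 8));
    hdr[3] = (uint8_t)enc;
}
```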
For the data payload, note the following:

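Take the data bytes 7 6 5 4 3 2 1 0 (byte 0 is the least significant) at address 0x100.

For TLPs other than AtomicOp Requests and AtomicOp Completions, 0x100 holds byte 0 and 0x107 holds byte 7 (the second point above).

For AtomicOp Requests and Completions, the layout follows the target memory:
- If the target memory is little-endian, 0x100 holds byte 0 and 0x107 holds byte 7, because little-endian stores the least significant byte at the lowest address.
- If it is big-endian, 0x100 holds byte 7 and 0x107 holds byte 0.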
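The byte placement in this example can be reproduced with two hypothetical C helpers. They store the value 0x0706050403020100 (byte k has value k) at offsets 0..7, standing for addresses 0x100..0x107. One helper uses little-endian order and the other big-endian order.

```c
#include <stdint.h>

/* Little-endian: least significant byte at the lowest address. */
static void store_le64(uint8_t mem[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(v >> (8 * i));
}

/* Big-endian: most significant byte at the lowest address. */
static void store_be64(uint8_t mem[8], uint64_t v)
{
    for (int i = 0; i < 8; i++)
        mem[i] = (uint8_t)(v >> (8 * (7 - i)));
}
```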
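As noted above, if the TD bit is 1 the TLP carries a digest, and the digest field holds the ECRC value. A receiver that does not support ECRC checking must ignore the digest contents.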
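ECRC stands for end-to-end CRC. LCRC protects only a single link and is checked and regenerated at every hop. ECRC, by contrast, is generated by the originator and checked at the final destination. Corruption that occurs while a switch forwards a TLP internally is therefore invisible to LCRC, but ECRC can detect it.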
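The difference between per-hop LCRC and end-to-end ECRC can be illustrated with a toy C model. It uses a generic CRC-32 with polynomial 0x04C11DB7 (the polynomial PCIe uses), in the common bit-reflected software form. It deliberately omits the specification's exact bit ordering and the handling of variant header bits, so it is not a conformant ECRC implementation. The link/switch model is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Generic reflected CRC-32 (0xEDB88320 is 0x04C11DB7 bit-reversed). */
static uint32_t crc32_generic(const uint8_t *p, size_t n)
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        c ^= p[i];
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
    }
    return ~c;
}

/* Hypothetical model of one TLP crossing link A -> switch -> link B.
 * Returns 1 if the end receiver's ECRC check fails (error detected). */
static int ecrc_detects(const uint8_t *tlp, size_t n, int corrupt_in_switch,
                        int *lcrc_b_ok)
{
    uint8_t buf[64];
    if (n > sizeof buf)
        return -1;
    uint32_t ecrc = crc32_generic(tlp, n);    /* set once by the originator */

    memcpy(buf, tlp, n);                      /* link A's LCRC checked and stripped */
    if (corrupt_in_switch)
        buf[0] ^= 0x01;                       /* bit flip inside the switch */
    uint32_t lcrc_b = crc32_generic(buf, n);  /* switch regenerates LCRC for link B */

    /* The link-B receiver's LCRC check passes, because the new LCRC was
     * computed over the already-corrupted data. */
    *lcrc_b_ok = (crc32_generic(buf, n) == lcrc_b);
    return crc32_generic(buf, n) != ecrc;     /* end-to-end check */
}
```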
A PCIe fabric is built mainly from a Root Complex (RC), Endpoints (EP), Switches, and PCIe-to-PCI/PCI-X bridges, as shown below:

The RC usually connects directly to the CPU and memory. It is the root of the PCIe hierarchy, that is, the root of the bus numbering.
An EP is a leaf device that implements the actual function. There are the following special kinds of EP:
A switch provides multiple virtual PCI-to-PCI bridges, as shown below:

A switch therefore has the following two properties:
A switch contains the following devices:
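PCIe enumeration starts by reading basic device information from configuration space, which the user-space tool lspci can decode. A customer reported that BAR0 of their Xilinx device was only 512 KB and asked us to confirm this, so we checked as follows.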
For device Functions with Type 0 headers (all types of Endpoints)
For device Functions with Type 1 headers (Root Ports, Switches and Bridges)
Whether a device has a Type 0 or Type 1 header can be read from the Header Type register at offset 0x0E of configuration space, as shown below:

For Functions that implement a Type 0 Configuration Space header the encoding 000 0000b must be used. For Functions that implement a Type 1 Configuration Space header the encoding 000 0001b must be used
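That is, if bits [6:0] of the byte at 0x0E are 0x01, the function has a Type 1 header and is typically a Root Port or bridge. If they are 0x00, it has a Type 0 header and is typically an EP. Bit 7 of this byte only indicates a multi-function device.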
root@kylin:~# lspci -x
00:00.0 PCI bridge: Fuzhou Rockchip Electronics Co., Ltd Device 3588 (rev 01)
00: 87 1d 88 35 07 05 10 00 01 00 04 06 00 00 01 00
10: 00 00 00 00 00 00 00 00 00 01 ff 00 f0 00 00 00
20: 00 f0 00 f0 f1 ff 01 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 70 01 02 00

01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 00 01 00 00
lspci -x shows the byte at 0x0E, the 15th byte of row 00:. It is 01 for 00:00.0, which therefore has a Type 1 header, and 00 for 01:00.0, which has a Type 0 header.
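The Header Type check can be sketched in C by copying the first row of an `lspci -x` dump into a byte array. `cfg16`, `header_layout`, and `is_multifunction` are hypothetical helpers. Configuration space is little-endian, so the bytes `87 1d` read as vendor ID 0x1d87.

```c
#include <stdint.h>

/* 16-bit little-endian field from a config-space dump (e.g. vendor/device ID). */
static unsigned cfg16(const uint8_t *cfg, unsigned off)
{
    return (unsigned)cfg[off] | ((unsigned)cfg[off + 1] << 8);
}

/* Header Type register at 0x0E: bits [6:0] = layout (0 = Type 0,
 * 1 = Type 1, 2 = Type 2/CardBus), bit 7 = multi-function device. */
static unsigned header_layout(const uint8_t *cfg)    { return cfg[0x0E] & 0x7Fu; }
static unsigned is_multifunction(const uint8_t *cfg) { return (cfg[0x0E] >> 7) & 1u; }
```

With the rows shown above, the Rockchip bridge yields layout 1 with vendor/device 0x1d87/0x3588. The Xilinx device yields layout 0 with 0x10ee/0x7014.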
The common configuration space layout is shown below:
The Type 0 configuration space layout is shown below:
The Type 1 configuration space layout is shown below:

These layouts show that BAR0 is at offset 0x10 for both header types, so we focus on the value at 0x10.
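The value at 0x10 is normally the address that Linux assigned to the BAR, used for memory reads and writes in the PCIe domain. This can be confirmed as follows: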
root@kylin:~# lspci -x -s 01:00.0
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00
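So the value at 0x10 is 0xf0000000. Configuration space is little-endian, so the bytes 00 00 00 f0 read as 0xf0000000. This address was mapped in Linux through the device tree and is accessible.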
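The low bits of a BAR are flags rather than address bits. The following C sketch decodes them using the standard BAR bit layout: bit 0 selects I/O or memory, bits [2:1] give the memory type (10b = 64-bit), and bit 3 marks prefetchable memory. The helper names are hypothetical.

```c
#include <stdint.h>

struct bar_info {
    int is_io;          /* bit 0: 1 = I/O space, 0 = memory space */
    int is_64bit;       /* memory BAR bits [2:1] == 10b */
    int prefetchable;   /* memory BAR bit 3 */
    uint32_t base;      /* low 32 bits of the base address */
};

static struct bar_info decode_bar(uint32_t v)
{
    struct bar_info b = {0};
    b.is_io = (int)(v & 1u);
    if (b.is_io) {
        b.base = v & ~0x3u;                  /* I/O BAR: bits [1:0] are flags */
    } else {
        b.is_64bit = ((v >> 1) & 0x3u) == 0x2u;
        b.prefetchable = (int)((v >> 3) & 1u);
        b.base = v & ~0xFu;                  /* memory BAR: bits [3:0] are flags */
    }
    return b;
}

/* Little-endian 32-bit read from a config-space dump. */
static uint32_t cfg32(const uint8_t *cfg, unsigned off)
{
    return (uint32_t)cfg[off] | (uint32_t)cfg[off + 1] << 8 |
           (uint32_t)cfg[off + 2] << 16 | (uint32_t)cfg[off + 3] << 24;
}
```

Under this layout, 0xf0000000 decodes as a 32-bit, non-prefetchable memory BAR. This agrees with what `lspci -v` reports for this device.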
lspci has already reported the size of BAR0 as 512K:
root@kylin:~# lspci -s 01:00.0 -v
01:00.0 Memory controller: Xilinx Corporation Device 7014
	Subsystem: Xilinx Corporation Device 0007
	Flags: fast devsel, IRQ 255
	Memory at f0200000 (32-bit, non-prefetchable) [disabled] [size=512K]
	Capabilities: [80] Power Management version 3
	Capabilities: [90] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [c0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
lspci: Unable to load libkmod resources: error -2
setpci --dumpregs
setpci -s 01:00.0 0x10.L=0xffffffff
root@kylin:~# lspci -s 01:00.0 -x
01:00.0 Memory controller: Xilinx Corporation Device 7014
00: ee 10 14 70 00 00 10 00 00 00 80 05 00 00 00 00
10: 00 00 f8 ff 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 ee 10 07 00
30: 00 00 00 00 80 00 00 00 00 00 00 00 ff 01 00 00
After writing all 1s, BAR0 reads back as 0xfff80000 (bytes 00 00 f8 ff). Its lowest set bit is 0x80000, which is exactly 512 KB.
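The size arithmetic can be written as a hypothetical C helper for a 32-bit memory BAR. The flag bits [3:0] are masked off first. The size is the lowest remaining set bit. For a well-formed BAR, where all higher bits read back as 1, that equals the two's complement `~masked + 1`.

```c
#include <stdint.h>

static uint32_t mem_bar_size32(uint32_t readback)
{
    uint32_t masked = readback & ~0xFu;   /* drop memory BAR flag bits */
    return masked & (~masked + 1u);       /* isolate the lowest set bit; 0 if none */
}
```

For 0xfff80000, both forms give 0x80000 = 524288 bytes = 512 KB.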
The same check can be done with the io tool by accessing offset 0x10 from 0xf0000000 directly:
io -4 0xf0000000 -r -l 64
io -4 -w 0xf0000010 0xffffffff
io -4 0xf0000010 -r
This also reads back 0xfff80000, which again works out to 512 KB.
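In the kernel, BAR sizing is implemented mainly in two functions, __pci_read_base() and pci_size():
__pci_read_base():
	pci_read_config_dword(dev, pos, &l);         /* save the original value */
	pci_write_config_dword(dev, pos, l | mask);  /* write all 1s (mask = ~0) */
	pci_read_config_dword(dev, pos, &sz);        /* read back the size mask */
	pci_write_config_dword(dev, pos, l);         /* restore the original value */
pci_size():
	u64 size = mask & maxbase;   /* keep the significant bits */
	size = size & ~(size-1);     /* lowest set bit = decode size */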
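The kernel's sequence can be imitated in user space against a toy BAR model. `struct fake_bar`, `fake_read`, `fake_write`, and `probe_bar` are hypothetical stand-ins for `pci_read_config_dword`/`pci_write_config_dword`. `size_from_mask` follows the same two lines as `pci_size` but omits the additional checks that the kernel function performs.

```c
#include <stdint.h>

/* Toy model of a 32-bit, non-prefetchable memory BAR (hypothetical):
 * address bits below the decode size are hardwired to 0 and the flag
 * bits [3:0] are read-only. 'size' must be a power of two >= 16. */
struct fake_bar {
    uint32_t value;
    uint32_t size;
    uint32_t flags;
};

static uint32_t fake_read(const struct fake_bar *b) { return b->value; }

static void fake_write(struct fake_bar *b, uint32_t v)
{
    b->value = (v & ~(b->size - 1u) & ~0xFu) | b->flags;
}

/* Same core as the two pci_size() lines: keep the significant bits,
 * then isolate the lowest one. */
static uint64_t size_from_mask(uint64_t maxbase, uint64_t mask)
{
    uint64_t size = mask & maxbase;
    if (!size)
        return 0;
    return size & ~(size - 1);
}

/* Mirrors the __pci_read_base() order: save, write all 1s, read back, restore. */
static uint64_t probe_bar(struct fake_bar *b)
{
    uint32_t l = fake_read(b);
    fake_write(b, ~0u);                        /* l | mask with mask = ~0 */
    uint32_t sz = fake_read(b);
    fake_write(b, l);                          /* put the assigned address back */
    return size_from_mask(sz, 0xFFFFFFF0u);    /* memory BAR address mask */
}
```

With `{ .value = 0xf0000000, .size = 0x80000 }`, the probe reads back 0xfff80000, reports 0x80000, and leaves the BAR at 0xf0000000.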
While reading the Jailhouse code I came across the privileged hvc instruction, so here is a closer look at it.
static inline __jh_arg jailhouse_call_arg2(__jh_arg num, __jh_arg arg1,
					   __jh_arg arg2)
{
	register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;
	register __jh_arg __arg1 asm(JAILHOUSE_CALL_ARG1) = arg1;
	register __jh_arg __arg2 asm(JAILHOUSE_CALL_ARG2) = arg2;

	asm volatile(
		JAILHOUSE_CALL_INS
		: "+r" (num_result), "+r" (__arg1), "+r" (__arg2)
		: : "memory", JAILHOUSE_CALL_CLOBBERED);
	return num_result;
}
#define JAILHOUSE_CALL_INS "hvc #0x4a48"
#define JAILHOUSE_CALL_NUM_RESULT "x0"
#define JAILHOUSE_CALL_ARG1 "x1"
#define JAILHOUSE_CALL_ARG2 "x2"
#define JAILHOUSE_CALL_CLOBBERED "x3"
GCC extended asm has the general form below. The GotoLabels part applies only to asm goto.
asm asm-qualifiers ( AssemblerTemplate : OutputOperands : InputOperands : Clobbers : GotoLabels)
After macro expansion, the inline asm above is roughly equivalent to the following pseudocode:
asm volatile("hvc #0x4a48" : "+r" (x0), "+r" (x1), "+r" (x2) : : "memory", "x3");
The colons separate the operand sections:
Extended asm syntax uses colons (‘:’) to delimit the operand parameters after the assembler template:
The volatile qualifier disables GCC optimizations that could discard the asm or move it out of loops:
GCC’s optimizers sometimes discard asm statements if they determine there is no need for the output variables. Also, the optimizers may move code out of loops if they believe that the code will always return the same result (i.e. none of its input values change between calls). Using the volatile qualifier disables these optimizations. asm statements that have no output operands and asm goto statements, are implicitly volatile.
'+' means the operand is both read and written:
Means that this operand is both read and written by the instruction.
'r' means the operand must be placed in a general-purpose register:
A register operand is allowed provided that it is in a general register.
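The "memory" clobber tells the compiler that the asm may read or write memory beyond its listed operands. The compiler therefore must not keep memory values cached in registers across the asm. In effect, it is a compiler memory barrier: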
The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.
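The constraint letters can be tried out without a hypervisor. With an empty assembler template, GCC/Clang extended asm emits no instruction, but the compiler still honours the constraints. The following portable sketch shows `"+r"` and the `"memory"` clobber. The helper names are hypothetical.

```c
/* GCC/Clang only: the templates are empty, so nothing is emitted,
 * but the compiler must still respect the constraints. */

/* "+r": v lives in a general-purpose register and the asm may both read
 * and modify it, so the compiler cannot assume v is unchanged afterwards. */
static inline unsigned long hide_value(unsigned long v)
{
    __asm__ __volatile__("" : "+r"(v));
    return v;
}

/* "memory" clobber: compiler-level read/write barrier. Pending stores must
 * be emitted before it and memory must be re-read after it. It emits no
 * CPU barrier instruction. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

static int store_then_load(int *flag)
{
    *flag = 1;
    compiler_barrier();  /* the store cannot be moved past this point */
    return *flag;        /* reloaded from memory after the barrier */
}
```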
x1 and x2 are general-purpose registers used to pass arguments. x0 carries the hypercall code in and the return value out.
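In the AArch32 description quoted below, the hvc instruction enters Hyp mode. The CPSR is saved to the Hyp-mode SPSR, and execution branches to the HVC vector, which is the entry of the hypervisor-call exception handler. The processor ignores the imm value itself, but the handler can retrieve it to determine which service is requested.

On AArch64, which is what the x0/x1/x2 registers here imply, hvc traps to EL2 instead. PSTATE is saved in SPSR_EL2, and the immediate is reported in the ISS field of ESR_EL2.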
HVC #imm imm is an expression evaluating to an integer in the range 0-65535. In a processor that implements the Virtualization Extensions, the HVC instruction causes a Hypervisor Call exception. This means that the processor enters Hyp mode, the CPSR value is saved to the Hyp mode SPSR, and execution branches to the HVC vector. imm is ignored by the processor. However, it can be retrieved by the exception handler to determine what service is being requested.
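On AArch64, the immediate reaches the hypervisor through the syndrome register ESR_EL2. EC (bits [31:26]) is 0x16 for an HVC executed in AArch64 state, and for HVC the ISS bits [15:0] hold imm16. The following C sketch performs that decoding. `is_jailhouse_hvc` is a hypothetical check and is not Jailhouse's own trap handler.

```c
#include <stdint.h>

#define ESR_EC_SHIFT  26
#define ESR_EC_MASK   0x3Fu
#define ESR_EC_HVC64  0x16u   /* HVC executed in AArch64 state */

static unsigned esr_ec(uint64_t esr)      { return (unsigned)(esr >> ESR_EC_SHIFT) & ESR_EC_MASK; }
static unsigned esr_hvc_imm(uint64_t esr) { return (unsigned)(esr & 0xFFFFu); } /* ISS[15:0] */

/* Hypothetical check for a Jailhouse hypercall trap (0x4a48 = ASCII "JH"). */
static int is_jailhouse_hvc(uint64_t esr)
{
    return esr_ec(esr) == ESR_EC_HVC64 && esr_hvc_imm(esr) == 0x4a48u;
}
```

An `hvc #0x4a48` trap would yield ESR_EL2 = 0x5A004A48 (EC 0x16, IL 1, imm16 0x4a48).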
Instruction:    hvc #0x4a48
Hypercall code: x0
1. argument:    x1
2. argument:    x2
Return code:    x0
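So Jailhouse issues the virtualization instruction hvc with the immediate #0x4a48 (ASCII "JH"). The immediate only identifies the service being called, Jailhouse. The arguments go in x1 and x2, and the result comes back in x0. Arguments that a hypercall does not need are simply left out.

x0 first carries the hypercall code into the hypervisor and then carries the return value back. x1 and x2 are the arguments passed to the handler selected by that code. The final clobber list, "memory" and JAILHOUSE_CALL_CLOBBERED, tells the compiler that memory may change and that x3 may be overwritten by the call. In variants that take fewer arguments, the unused argument registers x1 and x2 are listed as clobbers too, so x1, x2, and x3 are all treated as scratch registers.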
While the compiler is aware of changes to entries listed in the output operands, the inline asm code may modify more than just the outputs. For example, calculations may require additional registers, or the processor may overwrite a register as a side effect of a particular assembler instruction. In order to inform the compiler of these changes, list them in the clobber list. Clobber list items are either register names or the special clobbers (listed below). Each clobber list item is a string constant enclosed in double quotes and separated by commas
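In other words, a computation may need extra registers, or the processor may overwrite registers as a side effect of a particular instruction. To tell the compiler about this, list those registers in the clobber list as scratch registers.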
#define JAILHOUSE_HC_DISABLE 0
#define JAILHOUSE_HC_CELL_CREATE 1
#define JAILHOUSE_HC_CELL_START 2
#define JAILHOUSE_HC_CELL_SET_LOADABLE 3
#define JAILHOUSE_HC_CELL_DESTROY 4
#define JAILHOUSE_HC_HYPERVISOR_GET_INFO 5
#define JAILHOUSE_HC_CELL_GET_STATE 6
#define JAILHOUSE_HC_CPU_GET_INFO 7
#define JAILHOUSE_HC_DEBUG_CONSOLE_PUTC 8

/* Hypervisor information type */
#define JAILHOUSE_INFO_MEM_POOL_SIZE 0
#define JAILHOUSE_INFO_MEM_POOL_USED 1
#define JAILHOUSE_INFO_REMAP_POOL_SIZE 2
#define JAILHOUSE_INFO_REMAP_POOL_USED 3
#define JAILHOUSE_INFO_NUM_CELLS 4

/* Hypervisor information type */
#define JAILHOUSE_CPU_INFO_STATE 0
#define JAILHOUSE_CPU_INFO_STAT_BASE 1000

/* CPU state */
#define JAILHOUSE_CPU_RUNNING 0
#define JAILHOUSE_CPU_FAILED 2 /* terminal state */

/* CPU statistics */
#define JAILHOUSE_CPU_STAT_VMEXITS_TOTAL 0
#define JAILHOUSE_CPU_STAT_VMEXITS_MMIO 1
#define JAILHOUSE_CPU_STAT_VMEXITS_MANAGEMENT 2
#define JAILHOUSE_CPU_STAT_VMEXITS_HYPERCALL 3
#define JAILHOUSE_GENERIC_CPU_STATS 4
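On the hypervisor side, the code in x0 selects the service. The following C sketch shows the general shape of such a dispatcher. It is hypothetical and simplified, using a switch on the code with stub return values. It is not Jailhouse's actual handler.

```c
#include <errno.h>

/* Subset of the hypercall codes listed above. */
#define JAILHOUSE_HC_DISABLE              0
#define JAILHOUSE_HC_HYPERVISOR_GET_INFO  5
#define JAILHOUSE_INFO_NUM_CELLS          4

/* code arrives in x0, arguments in x1/x2; the return value goes back in x0.
 * The values returned here are stubs, not real hypervisor state. */
static long handle_hypercall(unsigned long code, unsigned long arg1,
                             unsigned long arg2)
{
    (void)arg2;                                   /* unused by these stubs */
    switch (code) {
    case JAILHOUSE_HC_DISABLE:
        return 0;                                 /* stub: pretend success */
    case JAILHOUSE_HC_HYPERVISOR_GET_INFO:
        /* stub: arg1 selects the info type; pretend only the root cell exists */
        return arg1 == JAILHOUSE_INFO_NUM_CELLS ? 1 : -EINVAL;
    default:
        return -ENOSYS;                           /* unknown hypercall code */
    }
}
```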
These codes are passed in x0 to the hypervisor through hvc #0x4a48 to manage the hypervisor and its cells. For example:
err = jailhouse_call(JAILHOUSE_HC_DISABLE);
static inline __jh_arg jailhouse_call(__jh_arg num)
{
	register __jh_arg num_result asm(JAILHOUSE_CALL_NUM_RESULT) = num;

	asm volatile(
		JAILHOUSE_CALL_INS
		: "+r" (num_result)
		: : "memory", JAILHOUSE_CALL_ARG1, JAILHOUSE_CALL_ARG2,
		    JAILHOUSE_CALL_CLOBBERED);
	return num_result;
}
That covers the hvc inline assembly. Next, we continue tracing the Jailhouse driver source.