A Deep Dive into the Arm MMU: One Article Is Enough


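Questions to think about
Why use virtual addresses? Why use an MMU?
If the MMU hardware performs address translation, what is left for software to do?
Where does the MMU sit? How are the MMU and the SMMU related?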

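1. MMU Concepts

The MMU has two parts: TLB maintenance and address translation.
The MMU's main job is address translation, that is, converting virtual addresses to physical addresses. This applies both to main-memory (DDR) addresses and to I/O (device) addresses. On a system with the MMU enabled, every instruction fetch and data access issued by the CPU uses a virtual address. Inside the Arm core, the MMU first translates that virtual address to a physical address automatically, and the physical address is then driven onto the AXI bus to perform the actual access to physical memory or a physical device.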

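So why use an MMU, and why use virtual addresses? Three reasons:

Multiple programs can run independently: none of them needs to know actual physical addresses.
Virtual addresses are contiguous: a program can run in physical memory that is split into several fragments.
The operating system can manage memory: which regions are visible, which are readable or writable, which are cacheable, and so on.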

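Since the hardware translates virtual addresses to physical addresses automatically once the MMU is on, what does software still have to do? In other words, what does it take to set up translation tables and enable the MMU? Taking EL3 as the example:

Set the translation table base address in TTBR0_EL3 (Specify the location of the translation table)
Initialize MAIR_EL3 (Memory Attribute Indirection Register)
Configure TCR_EL3 (Configure the translation regime)
Create the translation tables (Generate the translation tables)
Enable the MMU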

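2. Virtual and Physical Address Spaces

2.1 Range of the (virtual/physical) address space

What is the kernel's virtual address range? What is an application's virtual address range?
When we first studied operating systems, the line we saw most often was: the kernel uses the 3G-4G virtual address range and applications use 0-3G. On aarch64 this becomes: the kernel's virtual address space is 0xffff_0000_0000_0000 - 0xffff_ffff_ffff_ffff, and the application's virtual address space is 0x0000_0000_0000_0000 - 0x0000_ffff_ffff_ffff.
Being a stickler, I have to tell you this statement is wrong, for two reasons:
(1) The Arm processor does not dictate which address range your kernel must use. The split above is the Linux Kernel's own design choice: it chose to put the kernel in 0xffff_0000_0000_0000 - 0xffff_ffff_ffff_ffff. OP-TEE OS is a counterexample: its kernel mode and user mode both use the low virtual address range.
(2) How many leading F digits (leading 1 bits) the high range has depends on how many virtual address bits the operating system configures; it is not fixed. The size of the low range varies in the same way. In OP-TEE, for example, the virtual address range for both kernel mode and user mode is 0x0000_0000_0000_0000 - 0x0000_0000_ffff_ffff.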

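The Arm documentation has a standard description of this:
Virtual addresses whose upper bits are all 1 are translated through the TTBR1_ELx base register; virtual addresses whose upper bits are all 0 are translated through TTBR0_ELx. So it is wrong to say that the register you use (TTBR0/TTBR1) determines which virtual address range you get. Rather, the virtual address range your operating system (or software) uses determines which base register (TTBR0/TTBR1) performs the translation.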

For 48-bit VAs:
• The address range translated using TTBR0_ELx is 0x0000000000000000 to 0x0000FFFFFFFFFFFF.
• The address range translated using TTBR1_ELx is 0xFFFF000000000000 to 0xFFFFFFFFFFFFFFFF.
In an implementation that includes ARMv8.2-LVA and is using the 64KB translation granule, for 52-bit VAs:
• The address range translated using TTBR0_ELx is 0x0000000000000000 to 0x000FFFFFFFFFFFFF.
• The address range translated using TTBR1_ELx is 0xFFF0000000000000 to 0xFFFFFFFFFFFFFFFF.
Which TTBR_ELx is used depends only on the VA presented for translation. The most significant bits of the VA must all be the same value and:
• If the most significant bits of the VA are zero, then TTBR0_ELx is used.
• If the most significant bits of the VA are one, then TTBR1_ELx is used.
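
The selection rule quoted above can be sketched in a few lines of Python. This is only a model of the rule, not of real hardware: the function name `select_ttbr` and its defaults are made up for illustration, it assumes a regime with two VA ranges (such as EL1&0), and it ignores top-byte-ignore (TBI).

```python
def select_ttbr(va: int, t0sz: int = 16, t1sz: int = 16) -> str:
    """Model which base register translates a 64-bit VA (hypothetical helper).

    t0sz/t1sz play the role of TCR_ELx.T0SZ/T1SZ: each range covers
    2**(64 - TnSZ) bytes. Returns "fault" if the upper bits are mixed.
    """
    va &= (1 << 64) - 1
    if va >> (64 - t0sz) == 0:
        # All bits above the TTBR0 range are zero.
        return "TTBR0_ELx"
    if va >> (64 - t1sz) == (1 << t1sz) - 1:
        # All bits above the TTBR1 range are one.
        return "TTBR1_ELx"
    return "fault"


print(select_ttbr(0x0000_FFFF_FFFF_FFFF))  # TTBR0_ELx
print(select_ttbr(0xFFFF_0000_0000_0000))  # TTBR1_ELx
print(select_ttbr(0x0001_0000_0000_0000))  # fault
```

With `t0sz=32` the same function models the 32-bit OP-TEE case: only addresses up to 0xffff_ffff fall into the TTBR0 range.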

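2.2 Physical address size (range)

The number of physical address bits is fixed for each core, whereas the number of virtual address bits is something you configure at build or development time to suit your needs. The table below lists the physical address size of several Arm cores; for any other core, look it up in the core's TRM.
[Table: physical address size of selected Arm cores]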

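2.2.1 Registers that configure translation (TCR)

ID_AA64MMFR0_EL1.PARange : Physical address size : read this Arm register to find out how many physical address bits the system supports
TCR_EL1.IPS : Output address size : tells the MMU how many physical address bits to output
TCR_EL1.T0SZ and TCR_EL1.T1SZ : Input address size : tell the MMU how many virtual address bits are input (each range covers 2^(64-TnSZ) bytes)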
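
As a rough illustration of how these three fields relate, the sketch below decodes PARange and derives TnSZ and IPS values. The helper names and the sample register value are hypothetical; the encoding table covers the 32-bit to 52-bit PARange values, and IPS is assumed to use the same encoding.

```python
# PARange / IPS encoding -> number of physical address bits
PARANGE_TO_BITS = {
    0b0000: 32, 0b0001: 36, 0b0010: 40, 0b0011: 42,
    0b0100: 44, 0b0101: 48, 0b0110: 52,
}


def pa_bits_from_mmfr0(id_aa64mmfr0: int) -> int:
    """PARange is the lowest 4 bits of ID_AA64MMFR0_EL1."""
    return PARANGE_TO_BITS[id_aa64mmfr0 & 0xF]


def tnsz_for(va_bits: int) -> int:
    """TnSZ value for a VA range of va_bits bits."""
    return 64 - va_bits


def ips_for(pa_bits: int) -> int:
    """IPS field value that requests pa_bits bits of output address."""
    for encoding, bits in PARANGE_TO_BITS.items():
        if bits == pa_bits:
            return encoding
    raise ValueError(f"no IPS encoding for {pa_bits} bits")


sample_mmfr0 = 0x1122                    # made-up register value, PARange = 0b0010
print(pa_bits_from_mmfr0(sample_mmfr0))  # 40
print(tnsz_for(48), tnsz_for(32))        # 16 32
print(bin(ips_for(40)))                  # 0b10
```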
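3. Translation Regimes
The Memory Management Unit (MMU) performs address translation. The MMU contains:
The table walk unit : reads the translation tables from memory and completes the address translation
Translation Lookaside Buffers (TLBs) : cache recently used translations, much like a data cache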

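Every memory address that software sees is virtual. These addresses are passed to the MMU, which first checks the TLBs for a recently used cached translation. If it does not find one, the table walk unit reads the appropriate table entry or entries from memory.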
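Translation tables work by dividing the virtual address space into equal-sized blocks and providing one entry in the table per block.
Entry 0 in the translation table provides the mapping for block 0, entry 1 provides the mapping for block 1, and so on. Each entry contains the address of the corresponding block of physical memory and the attributes to use when accessing that physical address.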
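
The idea of one entry per equal-sized block can be modelled with a toy single-level table. Everything here (the block size, the table contents, the attribute strings) is invented for illustration and does not reflect the real descriptor format, which comes later in the article.

```python
BLOCK_SIZE = 0x1000  # hypothetical 4KB blocks

# Entry i maps virtual block i to (physical block base, attributes).
# None marks an unmapped block.
table = [
    (0x8000_0000, "rw"),
    (0x8000_3000, "ro"),
    None,
]


def translate(va: int):
    """Return (physical address, attributes) for a virtual address."""
    index, offset = divmod(va, BLOCK_SIZE)
    if index >= len(table) or table[index] is None:
        raise LookupError("translation fault")
    base, attrs = table[index]
    return base + offset, attrs


pa, attrs = translate(0x1234)
print(hex(pa), attrs)  # 0x80003234 ro
```

Note that virtual blocks 0 and 1 are adjacent while their physical blocks are not, which is exactly the "contiguous virtual, fragmented physical" property mentioned earlier.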

Secure EL1&0 translation regime, when EL2 is disabled
Non-secure EL1&0 translation regime, when EL2 is disabled
Secure EL1&0 translation regime, when EL2 is enabled
Non-secure EL1&0 translation regime, when EL2 is enabled
Secure EL2&0 translation regime
Non-secure EL2&0 translation regime
Secure EL2 translation regime
Non-secure EL2 translation regime
Secure EL3 translation regime

Secure and Non-secure addresses
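In a dual-system environment with an REE (Linux) and a TEE (OP-TEE), both systems can have their MMUs enabled at the same time.
Secure and Non-secure states use different translation tables. Secure translation tables can map Non-secure memory, but Non-secure translation tables cannot map Secure memory; an attempt to do so ends in an error rather than a successful access.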
Two Stage Translations

4. Address Translation: How Many Table Levels?

4.1 Question: how many levels does the translation table really have?

4.2 Translation table layout with the 4KB granule

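Except for the first-level index (here, the index into the level 0 table), every table/page index is 9 bits wide. In other words, apart from the first-level table, every table has 512 entries.
If VA_BIT = 39, no bits are left for a level 0 index, so the level 0 table is not needed and TTBR_ELx points to the level 1 table, indexed by BIT[38:30].
If VA_BIT = 48, the level 0 table is indexed by BIT[47:39] and has 512 entries.
If VA_BIT > 48: with the 4KB granule discussed here, this case does not arise. Under ARMv8.2-LVA the only size above 48 is VA_BIT = 52, and it requires a granule size of 64KB.
If VA_BIT = 32, the level 0 table is not used either; TTBR_ELx points to the level 1 table, which is indexed by BIT[31:30] and has only 4 entries.
Also note that an entry in the level 0 table can only be a table descriptor (D_Table); it cannot be a block descriptor (D_Block).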
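
To make the index arithmetic concrete, here is a small sketch for the 4KB granule. The function names are made up, and the level boundaries assume the standard 4KB layout (9 index bits per level above a 12-bit page offset) with VA sizes between 25 and 48 bits.

```python
SHIFTS = {0: 39, 1: 30, 2: 21, 3: 12}  # lowest VA bit of each level's index


def start_level(va_bits: int) -> int:
    """Level of the table that TTBR_ELx points to, for the 4KB granule."""
    if not 25 <= va_bits <= 48:
        raise ValueError("sketch covers 25..48 VA bits only")
    if va_bits > 39:
        return 0
    if va_bits > 30:
        return 1
    return 2


def first_table_entries(va_bits: int) -> int:
    """Number of entries in the first-level table."""
    return 1 << (va_bits - SHIFTS[start_level(va_bits)])


def split_va(va: int, va_bits: int = 48):
    """Split a VA into per-level table indices and the page offset."""
    va &= (1 << va_bits) - 1
    indices = {
        level: (va >> SHIFTS[level]) & 0x1FF
        for level in range(start_level(va_bits), 4)
    }
    return indices, va & 0xFFF


va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0xABC
print(split_va(va))             # ({0: 1, 1: 2, 2: 3, 3: 4}, 2748)
print(first_table_entries(48))  # 512
print(first_table_entries(32))  # 4
```

For a 32-bit VA the split yields only levels 1 to 3, which matches the three-level (L1, L2, L3) lookup of the OP-TEE example.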

4.3 A real-world example: OP-TEE

32-bit effective virtual addresses, 32-bit effective physical addresses, a three-level table lookup (L1, L2, L3), and a 4KB granule.

5. Translation Table Format (Descriptor format)

5.1 The three translation table formats supported by Armv8

AArch64 Long Descriptor : the only one covered here
Armv7-A Long Descriptor : for the Large Physical Address Extension (LPAE)
Armv7-A Short Descriptor

Armv8-A supports three different sets of translation table format:
• The Armv8-A AArch64 Long Descriptor format.
• The Armv7-A Long Descriptor format such as the Large Physical Address Extension (LPAE) to the Armv7-A architecture, for example, the Arm Cortex-A15 processor.
• The Armv7-A Short Descriptor format.

5.2 The four entry types of the AArch64 Long Descriptor format

The AArch64 Long Descriptor format has four kinds of entry:

An invalid or fault entry.
A table entry, that points to the next-level translation table.
A block entry, that defines the memory properties for the access.
A reserved format

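Note: entry[1:0] identifies which kind of entry it is. A Block Descriptor and a Page Descriptor serve the same purpose: both hold the final output address and its attributes. In the current architecture, the reserved encoding is also treated as invalid.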
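
A minimal sketch of how entry[1:0] might be decoded, assuming the 4KB granule from section 4.2 (block entries at levels 1 and 2 only, page entries at level 3). The function name and the returned strings are illustrative choices, not architectural terms.

```python
def descriptor_type(desc: int, level: int) -> str:
    """Classify a 64-bit descriptor by its low two bits and table level."""
    low = desc & 0b11
    if low & 1 == 0:
        # Bit 0 clear: invalid (fault) entry, whatever bit 1 holds.
        return "invalid"
    if low == 0b11:
        # Same encoding means "table" at levels 0-2 and "page" at level 3.
        return "page" if level == 3 else "table"
    # low == 0b01
    if level in (1, 2):
        return "block"
    return "reserved (treated as invalid)"


print(descriptor_type(0x0000_0000, 1))  # invalid
print(descriptor_type(0x4000_0405, 1))  # block
print(descriptor_type(0x8000_1003, 0))  # table
print(descriptor_type(0x8000_1003, 3))  # page
```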

5.3 Attribute bits of a translation table entry (Block Descriptor/Page Descriptor)

The memory-attribute bits are explained below:

Indx = b01, take Type information from entry [1] in the MAIR
NS = b0, output physical addresses are Secure
AP = b00, address is readable and writeable
SH = b00, Non-shareable
AF = b1, Access Flag is pre-set. No Access Flag Fault is generated on access
nG = Not used at EL3
Contig = b0, the entry is not part of a contiguous block
PXN = b0, block is executable. This attribute is called XN at EL3.
UXN = Not used at EL3
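
The attribute values listed above can be packed into a block descriptor as in the sketch below. The helper `block_descriptor` is hypothetical, defaults to the values from the list, and assumes the usual stage 1 field positions (AttrIndx at bits [4:2], NS at bit 5, AP at [7:6], SH at [9:8], AF at bit 10, nG at bit 11, Contiguous at bit 52, PXN at bit 53, UXN/XN at bit 54) and a 48-bit output address.

```python
def block_descriptor(pa, block_size=1 << 30, attr_indx=1, ns=0, ap=0b00,
                     sh=0b00, af=1, ng=0, contig=0, pxn=0, uxn=0):
    """Build a stage 1 block descriptor (illustrative helper)."""
    if pa % block_size:
        raise ValueError("output address must be aligned to the block size")
    if pa >> 48:
        raise ValueError("sketch supports 48-bit output addresses only")
    desc = 0b01                     # entry[1:0] = block
    desc |= (attr_indx & 0x7) << 2  # index into MAIR
    desc |= (ns & 1) << 5
    desc |= (ap & 0x3) << 6
    desc |= (sh & 0x3) << 8
    desc |= (af & 1) << 10
    desc |= (ng & 1) << 11
    desc |= pa                      # output address
    desc |= (contig & 1) << 52
    desc |= (pxn & 1) << 53
    desc |= (uxn & 1) << 54
    return desc


# 1GB block at physical address 0x4000_0000 with the attributes listed above
print(hex(block_descriptor(0x4000_0000)))  # 0x40000405
```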

5.4 Definition of the MAIR register
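
MAIR_ELx holds eight 8-bit attribute fields, Attr0 to Attr7, and the AttrIndx field of a descriptor selects one of them. A minimal sketch of packing and reading those fields follows; the helper names and the choice of which attribute goes in which slot are assumptions made for illustration.

```python
# Commonly used attribute encodings
DEVICE_nGnRnE = 0x00
DEVICE_nGnRE = 0x04
NORMAL_NC = 0x44  # Normal memory, inner and outer non-cacheable
NORMAL_WB = 0xFF  # Normal memory, inner and outer write-back, R/W allocate


def mair_value(attrs) -> int:
    """Pack up to eight attribute bytes; attrs[i] becomes Attr<i>."""
    if len(attrs) > 8:
        raise ValueError("MAIR has only eight attribute slots")
    value = 0
    for i, attr in enumerate(attrs):
        value |= (attr & 0xFF) << (8 * i)
    return value


def mair_attr(mair: int, indx: int) -> int:
    """Attribute byte selected by a descriptor's AttrIndx."""
    return (mair >> (8 * indx)) & 0xFF


mair = mair_value([DEVICE_nGnRnE, NORMAL_WB, NORMAL_NC])
print(hex(mair))                # 0x44ff00
print(hex(mair_attr(mair, 1)))  # 0xff
```

With this layout, a descriptor whose AttrIndx is 1 (as in the example in section 5.3) would be treated as Normal write-back cacheable memory.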

6. The TCR Register

An Arm core (aarch64) has three Translation Control Registers:

TCR_EL1
TCR_EL2
TCR_EL3

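Field	Function	Description
ORGN1, IRGN1, ORGN0, IRGN0	Cacheability	Outer/inner cacheability of translation table walks (e.g. write-through or write-back)
SH1, SH0	Shareability	Shareability of translation table walks (non-shareable, outer shareable, or inner shareable)
TG0/TG1	Granule size	Granule size, i.e. the page size (4KB/16KB/64KB)
IPS	Physical address size	Physical address size, e.g. 32/36/40 bits
EPD1, EPD0	-	Enable or disable table walks through TTBR1_EL1/TTBR0_EL1
TBI1, TBI0	-	Whether the top byte of the address is ignored (leaving it free for tags, e.g. MTE) or takes part in the address calculation
A1	-	ASID selection: use the ASID in TTBR1_EL1 or the one in TTBR0_EL1
AS	-	ASID size: 8 bits or 16 bits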
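
As a rough sketch, the fields in the table can be packed into a TCR_EL1 value like this. The helper `tcr_el1` and its defaults (4KB granule for both ranges, inner-shareable write-back table walks, 48-bit IPS) are assumptions chosen for illustration; the same cacheability and shareability settings are applied to both ranges to keep the example short.

```python
def tcr_el1(t0sz, t1sz, tg0=0b00, tg1=0b10, ips=0b101, sh=0b11,
            irgn=0b01, orgn=0b01, epd0=0, epd1=0, a1=0, asid16=0,
            tbi0=0, tbi1=0):
    """Compose a TCR_EL1 value (illustrative helper).

    Note the differing granule encodings: TG0 = 0b00 and TG1 = 0b10
    both select 4KB.
    """
    value = 0
    value |= (t0sz & 0x3F) << 0
    value |= (epd0 & 1) << 7
    value |= (irgn & 0x3) << 8    # IRGN0
    value |= (orgn & 0x3) << 10   # ORGN0
    value |= (sh & 0x3) << 12     # SH0
    value |= (tg0 & 0x3) << 14
    value |= (t1sz & 0x3F) << 16
    value |= (a1 & 1) << 22
    value |= (epd1 & 1) << 23
    value |= (irgn & 0x3) << 24   # IRGN1
    value |= (orgn & 0x3) << 26   # ORGN1
    value |= (sh & 0x3) << 28     # SH1
    value |= (tg1 & 0x3) << 30
    value |= (ips & 0x7) << 32
    value |= (asid16 & 1) << 36   # AS
    value |= (tbi0 & 1) << 37
    value |= (tbi1 & 1) << 38
    return value


# 48-bit VA for both ranges (T0SZ = T1SZ = 16)
print(hex(tcr_el1(16, 16)))  # 0x5b5103510
```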

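7. Address Translation Instructions

There are about 14 address translation (AT) instructions. Their names encode the stage, exception level, and access type, as in AT S1E1R (stage 1, EL1, read) or AT S12E0W (stages 1 and 2, EL0, write).

8. Summary of System Registers for Address Translation

Address translation is controlled by a combination of system registers: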
(1)、SCTLR_ELx

M - Enable Memory Management Unit (MMU).
C - Enable for data and unified caches.
EE - Endianness of translation table walks.
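
The three SCTLR_ELx bits above sit at bit 0 (M), bit 2 (C), and bit 25 (EE). The sketch below only models computing the new register value; the function name is made up, and on real hardware the value would be written with an MSR instruction followed by an ISB, after the translation tables, MAIR, and TCR have been set up.

```python
SCTLR_M = 1 << 0    # MMU enable
SCTLR_C = 1 << 2    # data and unified cache enable
SCTLR_EE = 1 << 25  # big-endian translation table walks


def sctlr_with_mmu(sctlr: int, caches: bool = True,
                   big_endian_walks: bool = False) -> int:
    """Return the SCTLR_ELx value with the MMU turned on."""
    value = sctlr | SCTLR_M
    if caches:
        value |= SCTLR_C
    if big_endian_walks:
        value |= SCTLR_EE
    else:
        value &= ~SCTLR_EE
    return value


print(hex(sctlr_with_mmu(0)))                         # 0x5
print(hex(sctlr_with_mmu(0, big_endian_walks=True)))  # 0x2000005
```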

(2)、TTBR0_ELx and TTBR1_ELx

BADDR - Physical address (PA) (or intermediate physical address, IPA, for EL0/EL1) of start of translation table.
ASID - The Address Space Identifier for Non-Global translations.

(3)、TCR_ELx

PS/IPS - Size of PA or IPA space, the maximum output address size.
TnSZ - Size of address space covered by table.
TGn - Granule size.
SH/IRGN/ORGN - Cacheability and shareability to be used by MMU table walks.
EPDn - Disabling of table walks to a specific table.
TBIn - Top Byte Ignore, whether the top byte of the VA is ignored during translation.

(4)、MAIR_ELx

Attr - Controls the Type and cacheability in Stage 1 tables.