Segmentation

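These notes summarize the material on segmentation from the MIT 6.828 xv6 book, the PC Assembly Language book (pcasm-book), and the CMU 15-410 segmentation document.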

xv6book, PC Assembly Language, CMU-15-410 segment

With segmentation, an address is specified by a <selector, offset> pair.

real mode

  • The selector is held in a segment register and is a paragraph number.

  • Memory is often handled in multi-byte units: 2 bytes are a word, 4 a double word, 8 a quad word, and 16 a paragraph.

  • The segmentation hardware (the unit that performs segment translation) simply multiplies the selector by 16 and adds the offset to get the physical address.
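
The real-mode rule above is small enough to check by hand. Here is a minimal Python sketch of it; the function name real_mode_address is made up for illustration, and the 16-bit range check is an assumption based on the segment and offset registers being 16 bits wide.

```python
def real_mode_address(selector: int, offset: int) -> int:
    """Physical address for a real-mode <selector, offset> pair."""
    # Assumption: both values come from 16-bit registers.
    if not (0 <= selector <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("selector and offset must fit in 16 bits")
    # The selector is a paragraph number, and a paragraph is 16 bytes.
    return selector * 16 + offset


if __name__ == "__main__":
    # Two different pairs can name the same physical byte.
    print(hex(real_mode_address(0x07C0, 0x0000)))
    print(hex(real_mode_address(0x0000, 0x7C00)))
```

Because the selector is only scaled by 16, many <selector, offset> pairs alias the same physical address, which is one reason real-mode segments overlap.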

16-bit protected mode

  • The selector is still held in a segment register, but it is no longer a plain paragraph number. It packs several fields: a segment number, a table selector flag, and a requested privilege level (RPL).

    The segment number is an index into the GDT (global descriptor table) or the LDT (local descriptor table), an index in the same sense as an array subscript.

    The table selector flag indicates whether the segment number indexes the GDT or the LDT.

    The meaning of the RPL (requested privilege level) depends on the segment register. For the cs register, the RPL sets the privilege level of the processor:

    In this case (the %CS register), the RPL sets the privilege level of the processor

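    However, a figure in Appendix B of the MIT 6.828 xv6 book shows the 16-bit selector being used directly as the GDT/LDT index, which does not obviously match the field layout described above, so this point remains unclear.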

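  • A GDT/LDT descriptor holds a base address, a size, some flag bits, and a privilege level. On a memory access, the processor compares the offset with the size; if offset >= size, the access is out of bounds. If the access is legal, base address + offset gives the linear address.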
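
The selector fields and the descriptor check described above can be sketched in Python. This is only an illustration under stated assumptions: the bit positions (index in bits 15..3, table flag in bit 2, RPL in bits 1..0) are the usual x86 selector layout and are not spelled out in these notes; Descriptor, SegmentFault, decode_selector and translate are hypothetical names; privilege checks and the descriptor flag bits are left out.

```python
from dataclasses import dataclass


@dataclass
class Descriptor:
    base: int  # base address of the segment
    size: int  # offsets >= size are out of bounds (rule from the notes)


class SegmentFault(Exception):
    """Stand-in for the fault raised on an illegal access."""


def decode_selector(selector: int):
    # Assumed layout: bits 15..3 index, bit 2 table flag, bits 1..0 RPL.
    index = selector >> 3
    use_ldt = bool((selector >> 2) & 1)  # 0 = GDT, 1 = LDT
    rpl = selector & 0b11
    return index, use_ldt, rpl


def translate(selector: int, offset: int, gdt, ldt) -> int:
    """Return the linear address for <selector, offset>."""
    index, use_ldt, _rpl = decode_selector(selector)
    table = ldt if use_ldt else gdt
    if index >= len(table):
        raise SegmentFault("selector index outside descriptor table")
    desc = table[index]
    if offset >= desc.size:
        raise SegmentFault("offset outside segment")
    return desc.base + offset


if __name__ == "__main__":
    gdt = [Descriptor(0, 0), Descriptor(0x10000, 0x2000)]
    print(decode_selector(0x0B))
    print(hex(translate(0x08, 0x10, gdt, ldt=[])))
```

Under this assumed layout only 13 of the 16 selector bits form the table index, which may be why a figure that feeds the whole selector into the table looks inconsistent.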

32-bit protected mode

  • The 80386 introduced 32-bit protected mode. It differs from 16-bit protected mode in two ways: offsets are now 32 bits, and segments can now be divided into 4K-sized units called pages.

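  • To walk through the whole address translation: the logical address (also called the virtual address) is the <selector, offset> pair, and the linear address is what segment translation produces from the selector and offset. If the paging hardware is not enabled, the linear address is used directly as the physical address. If it is enabled, the linear address is looked up in the page table.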
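
The logical → linear → physical flow above can be sketched as two small Python functions. This is a toy model, not how the hardware stores its tables: the page table here is a flat dict from page number to frame number, whereas real x86 page tables are multi-level structures in memory, and the names to_linear and to_physical are invented for this example.

```python
PAGE_SIZE = 4096  # the 4K page size mentioned above


def to_linear(base: int, offset: int) -> int:
    """Segment translation: base of the selected segment plus the offset."""
    return base + offset


def to_physical(linear: int, page_table=None) -> int:
    """Paging step; page_table=None models paging being disabled."""
    if page_table is None:
        # Without paging the linear address is the physical address.
        return linear
    page, within = divmod(linear, PAGE_SIZE)
    # A missing entry raises KeyError, standing in for a page fault.
    frame = page_table[page]
    return frame * PAGE_SIZE + within


if __name__ == "__main__":
    linear = to_linear(base=0x1000, offset=0x234)
    print(hex(to_physical(linear)))
    print(hex(to_physical(linear, page_table={1: 5})))
```

The offset within the page (the low 12 bits) passes through unchanged; only the page number is replaced by a frame number.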

Side notes

  • Segmentation can be set up so that it effectively does nothing.

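  • For example, if every descriptor has base address 0x00000000 and size 0xFFFFFFFF (assuming a 32-bit machine), then the bounds check and the $base + offset$ addition in segment translation are effectively no-ops, so the result looks just like a flat address space.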

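  • During the xv6 boot process, neither the paging hardware nor the segmentation hardware changes any address: paging is not enabled, and segmentation is configured as an identity mapping.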

    The boot loader does not enable the paging hardware; the logical addresses that it uses are translated to linear addresses by the segmentation hardware, and then used directly as physical addresses. Xv6 configures the segmentation hardware to translate logical to linear addresses without change, so that they are always equal.

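  • A logical segment:offset address can yield a 21-bit physical address (0xffff0 + 0xffff = 0x10ffef), but the Intel 8088 simply discards the 21st bit. So although later Intel CPUs can drive the 21st address bit, IBM kept a backward-compatibility mechanism that forces it off by default (see the A20 line article on Wikipedia).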

    If the second bit of the keyboard controller’s output port is low, the 21st physical address bit is always cleared; if high, the 21st bit acts normally. The boot loader must enable the 21st address bit using I/O to the keyboard controller on ports 0x64 and 0x60
    The traditional method for A20 line enabling is to directly probe the keyboard controller. The reason for this is that Intel’s 8042 keyboard controller had a spare pin which they decided to route the A20 line through. This seems foolish now given their unrelated nature, but at the time computers weren’t quite so standardized. Keyboard controllers are usually derivatives of the 8042 chip. By programming that chip accurately, you can either enable or disable bit #20 on the address bus.
    When your PC boots, the A20 gate is always disabled, but some BIOSes do enable it for you, as do some high-memory managers (HIMEM.SYS) or bootloaders (GRUB).
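
Both side notes can be checked numerically. The Python sketch below is only a model under assumptions: segment_translate follows the offset >= size rule from the notes, real_mode_physical models a disabled A20 gate as masking the address to 20 bits, and both function names are made up for illustration. It does not touch the keyboard controller.

```python
def segment_translate(base: int, size: int, offset: int) -> int:
    """Protected-mode segment translation with the bounds check."""
    if offset >= size:
        raise ValueError("offset outside segment")
    return base + offset


def real_mode_physical(selector: int, offset: int, a20_enabled: bool) -> int:
    """Real-mode address, with the 21st bit cleared when A20 is disabled."""
    address = selector * 16 + offset
    if not a20_enabled:
        address &= 0xFFFFF  # keep only address bits 0..19
    return address


if __name__ == "__main__":
    # Flat segment: base 0 and the largest size leave the offset unchanged.
    print(hex(segment_translate(0x00000000, 0xFFFFFFFF, 0x7C00)))
    # Highest real-mode address, with and without the A20 gate.
    print(hex(real_mode_physical(0xFFFF, 0xFFFF, a20_enabled=True)))
    print(hex(real_mode_physical(0xFFFF, 0xFFFF, a20_enabled=False)))
```

With the gate disabled, addresses at and above 1 MB wrap around to the bottom of memory, which is the 8088 behavior that older software relied on.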