Address translation and sharing using page tables
Reading: 80386 chapters 5 and 6
Handout: x86 address translation diagram (PDF, SVG)
Why do we care about x86 address translation?
- It can simplify s/w structure: addresses in one process not constrained
by what other processes might be running.
- It can implement tricks like demand paging and copy-on-write.
- It can isolate programs to contain bugs or increase security.
- It can provide efficient sharing between processes.
Why aren't protected-mode segments enough?
- Why did the 386 add translation using page tables as well?
- Isn't it enough to give each process its own segments?
- Programming model, fragmentation
- In practice, segments are little-used
Translation using page tables (on x86):
- segmentation hardware first computes the linear address
- in practice, most segments (e.g. in pintos, Linux) have
base 0 and max limit,
making the segmentation step a no-op.
- paging hardware then maps linear address (la) to physical address (pa)
- (we will often interchange "linear" and "virtual")
- when paging is enabled, every instruction that accesses memory is subject
to translation by paging
- paging idea: break up memory into 4096-byte chunks called pages
- independently control mapping for each page of linear address space
- compare with segmentation (single base + limit): many more degrees of freedom
- 4096-byte pages means there are 2^20 = 1,048,576 pages in 2^32 bytes
- conceptual model: array of 2^20 entries, called a page table,
specifying the mapping for each linear page number
- table[20-bit linear page #] => 20-bit phys page #
- PTE entries: bottom of handout
- 20-bit phys page number, present, read/write, user/supervisor, etc
- puzzle: can supervisor read/write user pages?
- can use paging hardware for many purposes
- (seen some of this two lectures ago)
- flat memory
- segment-like protection: contiguous mappings
- solve fragmentation problems when allocating more memory (xv6-like process memory layout)
- demand-paging (%cr2 stores faulting address)
- copy-on-write
- sharing, direct access to devices (e.g. /dev/fb on linux)
- switching between processes
- where is this table stored? back in memory.
- in our conceptual model, CPU holds the physical address of the
base of this table.
- %cr3 serves this purpose on the x86 (with one more detail below)
- for each memory access, access memory again to look up in table
- why not just have a big array with each page #'s translation?
- same problems that we were trying to solve with paging!
(demand-paging, fragmentation)
- so, apply the same trick
- we broke up our 2^32-byte memory into 4096-byte chunks and
represented them in a 2^22-byte (2^20-entry) table
- now break up the 2^22-byte table into 4096-byte chunks too,
and represent them in another 2^12-byte (2^10-entry) table
- just another level of indirection
- now all data structures are page-sized
- 386 uses 2-level mapping structure
- one page directory page, with 1024 page directory entries (PDEs)
- up to 1024 page table pages, each with 1024 page table entries (PTEs)
- so la has 10 bits of directory index, 10 bits table index, 12 bits offset
- %cr3 register holds physical address of current page directory
- puzzle: what do PDE read/write and user/supervisor flags mean?
- now, access memory twice more for every memory access: really expensive!
- optimization: CPU's TLB caches vpn => ppn mappings
- if you change any part of the page table, you must flush the TLB!
- by re-loading %cr3 (flushes everything)
- by executing invlpg va (flushes only the entry for that page)
Is TLB write through? Is it write back? If not, what is it?
- turn on paging by setting CR0_PG bit of %cr0
- Here's how the MMU translates an la to a pa:
uint
translate (uint la, bool user, bool write)
{
  uint pde, pte;
  pde = read_mem (%CR3 + 4*(la >> 22));
  access (pde, user, write);
  pte = read_mem ((pde & 0xfffff000) + 4*((la >> 12) & 0x3ff));
  access (pte, user, write);
  return (pte & 0xfffff000) + (la & 0xfff);
}
// check protection. pxe is a pte or pde.
// user is true if CPL==3
void
access (uint pxe, bool user, bool write)
{
  if (!(pxe & PG_P))
    => page fault -- page not present
  if (!(pxe & PG_U) && user)
    => page fault -- no access for user
  if (write && !(pxe & PG_W)) {
    if (user)
      => page fault -- not writable
    if (%CR0 & CR0_WP)
      => page fault -- not writable
  }
}
Can we use paging to limit what memory an app can read/write?
- user can't modify cr3 (requires privilege)
- is that enough?
- could user modify page tables? after all, they are in memory.
Who stores what?
- cr3: physical address
- GDT descriptor: linear address
- IDT descriptor: virtual address
Pintos:
- Kernel mapped in top 1GB of every process address space
- Kernel pages have PG_U bit 0
- Dealing with user pointers -- three options:
- Convert user pointer to kernel pointer (in kernel's address space) by walking the page table in software. Then dereference kernel pointer
- Check user pointer by walking the page table in software. Then dereference user pointer
- Dereference user pointers only at select code locations (e.g.,
copy_from_user, copy_to_user). A bad pointer will cause a page fault;
the page fault handler can check eip to see whether the fault came from
one of the copy_xx_user functions. If so, the handler can kill the
process and clean up its state.
- When will two virtual addresses in the same page table point to a common physical address?
- When will two virtual addresses in two different page tables point to a common physical address?
Page tables vs. Segmentation
- Good: Pages are easy to allocate (keep a list of available pages
and just allocate the first available).
- Good: Pages are easy to swap, since every unit is the same size
and pages are usually the same size as disk blocks.
- Bad: Page tables can become very large (need one entry for each
page-sized unit of virtual memory).