L12

Process Address Spaces Using Paging

Paging allows non-contiguous regions to be mapped in Virtual Address (VA) space and Physical Address (PA) space.
To map a region of VA space, the page directory and page table entries at the corresponding offsets must be filled in and marked present.
To map a region of PA space, the values in the page table entries should point to the corresponding physical pages.
When paging is enabled, all addresses (including EIP, ESP, direct-addressing, etc.) go through the paging hardware.
Each entry in the Page Directory (PDE, page directory entry) and in the Page Table (PTE, page table entry) is 32 bits wide. The top 20 bits store a pointer (a physical address) to the page table or to the page itself. The low 12 bits contain flags, including present, writable, and user-accessible.
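The entry layout described above can be illustrated with a few lines of Python. This is only a sketch: the helper names make_entry and decode_entry are invented for this example, and only the three flags named in the text are modeled (present in bit 0, writable in bit 1, user-accessible in bit 2).

```python
# Flag bits in the low 12 bits of a 32-bit x86 PDE/PTE.
PTE_P = 0x001  # present
PTE_W = 0x002  # writable
PTE_U = 0x004  # user-accessible

def make_entry(phys_base, flags):
    """Build an entry from a page-aligned physical address and flag bits.
    (Hypothetical helper, named for this example only.)"""
    if phys_base & 0xFFF:
        raise ValueError("physical base must be 4KB-aligned")
    return (phys_base & 0xFFFFF000) | (flags & 0xFFF)

def decode_entry(entry):
    """Split an entry into the physical base (top 20 bits) and its flags."""
    return {
        "phys_base": entry & 0xFFFFF000,
        "present": bool(entry & PTE_P),
        "writable": bool(entry & PTE_W),
        "user": bool(entry & PTE_U),
    }

entry = make_entry(0x00345000, PTE_P | PTE_W)
print(hex(entry))           # 0x345003
print(decode_entry(entry))
```

Because pages are 4KB-aligned, the low 12 bits of any page address are always zero, which is why the hardware can reuse them for flags.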
The final access permissions are determined by looking at the stricter of the permissions specified by the flags in the corresponding PDE and PTE.
CR3, PDEs, and PTEs all contain physical addresses.
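To make the two-level lookup concrete, here is a minimal Python model of the translation, assuming 4KB pages and a 10/10/12 split of the virtual address. Physical memory is modeled as a dictionary from physical word addresses to 32-bit values. The function name translate and the sample memory contents are made up for this sketch, and it applies the writable flag to every write, which simplifies what real hardware does for privileged accesses.

```python
PTE_P, PTE_W, PTE_U = 0x001, 0x002, 0x004
PAGE_MASK = 0xFFFFF000

def translate(phys_mem, cr3, va, user=False, write=False):
    """Walk a two-level page table held in phys_mem (dict: PA -> 32-bit word)."""
    pd_index = (va >> 22) & 0x3FF   # top 10 bits select the PDE
    pt_index = (va >> 12) & 0x3FF   # next 10 bits select the PTE
    offset = va & 0xFFF             # low 12 bits are the offset in the page

    pde = phys_mem.get((cr3 & PAGE_MASK) + 4 * pd_index, 0)
    if not pde & PTE_P:
        raise LookupError("page fault: PDE not present")
    pte = phys_mem.get((pde & PAGE_MASK) + 4 * pt_index, 0)
    if not pte & PTE_P:
        raise LookupError("page fault: PTE not present")

    # The stricter of the two levels wins: a permission holds only if
    # both the PDE and the PTE grant it.
    flags = pde & pte & 0xFFF
    if user and not flags & PTE_U:
        raise PermissionError("page fault: not user-accessible")
    if write and not flags & PTE_W:
        raise PermissionError("page fault: not writable")
    return (pte & PAGE_MASK) | offset

# Sample layout: page directory at PA 0x1000, one page table at PA 0x2000,
# and a read-only user page at PA 0x5000.
phys_mem = {
    0x1000 + 4 * 1: 0x2000 | PTE_P | PTE_W | PTE_U,  # PDE 1 -> page table
    0x2000 + 4 * 3: 0x5000 | PTE_P | PTE_U,          # PTE 3 -> page, no W
}
print(hex(translate(phys_mem, 0x1000, 0x00403ABC, user=True)))  # 0x5abc
```

In this layout a user write to the same address raises PermissionError, because the PTE withholds the writable flag even though the PDE grants it.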
Each process has a separate page table, so a different address is loaded into the CR3 register on every context switch.
Each page table also maps the kernel into its address space above a certain address. This address is 0x80000000 (2GB) on xv6.
Thus the kernel is mapped at the same addresses in every page table. So, copies of the same PDEs/PTEs are present in the page directories and page tables of all processes to map the kernel at these addresses. Let's call this region of the address space the kernel's address space. Another way to put this is that each process has two halves: a kernel half and a user half. The kernel half is shared among all processes. The user half is separate. Thus every user process (separate address spaces) also acts as a kernel thread (shared address space).
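The two-halves picture can be sketched with a deliberately flattened model, in which each page table is a single dictionary from virtual page number to physical frame number instead of a real two-level structure. All names here (kernel_mappings, new_address_space, translate) are invented for the illustration, and the 16-frame machine is an arbitrary assumption.

```python
KERNBASE = 0x80000000
PAGE_SIZE = 4096

def kernel_mappings(phys_pages):
    """Kernel half: virtual page (KERNBASE / 4096 + n) maps to frame n."""
    base_vpn = KERNBASE // PAGE_SIZE
    return {base_vpn + n: n for n in range(phys_pages)}

def new_address_space(user_mappings, phys_pages=16):
    """Per-process table: a copy of the kernel half plus a private user half."""
    table = dict(kernel_mappings(phys_pages))
    table.update(user_mappings)
    return table

def translate(table, va):
    frame = table[va // PAGE_SIZE]  # a KeyError here models a page fault
    return frame * PAGE_SIZE + va % PAGE_SIZE

proc_a = new_address_space({0: 5})  # user page 0 -> frame 5
proc_b = new_address_space({0: 9})  # user page 0 -> frame 9

# Picking a table plays the role of loading CR3 on a context switch.
for name, table in (("A", proc_a), ("B", proc_b)):
    print(name, hex(translate(table, 0x10)),
          hex(translate(table, KERNBASE + 0x3010)))
# A 0x5010 0x3010
# B 0x9010 0x3010
```

The same user address resolves to different physical pages in the two processes, while the kernel address resolves identically in both.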
On xv6, the kernel maps the entire physical memory into its address space starting at 0x80000000. Thus, virtual address (0x80000000+x) always translates to physical address x.
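Since the kernel mapping is a fixed offset, converting between a kernel virtual address and its physical address is a single addition or subtraction. A small sketch, with the helper names v2p and p2v chosen here for illustration:

```python
KERNBASE = 0x80000000   # start of the kernel's address space on xv6
ADDR_LIMIT = 1 << 32    # size of a 32-bit address space

def v2p(va):
    """Kernel virtual address -> physical address."""
    if not KERNBASE <= va < ADDR_LIMIT:
        raise ValueError("not a kernel virtual address")
    return va - KERNBASE

def p2v(pa):
    """Physical address -> kernel virtual address."""
    if not 0 <= pa < ADDR_LIMIT - KERNBASE:
        raise ValueError("physical address not covered by the kernel mapping")
    return pa + KERNBASE

print(hex(v2p(0x80100000)))  # 0x100000
print(hex(p2v(0x00100000)))  # 0x80100000
```

The range check in p2v reflects the constraint that only physical addresses below 2GB can fit in the window between 0x80000000 and the top of the 32-bit address space.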
The kernel's address space holds the kernel's code and the kernel's data. All the other space is managed as a heap (using malloc and free for example).
The kernel's heap is used to allocate memory for its own data structures (e.g., PCBs), processes' kernel stacks, processes' address spaces, among other things.
For example, if a kernel wants to allocate an address space for a process, it simply mallocs some space from its heap, converts the returned pointer to its physical address, and then creates a mapping into the process's user-side address space (by creating entries in its page table) so the process can access it in future.
Thus every page that is accessible by the user also has a mapping in the kernel's address space. In other words, two entries in the page table point to the same physical page (one in the kernel-space, and another in the user-space).
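The allocation path described above (take a page from the kernel heap, convert the pointer to a physical address, install a user mapping) might be modeled as follows. This is a sketch under simplifying assumptions: the page table is a flat dictionary from virtual page number to frame number, and Kernel, alloc_page, and map_user_page are hypothetical names, not xv6 functions.

```python
KERNBASE = 0x80000000
PAGE_SIZE = 4096

class Kernel:
    def __init__(self, free_frames):
        # Free physical frames, tracked by their kernel virtual addresses.
        self.free_list = [KERNBASE + f * PAGE_SIZE for f in free_frames]

    def alloc_page(self):
        """Hand out one free page as a kernel virtual address."""
        return self.free_list.pop()

def map_user_page(kernel, user_table, user_va):
    """Allocate a page and map it at user_va in the process's user half."""
    if user_va % PAGE_SIZE or user_va >= KERNBASE:
        raise ValueError("user_va must be page-aligned and below KERNBASE")
    kva = kernel.alloc_page()  # pointer returned by the kernel's heap
    pa = kva - KERNBASE        # convert it to its physical address
    user_table[user_va // PAGE_SIZE] = pa // PAGE_SIZE
    return kva

kernel = Kernel(free_frames=[7, 8])
user_table = {}
kva = map_user_page(kernel, user_table, 0x4000)

# The same physical page is now reachable through two virtual addresses.
print(hex(kva), "->", hex(kva - KERNBASE))                     # kernel side
print(hex(0x4000), "->", hex(user_table[0x4000 // PAGE_SIZE] * PAGE_SIZE))
```

Both lines print the same physical address, once reached through the kernel mapping and once through the new user mapping.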
The Interrupt Descriptor Table (IDT) is set up to hold kernel pointers, i.e., the CS:EIP entries are set up such that the handler runs in privileged mode (last two bits of CS are zero), and EIP points to a kernel address (above 0x80000000).
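The two conditions on an IDT entry can be checked with simple bit tests. In this sketch the selector values 0x08 and 0x1b and the handler addresses are illustrative examples only, and the function names are invented for the illustration.

```python
KERNBASE = 0x80000000

def privilege_level(selector):
    """The low two bits of a segment selector give its privilege level."""
    return selector & 0x3

def is_valid_handler(cs, eip):
    """True if CS selects privileged mode and EIP is a kernel address."""
    return privilege_level(cs) == 0 and eip >= KERNBASE

print(privilege_level(0x08))               # 0: privileged
print(privilege_level(0x1B))               # 3: unprivileged
print(is_valid_handler(0x08, 0x80105000))  # True
print(is_valid_handler(0x1B, 0x00001000))  # False
```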
On a trap (due to an interrupt, exception, or system call), execution control transfers to the kernel's handler running in privileged mode. Because the kernel is mapped in every process's address space, the pointers to the handler (and the kernel stack of the process) in the kernel address space are always valid. Thus, the handler can start executing immediately on a trap, without requiring any address-space switch.
The kernel's handler executes the necessary logic. This sharing of address space between the process user-space and the kernel also allows very fast communication between the kernel and the user spaces. For example, if the user wants to provide a string argument to a system call, it can simply pass a pointer to the string stored in its own address space. The kernel can de-reference that pointer, and because the user's address space is still mapped, the de-referencing of the pointer will result in the desired data (as supplied by the user). Thus communication from user-space to kernel-space can be done by sending only a pointer, which is very fast. Recall that the kernel can always access user pages (the user bit in PDE/PTE only prevents the user from accessing a kernel page).
This fast communication between the user and the kernel is the primary reason why the kernel maps itself entirely in the process page table. Linux maps itself starting at 0xc0000000 (3GB), and Windows maps itself starting at 0x80000000 (2GB) but can be configured to map itself starting at 0xc0000000. As already discussed, xv6 maps itself starting at 0x80000000.
Because xv6 maps the entire available physical memory in the kernel space, there is a limit to the size of physical memory that it can support (less than 2GB). Full operating systems (like Linux/Windows) handle this by keeping only a part of the physical memory mapped at all times (especially the parts which contain the kernel's code and data). The kernel maps and unmaps the other parts of physical memory into its VA space on demand, depending on what needs to be accessed.
The page directory itself is allocated from the kernel address space (per process), and its corresponding physical address is stored in the cr3 register when that process is running. Security is ensured by disallowing a process from accessing its own page table (by mapping the pages containing the page tables and page directory only in the kernel address space and not in the user address space).
Compare this organization to a kernel which only uses segmentation. In that case, a trap would switch the CS register (and the associated base and limit values of the descriptor) and thus the kernel would be executing in a different address space. All other segment registers will also be loaded with the kernel's base, so they can access the kernel's data. However, if the kernel wants to read an argument from the user-space (passed as a pointer), it needs to switch one of its segment registers to the user's descriptor before de-referencing that pointer through that segment register.
Notice that the kernel cannot simply de-reference a pointer supplied by the user in this case, as the address space is now different - the kernel needs to switch to user address space to de-reference that pointer.
Compare this organization to another paging-based organization of the kernel, where the entire kernel is not mapped into the process address space, but only a small slice of the kernel address space (which stores the trap handler and the kernel stack) is mapped in the process address space. In this case, the trap handler will switch to the kernel's page table and then execute the kernel's logic in a different address space (by switching to a different page table). However, if the kernel needs to de-reference a user pointer, it cannot do so. In this case, the trap handler should also de-reference all user pointers and copy them into its space before switching to the kernel's address space.
Because transitions between user and kernel require page table switches, and because the kernel cannot access user memory directly in this organization (the trap handler needs to copy portions of user memory for the kernel to read), this organization is relatively more complex and less performant.

Bootup: Executing the first instruction after power-on

Switching on your computer makes it start from a clean state, where memory contents are completely uninitialized. Only the disk contains state that persists across power cycles. The x86 PC platform specifies that when a computer is powered on, the first block (or sector) on disk will be read and its contents pasted at address 0x7c00, and control will be transferred to the instruction at its first byte (address 0x7c00). Recall that the x86 architecture boots in 16-bit mode, with no paging. Also, the segmentation hardware in 16-bit mode simply multiplies the segment register's value by 16 and adds it to the 16-bit address used by the instruction to obtain a physical address.
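The 16-bit segmentation rule is easy to check by hand or with a short calculation. In this sketch the function name real_mode_physical is invented for the example; it simply applies the multiply-by-16-and-add rule from the paragraph above.

```python
def real_mode_physical(segment, offset):
    """16-bit mode: physical address = segment * 16 + offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return segment * 16 + offset

# Two different segment:offset pairs that both name the boot address.
print(hex(real_mode_physical(0x0000, 0x7C00)))  # 0x7c00
print(hex(real_mode_physical(0x07C0, 0x0000)))  # 0x7c00
```

Many segment:offset pairs refer to the same physical address, since each segment value shifts the base by only 16 bytes.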

A disk block (or sector) is sized at 512 bytes. Thus the machine reads the 512 bytes in the first block, pastes them at addresses 0x7c00 through 0x7dff (ending just before 0x7e00), and transfers control to 0x7c00. This code, which must fit in 512 bytes, then loads the kernel from the disk (it must know the location of the kernel on disk) and pastes it into memory (it must know the location in memory where it needs to be pasted), before transferring control to it.