PC Architecture and Processor Setup

Outline

PC architecture
x86 instruction set
GCC calling conventions

PC architecture

A full PC has:
- one or more x86 CPUs, each containing:
  - integer registers (can you name them?) and execution unit
  - floating-point/vector registers and execution unit(s)
  - memory management unit (MMU)
  - multiprocessor/multicore: local interrupt controller (APIC)
- memory
- disk (IDE, SCSI, USB)
- keyboard
- display
- other resources: BIOS ROM, clock, ...
We will start with the original 16-bit 8086 CPU (1978)
CPU runs instructions:
```
for(;;){
	run next instruction
}
```
Draw figure with common bus, I/O, and CPU. The CPU has registers, cache, etc.
Draw figure showing EIP and how it gets incremented automatically after executing each instruction.
Needs work space: registers
- four 16-bit data registers: AX, BX, CX, DX
- each in two 8-bit halves, e.g. AH and AL
- very fast, very few
More work space: memory
- CPU sends out address on address lines (wires, one bit per wire)
- Data comes back on data lines
- or data is written to data lines
Add address registers: pointers into memory
- SP - stack pointer
- BP - frame base pointer
- SI - source index
- DI - destination index
Instructions are in memory too!
- IP - instruction pointer (PC on PDP-11, everything else)
- increment after running each instruction
- can be modified by CALL, RET, JMP, conditional jumps
Want conditional jumps
- FLAGS - various condition codes
  - whether last arithmetic operation overflowed
  - ... was positive/negative
  - ... was [not] zero
  - ... carry/borrow on add/subtract
  - ... etc.
  - whether interrupts are enabled
  - direction of data copy instructions
- J[N]S, J[N]Z, J[N]C, J[N]O ...
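
To make the condition codes concrete, here is a minimal sketch of how a 16-bit ADD could set four of the flags listed above. The names `struct flags` and `add16` are hypothetical, chosen for illustration; the real FLAGS register packs these bits (and others) into one word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of four condition codes, one int per flag. */
struct flags { int cf, zf, sf, of; };

/* 16-bit ADD in the style of the 8086: returns the result and
 * records carry, zero, sign, and overflow in *f. */
uint16_t
add16(uint16_t a, uint16_t b, struct flags *f)
{
  uint32_t wide = (uint32_t)a + b;   /* keep the carry out of bit 15 */
  uint16_t r = (uint16_t)wide;

  assert(f != 0);
  f->cf = wide > 0xFFFF;                       /* unsigned carry */
  f->zf = r == 0;                              /* result is zero */
  f->sf = (r & 0x8000) != 0;                   /* result is negative */
  f->of = ((a ^ r) & (b ^ r) & 0x8000) != 0;   /* signed overflow */
  return r;
}
```

A conditional jump then just tests one of these bits: JZ jumps when `zf` is set, JNC when `cf` is clear, and so on.
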
What if we want to use more than 2^16 bytes of memory?
- 8086 has 20-bit physical addresses, can have 1 Meg RAM
- the extra four bits usually come from a 16-bit "segment register":
- CS - code segment, for fetches via IP
- SS - stack segment, for load/store via SP and BP
- DS - data segment, for load/store via other registers
- ES - another data segment, destination for string operations
- virtual to physical translation: pa = va + seg*16
- e.g. set CS = 4096 to execute starting at 65536
- tricky: can't use the 16-bit address of a stack variable as a pointer
- a far pointer includes full segment:offset (16 + 16 bits)
- tricky: pointer arithmetic and array indexing across segment boundaries
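
The translation rule above can be sketched in C. This is an illustration only, not code from the labs or xv6; `real_mode_pa` and `same_byte` are hypothetical names. The 20-bit mask models the 8086's 20 address lines, so sums past 1 MB wrap around to low memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 8086 translation, pa = seg*16 + off. */
uint32_t
real_mode_pa(uint16_t seg, uint16_t off)
{
  uint32_t pa = ((uint32_t)seg << 4) + off;

  assert(pa <= 0x10FFEF);   /* largest possible sum: 0xFFFF0 + 0xFFFF */
  return pa & 0xFFFFF;      /* only 20 address lines: wrap at 1 MB */
}

/* Two different seg:off pairs can name the same physical byte,
 * which is one reason far-pointer comparison is tricky. */
int
same_byte(uint16_t seg1, uint16_t off1, uint16_t seg2, uint16_t off2)
{
  return real_mode_pa(seg1, off1) == real_mode_pa(seg2, off2);
}
```

With CS = 4096 and IP = 0, `real_mode_pa(4096, 0)` gives 65536, matching the example above.
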
But 8086's 16-bit addresses and data were still painfully small, so 80386 added support for 32-bit data and addresses (1985)
- boots in 16-bit mode, bootasm.S switches to 32-bit mode
- registers are 32 bits wide, called EAX rather than AX
- operands and addresses that were 16-bit became 32-bit in 32-bit mode, e.g. ADD does 32-bit arithmetic
- prefixes 0x66/0x67 toggle between 16-bit and 32-bit operands and addresses: in 32-bit mode, MOVW is expressed as 0x66 MOVW
- the .code32 in bootasm.S tells assembler to generate 0x66 for e.g. MOVW
- 80386 also changed segments and added paged memory...

Example instruction encoding

	b8 cd ab		16-bit CPU,  AX <- 0xabcd
	b8 34 12 cd ab		32-bit CPU, EAX <- 0xabcd1234
	66 b8 cd ab		32-bit CPU,  AX <- 0xabcd
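
The three encodings above can be checked with a small decoder. This is a sketch that handles only opcode 0xb8 with an optional 0x66 prefix; `decode_mov_ax_imm` is a hypothetical name, and a real decoder must handle far more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for the "MOV immediate into AX/EAX" examples.
 * code:   instruction bytes
 * mode32: nonzero if the default operand size is 32 bits
 * imm:    receives the immediate value
 * Returns the operand size in bits (16 or 32), or 0 if the bytes are
 * not an optional 0x66 prefix followed by opcode 0xb8. */
int
decode_mov_ax_imm(const uint8_t *code, int mode32, uint32_t *imm)
{
  size_t i = 0;
  int bits = mode32 ? 32 : 16;

  assert(code != NULL && imm != NULL);
  if(code[i] == 0x66){          /* operand-size prefix toggles 16 <-> 32 */
    bits = (bits == 32) ? 16 : 32;
    i++;
  }
  if(code[i] != 0xb8)
    return 0;
  i++;

  /* immediates are stored little-endian: low byte first */
  *imm = (uint32_t)code[i] | ((uint32_t)code[i+1] << 8);
  if(bits == 32)
    *imm |= ((uint32_t)code[i+2] << 16) | ((uint32_t)code[i+3] << 24);
  return bits;
}
```
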

...and even 32 bits eventually wasn't enough, so AMD added support for 64-bit data and addresses (announced 1999, first shipped 2003)
- registers are 64 bits wide, called RAX, RBX, etc.
- 8 more general-purpose registers: R8 thru R15
- boot: still go thru 16-bit and 32-bit modes on the way!

x86 Instruction Set

Intel syntax: op dst, src (Intel manuals!)
AT&T (gcc/gas) syntax: op src, dst (labs, xv6)
- uses b, w, l suffix on instructions to specify size of operands
Operands are registers, constant, memory via register, memory via constant

Examples:

AT&T syntax	"C"-ish equivalent	addressing mode
movl %eax, %edx	edx = eax;	register mode
movl $0x123, %edx	edx = 0x123;	immediate
movl 0x123, %edx	edx = *(int32_t*)0x123;	direct
movl (%ebx), %edx	edx = *(int32_t*)ebx;	indirect
movl 4(%ebx), %edx	edx = *(int32_t*)(ebx+4);	displaced
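
The operand modes in the table can be tried out against a simulated memory. This is a sketch under stated assumptions: `mem`, `load32`, `store32`, and the `mode_*` functions are hypothetical names, and the byte array stands in for real memory so the code can run as an ordinary program.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MEMSIZE 0x1000
uint8_t mem[MEMSIZE];   /* hypothetical simulated memory */

/* 32-bit load from the simulated memory; memcpy avoids alignment traps */
int32_t
load32(uint32_t addr)
{
  int32_t v;

  assert(addr + sizeof(v) <= MEMSIZE);
  memcpy(&v, &mem[addr], sizeof(v));
  return v;
}

/* 32-bit store to the simulated memory */
void
store32(uint32_t addr, int32_t v)
{
  assert(addr + sizeof(v) <= MEMSIZE);
  memcpy(&mem[addr], &v, sizeof(v));
}

/* each function returns the value that would land in %edx */
int32_t mode_register(int32_t eax)   { return eax; }             /* movl %eax, %edx    */
int32_t mode_immediate(void)         { return 0x123; }           /* movl $0x123, %edx  */
int32_t mode_direct(void)            { return load32(0x123); }   /* movl 0x123, %edx   */
int32_t mode_indirect(uint32_t ebx)  { return load32(ebx); }     /* movl (%ebx), %edx  */
int32_t mode_displaced(uint32_t ebx) { return load32(ebx + 4); } /* movl 4(%ebx), %edx */
```

Note the difference between `$0x123` (the constant itself) and `0x123` (the word stored at that address).
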

Instruction classes
- data movement: MOV, PUSH, POP, ...
- arithmetic: TEST, SHL, ADD, AND, ...
- i/o: IN, OUT, ...
- control: JMP, JZ, JNZ, CALL, RET
- string: REP MOVSB, ...
- system: IRET, INT
Intel architecture manual Volume 2 is the reference

gcc x86 calling conventions

x86 dictates that stack grows down:

Example instruction	What it does
pushl %eax	subl $4, %esp; movl %eax, (%esp)
popl %eax	movl (%esp), %eax; addl $4, %esp
call 0x12345	pushl %eip (*); movl $0x12345, %eip (*)
ret	popl %eip (*)

(*) Not real instructions
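
The table can be acted out on a toy machine. This is a minimal sketch, assuming a hypothetical `struct machine` with only %eip, %esp, and a small stack. The `insn_len` parameter is also an assumption: it stands for the length of the call instruction, so that the pushed %eip is the address of the instruction after the call.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Hypothetical toy machine; esp is a byte offset into stack[]. */
struct machine {
  uint32_t eip;
  uint32_t esp;
  uint32_t stack[STACK_WORDS];
};

void
machine_init(struct machine *m)
{
  m->eip = 0;
  m->esp = STACK_WORDS * 4;   /* empty stack: esp just past the top */
}

void
push(struct machine *m, uint32_t v)
{
  assert(m->esp >= 4);
  m->esp -= 4;                /* subl $4, %esp  */
  m->stack[m->esp / 4] = v;   /* movl v, (%esp) */
}

uint32_t
pop(struct machine *m)
{
  assert(m->esp + 4 <= STACK_WORDS * 4);
  uint32_t v = m->stack[m->esp / 4];   /* movl (%esp), v */
  m->esp += 4;                         /* addl $4, %esp  */
  return v;
}

/* call: push the return address, then jump to target */
void
call(struct machine *m, uint32_t target, uint32_t insn_len)
{
  push(m, m->eip + insn_len);
  m->eip = target;
}

/* ret: pop the return address into eip */
void
ret(struct machine *m)
{
  m->eip = pop(m);
}
```

The stack grows down: every push lowers `esp` by 4 and every pop raises it by 4.
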

GCC dictates how the stack is used. Contract between caller and callee on x86:
- at entry to a function (i.e. just after call):
  - %eip points at first instruction of function
  - %esp+4 points at first argument
  - %esp points at return address
- after ret instruction:
  - %eip contains return address
  - %esp points at arguments pushed by caller
  - called function may have trashed arguments
  - %eax (and %edx, if return type is 64-bit) contains return value (or trash if function is void)
  - %eax, %edx (above), and %ecx may be trashed
  - %ebp, %ebx, %esi, %edi must contain contents from time of call
- Terminology:
  - %eax, %ecx, %edx are "caller save" registers
  - %ebp, %ebx, %esi, %edi are "callee save" registers

Functions can do anything that doesn't violate contract. By convention, GCC does more:

each function has a stack frame marked by %ebp, %esp

		       +------------+   |
		       | arg 2      |   \
		       +------------+    >- previous function's stack frame
		       | arg 1      |   /
		       +------------+   |
		       | ret %eip   |   /
		       +============+   
		       | saved %ebp |   \
		%ebp-> +------------+   |
		       |            |   |
		       |   local    |   \
		       | variables, |    >- current function's stack frame
		       |    etc.    |   /
		       |            |   |
		       |            |   |
		%esp-> +------------+   /

%esp can move to make stack frame bigger, smaller
%ebp points at saved %ebp from previous function, chain to walk stack
function prologue:
```
			pushl %ebp
			movl %esp, %ebp
```
or
```
			enter $0, $0
```
enter usually not used: 4 bytes vs 3 for pushl+movl, not on hardware fast-path anymore
function epilogue can easily find return EIP on stack:
```
			movl %ebp, %esp
			popl %ebp
```
or
```
			leave
```
leave used often because it's 1 byte, vs 3 for movl+popl
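
Because each frame starts with the saved %ebp, a debugger or kernel can walk the chain to recover the call stack. The sketch below does this over a simulated stack; `stack_mem`, `word_at`, `backtrace`, and `arg` are hypothetical names, and ending the chain at a saved %ebp of 0 is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 64
uint32_t stack_mem[NWORDS];   /* hypothetical simulated stack */

/* read the 32-bit word at byte address addr in the simulated stack */
uint32_t
word_at(uint32_t addr)
{
  assert(addr % 4 == 0 && addr / 4 < NWORDS);
  return stack_mem[addr / 4];
}

/* Walk the chain of saved %ebp values starting at ebp.
 * In each frame, (ebp) holds the caller's saved %ebp and 4(ebp)
 * holds the return %eip. Stores return addresses in pcs[] and
 * returns how many were found (at most max). */
int
backtrace(uint32_t ebp, uint32_t pcs[], int max)
{
  int n = 0;

  while(ebp != 0 && n < max){
    pcs[n++] = word_at(ebp + 4);   /* return %eip */
    ebp = word_at(ebp);            /* move to caller's frame */
  }
  return n;
}

/* argument i (counting from 0) sits just above the return address */
uint32_t
arg(uint32_t ebp, int i)
{
  assert(i >= 0);
  return word_at(ebp + 8 + 4 * (uint32_t)i);
}
```

After the prologue, the first argument is at 8(%ebp): 4 bytes for the saved %ebp plus 4 for the return address.
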

Compiling, linking, loading:
- Preprocessor takes C source code (ASCII text), expands #include etc, produces C source code
- Compiler takes C source code (ASCII text), produces assembly language (also ASCII text)
- Assembler takes assembly language (ASCII text), produces .o file (binary, machine-readable!)
- Linker takes multiple '.o's, produces a single program image (binary)
- Loader loads the program image into memory at run-time and starts it executing

x86 Physical Memory Map

The physical address space mostly looks like ordinary RAM
Except some low-memory addresses actually refer to other things
Writes to VGA memory appear on the screen
Reset or power-on jumps to ROM at 0x000ffff0 (so there must be ROM at the top of the BIOS region)

+------------------+  <- 0xFFFFFFFF (4GB)
|      32-bit      |
|  memory mapped   |
|     devices      |
|                  |
/\/\/\/\/\/\/\/\/\/\

/\/\/\/\/\/\/\/\/\/\
|                  |
|      Unused      |
|                  |
+------------------+  <- depends on amount of RAM
|                  |
|                  |
| Extended Memory  |
|                  |
|                  |
+------------------+  <- 0x00100000 (1MB)
|     BIOS ROM     |
+------------------+  <- 0x000F0000 (960KB)
|  16-bit devices, |
|  expansion ROMs  |
+------------------+  <- 0x000C0000 (768KB)
|   VGA Display    |
+------------------+  <- 0x000A0000 (640KB)
|                  |
|    Low Memory    |
|                  |
+------------------+  <- 0x00000000
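
The boundaries in the map can be written down as a lookup. This is a sketch only; `pa_region` is a hypothetical name, and `ram_top` (the first physical address past the end of RAM) is a parameter because it depends on the machine.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical classifier for the physical memory map above. */
const char *
pa_region(uint32_t pa, uint32_t ram_top)
{
  assert(ram_top >= 0x00100000);   /* assume at least 1MB of RAM */
  if(pa < 0x000A0000) return "low memory";
  if(pa < 0x000C0000) return "VGA display";
  if(pa < 0x000F0000) return "16-bit devices, expansion ROMs";
  if(pa < 0x00100000) return "BIOS ROM";
  if(pa < ram_top)    return "extended memory";
  return "unused or 32-bit memory-mapped devices";
}
```

For example, the reset address 0x000ffff0 falls in the BIOS ROM region.
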

I/O

Original PC architecture: use dedicated I/O space

Works same as memory accesses but set I/O signal
Only 1024 I/O addresses
Accessed with special instructions (IN, OUT)

Example: write a byte to line printer:

#define DATA_PORT    0x378
#define STATUS_PORT  0x379
#define   BUSY 0x80
#define CONTROL_PORT 0x37A
#define   STROBE 0x01
void
lpt_putc(int c)
{
  /* wait for printer to consume previous byte */
  while((inb(STATUS_PORT) & BUSY) == 0)
    ;

  /* put the byte on the parallel lines */
  outb(DATA_PORT, c);

  /* tell the printer to look at the data */
  outb(CONTROL_PORT, STROBE);
  outb(CONTROL_PORT, 0);
}

Memory-Mapped I/O
- Use normal physical memory addresses
  - Gets around limited size of I/O address space
  - No need for special instructions
  - System controller routes to appropriate device
- Works like "magic" memory:
  - Addressed and accessed like memory, but ...
  - ... does not behave like memory!
  - Reads and writes can have "side effects"
  - Read results can change due to external events

Example hardware for address spaces: x86 segments

The operating system can switch the x86 to protected mode, which supports virtual and physical addresses, and allows the O/S to set up address spaces so that user processes can't change them. Translation in protected mode is as follows:

selector:offset (virtual / logical addr)
==SEGMENTATION==>
linear address
==PAGING ==>
physical address

Next lecture covers paging; now we focus on segmentation.

Protected-mode segmentation works as follows (see handout):

segment register holds segment selector
selector: 13 bits of index, local vs global flag, 2-bit RPL
selector indexes into global descriptor table (GDT)
segment descriptor holds 32-bit base, limit, type, protection
la = va + base ; assert(va < limit);
choice of seg register usually implicit in instruction
- ESP uses SS, EIP uses CS, others (mostly) use DS
- some instructions can take far addresses:
  - ljmp $selector, $offset
GDT lives in memory, CPU's GDTR register points to base of GDT
LGDT instruction loads GDTR
you turn on protected mode by setting PE bit in CR0 register
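
The selector fields and the translation rule can be sketched as follows. This follows the simplified model in these notes (base plus a `va < limit` check) and ignores descriptor type, protection bits, and the local table. `struct segdesc` here is a hypothetical two-field stand-in, not the real 8-byte descriptor layout.

```c
#include <assert.h>
#include <stdint.h>

/* simplified descriptor: only base and limit */
struct segdesc {
  uint32_t base;
  uint32_t limit;
};

/* selector layout: bits 15..3 index, bit 2 local/global, bits 1..0 RPL */
unsigned sel_index(uint16_t sel) { return sel >> 3; }
unsigned sel_local(uint16_t sel) { return (sel >> 2) & 1; }
unsigned sel_rpl(uint16_t sel)   { return sel & 3; }

/* la = va + base, with the limit check from the notes.
 * Returns 0 and stores the linear address in *la on success,
 * -1 if the access is not allowed (real hardware raises an exception). */
int
seg_translate(const struct segdesc *gdt, unsigned ngdt, uint16_t sel,
              uint32_t va, uint32_t *la)
{
  unsigned i = sel_index(sel);

  assert(gdt != 0 && la != 0);
  if(sel_local(sel) || i >= ngdt)
    return -1;                 /* this sketch models only the GDT */
  if(va >= gdt[i].limit)
    return -1;                 /* not after limit */
  *la = gdt[i].base + va;
  return 0;
}
```

For example, selector 0x08 has index 1, the global flag, and RPL 0; selector 0x1B has index 3 and RPL 3.
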
What about protection?
- instructions can only r/w/x memory reachable through seg regs
- not before base, not after limit
- can my program change a segment register? yes, but only to one of the permitted (accessible) descriptors in the GDT
- can my program re-load GDTR? no!
- how does h/w know if user or kernel?
- Current privilege level (CPL) is in the low 2 bits of CS
- CPL=0 is privileged O/S, CPL=3 is user
- why can't app modify the descriptors in the GDT? it's in memory...
- what about system calls? how do they transfer to kernel?
- app cannot just lower the CPL

Traps

The x86 processor uses a table known as the interrupt descriptor table (IDT) to determine how to transfer control when a trap occurs. The x86 allows up to 256 different interrupt or exception entry points into the kernel, each with a different interrupt vector. A vector is a number between 0 and 255. An interrupt's vector is determined by the source of the interrupt: different devices, error conditions, and application requests to the kernel generate interrupts with different vectors. The CPU uses the vector as an index into the processor's IDT, which the kernel sets up in kernel-private memory of the kernel's choosing, much like the GDT. From the appropriate entry in this table the processor loads:

the value to load into the instruction pointer (EIP) register, pointing to the kernel code designated to handle that type of exception.
the value to load into the code segment (CS) register, which includes in bits 0-1 the privilege level at which the exception handler is to run. In PIOS, all exceptions are handled in kernel mode, privilege level 0.
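
The lookup can be sketched with a simplified table. `struct gate` below holds only the two values named above; real IDT entries are 8-byte descriptors with additional fields, which this sketch leaves out. The names `idt`, `set_gate`, `lookup_gate`, and `gate_level` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NVECTORS 256

/* simplified gate: only the two values loaded on a trap */
struct gate {
  uint16_t cs;    /* code segment selector; bits 0-1 give the privilege level */
  uint32_t eip;   /* handler entry point */
};

struct gate idt[NVECTORS];   /* hypothetical in-memory table */

/* kernel side: install a handler for one vector */
void
set_gate(int vector, uint16_t cs, uint32_t eip)
{
  assert(vector >= 0 && vector < NVECTORS);
  idt[vector].cs = cs;
  idt[vector].eip = eip;
}

/* processor side: use the vector as an index into the table */
struct gate
lookup_gate(int vector)
{
  assert(vector >= 0 && vector < NVECTORS);
  return idt[vector];
}

/* privilege level at which the handler will run */
unsigned
gate_level(struct gate g)
{
  return g.cs & 3;
}
```
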

Entering and Returning from Trap Handlers

When an x86 processor takes a trap while in kernel mode, it first pushes a trap frame onto the kernel stack, to save the old values of certain registers before the trap handling mechanism modifies them. The processor then looks up the CS and EIP of the trap handler in the IDT, and transfers control to that instruction address. The following diagram illustrates the format of the basic kernel trap frame, defining the state of the kernel stack on entry to the trap handler:

                     +--------------------+ <---- old ESP
                     |     old EFLAGS     |     " - 4
                     | 0x00000 | old CS   |     " - 8
                     |      old EIP       |     " - 12
                     +--------------------+ <---- ESP

For certain types of x86 exceptions, in addition to the basic three 32-bit words above, the processor pushes onto the stack another word containing an error code. The page fault exception, number 14, is an important example. See the x86 manuals to determine for which exception numbers the processor pushes an error code, and what the error code means in that case. When the processor pushes an error code, the stack would look as follows at the beginning of the trap handler:

                     +--------------------+ <---- old ESP
                     |     old EFLAGS     |     " - 4
                     | 0x00000 | old CS   |     " - 8
                     |      old EIP       |     " - 12
                     |     error code     |     " - 16
                     +--------------------+ <---- ESP

The x86 processor provides a special instruction, iret, to return from trap handlers. It expects the kernel's stack to look like the first figure above, with ESP pointing to the old EIP. When the processor executes an iret instruction, it pops the saved values of EIP, CS, and EFLAGS off the stack and back into the corresponding registers, and resumes instruction execution at the popped EIP.

Note that when returning from a trap, the processor doesn't actually know or care whether the "old" values it is popping off the stack are really the exact same values that it originally pushed onto the stack on entry to the trap handler. Think about what would happen - for better or worse - if the kernel trap handler changes these values during its execution.
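
The push and pop sequence for a trap taken in kernel mode can be simulated in a few lines. This is a sketch, assuming a hypothetical `struct cpu` with a small word-indexed stack; it models the basic three-word trap frame only (no error code, no privilege change).

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 64

/* Hypothetical CPU state; esp is a byte offset into mem[]. */
struct cpu {
  uint32_t eip, cs, eflags, esp;
  uint32_t mem[NWORDS];
};

void
push32(struct cpu *c, uint32_t v)
{
  assert(c->esp >= 4 && c->esp % 4 == 0 && c->esp <= NWORDS * 4);
  c->esp -= 4;
  c->mem[c->esp / 4] = v;
}

uint32_t
pop32(struct cpu *c)
{
  assert(c->esp % 4 == 0 && c->esp + 4 <= NWORDS * 4);
  uint32_t v = c->mem[c->esp / 4];
  c->esp += 4;
  return v;
}

/* trap in kernel mode: push EFLAGS, CS, EIP, then enter the handler */
void
trap_enter(struct cpu *c, uint16_t handler_cs, uint32_t handler_eip)
{
  push32(c, c->eflags);
  push32(c, c->cs);      /* upper 16 bits of the pushed word are zero */
  push32(c, c->eip);
  c->cs = handler_cs;
  c->eip = handler_eip;
}

/* iret: pop EIP, CS, EFLAGS, trusting whatever is on the stack */
void
iret(struct cpu *c)
{
  c->eip = pop32(c);
  c->cs = pop32(c) & 0xFFFF;
  c->eflags = pop32(c);
}
```

If the handler overwrites the saved EIP word before calling `iret`, execution resumes at the new address. Here `iret` has no way to tell the difference, which is the point of the note above.
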

The Task State Segment. The processor needs a place to save the old processor state before the interrupt or exception occurred, such as the original values of EIP and CS before the processor invoked the exception handler, so that the exception handler can later restore that old state and resume the interrupted code from where it left off. But this save area for the old processor state must in turn be protected from unprivileged user-mode code; otherwise buggy or malicious user code could compromise the kernel: for example, one user mode thread could change the kernel state of another thread while the latter is in a system call, or user code could simply point ESP to unmapped or read-only memory, making it impossible for the processor to push the trap frame and causing an immediate reset.

For this reason, when an x86 processor takes an interrupt or trap that causes a privilege level change from user to kernel mode, it also switches to a stack in the kernel's memory. A structure called the task state segment (TSS) specifies the segment selector and address where this stack lives. The processor pushes (on this new stack) SS, ESP, EFLAGS, CS, EIP, and an optional error code. Then it loads the CS and EIP from the interrupt descriptor, and sets the ESP and SS to refer to the new stack.

Although the TSS is large and can potentially serve a variety of purposes, PIOS only uses it to define the kernel stack that the processor should switch to when it transfers from user to kernel mode. Since "kernel mode" in PIOS/Pintos/xv6 is privilege level 0 on the x86, the processor uses the ESP0 and SS0 fields of the TSS to define the kernel stack when entering kernel mode. PIOS/Pintos/xv6 don't use any other TSS fields.
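
The user-to-kernel stack switch can be sketched in the same simulated style. `struct tss` below contains only the ESP0 and SS0 fields (the real TSS is much larger), and `struct cpu`, `kpush`, and `trap_from_user` are hypothetical names. The sketch omits the optional error code.

```c
#include <assert.h>
#include <stdint.h>

#define NWORDS 64

/* only the two TSS fields used here */
struct tss {
  uint32_t esp0;
  uint16_t ss0;
};

/* Hypothetical CPU state; after the switch, esp is a byte offset
 * into kstack[], the simulated kernel stack. */
struct cpu {
  uint32_t eip, eflags, esp;
  uint16_t cs, ss;
  uint32_t kstack[NWORDS];
};

static void
kpush(struct cpu *c, uint32_t v)
{
  assert(c->esp >= 4 && c->esp % 4 == 0 && c->esp <= NWORDS * 4);
  c->esp -= 4;
  c->kstack[c->esp / 4] = v;
}

/* trap from user mode: switch to the TSS stack, then push old state */
void
trap_from_user(struct cpu *c, const struct tss *t,
               uint16_t handler_cs, uint32_t handler_eip)
{
  uint32_t old_esp = c->esp;
  uint16_t old_ss = c->ss;

  assert((c->cs & 3) == 3);   /* CPL 3: coming from user mode */
  c->ss = t->ss0;             /* switch stacks before pushing anything */
  c->esp = t->esp0;
  kpush(c, old_ss);
  kpush(c, old_esp);
  kpush(c, c->eflags);
  kpush(c, c->cs);
  kpush(c, c->eip);
  c->cs = handler_cs;         /* from the interrupt descriptor */
  c->eip = handler_eip;
}
```

Nothing is ever written through the user's ESP, so a bad user stack pointer cannot prevent the trap frame from being saved.
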

Combined, the IDT and TSS provide the kernel with a mechanism to ensure that traps are handled only by calling well-defined entrypoints in the kernel (the interrupt vectors in the IDT) and that trap handlers will have a well-defined, protected workspace (the stack pointers in the TSS). Exactly where these entrypoints and kernel stacks are located is up to the kernel, however.