PC Architecture and Processor Setup
Outline
- PC architecture
- x86 instruction set
- GCC calling conventions
PC architecture
- A full PC has:
- one or more x86 CPUs, each containing:
- integer registers (can you name them?) and execution unit
- floating-point/vector registers and execution unit(s)
- memory management unit (MMU)
- multiprocessor/multicore: local interrupt controller (APIC)
- memory
- disk (IDE, SCSI, USB)
- keyboard
- display
- other resources: BIOS ROM, clock, ...
- We will start with the original 16-bit 8086 CPU (1978)
- CPU runs instructions:
for(;;){
run next instruction
}
- Draw figure with common bus, I/O, and CPU. The CPU has registers, cache,
etc.
- Draw figure showing EIP and how it gets incremented automatically after
executing each instruction.
- Needs work space: registers
- four 16-bit data registers: AX, BX, CX, DX
- each in two 8-bit halves, e.g. AH and AL
- very fast, very few
- More work space: memory
- CPU sends out address on address lines (wires, one bit per wire)
- Data comes back on data lines
- or data is written to data lines
- Add address registers: pointers into memory
- SP - stack pointer
- BP - frame base pointer
- SI - source index
- DI - destination index
- Instructions are in memory too!
- IP - instruction pointer (PC on PDP-11, everything else)
- increment after running each instruction
- can be modified by CALL, RET, JMP, conditional jumps
- Want conditional jumps
- FLAGS - various condition codes
- whether last arithmetic operation overflowed
- ... was positive/negative
- ... was [not] zero
- ... carry/borrow on add/subtract
- ... etc.
- whether interrupts are enabled
- direction of data copy instructions
- J[N]S, J[N]Z, J[N]C, J[N]O ...
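A rough C sketch of how four of these condition codes could be derived from a 16-bit ADD. The struct and function names (flags16, add16) are made up for illustration; the real FLAGS register packs these bits, and several others, into one word.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical struct holding four of the 8086 condition codes. */
struct flags16 {
    bool zf;  /* result was zero */
    bool sf;  /* result was negative (top bit set) */
    bool cf;  /* unsigned carry out of bit 15 */
    bool of;  /* signed overflow */
};

/* Compute a + b the way a 16-bit ADD would, and the flags it would set. */
uint16_t add16(uint16_t a, uint16_t b, struct flags16 *f)
{
    uint32_t wide = (uint32_t)a + (uint32_t)b;
    uint16_t r = (uint16_t)wide;

    f->zf = (r == 0);
    f->sf = (r & 0x8000) != 0;
    f->cf = (wide > 0xFFFF);
    /* Overflow: both operands have the same sign and the result's differs. */
    f->of = ((~(a ^ b) & (a ^ r)) & 0x8000) != 0;
    return r;
}
```

A conditional jump then just tests one of these bits: JZ looks at zf, JC at cf, JO at of.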
- What if we want to use more than 2^16 bytes of memory?
- 8086 has 20-bit physical addresses, can have 1 Meg RAM
- the extra four bits usually come from a 16-bit "segment register":
- CS - code segment, for fetches via IP
- SS - stack segment, for load/store via SP and BP
- DS - data segment, for load/store via other registers
- ES - another data segment, destination for string operations
- virtual to physical translation: pa = va + seg*16
- e.g. set CS = 4096 to execute starting at 65536
- tricky: can't use the 16-bit address of a stack variable as a pointer
- a far pointer includes full segment:offset (16 + 16 bits)
- tricky: pointer arithmetic and array indexing across segment boundaries
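The translation rule pa = va + seg*16 can be sketched in C as follows. The function name real_mode_pa is an assumption for illustration; the mask to 20 bits models the 8086's 20 address lines.

```c
#include <stdint.h>

/* Hypothetical helper: 8086 real-mode translation, pa = seg*16 + off.
 * The result is masked to 20 bits because the 8086 has only 20 address
 * lines, so addresses past 1 Meg wrap around. */
uint32_t real_mode_pa(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFF;
}
```

With this, real_mode_pa(4096, 0) gives 65536, matching the CS example above. It also shows why far pointers are tricky: 0x1000:0x0010 and 0x1001:0x0000 name the same physical byte, so two far pointers can differ bit-for-bit yet alias.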
- But 8086's 16-bit addresses and data were still painfully small, so
80386 added support for 32-bit data and addresses (1985)
- boots in 16-bit mode, bootasm.S switches to 32-bit mode
- registers are 32 bits wide, called EAX rather than AX
- operands and addresses that were 16-bit became 32-bit in 32-bit mode, e.g. ADD does 32-bit arithmetic
- prefixes 0x66/0x67 toggle between 16-bit and 32-bit operands and addresses: in 32-bit mode, MOVW is expressed as 0x66 MOVW
- the .code32 in bootasm.S tells assembler to generate 0x66 for e.g. MOVW
- 80386 also changed segments and added paged memory...
- Example instruction encoding
b8 cd ab 16-bit CPU, AX <- 0xabcd
b8 34 12 cd ab 32-bit CPU, EAX <- 0xabcd1234
66 b8 cd ab 32-bit CPU, AX <- 0xabcd
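To make the three encodings above concrete, here is a small C sketch of a decoder that understands only opcode 0xb8 and the 0x66 operand-size prefix. The function name and its parameters are hypothetical; a real decoder handles far more cases.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for "mov imm, %ax/%eax" (opcode 0xb8) only.
 * default32 is nonzero when the CPU is running 32-bit code.
 * Returns the instruction length in bytes, or 0 if the bytes do not match;
 * stores the immediate in *imm and the operand width (16 or 32) in *bits. */
size_t decode_mov_imm(const uint8_t *p, size_t n, int default32,
                      uint32_t *imm, int *bits)
{
    size_t i = 0;
    int op32 = default32;

    if (i < n && p[i] == 0x66) {   /* operand-size prefix toggles the width */
        op32 = !op32;
        i++;
    }
    if (i >= n || p[i] != 0xb8)
        return 0;
    i++;

    size_t immlen = op32 ? 4 : 2;
    if (n - i < immlen)
        return 0;

    uint32_t v = 0;
    for (size_t k = 0; k < immlen; k++)   /* immediates are little-endian */
        v |= (uint32_t)p[i + k] << (8 * k);

    *imm = v;
    *bits = op32 ? 32 : 16;
    return i + immlen;
}
```

The same opcode byte yields a 16-bit or 32-bit move depending only on the current mode and the prefix, which is why the assembler needs .code16/.code32 to know when to emit 0x66.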
- ...and even 32 bits eventually wasn't enough, so
AMD added support for 64-bit data and addresses (1999)
- registers are 64 bits wide, called RAX, RBX, etc.
- 8 more general-purpose registers: R8 thru R15
- boot: still go thru 16-bit and 32-bit modes on the way!
x86 Instruction Set
- Intel syntax: op dst, src (Intel manuals!)
- AT&T (gcc/gas) syntax: op src, dst (labs, xv6)
- uses b, w, l suffix on instructions to specify size of operands
- Operands are registers, constant, memory via register, memory via constant
- Examples:
AT&T syntax          | "C"-ish equivalent         | Addressing mode
movl %eax, %edx      | edx = eax;                 | register
movl $0x123, %edx    | edx = 0x123;               | immediate
movl 0x123, %edx     | edx = *(int32_t*)0x123;    | direct
movl (%ebx), %edx    | edx = *(int32_t*)ebx;      | indirect
movl 4(%ebx), %edx   | edx = *(int32_t*)(ebx+4);  | displaced
- Instruction classes
- data movement: MOV, PUSH, POP, ...
- arithmetic: TEST, SHL, ADD, AND, ...
- i/o: IN, OUT, ...
- control: JMP, JZ, JNZ, CALL, RET
- string: REP MOVSB, ...
- system: IRET, INT
- Intel architecture manual Volume 2 is the reference
gcc x86 calling conventions
- x86 dictates that stack grows down:
Example instruction  | What it does
pushl %eax           | subl $4, %esp
                     | movl %eax, (%esp)
popl %eax            | movl (%esp), %eax
                     | addl $4, %esp
call 0x12345         | pushl %eip (*)
                     | movl $0x12345, %eip (*)
ret                  | popl %eip (*)
(*) Not real instructions
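The same four operations, sketched in C against a toy machine. Everything here (struct cpu, push32, pop32, do_call, do_ret) is a made-up model for illustration, not real hardware behavior in full; in particular it assumes %esp always stays inside the small memory array.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a small byte-addressed memory plus %esp, %eip. */
struct cpu {
    uint32_t esp, eip;
    uint8_t mem[256];
};

/* These helpers assume esp stays within mem[]; a real CPU would fault. */
void push32(struct cpu *c, uint32_t v)
{
    c->esp -= 4;                        /* subl $4, %esp   */
    memcpy(&c->mem[c->esp], &v, 4);     /* movl v, (%esp)  */
}

uint32_t pop32(struct cpu *c)
{
    uint32_t v;
    memcpy(&v, &c->mem[c->esp], 4);     /* movl (%esp), v  */
    c->esp += 4;                        /* addl $4, %esp   */
    return v;
}

/* next_eip is the address of the instruction following the call. */
void do_call(struct cpu *c, uint32_t target, uint32_t next_eip)
{
    push32(c, next_eip);                /* pushl %eip (*)          */
    c->eip = target;                    /* movl $target, %eip (*)  */
}

void do_ret(struct cpu *c)
{
    c->eip = pop32(c);                  /* popl %eip (*)   */
}
```

Note that the stack grows down: each push lowers %esp by 4, and a call/ret pair leaves %esp where it started.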
- GCC dictates how the stack is used.
Contract between caller and callee on x86:
- at entry to a function (i.e. just after call):
- %eip points at first instruction of function
- %esp+4 points at first argument
- %esp points at return address
- after ret instruction:
- %eip contains return address
- %esp points at arguments pushed by caller
- called function may have trashed arguments
- %eax (and %edx, if return type is 64-bit) contains
return value (or trash if function is void)
- %eax, %edx (above), and %ecx may be trashed
- %ebp, %ebx, %esi, %edi must contain contents from time of call
- Terminology:
- %eax, %ecx, %edx are "caller save" registers
- %ebp, %ebx, %esi, %edi are "callee save" registers
- Functions can do anything that doesn't violate contract.
By convention, GCC does more:
- each function has a stack frame marked by %ebp, %esp
+------------+ |
| arg 2 | \
+------------+ >- previous function's stack frame
| arg 1 | /
+------------+ |
| ret %eip | /
+============+
| saved %ebp | \
%ebp-> +------------+ |
| | |
| local | \
| variables, | >- current function's stack frame
| etc. | /
| | |
| | |
%esp-> +------------+ /
- %esp can move to make stack frame bigger, smaller
- %ebp points at saved %ebp from previous function,
chain to walk stack
- function prologue:
pushl %ebp
movl %esp, %ebp
or
enter $0, $0
enter usually not used: 4 bytes vs 3 for pushl+movl,
not on hardware fast-path anymore
- function epilogue can easily find return EIP on stack:
movl %ebp, %esp
popl %ebp
or
leave
leave used often because it's 1 byte, vs 3 for movl+popl
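Because each saved %ebp points at the caller's saved %ebp, a debugger can walk the chain to produce a backtrace. A minimal C sketch over a simulated stack follows; walk_frames and load32 are hypothetical names, and stopping at a zero %ebp is an assumed convention for the outermost frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit word at byte address addr from a simulated stack.
 * Assumes addr is 4-byte aligned and inside the array. */
static uint32_t load32(const uint32_t *mem, uint32_t addr)
{
    return mem[addr / 4];
}

/* Hypothetical backtrace: follow the saved-%ebp chain, storing each frame's
 * return address in out[]. Stops at a zero %ebp (the convention assumed
 * here for the outermost frame) or after max entries. Returns the count. */
size_t walk_frames(const uint32_t *mem, uint32_t ebp, uint32_t *out, size_t max)
{
    size_t n = 0;
    while (ebp != 0 && n < max) {
        out[n++] = load32(mem, ebp + 4);  /* return %eip is just above saved %ebp */
        ebp = load32(mem, ebp);           /* saved %ebp names the caller's frame */
    }
    return n;
}
```

This only works if every function on the stack keeps the %ebp convention; code compiled without frame pointers breaks the chain.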
- Compiling, linking, loading:
- Preprocessor takes C source code (ASCII text),
expands #include etc, produces C source code
- Compiler takes C source code (ASCII text),
produces assembly language (also ASCII text)
- Assembler takes assembly language (ASCII text),
produces .o file (binary, machine-readable!)
- Linker takes multiple '.o's,
produces a single program image (binary)
- Loader loads the program image into memory
at run-time and starts it executing
x86 Physical Memory Map
- The physical address space mostly looks like ordinary RAM
- Except some low-memory addresses actually refer to other things
- Writes to VGA memory appear on the screen
- Reset or power-on jumps to ROM at 0x000ffff0 (so there must be ROM at the top of the BIOS region)
+------------------+ <- 0xFFFFFFFF (4GB)
| 32-bit |
| memory mapped |
| devices |
| |
/\/\/\/\/\/\/\/\/\/\
/\/\/\/\/\/\/\/\/\/\
| |
| Unused |
| |
+------------------+ <- depends on amount of RAM
| |
| |
| Extended Memory |
| |
| |
+------------------+ <- 0x00100000 (1MB)
| BIOS ROM |
+------------------+ <- 0x000F0000 (960KB)
| 16-bit devices, |
| expansion ROMs |
+------------------+ <- 0x000C0000 (768KB)
| VGA Display |
+------------------+ <- 0x000A0000 (640KB)
| |
| Low Memory |
| |
+------------------+ <- 0x00000000
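The boundaries in the map can be turned into a small C lookup. The enum and function names are assumptions for illustration; everything at or above 1MB is lumped together because the split there depends on how much RAM the machine has.

```c
#include <stdint.h>

enum region { LOW_MEMORY, VGA_DISPLAY, DEVICES_AND_ROMS, BIOS_ROM, ABOVE_1MB };

/* Hypothetical classifier for the regions below 1MB in the map above.
 * Everything at or above 0x00100000 is lumped together, since the split
 * between extended memory, unused space, and 32-bit devices depends on
 * the machine. */
enum region classify(uint32_t pa)
{
    if (pa < 0x000A0000) return LOW_MEMORY;
    if (pa < 0x000C0000) return VGA_DISPLAY;
    if (pa < 0x000F0000) return DEVICES_AND_ROMS;
    if (pa < 0x00100000) return BIOS_ROM;
    return ABOVE_1MB;
}
```

For example, the reset address 0x000ffff0 classifies as BIOS_ROM, which is the point of the remark about reset above.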
I/O
- Original PC architecture: use dedicated I/O space
- Works same as memory accesses but set I/O signal
- Only 1024 I/O addresses
- Accessed with special instructions (IN, OUT)
- Example: write a byte to line printer:
#define DATA_PORT 0x378
#define STATUS_PORT 0x379
#define BUSY 0x80
#define CONTROL_PORT 0x37A
#define STROBE 0x01
void
lpt_putc(int c)
{
/* wait for printer to consume previous byte */
while((inb(STATUS_PORT) & BUSY) == 0)
;
/* put the byte on the parallel lines */
outb(DATA_PORT, c);
/* tell the printer to look at the data */
outb(CONTROL_PORT, STROBE);
outb(CONTROL_PORT, 0);
}
- Memory-Mapped I/O
- Use normal physical memory addresses
- Gets around limited size of I/O address space
- No need for special instructions
- System controller routes to appropriate device
- Works like "magic" memory:
- Addressed and accessed like memory,
but ...
- ... does not behave like memory!
- Reads and writes can have "side effects"
- Read results can change due to external events
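A short C sketch of a memory-mapped write, using text-mode video memory as the example. The function name mmio_putc is made up, and the cell layout (attribute in the high byte, character in the low byte) is the usual VGA text-mode convention, assumed here rather than taken from these notes.

```c
#include <stdint.h>

/* Store one character cell through a volatile pointer, the way a driver
 * writes to memory-mapped text-mode video memory. The cell layout assumed
 * here (attribute in the high byte, character in the low byte) is the
 * VGA text-mode convention; base is wherever that memory is mapped. */
void mmio_putc(volatile uint16_t *base, int pos, char ch, uint8_t attr)
{
    /* volatile keeps the compiler from caching or dropping the store,
     * since the write itself is the side effect we want. */
    base[pos] = (uint16_t)((attr << 8) | (uint8_t)ch);
}
```

No IN or OUT is involved: it is an ordinary store, and the hardware routes it to the device. In a kernel, base would point into the VGA display region of the physical memory map; passing an ordinary array instead lets the function be tried out in user space.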
Example hardware for address spaces: x86 segments
The operating system can switch the x86 to protected mode, which
supports virtual and physical addresses, and allows the O/S to set up
address spaces so that user processes can't change them. Translation
in protected mode is as follows:
- selector:offset (virtual / logical addr)
==SEGMENTATION==>
- linear address
==PAGING ==>
- physical address
Next lecture covers paging; now we focus on segmentation.
Protected-mode segmentation works as follows (see handout):
- segment register holds segment selector
- selector: 13 bits of index, local vs global flag, 2-bit RPL
- selector indexes into global descriptor table (GDT)
- segment descriptor holds 32-bit base, limit, type, protection
- la = va + base ; assert(va < limit);
- choice of seg register usually implicit in instruction
- ESP uses SS, EIP uses CS, others (mostly) use DS
- some instructions can take far addresses (an explicit selector:offset pair)
- GDT lives in memory, CPU's GDTR register points to base of GDT
- LGDT instruction loads GDTR
- you turn on protected mode by setting PE bit in CR0 register
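The lookup just described, sketched in C. The struct is a simplification (a real descriptor packs base, limit, type, and protection bits into 8 bytes), the function names are hypothetical, and the limit check follows the assert(va < limit) form used above rather than the exact hardware rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified descriptor: a real GDT entry packs these fields into 8 bytes
 * along with type and protection bits, which this sketch leaves out. */
struct segdesc {
    uint32_t base;
    uint32_t limit;
};

/* Split a 16-bit selector into its three fields. */
unsigned sel_index(uint16_t sel) { return sel >> 3; }        /* 13-bit index   */
unsigned sel_local(uint16_t sel) { return (sel >> 2) & 1; }  /* 0=GDT, 1=local */
unsigned sel_rpl(uint16_t sel)   { return sel & 3; }         /* 2-bit RPL      */

/* Translate selector:offset to a linear address using a GDT of ngdt entries.
 * Only global selectors are handled. Returns false where the hardware
 * would raise a fault. */
bool seg_translate(const struct segdesc *gdt, unsigned ngdt,
                   uint16_t sel, uint32_t va, uint32_t *la)
{
    unsigned i = sel_index(sel);
    if (sel_local(sel) || i == 0 || i >= ngdt)  /* entry 0 is the null descriptor */
        return false;
    if (va >= gdt[i].limit)                     /* assert(va < limit) */
        return false;
    *la = gdt[i].base + va;                     /* la = va + base */
    return true;
}
```

The low two bits returned by sel_rpl are the same bits that, in CS, hold the current privilege level.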
- What about protection?
- instructions can only r/w/x memory reachable through seg regs
- not before base, not after limit
- can my program change a segment register? yes, but... to one of the
permitted (accessible) descriptors in the GDT
- can my program re-load GDTR? no!
- how does h/w know if user or kernel?
- Current privilege level (CPL) is in the low 2 bits of CS
- CPL=0 is privileged O/S, CPL=3 is user
- why can't app modify the descriptors in the GDT? it's in memory...
- what about system calls? how do they transfer to kernel?
- app cannot just lower the CPL
Traps
The x86 processor uses a table known as the
interrupt descriptor table (IDT)
to determine how to transfer control when a trap occurs.
The x86 allows up to 256 different
interrupt or exception entry points into the kernel,
each with a different interrupt vector.
A vector is a number between 0 and 255.
An interrupt's vector is determined by the
source of the interrupt: different devices, error
conditions, and application requests to the kernel
generate interrupts with different vectors.
The CPU uses the vector as an index
into the processor's IDT,
which the kernel sets up in kernel-private memory of the kernel's choosing,
much like the GDT.
From the appropriate entry in this table
the processor loads:
- the value to load into
the instruction pointer (EIP) register,
pointing to the kernel code designated
to handle that type of exception.
- the value to load into
the code segment (CS) register,
which includes in bits 0-1 the privilege level
at which the exception handler is to run.
In PIOS, all exceptions are handled in kernel mode,
privilege level 0.
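As a rough illustration of the lookup,
here is a C sketch of an IDT indexed by vector.
The struct gate below is a simplification and its name is made up;
a real IDT entry is 8 bytes and carries more fields
than the CS and EIP values discussed above.

```c
#include <stddef.h>
#include <stdint.h>

#define NVECTORS 256

/* Simplified gate: a real IDT entry is 8 bytes and also holds a gate type,
 * a present bit, and a DPL, which this sketch leaves out. */
struct gate {
    uint16_t cs;     /* code segment selector to load */
    uint32_t eip;    /* handler entry point */
};

/* Look up the handler for a vector. Returns NULL if vector is out of range. */
const struct gate *idt_lookup(const struct gate *idt, unsigned vector)
{
    if (vector >= NVECTORS)
        return NULL;
    return &idt[vector];
}

/* Privilege level the handler runs at: bits 0-1 of the gate's CS value. */
unsigned gate_cpl(const struct gate *g)
{
    return g->cs & 3;
}
```

A kernel that handles every exception at privilege level 0
would fill in each gate with a CS value whose low two bits are zero.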
Entering and Returning from Trap Handlers
When an x86 processor takes a trap while in kernel mode,
it first pushes a trap frame onto the kernel stack,
to save the old values of certain registers
before the trap handling mechanism modifies them.
The processor then
looks up the CS and EIP of the trap handler in the IDT,
and transfers control to that instruction address.
The following diagram illustrates the format
of the basic kernel trap frame,
defining the state of the kernel stack on entry to the trap handler:
+--------------------+ <---- old ESP
| old EFLAGS | " - 4
| 0x00000 | old CS | " - 8
| old EIP | " - 12
+--------------------+ <---- ESP
For certain types of x86 exceptions,
in addition to the basic three 32-bit words above,
the processor pushes onto the stack another word
containing an error code.
The page fault exception, number 14,
is an important example.
See the x86 manuals to determine for which exception numbers
the processor pushes an error code,
and what the error code means in that case.
When the processor pushes an error code,
the stack would look as follows at the beginning of the trap handler:
+--------------------+ <---- old ESP
| old EFLAGS | " - 4
| 0x00000 | old CS | " - 8
| old EIP | " - 12
| error code | " - 16
+--------------------+ <---- ESP
The x86 processor provides a special instruction, iret,
to return from trap handlers.
It expects the kernel's stack to look like the first figure above,
with ESP pointing to the old EIP.
When the processor executes an iret instruction,
it pops the saved values of EIP, CS, and EFLAGS off the stack
and back into the corresponding registers,
and resumes instruction execution at the popped EIP.
Note that when returning from a trap,
the processor doesn't actually know or care
whether the "old" values it is popping off the stack
are really the exact same values
that it originally pushed onto the stack on entry to the trap handler.
Think about what would happen - for better or worse -
if the kernel trap handler
changes these values during its execution.
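The push and pop sequence can be modeled in a few lines of C.
This is a toy sketch under stated assumptions:
the names (struct kcpu, trap_enter, do_iret) are hypothetical,
the stack is word-indexed so ESP counts words rather than bytes,
and the error-code case is left out.

```c
#include <stdint.h>

/* Hypothetical toy CPU state; stack[] is word-indexed and esp is a word
 * index rather than a byte address, to keep the sketch short. */
struct kcpu {
    uint32_t eip, cs, eflags, esp;
    uint32_t stack[64];
};

/* Kernel-mode trap entry: push EFLAGS, CS, EIP, then jump to the handler.
 * Assumes there is room on the stack. */
void trap_enter(struct kcpu *c, uint32_t handler_cs, uint32_t handler_eip)
{
    c->stack[--c->esp] = c->eflags;
    c->stack[--c->esp] = c->cs;      /* upper 16 bits are zero */
    c->stack[--c->esp] = c->eip;
    c->cs = handler_cs;
    c->eip = handler_eip;
}

/* iret: pop EIP, CS, EFLAGS, whatever they now contain. */
void do_iret(struct kcpu *c)
{
    c->eip = c->stack[c->esp++];
    c->cs = c->stack[c->esp++];
    c->eflags = c->stack[c->esp++];
}
```

If the handler overwrites the saved EIP slot before calling do_iret,
execution resumes at the new address.
That is one answer to the question above:
a kernel can use this deliberately,
for example to skip an instruction or to switch to a different thread.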
The Task State Segment
The processor needs a place
to save the old processor state
before the interrupt or exception occurred,
such as the original values of EIP and CS
before the processor invoked the exception handler,
so that the exception handler can later restore that old state
and resume the interrupted code from where it left off.
But this save area for the old processor state
must in turn be protected from unprivileged user-mode code;
otherwise buggy or malicious user code
could compromise the kernel:
for example, one user mode thread could change the kernel state
of another thread while the latter is in a system call,
or user code could simply point ESP to unmapped or read-only memory,
making it impossible for the processor to push the trap frame
and causing an immediate reset.
For this reason,
when an x86 processor takes an interrupt or trap
that causes a privilege level change from user to kernel mode,
it also switches to a stack in the kernel's memory.
A structure called the task state segment (TSS) specifies
the segment selector and address where this stack lives.
The processor switches to this new stack
and pushes onto it the old SS, ESP, EFLAGS,
CS, and EIP, plus an error code for exceptions that have one.
Then it loads the CS and EIP
from the interrupt descriptor.
Although the TSS is large
and can potentially serve a variety of purposes,
PIOS only uses it to define
the kernel stack that the processor should switch to
when it transfers from user to kernel mode.
Since "kernel mode" in PIOS/Pintos/xv6
is privilege level 0 on the x86,
the processor uses the ESP0 and SS0 fields of the TSS
to define the kernel stack when entering kernel mode.
PIOS/Pintos/xv6 don't use any other TSS fields.
Combined,
the IDT and TSS provide the kernel with a mechanism
to ensure that traps are handled
only by calling well-defined entrypoints in the kernel
(the interrupt vectors in the IDT)
and that trap handlers will have a well-defined, protected workspace
(the stack pointers in the TSS).
Exactly where these entrypoints and kernel stacks are located
is up to the kernel,
however.
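The user-to-kernel stack switch can be sketched in C as below.
This is a toy model, not the processor's full behavior:
the struct and function names are hypothetical,
simulated memory is word-indexed so stack "addresses" are word indices,
and the optional error code is omitted.

```c
#include <stdint.h>

/* Only the two TSS fields the notes say are used. */
struct tss { uint32_t esp0, ss0; };

struct ucpu { uint32_t eip, cs, eflags, esp, ss; };

/* Simulated memory is word-indexed, so "addresses" here are word indices.
 * Sketch of a user-to-kernel trap: switch to the stack named by the TSS,
 * then push the old SS, ESP, EFLAGS, CS, EIP on that new stack. */
void trap_from_user(struct ucpu *c, const struct tss *t, uint32_t *mem,
                    uint32_t handler_cs, uint32_t handler_eip)
{
    uint32_t old_ss = c->ss, old_esp = c->esp;

    c->ss = t->ss0;
    c->esp = t->esp0;
    mem[--c->esp] = old_ss;
    mem[--c->esp] = old_esp;
    mem[--c->esp] = c->eflags;
    mem[--c->esp] = c->cs;
    mem[--c->esp] = c->eip;
    c->cs = handler_cs;
    c->eip = handler_eip;
}
```

Nothing is ever written through the user's ESP,
so a bogus user stack pointer cannot stop the trap frame from being saved.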