xv6 runs on a multiprocessor and allows multiple CPUs to execute concurrently inside the kernel. These usually correspond to system calls made by processes running on different CPUs. There might also be interrupt handlers running at the same time as other kernel code. An interrupt can occur on any of the CPUs; both user-level processes and processes inside the kernel can be interrupted. xv6 uses spin-locks to coordinate how these concurrent activities share data structures.
How do xv6 locks work? Let's look at acquire() and release() in spinlock.c. [1474]
Now consider the sleep() and wakeup() functions:

void sleep(void *channel, struct lock *mutex);
void wakeup(void *channel);
The sleep() function is similar to the wait() function, and the wakeup() function is similar to the notify function (recall condition variables). The difference is that instead of declaring a condition variable explicitly (recall struct cv), we simply use any number (or void pointer) as our "channel" to wait on. Recall that because a condition variable is stateless, we do not really need to declare it, and can simply use any program variable's address to act as the "condition variable" (or channel, as it is called in xv6). The channel is just a number, so callers of sleep and wakeup must agree on a convention so that they synchronize correctly between themselves.
The semantics of sleep()/wakeup() are identical to those of condition variables. The sleep() function goes to sleep on the channel releasing the mutex atomically, and the wakeup() function wakes up all threads sleeping on the channel. Below we describe the semantics of sleep and wakeup using code (assuming xv6 process table structure):
void sleep(void *chan, struct lock *lk):
  //begin{atomic}
  curproc->chan = chan   //curproc points to the PCB of current process
  curproc->state = SLEEPING
  release(lk)
  sched()                //call the scheduler, which will context switch to another process
  //end{atomic}

void wakeup(void *chan):
  //begin{atomic}
  foreach p in ptable[]: //ptable is an array containing all PCBs
    if p->chan == chan and p->state == SLEEPING:
      p->state = RUNNABLE
  //end{atomic}

The sleep() and wakeup() functions need to be atomic with respect to each other (as indicated by the begin-atomic and end-atomic annotations in the pseudo-code), and also with respect to other accesses to the process table. Recall that xv6 uses ptable.lock to implement mutual exclusion for accesses to the ptable.
void sleep(void *chan, struct lock *lk):
  acquire(&ptable.lock)
  curproc->chan = chan   //curproc points to the PCB of current process
  curproc->state = SLEEPING
  release(lk)
  sched()                //recall that the scheduler (or the next process)
                         //releases ptable.lock

void wakeup(void *chan):
  acquire(&ptable.lock)
  foreach p in ptable[]: //ptable is an array containing all PCBs
    if p->chan == chan and p->state == SLEEPING:
      p->state = RUNNABLE
  release(&ptable.lock)

Notice that it is also okay to release the lock lk before accessing curproc (as ptable.lock is now protecting accesses to curproc).
However, we need to ensure that our implementation cannot result in a deadlock (given that a thread can hold two locks at the same time). Firstly, we should check if lk is the same as ptable.lock; if so, we do not need to acquire it again. Secondly, and more importantly, we need to ensure that there is a global ordering on lock acquisition. Notice that ptable.lock is acquired after the acquisition of lk (which must have been acquired by the caller of sleep()). The invariant followed by xv6 is that ptable.lock will always be the inner-most lock, i.e., no other lock acquisition will be attempted while ptable.lock is held. This ensures that ptable.lock always comes last in the acquisition order, and so no deadlocks can result from cycles involving ptable.lock. Here is xv6's (correct) implementation of sleep and wakeup:
void sleep(void *chan, struct lock *lk):
  if (lk != &ptable.lock) {
    acquire(&ptable.lock)
    release(lk)
  }
  curproc->chan = chan   //curproc points to the PCB of current process
  curproc->state = SLEEPING
  sched()                //recall that the scheduler (or the next process)
                         //releases ptable.lock

void wakeup(void *chan):
  acquire(&ptable.lock)
  foreach p in ptable[]: //ptable is an array containing all PCBs
    if p->chan == chan and p->state == SLEEPING:
      p->state = RUNNABLE
  release(&ptable.lock)

Notice that it is the programmer's responsibility to ensure that wakeup() is not called with ptable.lock held.
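Putting this together, a typical caller-side pattern looks like the following sketch (in the same pseudocode style as above): the sleeper re-checks its condition in a loop, and the waker makes the condition true before calling wakeup(). Here lk, cond, and chan are placeholders for whatever lock, predicate, and channel the two sides have agreed on:

// Sleeper: lk protects cond.
acquire(&lk)
while not cond:
    sleep(chan, &lk)   //sleep releases lk while asleep;
                       //xv6's sleep() reacquires lk before returning
// ... cond now holds, lk is held ...
release(&lk)

// Waker: must not hold ptable.lock here.
acquire(&lk)
cond = true
wakeup(chan)           //internally acquires ptable.lock
release(&lk)

Note the lock order on the waker's side: lk is acquired first and ptable.lock (inside wakeup) second, which respects the invariant that ptable.lock is always the inner-most lock.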
Recall API: Suppose process P1 has forked child process P2. If P1 calls wait(), it should return after P2 calls exit().
wait() in proc.c looks for any of its children that have already exited; if any exist, it cleans them up and returns. If none, it calls sleep().
exit() calls wakeup(), marks itself as dead (ZOMBIE), and calls sched().
What lock should protect wait() and exit() from missing each other? ptable.lock, as these functions manipulate the ptable.
What should be the channel convention? Using the process ID of the waiting process seems like a good one: the waiting process waits on its own process ID, and an exiting child wakes up processes waiting on its parent's process ID.
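This convention can be written down as a short sketch (xv6's actual code sleeps on the parent's struct proc pointer rather than the numeric PID, but the idea is the same):

// In the parent, inside wait():
sleep((void *)curproc->pid, &ptable.lock)   //wait on my own PID

// In the exiting child, inside exit():
wakeup((void *)curproc->parent->pid)        //wake whoever waits on my parent's PID

Both sides use the same number, so the parent is guaranteed to be woken when any of its children exits.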
What if we used a common channel for all processes? For example, say we always used channel (void *)1 (recall that a channel is just a number). Then exiting children of other processes may cause a waiting process to wake up from sleep (inside wait). This is not a correctness problem as long as the woken-up process checks again whether it has any zombie children, and goes back to sleep otherwise. However, it causes poor performance: all waiting processes will be woken up on every exit, and all but one of them will go back to sleep. Using the parent's process ID as the channel avoids this extra work.
What if some unrelated part of the kernel decides to sleep and wakeup on the same channel(s)? This will not be a problem from a correctness standpoint (irrelevant sleeping processes will wake up only to go back to sleep). However, it is not desirable from a performance standpoint.
Let's look at the xv6 code for wait and exit. Why does wait() free the child's memory (e.g., pgdir, kstack), and not exit()? Because the exiting process is still running on its own pgdir and kstack inside exit(), and so cannot deallocate them. Instead, the process's parent deallocates these structures inside wait().
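A simplified sketch of this division of labor, in the same pseudocode style as above (based on xv6's proc.c; wakeup1() is xv6's variant of wakeup() for callers that already hold ptable.lock, and the full code also handles reparenting orphaned children):

void exit():
  acquire(&ptable.lock)
  wakeup1(curproc->parent)         //wake the parent; ptable.lock is held
  curproc->state = ZOMBIE          //keep pgdir/kstack: we are still running on them!
  sched()                          //never returns

int wait():
  acquire(&ptable.lock)
  for(;;):
    foreach p in ptable[]:
      if p->parent == curproc and p->state == ZOMBIE:
        kfree(p->kstack)           //safe now: the child no longer runs
        freevm(p->pgdir)
        p->state = UNUSED
        release(&ptable.lock)
        return p->pid
    sleep(curproc, &ptable.lock)   //no zombie child yet; wait

Because both functions run under ptable.lock, the child's transition to ZOMBIE and the parent's scan of the ptable cannot interleave badly.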
What if exit is called before wait?
What if wait is called just after exit calls wakeup?