kill() sets p->killed for the given pid after taking ptable.lock.
What does setting p->killed do? Nothing, for now. But the
process must be in the middle of something when this is
done --- e.g., running in user mode, running in kernel mode,
sleeping, etc. Whenever the process reaches a safe
boundary, p->killed will be checked and exit() will be
called. Notice that if the process was sleeping, kill()
marks it RUNNABLE.
Where is killed checked? At syscall entry and exit (see the trap function at lines 3104 and 3108). It is also checked at other places where the process may have woken up from a sleep and it is no longer correct to go back to sleep again.
Why not destroy the process right away? It is in the middle of something; destroying it now could leave the kernel in an inconsistent state.
Why is it safe to mark the process RUNNABLE if it was SLEEPING?
The programmer has ensured that the waiting loops that call
sleep() within them also check p->killed wherever
necessary. E.g., the waiting loop in wait() checks
proc->killed.
Wouldn't it be okay to let the sleeping process keep sleeping, even
if it was killed? Won't it exit anyway whenever it wakes up? This is
true, and if you expect the process to wake up in some bounded amount
of time, then this may be okay. However, if the process could
wait for an unbounded amount of time (e.g., a parent waiting on a child
to exit), then this is not acceptable. In general, all sleeping
processes are woken up. Assuming that all calls to sleep()
are nested inside a while loop that checks the condition,
it is possible that some of these processes go back to sleep again
(where the sleep time is bounded anyway).
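The convention above can be sketched in user space (illustrative names, not xv6's actual wait() code): every sleep() sits inside a while loop that rechecks its condition, and a loop that can wait unboundedly also checks p->killed. Here a kill() arriving during the second sleep is simulated.

```c
#include <assert.h>

/* User-space sketch of the sleep-loop convention (illustrative
 * names, not xv6's actual code). */
struct proc { int killed; };

static int child_exited = 0;  /* the condition this loop waits on */
static int sleep_count = 0;   /* how many times we "went to sleep" */

/* Stands in for sleep(chan, lk); returning models being woken up.
 * We simulate kill() arriving during the second sleep. */
void sleep_sim(struct proc *p) {
    sleep_count++;
    if (sleep_count == 2)
        p->killed = 1;        /* kill() sets the flag and wakes us */
}

/* Returns 0 on a normal wakeup, -1 if the wait was abandoned
 * because the process was killed. */
int wait_loop(struct proc *p) {
    while (!child_exited) {   /* recheck the condition every wakeup */
        if (p->killed)
            return -1;        /* unbounded wait: honor the kill */
        sleep_sim(p);
    }
    return 0;
}
```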
For example, the sleep loop inside wait() checks p->killed
(because this sleep loop can wait for an unbounded time).
However, the sleep loop inside the IDE device driver's
iderw() does not check p->killed, which means that if it is
woken up by kill(), it will go back to sleep again until the
IDE request completes (see the following discussion for more
clarity on this).
The iderw() function is called with an argument of type
struct buf *, which represents a buffer in memory that will
store the contents of the disk block. This function is used
to read/write from/to the disk, which has an IDE interface.
The API for iderw() specifies that if the B_DIRTY flag is
set in buf, then the buffer needs to be written to the disk.
Similarly, if the B_VALID flag is not set in buf, then the
buffer needs to be read from the disk.
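The flag-driven decision can be sketched as follows (the flag values mirror xv6's buf.h; the function name ide_decide is illustrative, the real iderw() performs the operation rather than just classifying it):

```c
#include <assert.h>

#define B_VALID 0x2   /* buffer holds the block's data */
#define B_DIRTY 0x4   /* buffer must be written to disk */

enum op { OP_NONE, OP_READ, OP_WRITE };

/* Sketch of iderw()'s decision: a dirty buffer is written out;
 * an invalid buffer is read in; a clean, valid buffer means there
 * is nothing to do (xv6 actually panics in that case). */
enum op ide_decide(int flags) {
    if (flags & B_DIRTY)
        return OP_WRITE;
    if (!(flags & B_VALID))
        return OP_READ;
    return OP_NONE;
}
```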
Multiple processes may be trying to access the disk through
system calls (e.g., read/write). Also, disk requests could be made
due to virtual memory's demand paging logic.
All these accesses to the disk need to be mutually exclusive. So, xv6
uses a lock, called idelock.
Also, as we know, disk accesses are likely to be very slow. So,
it is quite likely that multiple processes are simultaneously waiting
for the disk. The right thing to do is to make the processes sleep,
while they are waiting for the disk. However, before they go to sleep,
they must register their request. These outstanding requests are
maintained in a FIFO list, called idequeue.
The disk driver processes one request at a time, starting from the head of the list.
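The shape of idequeue can be sketched like this (modelled on xv6's iderw/ideintr, simplified and without the locking): each buffer carries a queue-link pointer, new requests are appended at the tail, and the driver services the head.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified sketch of idequeue (no locking; qnext is the
 * queue link, as in xv6's struct buf). */
struct buf { int blockno; struct buf *qnext; };

static struct buf *idequeue = NULL;

/* iderw() appends a new request at the tail of the FIFO list. */
void enqueue(struct buf *b) {
    struct buf **pp;
    b->qnext = NULL;
    for (pp = &idequeue; *pp; pp = &(*pp)->qnext)
        ;                      /* walk to the tail */
    *pp = b;
}

/* ideintr() takes the finished request off the head. */
struct buf *dequeue(void) {
    struct buf *b = idequeue;
    if (b)
        idequeue = b->qnext;
    return b;
}
```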
Let's look at how the IDE driver uses sleep and wakeup. ide_rw() starts a disk operation and calls sleep(). ide_intr() calls wakeup() when the disk interrupts to say that it's done.
Why does it work? What if the disk finishes and interrupts just before ide_rw() calls sleep()? What if the disk interrupts during the execution of the interrupt handler?
What's the sleep channel convention? Why does it make sense?
Let's look at the IDE disk driver as an example of xv6's use of locks. Sheet 39. The file system kernel code calls ide_rw() to read or write a block of data from/to the disk. ide_rw() should return when the read or write has completed. There may be multiple calls to ide_rw() outstanding because multiple processes may be reading files at the same time. The disk can only perform one operation at a time, so the IDE driver keeps a queue of requests that are waiting their turn.
The IDE driver has just one lock (ide_lock). This one lock helps enforce the multiple invariants required for correct operation.
The software (and hardware) for these activities was all designed for one-at-a-time use, which is why it needs locking.
What happens if two processes call ide_rw() at the same time from different CPUs? [3954]
ide_intr() deletes a request from ide_queue (line 3902). What happens if an IDE interrupt occurs when a process is adding a block to the queue?
Why does idestart() want the caller to hold the lock? Why doesn't it just acquire the lock itself?
Recursive locks are a bad idea; ideintr() should certainly not use
them instead of disabling interrupts! Regarding lock ordering: in
iderw(), sleep() acquires ptable.lock, so never acquire ptable.lock
and then ide_lock. Also, never acquire ide_lock and then some other
lock.
So, ptable.lock has least priority (it will always be the
inner-most lock); ide_lock has higher priority than
ptable.lock but lower priority than all other locks in xv6.
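The ordering rule can be illustrated with pthreads (this is not xv6 code; the lock names are stand-ins): pick one global order and make every code path acquire locks in that order, with the lowest-priority lock taken last.

```c
#include <assert.h>
#include <pthread.h>

/* Illustration of a fixed lock order (not xv6 code): outer_lock
 * plays the role of ide_lock, inner_lock the role of ptable.lock.
 * Every path takes outer_lock first and inner_lock last, so two
 * threads can never deadlock by acquiring them in opposite orders. */
pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER; /* like ide_lock */
pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER; /* like ptable.lock */

int shared = 0;

void ordered_update(void) {
    pthread_mutex_lock(&outer_lock);  /* higher-priority lock first */
    pthread_mutex_lock(&inner_lock);  /* inner-most lock last */
    shared++;
    pthread_mutex_unlock(&inner_lock);
    pthread_mutex_unlock(&outer_lock);
}
```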
Why not check p->killed in the ide_rw() loop? As also discussed earlier,
this sleep loop is guaranteed to finish in bounded time. Also, the
caller of iderw() may be in an inconsistent state and may not expect
iderw() to return without actually reading the buffer. In this case,
it is okay for some function higher in the stack to check p->killed
and exit.
Further, previously we said that while loading a process, the
executable file in a.out format is parsed, all its memory contents
are loaded into physical memory (from disk), and corresponding
mappings are created in the virtual address space (through the page table).
In general, a loaded process may not necessarily access all its code/data
and so, many disk reads can be avoided if the code/data pages are loaded
on-demand.
Here, at program load time, you could load some pages in physical memory and create corresponding mappings in the page table. For others, you may mark them not-present, but also store somewhere that these pages are present at a location on the disk (along with the details of the location). Notice that now, some mappings in the page table are marked not-present, even though the program believes that it has loaded them. In other words, the OS is playing tricks behind the program's back, without its knowledge.
Previously, if a program tried to access a virtual address that
is not currently mapped (i.e., the corresponding present bit is
zero in the page table), an exception would get generated. This
exception is also called a page fault. The corresponding OS
exception handler (also called the page fault handler) would
get to execute and would likely kill the process. With
demand paging however, the page fault handler will additionally
check if this address is logically present (but physically not
present due to the demand paging optimization), and if so, will
load it from the disk to physical memory on demand.
After the OS loads the page from disk to physical memory, it creates a mapping in the page table, such that a page fault is not generated again on that address, and restarts the faulting instruction. Notice that in doing so, the OS assumes that it is safe to restart a faulting instruction. On the x86 platform, the hardware guarantees that if an instruction causes an exception, the machine state is left unmodified (as though the instruction never executed) before transferring control to the exception handler. This property of the hardware is called precise exceptions. If the hardware guarantees precise exceptions, it is safe for the OS to return control to the faulting instruction, thus causing it to execute again without changing the semantics of the process logic.
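The whole fault-load-restart flow can be simulated in user space (all names here are hypothetical, and the retry in software stands in for the hardware restarting the faulting instruction under precise exceptions):

```c
#include <assert.h>
#include <string.h>

/* User-space simulation of demand paging (illustrative names).
 * A page table entry is either present, or records where on
 * "disk" the page lives. */
#define NPAGES 4
#define PGSIZE 8

char disk[NPAGES][PGSIZE];   /* backing store */
char mem[NPAGES][PGSIZE];    /* "physical" memory */

struct pte { int present; int disk_slot; } pagetable[NPAGES];

int faults = 0;              /* counts page faults taken */

/* Page fault handler: copy the page in from disk and map it,
 * so the same address will not fault again. */
void handle_fault(int vpn) {
    faults++;
    memcpy(mem[vpn], disk[pagetable[vpn].disk_slot], PGSIZE);
    pagetable[vpn].present = 1;
}

/* Access a byte of virtual page vpn; after a fault, the access is
 * "restarted", as precise exceptions let real hardware rerun the
 * faulting instruction unchanged. */
char access_byte(int vpn, int off) {
    if (!pagetable[vpn].present)
        handle_fault(vpn);   /* fault: load the page on demand */
    return mem[vpn][off];    /* restarted access now succeeds */
}
```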