System Calls and Threads

Outline

System Calls

Case study: Unix/xv6 shell (simplified)

Threads

Figure on process address space: code, static data, stack, heap. On fork, the whole address space gets replicated. On thread create, the created thread gets its own program counter, registers, and stack (reached through its own stack pointer); everything else is shared among the threads of a process.
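To make the sharing concrete, here is a minimal sketch (not part of the notes; it assumes POSIX fork() and pthreads and is compiled with gcc -pthread). A global updated by a created thread is visible to its creator, while the same update made in a forked child stays in the child's private copy of the address space.

    /* Global variable: shared between threads, copied across fork(). */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int counter = 0;                  /* lives in static data */

    void *thread_fn(void *arg) {
        counter = 100;                /* same address space as the creator */
        return NULL;
    }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, thread_fn, NULL);
        pthread_join(t, NULL);
        printf("after thread: counter = %d\n", counter);   /* prints 100 */

        if (fork() == 0) {            /* child writes its own copy */
            counter = 200;
            _exit(0);
        }
        wait(NULL);
        printf("after fork:   counter = %d\n", counter);   /* still 100 */
        return 0;
    }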

Kernel-level threads are essentially processes minus the separate address space. Discuss the kernel scheduler, which is invoked at every timer interrupt. Each thread is an independent schedulable entity as far as the kernel is concerned.

Write the cswitch function for processes and threads. Notice that switching among threads requires no privileged operations: saving and restoring registers and switching the stack (by changing the sp register) can all be done in user mode. A cswitch needs to be fast (at most a few microseconds).
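As a point of comparison, here is a minimal user-space sketch of cswitch (the name is illustrative) built on the POSIX ucontext API: swapcontext() saves the current registers and stack pointer into one context structure and loads another, with no privileged instruction involved. A kernel cswitch between processes additionally has to switch the address space, which does require privilege.

    /* Sketch: user-space cswitch using the POSIX ucontext API. */
    #include <ucontext.h>

    void cswitch(ucontext_t *from, ucontext_t *to) {
        /* Save registers and sp into *from, load them from *to.
         * Returns only when some later cswitch switches back to *from. */
        swapcontext(from, to);
    }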

Advantages of threads over processes

User-level threads can be implemented inside a process by writing scheduler() and cswitch() functions. The scheduler can be invoked periodically using the SIGALRM signal, as in the sketch below.
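Here is a minimal sketch of that idea (not a robust library: the thread bodies, stack size, and 10 ms quantum are illustrative, and a real implementation has to be far more careful about what it does inside a signal handler). A periodic timer delivers SIGALRM, the handler acts as scheduler() and performs a round-robin cswitch using swapcontext(), and makecontext() sets up the second thread's private stack.

    /* Sketch: two user-level threads preempted via SIGALRM. */
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include <ucontext.h>

    #define NTHREADS 2
    #define STACKSZ  (64 * 1024)

    static ucontext_t ctx[NTHREADS];
    static int current = 0;

    static void scheduler(int sig) {            /* SIGALRM handler */
        int prev = current;
        current = (current + 1) % NTHREADS;     /* round-robin */
        swapcontext(&ctx[prev], &ctx[current]); /* user-level cswitch */
    }

    static void worker(int id) {
        for (;;) {
            printf("thread %d running\n", id);
            for (volatile long i = 0; i < 50000000; i++) ;   /* burn CPU */
        }
    }

    int main(void) {
        /* Thread 1 gets its own stack; main() itself acts as thread 0. */
        getcontext(&ctx[1]);
        ctx[1].uc_stack.ss_sp = malloc(STACKSZ);
        ctx[1].uc_stack.ss_size = STACKSZ;
        ctx[1].uc_link = NULL;                  /* worker never returns */
        makecontext(&ctx[1], (void (*)(void))worker, 1, 1);

        signal(SIGALRM, scheduler);             /* periodic scheduler entry */
        struct itimerval quantum = { {0, 10000}, {0, 10000} };   /* 10 ms */
        setitimer(ITIMER_REAL, &quantum, NULL);

        worker(0);
        return 0;
    }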

Pros of user-level threads:

Cons of user-level threads:

Threading models (slide)

Thread pools (slide)

Concurrency and Synchronization

But if multiple threads share a memory location, programs that look perfectly fine sequentially can start failing. For example, consider a shared static variable hits and the shared code

   hits = hits + 1

Assume this statement gets compiled into the following assembly code:

  ld   [hits], R
  add  R, R, 1
  st   R, [hits]

Here hits represents a memory location and [hits] represents the contents of that memory location, so ld [hits], R loads the contents of hits into register R.

If two threads try to execute this code simultaneously, what can happen? Assume hits = 10 before the threads start. Which interleavings leave hits = 12? Which interleavings leave hits = 11?

This is a problem associated with concurrent access to shared data, i.e., a concurrency problem. It is also called a race condition (perhaps because the two threads are in a race and the result depends on who wins).

Concurrency could be "logical concurrency" on a single CPU due to possibly arbitrary interruptions (or preemptions) caused by the timer interrupt, or could be "physical concurrency" due to threads executing simultaneously on different CPUs. The nature of the problem in both cases is identical.
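To see the race on a real machine, here is a minimal sketch using pthreads (not from the notes; compile with gcc -pthread and without optimization so that hits actually goes through memory). Two threads each increment the shared counter a million times; the final value is usually well below two million because increments are lost in bad interleavings.

    /* Sketch: the hits race with two pthreads. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    int hits = 0;                         /* shared static variable */

    void *worker(void *arg) {
        for (int i = 0; i < N; i++)
            hits = hits + 1;              /* the ld / add / st sequence */
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("hits = %d (expected %d)\n", hits, 2 * N);
        return 0;
    }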

Another example (i is a variable shared by both threads):

  Thread A:
     i = 0;
     while (i < 10)
         i = i + 1;
     print "A won!";

  Thread B:
     i = 0;
     while (i > -10)
         i = i - 1;
     print "B won!";
Who wins? Is it guaranteed that someone wins? What if both threads run in parallel on CPUs of identical speed? Is the race then guaranteed to go on forever?

Another example: a push() operation on a shared stack, where n counts the elements currently on the stack:

    push(v):
        if (n == stack_size)        // stack already full
                return full;
        stack[n] = v;               // write the new element
        n = n + 1;                  // bump the count (not atomic with the write above)
Some bad schedules? Some that will work?

One way to deal with the problem is to enforce atomicity of certain code regions. In the hits example, we wanted the three instructions to execute atomically with respect to the other thread's execution of the same instructions. The code region that needs to be made atomic for the program to be correct is also called the critical section.

Locks are one way of implementing atomic regions. A lock L is a shared variable that provides two methods, acquire(L) and release(L). Any lock L can be held by at most one thread at a time; another thread can acquire it only after its holder has released it. Any thread trying to acquire a lock that is already held by another thread must wait for it to be released.

Locks allow implementation of atomic regions by bracketing the critical section with calls to acquire() and release(). For our hits example, the new code will be:

   lock hits_lock;    //shared lock
   ...
   acquire(hits_lock);
   hits = hits + 1;
   release(hits_lock);
What happens when one thread is inside the critical section and another thread tries to enter it at the same time? What are the possible interleavings now? Are all possible interleavings correct?
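For completeness, here is the earlier race experiment with the critical section bracketed by a lock; pthread_mutex_lock() and pthread_mutex_unlock() play the role of acquire(hits_lock) and release(hits_lock) (a sketch, again compiled with gcc -pthread). The final count now always comes out to two million.

    /* Sketch: the hits loop with the increment inside a critical section. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1000000
    int hits = 0;
    pthread_mutex_t hits_lock = PTHREAD_MUTEX_INITIALIZER;   /* shared lock */

    void *worker(void *arg) {
        for (int i = 0; i < N; i++) {
            pthread_mutex_lock(&hits_lock);      /* acquire */
            hits = hits + 1;                     /* critical section */
            pthread_mutex_unlock(&hits_lock);    /* release */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("hits = %d\n", hits);             /* prints 2000000 */
        return 0;
    }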