Lab 1 Cooperative Green Threads

Due: Wednesday April 19, 2017 11:59PM

Overview

In this lab, you will write a cooperative green threads library, chloros (a Greek word meaning “green”) . You will be programming in C and x86-64 assembly.

Greens threads are similar to the type of threads you might be familiar with, OS threads, but are implemented entirely in user-level code. They are also known as user-level threads. They are typically speedier than OS threads since there is no context switching into/out of the kernel. Cooperative green threads are green threads that release control to the scheduler deliberately, usually through a yield() call. That is, they execute for as long as they want.

A user of your chloros library first calls grn_init() to initialize the first green thread, one representing the originally executing program. Then, the user calls grn_spawn(user_fn) to create a new thread that executes the user_fn. (Note that in a production thread library, grn_spawn would take an extra void * argument to pass to user_fn so as to spawn different threads running the same function on different arguments.) The user calls grn_yield() inside a thread to yield execution to another thread, including the original. Finally, the user might call grn_wait() to wait for all of the spawned threads to finish executing before ending the process.

A full program using your library might look like this:

#include "chloros.h"

extern void have_work();
extern void waiting();
extern void do_work();

void user_fn() {
  while (have_work()) {
    while (waiting()) {
      grn_yield();
    }

    do_work();
  }
}

int main() {
  grn_init(false);
  for (int i = 0; i < 10; ++i) {
    grn_spawn(user_fn);
  }

  grn_wait();
  grn_exit();
}

This lab is divided into 5 phases and a 6th (optional) extra credit phase. The first 4 phases will guide you towards implementing cooperative green threads. Once you have completed phase 4, the example program above will run as you expect. In phase 5, you will implement simple thread garbage collection so that resources aren’t leaked after a thread terminates. In the optional extra credit phase, you will add preemption support to your library.

There are unit tests corresponding to each phase, allowing you to mark your progress as you work through the lab. You may find the list of references to the right (or below if on mobile) useful as your proceed.

Phase 0: Getting Started

First, ensure that you are working on the lab using a compatible machine. For this lab, compatible means that the machine meets the following requirements:

Runs a modern Unix: Linux, BSD, or OS X
Runs a 64-bit variant of the OS (check with arch)
Has GCC or Clang and GNU Make installed

If you don’t have a machine that meets these requirements, you can work on the lab through corn.stanford.edu. If you’re not familiar with corn, see Stanford’s notes on its shared computing environment.

Getting the skeleton code

To begin, clone the lab 1 skeleton git repository to your development machine:

git clone https://web.stanford.edu/class/cs240/lab1-skeleton.git lab1

Exploring the skeleton code

The majority, if not all of your work will be to modify the files found in the src/ directory of the skeleton code:

src/main.c contains the high-level green-thread library functions
src/thread.c contains thread management functions: allocating and destroying threads, and thread list management
src/context_switch.S contains the assembly routines that implement thread context switching

The include/ directory contains the header files for these src/ files where applicable and are named like their src/ counterpart. There are two exceptions: 1) include/chloros.h, which exports the high-level library routines for use by user binaries that link to your library, and 2) include/utils.h, which defines the following helpful macros (among others):

debug(...) like printf, but only prints if there is a #define DEBUG at the top of the file
assert(cond) checks that cond evaluates to true. If it does not, prints an error message and terminates the program
err_exit(...) like printf but terminates the program after printing the error message

Finally, the test/ directory contains the unit testing code. The tests themselves are in test/src/phase{n}_tests.c. If a test you expect to pass fails, you can look at its test file to see why.

Make targets

You will use make to build, test, and create a submission file for this lab. The following targets are defined in the Makefile:

all: compiles and packages your library
test: runs the unit tests against your library
submission: creates the lab1.tar.gz submission file
clean: deletes files that can be remade - useful when changing headers

Calling make in your shell will implicitly run the all target. The other targets can be invoked using make {target}. Ensure that you are in the root of your lab directory before invoking make. Try calling make test now. You should see five failing test suites. If this doesn’t work, your machine may not meet the requirements.

Getting familiar

Before you continue, read through the src/ files and some of the unit test files (test/src/phase{n}_tests.c) to see how the functions you will write are intended to be used and what we will test for. Also pay attention to the code style: how variables are named, how code is aligned, and how spacing is used. You should write your code with an equivalent style. Once you have a good grasp of the chloros thread interface and functions, you’re ready to start writing code.

Phase 1: Allocating and Deallocating Threads

In phase 1, you will implement the functions marked FIXME in src/thread.c.

The grn_thread structure, defined in include/chloros.h, contains all of the information needed to context switch into a thread. Specifically, it contains a thread’s state (context) and metadata the scheduler uses to make decisions. We keep track of the threads in the system via a linked list: the next and prev pointers in the grn_thread structure point to the next and previous elements of the linked list, respectively, and the threads field of the global STATE (declared in include/main.h and defined in src/main.c) holds a pointer to the head of the list. Examine and ensure that you understand the thread linked list manipulation functions we have implemented for your use in src/thread.c: add_thread, remove_thread, and next_thread. We’ve also provided the atomic_next_id function which simply returns a strictly increasing counter.

Your task is to implement the functions that allocate and destroy (deallocate) grn_thread structures for use by the rest of the system: grn_new_thread and grn_destroy_thread. You can find the functions’ specifications directly above their definition. Once you have done this, you should pass the phase 1 unit tests.

Phase 2: Context Switching

In phase 2, you will write the code to context switch from one thread to another in the assembly function marked FIXME in src/context_switch.S.

The context_switch function will be called on behalf of a green thread during its execution to context switch from one thread to another. For a context switch to be successful, all state of the currently executing thread must be saved in some structure and the state of the thread being switched into must be restored. A thread’s state is also referred to as its context.

A thread’s context is already defined in the grn_context structure in include/chloros.h as seven 64-bit registers: rsp, the stack pointer, rbp, the base pointer, and rbx, r12, r13, r14, and r15, general purpose registers. All of these registers are known as callee saved, which means that the callee of a function guarantees that those registers will have the same value when the function returns as they did when the function was called. In other words, a function A calling another function B can expect those registers to contain the same values after function B returns as they had when function B was called because the callee (B) saves and restore those values. These registers are defined this way by the System V X64 ABI calling convention, which is followed on most Unix systems running on 64-bit machines. You can read more at the ABI reference link.

Where are the rest of the registers?

A keen reader might wonder why a thread's context only contains the callee-saved registers when there are more general purpose 64-bit registers. The answer relies on the notion of `caller-saved` registers, which are registers that a function caller saves before calling a function. The registers that don't appear in `grn_context` are caller-saved registers. Since the `context_switch` function will be _called_, the compiler will emit instructions to save and restore those registers, so no extra work is needed to maintain that state, and the `callee-saved` registers suffice as our register state.

Apart from these registers, a thread’s full context also includes the values in its stack. Saving and restoring a thread’s stack on each context switch would be a very expensive operation, so instead of doing this, we simply give each thread its own unique stack. Then, as long as each thread’s stack pointer points to its own unique stack, saving and restoring the stack pointer suffices to save and restore the thread’s stack. Later on, you’ll write the code that ensures that each thread’s stack pointer points to its own unique stack (which you’ve already allocated in grn_new_thread). The context_switch function should only be called on already running threads, so the validity and uniqueness of the stack pointer can be assumed.

You’re ready to implement the context switching assembly function in src/context_switch.S. You can find the function’s specification above its definition. Keep in mind that according to the calling convention, the first two parameters to a function are passed in the rdi and rsi registers. Also note that GCC calls the GNU Assembler implicitly, which uses the GAS syntax for assembly. You may wish to consult the calling convention in the ABI reference or the X86 instruction reference, both of which are linked in the column to the right. After you have implemented this routine, you should pass the phase 2 unit tests.

Hint: You can implement `grn_context_switch` using only two different assembly instructions: `mov` and `ret`.

Phase 3: Yield

In this phase, you will implement grn_yield in src/main.c, the main thread scheduling function.

When a thread wants to release control to the scheduler, it yields its execution time by calling grn_yield(). The function’s task is to find and execute an available thread. It is a scheduler: it makes decisions about which thread to schedule when and does the necessary bookkeeping to continue scheduling threads reliably in the future.

Each thread in chloros is marked with a status of READY, RUNNING, WAITING, or ZOMBIE. A thread is READY when it can be executed, RUNNING when it is executing, WAITING when some processing needs to happen before it is READY, and ZOMBIE when a thread has finished executing but hasn’t has its resources freed and destroyed.

Your grn_yield() function should use and modify these statuses for scheduling. It should implement a round-robin scheduler, performing a linear search for the next READY thread beginning at the currently executing thread, STATE.current. Once it finds a thread, the status of the currently executing thread is set to READY if it was previous RUNNING, and the newly found thread’s status is set to RUNNING. The pointer to the currently executing thread, STATE.current, is updated. Finally, the function context_switches into the newly found thread.

Your task is to implement the grn_yield function. The full specification is above its definition. After you have implemented this routine, you should pass the phase 3 unit tests.

Hint: Use the `next_thread` function for a looping linear search.

Warning: Yield's unit tests are not exhaustive. A later phase's unit tests might reveal a bug in this phase.

Warning: The unit tests assume a round-robin scheduler! Ensure your scheduling is round-robin.

Phase 4: Spawn

In this phase, you will implement grn_spawn in src/main.c, the new thread initializer. After implementing this function, you will have implemented cooperative green threads.

A thread created by a user executes a function of the user’s choice. The grn_spawn function accomplishes exactly this: it allocates a new thread, initializes its context so that the user’s function is executed after a context switch, and finally calls grn_yield to yield the caller’s execution to (potentially) the newly initialized thread.

When a thread is allocated by your grn_new_thread function, it has an empty context, so context switching into it would likely crash the process. Your task is to write grn_spawn to set up values in the stack so that after the first context_switch into the thread, the thread begins executing the start_thread routine, a thread start function written in assembly we have provided to you. start_thread expects the user’s fn to be at the top of the stack when it is called. Thus, you must design the initial stack so that during a given thread’s first context_switch, context_switch returns to start_thread, and start_thread finds fn at the top of the stack.

The implementation of this function requires you to write values into the thread’s initial stack and then set the stack pointer appropriately. Note that the x64 ABI declares mandates the stock pointer to be 16-byte aligned. The full function specification is above its definition. After you have implemented this routine, you should pass the phase 4 unit tests.

Hint: The only register needing initialization is %rsp.

Phase 5: Garbage Collection

In this phase, you will implement grn_gc in src/main.c, a simple thread garbage collection routine.

Upon thread termination, the thread’s initial function returns to start_thread, and start_thread calls grn_exit to terminate the thread. grn_exit is a simple function: it simply sets the thread’s status to ZOMBIE so that grn_yield doesn’t schedule it again.

Your task is to ensure that a thread’s resources are freed after it has exited. You must implement grn_gc, which finds all ZOMBIEd threads and frees their resources. Then, insert a call to grn_gc in an appropriate location in your library. Note that you have already written grn_destroy_thread: grn_gc should call this function appropriately. The full function specification is above its definition. After you have implemented this routine, you should pass the phase 5 unit tests.

Hint: When is it and when is it not safe to call `grn_gc`?

Submission

Once you’ve completed the tasks above, you’re done and ready to submit! Ensure that your lab and tests run as you expect them to on corn.stanford.edu. We will grade your lab on the FarmShare machines. Any changes made to files outside of the src/ or include/ directories will not be submitted with your lab. Ensure that your submission is not dependent on those changes.

Finally, run make submission and then proceed to the submission page to upload your submission.

Phase 6: Preemption (Extra Credit)

This phase is entirely optional and lack of completion will have no adverse effects on your score. Successful completion of this phase will result in a 25% boost to your score. You must successfully complete all phases to be awarded the extra credit.

In this extra credit phase, you will add preemption to your green threads library. A preemptive thread is a thread that is interrupted by the scheduler when the scheduler sees fit; the thread has no control over when it yields execution to other threads. You will likely need to modify grn_init to handle the true case for the preempt parameter as well as add new functions and modify previous functions. To test this phase, uncomment the line underneath the TODO in test/src/test.c.

A preemptive scheduler works by setting a timer for some short duration and receiving an interrupt when that timer expires, transferring control to the scheduler without the thread’s intervention. Once the scheduler has control, it can decide to switch execution to another thread.

In an OS, the kernel sets up a timer interrupt using either an external Programmable Interrupt Timer (such as the intel 8254) or a modern CPU’s internal APIC. In user-level code, we don’t have direct access to hardware, so we must use the mechanisms exposed to us by the underlying OS. Thankfully, POSIX has provided us with timers which deliver a signal to the timer-setting process when the timer has expired. You can think of signals as user-level interrupts. You should consult the signal(7) man page reference for further important information on signals.

The setitimer function, also referenced, allows you to set up a timer to signal the calling process after it expires. To catch the signal, you must register a signal handler using the sigaction function. The following is pseudocode for setting up a timer and signal handler:

void setup_timer() {
  struct itimerval itimer = { /* some timer interval */ };
  const struct sigaction timeout_action;
  timeout_action.sa_handler = /* your handler function */;
  sigemptyset(&timeout_action.sa_mask);
  timeout_action.sa_flags = /* your flags */;
  sigaction(/* EXPECTED_SIGNAL */, &timeout_action, NULL);

  if (setitimer(/* REQUESTED_TIMER */, &itimer, NULL)) {
    err_exit("setitimer failed: %s\n", strerror(errno));
  }
}

Your task is to setup a timer immediately after a thread has been scheduled, catch the resulting signal, and yield the interrupted thread. If a new thread is scheduled as a result of the yield, a new timer should be setup (which may be done automatically depending on how you initially called setitimer).

Be aware that context switching out of a signal handler means that technically the signal handler has not returned. Hence, you have to be wary of some of the particularities of Unix signals. Specifically:

By default, a signal is blocked or masked while it is being handled and only unmasked when the signal handler returns. If you yield while the timer signal is masked, the next thread will not be preempted by a signal. You may wish to address this through use of the SA_NODEFER flag to sigaction, or through use of sigprocmask(2).
Many functions in the C library are not reentrant, a good example being malloc. You cannot call these functions from within a signal handler. The signal-safety(7) man page enumerates the set of all async-signal-safe functions that are safe to call from within a signal handler. Note that since threads are preempted within signal handlers, you may need to mask interrupts or hold a lock for the duration of these functions. If needed, you can wrap library functions using C preprocessor macros (e.g., #define malloc chloros_malloc), but are only required to do so for functions actually used in the lab.
Across a call to grn_yield(), the compiler naturally saves any live caller-saved floating point registers to the stack, just like caller-saved integer registers. With signals, however, the C language spec does not require implementations to save and restore floating point registers across handlers. As it happens, Linux saves and restores floating point registers anyway on x86. As a rule, this will not be the case for all OSes and architectures, however. For the purposes of this lab, you can assume threads do not use floating point as long as you don’t use it yourself.

After successful completion of this phase, you should pass the phase 6 unit tests. To receive the extra credit, please add a file named extra_credit.txt inside the src/ directory explaining your implementation. Then, resubmit.

References

Green Threads (Wikipedia)
Wikipedia's description of green threads. Note that a VM is not a requirement for green threads.

User-Level vs. Kernel-Level Threads
Slides from an operating systems lecture at Texas A&M discussing the differences between kernel-level and user-level threads.

System V X64 ABI
Details the calling conventions, register usage, and much more for System V X64 binaries.

X86 Assembly Guide
A concise explanation and reference for X86 instructions. Note that for 64-bit, you'll append 'q' for 'quad-word' to data transfer instructions.

Quick Guide to GDB
A quick-and-dirty guide meant to get you started with the GNU Debugger. A must-read for beginners.

Guide to objdump w/Examples
A quick guide with many examples on how to use objdump to display information about object files.

signal (man) Page
An overview of signals straight from the Linux manual.

sititimer (man) Page
The manual page for the `sititimer` function which can be used to set an interval timer.