In this lab, you will write a cooperative green threads library, chloros (a Greek word meaning “green”) . You will be programming in C and x86-64 assembly.
Greens threads are similar to the type of threads you might be familiar with,
OS threads, but are implemented entirely in user-level code. They are also known
as user-level threads. They are typically speedier than OS threads since there
is no context switching into/out of the kernel. Cooperative green threads are
green threads that release control to the scheduler deliberately, usually
through a yield()
call. That is, they execute for as long as they want.
A user of your chloros library first calls grn_init()
to initialize the first
green thread, one representing the originally executing program. Then, the user
calls grn_spawn(user_fn)
to create a new thread that executes the user_fn
.
(Note that in a production thread library, grn_spawn
would take an extra void
*
argument to pass to user_fn
so as to spawn different threads running the
same function on different arguments.) The user calls grn_yield()
inside a
thread to yield execution to another thread, including the original. Finally,
the user might call grn_wait()
to wait for all of the spawned threads to
finish executing before ending the process.
A full program using your library might look like this:
#include "chloros.h"
extern void have_work();
extern void waiting();
extern void do_work();
void user_fn() {
while (have_work()) {
while (waiting()) {
grn_yield();
}
do_work();
}
}
int main() {
grn_init(false);
for (int i = 0; i < 10; ++i) {
grn_spawn(user_fn);
}
grn_wait();
grn_exit();
}
This lab is divided into 5 phases and a 6th (optional) extra credit phase. The first 4 phases will guide you towards implementing cooperative green threads. Once you have completed phase 4, the example program above will run as you expect. In phase 5, you will implement simple thread garbage collection so that resources aren’t leaked after a thread terminates. In the optional extra credit phase, you will add preemption support to your library.
There are unit tests corresponding to each phase, allowing you to mark your progress as you work through the lab. You may find the list of references to the right (or below if on mobile) useful as your proceed.
First, ensure that you are working on the lab using a compatible machine. For this lab, compatible means that the machine meets the following requirements:
arch
)If you don’t have a machine that meets these requirements, you can work on the
lab through corn.stanford.edu
. If you’re not familiar with corn
, see
Stanford’s notes on its shared computing
environment.
To begin, clone the lab 1 skeleton git repository to your development machine:
git clone https://web.stanford.edu/class/cs240/lab1-skeleton.git lab1
The majority, if not all of your work will be to modify the files found in the
src/
directory of the skeleton code:
src/main.c
contains the high-level green-thread library functionssrc/thread.c
contains thread management functions: allocating and
destroying threads, and thread list managementsrc/context_switch.S
contains the assembly routines that implement thread
context switchingThe include/
directory contains the header files for these src/
files where
applicable and are named like their src/
counterpart. There are two
exceptions: 1) include/chloros.h
, which exports the high-level library
routines for use by user binaries that link to your library, and
2) include/utils.h
, which defines the following helpful macros (among others):
debug(...)
like printf
, but only prints if there is a #define DEBUG
at
the top of the fileassert(cond)
checks that cond
evaluates to true. If it does not, prints
an error message and terminates the programerr_exit(...)
like printf
but terminates the program after printing the
error messageFinally, the test/
directory contains the unit testing code. The tests
themselves are in test/src/phase{n}_tests.c
. If a test you expect to pass
fails, you can look at its test file to see why.
You will use make
to build, test, and create a submission file for this lab.
The following targets are defined in the Makefile
:
all
: compiles and packages your librarytest
: runs the unit tests against your librarysubmission
: creates the lab1.tar.gz
submission fileclean
: deletes files that can be remade - useful when changing headersCalling make
in your shell will implicitly run the all
target. The other
targets can be invoked using make {target}
. Ensure that you are in the root of
your lab directory before invoking make
. Try calling make test
now. You
should see five failing test suites. If this doesn’t work, your machine may not
meet the requirements.
Before you continue, read through the src/
files and some of the unit test
files (test/src/phase{n}_tests.c
) to see how the functions you will write are
intended to be used and what we will test for. Also pay attention to the code
style: how variables are named, how code is aligned, and how spacing is used.
You should write your code with an equivalent style. Once you have a good grasp
of the chloros thread interface and functions, you’re ready to start writing
code.
In phase 1, you will implement the functions marked FIXME
in src/thread.c
.
The grn_thread
structure, defined in include/chloros.h
, contains all of the
information needed to context switch into a thread. Specifically, it contains a
thread’s state (context) and metadata the scheduler uses to make decisions. We
keep track of the threads in the system via a linked list: the next
and prev
pointers in the grn_thread
structure point to the next and previous elements
of the linked list, respectively, and the threads
field of the global STATE
(declared in include/main.h
and defined in src/main.c
) holds a pointer to
the head of the list. Examine and ensure that you understand the thread linked
list manipulation functions we have implemented for your use in src/thread.c
:
add_thread
, remove_thread
, and next_thread
. We’ve also provided the
atomic_next_id
function which simply returns a strictly increasing counter.
Your task is to implement the functions that allocate and destroy (deallocate)
grn_thread
structures for use by the rest of the system: grn_new_thread
and
grn_destroy_thread
. You can find the functions’ specifications directly above
their definition. Once you have done this, you should pass the phase 1 unit
tests.
In phase 2, you will write the code to context switch from one thread to another
in the assembly function marked FIXME
in src/context_switch.S
.
The context_switch
function will be called on behalf of a green thread during
its execution to context switch from one thread to another. For a context switch
to be successful, all state of the currently executing thread must be saved in
some structure and the state of the thread being switched into must be restored.
A thread’s state is also referred to as its context.
A thread’s context is already defined in the grn_context
structure in
include/chloros.h
as seven 64-bit registers: rsp
, the stack pointer, rbp
,
the base pointer, and rbx
, r12
, r13
, r14
, and r15
, general purpose
registers. All of these registers are known as callee saved, which means that
the callee of a function guarantees that those registers will have the same
value when the function returns as they did when the function was called. In
other words, a function A calling another function B can expect those
registers to contain the same values after function B returns as they had
when function B was called because the callee (B) saves and restore
those values. These registers are defined this way by the System V X64
ABI
calling convention, which is followed on most Unix systems running on 64-bit
machines. You can read more at the ABI reference link.
Where are the rest of the registers?
A keen reader might wonder why a thread's context only contains the callee-saved registers when there are more general purpose 64-bit registers. The answer relies on the notion of `caller-saved` registers, which are registers that a function caller saves before calling a function. The registers that don't appear in `grn_context` are caller-saved registers. Since the `context_switch` function will be _called_, the compiler will emit instructions to save and restore those registers, so no extra work is needed to maintain that state, and the `callee-saved` registers suffice as our register state.
Apart from these registers, a thread’s full context also includes the values in
its stack. Saving and restoring a thread’s stack on each context switch would be
a very expensive operation, so instead of doing this, we simply give each thread
its own unique stack. Then, as long as each thread’s stack pointer points to its
own unique stack, saving and restoring the stack pointer suffices to save and
restore the thread’s stack. Later on, you’ll write the code that ensures that
each thread’s stack pointer points to its own unique stack (which you’ve already
allocated in grn_new_thread
). The context_switch
function should only be
called on already running threads, so the validity and uniqueness of the stack
pointer can be assumed.
You’re ready to implement the context switching assembly function in
src/context_switch.S
. You can find the function’s specification above its
definition. Keep in mind that according to the calling convention, the first two
parameters to a function are passed in the rdi
and rsi
registers. Also note
that GCC calls the GNU Assembler implicitly, which uses the GAS syntax for
assembly. You may wish to consult the calling convention in the ABI reference or
the X86 instruction reference, both of which are linked in the column to the
right. After you have implemented this routine, you should pass the phase 2 unit
tests.
In this phase, you will implement grn_yield
in src/main.c
, the main thread
scheduling function.
When a thread wants to release control to the scheduler, it yield
s its
execution time by calling grn_yield()
. The function’s task is to find and
execute an available thread. It is a scheduler: it makes decisions about which
thread to schedule when and does the necessary bookkeeping to continue
scheduling threads reliably in the future.
Each thread in chloros is marked with a status of READY
, RUNNING
, WAITING
,
or ZOMBIE
. A thread is READY
when it can be executed, RUNNING
when it is
executing, WAITING
when some processing needs to happen before it is READY
,
and ZOMBIE
when a thread has finished executing but hasn’t has its resources
freed and destroyed.
Your grn_yield()
function should use and modify these statuses for scheduling.
It should implement a round-robin scheduler, performing a linear search for the
next READY
thread beginning at the currently executing thread,
STATE.current
. Once it finds a thread, the status of the currently executing
thread is set to READY
if it was previous RUNNING
, and the newly found
thread’s status is set to RUNNING
. The pointer to the currently executing
thread, STATE.current
, is updated. Finally, the function context_switch
es
into the newly found thread.
Your task is to implement the grn_yield
function. The full specification is
above its definition. After you have implemented this routine, you should pass
the phase 3 unit tests.
In this phase, you will implement grn_spawn
in src/main.c
, the new thread
initializer. After implementing this function, you will have implemented
cooperative green threads.
A thread created by a user executes a function of the user’s choice. The
grn_spawn
function accomplishes exactly this: it allocates a new thread,
initializes its context so that the user’s function is executed after a context
switch, and finally calls grn_yield
to yield the caller’s execution to
(potentially) the newly initialized thread.
When a thread is allocated by your grn_new_thread
function, it has an empty
context, so context switching into it would likely crash the process. Your task
is to write grn_spawn
to set up values in the stack so that after the first
context_switch
into the thread, the thread begins executing the start_thread
routine, a thread start function written in assembly we have provided to you.
start_thread
expects the user’s fn
to be at the top of the stack when it is
called. Thus, you must design the initial stack so that during a given thread’s
first context_switch
, context_switch
returns to start_thread
, and
start_thread
finds fn
at the top of the stack.
The implementation of this function requires you to write values into the thread’s initial stack and then set the stack pointer appropriately. Note that the x64 ABI declares mandates the stock pointer to be 16-byte aligned. The full function specification is above its definition. After you have implemented this routine, you should pass the phase 4 unit tests.
In this phase, you will implement grn_gc
in src/main.c
, a simple thread
garbage collection routine.
Upon thread termination, the thread’s initial function returns to
start_thread
, and start_thread
calls grn_exit
to terminate the thread.
grn_exit
is a simple function: it simply sets the thread’s status to ZOMBIE
so that grn_yield
doesn’t schedule it again.
Your task is to ensure that a thread’s resources are freed after it has exited.
You must implement grn_gc
, which finds all ZOMBIE
d threads and frees their
resources. Then, insert a call to grn_gc
in an appropriate location in your
library. Note that you have already written grn_destroy_thread
: grn_gc
should call this function appropriately. The full function specification is
above its definition. After you have implemented this routine, you should pass
the phase 5 unit tests.
Once you’ve completed the tasks above, you’re done and ready to submit! Ensure
that your lab and tests run as you expect them to on corn.stanford.edu
. We
will grade your lab on the FarmShare machines. Any changes made to files outside
of the src/
or include/
directories will not be submitted with your lab.
Ensure that your submission is not dependent on those changes.
Finally, run make submission
and then proceed to the submission
page to upload your
submission.
This phase is entirely optional and lack of completion will have no adverse effects on your score. Successful completion of this phase will result in a 25% boost to your score. You must successfully complete all phases to be awarded the extra credit.
In this extra credit phase, you will add preemption to your green threads
library. A preemptive thread is a thread that is interrupted by the scheduler
when the scheduler sees fit; the thread has no control over when it yields
execution to other threads. You will likely need to modify grn_init
to handle
the true
case for the preempt
parameter as well as add new functions and
modify previous functions. To test this phase, uncomment the line underneath the
TODO
in test/src/test.c
.
A preemptive scheduler works by setting a timer for some short duration and receiving an interrupt when that timer expires, transferring control to the scheduler without the thread’s intervention. Once the scheduler has control, it can decide to switch execution to another thread.
In an OS, the kernel sets up a timer interrupt using either an external
Programmable Interrupt Timer (such as the intel
8254) or a modern
CPU’s internal APIC. In user-level code, we
don’t have direct access to hardware, so we must use the mechanisms exposed to
us by the underlying OS. Thankfully, POSIX has provided us with timers which
deliver a signal to the timer-setting process when the timer has expired. You
can think of signals as user-level interrupts. You should consult the
signal(7)
man page reference for further important information on signals.
The setitimer
function, also referenced, allows you to set up a timer to
signal the calling process after it expires. To catch the signal, you must
register a signal handler using the sigaction
function. The following is
pseudocode for setting up a timer and signal handler:
void setup_timer() {
struct itimerval itimer = { /* some timer interval */ };
const struct sigaction timeout_action;
timeout_action.sa_handler = /* your handler function */;
sigemptyset(&timeout_action.sa_mask);
timeout_action.sa_flags = /* your flags */;
sigaction(/* EXPECTED_SIGNAL */, &timeout_action, NULL);
if (setitimer(/* REQUESTED_TIMER */, &itimer, NULL)) {
err_exit("setitimer failed: %s\n", strerror(errno));
}
}
Your task is to setup a timer immediately after a thread has been scheduled,
catch the resulting signal, and yield
the interrupted thread. If a new thread
is scheduled as a result of the yield
, a new timer should be setup (which may
be done automatically depending on how you initially called setitimer
).
Be aware that context switching out of a signal handler means that technically the signal handler has not returned. Hence, you have to be wary of some of the particularities of Unix signals. Specifically:
By default, a signal is blocked or masked while it is being handled and only
unmasked when the signal handler returns. If you yield while the timer signal
is masked, the next thread will not be preempted by a signal. You may wish to
address this through use of the SA_NODEFER
flag to
sigaction
, or
through use of
sigprocmask(2)
.
Many functions in the C library are not reentrant, a good example being
malloc
. You cannot call these functions from within a signal handler. The
signal-safety(7)
man page enumerates the set of all async-signal-safe functions that are safe
to call from within a signal handler. Note that since threads are preempted
within signal handlers, you may need to mask interrupts or hold a lock for the
duration of these functions. If needed, you can wrap library functions using C
preprocessor macros (e.g., #define malloc chloros_malloc
), but are only
required to do so for functions actually used in the lab.
Across a call to grn_yield()
, the compiler naturally saves any live
caller-saved floating point registers to the stack, just like caller-saved
integer registers. With signals, however, the C language spec does not
require implementations
to save and restore floating point registers across handlers. As it happens,
Linux saves and restores floating point registers anyway on
x86. As a rule, this
will not be the case for all OSes and architectures, however. For the
purposes of this lab, you can assume threads do not use floating point as
long as you don’t use it yourself.
After successful completion of this phase, you should pass the phase 6 unit
tests. To receive the extra credit, please add a file named extra_credit.txt
inside the src/
directory explaining your implementation. Then, resubmit.
Green Threads (Wikipedia)
Wikipedia's description of green threads. Note that a VM is not a requirement for green threads.
User-Level vs. Kernel-Level Threads
Slides from an operating systems lecture at Texas A&M discussing the differences between kernel-level and user-level threads.
System V X64 ABI
Details the calling conventions, register usage, and much more for System V X64 binaries.
X86 Assembly Guide
A concise explanation and reference for X86 instructions. Note that for 64-bit, you'll append 'q' for 'quad-word' to data transfer instructions.
Quick Guide to GDB
A quick-and-dirty guide meant to get you started with the GNU Debugger. A must-read for beginners.
Guide to objdump w/Examples
A quick guide with many examples on how to use objdump to display information about object files.
signal (man) Page
An overview of signals straight from the Linux manual.
sititimer (man) Page
The manual page for the `sititimer` function which can be used to set an interval timer.