As before, place the write-up to the questions and any challenge problems you do in a file called answers.txt (plain text) or answers.html (HTML format) in the top level of your source directory before handing in your work.
Your first task in this lab is to change the JOS kernel so that it does not always just run the environment in envs, but instead can alternate between multiple environments in "round-robin" fashion. Round-robin scheduling in JOS works as follows:
Implement round-robin scheduling in sched_yield()
as described above,
and implement the crucial register state saving code
in env_run(). Don't forget to modify
syscall() to dispatch sys_yield()
Hint: the symbol UTF defined in kern/trap.h may be useful.
Modify kern/init.c to create two (or more!) environments that all run the program user/yield.c. You should see the environments switch back and forth between each other five times before terminating, at which point the idle process runs and invokes the JOS kernel debugger. If this does not happen or the output looks wrong, then fix your code before proceeding.
env_run()you should have called
lcr3(). But before and after the call to
lcr3(), your code makes references (at least it should) to the variable
e--the argument to
env_run. How can this work?
Upon loading the
%cr3 register, the addressing context
used by the MMU is instantly changed. The problem is that a virtual
e) has meaning relative to a given
address context--the address context specifies the physical address to
which the virtual address map. Why can the pointer
dereferenced both before and after the addressing switch?
Add a less trivial scheduling policy to the kernel,
such as a fixed-priority scheduler that allows each environment
to be assigned a priority
and ensures that higher-priority environments
are always chosen in preference to lower-priority environments.
If you're feeling really adventurous,
try implementing a Unix-style adjustable-priority scheduler
or even a lottery or stride scheduler.
(Look up "lottery scheduling" and "stride scheduling" in Google.)
Write a test program or two that verifies that your scheduling algorithm is working correctly (i.e., the right environments get run in the right order). It may be easier to write these test programs once you have implemented fork() and IPC in parts B and C of this lab.
|Challenge! The JOS kernel currently does not allow applications to use the x86 processor's x87 floating-point unit (FPU), MMX instructions, or Streaming SIMD Extensions (SSE). Extend the Env structure to provide a save area for the processor's floating point state, and extend the context switching code to save and restore this state properly when switching from one environment to another. The FXSAVE and FXRSTOR instructions may be useful, but note that these are not in the old i386 user's manual because they were introduced in more recent processors. Write a user-level test program that does something cool with floating-point.|
Although your kernel is now capable of running and switching between multiple user-level environments, it is still limited to running environments that the kernel initially set up. You will now implement the necessary JOS system calls to allow user environments to create and start other new user environments.
Unix provides the
fork() system call
as its process creation primitive.
Unix fork() copies
the entire address space of calling process (the parent)
to create a new process (the child).
The only differences between the two observable from user space
are their process IDs and parent process IDs
(as returned by
In the parent,
fork() returns the child's process ID,
while in the child,
fork() returns 0.
The two processes do not share any memory: writes to one
process's memory do not appear in the other and vice versa.
In JOS we will provide a different, much more primitive set of system system calls for creating new user-mode environments. With these system calls we will be able to implement Unix-like fork() functionality entirely in user space, in addition to other types of environment creation functionality. The new system calls we will use in JOS are as follows:
sys_env_alloccall. The only difference is that in the parent,
sys_env_allocwill return the id of the newly created environment, but in the child it will return 0. (Since the child starts out marked as not runnable,
sys_env_allocwill not actually return in the child until the parent has explicitly allowed this by marking the child runnable.)
For all of the system calls above that accept environment IDs,
the JOS kernel supports the special convention
that a value of 0 effectively means "the current environment."
This convention is implemented by
We have provided a very primitive implementation of Unix-like fork() functionality in the test program user/dumbfork.c. This test program uses the above system calls to create and run a child environment with a copy of its own address space. The two environments then switch back and forth using sys_yield as in the previous exercise. The parent exits after 10 iterations, whereas the child exits after 20.
|Exercise 2. Implement the system calls described above in kern/syscall.c. You will need to use various functions in kern/pmap.c and kern/env.cc, particularly envid2env(). For now, whenever you call envid2env(), just pass 0 in the checkperm parameter. Be sure you check for any invalid system call arguments, returning -E_INVAL in that case. Test your JOS kernel with user/dumbfork and make sure it works before proceeding.|
|Challenge! Add the additional system calls necessary to read all of the vital state of an existing environment as well as set it up. Then implement a user mode program that forks off a child process, runs it for a while (e.g., a few iterations of sys_yield()), then takes a complete snaphost or checkpoint of the child process, runs the child for a while longer, and finally restores the child process to the state it was in at the checkpoint and continues it from there. Thus, you are effectively "replaying" the execution of the child process from an intermediate state. Make the child process perform some interaction with the user using sys_cgetc() or readline() so that the user can view and mutate its internal state, and verify that with your checkpoint/restart functionality you can give the child process a case of selective amnesia, making it "forget" everthing that happened beyond a certain point.|
An environment can only use the above system calls to manipulate environments as long as all of the environments affected are either:
- The environment making the system call.
- Immediate children of the environment making the system call.
|Exercise 3. Implement the above security convention by modifying the code in envid2env() appropriately, and by changing the above system calls to pass 1 for the checkperm parameter, thereby enabling this permission check.|
This completes Part A of the lab; hand it in using gmake handin as usual.
fork()system call as its primary process creation primitive. The fork() system call copies the address space of the calling process (the parent) to create a new process (the child).
Original versions of Unix implemented
fork() by copying
the parent's entire data segment into a new memory region allocated
for the child. This is essentially the same approach that our
dumbfork() function above takes. The copying of the parent's
address into the child is generally, by far, the most expensive part
of the fork() operation.
However, a call to fork() is frequently followed almost immediately by a call to exec() in the child process, which clears out the child process's address space and loads a new program in its place. (This is what the the shell typically does, for example.) In this case, most of the laborious work done in copying the parent's entire address space into the child is useless, because the child process will probably actually touch very little of its memory before calling exec() to execute a new program.
For this reason,
Later versions of Unix took advantage
of more advanced virtual memory hardware
to allow the parent and child to share
the memory mapped into their respective address spaces
until one of the processes actually modifies it.
This technique is known as copy-on-write.
To do this,
on fork() the kernel would
merely copy the address space mappings
from the parent to the child
instead of the actual contents of the mapped pages,
and at the same time mark the now-shared pages read-only.
When one of the two processes tries to write to one of these shared pages,
the process takes a page fault.
At this point, the Unix kernel realizes that the page
was really a "virtual" or "copy-on-write" copy,
and so it makes a new, private copy of the page for the faulting process.
In this way, the contents of individual pages aren't actually copied
until they are actually written to.
This optimization makes a
fork() followed by
exec() in the child much cheaper:
the child will probably only need to copy one page
(the current page of its stack)
before it calls
In the next piece of this lab, you will implement "proper"
fork() functionality with copy-on-write,
as a user space library routine.
Implementing fork() and copy-on-write support in user space
has the benefit that the kernel remains much simpler
and thus more likely to be correct.
It also lets individual user-mode programs
define their own semantics for
A program that wants a slightly different implementation
(for example, the expensive always-copy version like dumbfork(),
or one in which the parent and child actually share memory afterward)
can easily provide its own.
Most page faults encountered during normal program execution are usually fixable. For example, most Unix kernels initially map only a single page in a new process's stack region, and allocate and map additional stack pages later "on demand" as the process's stack consumption increases and causes page faults on stack pages that are not yet mapped. A typical Unix kernel must track, for each user-mode process, the various regions of memory that are mapped and what to do when faults happen in them. For example, a fault in the stack region will typically allocate and map in a new page. A fault in the program's BSS region will typically map in a new page and also make sure it is zeroed. In systems with demand-paged executables, a fault in the text region will read the corresponding page of the binary off of disk and then map it in.
This is a lot of information for the kernel to track and get right. Instead of taking the traditional Unix approach, in JOS we push this fault handling functionality into user space, where bugs are less damaging. This design has the added benefit of allowing programs great flexibility in defining their memory regions. Not only do we get the ability to implement copy-on-write fork(), but we will use user-level page fault handling later for mapping and accessing files on a disk-based file system.
|Exercise 4. Implement the sys_set_pgfault_entry system call. Be sure to enable permission checking when looking up the environment ID of the target environment, since this is also a "dangerous" system call.|
The JOS user exception stack is also one page in size,
and its top is defined to be at virtual address
so the valid bytes of the user exception stack
are from UXSTACKTOP-BY2PG through UXSTACKTOP-1 inclusive.
While running on this exception stack,
the user-level page fault handler
can use JOS's regular system calls to map new pages or adjust mappings
so as to fix whatever problem originally caused the page fault.
Then the user-level page fault handler returns,
via an assembly language stub,
to the faulting code on the original stack.
Each user environment that wants to support user-level page fault handling will need to allocate memory for its own exception stack, using the sys_mem_alloc() system call introduced in part A.
You will now need to
change the page fault handling code in
to handle page faults from user mode as follows.
We will call the state of the user environment at the time of the
fault the trap-time state.
If there is no page fault handler registered, the JOS kernel destroys the user environment with a message as before. Otherwise, the kernel sets up a trap frame on the exception stack that looks like this:
-0 <-- UXSTACKTOP empty -4 empty -8 empty -12 empty -16 empty -20 trap-time eip -24 trap-time eflags -28 trap-time esp -32 tf_err (error code) -36 fault_va -40 <-- %esp when handler is run
The kernel then arranges for the user environment to resume execution
with the page fault handler
running on the exception stack with this stack frame.
(You must figure out how to make this happen.)
empty line in the frame above is simply
a 32-bit word-size space on the exception stack
that the kernel does not initialize,
but which the fault handler in the user environment can use.
fault_va is the virtual address
at which the page fault occurred.
If the user environment is already running on the user exception stack
when an exception occurs,
then the page fault handler itself has faulted.
In this case,
you should start the new stack frame just under the current
tf->tf_esp rather than at
(...existing contents of exception stack...) -0 <-- tf->tf_esp empty -4 empty -8 empty -12 empty -16 empty -20 trap-time eip -24 trap-time eflags -28 trap-time esp -32 tf_err (error code) -36 fault_va -40 <-- %esp when handler is run
To test whether
tf->tf_esp is already on the user
exception stack, check whether it is in the range
between UXSTACKTOP-BY2PG and UXSTACKTOP-1, inclusive.
|Exercise 5. Implement the code in kern/trap.c required to dispatch page faults the user-mode handler. Be sure to take appropriate precautions when writing into the exception stack. (What happens if the user environment runs out of space on the exception stack?)|
Next, you need to implement the assembly routine that will take care of calling the C page fault handler and resume execution at the original faulting instruction. This assembly routine is the handler that will be registered with the kernel using sys_set_pgfault_entry().
Implement the |
Three of the empty words in the stack frame created by the kernel above are for the assembly routine to use to save the caller-save registers before moving on to C code. The other two are important for the recursive fault case. Draw a stack diagram for the recursive case and carefully examine how each word gets used at each point during the execution of the fault handler in order to figure out how to use these words correctly.
Finally, you need to implement the C user library side of the user-level page fault handling mechanism.
user/faultread. Build your kernel and run it. You should see:
 new env 00000800  new env 00001001  user fault va 00000000 ip 0080003a TRAP frame ...  free env 00001001
kern/init.c to run
Build your kernel and run it. You should see:
 new env 00000800  new env 00001001 i faulted at va deadbeef, err 6  exiting gracefully  free env 00001001
kern/init.c to run
Build your kernel and run it. You should see:
 new env 00000800  new env 00001001 fault deadbeef this string was faulted in at deadbeef fault cafebffe fault cafec000 this string was faulted in at cafebffe  exiting gracefully  free env 00001001
If you see only the first "this string" line, it means you are not handling recursive page faults properly.
kern/init.c to run
Build your kernel and run it. You should see:
 new env 00000800  new env 00001001  PFM_KILL va deadbeef ip f010263d TRAP frame ...  free env 00001001(Your ip may differ from ours but should begin
Make sure you understand why
user/faultallocbad behave differently.
|Challenge! Extend your kernel so that not only page faults, but all types of processor exceptions that code running in user space can generate, can be redirected to a user-mode exception handler. Write user-mode test programs to test user-mode handling of various exceptions such as divide-by-zero, general protection fault, and illegal opcode.|
forkfunctionality, entirely in user space.
We have provided a skeleton for your fork() function in lib/fork.c. Like dumbfork(), fork() creates a new environment, then scans through the parent environment's entire address space and sets up corresponding page mappings in the child. The key difference is that, while dumbfork() copied entire pages, fork() will initially only copy page mappings. Notice that the duppage() helper function in dumbfork calls sys_mem_alloc() to allocate a new page of physical memory for each page in the parent, and then calls memcpy() to copy the contents of the parent's page into the child's new page. These calls to memcpy() represent the bulk of the time dumbfork() takes to run, and so fork() attempts to "optimize away" most of this page copying by copying pages lazily only when they are actually modified.
The basic control flow for
fork() is as follows:
pgfault()as the C-level page fault handler, using the
set_pgfault_handler()function you implemented above.
sys_env_alloc()to allocate a child environment.
The exception stack is not remapped this way, however. Instead you need to allocate a fresh page in the child for the exception stack. Since the page fault handler will be doing the actual copying and the page fault handler runs on the exception stack, the exception stack cannot be made copy-on-write: who would copy it?
After the fork, both processes will take page faults when the code they run attempts to write to a page that hasn't been copied yet. Here's the control flow for the user page fault handler:
_pgfault_entry, which calls fork()'s
pgfault()checks that the fault is a write (check
FEC_WR) and that the PTE for the page is marked
PTE_COW(copy-on-write). If not, panic.
pgfault()allocates a new page mapped at a temporary location and copies the contents of the faulting page contents into it. Then the fault handler maps the new page at the appropriate address with read/write permissions, in place of the old read-only mapping.
Test your code with the forktree program. It should produce the following messages, with interspersed 'new env', 'free env', and 'exiting gracefully' messages:
1001: I am '' 1802: I am '0' 2801: I am '00' 3802: I am '000' 2003: I am '1' 5001: I am '11' 4802: I am '10' 6801: I am '100' 5803: I am '110' 3004: I am '01' 8001: I am '011' 7803: I am '010' 4005: I am '001' 6006: I am '111' 7007: I am '101'
Implement a shared-memory |
Your implementation of |
How much faster is your new
You can answer this (roughly) by using analytical arguments to estimate how much of an improvement batching system calls will make to the performance of your fork: How expensive is an int 0x30 instruction? How many times do you execute int 0x30 in your fork? Is accessing the TSS stack switch also expensive? And so on...
Alternatively, you can boot your kernel on real hardware and really benchmark your code. See the RDTSC (read time-stamp counter) instruction, defined in the IA32 manual, which counts the number of clock cycles that have elapsed since the last processor reset. Bochs doesn't emulate this instruction faithfully.
This ends part B. As usual, you can grade your submission
gmake grade and hand it in with
Modify kern/init.c to run the user/spin test program. This test program forks off a child process, which simply spins forever in a tight loop once it receives control of the CPU. Neither the parent process nor the kernel ever regains the CPU. This is obviously not an ideal situation in terms of protecting the system from bugs or malicious code in user-mode environments, because any user-mode environment can bring the whole system to a halt simply by getting into an infinite loop and never giving back the CPU. In order to allow the kernel to preempt a running environment, forcefully retake control of the CPU from it, we must extend the JOS kernel to support external hardware interrupts.
External interrupts (i.e., device interrupts) are refered as IRQs.
There are 16 possible IRQs, numbered 0 through 15.
The mapping from IRQ number to IDT entry is not fixed.
picirq.c maps IRQs 0-15
IRQ_OFFSET is defined to be decimal 32.
Thus the IDT entries 32-44 correspond to the IRQs 0-15.
For example, the clock interrupt is IRQ 0.
Thus, IDT contains the address of
the clock's interrupt handler routine in the kernel.
IRQ_OFFSET is chosen so that the device interrupts
do not overlap with the processor exceptions,
which could obviously cause confusion.
(In fact, in the early days of PCs running MS-DOS,
the IRQ_OFFSET effectively was zero,
which indeed caused massive confusion between handling hardware interrupts
and handling processor exceptions!)
In JOS, we make a key simplification compared to many operating
systems. External device interrupts are always disabled when
in the kernel and always enabled when in user space. External
interrupts are controlled by the
FL_IF flag bit of the
%eflags register (see inc/mmu.h). When this bit
is set, external interrupts are enabled. While the bit can be
modified in several ways, because of our simplification, we will
handle it solely through the process of saving and restoring
%eflags register as we enter and leave user mode.
You will have to ensure that the
FL_IF flag is set in
user processes when they run so that when an interrupt arrives, it
gets passed through to the processor and handled by your interrupt code.
Otherwise, interrupts are masked,
or ignored until interrupts are re-enabled.
Interrupts are masked by default after processor reset,
and so far we have simply never gotten around to enabling them.
Modify kern/trapentry.S and kern/trap.c
to initialize the appropriate entries in the IDT
and provide handlers for IRQs 0 through 15.
Then modify the code in |
The processor never pushes an error code or checks the Descriptor Privilege Level (DPL) of the IDT entry when invoking a hardware interrupt handler. You might want to re-read section 9.2 of the 80386 Reference Manual, or section 5.8 of the IA-32 Intel Architecture Software Developer's Manual, Volume 3, at this time.
After doing this exercise, if you run your kernel with any test program that runs for a non-trivial length of time (e.g., dumbfork), you should see a kernel panic shortly into the program's execution. This is because our code has set up the clock hardware to generate clock interrupts, and interrupts are now enabled in the processor, but JOS isn't yet handling them.
In the user/spin program, after the child environment was first run, it just spun in a loop, and the kernel never got control back. We need to program the hardware to generate clock interrupts periodically, which will force control back to the kernel where we can switch control to a different user environment.
The calls to
which we have written for you,
set up the clock and the interrupt controller to generate interrupts.
You now need to write the code to handle these interrupts.
Modify the kernel's trap() function
so that it calls sched_yield()
to find and run a different environment
whenever a clock interrupt takes place.
You should now be able to get the user/spin test to work: the parent process should fork off the child, sys_yield() to it a couple times but in each case regain control of the CPU after one time slice, and finally kill the child process and terminate gracefully.
Make sure you can answer the following questions:
vbcommand mentioned earlier.
We've been focusing on the isolation aspects of the operating system, the ways it provides the illusion that each program has a machine all to itself. Another important service of an operating system is to allow programs to communicate with each other when they want to. It can be quite powerful to let programs interact with other programs. The Unix pipe model is the canonical example.
There are many models for interprocess communication. Even today there are still debates about which models are better for various reasons. We won't get into that debate. Instead, we'll implement a simple IPC mechanism and then try it out.
sys_ipc_can_send. Then you will implement two library wrappers
The "messages" that user environments can send to each other using JOS's IPC mechanism consist of two components: a single 32-bit value, and optionally a single page mapping. Allowing environments to pass page mappings in messages provides an efficient way to transfer more data than will fit into a single 32-bit integer, and also allows environments to set up shared memory arrangements easily.
To receive a message, an environment calls
This system call deschedules the current
environment and does not run it again until a message has
When an environment is waiting to receive a message,
any other environment can send it a message -
not just a particular environment,
and not just environments that have a parent/child arrangement
with the receiving environment.
In other words, the permission checking that you implemented in Part A
will not apply to IPC,
because the IPC system calls are carefully designed so as to be "safe":
an environment cannot cause another environment to malfunction
simply by sending it messages
(unless the target environment is also buggy).
To try to send a value, an environment calls
sys_ipc_can_send with both the receiver's
environment id and the value to be sent. If the named
environment is actually receiving (it has called
sys_ipc_recv and not gotten a value yet),
then the send delivers the message and returns 0. Otherwise
the send returns
-E_IPC_NOT_RECV to indicate
that the target environment is not currently expecting
to receive a value.
A library function
ipc_recv in user space will take care
sys_ipc_recv and then looking up
the information about the received values in the current
Similarly, a library function
take care of repeatedly calling
until the send succeeds.
When an environment calls
with a nonzero
the environment is stating that it is willing to receive a page mapping.
If the sender sends a page,
then that page should be mapped at
in the receiver's address space.
If the receiver already had a page mapped at
then that previous page is unmapped.
When an environment calls
with a nonzero
it means the sender wants to send the page
currently mapped at
srcva to the receiver,
After a successful IPC,
the sender keeps its original mapping
for the page at
srcva in its address space,
but the receiver also obtains a mapping for this same physical page
dstva originally specified by the receiver,
in the receiver's address space.
As a result this page becomes shared between the sender and receiver.
If either the sender or the receiver does not indicate
that a page should be transferred,
then no page is transferred.
After any IPC
the kernel sets the new field
in the receiver's
to the permissions of the page received,
or zero if no page was received.
Use the user/pingpong and user/primes functions
to test your IPC mechanism.
You might find it interesting to read
Why does |
|Challenge! The prime sieve is only one neat use of message passing between a large number of concurrent programs. Read C. A. R. Hoare, ``Communicating Sequential Processes,'' Communications of the ACM 21(8) (August 1978), 666-667, and implement the matrix multiplication example.|
|Challenge! Probably the most impressive example of the power of message passing is Doug McIlroy's power series calculator, described in M. Douglas McIlroy, ``Squinting at Power Series,'' Software--Practice and Experience, 20(7) (July 1990), 661-683. Implement his power series calculator and compute the power series for sin(1+x^2).|
This ends part C. As usual, you can grade your submission
gmake grade and hand it in with
handin. If you are trying to figure out why a particular
test case is failing, run
sh grade.sh -v, which will
show you the output of the kernel builds and Bochs runs for each
test, until a test fails. When a test fails, the script will stop,
and then you can inspect
bochs.out to see what the
kernel actually printed.