In this lab you will implement the basic kernel facilities required to get a protected user-mode environment (i.e., "process") running. You will enhance the JOS kernel to set up the data structures to keep track of user environments, create a single user environment, load a program image into it, and start it running. You will also make the JOS kernel capable of handling any system calls the user environment makes and handling any other exceptions it causes.
Note: In this lab, the terms environment and process are interchangeable - they have roughly the same meaning. We introduce the term "environment" instead of the traditional term "process" in order to stress the point that JOS environments do not provide the same semantics as UNIX processes, even though they are roughly comparable.
You will need to merge the code for lab 4 from ~class/src/lab4.tar.gz into your solution to the previous lab using your CVS repository. Start by making sure your solution to the previous lab is checked in. You might also want to tag it for good measure. If the JOS kernel is currently checked out in a directory called jos, do the following:
% setenv CVSROOT ~/cvsroot     [or wherever your CVS repository is]
% cd jos
% cvs up -dP
cvs update: Updating .
...
% cvs ci -m "my solution to lab3"
...
% cvs tag SOL3
...
% cd ..
%
Next, you will want to import the code for this lab:
% tar xzf ~class/src/lab4.tar.gz
% cd lab4
% cvs import -m "lab4 import" jos JOS LAB4
U jos/CODING
U jos/GNUmakefile
...
N jos/user/testbss.c

1 conflicts created by this import.
Use the following command to help the merge:

        cvs checkout -jJOS:yesterday -jJOS jos

% cd ../jos
% cvs up -dP -jLAB3 -jLAB4
cvs update: Updating .
cvs update: scheduling .labsetup for removal
? bochs.log
? bochs.out
...
% cvs ci -m "merged lab3 -> lab4 changes"
cvs commit: Examining .
cvs commit: Examining boot
cvs commit: Examining conf
...
%
Note that CVS suggests using the flag -jJOS:yesterday. This would work too, assuming you don't import more than one lab per day, since then the head of the JOS branch yesterday would have been your import of the lab3 code distribution. However, since we have been careful to assign the tags LAB3 and LAB4 to the labs, we can specify precisely what we want merged.
Lab 4 contains a number of new source files, which you should browse through as you merge them into your kernel:
inc/  | env.h       | Public definitions for user-mode environments |
      | trap.h      | Public definitions for trap handling |
      | syscall.h   | Public definitions for system calls from user environments to the kernel |
      | lib.h       | Public definitions for the user-mode support library |
kern/ | env.h       | Kernel-private definitions for user-mode environments |
      | env.c       | Kernel code implementing user-mode environments |
      | trap.h      | Kernel-private trap handling definitions |
      | trap.c      | Trap handling code |
      | trapentry.S | Assembly-language trap handler entrypoints |
      | syscall.h   | Kernel-private definitions for system call handling |
      | syscall.c   | System call implementation code |
lib/  | Makefrag    | Makefile fragment to build user-mode library, obj/lib/libuser.a |
      | entry.S     | Assembly-language entrypoint for user environments |
      | libmain.c   | User-mode library setup code called from entry.S |
      | syscall.c   | User-mode system call stub functions |
      | console.c   | User-mode implementations of putchar and getchar, providing console I/O |
      | exit.c      | User-mode implementation of exit |
      | panic.c     | User-mode implementation of panic |
user/ | *           | Various test programs to check lab 3 functionality |
In addition, a number of the source files we handed out for lab2 are modified in lab3. To see the differences, you can type:
% cvs diff -uN -rSOL3
%
As in the previous lab, you will need to do all of the regular exercises described in the lab, and put brief answers to the questions in a file answers.txt or answers.html. If you do any of the challenge problems, please also include a description of what you have done in answers.txt.
As before, you can test your code against our test scripts by running gmake grade. When you are done, run gmake handin to tar up and hand in your source tree.
As you can see in kern/env.c, the kernel maintains three main global variables pertaining to environments:
struct Env *envs = NULL;                /* All environments */
struct Env *curenv = NULL;              /* the current env */
static struct Env_list env_free_list;   /* Free list */
Once JOS gets up and running, the envs pointer points to an array of Env structures representing all the environments in the system. In our design, the JOS kernel will support a maximum of NENV simultaneously active environments, although there will typically be far fewer running environments at any given time. (NENV is a constant #define'd in inc/env.h.) Once it is allocated, the envs array will contain a single instance of the Env data structure for each of the NENV possible environments.
The JOS kernel keeps all of the inactive Env structures on the env_free_list. This design allows extremely quick and efficient allocation and deallocation of environments, as they merely have to be added to or removed from the free list.
The kernel uses the curenv variable to keep track of the currently executing environment at any given time. During boot up, before the first environment is run, curenv is initially set to NULL.
struct Env {
        struct Trapframe env_tf;        // Saved registers
        LIST_ENTRY(Env) env_link;       // Free list link pointers
        u_int env_id;                   // Unique environment identifier
        u_int env_parent_id;            // env_id of this env's parent
        u_int env_status;               // Status of the environment

        // Address space
        Pde *env_pgdir;                 // Kernel virtual address of page dir
        u_int env_cr3;                  // Physical address of page dir
};
We now briefly describe the state kept by the kernel for each user environment.
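env_link: the link pointers that chain this Env onto the env_free_list. See inc/queue.h for details.
env_status: the status of the environment, which is one of ENV_FREE, ENV_RUNNABLE, or ENV_NOT_RUNNABLE.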
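As a rough illustration of why a free list makes allocation and deallocation cheap, here is a minimal sketch using a hand-rolled singly linked list instead of the macros from inc/queue.h. Every name prefixed with sk_ or SK_, the array size of 8, and the choice to mark a freshly allocated environment runnable are assumptions made for this example, not the JOS definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the real definitions (assumed values). */
#define SK_NENV 8
enum { SK_ENV_FREE = 0, SK_ENV_RUNNABLE, SK_ENV_NOT_RUNNABLE };

struct SkEnv {
    struct SkEnv *link;     /* next Env on the free list, or NULL */
    unsigned status;        /* one of the SK_ENV_* values */
};

struct SkEnv sk_envs[SK_NENV];  /* all environments */
struct SkEnv *sk_free_list;     /* head of the free list */

/* Put every Env on the free list so that sk_envs[0] is handed out first. */
void sk_env_init(void)
{
    sk_free_list = NULL;
    for (int i = SK_NENV - 1; i >= 0; i--) {
        sk_envs[i].status = SK_ENV_FREE;
        sk_envs[i].link = sk_free_list;
        sk_free_list = &sk_envs[i];
    }
}

/* Allocation pops the head of the free list; returns NULL if it is empty. */
struct SkEnv *sk_env_alloc(void)
{
    struct SkEnv *e = sk_free_list;
    if (e == NULL)
        return NULL;
    sk_free_list = e->link;
    e->link = NULL;
    e->status = SK_ENV_RUNNABLE;
    return e;
}

/* Deallocation pushes the Env back onto the head of the free list. */
void sk_env_free(struct SkEnv *e)
{
    assert(e != NULL && e->status != SK_ENV_FREE);
    e->status = SK_ENV_FREE;
    e->link = sk_free_list;
    sk_free_list = e;
}
```

Both operations touch only the list head, so their cost does not depend on how many environments exist.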
Like a Unix process, a JOS environment couples the concepts of "thread" and "address space". The thread is defined primarily by the saved registers (the env_tf field), and the address space is defined by the page directory and page tables pointed to by env_pgdir and env_cr3. To run an environment, the kernel must set up the CPU with both the saved registers and the appropriate address space.
In JOS, individual environments do not have their own kernel stacks as processes do in UNIX. Instead, all JOS kernel code runs on a single kernel stack, and the kernel saves user-mode register state explicitly in each environment's struct Env rather than implicitly on the kernel stack.
In the last lab, you allocated memory in i386_vm_init() for the pages[] array, which is a table the kernel uses to keep track of which pages are free and which are not. You will now need to modify i386_vm_init() further to allocate a similar array of Env structures, called envs.
Exercise 1. Modify i386_vm_init() in kern/pmap.c to allocate and map the envs array. This array consists of exactly NENV instances of the Env structure, laid out consecutively in the kernel's virtual address space starting at address UENVS (defined in inc/pmap.h). The physical pages that these virtual addresses map to do not have to be contiguous, since the kernel only ever uses virtual addresses to access the envs array. You should be able to allocate and map this array in exactly the same way as you did for the pages array.
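Because the array is mapped page by page, it helps to know how many bytes of page-aligned memory an array of a given size occupies. The following is only a sketch of that arithmetic: the sk_ names are hypothetical, and the 4096-byte page size is the standard x86 small page size rather than a constant taken from the JOS headers.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PGSIZE 4096u     /* x86 page size in bytes */

/* Round n up to the next multiple of align (align must be a power of two). */
size_t sk_roundup(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes of page-aligned memory needed for `count` objects of `each` bytes. */
size_t sk_array_bytes(size_t count, size_t each)
{
    return sk_roundup(count * each, SK_PGSIZE);
}
```

In the kernel, count would be NENV and each would be sizeof(struct Env); the result is the length of the region to map starting at UENVS.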
You will now write the code in kern/env.c necessary to run a user environment. Because we do not yet have a filesystem, we will set up the kernel to load a static binary image that is embedded within the kernel itself. These embedded binaries will be full ELF executable images.
Once you integrate our lab code with your solutions for the previous lab, you will notice that our makefiles generate a number of binary images in the obj/user/ directory. Further, if you look at kern/Makefrag, you will notice some magic that takes all of these binaries and "links" them directly into the kernel executable as if they were .o files. The '-b binary' option on the linker command line causes these files to be linked in as "raw" uninterpreted binary files rather than as regular .o files produced by the compiler. (As far as the linker is concerned, these files do not have to be ELF images at all: they could be anything, such as text files or pictures!) If you look at obj/kern/kernel.sym after building the kernel, you will notice that the linker has "magically" produced a number of symbols with obscure names like _binary_obj_user_hello_start, _binary_obj_user_hello_end, and _binary_obj_user_hello_size. The linker generated these symbol names simply by mangling the file names of these binary files; these symbols provide the regular kernel code with a way to reference the embedded binary files.
In kern/env.h you will find some macros that kern/init.c uses to load one of these binary images into a user environment via env_create and then run it via env_run. However, the critical functions to set up user environments are not complete; you will need to fill them in.
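Since the linker treats the embedded files as raw bytes, it is the loader's job to confirm that an image really is an ELF file before interpreting it. A minimal sketch of that check follows; sk_is_elf is a hypothetical helper, not a JOS function. In the kernel the pointer and size would come from symbols such as _binary_obj_user_hello_start and _binary_obj_user_hello_size, while here they are ordinary parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every ELF file begins with these four bytes: 0x7f 'E' 'L' 'F'. */
static const uint8_t sk_elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/* Return 1 if the `size`-byte image at `binary` starts with the ELF magic. */
int sk_is_elf(const uint8_t *binary, size_t size)
{
    assert(binary != NULL);
    if (size < sizeof(sk_elf_magic))
        return 0;
    return memcmp(binary, sk_elf_magic, sizeof(sk_elf_magic)) == 0;
}
```

A real loader would go on to read the program headers; this sketch stops at the sanity check.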
Exercise 2. In the file env.c, finish coding the following functions:
As you write these functions, you might find the new printf verb %e useful: it prints a description corresponding to an error code. For example,
r = -E_NO_MEM;
panic("env_alloc: %e", r);
will panic with the message "env_alloc: out of memory".
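One plausible way for a format verb like %e to work is a lookup table from error numbers to strings. The sketch below shows that idea only; the numeric values, the SK_ names, and the "invalid parameter" and "unknown error" strings are assumptions for illustration (only "out of memory" appears in the text above).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical error numbers; the real values live in the JOS headers. */
enum { SK_E_INVAL = 1, SK_E_NO_MEM = 2, SK_MAXERROR = 3 };

static const char *const sk_error_string[SK_MAXERROR] = {
    [SK_E_INVAL]  = "invalid parameter",
    [SK_E_NO_MEM] = "out of memory",
};

/* Map an error code such as -SK_E_NO_MEM to a human-readable description. */
const char *sk_strerror(int r)
{
    if (r <= -SK_MAXERROR || r >= SK_MAXERROR)
        return "unknown error";
    int e = r < 0 ? -r : r;          /* callers pass negative codes */
    if (sk_error_string[e] == NULL)
        return "unknown error";
    return sk_error_string[e];
}
```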
Once you are done you should compile your kernel and run it under Bochs. Below is a call graph of the code up to the point where the user code is invoked. Make sure you understand the purpose of each step.
start (kern/entry.S)
i386_init
        cons_init
        i386_detect_memory
        i386_vm_init
        page_init
        env_init
        idt_init (still incomplete at this point)
        env_create
        env_run
                env_pop_tf
Set a Bochs breakpoint at env_pop_tf, which should be the last function you hit before actually entering user mode. Step through this function; the processor should enter user mode after the iret instruction. You should then see the first instruction in the user environment's executable, which is the cmpl instruction at the label start in lib/entry.S. You should be able to single-step through this user mode environment code until you first hit an int $0x30 instruction, which is the instruction that user-mode code executes to make a system call. (See lib/syscall.c to see how this works.) If you cannot get to this point, then something is wrong with your address space setup or program loading code; go back and fix it before continuing.
Exercise 3. Read Chapter 9, Exceptions and Interrupts in the 80386 Programmer's Manual (or Chapter 5 of the IA-32 Developer's Manual), if you haven't already.
In this lab we generally follow Intel's terminology for interrupts, exceptions, and the like. However, be aware that terms such as exceptions, traps, interrupts, faults and aborts have no standardized meaning across architectures or operating systems, and are often used rather loosely without close regard to the subtle distinctions between them on a particular architecture such as the x86. When you see these terms outside of this lab, the meanings might be slightly different.
In order to ensure that these protected control transfers are actually protected, the processor's interrupt/exception mechanism is designed so that the code currently running when the interrupt or exception occurs does not get to choose arbitrarily where the kernel is entered or how. Instead, the processor ensures that the kernel can be entered in this way only under carefully controlled conditions. On the x86, this protection is provided on the basis of two particular mechanisms:
The Interrupt Descriptor Table. The processor ensures that interrupts and exceptions can only cause the kernel to be entered at a few specific, well-defined entrypoints determined by the kernel itself, and not by the code currently running when the interrupt or exception is taken.
In particular, x86 interrupts and exceptions are differentiated into up to 256 possible "types", each associated with a particular interrupt number (often referred to synonymously as an exception number or trap number). Once the processor identifies a particular interrupt or exception to be taken, it uses the interrupt number as an index into the processor's interrupt descriptor table (IDT), which is a special table that the kernel sets up in kernel-private memory, much like the GDT. From the appropriate entry in this table the processor loads the new values for the EIP and CS registers, which determine where the kernel's handler begins executing.
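To make the "table indexed by interrupt number" idea concrete, here is a sketch of an IDT as an array of 256 eight-byte gate descriptors in the 32-bit x86 layout. The struct, the sk_setgate helper, and its parameter order are hypothetical and are not the JOS SETGATE macro; the selector value passed in is whatever the kernel uses for its code segment.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte IDT entry (32-bit interrupt or trap gate). */
struct SkGate {
    uint16_t off_15_0;   /* low 16 bits of the handler's EIP */
    uint16_t sel;        /* code segment selector loaded into CS */
    uint8_t  reserved;   /* zero for interrupt and trap gates */
    uint8_t  type_attr;  /* P (bit 7), DPL (bits 6-5), gate type (bits 3-0) */
    uint16_t off_31_16;  /* high 16 bits of the handler's EIP */
};

#define SK_GATE_INTR32 0xE   /* 32-bit interrupt gate */
#define SK_GATE_TRAP32 0xF   /* 32-bit trap gate */

struct SkGate sk_idt[256];   /* one entry per interrupt number */

/* Fill in entry `num` so that it points at handler address `eip`. */
void sk_setgate(int num, int is_trap, uint16_t sel, uint32_t eip, unsigned dpl)
{
    assert(num >= 0 && num < 256 && dpl <= 3);
    struct SkGate *g = &sk_idt[num];
    g->off_15_0 = (uint16_t)(eip & 0xffff);
    g->sel = sel;
    g->reserved = 0;
    g->type_attr = (uint8_t)(0x80 | (dpl << 5) |
                             (is_trap ? SK_GATE_TRAP32 : SK_GATE_INTR32));
    g->off_31_16 = (uint16_t)(eip >> 16);
}

/* Recover the handler address the processor would load into EIP. */
uint32_t sk_gate_eip(int num)
{
    assert(num >= 0 && num < 256);
    return ((uint32_t)sk_idt[num].off_31_16 << 16) | sk_idt[num].off_15_0;
}
```

The DPL field controls which privilege levels may invoke the gate with a software interrupt, which matters for the breakpoint and system call entries later in the lab.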
The Task State Segment. In addition to having a well-defined entrypoint in the kernel for an interrupt or exception handler, the processor also needs a place to save the old processor state before the interrupt or exception occurred, such as the original values of EIP and CS before the processor invoked the exception handler, so that the exception handler can later restore that old state and resume the interrupted code from where it left off. But this save area for the old processor state must in turn be protected from unprivileged user-mode code; otherwise buggy or malicious user code could easily compromise the kernel.
For this reason, when an x86 processor takes an interrupt or trap that causes a privilege level change from user to kernel mode, it not only loads new values into EIP and CS, but also loads new values into the stack pointer (ESP) and stack segment (SS) registers, effectively switching to a new stack private to the kernel. The processor then pushes the original values of all of these registers, along with the contents of the EFLAGS register, onto this new kernel stack before starting to run the kernel's exception handler code. The new ESP and SS do not come from the IDT like the EIP and CS do, but instead from a separate structure called the task state segment (TSS).
Although the TSS is a somewhat large and complex data structure that can potentially serve a variety of purposes, in JOS it will only be used to define the kernel stack that the processor should switch to when it transfers from user to kernel mode. Since "kernel mode" in JOS is privilege level 0 on the x86, the processor uses the ESP0 and SS0 fields of the TSS to define the kernel stack when entering kernel mode; none of the other fields in the TSS will ever be used in JOS.
All of the synchronous exceptions that the x86 processor can generate internally use interrupt numbers between 0 and 31, and therefore map to IDT entries 0-31. For example, the page fault exception is "hard-wired" by Intel to interrupt number 14. Interrupt numbers greater than 31 are only used by software interrupts, which can be generated by the INT instruction, or asynchronous hardware interrupts, caused by external devices when they need attention.
In this section we will extend JOS to handle the internally generated x86 exceptions in the 0-31 range that are currently defined by Intel. In addition, in the next section we will also make JOS handle software interrupt number 0x30, which JOS (fairly arbitrarily) uses as its system call interrupt number. In Lab 5 we will extend JOS to handle externally generated hardware interrupts such as the clock interrupt.
                     +--------------------+ KSTACKTOP
                     | 0x00000 | old SS   |     " - 4
                     |      old ESP       |     " - 8
                     |     old EFLAGS     |     " - 12
                     | 0x00000 | old CS   |     " - 16
                     |      old EIP       |     " - 20 <---- ESP
                     +--------------------+
CS:EIP to point to the handler function defined there.
For certain types of x86 exceptions, in addition to the "standard" five words above, the processor pushes onto the stack another word containing an error code. The page fault exception, number 14, is an important example. See the 80386 manual to determine for which exception numbers the processor pushes an error code, and what the error code means in that case. When the processor pushes an error code, the stack would look as follows at the beginning of the exception handler when coming in from user mode:
                     +--------------------+ KSTACKTOP
                     | 0x00000 | old SS   |     " - 4
                     |      old ESP       |     " - 8
                     |     old EFLAGS     |     " - 12
                     | 0x00000 | old CS   |     " - 16
                     |      old EIP       |     " - 20
                     |     error code     |     " - 24 <---- ESP
                     +--------------------+
The processor can take exceptions and interrupts both from kernel and user mode. It is only when entering the kernel from user mode, however, that the x86 processor automatically switches stacks before pushing its old register state onto the stack and invoking the appropriate exception handler through the IDT. If the processor is already in kernel mode when the interrupt or exception occurs (the low 2 bits of the CS register are already zero), then the processor just pushes more values on the same kernel stack. In this way, the kernel can gracefully handle nested exceptions caused by code within the kernel itself. This capability is an important tool in implementing protection, as we will see later in the section on system calls.
If the processor is already in kernel mode and takes a nested exception, since it does not need to switch stacks, it does not save the old SS or ESP registers. For exception types that do not push an error code, the kernel stack therefore looks like the following on entry to the exception handler:
                     +--------------------+ <---- old ESP
                     |     old EFLAGS     |     " - 4
                     | 0x00000 | old CS   |     " - 8
                     |      old EIP       |     " - 12
                     +--------------------+
For exception types that push an error code, the processor pushes the error code immediately after the old EIP, as before.
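The four stack layouts described above differ only in whether the processor switched stacks and whether it pushed an error code. The following sketch computes how many bytes the processor pushes in each case; the function names are made up for this example, and the privilege test simply checks whether the low 2 bits of the saved CS are nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero low 2 bits of the saved CS mean the interrupted code was not
   running in kernel mode (privilege level 0). */
int sk_from_user_mode(uint16_t old_cs)
{
    return (old_cs & 3) != 0;
}

/* Number of bytes the processor pushes before the handler starts running. */
unsigned sk_bytes_pushed(uint16_t old_cs, int has_error_code)
{
    unsigned words = 3;                 /* old EFLAGS, old CS, old EIP */
    if (sk_from_user_mode(old_cs))
        words += 2;                     /* old SS and old ESP on a stack switch */
    if (has_error_code)
        words += 1;                     /* error code, pushed last */
    return words * 4;                   /* each word is 4 bytes */
}
```

The results (20, 24, 12, and 16 bytes) match the offsets of ESP in the diagrams above.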
There is one important caveat to the processor's nested exception capability. If the processor takes an exception while already in kernel mode, and cannot push its old state onto the kernel stack for any reason such as lack of stack space, then there is nothing the processor can do to recover, so it simply resets itself. Needless to say, any decent kernel should be designed so that this will never happen unintentionally.
You should now have the basic information you need in order to set up the IDT and handle exceptions in JOS. For now, you will set up the IDT to handle interrupt numbers 0-31 (the processor exceptions) and interrupts 32-47 (the device IRQs). We may add additional interrupts later.
The header files inc/trap.h and kern/trap.h contain important definitions related to interrupts and exceptions that you will need to become familiar with. The file kern/trap.h contains trap-related definitions that will remain strictly private to the kernel, while the companion header file inc/trap.h contains general definitions that may also be useful to user-level programs and libraries in the system.
Note: Some of the exceptions in the range 0-31 are defined by Intel to be reserved. Since they will never be generated by the processor, it doesn't really matter how you handle them. Do whatever you think is cleanest.
The overall flow of control that you should achieve is depicted below:
      IDT                   trapentry.S         trap.c

+----------------+
|   &handler1    |---------> handler1:          trap (struct Trapframe *tf)
|                |             // do stuff      {
|                |             call trap          // handle the exception/interrupt
|                |             // undo stuff    }
+----------------+
|   &handler2    |--------> handler2:
|                |            // do stuff
|                |            call trap
|                |            // undo stuff
+----------------+
       .
       .
       .
+----------------+
|   &handlerX    |--------> handlerX:
|                |             // do stuff
|                |             call trap
|                |             // undo stuff
+----------------+
Each exception or interrupt has its own handler in trapentry.S and the IDT is initialized with the address of these handlers. Each of the handlers should build a struct Trapframe (see inc/trap.h) on the stack and call into trap() (in trap.c) with a pointer to the Trapframe.
After control is passed to trap(), that function handles the exception/interrupt or dispatches the exception/interrupt to a specific handler function. If and when the trap() function returns, the code in trapentry.S restores the old CPU state saved in the Trapframe and then uses the iret instruction to return from the exception.
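The dispatch step inside trap() can be pictured as a switch on the trap number saved in the trap frame. The sketch below models only that decision and returns a label instead of calling real handlers. The struct fields, the sk_ and SK_ names, and the choice to destroy the environment for unhandled traps are assumptions for this example; the trap numbers 3 (breakpoint), 14 (page fault), and 0x30 (system call) are the ones used in this lab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_T_BRKPT   3      /* breakpoint exception */
#define SK_T_PGFLT   14     /* page fault exception */
#define SK_T_SYSCALL 0x30   /* JOS system call interrupt */

/* A cut-down stand-in for struct Trapframe (field names are hypothetical). */
struct SkTrapframe {
    uint32_t trapno;        /* which interrupt or exception occurred */
    uint32_t err;           /* error code, or 0 if none was pushed */
    uint32_t eip;           /* where the interrupted code will resume */
};

enum SkAction {
    SK_ACT_MONITOR,         /* drop into the kernel monitor */
    SK_ACT_PAGE_FAULT,      /* call the page fault handler */
    SK_ACT_SYSCALL,         /* call the system call dispatcher */
    SK_ACT_DESTROY_ENV      /* assumed fallback for unhandled traps */
};

/* Decide what to do with a trap, based only on its number. */
enum SkAction sk_trap_dispatch(const struct SkTrapframe *tf)
{
    assert(tf != NULL);
    switch (tf->trapno) {
    case SK_T_BRKPT:
        return SK_ACT_MONITOR;
    case SK_T_PGFLT:
        return SK_ACT_PAGE_FAULT;
    case SK_T_SYSCALL:
        return SK_ACT_SYSCALL;
    default:
        return SK_ACT_DESTROY_ENV;
    }
}
```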
Exercise 4. Edit trapentry.S and trap.c and implement the functionality described above. The macros IDTFNC and IDTFNC_NOEC in trapentry.S should help you, as well as the T_* defines in inc/trap.h. You will need to add an entry point in trapentry.S (using those macros) for each trap defined in inc/trap.h. You will also need to modify idt_init() to initialize the idt to point to each of these entry points defined in trapentry.S.
Hint: your code should perform the following steps:
Consider using the
Test your trap handling code using some of the test programs in the user directory that cause exceptions before making any system calls, such as user/divzero. You should be able to get make grade to succeed on the divzero, softint, and badsegment tests at this point.
Challenge! You probably have a lot of very similar code right now, between the lists of IDTFNC in trapentry.S and their installations in trap.c. Clean this up. Change the macros in trapentry.S to automatically generate a table for trap.c to use. Note that you can switch between laying down code and data in the assembler by using the directives .text and .data.
Exercise 5. Modify trap() to dispatch page fault exceptions to page_fault_handler(). You should now be able to get make grade to succeed on the faultread, faultreadkernel, faultwrite, and faultwritekernel tests. If any of them don't work, figure out why and fix them.
You will further refine the kernel's page fault handling below, as you implement system calls.
Exercise 6. Modify trap() to make breakpoint exceptions invoke the kernel monitor. You should now be able to get make grade to succeed on the breakpoint test.
Challenge! Modify the JOS kernel monitor so that you can 'continue' execution from the current location (e.g., after the int3, if the kernel monitor was invoked via the breakpoint exception), and so that you can single-step one instruction at a time. You will need to understand certain bits of the EFLAGS register in order to implement single-stepping.
Optional: If you're feeling really adventurous, find some x86 disassembler source code (e.g., by ripping it out of Bochs, or out of GNU binutils, or just write it yourself) and extend the JOS kernel monitor to be able to disassemble and display instructions as you are stepping through them. Combined with the symbol table loading functionality suggested by one of the challenge problems in the previous lab, this is the stuff of which real kernel debuggers are made.
SETGATE from idt_init). Why? How did you need to set it in order to get the breakpoint exception to work as specified above?
In the x86 kernel, we will use the int instruction, which causes a processor interrupt. In particular, we will use int $0x30 as the system call interrupt. We have defined the constant T_SYSCALL to 0x30 for you. You will have to set up the interrupt descriptor to allow user processes to cause that interrupt. Note that interrupt 0x30 cannot be generated by hardware, so there is no ambiguity caused by allowing user code to generate it.
In the x86 kernel, we will pass the system call number and the system call arguments in registers. This way, we don't need to grub around in the user environment's stack or instruction stream. The system call number will go in %eax, and the arguments (up to five of them) will go in %edx, %ecx, %ebx, %edi, and %esi, respectively. The kernel passes the return value back in %eax. The assembly code to invoke a system call has been written for you, in syscall() in lib/syscall.c. You should read through it and make sure you understand what is going on.
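The register convention just described can be modeled in plain C by treating the saved registers as a struct. This is only a sketch of the calling convention, not the kernel's syscall(): the system call numbers, the SK_E_INVAL value, the SK_SYS_record call (which just records its arguments so their order can be inspected), and all sk_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_E_INVAL 3    /* hypothetical value for the "invalid" error code */

/* Hypothetical system call numbers for this sketch. */
enum { SK_SYS_getenvid = 0, SK_SYS_record = 1 };

/* The user registers that carry a system call, as saved by the kernel. */
struct SkRegs {
    uint32_t eax;                       /* call number in, return value out */
    uint32_t edx, ecx, ebx, edi, esi;   /* arguments 1 through 5 */
};

uint32_t sk_last_args[5];   /* arguments seen by the last SK_SYS_record call */

/* Dispatch on the call number; unknown numbers yield -SK_E_INVAL. */
int32_t sk_syscall(uint32_t num, uint32_t a1, uint32_t a2, uint32_t a3,
                   uint32_t a4, uint32_t a5)
{
    switch (num) {
    case SK_SYS_getenvid:
        return 0x800;                   /* pretend environment id */
    case SK_SYS_record:
        sk_last_args[0] = a1;
        sk_last_args[1] = a2;
        sk_last_args[2] = a3;
        sk_last_args[3] = a4;
        sk_last_args[4] = a5;
        return 0;
    default:
        return -SK_E_INVAL;
    }
}

/* Unpack the registers in the documented order and store the result in eax. */
void sk_handle_syscall(struct SkRegs *r)
{
    assert(r != NULL);
    int32_t ret = sk_syscall(r->eax, r->edx, r->ecx, r->ebx, r->edi, r->esi);
    r->eax = (uint32_t)ret;
}
```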
Exercise 7. Add a handler in the kernel for interrupt number T_SYSCALL. You will have to edit kern/trapentry.S and kern/trap.c's idt_init(). You also need to change trap() to handle the system call interrupt by calling syscall() (defined in kern/syscall.c) with the appropriate arguments, and then arranging for the return value to be passed back to the user process in %eax. Finally, you need to implement syscall() in kern/syscall.c. Make sure syscall() returns -E_INVAL if the system call number is invalid. You should read and understand lib/syscall.c (especially the inline assembly routine) in order to confirm your understanding of the system call interface. You may also find it helpful to read inc/syscall.h.
Run the user/hello program under your kernel. It should print "hello, world" on the console and then fault.
The user programs start running at the top of lib/entry.S. After some setup, this code calls libmain(), in lib/libmain.c. The libmain() function needs to initialize a global pointer env to point at this environment's struct Env in the envs[] array. (Note that lib/entry.S has already defined envs to point at the UENVS mapping you set up in Exercise 1.) Hint: look in inc/env.h and use sys_getenvid.
libmain() then calls umain, which, in the case of the hello program, is in user/hello.c. Note that after printing "hello, world", it tries to access env->env_id. This is why it faulted earlier. Now that you've initialized env properly, it should not fault. If it still faults, you probably haven't mapped the UENVS area user-readable (back in Exercise 1 in pmap.c; this is the first time we've actually used the UENVS area).
Exercise 8. Add the required functionality to the user library, then boot your kernel. You should see user/hello print "hello, world" and then print "i am environment 00000800". user/hello then attempts to "exit" by calling sys_env_destroy() (see lib/libmain.c and lib/exit.c). Since the kernel currently only supports one user environment, it should report that it has destroyed the only environment and then drop into the kernel monitor.
user/hello calls printf(). This printf() code in user/hello is compiled from the same source file (lib/printf.c) as the printf() in the kernel is compiled from, but nevertheless these two instances of printf are not the same. What exactly is different between the kernel's printf and user/hello's? Why is this difference necessary?
Memory protection is a crucial feature of an operating system. By using memory protection, the operating system can ensure that bugs in one program cannot corrupt other programs or corrupt the operating system itself.
Typically, operating systems rely on hardware support to implement memory protection. The OS keeps the hardware informed about which virtual addresses are valid and which are not. When a program tries to access an invalid address or one for which it has no permissions, the processor stops the program at the instruction causing the fault and then traps into the kernel with information about the attempted operation. If the fault is fixable, the kernel can fix it and let the program continue running. If the fault is not fixable, then the program cannot continue, since it will never get past the instruction causing the fault.
As an example of a fixable fault, consider an automatically extended stack. In many systems the kernel allocates a single stack page, and then if a program faults accessing pages further down the stack, the kernel will allocate those pages automatically and let the program continue. By doing this, the kernel only allocates the memory that the program is going to use, but the program can work under the illusion that it has an arbitrarily large stack.
System calls present an interesting problem for memory protection. Most system call interfaces let user programs pass pointers to the kernel. These pointers point at user buffers to be read or written. The kernel then dereferences these pointers on behalf of the user while carrying out the system call. There are two problems with this:
For both of these reasons the kernel must be extremely careful when handling pointers presented by user programs.
You will now need to implement solutions to these two problems in your kernel.
To address the first problem, you will use a global variable page_fault_mode to let the fault handler know when the kernel is manipulating memory on behalf of the user environment. If a fault happens then, the user environment will be destroyed. (Otherwise, if a fault happens, the kernel should panic.)
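The policy described above amounts to a small decision function. The sketch below captures it under stated assumptions: the mode names SK_PFM_NONE and SK_PFM_KILL and the action labels are invented for this example (the real modes are listed in kern/trap.h), and faults taken in user mode are simply labeled as user faults without saying how they are handled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page fault modes for this sketch. */
enum SkPfm {
    SK_PFM_NONE,    /* kernel is not touching user memory: a fault is a bug */
    SK_PFM_KILL     /* kernel is acting for the user: blame the environment */
};

enum SkFaultAction {
    SK_FAULT_USER,      /* fault happened in user mode */
    SK_FAULT_KILL_ENV,  /* destroy the current environment */
    SK_FAULT_PANIC      /* unexpected kernel fault */
};

/* Choose an action from the saved CS and the current page fault mode. */
enum SkFaultAction sk_page_fault_action(uint16_t saved_cs, enum SkPfm mode)
{
    if ((saved_cs & 3) != 0)        /* low 2 bits nonzero: not kernel mode */
        return SK_FAULT_USER;
    if (mode == SK_PFM_KILL)
        return SK_FAULT_KILL_ENV;
    return SK_FAULT_PANIC;
}
```

A system call that dereferences a user pointer would set the mode to the "kill" value before the access and restore it afterwards.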
Exercise 9. Change kern/trap.c's page fault handler as follows. If a page fault happens while in kernel mode, check the setting of page_fault_mode and act accordingly. The possible page fault modes are listed in kern/trap.h. If you destroy the current environment, print a message explaining the fault in the following format:
printf("[%08x] PFM_KILL va %08x ip %08x\n", curenv->env_id, fault_va, tf->tf_eip);
Hint: to determine whether a fault happened in user mode or in kernel mode, check the low bits of the saved CS value.
Change
Change
[00000000] new env 00000800
[00000800] PFM_KILL va 00000001 ip f010263d
TRAP frame ...
[00000800] free env 00000800
Destroyed the only environment - nothing more to do!
(Your ip may be different but should begin f01.)
The check you just added protects against buggy environments that pass invalid pointers, but does not protect against evil environments that pass pointers to valid kernel memory. user/evilhello is one such program.
To address this second protection problem, you will "sanitize" all user pointers by using the TRUP macro ("TRanslate User Pointer") defined in kern/pmap.h. This macro will leave valid user pointers as is, but will translate all other pointers to ULIM, which will always cause a page fault when accessed.
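The effect of this kind of pointer translation can be sketched as a clamp on the address. The function below is not the TRUP macro itself: it assumes that "valid user pointer" means an address below ULIM, and the SK_ULIM value is an assumed placeholder rather than the constant from the JOS headers.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ULIM 0xef800000u     /* assumed boundary between user and kernel */

/* Leave user addresses alone; map every other address to SK_ULIM. */
uint32_t sk_trup(uint32_t va)
{
    return va < SK_ULIM ? va : SK_ULIM;
}
```

With this clamp, an address such as f0100020 is rewritten to the boundary value before the kernel dereferences it, so the access faults instead of reading kernel memory.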
Exercise 10. Change the definition of sys_cputs to protect itself against malicious user environments by using TRUP.
Change
[00000000] new env 00000800
[00000000] new env 00000800
[00000800] PFM_KILL va f0100020 ip f010263d
[00000800] free env 00000800
(Your ip may be different but should begin f01.)
This completes the lab.