Project 1: A Simple Shell
In this project you will implement a simple shell program that supports I/O redirection and pipelines. The goal of this project is for you to learn about the fork
, execvp
, and waitpid
system calls, as well as I/O redirection.
Commands
Your shell will read commands from its standard input and execute them. Each command consists of one or more words separated by white space (spaces or tabs). The first word specifies a program to run. The shell invokes this program as a child process and all passes of the words (including the program name) to it as arguments. For example, consider this command:
find . -name shell.cc
It has four words: find
, .
, -name
, and shell.c
. The first word, find
, is the name of the program to run.
I/O Redirection
Normally, the shell’s child processes use the same standard input and standard output files as the shell. However, the characters <
and >
can be used to change the standard input and/or output of a command. For example, the command
grep std::string shell.cc > lines
will write all of the lines of the file shell.cc
that contain the string std::string
to the file lines
instead of the terminal, and the command
wc < shell.cc
will read shell.cc
and output statistics such as the number of lines. The characters <
and >
can appear anywhere in a command; they terminate the current word, if any, and the next word is taken as the name of the file. The redirection information is not included in the arguments passed to the command. For example, consider the following command:
grep<shell.cc>match.txt -H
The grep
command will receive two arguments, grep
and -H
; its standard input will come from the file shell.cc
and its output will be written to the file match.txt
.
Pipelines
If the character |
appears in a command line, it ends one command and starts another; the standard output of the first command will be passed as input to the second command using a pipe. This is called a pipeline. For example,
grep std::string shell.cc | wc
will arrange for the lines of shell.cc
containing the string std::string
to be passed as standard input to wc
, where they will be counted. It has the same effect as the following commands
grep std::string shell.cc > tmp
wc < tmp
rm tmp
However, in the pipeline case the grep
and wc
commands will run concurrently, so they can potentially finish more quickly. If a pipeline also includes I/O redirection, <
and >
apply to their respective commands in the pipeline and redirections take priority over pipes (see Other Requirements for details).
Implementation
For this project you will need to use the following system calls:
fork
: creates a new process, which is a duplicate of the parentexecvp
: after a new child process has been forked, invoke this in the child to start a new program in that process. This kernel call uses the PATH environment variable to find the executable file corresponding to a command; this provides easy access to the standard system programs.waitpid
: after creating one or more child processes, use this system call to wait for each child to exit. If you prefer, you may use thewait
kernel call, which waits for any child to exit, rather than waiting for a specific child.open
: open a file that is going to be used for I/O redirectionpipe
: creates a new pipe for pipelinesdup2
: once you have opened a file or pipe, this can be used to make a copy of its file decscriptor in descriptor 0 (for standard input or 1 (for standard output) for a child processclose
: to close files and pipes when they are no longer needed
You can learn more about these system calls with the man
program. For example,
man execvp
will print information about the execvp
system call.
Development and Testing
Do your work for this project on the Myth cluster. To get started, invoke the following command:
git clone https://web.stanford.edu/class/cs111/starters/shell.git cs111_p1
This will create a new directory cs111_p1
. Do your work for the project in this directory. You can use Git to manage your revisions to the project.
The directory will contain a skeleton file sh111.cc
in which we have provided code to parse command lines. The parser creates one cmd
struct describing each suprocess that must be invoked; each cmd
contains the words that must be passed to that command, plus zero or more redirect
structs that describe I/O redirection for that command. See the code for detailed documentation of these structures. Once a command line has been parsed, the function run_pipeline
will be invoked to create the child processes and wait for them to complete. You must implement the body of this function, creating children as described by the argument to run_pipeline
. If more than one command is specified, you must create pipes between consecutive commands in the pipeline.
The directory will also contain a Makefile
. If you type make
, it will compile sh111.cc
, creating an executable file sh111
. You can run your shell by typing the following command:
./sh111
A prompt should appear and you should be able to invoke shell commands. Most simple shell commands should work just as well in sh111
as in your normal shell. However, a few commands, such as cd
and exit
, won’t work in sh111
. These are built-in commands: normal shells such as bash
execute them directly, without invoking a child process. Commands like cd
need to be built-in because executing them in a child process won’t be useful. For example, if a child process changes its working directory, that won’t have any effect on the shell or on future commands it invokes, and if a child process exits, that won’t cause the parent shell to exit. sh111
doesn’t support built-in commands, so these commands won’t work.
You can run a series of basic tests on your shell by invoking the command ./run_tests
. It will print error messages if tests fail; you should then be able to go back and run the failing tests manually (each is just a series of shell commands) to debug the problem. We do not guarantee that these tests are exhaustive, so passing all of the tests is not necessarily sufficient to ensure a perfect score (CAs may discover other problems in reading through your code).
Order of Implementation
We recommend implementing the project in stages, so that you can see simple things working before trying more complex things. Here is one possible order:
- When
run_pipeline
is invoked, ignore its argument. Justfork
a child and have the child print a message and then exit immediately, without invokingexecvp
. The parent waits for the child to exit. - Get a single command to run in the child process and wait on it. This should allow commands like
ls -l
andsleep 5
to run just as they would in any standard shell. - Get two commands in a pipeline to run simultaneously in two separate child processes, with the first piping its output to the second. This means the command
ls | wc -l
will run and print out the total number of files in your current directory. - Get an arbitrary number of commands in a pipeline to run simultaneously, each in its own distinct child process. This means the command
ls | grep sh111 | wc -l
will print out the number of files in the current directory that contain the substring “sh111”. This task is probably the most difficult, so work carefully! - Finally, incorporate file redirection into each of the commands.
Other Requirements
You must use pipes to pass data between processes in a pipeline (for example, it is not OK to run the processes one at a time, redirecting one process’s output to a file and then redirecting the next process’s input from the same file). This allows all of the processes in the pipeline to run concurrently.
If I/O redirection is also specified for a command in a pipeline, redirection takes priority over pipes. For example, consider the following command:
grep std::string shell.cc > grep.out | wc
The I/O redirection will cause
grep
’s output to go to the filegrep.out
instead of the pipe connecting towc
. The pipe will still exist, andwc
will read from it, but the writing end of the pipe will have been closed, sowc
will immediately read an end-of-file condition. (If the redirection were specified after the pipe symbol, it would have applied towc
instead ofgrep
)You must ensure that each child process starts with only three open file descriptors: standard input, standard output, and standard error. If you have opened any other descriptors for files or pipes, you must close them before
exec
ing the child.
Submitting Your Work
To submit your solutions, cd
to the directory containing your work, then invoke the command
./submit
If you discover problems or improvements after you have submitted, you may resubmit; only the latest submit will be graded.