Project 1: A Simple Shell

In this project you will implement a simple shell program that supports I/O redirection and pipelines. The goal of this project is for you to learn about the fork, execvp, and waitpid system calls, as well as I/O redirection.

Commands

Your shell will read commands from its standard input and execute them. Each command consists of one or more words separated by white space (spaces or tabs). The first word specifies a program to run. The shell invokes this program as a child process and all passes of the words (including the program name) to it as arguments. For example, consider this command:

find . -name shell.cc

It has four words: find, ., -name, and shell.c. The first word, find, is the name of the program to run.

I/O Redirection

Normally, the shell’s child processes use the same standard input and standard output files as the shell. However, the characters < and > can be used to change the standard input and/or output of a command. For example, the command

grep std::string shell.cc > lines

will write all of the lines of the file shell.cc that contain the string std::string to the file lines instead of the terminal, and the command

wc < shell.cc

will read shell.cc and output statistics such as the number of lines. The characters < and > can appear anywhere in a command; they terminate the current word, if any, and the next word is taken as the name of the file. The redirection information is not included in the arguments passed to the command. For example, consider the following command:

grep<shell.cc>match.txt -H

The grep command will receive two arguments, grep and -H; its standard input will come from the file shell.cc and its output will be written to the file match.txt.

Pipelines

If the character | appears in a command line, it ends one command and starts another; the standard output of the first command will be passed as input to the second command using a pipe. This is called a pipeline. For example,

grep std::string shell.cc | wc

will arrange for the lines of shell.cc containing the string std::string to be passed as standard input to wc, where they will be counted. It has the same effect as the following commands

grep std::string shell.cc > tmp
wc < tmp
rm tmp

However, in the pipeline case the grep and wc commands will run concurrently, so they can potentially finish more quickly. If a pipeline also includes I/O redirection, < and > apply to their respective commands in the pipeline and redirections take priority over pipes (see Other Requirements for details).

Implementation

For this project you will need to use the following system calls:

fork: creates a new process, which is a duplicate of the parent
execvp: after a new child process has been forked, invoke this in the child to start a new program in that process. This kernel call uses the PATH environment variable to find the executable file corresponding to a command; this provides easy access to the standard system programs.
waitpid: after creating one or more child processes, use this system call to wait for each child to exit. If you prefer, you may use the wait kernel call, which waits for any child to exit, rather than waiting for a specific child.
open: open a file that is going to be used for I/O redirection
pipe: creates a new pipe for pipelines
dup2: once you have opened a file or pipe, this can be used to make a copy of its file decscriptor in descriptor 0 (for standard input or 1 (for standard output) for a child process
close: to close files and pipes when they are no longer needed

You can learn more about these system calls with the man program. For example,

man execvp

will print information about the execvp system call.

Development and Testing

Do your work for this project on the Myth cluster. To get started, invoke the following command:

git clone https://web.stanford.edu/class/cs111/starters/shell.git cs111_p1

This will create a new directory cs111_p1. Do your work for the project in this directory. You can use Git to manage your revisions to the project.

The directory will contain a skeleton file sh111.cc in which we have provided code to parse command lines. The parser creates one cmd struct describing each suprocess that must be invoked; each cmd contains the words that must be passed to that command, plus zero or more redirect structs that describe I/O redirection for that command. See the code for detailed documentation of these structures. Once a command line has been parsed, the function run_pipeline will be invoked to create the child processes and wait for them to complete. You must implement the body of this function, creating children as described by the argument to run_pipeline. If more than one command is specified, you must create pipes between consecutive commands in the pipeline.

The directory will also contain a Makefile. If you type make, it will compile sh111.cc, creating an executable file sh111. You can run your shell by typing the following command:

./sh111

A prompt should appear and you should be able to invoke shell commands. Most simple shell commands should work just as well in sh111 as in your normal shell. However, a few commands, such as cd and exit, won’t work in sh111. These are built-in commands: normal shells such as bash execute them directly, without invoking a child process. Commands like cd need to be built-in because executing them in a child process won’t be useful. For example, if a child process changes its working directory, that won’t have any effect on the shell or on future commands it invokes, and if a child process exits, that won’t cause the parent shell to exit. sh111 doesn’t support built-in commands, so these commands won’t work.

You can run a series of basic tests on your shell by invoking the command ./run_tests. It will print error messages if tests fail; you should then be able to go back and run the failing tests manually (each is just a series of shell commands) to debug the problem. We do not guarantee that these tests are exhaustive, so passing all of the tests is not necessarily sufficient to ensure a perfect score (CAs may discover other problems in reading through your code).

Order of Implementation

We recommend implementing the project in stages, so that you can see simple things working before trying more complex things. Here is one possible order:

When run_pipeline is invoked, ignore its argument. Just fork a child and have the child print a message and then exit immediately, without invoking execvp. The parent waits for the child to exit.
Get a single command to run in the child process and wait on it. This should allow commands like ls -l and sleep 5 to run just as they would in any standard shell.
Get two commands in a pipeline to run simultaneously in two separate child processes, with the first piping its output to the second. This means the command ls | wc -l will run and print out the total number of files in your current directory.
Get an arbitrary number of commands in a pipeline to run simultaneously, each in its own distinct child process. This means the command ls | grep sh111 | wc -l will print out the number of files in the current directory that contain the substring “sh111”. This task is probably the most difficult, so work carefully!
Finally, incorporate file redirection into each of the commands.

Other Requirements

You must use pipes to pass data between processes in a pipeline (for example, it is not OK to run the processes one at a time, redirecting one process’s output to a file and then redirecting the next process’s input from the same file). This allows all of the processes in the pipeline to run concurrently.
If I/O redirection is also specified for a command in a pipeline, redirection takes priority over pipes. For example, consider the following command:
```
grep std::string shell.cc > grep.out | wc
```
The I/O redirection will cause grep’s output to go to the file grep.out instead of the pipe connecting to wc. The pipe will still exist, and wc will read from it, but the writing end of the pipe will have been closed, so wc will immediately read an end-of-file condition. (If the redirection were specified after the pipe symbol, it would have applied to wc instead of grep)
You must ensure that each child process starts with only three open file descriptors: standard input, standard output, and standard error. If you have opened any other descriptors for files or pipes, you must close them before execing the child.

Submitting Your Work

To submit your solutions, cd to the directory containing your work, then invoke the command

./submit

If you discover problems or improvements after you have submitted, you may resubmit; only the latest submit will be graded.