Distributed Systems Lab 1: Introduction to RPC

This lab will introduce you to programming with RPC, which is commonly used in distributed systems.

Labs in this course will assume that you have access to a Linux machine on which to compile and run the code, and course staff will also be using a Linux machine to build and grade your code. If you don't already have access to a Linux machine, you can use the cardinal.stanford.edu machines provided by ITSS. Refer to this ITSS guide for instructions on how to log into these machines remotely.

The code provided for this lab is largely written in C++, and uses the C++ Standard Template Library (STL). The reference materials page includes links to helpful references for the C++ language itself and for the STL.

Part 1: Downloading and building the lab code

To start, download the initial code for this lab from http://www.scs.stanford.edu/07wi-cs244b/lab1.tar.gz. Before continuing with the lab, make sure you can build the code and run the resulting executables, as follows:
% wget http://www.scs.stanford.edu/07wi-cs244b/lab1.tar.gz
% tar zxf lab1.tar.gz
% cd lab1
% make
rpcgen -M cf.x
rpcgen -M -m cf.x > cf_svc.c
cc -g   -c -o cf_clnt.o cf_clnt.c
cc -g   -c -o cf_xdr.o cf_xdr.c
g++ -g   -c -o cfc.o cfc.cc
g++ -o cfc cf_clnt.o cf_xdr.o cfc.o -lpthread
cc -g   -c -o cf_svc.o cf_svc.c
g++ -g   -c -o cfd.o cfd.cc
g++ -g   -c -o cfd_ops.o cfd_ops.cc
g++ -o cfd cf_svc.o cf_xdr.o cfd.o cfd_ops.o -lpthread
% ./cfd
Usage: ./cfd port-number
% ./cfc
Usage: ./cfc server port command ...
  read pathname
  write pathname data
  mkdir pathname
  mkfile pathname
  rm pathname
  ls pathname
On the ITSS cardinal.stanford.edu machines, you may get an error message about a missing libstdc++.so.6. To fix it, run setenv LD_LIBRARY_PATH /usr/pubsw/lib if you are using csh or tcsh (or if using bash, run export LD_LIBRARY_PATH=/usr/pubsw/lib). You may also want to add this command to your ~/.cshrc or ~/.bash_profile file so that you don't need to run it every time you log in.

Part 2: Anatomy of an RPC application

Now that you have successfully set up your build environment, let's look at the actual application in more detail. This lab will be based around a simple distributed file store called cf (which stands for class file-store). The code provided to you consists of three main parts:

Try to run the code we have provided; some of the functionality is missing (such as the ls command, which you will be adding later in this lab), but you should be able to create, read, and write files. To start the cfd file server, give it a TCP port number that it should listen on as the first argument. Providing a port number of zero will cause it to choose an arbitrary available port number. You should then be able to use the cfc client to talk to your file server and create files, as follows:
% ./cfd 0 &
[1] 536
Listening on port 51730
% ./cfc localhost 51730 mkfile /hello
% ./cfc localhost 51730 write /hello world
% ./cfc localhost 51730 read /hello
% ./cfc localhost 51730 read /world
Server reports error: No such pathname (1)
If you can't get the provided code to work as above, something may be wrong, and you should contact the course staff.
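
The port-zero behavior described above is a general property of the sockets API: binding a TCP socket to port 0 asks the kernel to pick a free port, and getsockname() then reports which port was chosen. The following is a minimal sketch of that technique, not the code in cfd.cc; the helper name listen_on_any_port is hypothetical.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of the lab code): create a listening TCP
// socket on a kernel-chosen port. Returns the socket descriptor, or -1 on
// failure; on success, stores the chosen port in *port_out.
int listen_on_any_port(unsigned short *port_out) {
    assert(port_out != nullptr);

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);  // port 0: let the kernel pick a free port

    socklen_t len = sizeof(addr);
    if (bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 5) < 0 ||
        getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(addr.sin_port);  // the port the kernel actually assigned
    return fd;
}
```

A server written this way would print the value stored in *port_out, much as cfd prints "Listening on port 51730" in the transcript above.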

Now let's look at the code. First, read through the interface definition in the file cf.x. The file starts out by defining some data types that will be used by cf, such as error codes, pathnames, and so on. Then, using these data types, the file declares the CFS_PROG RPC program, and specifies the operations that it supports.

Exercise 1. Read through the interface definition provided in the file cf.x. What are some of the inherent limitations of this interface as it's defined? Would it be possible to implement a traditional file system on top of it? What operations are missing?

Place your answers in a file called answers.txt in the lab1/ directory; an empty answers.txt file should already exist there.

As you may recall from lecture, the interface definition is translated into real code by a special RPC interface compiler. For this lab, we will use the rpcgen compiler. Given the cf.x input file, rpcgen produces the following output files:

Now let's turn to the code that actually uses these interface stubs. The cfd server implements an in-memory file store and provides access to it through the cf RPC interface. The server consists of three source files:

There are also a few helpful abstractions we have included that hopefully simplify programming:

Exercise 2. Read through the provided server code. Why do all of the RPC functions in cfd_ops.cc hold a mutex for the duration of their execution? Will this mutex be held while the RPC function is receiving data from or sending data to a slow client, thereby preventing other clients from making any progress? Place your answers in the answers.txt file.
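
For readers unfamiliar with the pattern this exercise refers to, here is a minimal sketch of operations that each hold one shared mutex from entry to return. It uses C++11 std::mutex and std::lock_guard, which may differ from the locking primitive used in cfd_ops.cc, and the file_store class is hypothetical rather than part of the lab code.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical in-memory store (not the lab's cfd_fs classes). Every public
// operation holds the same mutex from entry until it returns.
class file_store {
public:
    // Create an empty file; returns false if the name already exists.
    bool mkfile(const std::string &path) {
        std::lock_guard<std::mutex> guard(lock_);  // unlocked automatically on return
        return files_.insert(std::make_pair(path, std::string())).second;
    }

    // Replace a file's contents; returns false if the file does not exist.
    bool write(const std::string &path, const std::string &data) {
        std::lock_guard<std::mutex> guard(lock_);
        std::map<std::string, std::string>::iterator it = files_.find(path);
        if (it == files_.end())
            return false;
        it->second = data;
        return true;
    }

    // Copy a file's contents into *out; returns false if the file does not exist.
    bool read(const std::string &path, std::string *out) const {
        assert(out != nullptr);
        std::lock_guard<std::mutex> guard(lock_);
        std::map<std::string, std::string>::const_iterator it = files_.find(path);
        if (it == files_.end())
            return false;
        *out = it->second;
        return true;
    }

private:
    mutable std::mutex lock_;                   // guards files_
    std::map<std::string, std::string> files_;  // pathname -> contents
};
```

Note that in this sketch the lock covers only the work done inside each method; whether that is also true of the network I/O in the lab code is what the exercise asks you to determine.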

The client is implemented by code in cfc.cc. Based on the command-line arguments, the client connects to the server, invokes the appropriate RPC client stub with the right arguments, and prints the response (if any) back to the user.

Exercise 3. Read through the provided client and server code. When you run cfc with a read command to read a non-existent file, an error message is printed, indicating that there is no such file. Where is this error generated in the cfd server, and how does it flow from there in the server process to the client and to the printf statement which prints it to your screen? Again, place your answers in the answers.txt file.
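
As a point of comparison while tracing the real code, the general shape of such an error path can be sketched as follows: the server operation stores a status code in its reply, and the client turns a non-success status into a message. All names here (status_code, read_reply, serve_read, describe_reply) are hypothetical, and the value 0 for success is an assumption; only the message text and the value 1 are taken from the sample output shown earlier. The actual types are defined in cf.x.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical status codes; the real ones are defined in cf.x.
enum status_code { STATUS_OK = 0, STATUS_NO_PATH = 1 };

// Hypothetical reply: a status plus data that is meaningful only on success.
struct read_reply {
    status_code status;
    std::string data;
};

// Server side: look up a path and record success or failure in the reply.
void serve_read(const std::map<std::string, std::string> &files,
                const std::string &path, read_reply *reply) {
    assert(reply != nullptr);
    std::map<std::string, std::string>::const_iterator it = files.find(path);
    if (it == files.end()) {
        reply->status = STATUS_NO_PATH;  // the error originates here
        reply->data.clear();
        return;
    }
    reply->status = STATUS_OK;
    reply->data = it->second;
}

// Map a status code to human-readable text.
const char *status_text(status_code s) {
    switch (s) {
    case STATUS_OK:      return "Success";
    case STATUS_NO_PATH: return "No such pathname";
    }
    return "Unknown error";
}

// Client side: return the file data, or an error line built from the status.
std::string describe_reply(const read_reply &r) {
    if (r.status == STATUS_OK)
        return r.data;
    std::ostringstream out;
    out << "Server reports error: " << status_text(r.status)
        << " (" << static_cast<int>(r.status) << ")";
    return out.str();
}
```

In the real application the reply additionally crosses the network between these two steps, which is the part of the path the exercise asks you to trace.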

Part 3: Implementing new functionality

You may have noticed that some functionality, such as the CFS_READDIR interface function, corresponding to the cfc ... ls command, is not fully implemented. In particular, the code in the cfs_readdir_1_svc() function in cfd_ops.cc is largely missing. It will be your job to fill in the missing code to make cfc ls work. The in-memory file store code in cfd_fs.hh already provides a cfd_fs_dir::readdir() method which returns an STL std::vector of names in that directory. Use the STL reference from the reference materials page to refresh your memory on how to work with STL's std::vector.
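
As a refresher on std::vector, the sketch below walks a vector of names and copies it into a singly linked list of C strings, a shape that interfaces written for rpcgen often use for variable-length results. The name_node type and both functions are hypothetical; check cf.x and the generated header for the result type that CFS_READDIR actually uses, and for how its memory is expected to be allocated and released.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical list node; the real result type comes from cf.x.
struct name_node {
    char *name;
    name_node *next;
};

// Build a linked list holding copies of the names, preserving their order.
// Returns nullptr for an empty vector.
name_node *to_list(const std::vector<std::string> &names) {
    name_node *head = nullptr;
    name_node **tail = &head;  // points at the link where the next node goes
    for (std::vector<std::string>::size_type i = 0; i < names.size(); ++i) {
        name_node *n = new name_node;
        n->name = new char[names[i].size() + 1];  // +1 for the terminating NUL
        std::strcpy(n->name, names[i].c_str());
        n->next = nullptr;
        *tail = n;
        tail = &n->next;
    }
    assert(names.empty() == (head == nullptr));
    return head;
}

// Release a list built by to_list().
void free_list(name_node *head) {
    while (head != nullptr) {
        name_node *next = head->next;
        delete[] head->name;
        delete head;
        head = next;
    }
}
```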

We have provided a test script called readdir-test.sh for you, which will run some simple tests on your file server to see if it appears to implement the CFS_READDIR operation reasonably well. If the test script fails when run against your file server, you can see what operations it is issuing by running it as sh -x ./readdir-test.sh host port.

Exercise 4. Fill in the code for cfs_readdir_1_svc() to implement the CFS_READDIR operation. Test your code using cfc ... ls and the provided readdir-test.sh script to make sure your code works.

Now that your ls command works, you may also want to learn more information about the files shown to you by the ls command, much like what the ls -l command shows on Unix.

Exercise 5. Implement an "ls -l" operation which reports not only the names of the entries in a directory, but also whether each name corresponds to a file or a directory and, for files, the size.

In doing this exercise, you will need to augment the interface in cf.x to transmit additional information over the network. What are the different ways you can extend the interface to accomplish this goal? For example, you may be able to change some existing calls, such as readdir, or you may introduce new calls to fetch the type and size for a pathname. What are the advantages and disadvantages of the different approaches you can think of? Put your answers to this question in the answers.txt file.

To complete this exercise, you will also need to modify both the server, to support your modified RPC interface, and the client, to provide a new "ls -l" command. At the end of this exercise, your cfc client should be able to list the contents of a directory at the server, and print the type of each entry (file or directory) and the size of each file.
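
Whichever interface design you choose, the client ends up holding a name, a type, and (for files) a size for each entry. The sketch below shows one possible way to represent and print that information. The dir_entry type, the entry_type values, and the output layout are all hypothetical; the lab does not prescribe a particular format.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical description of one directory entry.
enum entry_type { ENTRY_FILE, ENTRY_DIR };

struct dir_entry {
    std::string name;
    entry_type type;
    unsigned long size;  // meaningful only when type == ENTRY_FILE
};

// Format one line of "ls -l" style output: a type letter, then the size
// (or "-" for a directory), then the name.
std::string format_entry(const dir_entry &e) {
    assert(!e.name.empty());
    std::ostringstream out;
    if (e.type == ENTRY_DIR)
        out << "d - " << e.name;
    else
        out << "f " << e.size << " " << e.name;
    return out.str();
}
```

With this layout, a 5-byte file named hello would be shown as "f 5 hello" and a directory named docs as "d - docs".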

Challenge! The current interface reads and writes an entire file in a single operation, making it prohibitively expensive to operate on large files. Extend the interface to support reading and writing ranges of a file, and implement the corresponding server-side and client-side code. This challenge is an optional part of the lab.
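
To make "ranges of a file" concrete, here is a minimal sketch of range semantics on an in-memory byte string. It assumes offset-and-count arguments, short reads at the end of the file, and zero-filling when a write starts past the current end; these are illustrative choices, not requirements of the lab, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Read up to `count` bytes starting at `offset`. Returns fewer bytes (possibly
// none) if the range runs past the end of the file.
std::string read_range(const std::string &contents, std::size_t offset,
                       std::size_t count) {
    if (offset >= contents.size())
        return std::string();
    return contents.substr(offset, count);  // substr clamps count to the end
}

// Overwrite bytes starting at `offset` with `data`, growing the file with
// zero bytes if the written range extends past the current end. Writing
// empty data leaves the file unchanged.
void write_range(std::string *contents, std::size_t offset,
                 const std::string &data) {
    assert(contents != nullptr);
    if (data.empty())
        return;
    if (contents->size() < offset + data.size())
        contents->resize(offset + data.size(), '\0');
    contents->replace(offset, data.size(), data);
}
```

An RPC interface built on these semantics would carry the offset and count (or offset and data) in the request, so that the cost of each call depends on the size of the range rather than the size of the file.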

Part 4: Turn in your lab

To turn in your answers for this lab, you must package up your code and answers.txt and submit it to cs244b-staff@scs.stanford.edu by the turn-in deadline. You can package up your code and answers.txt using make submit.tar.gz, and either send the resulting submit.tar.gz file to us as an attachment, or use make turnin to automatically mail it to us:
% make submit.tar.gz
tar -zcf submit.tar.gz Makefile *.[chx] *.{cc,hh} answers.txt
% make turnin
tar -zcf submit.tar.gz Makefile *.[chx] *.{cc,hh} answers.txt
uuencode submit.tar.gz submit.tar.gz | mail cs244b-staff@scs.stanford.edu,username@stanford.edu
Make sure that you receive the copy of your submission sent to your email address (username@stanford.edu) if you use make turnin.