Distributed Storage Systems Lab1 - TCP proxy

Introduction

The first programming project is meant to introduce you to some programming tools you'll be using for the rest of the course, particularly the Unix software development environment (g++, autoconf, etc.) and the C++ asynchronous I/O library. You'll find it useful to refer to the source of the C++ multifinger program described in Using TCP through sockets.

Your task will be to write a TCP Proxy using the same C++ asynchronous library. You'll learn how to write both client and server code in this lab.

A TCP proxy server is a server that acts as an intermediary between a client and another server, called the destination server. Clients establish connections to the TCP proxy server, which then establishes a connection to the destination server. The proxy server sends data received from the client to the destination server and forwards data received from the destination server to the client. Interestingly, the TCP proxy server is actually both a server and a client. It is a server to its client and a client to its destination server.

A TCP proxy server can be useful for getting around services that restrict connections based on network addresses. For example, the web page http://www.scs.stanford.edu/06wi-cs240d/restricted/ is only accessible from the class machines. If you try to access it from elsewhere, you will receive an access denied error. However, you can view this page from a web browser anywhere on the Internet by running a proxy server on one of the class machines. The web server will think it is serving the data to a web client on the machine running the proxy. However, the proxy is forwarding the data out of the class network, thus subverting the protection mechanism. (Note that while you can do this for the www.scs.stanford.edu server, you should not point a proxy at other restricted servers at Stanford, as you could get in trouble for this.)

The assignment

The proxy server you will build for this lab will be invoked at the command line as follows:

% ./tcpproxy destination-host destination-port listen-port

For example, to redirect all connections to port 3000 on your local machine to Yahoo's web server, run:

% ./tcpproxy www.yahoo.com 80 3000 &

As another example, to view the restricted web page mentioned above, you might run the following command on machine class5:

% ./tcpproxy www.scs.stanford.edu 80 4000 &
Then you can view the restricted web page by typing the URL http://class5.scs.stanford.edu:4000/06wi-cs240d/restricted/ into your browser window. The trailing slash on .../restricted/ is important in this context, to avoid getting redirected to www.scs.stanford.edu. Of course, if someone is already using port 4000, you will need to choose another port.
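The command-line form shown above could be validated along the following lines. This is only a sketch: the names ProxyArgs, parse_port, and parse_args are hypothetical and are not part of the skeletal tcpproxy source. It is written in a C++98 style, on the assumption that your code will be compiled with -ansi as in the class build.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical holder for the three command-line arguments.
struct ProxyArgs {
  std::string dest_host;
  int dest_port;
  int listen_port;
};

// Parse a decimal TCP port in the range 1..65535.
// Rejects empty strings, signs, leading whitespace, and trailing junk.
bool parse_port(const char *s, int *out) {
  if (s == 0 || !std::isdigit(static_cast<unsigned char>(*s)))
    return false;
  char *end = 0;
  errno = 0;
  long v = std::strtol(s, &end, 10);
  if (errno != 0 || *end != '\0' || v < 1 || v > 65535)
    return false;
  *out = static_cast<int>(v);
  return true;
}

// Expected argv layout:
//   tcpproxy destination-host destination-port listen-port
bool parse_args(int argc, const char *const *argv, ProxyArgs *out) {
  if (argc != 4 || argv[1][0] == '\0')
    return false;
  out->dest_host = argv[1];
  return parse_port(argv[2], &out->dest_port)
      && parse_port(argv[3], &out->listen_port);
}
```

A main function would typically call parse_args(argc, argv, &args) first and print a usage message if it returns false.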

The proxy server will accept connections from multiple clients and forward them using multiple connections to the server. No client or server should be able to hang the proxy server by refusing to read or write data on its connection. For instance, if one client suddenly stops reading from the socket to the proxy, other clients should not notice interruptions of service through the proxy. You will need asynchronous behavior, described in "Using TCP Through Sockets".
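To make the idea of non-blocking I/O concrete, here is a minimal sketch using plain POSIX calls. It is an illustration only: the asynchronous library used in this course is expected to handle readiness notification for you, and the helper names set_nonblocking and read_some, as well as the -2 return convention, are assumptions made for this example.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Put fd into non-blocking mode. Returns 0 on success, -1 on error.
int set_nonblocking(int fd) {
  int flags = fcntl(fd, F_GETFL, 0);
  if (flags < 0)
    return -1;
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

// Try to read without blocking. Returns:
//   > 0  number of bytes read
//     0  end-of-file
//    -1  a real error (errno is set)
//    -2  nothing available right now; retry when the fd becomes readable
ssize_t read_some(int fd, char *buf, std::size_t len) {
  ssize_t n = read(fd, buf, len);
  if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR))
    return -2;
  return n;
}
```

The important point is that a read on a non-blocking descriptor returns immediately with EAGAIN instead of stalling the whole process, so one idle connection cannot hold up the others.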

The proxy must also handle hung clients and servers. In particular, if one end keeps transmitting data but the other stops reading, the proxy must not buffer an unlimited amount of data. Once the amount of buffered data in a given direction reaches some high water mark (e.g., 8K), the proxy must stop reading in that direction until the buffer drains. If the proxy has buffered data in one direction and is unable to write any of it for 10 seconds, it should abort both connections in the pair.
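One possible way to organize the flow-control rules above is a small per-direction buffer object. The sketch below is an assumption about structure, not the required design: the class name DirectionBuffer and its methods are hypothetical, times are plain seconds passed in by the caller so that the class does not depend on any particular event loop, and "until the buffer drains" is interpreted here as "until the buffer falls below the high water mark".

```cpp
#include <cstddef>
#include <string>

// Hypothetical buffer for one direction of a connection pair.
class DirectionBuffer {
public:
  DirectionBuffer(std::size_t high_water, long stall_limit_secs)
    : high_water_(high_water), stall_limit_(stall_limit_secs),
      stalled_since_(-1) {}

  // Keep reading from the source only while below the high water mark.
  bool want_read() const { return data_.size() < high_water_; }

  // Ask for writability notification only while data is waiting.
  bool want_write() const { return !data_.empty(); }

  // Record n bytes read from the source at time now.
  void append(const char *p, std::size_t n, long now) {
    if (n == 0)
      return;
    if (data_.empty())
      stalled_since_ = now;  // the stall clock starts when data first waits
    data_.append(p, n);
  }

  // Record that the first n buffered bytes were written to the sink.
  void consume(std::size_t n, long now) {
    if (n == 0)
      return;
    data_.erase(0, n);
    stalled_since_ = data_.empty() ? -1 : now;  // progress resets the clock
  }

  // True if buffered data has seen no write progress for the stall limit.
  bool timed_out(long now) const {
    return stalled_since_ >= 0 && now - stalled_since_ >= stall_limit_;
  }

  const std::string &pending() const { return data_; }

private:
  std::string data_;
  std::size_t high_water_;
  long stall_limit_;
  long stalled_since_;  // -1 when nothing is buffered
};
```

With DirectionBuffer(8192, 10), the proxy would stop reading once 8K is buffered and would abort the pair when timed_out returns true. A real implementation would more likely arm and cancel a timer in its event loop than poll a clock.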

Connection termination

The proxy must handle end-of-file conditions as transparently as possible. If it reads end-of-file from one socket, it should pass the condition along to the other socket (using shutdown) after writing any remaining buffered data. However, the proxy should continue to forward data in the other direction. The proxy should terminate a connection pair and close the file descriptors under either of the following two circumstances:

  1. The proxy has read an end-of-file (or experienced a read error other than EAGAIN) in both directions and has written all remaining buffered data.
  2. The proxy experiences a write error (other than EAGAIN) in either direction.
The reason for giving up more easily on write errors is that they signify some failure of the higher-level protocol. A read end-of-file can be a legitimate part of a protocol, whereas when a program writes data to the network, it indicates a serious problem if no one is there to read it.
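The termination rules above can be written down as two small predicates. This is a sketch under assumptions: the Direction struct and its field names are hypothetical bookkeeping, and your proxy may track this state quite differently.

```cpp
// Hypothetical bookkeeping for one direction of a connection pair
// (for example, client-to-server).
struct Direction {
  bool read_done;         // source hit end-of-file or a hard read error
  bool write_failed;      // sink reported a write error other than EAGAIN
  bool shutdown_sent;     // shutdown(sink_fd, SHUT_WR) was already issued
  unsigned long pending;  // bytes buffered but not yet written to the sink

  Direction()
    : read_done(false), write_failed(false),
      shutdown_sent(false), pending(0) {}
};

// End-of-file is passed along only after the remaining buffered data
// has been written, and only once.
bool should_shutdown_sink(const Direction &d) {
  return d.read_done && d.pending == 0
      && !d.shutdown_sent && !d.write_failed;
}

// Rule 2: any write error ends the pair immediately.
// Rule 1: otherwise both directions must have finished reading and
// flushed everything they had buffered.
bool should_close_pair(const Direction &c2s, const Direction &s2c) {
  if (c2s.write_failed || s2c.write_failed)
    return true;
  return c2s.read_done && s2c.read_done
      && c2s.pending == 0 && s2c.pending == 0;
}
```

Note that an end-of-file in one direction alone never satisfies should_close_pair, which is what lets the proxy keep forwarding data in the other direction.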

Fetching and building the source

Start by unpacking the skeletal tcpproxy build tree in your home directory. On the class machines, you can do so with the following commands:
% tar xzf ~class/src/tcpproxy.tar.gz
% cd tcpproxy
% sh ./setup
Using AUTOCONF_VERSION 2.59
+ chmod +x setup
+ libtoolize
...

            *** * * * * * * * * * * * * * * * * ***
            ***         setup succeeded         ***
            *** * * * * * * * * * * * * * * * * ***

% 
Note that because of minor incompatibilities between versions of the tools, you will probably see a bunch of warnings flash by. This is okay as long as you see the message "setup succeeded" at the end.

Next, you must configure the software and generate a Makefile--a set of instructions for how to compile the software. For this class, we will use the GNU autoconf and automake tools to generate Makefiles. You will also be linking against the libasync library that is part of SFS. On the class machines, generate the Makefile with the following commands:

% setenv DEBUG -g
% ./configure --with-sfs=/home/cs2/class/src/sfs1
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... found
checking host system type... i386-unknown-openbsd2.8
...
updating cache ./config.cache
creating ./config.status
creating Makefile
creating config.h
% 
It is very important that you supply the argument --with-sfs=/home/cs2/class/src/sfs1 to ./configure. If you don't, things will appear to work, but you will get a version of libasync without built-in debugging sanity checks. Your assignment will be linked against debugging libraries for grading, so you want to make sure you get the benefit of the sanity checking while testing the software yourself.

Once the software is configured, you can build it by running gmake. (Note that this is gmake with a g, and not make. At the end of the assignment you will make a software distribution that compiles with any make, but for development you must use gmake which is GNU make.)

% gmake
gmake  all-am
gmake[1]: Entering directory `/a/class-serv/disk/cs2/student/tcpproxy'
if g++ -DHAVE_CONFIG_H -I. -I. -I.   -I/home/cs2/class/src/sfs1 -I/home/cs2/class/src/sfs1/./async -I/home/cs2/class/src/sfs1/./arpc -I/home/cs2/class/src/sfs1/./crypt -I/home/cs2/class/src/sfs1/./sfsmisc -I/home/cs2/class/src/sfs1/svc -I/usr/local/include -I/usr/local/include  -g -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -Wno-unused  -MT tcpproxy.o -MD -MP -MF ".deps/tcpproxy.Tpo" -c -o tcpproxy.o tcpproxy.C; \
...
gmake[1]: Leaving directory `/a/class-serv/disk/cs2/student/tcpproxy'
% 
That's it! You've now built tcpproxy. To test it, type, for example:
% ./tcpproxy www.yahoo.com 80 8888
Now, assuming you ran the above on class2, in another window, run:
% telnet class2.scs.stanford.edu 8888
Connected to class2.scs.stanford.edu.
Escape character is '^]'.
Connection closed by foreign host.
% 
The message "Connected to class2.scs.stanford.edu" says that your proxy accepted a TCP connection, but then immediately closed it, since the proxy is not fully implemented. You must finish implementing the proxy.

Out-of-directory builds

It is often useful to compile a program in a different directory from the source code. There are several reasons for this. One may want to compile the same source tree multiple times--for example once with debugging, once without. Using two copies of the same source tree would make it a pain to keep the two compiled versions in sync. Another issue is that C++ object files and executables can get pretty large--especially with debugging information. Thus it is considerably faster to compile on a local disk when the source code is not local. Finally, when you have limited backed-up disk space, there is no reason to waste it on huge C++ executables, since these can always be recreated from the source in the event of a disk crash.

Autoconf easily supports compiling in a different directory. You simply need to run the configure script from whatever directory you wish the compile to take place. However, when a source tree is being used for out-of-directory builds, you cannot also perform an in-place build. The following example illustrates how one might compile tcpproxy out-of-directory using local disk space on the machine class2:

% cd tcpproxy
% gmake distclean
rm -f config.h
rm -f *.tab.c
...
rm -f config.status
% mkdir /home/cl2/scratch/student
% cd /home/cl2/scratch/student
% mkdir tcpproxy
% cd tcpproxy
% setenv DEBUG -g
% ~/tcpproxy/configure --with-sfs=/home/cs2/class/src/sfs1
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...
creating Makefile
creating config.h
% gmake
c++ -DHAVE_CONFIG_H -I. -I/home/c/student/tcpproxy -I. ...
/bin/sh ./libtool --mode=link c++  -g -ansi -Wall -Wsign-compare ...
mkdir .libs
c++ -g -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror ...
% 
(The gmake distclean command cleans up any previous in-directory build, restoring the tcpproxy directory to its pristine state.)

Testing

You should test your proxy to make sure that it continues to forward data even when some connections aren't responding. Here's one test you should be able to pass.

First, run the proxy and point it at www.scs.stanford.edu's HTTP port:

% ./tcpproxy www.scs.stanford.edu 80 1234
Now, in another window, use telnet to fetch /cgi-bin/big through the proxy:
% telnet 127.0.0.1 1234
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
GET /cgi-bin/big
Watch the data go by for a while, then interrupt the output by typing control-], after which telnet should stop and print telnet>. Now check that the proxy hasn't been hung because telnet isn't reading data; suspend your telnet by typing "z RETURN" and fetch something else:
telnet> z

Suspended
% telnet 127.0.0.1 1234
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
GET /ok
You were able to fetch the data.
Connection closed by foreign host.
% kill %telnet
% 
[1]    Terminated                    telnet 127.0.0.1 1234
% 
If you see "You were able to fetch the data," your program passes the test. Now try to access the restricted page from your web browser with a URL like http://class5.scs.stanford.edu:1234/06wi-cs240d/restricted/.

Once your proxy passes some basic tests, you can test it with the automated program test-tcpproxy. You can find this program on the class machines in ~class/bin/test-tcpproxy, which should be in your path by default. Assuming your proxy is in ./tcpproxy, you can test it as follows:

% test-tcpproxy ./tcpproxy
Single echo connection: passed
Two echo connections: passed
20 echo connections: passed
Bulk data, 20 connections: passed
Mix of blocked and normal: passed
One-way shutdown: passed
Early close: passed
Non-timeout of active client: passed
Timeout of lazy client: passed
% 
Your program should pass all phases of the tests.


How/What to hand in

TCP proxy

You should submit two things: the software distribution and a typescript file showing a run of the test program. Build the software distribution with the gmake distcheck command, as follows:
% gmake distcheck
rm -rf tcpproxy-0.0
mkdir tcpproxy-0.0
chmod 777 tcpproxy-0.0
...
================================================
tcpproxy-0.0.tar.gz is ready for distribution
================================================
% 
To turn in your distribution, copy tcpproxy-0.0.tar.gz and the typescript file to the directory ~class/handin/lab1/username, where username is your username:
% cp tcpproxy-0.0.tar.gz ~class/handin/lab1/`logname`/
% 

Use the script command to create a typescript file. When you run script, everything you type gets saved in a file called typescript. Press CTRL-D to finish the script. The typescript file should be copied to the same directory as the software distribution. For example:

% script
Script started, output file is typescript
% test-tcpproxy ./tcpproxy
Single echo connection: passed
Two echo connections: passed
20 echo connections: passed
Bulk data, 20 connections: passed
Mix of blocked and normal: passed
One-way shutdown: passed
Early close: passed
Non-timeout of active client: passed
Timeout of lazy client: passed
% ^D Script done, output file is typescript
% cp typescript ~class/handin/lab1/`logname`/
% 

If you have any problems with submission, please contact the instructor.