G22.3250 Lab 2: TCP Proxy

Due date: Monday February 10. Don't wait until the last minute.

Introduction

In this lab, you'll learn to use libasync, a C++ library for building event-driven applications. The assignment is to build an asynchronous TCP proxy.

A TCP proxy is a server that acts as an intermediary between a client and another server, called the destination server. Clients establish connections to the TCP proxy server, which then establishes a connection to the destination server. The proxy server sends data received from the client to the destination server and forwards data received from the destination server to the client.

Unlike the web proxy you built for the previous lab, a TCP proxy must work for arbitrary protocols, and thus can make fewer assumptions about its usage. In particular, the web proxy expected first to read a request from the client, then to read a response from the server. The TCP proxy must handle protocols with multiple rounds of communication, or protocols in which both sides are continuously writing to each other.

A TCP proxy server can be useful for getting around services that restrict connections based on network address. For example, the web page http://www.scs.cs.nyu.edu/G22.3250/restricted/ is only accessible from the class machines. Suppose you wanted to make this web site available to people outside the class machines. One possibility would be to run your web proxy on the class machines, and access the web page through it. However, that would require everyone wanting to view the page to reconfigure their web browsers to use your web proxy. People might not want to do that (as then all pages would be fetched through your proxy). Moreover, you would also be giving people access to other restricted web pages within NYU that they should not be able to access.

A better alternative would be to run a TCP proxy on the class machines, and give people a URL that reaches the server hosting the restricted page through that proxy. The web server will think it is serving the data to a web client on the machine running the proxy. However, the proxy is forwarding the data out of the class network, thus subverting the protection mechanism. (Note that while you can do this for the www.scs.cs.nyu.edu server, you should not point a proxy at other restricted servers at NYU--this would violate NYU's network policy.)

The assignment

The proxy server you will build for this lab will be invoked at the command line as follows:

% ./tcpproxy destination-host destination-port listen-port

For example, to redirect all connections to port 3000 on your local machine to Yahoo's web server, run:

% ./tcpproxy www.yahoo.com 80 3000 &

As another example, to view the restricted web page mentioned above, you might run the following command on machine class5:

% ./tcpproxy www.scs.cs.nyu.edu 80 4000 &
Then you can view the restricted web page by typing a URL like http://class5.scs.cs.nyu.edu:4000/V22.0477-001/restricted/ into your browser window. The trailing slash on .../restricted/ is actually important in this context, to avoid getting redirected to www.scs.cs.nyu.edu. Of course, if someone is already using port 4000, you will need to choose another port.
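The command-line form above could be validated along the following lines. This is only a sketch: the names ProxyArgs, parse_port, and parse_args are hypothetical and not part of libasync or the multifinger template, and the accepted port range (1 to 65535) is an assumption.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for the three command-line arguments.
struct ProxyArgs {
    std::string dest_host;
    int dest_port;
    int listen_port;
};

// Parse a decimal port in [1, 65535]; returns false on malformed input.
bool parse_port(const char *s, int &out) {
    if (s == nullptr || *s == '\0') return false;
    char *end = nullptr;
    long v = std::strtol(s, &end, 10);
    if (*end != '\0' || v < 1 || v > 65535) return false;
    out = static_cast<int>(v);
    return true;
}

// Expected layout: tcpproxy destination-host destination-port listen-port
bool parse_args(int argc, char **argv, ProxyArgs &out) {
    assert(argv != nullptr);
    if (argc != 4) return false;
    out.dest_host = argv[1];
    return parse_port(argv[2], out.dest_port) &&
           parse_port(argv[3], out.listen_port);
}
```

A main function would call parse_args(argc, argv, args) and print a usage message if it returns false.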

The proxy server will accept connections from multiple clients and forward them using multiple connections to the server. No client or server should be able to hang the proxy server by refusing to read or write data on its connection. For instance, if one client suddenly stops reading from the socket to the proxy, other clients should not notice interruptions of service through the proxy. To avoid blocking, you will need the asynchronous I/O techniques described in "Using TCP Through Sockets". You should in particular examine Section 6, which describes a C++ version of multifinger built using libasync. The source for this multifinger is in ~class/src/multifinger.tar.gz. You should examine it before proceeding with this lab.
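To illustrate what "avoid blocking" means at the system-call level, here is a minimal sketch using plain POSIX calls rather than libasync (which has its own facilities for this). The helper names set_nonblocking and read_would_block are hypothetical. Once a descriptor is in non-blocking mode, a read with no data available fails with EAGAIN instead of hanging the whole process.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Put a descriptor in non-blocking mode; returns false if fcntl fails.
bool set_nonblocking(int fd) {
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Returns true if reading from the non-blocking socket fd would block right
// now, i.e., there is neither data nor end-of-file. MSG_PEEK leaves any
// pending data in the socket.
bool read_would_block(int fd) {
    char c;
    ssize_t n = recv(fd, &c, 1, MSG_PEEK);
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

In an event-driven program, a would-block result is the cue to return to the event loop and wait for a readability callback, rather than to retry in a loop.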

The proxy must also handle hung clients and servers. In particular, if one end keeps transmitting data but the other stops reading, the proxy must not buffer an unlimited amount of data. Once the amount of buffered data in a given direction reaches some high water mark (e.g., 8K), the proxy must stop reading in that direction until the buffer drains. If the proxy has buffered data in one direction and is unable to write any of it for 10 seconds, it should abort both connections of the pair.
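The bookkeeping for one direction of a connection pair might be sketched as below. The class DirectionBuffer and its members are hypothetical names, the 8192-byte limit is just the example value from the text, and time is passed in as seconds so that the logic is independent of any particular timer mechanism. This is one possible reading of the rules above, not a required design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical per-direction state for one connection pair.
class DirectionBuffer {
public:
    static constexpr std::size_t kHighWater = 8192;  // example value (8K)
    static constexpr long kStallSeconds = 10;

    // Keep reading from the source only while below the high water mark.
    bool want_read() const { return buf_.size() < kHighWater; }

    // Number of bytes that may still be read without exceeding the mark.
    std::size_t room() const {
        return want_read() ? kHighWater - buf_.size() : 0;
    }

    // Record n bytes read from the source at time now (in seconds).
    void append(const char *data, std::size_t n, long now) {
        assert(n <= room());
        if (buf_.empty()) last_progress_ = now;  // data starts waiting now
        buf_.append(data, n);
    }

    // Record that n bytes were successfully written to the sink.
    void consumed(std::size_t n, long now) {
        buf_.erase(0, n);
        if (n > 0) last_progress_ = now;
    }

    // True if data has been waiting with no successful write for 10 seconds.
    bool stalled(long now) const {
        return !buf_.empty() && now - last_progress_ >= kStallSeconds;
    }

    std::size_t size() const { return buf_.size(); }

private:
    std::string buf_;
    long last_progress_ = 0;
};
```

A proxy built this way would register its read callback only while want_read() is true, and would abort both connections when stalled() becomes true for either direction.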

Connection termination

The proxy must handle end-of-file conditions as transparently as possible. If it reads end-of-file from one socket, it should pass the condition along to the other socket (using shutdown) after writing any remaining buffered data. However, the proxy should continue to forward data in the other direction. The proxy should terminate a connection pair and close the file descriptors under either of the following two circumstances:

  1. The proxy has read an end-of-file (or experienced a read error other than EAGAIN) in both directions and has written all remaining buffered data.
  2. The proxy experiences a write error (other than EAGAIN) in either direction.
The reason for giving up more easily on write errors is that they signify some failure of the higher-level protocol. A read end-of-file can be a legitimate part of a protocol, whereas when a program writes data to the network, it indicates a serious problem if no one is there to read it.
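The two termination rules, together with the rule for passing end-of-file along, can be written as small predicates. The struct PairState and the function names are hypothetical, and a "hard" read error (anything other than EAGAIN) is treated the same as end-of-file, as the first rule suggests.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of one connection pair.
struct PairState {
    bool eof_client_to_server;  // EOF or hard read error reading the client
    bool eof_server_to_client;  // EOF or hard read error reading the server
    std::size_t buffered_c2s;   // bytes read from client, not yet written
    std::size_t buffered_s2c;   // bytes read from server, not yet written
    bool write_error;           // write error other than EAGAIN, either way
};

// True when the pair should be torn down and both descriptors closed.
bool should_close(const PairState &s) {
    if (s.write_error) return true;  // rule 2
    return s.eof_client_to_server && s.eof_server_to_client &&
           s.buffered_c2s == 0 && s.buffered_s2c == 0;  // rule 1
}

// True when end-of-file should now be passed to the sink, for example with
// shutdown(sink_fd, SHUT_WR): EOF was read, the buffer has drained, and the
// sink has not been shut down already.
bool should_shutdown_sink(bool eof_seen, std::size_t buffered,
                          bool already_shut) {
    assert(eof_seen || !already_shut);  // shutdown only ever follows EOF
    return eof_seen && buffered == 0 && !already_shut;
}
```

Evaluating these predicates after every read and write event keeps the teardown decision in one place.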

Setting up your project directory

To set up a project directory for your tcp proxy, you can use the multifinger source tree as a template. (Note that if you have already built multifinger after untarring it, use the command gmake maintainer-clean to return your directory to a pristine state.) First unpack the software, then copy it as follows:
% tar xzf ~class/src/multifinger.tar.gz

% cp -pr multifinger tcpproxy
% cd tcpproxy
% rm multifinger.C multifinger.h
Now you must edit the autoconf and automake configuration files. First, edit configure.in. At the top, you will find the line:
AM_INIT_AUTOMAKE(multifinger, 0.0)
Change multifinger to tcpproxy. (This changes the name of the software package when you make a distribution.)

Next, you must edit Makefile.am to change the program you are going to build and the source files to include. There are three lines of interest:

bin_PROGRAMS = multifinger
noinst_HEADERS = multifinger.h
multifinger_SOURCES = multifinger.C
Thus, your modified Makefile.am might contain the following lines:
bin_PROGRAMS = tcpproxy
noinst_HEADERS = 
tcpproxy_SOURCES = tcpproxy.C
And you would put the source to your TCP proxy in tcpproxy.C. Finally, when you are ready to compile, you can proceed as you did in the previous lab:
% ./setup
+ chmod +x setup
+ aclocal
...
+ set +x
% setenv DEBUG -g
% ./configure --with-sfs=/usr/local/os/sfs-dbg
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
...
creating Makefile
creating config.h
% gmake
...

Testing

Note: unlike the previous lab, do not limit the number of file descriptors when testing your proxy. Your proxy should accept as many connections as possible. (The tester program will not attempt to run you out of file descriptors.)

You should test your proxy to make sure that it continues to forward data even when some connections aren't responding. Here's one test you should be able to pass.

First, run the proxy and point it at www.scs.cs.nyu.edu's HTTP port:

% ./tcpproxy www.scs.cs.nyu.edu 80 1234
Now, in another window, use telnet to fetch /cgi-bin/big through the proxy:
% telnet 127.0.0.1 1234
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
GET /cgi-bin/big
Watch the data go by for a while, then interrupt the output by typing control-], after which telnet should stop and print telnet>. Now check that the proxy hasn't been hung because telnet isn't reading data; suspend your telnet by typing "z RETURN" and fetch something else:
telnet> z

Suspended
% telnet 127.0.0.1 1234
Trying 127.0.0.1...
Connected to localhost (127.0.0.1).
Escape character is '^]'.
GET /ok
You were able to fetch the data.
Connection closed by foreign host.
% kill %telnet
% 
[1]    Terminated                    telnet 127.0.0.1 1234
% 
If you see "You were able to fetch the data," your program passes the test. Now try to access the restricted page from your web browser with a URL like http://class5.scs.cs.nyu.edu:1234/V22.0477-001/restricted/.

Once your proxy passes some basic tests, you can test it with the automated program test-tcpproxy. You can find this program on the class machines in ~class/bin/test-tcpproxy, which should be in your path by default. Assuming your proxy is in ./tcpproxy, you can test it as follows:

% test-tcpproxy ./tcpproxy
Single echo connection: passed
Two echo connections: passed
20 echo connections: passed
Bulk data, 20 connections: passed
Mix of blocked and normal: passed
One-way shutdown: passed
Early close: passed
Non-timeout of active client: passed
Timeout of lazy forward client: passed
Timeout of lazy reverse client: passed
% 
Your program should pass all phases of the test. test-tcpproxy performs the following tests:

Single echo connection

test-tcpproxy creates one connection through your proxy, and sends 10 4-byte messages on the connection. test-tcpproxy waits for the echoed reply before sending each new message. test-tcpproxy verifies that the correct data was echoed.

Two echo connections and 20 echo connections

test-tcpproxy creates N connections through your proxy. At the server end, test-tcpproxy just writes data it reads back to the connection. At the client end, test-tcpproxy copies data read from connection i to connection i+1. test-tcpproxy sends some 4-byte messages through this chain and verifies that they arrive at the other end.

Bulk data, 20 connections

As above, test-tcpproxy sets up a chain of 20 connections. It then sends 2 megabytes of data through the chain, and verifies that the same data arrives at the other end of the chain.

Mix of blocked and normal

test-tcpproxy runs the above bulk test. In addition, it sets up a connection through the proxy but doesn't read data from the server end of the connection. test-tcpproxy writes as much data as it can to that connection. test-tcpproxy expects that your proxy will forward all of the bulk data; it doesn't explicitly check how you handle the blocked connection.

One-way shutdown

test-tcpproxy sets up a connection, and then calls shutdown(s, SHUT_WR). On the server side, test-tcpproxy waits for an end-of-file, and then writes one byte to the connection. test-tcpproxy verifies that your proxy forwards that byte.
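The half-close behavior this test relies on can be observed without a proxy at all, on a pair of directly connected stream sockets. In the sketch below, the function name half_close_round_trip is hypothetical and the reply byte 'R' is arbitrary. It shows that after shutdown(client, SHUT_WR) the peer reads end-of-file, yet data can still flow back toward the side that shut down.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Half-close client, confirm that server sees end-of-file, then send one
// byte from server back to client. Returns the byte the client received,
// or -1 on any failure. Both descriptors are assumed to be blocking,
// directly connected stream sockets (for example, from socketpair).
int half_close_round_trip(int client, int server) {
    assert(client >= 0 && server >= 0);
    if (shutdown(client, SHUT_WR) != 0) return -1;
    char c;
    if (read(server, &c, 1) != 0) return -1;   // 0 means end-of-file
    const char reply = 'R';
    if (write(server, &reply, 1) != 1) return -1;
    if (read(client, &c, 1) != 1) return -1;   // read side is still open
    return static_cast<unsigned char>(c);
}
```

A proxy sits between two such connections, so it has to reproduce this behavior across the pair: end-of-file read on one socket becomes a shutdown on the other, while the reverse direction keeps forwarding.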

Early close

In this test, the server side closes the connection immediately after accepting it. Then the client writes a few bytes to the connection, separated by one-second pauses. test-tcpproxy expects that your proxy will eventually close the connection.

Non-timeout of active client

test-tcpproxy sets up a connection and writes to it periodically, with multi-second pauses between writes. test-tcpproxy expects that your proxy will leave the connection open.

Timeout of lazy forward client

The server end of the connection does not read any data, but the client sends data as fast as it can. test-tcpproxy expects that your proxy will close the connection after 10 seconds.

Timeout of lazy reverse client

As above, but in the reverse direction (the server generates data, but the client does not read it).

How/What to hand in

TCP proxy

You should submit two things: a software distribution and a typescript file. As in the previous lab, build the software distribution as follows:
% gmake distcheck
rm -rf tcpproxy-0.0
mkdir tcpproxy-0.0
chmod 777 tcpproxy-0.0
...
================================================
tcpproxy-0.0.tar.gz is ready for distribution
================================================
% 
Also as in the previous lab, use the script command to create a typescript file. To turn in your work, copy the files tcpproxy-0.0.tar.gz and typescript to the directory ~class/handin/lab2/username, where username is your username:
% cp tcpproxy-0.0.tar.gz typescript ~class/handin/lab2/`logname`/
% 
If you have any problems with submission, please contact the instructor.

References

The following are useful references (in addition to the class handout Using TCP through sockets):