G22.3250 Lab 4: Cryptographic file system

Introduction

In this lab, you will build a cryptographic file system as an NFS loopback server. Cryptographic file systems are used to store sensitive information on disk. The idea is that in order to access files, the user must enter a secret passphrase. Without the passphrase, even someone who steals your physical disk will be unable to read the sensitive files you store there. Your file system will be called CCFS, and will be invoked with two arguments:
% ./ccfs path-to-encrypted-files name
Passphrase: 
path-to-encrypted-files is the path to a directory under which you want to store encrypted files. Once CCFS is running you will be able to access unencrypted versions of the files under /classfs/name. The minute you kill CCFS, however, the contents of the files will be inaccessible to someone who doesn't know the correct passphrase to restart CCFS. A full cryptographic file system will work something like this:
% mkdir /shome/c2/scratch/myname
% ./ccfs /shome/c2/scratch/myname myname
Passphrase: 
^Z
Suspended
% bg
[1]    ./ccfs /shome/c2/scratch/myname myname &
% touch /classfs/myname/test
% echo hi > /classfs/myname/there
% cp /etc/termcap /classfs/myname/
% ls -al /classfs/myname/
total 732
drwxr-xr-x  2 dm  dm      512 Sep 23 21:38 .
dr-x------  4 dm  sfs     512 Sep 23 21:37 ..
-r--r--r--  1 dm  dm   732361 Sep 23 21:38 termcap
-rw-r--r--  1 dm  dm        0 Sep 23 21:37 test
-rw-r--r--  1 dm  dm        3 Sep 23 21:37 there
% ls -al /shome/c2/scratch/myname
total 735
drwxr-xr-x   2 dm  dm     512 Sep 23 21:38 .
drwxr-xr-x  31 dm  dm    4096 Sep 23 18:32 ..
-rw-r--r--   1 dm  dm      16 Sep 23 21:37 8gM7Ga4VrGrDJZjTa0Ruzg
-rw-r--r--   1 dm  dm     531 Sep 23 21:37 dWbFGNqKIUCB-dw0w10rRg
-r--r--r--   1 dm  dm  732889 Sep 23 21:38 ml0WK4ekrOXRPOG0CvCVZQ
% kill %./ccfs
[1]    Terminated                    ./ccfs /shome/c2/scratch/myname myname

% ls -al /classfs/myname/
ls: /classfs/myname: No such file or directory
% 
Not only are the file names on disk unintelligible, but so are the file contents. Thus, even someone who breaks into the file server will not be able to read your files without knowing the secret passphrase.

In this lab, you are only required to encrypt file contents, not file names.

Approach

CCFS will be implemented as an NFS loopback server. That means you will write a user-level program that emulates a remote NFS server by accepting NFS RPCs from the local operating system kernel. You will use the asynchronous RPC library to handle multiple NFS RPCs in parallel. Encrypted files will be stored on a remote SFS file server. CCFS will communicate with the remote SFS server using non-blocking socket I/O. Thus, CCFS will be completely asynchronous. The following diagram depicts the architecture of CCFS:

You will begin this project with a ``dumb,'' 80-line file system that does nothing but relay NFS calls. You will build CCFS by progressively modifying this dumb file system until it encrypts all file contents and file names.

In order to build CCFS, you will make use of the classfs framework. Classfs contains a daemon, classfsd, a library, libclassfs.a, and a header file, classfscli.h. The principal purpose of the library and associated header is to communicate with classfsd and the remote SFS server when initially setting things up. classfsd is already installed and running on the class machines. The library is in /usr/local/os/classfs.

The classfsd daemon serves two functions. First, it handles the nasty and unportable details of creating NFS loopback mounts. Second, it will clean up the mess if your CCFS implementation crashes. classfsd is only active when you are first starting up or after CCFS exits or crashes. Otherwise, your CCFS implementation will be speaking NFS directly to the kernel.


Part A -- Getting acquainted with the software

Part A of this lab should be trivial, while Part B is much harder. Thus, you should finish Part A as soon as possible to start work on Part B. (The point of Part A is mostly to make sure you have at least looked at the software before Part B is due.)

Getting started with SFS

Since CCFS relies on SFS, the first thing you must do is register a public key with SFS on the class server machine. To do this, execute the following command:
% ssh -t class-serv sfskey register
student@class-serv's password: type your Unix password here
sfskey: /home/c/os/student/.sfs/random_seed: No such file or directory
sfskey: creating directory /home/c/os/student/.sfs
sfskey: creating directory /home/c/os/student/.sfs/authkeys
/var/sfs/sockets/agent.sock: No such file or directory
sfskey: sfscd not running, limiting sources of entropy
Creating new key: student@class-serv.scs.cs.nyu.edu#1 (Rabin)
       Key Label: student@class-serv.scs.cs.nyu.edu#1
Enter passprase: type a passphrase
          Again: type it again

sfskey needs secret bits with which to seed the random number generator.
Please type some random or unguessable text until you hear a beep:
DONE            
  UNIX password: type your Unix password here
class-serv.scs.cs.nyu.edu: authserver is in realm class.scs.cs.nyu.edu
class-serv.scs.cs.nyu.edu: New SRP key: student@class.scs.cs.nyu.edu/1024
wrote key: /home/c/os/student/.sfs/authkeys/student@class-serv.scs.cs.nyu.edu#1
Connection to class-serv closed.
% 
It may take several minutes for your public key to propagate to all the class machines, so run this command now before reading the rest of the lab.

Once you are registered with the SFS server, you must run an sfsagent process on any client machine from which you wish to access an SFS server. You have been accessing files on the class machines under /home through NFS. However, once you have run an sfsagent, you can access the same files through SFS under /shome. This includes both your home directory (/shome/os/yourname) and the scratch directories under /shome/cN. When you are done, before logging out, you should kill your agent with the sfskey kill command. For example:

% sfsagent
Passphrase for student@class-serv.scs.cs.nyu.edu#1: type your passphrase
% cd /shome/os/yourname
% ls -al
drwxr-xr-x   8 student  osclass    512 Sep 23 22:55 .
drwxr-xr-x  15 root     wheel      512 Sep 10 18:21 ..
-rw-------   1 student  osclass    811 Sep 23 22:50 .Xauthority
-rw-------   1 student  osclass   3418 Sep  6 11:41 .Xdefaults
-rw-r--r--   1 student  osclass   2841 Sep  6 11:42 .cshrc
-rw-------   1 student  osclass   6625 Jan 17  2001 .emacs
...
% 
And when finally logging out:
% sfskey kill
sfsagent: EOF from sfscd
sfsagent: exiting
% 

Compiling the CCFS software

To get started with the software, you should unpack the ``dumb'' skeletal CCFS source code from ~class/src/ccfs.tar.gz. The setup procedure is similar to the first lab, except that you must give ./configure the argument --with-classfs=/usr/local/os/classfs-dbg:
% cd
% tar xzf ~class/src/ccfs.tar.gz
% cd ccfs
% sh ./setup
+ chmod +x setup
+ aclocal
aclocal: macro `SFS_DB2' defined in acinclude.m4 but never used
+ autoheader
+ automake --add-missing
automake: configure.in: installing `./install-sh'
automake: configure.in: installing `./mkinstalldirs'
automake: configure.in: installing `./missing'
configure.in: 25: required file `./ltmain.sh' not found
automake: Makefile.am: installing `./INSTALL'
automake: Makefile.am: installing `./COPYING'
+ autoconf
+ set +x
% mkdir -p /home/c3/scratch/yourname/ccfs
% pushd /home/c3/scratch/yourname/ccfs
/home/c3/scratch/yourname ~/ccfs 
% setenv DEBUG -g
% =1/configure --with-classfs=/usr/local/os/classfs-dbg
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... found
checking host system type... i386-unknown-openbsd2.9
...
updating cache ./config.cache
creating ./config.status
creating Makefile
creating config.h
% gmake
...
(If you are using classN, then make your build directory on /home/cN rather than /home/c3.)

Running CCFS

Once you have compiled CCFS, you can run the software. CCFS takes two arguments: first, a directory on an SFS file system; second, a name under which to access the loopback server. (If the second argument is omitted, it defaults to the last component of the pathname in the first argument.) For example, you might make a scratch directory on the machine class2, and access it from a different machine:
% mkdir /shome/c2/scratch/myname
% ./ccfs /shome/c2/scratch/myname myname
^Z
Suspended
% bg
[1]    ./ccfs /shome/c2/scratch/myname myname &

% 
Now /classfs/myname and /shome/c2/scratch/myname will appear to be the same directory, except that /classfs/myname will be going through your software.

NOTE: You cannot access an SFS server from a client on the same machine. Because of NFS loopback server deadlock issues, SFS will not connect to a client on the same machine. If, for instance, you try to access /shome/c2 on class2, you will get a ``Resource deadlock avoided'' or ``not an SFS file system'' error.

When you kill CCFS, classfsd will attempt to unmount the file system. It will be unable to do so if any of your shells still has /classfs/myname as a working directory. Thus, make sure to cd / after killing CCFS. If you forget to leave a directory after killing CCFS, and you try to restart CCFS using the same second argument (name under /classfs), CCFS will give you the error File exists.

One final word about classfsd: Because of the risk of deadlock with NFS loopback mounts, classfsd periodically pings CCFS with an NFS request. If it does not receive any replies for 10 minutes, classfsd takes over the server UDP socket and attempts to unmount the file system. This should ordinarily not cause you any trouble, but if you leave CCFS stopped under the debugger for 10 minutes, your file system will get unmounted.

Tracing RPCs

Once you've gotten the skeletal CCFS running, try the following command in one window, while browsing /classfs/myname in a different window:
% env ASRV_TRACE=10 ./ccfs /shome/c2/scratch/myname myname
This command prints a complete trace of all NFS requests received by CCFS. (Large structures may be truncated; if this is ever a problem, try higher values than 10.) Similarly, setting ACLNT_TRACE instead of ASRV_TRACE shows a trace of all the NFS requests CCFS sends to the remote SFS server. Tracing RPCs can be invaluable in debugging strange behavior of your cryptographic file system--you can usually track the problem to a single RPC and then see why your code is misbehaving in that case. Now redirect the tracing output to a file:
% env ASRV_TRACE=10 ./ccfs /shome/c2/scratch/myname myname >& nfs.trace
(Note, you must use >& rather than just > because the tracing goes to standard error. If you use a Bourne-like shell instead of the default tcsh, you might need to use 2> instead of >&.)

After setting up CCFS to trace NFS traffic, run the following commands:

% cd /classfs/myname
% rm junk
rm: junk: No such file or directory
% echo hello > junk
% cat junk
hello
% cat junk
hello
% 
Now stop CCFS, and look at the RPCs in the nfs.trace file.

What to hand in

Hand in the nfs.trace file, which you have annotated to show which RPCs correspond to which of the commands you ran. At the end, briefly explain any difference between the RPCs caused by the two cat commands. Copy this file to ~class/handin/lab4a.


Part B -- Encrypting file contents

In this part, you will modify CCFS to encrypt all file contents written to the server and decrypt all contents read from the server. You will primarily do this by special-casing the NFSPROC3_READ and NFSPROC3_WRITE RPCs, but you will also need to change handling of NFSPROC3_SETATTR. Finally, in all NFS RPCs you will need to adjust returned file attributes slightly.

For encrypting and decrypting data, you will be using the Rijndael (AES) algorithm. The implementation you will be using is described below. One of the complications of encrypting file data is that Rijndael operates on blocks of 16 bytes. Thus, all read and write operations must be in aligned multiples of 16 bytes. Furthermore, files with sizes not a multiple of 16 bytes will have to be padded slightly. Before adding encryption to CCFS, therefore, you will first modify the software to deal with file padding and to read and write multiples of 16 bytes.
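
The alignment requirement can be illustrated with a small sketch. The names byte_range, align_to_blocks, and leading_skip are hypothetical and are not part of the classfs or SFS libraries; the code only shows the arithmetic for widening an arbitrary (offset, count) request to 16-byte boundaries.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the byte range [offset, offset + count).
struct byte_range {
  uint64_t offset;
  uint64_t count;
};

constexpr uint64_t kBlock = 16;  // Rijndael block size in bytes

// Widen an arbitrary request so it starts and ends on 16-byte boundaries.
inline byte_range align_to_blocks(uint64_t offset, uint64_t count) {
  uint64_t start = offset & ~(kBlock - 1);                       // round down
  uint64_t end = (offset + count + kBlock - 1) & ~(kBlock - 1);  // round up
  assert(end >= start);  // would only fail on 64-bit overflow
  return byte_range{start, end - start};
}

// Number of bytes at the front of the aligned buffer that precede
// the data the caller actually asked for.
inline uint64_t leading_skip(uint64_t offset) {
  return offset & (kBlock - 1);
}
```

For instance, a request for 16 bytes at offset 17 would be widened to the range starting at offset 16 with count 32, and the first byte of the aligned buffer would be skipped when copying data back to the caller.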

Though you may be adding a few bytes to the ends of files, you would like file sizes to appear correct to the user. It is easy to do this if you observe the following rule: Add 16 bytes to the size of a file if and only if its size is not a multiple of 16 bytes. Thus, a 16-byte plaintext file will result in a 16-byte ciphertext file, but a 17-byte plaintext file will result in a 33-byte ciphertext file. Given this scheme, ciphertext file sizes can easily be adjusted in file attributes to contain the size of plaintext files. [If (size>16) and (size&15) then size-=16.] Before adding encryption, then, you will need to modify the following RPCs:
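
As a minimal sketch of the size rule just stated, the two conversions might look like the following. The function names plaintext_size and ciphertext_size are made up for illustration and do not come from the lab libraries.

```cpp
#include <cassert>
#include <cstdint>

// Size to report to the user for a ciphertext file of the given size.
// This is the bracketed rule: if (size>16) and (size&15) then size-=16.
inline uint64_t plaintext_size(uint64_t cipher) {
  return (cipher > 16 && (cipher & 15)) ? cipher - 16 : cipher;
}

// Size stored on the server for a plaintext file of the given size:
// add 16 bytes if and only if the size is not a multiple of 16.
inline uint64_t ciphertext_size(uint64_t plain) {
  uint64_t cipher = (plain & 15) ? plain + 16 : plain;
  assert(plaintext_size(cipher) == plain);  // the two rules are inverses
  return cipher;
}
```

Because any padded size is at least 17 and never a multiple of 16, the adjustment in plaintext_size undoes ciphertext_size exactly, which is why the attribute fix-up needs no extra state.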

Once you have modified CCFS to read and write data in aligned multiples of 16 bytes, you are ready to begin encrypting and decrypting file contents. Upon startup, CCFS should get a passphrase from the user and initialize an aes object with this passphrase. One possible approach to encrypting files is simply to encrypt every 16-byte region with the aes object. However, then if two 16-byte regions of a plaintext file contain the same data, the encrypted file will contain the same ciphertext. For better security, such patterns should be hidden from people who have access to the ciphertext file.

You can ensure that identical 16-byte regions are encrypted differently by throwing the position into the equation. Let E(B) represent the encryption of 16-byte block B. (Of course, E requires an encryption key not shown in this notation. Just think of E as a method of an object that contains the key.) When transforming the plaintext data block P at offset pos to ciphertext block C, you should calculate C = E(P XOR E(pos,0)). In other words, pad the position pos to 16 bytes with 0s, and encrypt it to generate 16 bytes of random looking data. Then XOR this data, byte-by-byte, with the plaintext before encrypting. If D is the decryption function, then to decrypt you can simply compute P = D(C) XOR E(pos,0).
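
The two formulas above can be sketched as follows. This is only an illustration under stated assumptions: toy_cipher is a placeholder that is NOT secure and is NOT the aes class from the lab libraries (it exists only so the sketch can run on its own), and the names block, position_mask, encrypt_at, and decrypt_at are hypothetical. The byte order used to place pos in the block is also an arbitrary choice.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using block = std::array<uint8_t, 16>;

// Placeholder cipher: an invertible keyed byte shuffle, NOT secure.
// A real implementation would use the aes object keyed by the passphrase.
struct toy_cipher {
  block key;
  block encrypt(const block &in) const {
    block out;
    for (size_t i = 0; i < 16; i++)
      out[(i + 1) % 16] = static_cast<uint8_t>((in[i] ^ key[i]) + 1);
    return out;
  }
  block decrypt(const block &in) const {
    block out;
    for (size_t i = 0; i < 16; i++)
      out[i] = static_cast<uint8_t>(static_cast<uint8_t>(in[(i + 1) % 16] - 1) ^ key[i]);
    return out;
  }
};

// E(pos,0): the position padded with zeros to 16 bytes, then encrypted.
template <class Cipher>
block position_mask(const Cipher &c, uint64_t pos) {
  block b{};  // all zeros
  for (int i = 0; i < 8; i++) b[i] = static_cast<uint8_t>(pos >> (8 * i));
  return c.encrypt(b);
}

inline block xor_blocks(const block &a, const block &b) {
  block out;
  for (size_t i = 0; i < 16; i++) out[i] = static_cast<uint8_t>(a[i] ^ b[i]);
  return out;
}

// C = E(P XOR E(pos,0))
template <class Cipher>
block encrypt_at(const Cipher &c, const block &p, uint64_t pos) {
  assert(pos % 16 == 0);  // blocks sit at 16-byte-aligned offsets
  return c.encrypt(xor_blocks(p, position_mask(c, pos)));
}

// P = D(C) XOR E(pos,0)
template <class Cipher>
block decrypt_at(const Cipher &c, const block &ct, uint64_t pos) {
  assert(pos % 16 == 0);
  return xor_blocks(c.decrypt(ct), position_mask(c, pos));
}
```

Writing the helpers as templates over the cipher type keeps the position logic separate from the cipher itself, so the same structure should carry over once the placeholder is swapped for the real block cipher.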

Note: Ordinarily, when one creates a sparse file (by extending the file with ftruncate or by writing far beyond the end of the file), unwritten portions of the file contain zeros. It is okay for CCFS not to emulate this behavior, but to contain garbage (the result of ``decrypting'' zeros) in sparse regions of files.

Assumptions about NFS client behavior

As currently described, CCFS could potentially suffer race conditions if it receives concurrent writes to the same file. For example, suppose a file is initially zero-length, and the NFS client issues two writes, one for 8K-1 bytes at offset 0, the other for 8K bytes at offset 8K. The NFS RPCs might proceed as follows:
NFS Client CCFS SFS Server
Write1 (off=0, count=8K-1)
Write2 (off=8K, count=8K)
Read1 (off=0, count=8K+15)
Write2' (off=8K, count=8K)
Read1-reply (EOF)
Write2'-reply
Write1' (off=0, count=8K+15)
Write2-reply
Write1'-reply
Write1-reply
Here Write1' will clobber the first 15 bytes of data written by Write2. The correct way to protect against this would be to keep track of outstanding WRITE RPCs on a particular file. This will be easier to do once you complete part D of the lab. A related problem would happen if the client issued two writes to the same 16-byte region (for instance writing byte 1 and byte 2 of the file in different RPCs).

Fortunately, most NFS client implementations only generate concurrent writes to the same files when those writes are for aligned buffers. Thus, you do not need to solve this problem in this part of the lab.

Another potential problem might occur if the client issued reads beyond the end of the file. For example, suppose you have a 17 byte plaintext file, and thus a 33 byte ciphertext file. Now suppose the client issued a read at offset 0 with count 32 bytes. According to the algorithm given for the lab, CCFS would pass the request straight through and get 32 bytes of data back without the eof flag set.

You can test for reads beyond the end of file using the attributes in the reply of the read. However, most NFS client implementations will not return data for read system calls that extend beyond the size field of fattr3 structures. Thus, you don't need to worry about this situation if you adjust file sizes properly.
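
If you did want to guard against reads past the end of the plaintext, one possible check is sketched below. It assumes you already have the adjusted (plaintext) file size from the reply attributes; read_result and clamp_read are hypothetical names, not part of the lab libraries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct read_result {
  uint64_t count;  // plaintext bytes to return to the client
  bool eof;        // true if the read ended at the end of the plaintext file
};

// plain_size: file size after the attribute adjustment described earlier.
// offset, requested: the client's original READ arguments.
inline read_result clamp_read(uint64_t plain_size, uint64_t offset,
                              uint64_t requested) {
  if (offset >= plain_size) return {0, true};  // nothing left to read
  uint64_t avail = plain_size - offset;
  uint64_t n = std::min(requested, avail);
  assert(n <= requested);
  return {n, offset + n == plain_size};
}
```

In the 17-byte example above, a read of 32 bytes at offset 0 would be trimmed to 17 bytes with eof set, rather than returning 32 bytes of padded data.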

Testing

To test your file system, you should make sure it is able to compile the emacs text editor. There is a script, ~class/bin/test-fs, which compiles emacs in the current working directory. (It uses a tiny utility called microtime to print out timestamps, but for the purposes of this assignment only correctness matters.) Your test should look something like this:
% mkdir /shome/c2/scratch/myname
% ./ccfs /shome/c2/scratch/myname test
Passphrase: 
^Z
Suspended
% bg
[1]    ./ccfs /shome/c2/scratch/myname test &
% cd /classfs/test
% test-fs
DIRECTORY: /classfs/test
TIME:      START == Sat Sep 29 15:15:27.448287 EDT 2001
tar xzf /home/c/os/class/src/emacs-20.7.tar.gz
    7.96s real     0.08s user     1.26s system
TIME:   UNTARRED == Sat Sep 29 15:15:35.601692 EDT 2001
env CFLAGS= ./configure
creating cache ./config.cache
checking host system type... i386-unknown-openbsd2.9
checking for gcc... gcc
checking whether the C compiler (gcc  ) works... yes
checking whether the C compiler (gcc  ) is a cross-compiler... no
checking whether we are using GNU C... yes
...
creating src/Makefile
   12.34s real     3.50s user     4.00s system
TIME: CONFIGURED == Sat Sep 29 15:15:47.986119 EDT 2001
gmake
cd lib-src; gmake all  \
  CC='gcc' CFLAGS='-g -O ' CPPFLAGS='' \
  LDFLAGS='-L/usr/X11R6/lib' MAKE='gmake'
gmake[1]: Entering directory `/var/tmp/emacs-20.7/lib-src'
gcc -DHAVE_CONFIG_H    -I. -I../src -I/var/tmp/emacs-20.7/lib-src -I/var/tmp/emacs-20.7/lib-src/../src -L/usr/X11R6/lib  -g -O  -o test-distrib /var/tmp/emacs-20.7/lib-src/test-distrib.c
./test-distrib /var/tmp/emacs-20.7/lib-src/testfile
gcc -DHAVE_CONFIG_H    -I. -I../src -I/var/tmp/emacs-20.7/lib-src -I/var/tmp/emacs-20.7/lib-src/../src -L/usr/X11R6/lib  -g -O  /var/tmp/emacs-20.7/lib-src/make-docfile.c -lc  -o make-docfile
...
gmake[1]: Nothing to be done for `all'.
gmake[1]: Leaving directory `/var/tmp/emacs-20.7/leim'
   85.93s real    66.92s user     5.62s system
TIME:   COMPILED == Sat Sep 29 15:17:13.933346 EDT 2001
rm -rf emacs-20.7
    2.07s real     0.00s user     0.18s system
TIME:        END == Sat Sep 29 15:17:16.009508 EDT 2001
% cd /
% kill %./ccfs
[1]    Terminated                    ./ccfs /shome/c2/scratch/myname test

% 
If there is a compilation error, and you don't get to the end of the script (the rm -rf emacs-20.7), then your file system is doing something wrong. This will most likely manifest itself as ./configure failing to run, or the compilation failing.

Note that the compilation is somewhat lengthy. You probably don't want to run it with dmalloc -i 1, as the test will take too long. Just plain dmalloc high -i 0, or else -i 1000 should be fine.

What to hand in

As usual, make a tar.gz file with the command make distcheck. Copy the ccfs-0.0.tar.gz and a typescript file of your testing to ~class/handin/lab4b/username.

Extra credit

If you forget the passphrase you type to CCFS, you will lose all your files. There is no way to recover them. Ordinarily, this shouldn't be a problem. People using a tool like CCFS must accept that they cannot forget their passphrase. However, there is one slightly risky situation--what if you mistype your passphrase the first time you are creating a directory? On subsequent accesses, you will remember the passphrase you wanted to type, but may not easily be able to figure out what you actually typed.

For extra credit, modify CCFS so that the first time you mount a ciphertext directory, it prompts you for the passphrase twice, aborting if the two passphrases do not match. On subsequent invocations, CCFS should refuse to mount a particular directory if you don't type the correct passphrase (as opposed to running, but encrypting everything with the wrong key).

Hint: You may pick a ``reserved'' file name that you assume no application will access. For example, the file name ".SFS \177KEY". (As a C-string, the '\177' is a delete character. Applications typically don't put spaces and deletes in file names.) You may store something in that file that helps you verify the key, or the file could be a symbolic link. However, make sure the file name doesn't show up in plaintext directory listings or it might confuse users.

NOTE: If you cut out all the entries in a READDIR or READDIRPLUS reply (for instance, because they are all hidden file names), you must either set eof or issue a new RPC starting at the cookie in the last entry you cut.
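
The rule in this note can be sketched with simplified stand-in types. The structs dir_entry and filter_result and the function hide_entry are hypothetical; the real READDIR and READDIRPLUS reply structures are linked lists with more fields, so treat this only as an outline of the decision logic.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for one directory entry in a reply.
struct dir_entry {
  std::string name;
  uint64_t cookie;
};

struct filter_result {
  std::vector<dir_entry> entries;  // entries left after hiding
  bool must_refetch;               // reply became empty, directory not finished
  uint64_t refetch_cookie;         // where to resume when must_refetch is set
};

// Remove the reserved name from a reply. If that empties a non-final
// reply, report the cookie of the last entry that was cut so the caller
// can issue a new RPC starting there.
inline filter_result hide_entry(const std::vector<dir_entry> &in, bool eof,
                                const std::string &hidden) {
  filter_result r{{}, false, 0};
  for (const dir_entry &e : in)
    if (e.name != hidden) r.entries.push_back(e);
  if (r.entries.empty() && !in.empty() && !eof) {
    r.must_refetch = true;
    r.refetch_cookie = in.back().cookie;
  }
  return r;
}
```

When eof was already set in the server's reply, an empty filtered list is fine as long as eof is passed through to the client unchanged.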

Include in the handin directory a short text file called extra-credit with a description of the exact technique you used to implement this feature.


Useful references

Useful classes and functions

Standard library

String functions

In addition to the discussion of str objects in Using TCP through sockets, you may find the following useful:

Data serialization

The following functions are defined in "serial.h":

NFS-related functions

For the following functions, you need these includes:
#include "nfsserv.h"
#include "nfs3_nonnul.h"
#include "classfscli.h"
The skeleton CCFS code you will start with has a dispatch function that takes an argument nfscall *nc. This function gets called for every NFS3 RPC CCFS receives. The nfscall object has the following methods (written here as you would invoke them on nfscall *nc):
CCFS also has a global object c of type ptr<sfsuclnt>. This object is used to send NFS RPCs to the remote SFS server that was specified on the command line. For more information, see classfscli.h. The main method you need to use is:

One often wants to perform some operation for a large number of different NFS procedures. One possible approach is to demultiplex all 21 different NFS RPCs into different dispatch functions, and in each function implement the functionality you want. This turns out to be fairly painful in practice because you must write a large amount of repetitive code. Several functions use C++ templates to save you from having to do this.

Cryptographic functions

To access these functions, you will want the following include files in your program:
#include "crypt.h"
#include "aes.h"
The libraries you are using contain a cryptographic pseudo-random number generator, in a global object called rnd. Before using the random number generator, you must initialize it.

For actually encrypting and decrypting file data, you will use the Rijndael block cipher. Rijndael is a 128-bit block cipher. It supports two operations--encryption, and decryption. Encryption transforms 16 bytes (128 bits) of plaintext data into 16 bytes of ciphertext data using a secret key. Someone who does not know the secret key cannot recover the plaintext from the ciphertext. The decryption algorithm, given knowledge of the secret key, transforms ciphertext into plaintext.

The libraries you are using define a class called aes with the following methods:

The SHA-1 hash function hashes an arbitrary-length input (up to 2^64 bits) to a 20-byte output. SHA-1 is known as a cryptographic hash function. While nothing has been formally proven about the function, it is generally assumed that SHA-1 is one-way and collision-resistant. These properties are defined as follows:

The libraries you are using contain an implementation of SHA-1. The following functions are available for computing SHA-1:

These functions are implemented in terms of a class called sha1ctx, with the following methods: