Distributed Storage Systems Lab2 - Cryptographic file system

Introduction

In the next several labs, you will build a cryptographic file system as an NFS loopback server. Cryptographic file systems are used to store sensitive information on disk. The idea is that in order to access files, the user must enter a secret passphrase. Without the passphrase, even someone who steals your physical disk will be unable to read the sensitive files you store there. Your file system will be called CCFS, and will be invoked with two arguments:
% ./ccfs path-to-encrypted-files name
Passphrase: 
path-to-encrypted-files is the path to a directory under which you want to store encrypted files. Once CCFS is running you will be able to access unencrypted versions of the files under /classfs/name. The minute you kill CCFS, however, the contents of the files will be inaccessible to someone who doesn't know the correct passphrase to restart CCFS. By the end of the labs, your file system will work something like this:
% mkdir /shome/cl2/scratch/myname
% ./ccfs /shome/cl2/scratch/myname myname
Passphrase: 
^Z
Suspended
% bg
[1]    ./ccfs /shome/cl2/scratch/myname myname &
% touch /classfs/myname/test
% echo hi > /classfs/myname/there
% cp /etc/termcap /classfs/myname/
% ls -al /classfs/myname/
total 732
drwxr-xr-x  2 dm  dm      512 Sep 23 21:38 .
dr-x------  4 dm  sfs     512 Sep 23 21:37 ..
-r--r--r--  1 dm  dm   732361 Sep 23 21:38 termcap
-rw-r--r--  1 dm  dm        0 Sep 23 21:37 test
-rw-r--r--  1 dm  dm        3 Sep 23 21:37 there
% ls -al /shome/cl2/scratch/myname
total 735
drwxr-xr-x   2 dm  dm     512 Sep 23 21:38 .
drwxr-xr-x  31 dm  dm    4096 Sep 23 18:32 ..
-rw-r--r--   1 dm  dm      16 Sep 23 21:37 8gM7Ga4VrGrDJZjTa0Ruzg
-rw-r--r--   1 dm  dm     531 Sep 23 21:37 dWbFGNqKIUCB-dw0w10rRg
-r--r--r--   1 dm  dm  732889 Sep 23 21:38 ml0WK4ekrOXRPOG0CvCVZQ
% kill %./ccfs
[1]    Terminated                    ./ccfs /shome/cl2/scratch/myname myname

% ls -al /classfs/myname/
ls: /classfs/myname: No such file or directory
% 
Not only are the file names on disk unintelligible, but the file contents, too. Thus, even someone who breaks into the file server will not be able to read your files without knowing the secret passphrase.

Approach

CCFS will be implemented as an NFS loopback server. That means you will write a user-level program that emulates a remote NFS server by accepting NFS RPCs from the local operating system kernel. You will use the asynchronous RPC library to handle multiple NFS RPCs in parallel. Encrypted files will be stored on a remote SFS file server. CCFS will communicate with the remote SFS server using non-blocking socket I/O. Thus, CCFS will be completely asynchronous. The following diagram depicts the architecture of CCFS:

You will begin this project with a ``dumb,'' ~100-line file system that does nothing but relay NFS calls. You will build CCFS by progressively modifying this dumb file system until it encrypts all file contents and file names.

In order to build CCFS, you will make use of the classfs framework. Classfs contains a daemon, classfsd, a library, libclassfs.a, and a header file, classfscli.h. The principal purpose of the library and associated header is to communicate with classfsd and the remote SFS server when initially setting things up. classfsd is already installed and running on the class machines. The library is in ~class/src/classfs.

The classfsd daemon serves two functions. First, it handles the nasty and unportable details of creating NFS loopback mounts. Second, it will clean up the mess if your CCFS implementation crashes. classfsd is only active when you are first starting up or after CCFS exits or crashes. Otherwise, your CCFS implementation will be speaking NFS directly to the kernel.


Part A -- Getting acquainted with the software

Part A of this lab should be trivial, while Part B is due less than a week after Part A. Thus, you should finish Part A early if possible and start work on Part B. (The point of Part A is mostly to make sure you have at least looked at the software by the last lecture before Part B is due. Really, you should consider that you have 1 1/2 weeks to complete Part B.)

Getting started with SFS

Since CCFS relies on SFS, the first thing you must do is register a public key with SFS on the class server machine. To do this, execute the following command:
% ssh -t class-serv.scs.stanford.edu sfskey register
sfskey: creating directory /home/fs/yourname/.sfs
Registering new user yourname@class-serv.scs.stanford.edu.
Creating new key for yourname@class-serv.scs.stanford.edu.
       Key Name: yourname@class-serv.scs.stanford.edu
Enter passprase: type a passphrase
          Again: type it again

sfskey needs secret bits with which to seed the random number generator.
Please type some random or unguessable text until you hear a beep:
DONE            
  UNIX password: type your password here
Connection to class-serv closed.
% 
It may take several minutes for your public key to propagate to all the class machines, so run this command now before reading the rest of the lab.

Once you are registered with the SFS server, you must run an sfsagent process on any client machine from which you wish to access an SFS server. You have been accessing files on the class machines under /home through NFS. However, once you have run an sfsagent, you can access the same files through SFS under /shome. This includes both your home directory (/shome/cs2/yourname) and the scratch directories under /shome/clN. When you are done, before logging out, you should kill your agent with the sfskey kill command. For example:

% sfsagent
Passphrase for /home/fs/yourname/.sfs/identity: type your passphrase
% cd /shome/fs/yourname
% ls -al
drwxr-xr-x   8 student  class      512 Sep 23 22:55 .
drwxr-xr-x  15 root     wheel      512 Sep 10 18:21 ..
-rw-------   1 student  class      811 Sep 23 22:50 .Xauthority
-rw-------   1 student  class     3418 Sep  6 11:41 .Xdefaults
-rw-r--r--   1 student  class     2841 Sep  6 11:42 .cshrc
-rw-------   1 student  class     6625 Jan 17  2001 .emacs
...
% 
And when finally logging out:
% sfskey kill
sfsagent: EOF from sfscd
sfsagent: exiting
% 

Compiling the CCFS software

To get started with the software, you should unpack the ``dumb'' skeletal CCFS source code from ~class/src/ccfs.tar.gz. The setup procedure is similar to the previous lab, except that you must additionally give ./configure the argument --with-classfs=/home/cs2/class/src/classfs:
% cd
% tar xzf ~class/src/ccfs.tar.gz
% cd ccfs
% sh ./setup
+ chmod +x setup
+ libtoolize
...
+ set +x

            *** * * * * * * * * * * * * * * * * ***
            ***         setup succeeded         ***
            *** * * * * * * * * * * * * * * * * ***

% mkdir -p /home/cl3/scratch/yourname/ccfs
% pushd /home/cl3/scratch/yourname/ccfs
/home/cl3/scratch/yourname/ccfs ~/ccfs 
% setenv DEBUG -g
% =1/configure --with-classfs=/home/cs2/class/src/classfs --with-sfs=/home/cs2/class/src/sfs1
creating cache ./config.cache
checking for a BSD compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking whether make sets ${MAKE}... yes
checking for working aclocal... found
checking for working autoconf... found
checking for working automake... found
checking for working autoheader... found
checking for working makeinfo... found
checking host system type... i386-unknown-openbsd2.9
...
updating cache ./config.cache
creating ./config.status
creating Makefile
creating config.h
% gmake
...
(If you are using classN, then make your build directory on /home/clN rather than /home/cl3.)

Running CCFS

Once you have compiled CCFS, you can run the software. CCFS takes two arguments: first, a directory on an SFS file system; second, a name under which to access the loopback server. (If the second argument is omitted, it defaults to the last component of the pathname in the first argument.) For example, you might make a scratch directory on the machine class2 and access it from a different machine:
% mkdir /shome/cl2/scratch/myname
% ./ccfs /shome/cl2/scratch/myname myname
^Z
Suspended
% bg
[1]    ./ccfs /shome/cl2/scratch/myname myname &
% 
Now /classfs/myname and /shome/cl2/scratch/myname will appear to be the same directory, except that /classfs/myname will be going through your software.

NOTE: You cannot access an SFS server from a client on the same machine. Because of the NFS loopback server deadlock issues we discussed in class, SFS will not connect to a client on the same machine. If, for instance, you try to access /shome/cl2 on class2, you will get a ``Resource deadlock avoided'' or ``not an SFS file system'' error.

When you kill CCFS, classfsd will attempt to unmount the file system. It will be unable to do so if any of your shells still has /classfs/myname as a working directory. Thus, make sure to cd / after killing CCFS. If you forget to leave a directory after killing CCFS, and you try to restart CCFS using the same second argument (name under /classfs), CCFS will give you the error File exists.

One final word about classfsd: Because of the risk of deadlock with NFS loopback mounts, classfsd periodically pings CCFS with an NFS request. If it does not receive any replies for 10 minutes, classfsd takes over the server UDP socket and attempts to unmount the file system. This should ordinarily not cause you any trouble, but if you leave CCFS stopped under the debugger for 10 minutes, your file system will get unmounted.

Tracing RPCs

Once you've gotten the skeletal CCFS running, try the following command in one window, while browsing /classfs/myname in a different window:
% env ASRV_TRACE=10 ./ccfs /shome/cl2/scratch/myname myname
This command prints a complete trace of all NFS requests received by CCFS. (Large structures may be truncated; if this is ever a problem, try higher values than 10.) Similarly, setting ACLNT_TRACE instead of ASRV_TRACE shows a trace of all the NFS requests CCFS sends to the remote SFS server. Tracing RPCs can be invaluable in debugging strange behavior of your cryptographic file system--you can usually track the problem to a single RPC and then see why your code is misbehaving in that case. Now redirect the tracing output to a file:
% env ASRV_TRACE=10 ./ccfs /shome/cl2/scratch/myname myname >& nfs.trace
(Note, you must use >& rather than just > because the tracing goes to standard error. If you use a Bourne-like shell instead of the default tcsh, you might need to use 2> instead of >&.)

After setting up CCFS to trace NFS traffic, run the following commands:

% cd /classfs/myname
% rm junk
rm: junk: No such file or directory
% echo hello > junk
% cat junk
hello
% cat junk
hello
% 
Now stop CCFS, and look at the RPCs in the nfs.trace file.

What to hand in

Hand in a copy of the nfs.trace file, which you have annotated to show which RPCs correspond to which of the commands you ran. At the end, briefly explain any difference between the RPCs caused by the two cat commands. As usual, you should copy this nfs.trace to ~class/handin/lab2a/`logname`/.


Part B -- Encrypting file contents

In this part, you will modify CCFS to encrypt all file contents written to the server and decrypt all contents read from the server. You will primarily do this by special-casing the NFSPROC3_READ and NFSPROC3_WRITE RPCs, but you will also need to change handling of NFSPROC3_SETATTR. Finally, in all NFS RPCs you will need to adjust returned file attributes slightly.

For encrypting and decrypting data, you will be using the Rijndael (AES) algorithm. The implementation you will be using is described below. One of the complications of encrypting file data is that AES operates on blocks of 16 bytes. Thus, all read and write operations must be in aligned multiples of 16 bytes. Furthermore, files with sizes not a multiple of 16 bytes will have to be padded slightly. Before adding encryption to CCFS, therefore, you will first modify the software to deal with file padding and to read and write multiples of 16 bytes.
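One way to meet the alignment requirement is to widen every client READ or WRITE range to the enclosing 16-byte boundaries before talking to the server. The sketch below shows only that arithmetic; the helper align_range and the Range struct are hypothetical names, not part of the skeleton code. A write that does not cover whole blocks still needs a read-modify-write of its partial first and last blocks, and at end of file the padding rule may add one more block.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: an aligned byte range on the server.
struct Range {
  uint64_t off;    // aligned start offset (multiple of 16)
  uint64_t count;  // aligned length (multiple of 16)
};

constexpr uint64_t kBlock = 16;

// Widen the client range [off, off+count) to 16-byte boundaries:
// round the start down and the end up.
Range align_range(uint64_t off, uint64_t count) {
  uint64_t start = off & ~(kBlock - 1);
  uint64_t end = (off + count + kBlock - 1) & ~(kBlock - 1);
  return Range{start, end - start};
}
```

For example, a 2-byte write at offset 15 straddles two blocks, so it becomes a 32-byte operation at offset 0.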

Though you may be adding a few bytes to the ends of files, you would like file sizes to appear correct to the user. It is easy to do this if you observe the following rule: Add 16 bytes to the size of a file if and only if its size is not a multiple of 16 bytes. Thus, a 16-byte plaintext file will result in a 16-byte ciphertext file, but a 17-byte plaintext file will result in a 33-byte ciphertext file. Given this scheme, ciphertext file sizes can easily be adjusted in file attributes to contain the size of plaintext files. [If (size>16) and (size&15) then size-=16.] Before adding encryption, then, you will need to modify the following RPCs:
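The padding rule can be captured in two small conversion functions, one per direction. The names cipher_size and plain_size are hypothetical; in CCFS the second would presumably be applied to the size field of file attributes returned for regular files.

```cpp
#include <cassert>
#include <cstdint>

// Plaintext size -> size of the ciphertext file on the SFS server:
// add 16 bytes if and only if the size is not a multiple of 16.
uint64_t cipher_size(uint64_t plain) {
  return (plain & 15) ? plain + 16 : plain;
}

// Ciphertext size (as found in fattr3) -> size reported to the client:
// if (size > 16) and (size & 15), subtract 16.
uint64_t plain_size(uint64_t cipher) {
  return (cipher > 16 && (cipher & 15)) ? cipher - 16 : cipher;
}
```

The two functions are inverses for every plaintext size, which is why a plain `stat` through /classfs reports the original length.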

Once you have modified CCFS to read and write data in aligned multiples of 16 bytes, you are ready to begin encrypting and decrypting file contents. Upon startup, CCFS should get a passphrase from the user and initialize an aes object with this passphrase. One possible approach to encrypting files is simply to encrypt every 16-byte region with the aes object. However, if two 16-byte regions of a plaintext file contain the same data, the encrypted file will then contain the same ciphertext for both. For better security, such patterns should be hidden from people who have access to the ciphertext file.

You can ensure that identical 16-byte regions are encrypted differently by throwing the position into the equation. Let E(B) represent the encryption of 16-byte block B. (Of course, E requires an encryption key not shown in this notation. Just think of E as a method of an object that contains the key.) When transforming the plaintext data block P at offset pos to ciphertext block C, you should calculate C = E(P XOR E(pos,0)). In other words, pad the position pos to 16 bytes with 0s, and encrypt it to generate 16 bytes of random looking data. Then XOR this data, byte-by-byte, with the plaintext before encrypting. If D is the decryption function, then to decrypt you can simply compute P = D(C) XOR E(pos,0).
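The position-dependent scheme can be sketched as follows. The libraries' aes class is not available in a standalone example, so a toy 4-round Feistel cipher (ToyCipher, hypothetical and insecure) stands in for E and D; only the structure of the computation matters here. Encoding (pos,0) as 8 little-endian bytes of the byte offset followed by 8 zero bytes is an assumption: the handout does not fix the byte order or whether pos counts bytes or blocks, so pick one convention and use it consistently.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using Block = std::array<uint8_t, 16>;

// Toy stand-in for the aes class: a 4-round Feistel network on two 64-bit
// halves. Invertible, but NOT secure; it only makes the sketch runnable.
struct ToyCipher {
  uint64_t key;
  uint64_t round_fn(uint64_t x, int round) const {
    x = (x ^ (key + 0x9E3779B97F4A7C15ULL * (round + 1))) * 0xBF58476D1CE4E5B9ULL;
    return x ^ (x >> 31);
  }
  static Block join(uint64_t l, uint64_t r) {
    Block b;
    std::memcpy(b.data(), &l, 8);
    std::memcpy(b.data() + 8, &r, 8);
    return b;
  }
  Block encrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 0; i < 4; i++) { uint64_t t = l ^ round_fn(r, i); l = r; r = t; }
    return join(l, r);
  }
  Block decrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 3; i >= 0; i--) { uint64_t t = r ^ round_fn(l, i); r = l; l = t; }
    return join(l, r);
  }
};

Block xor_blocks(Block a, const Block &b) {
  for (int i = 0; i < 16; i++) a[i] ^= b[i];
  return a;
}

// The block (pos,0): pos as 8 little-endian bytes, zero-padded to 16 bytes.
Block pos_block(uint64_t pos) {
  Block b{};
  for (int i = 0; i < 8; i++) b[i] = uint8_t(pos >> (8 * i));
  return b;
}

// C = E(P XOR E(pos,0))
Block encrypt_at(const ToyCipher &c, const Block &p, uint64_t pos) {
  return c.encrypt(xor_blocks(p, c.encrypt(pos_block(pos))));
}

// P = D(C) XOR E(pos,0)
Block decrypt_at(const ToyCipher &c, const Block &ct, uint64_t pos) {
  return xor_blocks(c.decrypt(ct), c.encrypt(pos_block(pos)));
}
```

Because E is a permutation, E(pos,0) differs for every distinct pos, so two identical plaintext blocks at offsets 0 and 16 are XORed with different values and encrypt differently.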

Note: Ordinarily, when one creates a sparse file (by extending the file with ftruncate or by writing far beyond the end of the file), unwritten portions of the file contain zeros. CCFS does not need to emulate this behavior; it is okay for sparse regions of files to contain garbage (the result of ``decrypting'' zeros).

Assumptions about NFS client behavior

As currently described, CCFS could potentially suffer race conditions if it receives concurrent writes to the same file. For example, suppose a file is initially zero-length, and the NFS client issues two writes, one for 8K-1 bytes at offset 0, the other for 8K bytes at offset 8K. The NFS RPCs might proceed as follows:
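NFS Client -> CCFS:    Write1 (off=0, count=8K-1)
NFS Client -> CCFS:    Write2 (off=8K, count=8K)
CCFS -> SFS Server:    Read1 (off=0, count=8K+15)
CCFS -> SFS Server:    Write2' (off=8K, count=8K)
SFS Server -> CCFS:    Read1-reply (EOF)
SFS Server -> CCFS:    Write2'-reply
CCFS -> SFS Server:    Write1' (off=0, count=8K+15)
CCFS -> NFS Client:    Write2-reply
SFS Server -> CCFS:    Write1'-reply
CCFS -> NFS Client:    Write1-reply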
Here Write1' will clobber the first 15 bytes of data written by Write2. The correct way to protect against this would be to keep track of outstanding WRITE RPCs on a particular file. This will be easier to do once you complete Part D of the lab. A related problem would arise if the client issued two writes to the same 16-byte region (for instance, writing byte 1 and byte 2 of the file in different RPCs).

Fortunately, most NFS client implementations only generate concurrent writes to the same files when those writes are for aligned buffers. Thus, you do not need to solve this problem in this part of the lab.

Another potential problem might occur if the client issues reads beyond the end of the file. For example, suppose you have a 17-byte plaintext file, and thus a 33-byte ciphertext file. Now suppose the client issues a read at offset 0 with a count of 32 bytes. According to the algorithm given for the lab, CCFS would pass the request straight through and get 32 bytes of data back (15 of which are padding) without the eof flag set.

You can test for reads beyond the end of file using the attributes in the reply of the read. However, most NFS client implementations will not return data for read system calls that extend beyond the size field of fattr3 structures. Thus, you don't need to worry about this situation if you adjust file sizes properly.
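If you do want to guard against such reads, a sketch of the check follows: convert the ciphertext size from the reply's post-operation attributes to a plaintext size with the Part B rule, then trim the decrypted data and set eof. The helper clip_to_eof and its ReadResult type are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result of a READ after decryption.
struct ReadResult {
  std::string data;
  bool eof;
};

// Trim decrypted data read at offset `off` so it never extends past the
// plaintext file size, and set eof when the read reaches that size.
ReadResult clip_to_eof(uint64_t off, std::string data, uint64_t plain_size) {
  if (off >= plain_size) return ReadResult{std::string(), true};
  uint64_t avail = plain_size - off;
  if (data.size() >= avail) {
    data.resize(avail);  // drop padding bytes past end of file
    return ReadResult{data, true};
  }
  return ReadResult{data, false};
}
```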

Testing

To test your file system, you should make sure it is able to compile the gzip compression program. There is a script, ~class/bin/test-fs, which compiles gzip in the current working directory. (It uses a tiny utility called microtime to print out timestamps, but for the purposes of this assignment only correctness matters.) Your test should look something like this:
% mkdir /shome/cl2/scratch/myname
% ./ccfs /shome/cl2/scratch/myname test
Passphrase: 
^Z
Suspended
% bg
[1]    ./ccfs /shome/cl2/scratch/myname test &
% cd /classfs/test
% test-fs
DIRECTORY: /classfs/test
TIME:      START == Fri Jan 20 22:12:11.838528 PST 2006
tar xzf /home/cs2/class/src/gzip-1.2.4a.tar.gz
    0m0.82s real     0m0.01s user     0m0.00s system
TIME:   UNTARRED == Fri Jan 20 22:12:12.666431 PST 2006
env CFLAGS= ./configure
checking for gcc
...
creating Makefile
    0m1.31s real     0m0.14s user     0m0.15s system
TIME: CONFIGURED == Fri Jan 20 22:12:13.980164 PST 2006
gmake
gcc -c -DASMV -DSTDC_HEADERS=1 -DHAVE_UNISTD_H=1 -DDIRENT=1  gzip.c
...
ln gzip gunzip
ln gzip gzcat
    0m1.38s real     0m0.48s user     0m0.14s system
TIME:   COMPILED == Fri Jan 20 22:12:15.363102 PST 2006
rm -rf gzip-1.2.4a
    0m0.25s real     0m0.00s user     0m0.00s system
TIME:        END == Fri Jan 20 22:12:15.621950 PST 2006

% cd /
% kill %./ccfs
[1]    Terminated                    ./ccfs /shome/cl2/scratch/myname test

% 
If there is a compilation error, and you don't get to the end of the script (the rm -rf command), then your file system is doing something wrong. This will most likely manifest itself as ./configure failing to run, or the compilation failing.

Note that the compilation is somewhat lengthy. You probably don't want to run it with dmalloc -i 1, as the test will take too long. Just plain dmalloc high -i 0, or else -i 1000 should be fine.

What to hand in

As usual, make a tar.gz file with the command make distcheck. Copy the ccfs-0.0.tar.gz and a typescript file of your testing to ~class/handin/lab2b/username.

Extra credit

If you forget the passphrase you type to CCFS, you will lose all your files. There is no way to recover them. Ordinarily, this shouldn't be a problem: people using a tool like CCFS must accept that they cannot forget their passphrase. However, there is one slightly risky situation: what if you mistype your passphrase the first time you create a directory? On subsequent accesses, you will remember the passphrase you meant to type, but you may not easily be able to figure out what you actually typed.

For extra credit, modify CCFS so that the first time you mount a ciphertext directory, it prompts you for the passphrase twice, aborting if the two passphrases do not match. On subsequent invocations, CCFS should refuse to mount a particular directory if you don't type the correct passphrase (as opposed to running, but encrypting everything with the wrong key).

Hint: You may pick a ``reserved'' file name that you assume no application will access. For example, the file name ".SFS \177KEY". (As a C-string, the '\177' is a delete character. Applications typically don't put spaces and deletes in file names.) You may store something in that file that helps you verify the key, or the file could be a symbolic link. However, make sure the file name doesn't show up in plaintext directory listings or it might confuse users.

NOTE: If you cut out all the entries in a READDIR or READDIRPLUS reply (for instance, because they are all hidden file names), you must either set eof or issue a new RPC starting at the cookie in the last entry you cut.
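A minimal sketch of filtering a hidden name out of a READDIR-style linked list while remembering the cookie of the last entry cut. The Entry struct is a hypothetical, simplified stand-in whose fields mirror entry3 (fileid, name, cookie, nextentry); the real type is generated from the NFS3 protocol definitions and manages memory differently. The reserved name is the example from the hint above.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for entry3.
struct Entry {
  uint64_t fileid;
  std::string name;
  uint64_t cookie;
  std::unique_ptr<Entry> nextentry;
};

bool is_hidden(const std::string &name) {
  return name == ".SFS \177KEY";  // the reserved key-verification file
}

// Unlink hidden entries from the list. Returns the cookie of the last entry
// removed (0 if none), which the caller needs if every entry was cut and
// eof is not set, so it can issue a new READDIR starting at that cookie.
uint64_t strip_hidden(std::unique_ptr<Entry> &head) {
  uint64_t last_cut = 0;
  std::unique_ptr<Entry> *link = &head;
  while (*link) {
    if (is_hidden((*link)->name)) {
      last_cut = (*link)->cookie;
      *link = std::move((*link)->nextentry);  // splice out this entry
    } else {
      link = &(*link)->nextentry;
    }
  }
  return last_cut;
}
```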

Include in the handin directory a short text file called extra-credit with a description of the exact technique you used to implement this feature.


Part C -- Encrypting file names and symbolic links

While CCFS currently encrypts file contents, you can still learn quite a bit about the contents of a directory tree from the file names. In this part of the lab, therefore, you will extend CCFS to encrypt file names and symbolic links.

There are several complications to encrypting file names. First, AES works on blocks of 16 bytes. File names must therefore be padded to a multiple of 16 bytes. This is okay--since file names cannot contain 0-valued bytes, you can pad file names with zeros, and then when decrypting determine the length by finding the first zero (if any).
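The zero-padding step and its inverse are short; pad_name and unpad_name are hypothetical names. Because a valid file name never contains a 0 byte, the first 0 found after decryption marks the end of the name.

```cpp
#include <cassert>
#include <string>

// Pad a file name with 0 bytes to a multiple of 16 (at least one block).
std::string pad_name(const std::string &name) {
  size_t padded = (name.size() + 15) / 16 * 16;
  if (padded == 0) padded = 16;
  return name + std::string(padded - name.size(), '\0');
}

// Recover the name: it ends at the first 0 byte, if there is one.
std::string unpad_name(const std::string &padded) {
  size_t nul = padded.find('\0');
  return nul == std::string::npos ? padded : padded.substr(0, nul);
}
```

A name of exactly 16 bytes gets no padding at all, which is fine because unpad_name then finds no 0 byte and keeps the whole block.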

Conversely, since file names cannot contain 0-valued bytes, you must ensure that the result of encrypting a file name does not produce a string containing zero bytes. Simply running a 16-byte block of an ASCII file name through the AES encryption algorithm produces random-looking binary data, which risks containing a zero byte. Even if the binary data does not contain a zero byte, it might contain '/' characters, which are also illegal, or else control characters or other octets that would not display well on people's terminals. Thus, you should transform encrypted file names back to ASCII text using the armor64A function (for which you must include "serial.h").
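The exact output alphabet of armor64A is not documented here, so the following is only an illustrative sketch of a filename-safe base64 encoding without '=' padding, assuming an alphabet that uses '-' and '_' in place of '+' and '/'. That assumption is consistent with the ciphertext names in the Introduction (22 characters for 16 bytes, some containing '-'), but check serial.h before relying on it. In CCFS itself you would call armor64A rather than a hand-written encoder; armor_sketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Filename-safe base64 without padding (assumed alphabet, see lead-in).
std::string armor_sketch(const std::string &bin) {
  static const char kAlpha[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  std::string out;
  size_t i = 0;
  // Each full group of 3 input bytes becomes 4 output characters.
  for (; i + 3 <= bin.size(); i += 3) {
    uint32_t v = (uint32_t(uint8_t(bin[i])) << 16) |
                 (uint32_t(uint8_t(bin[i + 1])) << 8) | uint8_t(bin[i + 2]);
    out += kAlpha[v >> 18];
    out += kAlpha[(v >> 12) & 63];
    out += kAlpha[(v >> 6) & 63];
    out += kAlpha[v & 63];
  }
  // A trailing 1 or 2 bytes become 2 or 3 characters; no '=' is appended.
  size_t rest = bin.size() - i;
  if (rest > 0) {
    uint32_t v = uint32_t(uint8_t(bin[i])) << 16;
    if (rest == 2) v |= uint32_t(uint8_t(bin[i + 1])) << 8;
    out += kAlpha[v >> 18];
    out += kAlpha[(v >> 12) & 63];
    if (rest == 2) out += kAlpha[(v >> 6) & 63];
  }
  return out;
}
```

The output never contains '/' or a 0 byte, so it is always a legal file name component.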

Another issue is that for file names longer than 16 bytes, you would like to conceal any 16-byte plaintext blocks that are repeated. Otherwise, people may deduce things they shouldn't from your ciphertext file names. For example, suppose you have a directory containing four files, named a, b, 17char-filename-a, and 17char-filename-b. If you simply encrypt block-by-block, an attacker will know the following:

If the attacker happened to know you were running some application that produced files called 17char-filename-a, ..., he or she would then know that you also had files called a, ....

It is a general problem when using block ciphers like AES that you would like to conceal any repetition of 16-byte regions of plaintext. In the case of individual files, you helped alleviate this problem by XORing in an encryption of the 16-byte block position. File encryption is somewhat special in that one needs to support random access to file contents. For encrypting a stream of bytes that does not require random access, people often employ a technique known as cipher block chaining (CBC). To encrypt in CBC mode, one XORs each plaintext block with the encryption of the previous block before encrypting, as shown here:

If the plaintext blocks are m_1, m_2, ..., and the ciphertext blocks c_1, c_2, ..., then encryption and decryption in CBC mode are performed as follows:

c_i = E(m_i XOR c_{i-1})
m_i = D(c_i) XOR c_{i-1}
The first plaintext block is XORed with an initialization vector, or IV (which you can think of as c_0, since there is no m_0). The IV can be publicly known. It is often just 0, unless the same key will be used to encrypt multiple streams, in which case each stream must use a different IV.
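The CBC equations translate directly into two loops. As before, ToyCipher is a hypothetical, insecure stand-in for the aes class, and cbc_encrypt and cbc_decrypt are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Block = std::array<uint8_t, 16>;
using Blocks = std::vector<Block>;

// Toy stand-in for the aes class: a 4-round Feistel network on two 64-bit
// halves. Invertible, but NOT secure; it only makes the sketch runnable.
struct ToyCipher {
  uint64_t key;
  uint64_t round_fn(uint64_t x, int round) const {
    x = (x ^ (key + 0x9E3779B97F4A7C15ULL * (round + 1))) * 0xBF58476D1CE4E5B9ULL;
    return x ^ (x >> 31);
  }
  static Block join(uint64_t l, uint64_t r) {
    Block b;
    std::memcpy(b.data(), &l, 8);
    std::memcpy(b.data() + 8, &r, 8);
    return b;
  }
  Block encrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 0; i < 4; i++) { uint64_t t = l ^ round_fn(r, i); l = r; r = t; }
    return join(l, r);
  }
  Block decrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 3; i >= 0; i--) { uint64_t t = r ^ round_fn(l, i); r = l; l = t; }
    return join(l, r);
  }
};

Block xor_blocks(Block a, const Block &b) {
  for (int i = 0; i < 16; i++) a[i] ^= b[i];
  return a;
}

// c_i = E(m_i XOR c_{i-1}), with c_0 = iv
Blocks cbc_encrypt(const ToyCipher &c, const Block &iv, const Blocks &m) {
  Blocks out;
  Block prev = iv;
  for (const Block &mi : m) {
    prev = c.encrypt(xor_blocks(mi, prev));
    out.push_back(prev);
  }
  return out;
}

// m_i = D(c_i) XOR c_{i-1}, with c_0 = iv
Blocks cbc_decrypt(const ToyCipher &c, const Block &iv, const Blocks &ct) {
  Blocks out;
  Block prev = iv;
  for (const Block &ci : ct) {
    out.push_back(xor_blocks(c.decrypt(ci), prev));
    prev = ci;
  }
  return out;
}
```

Two identical plaintext blocks in a row now produce different ciphertext blocks, because the second is XORed with the first block's ciphertext before encryption.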

Encrypting symbolic links

You should start by encrypting the contents of symbolic links, as this is simpler than most other occurrences of file names. You will modify the NFSPROC3_SYMLINK and NFSPROC3_READLINK RPCs to encrypt and decrypt the contents of symbolic links, respectively.

When encrypting links, you would like two symbolic links to look different even if they are pointing to the same destination. This can easily be achieved by choosing a random initialization vector and prepending it to the encrypted link contents. Thus, the process of encrypting a link will have the following steps (where ## denotes concatenation):

  1. IV = 16 random bytes
  2. symlink_data = symlink_data ## 0-valued bytes (pad to multiple of 16 bytes)
  3. symlink_data = Encrypt_CBC (IV, symlink_data)
  4. symlink_data = IV ## symlink_data
  5. symlink_data = armor64A (symlink_data)
Decryption follows the reverse process. In implementing this, you may find some of the string functions below useful.
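Steps 2 through 4 can be sketched as below, with the same hypothetical ToyCipher in place of aes. To keep the example deterministic, the IV is passed in by the caller; in CCFS it would come from the rnd generator (step 1), and the returned binary string would still need step 5, armoring with armor64A, before being sent in NFSPROC3_SYMLINK. decrypt_link assumes its input has already been converted back to binary. encrypt_link and decrypt_link are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

using Block = std::array<uint8_t, 16>;

// Toy stand-in for the aes class (4-round Feistel). Invertible, NOT secure.
struct ToyCipher {
  uint64_t key;
  uint64_t round_fn(uint64_t x, int round) const {
    x = (x ^ (key + 0x9E3779B97F4A7C15ULL * (round + 1))) * 0xBF58476D1CE4E5B9ULL;
    return x ^ (x >> 31);
  }
  static Block join(uint64_t l, uint64_t r) {
    Block b;
    std::memcpy(b.data(), &l, 8);
    std::memcpy(b.data() + 8, &r, 8);
    return b;
  }
  Block encrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 0; i < 4; i++) { uint64_t t = l ^ round_fn(r, i); l = r; r = t; }
    return join(l, r);
  }
  Block decrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 3; i >= 0; i--) { uint64_t t = r ^ round_fn(l, i); r = l; l = t; }
    return join(l, r);
  }
};

// Steps 2-4: pad with 0 bytes, CBC-encrypt under iv, prepend iv.
std::string encrypt_link(const ToyCipher &c, const Block &iv, const std::string &target) {
  std::string padded = target;
  padded.resize((target.size() + 15) / 16 * 16, '\0');
  std::string out(iv.begin(), iv.end());
  Block prev = iv;
  for (size_t off = 0; off < padded.size(); off += 16) {
    Block m;
    std::memcpy(m.data(), padded.data() + off, 16);
    for (int i = 0; i < 16; i++) m[i] ^= prev[i];
    prev = c.encrypt(m);
    out.append(prev.begin(), prev.end());
  }
  return out;
}

// Reverse: split off the IV (first 16 bytes), CBC-decrypt, strip 0 padding.
std::string decrypt_link(const ToyCipher &c, const std::string &in) {
  Block prev;
  std::memcpy(prev.data(), in.data(), 16);  // assumes in.size() >= 16
  std::string out;
  for (size_t off = 16; off + 16 <= in.size(); off += 16) {
    Block ci;
    std::memcpy(ci.data(), in.data() + off, 16);
    Block m = c.decrypt(ci);
    for (int i = 0; i < 16; i++) m[i] ^= prev[i];
    out.append(m.begin(), m.end());
    prev = ci;
  }
  return out.substr(0, out.find('\0'));
}
```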

Encrypting file names

Unlike symbolic link contents, file names in directories cannot be randomized in CCFS. If a user creates a file, and later looks up the file by name, the same plaintext file name must always encrypt to the same ciphertext name so that you look up the same file. However, you would still like to conceal any repetition of 16-byte blocks in plaintext file names. You can do this if every byte of an encrypted file name depends on every byte of the plaintext file name.

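Encrypting file names in CBC mode would ensure that a plaintext block's encryption depends on all previous plaintext blocks. However, in the case of file names, we would also like a block's encryption to depend on all subsequent blocks. You can achieve this by encrypting file names twice in CBC mode, once forwards and once backwards. As an optimization, the last block does not need to be encrypted twice. Thus, file names of 16 bytes or less will only require one encryption. To encrypt a file name with plaintext blocks m_1, m_2, ..., m_k, you compute the ciphertext blocks c_1, c_2, ..., c_k in terms of intermediate values c'_i as follows:

c'_0 = 0
c'_i = E(m_i XOR c'_{i-1})       (forward pass, for i = 1, ..., k)
c_k  = c'_k
c_i  = E(c'_i XOR c_{i+1})       (backward pass, for i = k-1 down to 1)

Because the backward pass chains on c_{i+1}, which already depends on every plaintext block, each c_i depends on all of m_1, ..., m_k. To decrypt, undo the backward pass first (c'_k = c_k, then c'_i = D(c_i) XOR c_{i+1} for i = k-1 down to 1), and then the forward pass (m_i = D(c'_i) XOR c'_{i-1}).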
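The two-pass scheme can be sketched as follows, again with the hypothetical ToyCipher standing in for aes and with the name already zero-padded into 16-byte blocks. In this sketch the backward pass chains on the previous backward-pass output, c_i = E(c'_i XOR c_{i+1}), so the first ciphertext block depends on every later plaintext block; decryption undoes the backward pass first and then the forward pass. encrypt_name and decrypt_name are illustrative names, and both assume at least one block.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using Block = std::array<uint8_t, 16>;
using Blocks = std::vector<Block>;

// Toy stand-in for the aes class (4-round Feistel). Invertible, NOT secure.
struct ToyCipher {
  uint64_t key;
  uint64_t round_fn(uint64_t x, int round) const {
    x = (x ^ (key + 0x9E3779B97F4A7C15ULL * (round + 1))) * 0xBF58476D1CE4E5B9ULL;
    return x ^ (x >> 31);
  }
  static Block join(uint64_t l, uint64_t r) {
    Block b;
    std::memcpy(b.data(), &l, 8);
    std::memcpy(b.data() + 8, &r, 8);
    return b;
  }
  Block encrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 0; i < 4; i++) { uint64_t t = l ^ round_fn(r, i); l = r; r = t; }
    return join(l, r);
  }
  Block decrypt(const Block &in) const {
    uint64_t l, r;
    std::memcpy(&l, in.data(), 8);
    std::memcpy(&r, in.data() + 8, 8);
    for (int i = 3; i >= 0; i--) { uint64_t t = r ^ round_fn(l, i); r = l; l = t; }
    return join(l, r);
  }
};

Block xor_blocks(Block a, const Block &b) {
  for (int i = 0; i < 16; i++) a[i] ^= b[i];
  return a;
}

Blocks encrypt_name(const ToyCipher &c, const Blocks &m) {
  size_t k = m.size();
  Blocks fwd(k);   // forward pass: fwd[i] holds c'_{i+1}
  Block prev{};    // c'_0 = 0
  for (size_t i = 0; i < k; i++) prev = fwd[i] = c.encrypt(xor_blocks(m[i], prev));
  Blocks out(k);
  out[k - 1] = fwd[k - 1];  // c_k = c'_k: the last block is encrypted once
  for (size_t i = k - 1; i-- > 0;)  // backward pass, i = k-2 down to 0
    out[i] = c.encrypt(xor_blocks(fwd[i], out[i + 1]));
  return out;
}

Blocks decrypt_name(const ToyCipher &c, const Blocks &ct) {
  size_t k = ct.size();
  Blocks fwd(k);
  fwd[k - 1] = ct[k - 1];
  for (size_t i = k - 1; i-- > 0;)  // undo the backward pass
    fwd[i] = xor_blocks(c.decrypt(ct[i]), ct[i + 1]);
  Blocks m(k);
  Block prev{};
  for (size_t i = 0; i < k; i++) {  // undo the forward pass
    m[i] = xor_blocks(c.decrypt(fwd[i]), prev);
    prev = fwd[i];
  }
  return m;
}
```

The encryption is deterministic (no random IV), so looking up the same plaintext name always produces the same ciphertext name, as the preceding discussion requires.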

File names occur in the arguments of a number of different NFS calls, but always as part of a diropargs3 structure. The nfs3_traverse_arg function template described below can help you process all diropargs3 structures without special-casing each NFS call. Thus, you need only write a single function to encrypt the file name in a diropargs3 structure.

File names appear in the replies to two NFS RPCs, NFSPROC3_READDIR and NFSPROC3_READDIRPLUS. You must decrypt all file names in the replies to these RPCs. You can do this by just manually traversing the linked lists of entry3 and entryplus3 data structures. (It's not worth using nfs3_traverse_res, since each data structure appears in only one return type.)

NOTE: For obvious reasons, do not encrypt or decrypt the file names ``.'' and ``..''!
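For the READDIR and READDIRPLUS replies, a traversal might look like the sketch below. Entry is a hypothetical, simplified stand-in for entry3 (the READDIRPLUS entries would be handled the same way), and decrypt is a placeholder for whatever name-decryption function you write.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>

// Hypothetical, simplified stand-in for entry3.
struct Entry {
  uint64_t fileid;
  std::string name;
  uint64_t cookie;
  std::unique_ptr<Entry> nextentry;
};

// Walk the list and decrypt every name except "." and "..".
void decrypt_names(Entry *e,
                   const std::function<std::string(const std::string &)> &decrypt) {
  for (; e != nullptr; e = e->nextentry.get()) {
    if (e->name == "." || e->name == "..") continue;  // never touch . and ..
    e->name = decrypt(e->name);
  }
}
```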

What to hand in

As usual, make a tar.gz file with the command make distcheck. Copy the ccfs-0.0.tar.gz and a typescript file of your testing (including an invocation of test-fs) to ~class/handin/lab2c/username.

Extra credit

People tend to reuse passwords. It would be nice if, when a user creates two separate encrypted directories with the same password, file names nonetheless look different. For extra credit, modify CCFS so that each time you create an encrypted mount point, the same file names look different even if you choose the same password.

Hint: If you implemented the last extra-credit, you already store some state to verify a mount point's password. You can store additional state to help with the encryption and decryption of file names.

Include in the handin directory a short text file called extra-credit with a description of the exact technique you used to implement this feature.


Part D -- Initialization vectors

CCFS now encrypts both file names and file contents. However, if two plaintext files contain the same 16-byte data block at the same offset, the corresponding ciphertext blocks will also be identical. This may reveal information about file system usage that should stay hidden. For example, the emacs text editor keeps backup copies of files (``~ files''). By comparing a ciphertext file to its backup, one can tell at what offset a user started editing.

In this final part of the lab, you will introduce a per-file initialization vector that ensures two files with the same contents do not produce the same encryption. Whenever a file is created, you will choose a 64-bit initialization vector for the file. Recall that in Part B, you converted between a plaintext file block P at offset pos and the corresponding ciphertext block C with:

C = E(P XOR E(pos,0))
P = D(C) XOR E(pos,0)
Now you will add the file's initialization vector to the equation:
C = E(P XOR E(pos,IV))
P = D(C) XOR E(pos,IV)
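The handout does not pin down how (pos,IV) is laid out in 16 bytes. One hypothetical layout, sketched below, puts pos in the first 8 bytes and the IV in the last 8, both little-endian so that ciphertext written on one architecture decrypts on another. With an IV of 0 it reduces to the Part B block (pos,0). tweak_block is an illustrative name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Block = std::array<uint8_t, 16>;

// Bytes 0-7: pos; bytes 8-15: the file's 64-bit IV (both little-endian).
Block tweak_block(uint64_t pos, uint64_t iv) {
  Block b{};
  for (int i = 0; i < 8; i++) {
    b[i] = uint8_t(pos >> (8 * i));
    b[8 + i] = uint8_t(iv >> (8 * i));
  }
  return b;
}
```

The result is what you would pass to E to get the value XORed with P before (and after) the block cipher.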

The remaining question is where to store the initialization vector. The simplest technique is to store it in the file itself, at the beginning of the file. You can then add 8 bytes to the offsets of all READ and WRITE RPCs. Similarly, you must subtract 8 from the size field of the fattr3 and wcc_attr structures of regular files before performing the other length adjustments from Part B.
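The offset and size adjustments can be sketched as two helpers with hypothetical names. Note the order in reported_size: the 8 IV bytes are removed first, and only then is the Part B padding rule applied.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kIVLen = 8;  // IV stored at the start of every file

// Client (plaintext) offset -> server offset: skip over the stored IV.
uint64_t server_offset(uint64_t client_off) { return client_off + kIVLen; }

// Server file size -> size reported to the client: remove the IV bytes,
// then apply the Part B rule (subtract 16 if size > 16 and size & 15).
uint64_t reported_size(uint64_t server_size) {
  uint64_t s = server_size >= kIVLen ? server_size - kIVLen : 0;
  return (s > 16 && (s & 15)) ? s - 16 : s;
}
```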

You will want to keep a cache of initialization vectors, to improve performance. When you see a reference for a file handle not in the cache, you should read the first 8 bytes of the file. (The file may be zero length because of a crash. In this case, it would be elegant to write a new IV, but for the purposes of this lab you may also just return NFS3ERR_IO.)
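A minimal cache might map the raw file-handle bytes to the 64-bit IV, as sketched below. IVCache and decode_iv are hypothetical; the sketch omits eviction, and it assumes the IV is stored little-endian, which must match however your code writes it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical IV cache keyed by the raw bytes of the NFS file handle.
class IVCache {
 public:
  // Returns true and sets *iv if the handle is cached.
  bool lookup(const std::string &fh, uint64_t *iv) const {
    auto it = map_.find(fh);
    if (it == map_.end()) return false;
    *iv = it->second;
    return true;
  }
  void insert(const std::string &fh, uint64_t iv) { map_[fh] = iv; }

 private:
  std::unordered_map<std::string, uint64_t> map_;
};

// Decode the IV from the first 8 bytes of the file. Returns false if the
// file is shorter than 8 bytes (e.g. after a crash), where the handout
// allows replying NFS3ERR_IO.
bool decode_iv(const std::string &head, uint64_t *iv) {
  if (head.size() < 8) return false;
  uint64_t v = 0;
  for (int i = 0; i < 8; i++) v |= uint64_t(uint8_t(head[i])) << (8 * i);
  *iv = v;
  return true;
}
```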

In addition, you must store the initialization vector in a file when the file is first created. The complication here is that in response to non-exclusive opens (O_CREAT but not O_EXCL), clients may call NFSPROC3_CREATE on files that already exist. With the how argument set to UNCHECKED, such CREATE RPCs will succeed. You want to avoid overwriting the initialization vector of an existing file at all costs, as doing so turns the contents of the file into garbage.

You should employ two techniques to avoid overwriting initialization vectors. First, if a file's length is non-zero, do not write the IV; instead, read what has already been written to the file. Second, use a deterministic 64-bit hash of the NFS file handle as the initialization vector (for instance, the first 64 bits of a SHA-1 hash of the handle). NFS file handles rarely change (usually only when the server is restored from a backup tape or the server's software is upgraded). Thus, even if two clients accidentally both write an IV to the same file, chances are overwhelming that they will choose the same value. (NOTE: As an alternative to these approaches, you might instead implement UNCHECKED CREATE RPCs in terms of GUARDED ones.)
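The deterministic-IV idea can be sketched as below. The handout suggests the first 64 bits of a SHA-1 hash of the handle, which you would compute with the SHA-1 functions described under Useful classes and functions; to keep this sketch self-contained, it substitutes the 64-bit FNV-1a hash, which is NOT cryptographic and is used only to show the shape of the computation. iv_from_handle is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Deterministic 64-bit IV derived from the raw NFS file handle bytes.
// Stand-in: 64-bit FNV-1a (substitute the first 64 bits of SHA-1 in CCFS).
uint64_t iv_from_handle(const std::string &fh) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  for (unsigned char ch : fh) {
    h ^= ch;
    h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
  }
  return h;
}
```

Because the result depends only on the handle, two clients racing to initialize the same new file compute the same IV, so neither write corrupts the other's data.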

What to hand in

As usual, make a tar.gz file with the command make distcheck. Copy the ccfs-0.0.tar.gz and a typescript file of your testing (including an invocation of test-fs) to ~class/handin/lab2d/username.

Extra credit

Add a per-directory initialization vector. In the extra credit for Part C, you already made the same plaintext file names produce different ciphertext names in different CCFS mountpoints. Extend this mechanism so that even within a single CCFS mountpoint, files with the same name in different directories have different encryptions.

Include in the handin directory a short text file called extra-credit with a description of the exact technique you used to implement this feature.


Useful references

In addition to the class references page, you may find the following links useful:

Useful classes and functions

Standard library

String functions

In addition to the discussion of str objects in Using TCP through sockets, you may find the following useful:

Data serialization

The following functions are defined in "serial.h":

NFS-related functions

For the following functions, you need these includes:
#include "nfsserv.h"
#include "nfs3_nonnul.h"
#include "classfscli.h"
The skeleton CCFS code you will start with has a dispatch function that takes an argument nfscall *nc. This function gets called for every NFS3 RPC CCFS receives. The nfscall object has the following methods (written here as you would invoke these methods on nfscall *nc):

CCFS also has a global object c of type ptr<sfsuclnt>. This object is used to send NFS RPCs to the remote SFS server that was specified on the command line. For more information, see classfscli.h. The main method you need to use is:

One often wants to perform some operation for a large number of different NFS procedures. One possible approach is to demultiplex all 21 different NFS RPCs into different dispatch functions, and in each function implement the functionality you want. This turns out to be fairly painful in practice because you must write a large amount of repetitive code. Several functions use C++ templates to save you from having to do this.

Cryptographic functions

To access these functions, you will want the following include files in your program:
#include "crypt.h"
#include "aes.h"
The libraries you are using contain a cryptographic pseudo-random number generator, in a global object called rnd. Before using the random number generator, you must initialize it.

For actually encrypting and decrypting file data, you will use the AES block cipher. AES is a 128-bit block cipher. It supports two operations--encryption, and decryption. Encryption transforms 16 bytes (128 bits) of plaintext data into 16 bytes of ciphertext data using a secret key. Someone who does not know the secret key cannot recover the plaintext from the ciphertext. The decryption algorithm, given knowledge of the secret key, transforms ciphertext into plaintext.

The libraries you are using define a class called aes with the following methods:

The SHA-1 hash function hashes an arbitrary-length input (up to 2^64 - 1 bits) to a 20-byte output. SHA-1 is known as a cryptographic hash function. While nothing has been formally proven about the function, it has generally been assumed that SHA-1 is one-way and collision-resistant (practical SHA-1 collisions have since been demonstrated, so collision resistance should no longer be relied upon). These properties are defined as follows:

The libraries you are using contain an implementation of SHA-1. The following functions are available for computing SHA-1:

These functions are implemented in terms of a class called sha1ctx, with the following methods: