Computer System Security
Lab 1 - Encryption/Decryption Utility

Due date: Tuesday Feb. 22. Free extension to midnight if you show up in class.

Introduction

In this lab, you will build a simple encryption/decryption utility. The utility will allow you to generate private key/public key pairs for an Adaptive Chosen-Ciphertext (CCA) secure Public-Key Encryption scheme. You can then exchange your public key with your friends, and they will be able to encrypt files under your public key, so that only you can recover the original plaintext.

We provide you with a simple cryptographic library (~class/src/libdcrypt-0.2.tar.gz), containing an implementation of all the cryptographic tools you will need for the lab. By doing the lab, you will gain experience about how to use Cryptography to build security properties into your system.

Software Setup

The files you will need for this lab are available in ~class/src/lab1.tar.gz. To install them in your account, type the following:

% tar xzf ~class/src/lab1.tar.gz
% cd lab1
% 

We have set up the appropriate libraries and header files so that you can compile against libdcrypt on the class machines. If you are instead working on your own machine, you should install the dcrypt library in your system. (You will also need to have the GNU MP and dmalloc libraries installed in your machine.)

Lab Specification

Your Encryption/Decryption Utility will consist of three programs: edu_keygen, edu_encrypt and edu_decrypt. We provide you with a skeleton source directory (~class/src/lab1.tar.gz), containing the following files:

% ls ~class/src/lab1/
Makefile        edu_decrypt.c   edu_keygen.c
edu.h           edu_encrypt.c   edu_misc.c
%

Once you have implemented the necessary functions (see below), you will build the three programs edu_keygen, edu_encrypt and edu_decrypt using make:

% make
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -I. -I/usr/local
/include/ -c edu_keygen.c edu_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o edu_keygen ed
u_keygen.o edu_misc.o /usr/local/lib/libdcrypt.a /usr/local/lib/libgmp.a -L/usr/
local/lib/ -ldmalloc
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -I. -I/usr/local
/include/ -c edu_encrypt.c edu_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o edu_encrypt e
du_encrypt.o edu_misc.o /usr/local/lib/libdcrypt.a /usr/local/lib/libgmp.a -L/us
r/local/lib/ -ldmalloc
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -I. -I/usr/local
/include/ -c edu_decrypt.c edu_misc.c
gcc -g -O2 -ansi -Wall -Wsign-compare -Wchar-subscripts -Werror -o edu_decrypt e
du_decrypt.o edu_misc.o /usr/local/lib/libdcrypt.a /usr/local/lib/libgmp.a -L/us
r/local/lib/ -ldmalloc
%

You should now be able to create your own private key/public key pair, and use them as follows:

% ./edu_keygen my_key.priv my_key.pub 
% yes "test" | head -1000 > a_file
% ./edu_encrypt my_key.pub a_file an_encrypted_file
% ./edu_decrypt my_key.priv an_encrypted_file a_decrypted_file 
% diff a_file a_decrypted_file
%

Now you think, "This is cool!," and decide to give a copy of your encryption/decryption utility to your friend (whose name, surprise, surprise, we will assume is Bob). Now that Bob has your program, you can exchange public keys:

% cat my_key.pub | mail bob@bobserver.org 
%

Once Bob gets your email ...

bob>
You have mail in /var/mail/bob
... he will store your public key in a file named, let's say, alice.pub. Now Bob can send you highly confidential messages:
bob> echo "The deadline for lab1 is Feb. 22" > msg.plain 
bob>  ./edu_encrypt alice.pub msg.plain msg.cipher 
bob> uuencode -m msg.cipher bob_msg.cipher | mail alice@aliceserver.org
bob>
When you get Bob's email:
% ...
You have mail in /var/mail/alice
... you will uudecode it, obtaining a ciphertext file named bob_msg.cipher, and then you will use your private key to unveil Bob's top-secret communication:
% ./edu_decrypt my_key.priv bob_msg.cipher bob_msg.plain 
% cat bob_msg.plain 
The deadline for lab1 is Feb. 22

Design Guidelines

Now that you know how your encryption/decryption utility is supposed to work, let's get into doing something.

To help you get acquainted with libdcrypt, we implemented edu_keygen for you: take a look at edu.h, edu_misc.c and edu_keygen.c, and make sure you understand what's going on.

We also provide you with an incomplete implementation of edu_encrypt and edu_decrypt in edu_encrypt.c and edu_decrypt.c. Your job is to fill in the code for the two procedures:

Encrypting the content

The task of encrypt_file is to read the content from the file descriptor fin, encrypt it using pk, and place the resulting ciphertext in a file named ctxt_fname.

The encryption should be CCA-secure (cf. first class), which is the level of cryptographic protection that you should always expect of any implementation of an encryption algorithm.

Below we describe a possible design, based on the so-called "hybrid encryption" paradigm, but you are free to follow a different approach if you want, as long as it is also CCA-secure. The "hybrid encryption" paradigm combines both public-key and symmetric encryption functions: the idea is to:

  1. pick a random symmetric key K and encrypt it under the public key pk; and
  2. use K to encrypt the actual content read from the fin file descriptor using e.g., AES.

For this to be secure, the public-key encryption scheme used to encrypt K must be CCA-secure (but you don't have to worry about this; the public-key encryption functions included in the library you are going to use are all CCA-secure).

The symmetric encryption part must also be CCA-secure: one good approach is to use AES in CBC-mode (described below), and then append an HMAC SHA-1 mac of the resulting ciphertext. Always mac after encrypting!

The dcrypt library contains implementations of AES and of HMAC SHA-1 (cf. Useful functions). However, you should take care of using AES in CBC-mode, as the library only gives access to the basic AES block cipher functionality, which is not CCA-secure.

Notice that the key used to compute the HMAC SHA-1 mac must be different from the one used by AES. Never use the same cryptographic key for two different purposes: bad interference could occur.
For this reason, the key K encrypted under the public key pk actually consists of two pieces, one for AES and one for HMAC SHA-1. The length of both pieces (and hence the cryptographic strength of the encryption) is specified by the constant CCA_STRENGTH in edu.h; the default is 128 bits, or 16 bytes.

Recall that AES can only encrypt blocks of 128 bits, so you should use some padding in the case that the length (in bytes) of the plaintext is not a multiple of 16. This should be done in a way that allow proper decoding after decryption: in particular, the recipient must have a way to know where the padding begins so that it can be chopped off.

One possible design is to add enough 0 bytes to the plaintext so as to make its length a multiple of 16, and then append a byte at the end specifying how many zero-bytes were appended.

Thus, the overall layout of an encrypted file will be:

         +-----+-----------+--------------------------+---+--------+
         | X_l |     X     |             Y            | W | padlen |
         +-----+-----------+--------------------------+---+--------+

where X = PKE (pk, {K_AES, K_HSHA-1})
      X_l = length of X in bytes
      Y = CBC-AES (K_AES, {plaintext, 0^padlen})
      W = HMAC-SHA-1 (K_HSHA-1, Y)
      padlen = no. of zero-bytes added to the plaintext to make its
               length a multiple of 16
As for the sizes of the various components of a ciphertext file, notice that:

Decrypting the content

The task of decrypt_file is to read the ciphertext from the file descriptor fin, decrypt it using sk, and place the resulting plaintext in a file named ptxt_fname.

This procedure basically should just "undo" the operations performed by encrypt_file; for this reason, decrypt_file expects a ciphertext featuring the structure described above.

Notice that reading in X (i.e., the piece of ciphertext that encapsulates the symmetric keys used to encrypt and mac the actual file content) is easy, as its length is prefixed, so we know exactly how many bytes to read.

Reading Y (and then the mac and the pad length) is a bit trickier: below we sketch one possible approach, but you are free to implement this as you wish.

The idea is based on the fact that the ciphertext file ends with 21 bytes (i.e., the size of a hash + 1) used up by the HSHA-1 mac and by the pad length. Thus, we will repeatedly attempt to perform "long reads" of (aes_blocklen + sha1_hashsize + 2) bytes: once we get to the end of the ciphertext and only the last chunk of Y has to be read, such "long reads" will encounter the end-of-file, at which point we will know where Y ends, and how to finish reading the last bytes of the ciphertext.

Cipher-Block Chaining (CBC) Mode

For encrypting a stream of bytes that does not require random access, people often employ a technique known as Cipher-Block Chaining (CBC). To encrypt in CBC mode, one thinks of the stream of bytes as a sequence of block, each of the size of the block cipher being used (AES in your case); then, one XORs each plaintext block with the encryption of the previous block before encrypting, as shown here:
Ciphertext-Block Chaining
If the plaintext blocks are m1, m2, ..., and the ciphertext blocks c1, c2, ..., then encryption and decryption in CBC mode are performed as follows:
ci = E(mi XOR ci-1)
mi = D(ci) XOR ci-1
The first plaintext block is XORed with an initialization vector, or IV (which you can think of as c0, since there is no m0). The IV can be publicly known. It is often just 0, unless the same key will be used to encrypt multiple streams, in which case each stream must use a different IV.

Extra credit

The current implementation of edu_keygen stores the private key in a file with permissions set to 0600---readable only by the owner. This protects your private key from other users in your system, but won't prevent your system administrator from reading it. A better design would be to prompt the user for a passphrase, and then use (a hash of) the passphrase to encrypt the private key before storing it to the file.

For extra credit, modify edu_keygen so that it prompts for a passphrase twice, aborting if the two passphrases do not match. (Asking for the passphrase twice protects from the case that you mistype your passphrase---otherwise, on later access you will remember the password you wanted to type, but may not easily be able to figure out what you actually typed.)

You will also need to modify edu_decrypt so that it also prompts for a passphrase, and uses it to decrypt the ciphertext containing the private key. Then, you should be able to run the edu_keygen, edu_encrypt and edu_encrypt programs as follows:

% ./edu_keygen my_key.priv my_key.pub
Passphrase for my_key.priv:
                     Again:     
% yes "test" | head -1000 > a_file
% ./edu_encrypt my_key.pub a_file an_encrypted_file
% ./edu_decrypt my_key.priv an_encrypted_file a_decrypted_file 
Passphrase for my_key.priv:
% diff a_file a_decrypted_file
%
You may find the following function helpful to read in the passphrase:
#include <unistd.h>

char *getpass(const char *prompt);
Displays prompt to the user and reads a password typed at the terminal, turning off echo so that others cannot see the password typed. The password is returned as a pointer to a NULL-terminated ANSI C string.

Include in the handin directory a short text file called extra-credit with a description of the exact technique you used to implement this feature.

Collaboration policy

You must write all the code you hand in for the programming assignments, except for code that we give you as part of the assigment. You are not allowed to look at anyone else's solution. You may discuss the assignments with other students, but you may not look at or copy each others' code.

Hand-In Procedure

You must submit two files:

To build a software distribution, run the following commands (from the directory where your source files are located):

% cd ..
% tar cf edu.tar lab1/
% gzip edu.tar
%
(If the name of the directory containing your sources is not lab1, then substitute the appropriate name in the second command above.)

To create a script file, use the script command. When you run script, everything you type gets saved in a file called typescript. Press CTRL-D to finish the script. For example:

% script
Script started, output file is typescript
% ./edu_keygen my_key.priv my_key.pub 
% yes "test" | head -1000 > a_file
% ./edu_encrypt my_key.pub a_file an_encrypted_file
% ./edu_decrypt my_key.priv an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
% 
% ...
% yes "test" | head -1015 > a_file
% ./edu_encrypt my_key.pub a_file an_encrypted_file
% ./edu_decrypt my_key.priv an_encrypted_file a_decrypted_file
% diff a_file a_decrypted_file
% 
% ^D
% ^D Script done, output file is typescript 
% 

To turn in your distribution and script file, copy the files edu.tar.gz and typescript to the directory ~class/handin/lab1/logname (where logname is your login name on the class machines):

% cp edu.tar.gz typescript ~class/handin/lab1/`logname`/
% 

If you have any problems with/question about the submission, please contact the TA or the instructor at email address

This completes the lab.


Useful Functions from the dcrypt Library

Below is a description of some of the functions implemented in the dcrypt library that you may find useful in completing the assignment. You will need to include the dcrypt.h header file to access these functions. You may also want to take a look at these sample programs (tst.c, tst_sha1.c) to see some of these functions in action.

Data serialization

Pseudo-Random Number Generation Functions

The libraries you are using contain a cryptographic pseudo-random number generator, whose state is kept in a global 16-byte array called prng_state. Before using the random number generator, you must initialize it.

Symmetric-Key Encryption Functions

For actually encrypting and decrypting file data, you will use the Rijndael [FIPS-197] block cipher (also called AES---Advanced Encryption Standard). Rijndael is a 128-bit block cipher. It supports two operations--encryption, and decryption. Encryption transforms 16 bytes (128 bits) of plaintext data into 16 bytes of ciphertext data using a secret key. Someone who does not know the secret key cannot recover the plaintext from the ciphertext. The decryption algorithm, given knowledge of the secret key, transforms ciphertext into plaintext.

The libraries you are using define a struct called aes_ctx that you should use to hold secret keys for AES. You should manipulate your AES secret keys with the following functions:

Public-Key Encryption Functions

If you only used symmetric-key cryptography, you would need to exchange a secret key with all the friends with whom you want to have confidential communication. With Public-Key Cryptography, instead, you can give all your friend the same public key, and they will be able to send you encrypted content that only you can recover. Similarly, you only need to know your friend's public key to create a ciphertext that only he/she will be able to decrypt. The libraries you are going to use contain an implementation of two very well-known Public-Key Encryption schemes: Rabin [Wil80] and ElGamal [ElG86] To complete the assignment, you don't need to know anything about how these schemes work, except for the interface that they provide, which is described below:

Cryptographic Hash Functions

The SHA-1 [FIPS-180-1] hash function hashes an arbitrary-length input (up to 2^64 bytes) to a 20-byte output. SHA-1 is known as a cryptographic hash function. While nothing has been formally proven about the function, it is generally assumed that SHA-1 is one-way and collision-resistant. These properties are defined as follows:

The libraries you are using contain an implementation of SHA-1.

Sometimes the input that you want to hash is so long that it is inconvenient to store it entirely in memory before being able to hash it. This is the case for example when hashing the entire content of a file into a short digest.
For this reason, the libraries you are using allow you to process a long input "one chunk at a time." To do that, you should use a struct called sha1_ctx, which will store the "partial digest" as you keep providing new input to be hashed. You should manipulate sha1_ctx structs with the following functions:

Message Authentication Codes (MACs)

Message Authentication Codes (MACs) are a symmetric-key primitive allowing you to check the integrity of the information to which the MAC is applied. Recall that encryption does not guarantee integrity! The fact that you were able to decrypt a ciphertext is not enough to be sure that nobody tampered with its content. For integrity, you should always append a MAC to the content.

You use MACs as follows. Let's say that you want to store a file on your file-server, but you are afraid that its content will be changed behind your back. Then, you use a secret key to "mac" the file, and store the resulting MAC along with the file. Now, when you check back with your file-server and retrieve your file, you will also retrieve the MAC that you appended. Then, you will use the secret key to compute the MAC again, and if the MAC you just computed is the same as the value that you retrieved from the file-server, then you are sure that nobody touched your file. This is because, if somebody had changed the file, then they should have computed the corresponding MAC in order to fool you. However, secure MACs are concocted such that, without knowing the secret key, it is computationally intractable to compute the right MAC, even after having seen a lot of valid (message, MAC) pairs.

The Keyed-Hash Message Authentication Code (HMAC) [FIPS-198a] is a secure Message Authentication Code based on the use of any cryptographic hash function, like SHA-1. The libraries you are using contain an implementation of HMAC, instantiated with the SHA-1 cryptographic hash function.

Similarly to what discussed for the case of SHA-1, the libraries you are using allow you to process a long input "one chunk at a time."

References

[ElG85] T. ElGamal, A public key cryptosystem and a signature scheme based on discrete logarithms.
IEEE Transactions on Information Theory, Vol. IT-31, No. 4, pp. 469--472, July 1985.
[FIPS-180-1] FIPS-180-1, Secure Hash Standard.
U.S. Department of Commerce/N.I.S.T., 1994
[FIPS-197] FIPS-197, Announcing the Advanced Encryption Standard.
U.S. Department of Commerce/N.I.S.T., 2001
[FIPS-198a] FIPS-198a, The Keyed-Hash Message Authentication Code (HMAC).
U.S. Department of Commerce/N.I.S.T., 2002
[Wil80] H. C. Williams, A Modification of the RSA Public-Key Encryption Procedure.
IEEE Transactions on Information Theory, Vol. IT-26, No. 6, November 1980.