Project 7: Reading Unix V6 File Systems

Thanks to Mendel Rosenblum for this neat project!

Unix V6 was an early version of the Unix operating system. It was released in 1975 and was the first version of Unix to see widespread usage outside Bell Labs. Unix V6 was a beautiful piece of software: simple and very small (less than 64 KBytes of code!), yet with many of the most powerful features found in modern Linux systems. For this project, we have created images of Unix V6 disks in a format virtually identical to how they would have appeared in 1976. Your task for this project is to recreate enough of the V6 file system code to read files from the disk. You will work in C for this project, since that was the language used for Unix V6.

Getting Started

All work for this project should be done on the myth cluster machines. To get started, login to a myth machine and clone the starter repo with the following commands:

git clone https://web.stanford.edu/class/cs111/starters/v6fs.git cs111_p7
ln -s /afs/ir.stanford.edu/class/cs111/test_disks cs111_p7/test_disks

This will create a new directory cs111_p7 with a collection of source files you will modify, as well as access to (shared) disk images for testing. Do all your work in this directory.

If you are working on a machine without AFS, you can also download the test_disks files from https://web.stanford.edu/class/cs111/starters/test_disks.tar.gz.

Structure of the Code

The file system code is structured in five layers; you will write code to implement four of them. For each layer there is a header file that defines the interface to the layer and a corresponding .c file that implements the interface. Working from bottom to top, the layers are:

Block layer (diskimg.h and diskimg.c)
The lowest layer of the file system provides facilities for reading and writing individual disk blocks. In Unix V6, each block is a 512-byte sector, so the terms block and sector are used interchangeably in this project. In this project, the block layer reads and writes blocks from a disk image file rather than an actual disk, but the API is the same as it would be for an actual disk. The starter kit provides you with a complete implementation of this layer; you will use the function diskimg_readsector in your code (see the comments in diskimg.h for more information).

Inode layer (inode.h, inode.c)
This layer provides functions for finding inodes on disk and also for using the information in inodes to translate from logical block numbers within a file to physical block/sector numbers on disk. You will implement the functions inode_iget and inode_indexlookup.

File layer (file.h and file.c)
This layer provides a function file_getblock that reads a block of data from a file, given the file’s i-number and the desired block within the file. You will write the implementation for this function.

Directory layer (directory.h and directory.c) This layer encapsulates knowledge about the structure of directories. You will write the function directory_findname, which reads a directory file to find the directory entry corresponding to a given name.

Pathname layer (pathname.h, pathname.c) This layer provides a function pathname_lookup which looks up a path and returns the i-number for the file specified by the path. You will implement this function.

The starter code contains three additional header files describing Unix V6 structures: ino.h defines the internal structure of an inode, direntv6.h defines the structure of a directory entry, and filesys.h defines the layout of the file system superblock and provides several useful constants. These files are slightly modified copies of the corresponding header files in Unix V6.

Finally, the starter code contains several other files that provide infrastructure for building your code and running tests against it.

Building and Testing

To build the project, cd into the project directory and type

make

This will create an executable file diskimageaccess. To test out your code on one of the sample disk images, type

./diskimageaccess <options> test_disks/<image>

The <image> argument selects a disk image to read (basic, depthFile, or dirFnameSize) and <options> indicates which test to perform:

-i: test the inode and file layers. This test will read all of the inodes in order of i-number and print out the contents of the inode, along with a SHA-1 checksum that uniquely summarizes the contents of each file.
-p: test the directory and pathname layers. This test will walk the entire file system directory tree, starting at the root directory, and print out all of the paths that it finds, along with inode and checksum information for each file or directory found.

You can specify -ip to perform both tests. If you type the command

./run_tests

it will run each of the tests on each of the images and check the results against expected output. You can also see the expected results for each test in the directory test_results.

Submitting Your Work

To submit your solution, cd to the project directory, then invoke the command

./submit

If you discover problems or improvements after you have submitted, you may resubmit; only the latest submit will be graded.

Additional Information

Suggested Implementation Order

We suggest that you implement your solution from the bottom up:

inode_iget and inode_indexlookup in inode.c.
file_getblock in file.c. After this step, your code should pass the inode level functionality tests (./diskimageaccess -i).
directory_findname in directory.c.
pathname_lookup in pathname.c. At this point the pathname tests should pass (./diskimageaccess -p).

Legacy of an Old Machine

Back in the 1970s, storage space — both on disk and in main memory — was at a premium. As a result, the Unix v6 file system goes to lengths to reduce the size of data it stores. You’ll notice that many integer values in the structs we provided are stored using only 16 bits, rather than today’s more standard 32 or 64. (In our code we use the type int16_t from stdint.h to get a 16-bit integer, but back in the ’70s, the C int type was 16 bits wide.)

In another space-saving move, the designers of the file system stored the inode’s size field as a 24-bit integer. There’s no 24-bit integer type in C, so we represent this value using two fields in the inodestruct: i_size1, which contains the least-significant 16 bits of the value, and i_size0, which contains the most-significant 8 bits of the value. We provide a function inode_getsize in inode.c that assembles these two fields into a normal C integer for you.

The First Inode

Since there is no inode with an inumber of 0, the designers of the file system decided not to waste the 32 bytes of disk space to store it. The first inode in the first inode block has inumber 1; this inode corresponds to the root directory for the file system. (See unixfilesystem.h for details.)

Be careful not to assume that the first inode has an inumber of 0. Off-by-one errors are the worst.

Inode’s `i_mode`

The 16-bit integer i_mode in the inode struct isn’t really a number; rather, the individual bits of the field indicate various properties of the inode. ino.h contains #defines that describe what each bit means.

For instance, we say an inode is allocated if it points to an existing file. The most-significant bit (i.e. bit 15) of i_mode indicates whether the inode is allocated or not. So the C expression (i_mode & IALLOC) == 0 is true if the inode is unallocated and false otherwise.

Similarly, bit 12 of i_mode indicates whether the file uses the large file mapping scheme. So if (i_mode & ILARG)!= 0, then the inode’s i_addr fields point at indirect and doubly-indirect blocks rather than directly at the data blocks.

Bits 14 and 13 form a 2-bit wide field specifying the type of file. This field is 0 for regular files and 2 (i.e. binary 10, or the constant IFDIR) for directories. So the expression (i_mode & IFMT) == IFDIR is true if the inode is a directory, and false otherwise.

Type casting

Try to minimize the use of type casting. For example, when implementing inode_iget, past students have sometimes declared generic character buffers (e.g. char buffer[DISKIMG_SECTOR_SIZE]) to store the contents of a sector in the inode table, and then gone on to use type casting and manual pointer arithmetic to replicate the contents of an inode in the space addressed by inp.

Don’t do that. Instead, declare an array of struct inode records (you know that the contents of sectors in the inode table store inode records, right?) and pass the address of that array to diskimg_readsector. That way, you can manipulate the contents of a strongly typed array of struct inodes instead of a weakly typed array of chars. Stated differently, you don’t need to use the void * trickery you learned in CS107 here. You only go with void *, manual pointer arithmetic, and memcpy when you don’t know the data types that are being stored, copied, and passed around. In the case of inode_iget, you know the sectors you’re examining contain arrays of struct inodes.

Similarly, your implementation of directory_findname should declare an array of struct direntv6 instead of a character array of length 512, and you should be able to take advantage of the fact that an indirect block contains an array of 256 two-byte integers (uint16_t).

Miscellaneous tips

Your implementation of directory_findname should make use of strncmp instead of strcmp, because some of the stored directory names aren’t null-terminated (if the name is 14 characters long). They are, however, guaranteed to be of length 14 or less. Check out the man page for strncmp and strcmp so you understand the difference.
Take advantage of constants defined in the header files, such as INODE_START_SECTOR and ROOT_INUMBER in unixfilesystem.h. Don’t hard code numbers such as 2 and 1 in your code, even if it works. You may also find it useful to define a few new constants of your own.
You should not need to use dynamic memory allocation allocation (i.e. malloc, realloc, and free) in this project.
If your code detects error conditions (e.g., diskimg_readsector returns an error, or the dirinumber argument to directory_findname doesn’t refer to a directory, be sure to log information about the error using fprintf(stderr, ...).