Thanks to Mendel Rosenblum for this neat project!
Unix V6 was an early version of the Unix operating system. It was released in 1975 and was the first version of Unix to see widespread usage outside Bell Labs. Unix V6 was a beautiful piece of software: simple and very small (less than 64 KBytes of code!), yet with many of the most powerful features found in modern Linux systems. For this project, we have created images of Unix V6 disks in a format virtually identical to how they would have appeared in 1976. Your task for this project is to recreate enough of the V6 file system code to read files from the disk. You will work in C for this project, since that was the language used for Unix V6.
All work for this project should be done on the myth cluster machines. To get started, login to a myth machine and clone the starter repo with the following commands:
git clone https://web.stanford.edu/class/cs111/starters/v6fs.git cs111_p7 ln -s /afs/ir.stanford.edu/class/cs111/test_disks cs111_p7/test_disks
This will create a new directory
cs111_p7 with a collection of source files you will modify, as well as access to (shared) disk images for testing. Do all your work in this directory.
If you are working on a machine without AFS, you can also download the
test_disks files from https://web.stanford.edu/class/cs111/starters/test_disks.tar.gz.
Structure of the Code
The file system code is structured in five layers; you will write code to implement four of them. For each layer there is a header file that defines the interface to the layer and a corresponding
.c file that implements the interface. Working from bottom to top, the layers are:
Block layer (
The lowest layer of the file system provides facilities for reading and writing individual disk blocks. In Unix V6, each block is a 512-byte sector, so the terms block and sector are used interchangeably in this project. In this project, the block layer reads and writes blocks from a disk image file rather than an actual disk, but the API is the same as it would be for an actual disk. The starter kit provides you with a complete implementation of this layer; you will use the function
diskimg_readsector in your code (see the comments in
diskimg.h for more information).
Inode layer (
This layer provides functions for finding inodes on disk and also for using the information in inodes to translate from logical block numbers within a file to physical block/sector numbers on disk. You will implement the functions
File layer (
This layer provides a function
file_getblock that reads a block of data from a file, given the file’s i-number and the desired block within the file. You will write the implementation for this function.
Directory layer (
directory.c) This layer encapsulates knowledge about the structure of directories. You will write the function
directory_findname, which reads a directory file to find the directory entry corresponding to a given name.
Pathname layer (
pathname.c) This layer provides a function
pathname_lookup which looks up a path and returns the i-number for the file specified by the path. You will implement this function.
The starter code contains three additional header files describing Unix V6 structures:
ino.h defines the internal structure of an inode,
direntv6.h defines the structure of a directory entry, and
filesys.h defines the layout of the file system superblock and provides several useful constants. These files are slightly modified copies of the corresponding header files in Unix V6.
Finally, the starter code contains several other files that provide infrastructure for building your code and running tests against it.
Building and Testing
To build the project,
cd into the project directory and type
This will create an executable file
diskimageaccess. To test out your code on one of the sample disk images, type
./diskimageaccess <options> disk_images/<image>
<image> argument selects a disk image to read (
<options> indicates which test to perform:
-i: test the inode and file layers. This test will read all of the inodes in order of i-number and print out the contents of the inode, along with a SHA-1 checksum that uniquely summarizes the contents of each file.
-p: test the directory and pathname layers. This test will walk the entire file system directory tree, starting at the root directory, and print out all of the paths that it finds, along with inode and checksum information for each file or directory found.
You can specify
-ip to perform both tests. If you type the command
it will run each of the tests on each of the images and check the results against expected output. You can also see the expected results for each test in the directory
Submitting Your Work
To submit your solution,
cd to the project directory, then invoke the command
If you discover problems or improvements after you have submitted, you may resubmit; only the latest submit will be graded.
Suggested Implementation Order
We suggest that you implement your solution from the bottom up:
file.c. After this step, your code should pass the inode level functionality tests (
pathname.c. At this point the pathname tests should pass (
Legacy of an Old Machine
Back in the 1970s, storage space — both on disk and in main memory — was at a premium. As a result, the Unix v6 file system goes to lengths to reduce the size of data it stores. You’ll notice that many integer values in the structs we provided are stored using only 16 bits, rather than today’s more standard 32 or 64. (In our code we use the type
stdint.h to get a 16-bit integer, but back in the ’70s, the C
int type was 16 bits wide.)
In another space-saving move, the designers of the file system stored the inode’s size field as a 24-bit integer. There’s no 24-bit integer type in C, so we represent this value using two fields in the inodestruct:
i_size1, which contains the least-significant 16 bits of the value, and
i_size0, which contains the most-significant 8 bits of the value. We provide a function
inode.c that assembles these two fields into a normal C integer for you.
The First Inode
Since there is no inode with an inumber of 0, the designers of the file system decided not to waste the 32 bytes of disk space to store it. The first inode in the first inode block has inumber 1; this inode corresponds to the root directory for the file system. (See
unixfilesystem.h for details.)
Be careful not to assume that the first inode has an inumber of 0. Off-by-one errors are the worst.
The 16-bit integer
i_mode in the
inode struct isn’t really a number; rather, the individual bits of the field indicate various properties of the inode.
#defines that describe what each bit means.
For instance, we say an inode is allocated if it points to an existing file. The most-significant bit (i.e. bit 15) of
i_mode indicates whether the inode is allocated or not. So the C expression
(i_mode & IALLOC) == 0 is true if the inode is unallocated and false otherwise.
Similarly, bit 12 of
i_mode indicates whether the file uses the large file mapping scheme. So if
(i_mode & ILARG)!= 0, then the inode’s
i_addr fields point at indirect and doubly-indirect blocks rather than directly at the data blocks.
Bits 14 and 13 form a 2-bit wide field specifying the type of file. This field is 0 for regular files and 2 (i.e. binary 10, or the constant
IFDIR) for directories. So the expression
(i_mode & IFMT) == IFDIR is true if the inode is a directory, and false otherwise.
Try to minimize the use of type casting. For example, when implementing
inode_iget, past students have sometimes declared generic character buffers (e.g.
char buffer[DISKIMG_SECTOR_SIZE]) to store the contents of a sector in the inode table, and then gone on to use type casting and manual pointer arithmetic to replicate the contents of an inode in the space addressed by
Don’t do that. Instead, declare an array of
struct inode records (you know that the contents of sectors in the inode table store inode records, right?) and pass the address of that array to
diskimg_readsector. That way, you can manipulate the contents of a strongly typed array of
struct inodes instead of a weakly typed array of
chars. Stated differently, you don’t need to use the
void * trickery you learned in CS107 here. You only go with
void *, manual pointer arithmetic, and
memcpy when you don’t know the data types that are being stored, copied, and passed around. In the case of
inode_iget, you know the sectors you’re examining contain arrays of
Similarly, your implementation of
directory_findname should declare an array of
struct direntv6 instead of a character array of length 512, and you should be able to take advantage of the fact that an indirect block contains an array of 256 two-byte integers (
- Your implementation of
directory_findnameshould make use of
strcmp, because some of the stored directory names aren’t null-terminated (if the name is 14 characters long). They are, however, guaranteed to be of length 14 or less. Check out the man page for
strcmpso you understand the difference.
- Take advantage of constants defined in the header files, such as
unixfilesystem.h. Don’t hard code numbers such as 2 and 1 in your code, even if it works. You may also find it useful to define a few new constants of your own.
- You should not need to use dynamic memory allocation allocation (i.e.
free) in this project.
- If your code detects error conditions (e.g.,
diskimg_readsectorreturns an error, or the
directory_findnamedoesn’t refer to a directory, be sure to log information about the error using