Project 5: Memory-Mapped Encrypted Files

In this project you will implement memory-mapped access to data that is stored in encrypted files. Normally, code like this would be written inside the operating system kernel; for this project, we have used the Linux mmap facility to simulate features that are typically available in the kernel, so you can write your code at user level. The goal of this project is to teach you about the following concepts:

Project Overview

You will implement a class MCryptFile that provides the following methods:

Supporting Code

We have written several classes for you to use in implementing MCryptFile:

CryptFile

The CryptFile class provides basic mechanisms for reading and writing encrypted files (but not for memory-mapping them). Your MCryptFile class should be a subclass of CryptFile. The CryptFile class has the following methods:

VMRegion

The VMRegion class provides basic mechanisms for mapping pages into a region of virtual memory, taking page faults, and managing permissions. A VMRegion corresponds roughly to a contiguous range of page table entries for one process in an operating system. You will create one VMRegion object for each mapped MCryptFile. The VMRegion class is defined in the header file vm.hh and has the following methods:

In addition to these methods, VMRegion also exports a variable page_size, which contains the number of bytes in each page on the current machine. Page sizes are 4096 bytes on the myth cluster as well as Windows or MacBook laptops, but you should use the page_size variable to ensure portability; you can assume page_size always be an even power of two.

PhysMem and PPages

The PhysMem class provides a mechanism for allocating and freeing pages of physical memory. It implements PPage objects. A PPage is a token for a physical page, which you can pass to VMRegion::map. In addition, you can use a PPage to access the bytes of the page: a PPage is actually a char * pointer referring to the first byte of the page, and you can add offsets to this to access other values in the page. Unlike VPages, a PPage is always accessible and writable; references to it will never generate page faults. PPages are useful because they allow your MCryptFile implementation to access physical pages that may not currently be mapped, such as when transferring page contents to or from encrypted files. PPages are mapped into virtual memory by the PhysMem class, at virtual addresses different from those in VMRegions. This is an example of aliasing, where the same physical page can appear at multiple virtual addresses.

The PhysMem class is defined in the header file vm.hh and has the following methods:

Intrusive Containers

An intrusive container is a data structure such as a list or a map that contains data structures that were specifically designed to be included in the contained objects — the container “intrudes” into the design of the element type. By contrast, a non-intrusive container can contain any type meeting some basic requirements (such as being able to be moved). Because non-intrusive containers are more general, the C++ standard template library (STL) provides only non-intrusive containers. For example, you can create a std::list<int>, even though integers were not designed to be on a list and don’t contain a “next pointer.”

There are several reasons that operating systems often make use of intrusive data structures. One is that intrusive data structures require no memory allocation: all the links to insert an object in a list or map are already members of the object. By contrast, when inserting an element in a std::list<int>, the library must dynamically allocate memory for an object that contains the int as well as some next and previous pointers. Because dynamic memory allocation can fail, there are places in the operating system that cannot or should not be dynamically allocating memory, and hence cannot manipulate non-intrusive containers.

A second benefit of intrusive data structures is that the items are the iterators; this simplifies some operations. For example, suppose an object has been inserted in a list and you wish to remove the object from that list. If the list is nonintrusive and all you have is a pointer to the object, you must first get an iterator to the object; this will require searching list, which is expensive. With an intrusive list, the object is the iterator, so unlinking it can be done without searching the list.

We would like you to use intrusive containers in this project, and we have provided you an intrusive list in ilist.hh and an intrusive map in itree.hh. To use these containers, your structures must include a field of type ilist_entry or itree_entry, respectively. The type of the container specifies a member pointer to this field, rather than just the type of the structure. For example:

struct MyStruct {
    int val_;
    ilist_entry list_link_;
    itree_entry itree_link_;
};
ilist<&MyStruct::list_link_> my_list;
itree<&MyStruct::val_, &MyStruct::tree_link_> my_map;

Since the items are the iterators, most methods return pointers rather than references, where nullptr indicates a missing item (e.g., the end of a list). When you delete an object, the destructors of any “entry” fields automatically remove the object from any containers it might be in. (The flip side of this is that it is an error to delete a container that still contains elements.)

To make your life easier, we designed ilist and itree to resemble STL containers. In particular, ilist has the following methods, which behave just like the corresponding methods in std::list: back, begin, delete_all, empty, end, front, insert, pop_back, pop_front, push_back, push_front, remove, and remove_all. itree objects have the following methods, which behave just like the corresponding methods in std::map: begin, delete_all, empty, end, find, insert, lower_bound, operator[], pop_back, pop_front, push_back, push_front, remove, remove_all, and upper_bound.

There are also a few differences. In particular, since the items are the iterators, there is no real end() iterator. There’s a method end() that returns nullptr, just to make range-for syntax work, but you can’t walk an end() iterator backwards to find the last item. Use back() to get the last item in an ilist and max() to get the greatest element in an itree. itree also provides min() to get the smallest element.

Error handling

In general you should use exceptions to handle errors. It makes code more readable by segregating the error handling. More importantly, it reduces the number of occasions for bugs where you forget to check for an error return value. However, make sure that you don’t leak resources when an exception may be thrown. A good way to avoid leaking resources is to rely exclusively on destructors for reclaiming resources. For example, in the implementation of CryptFiloe we have a unique_fd that automatically closes a file descriptor on destruction.

The one place where you must not throw exceptions is inside your page fault handler, unless you also catch the exceptions in the handler. If an exception escapes past your page fault handler, VMRegion will crash the program.

Developing and Testing

To get started on this project, login to the myth cluster and clone the starter repo with this command:

git clone https://web.stanford.edu/class/cs111/starters/cmap.git cs111_p5

This will create a new directory cs111_p5. Do your work for the project in this directory. The files mcryptfile.hh and mcryptfile.cc contain a skeleton for all of the code you need to write. Add to the declarations in mcryptfile.h and fill in the bodies of the methods in mcryptfile.cc to implement the facilities described above. You can also additional methods or classes if needed to implement your solution.

The directory also contains a Makefile; if you type make, it will compile your code along with a test program, creating an executable test. You can then invoke ./run_tests map_tests, which will run a simple set of tests on your code. You can also invoke test with an argument specifying a particular test name; this will run a single test and print out its results. Invoke test with a bogus test name to print out all of the available test names.

As usual, we do not guarantee that the tests we have provided are exhaustive, so passing all of the tests is not necessarily sufficient to ensure a perfect score (CAs may discover other problems in reading through your code).

Miscellaneous Notes

Submitting Your Work

To submit your solution, cd to the directory containing your work and invoke the command

./submit

If you discover problems or improvements after you have submitted, you may resubmit; only the latest submit will be graded.