Distributed Storage Systems: Final Project Guidelines

The final lab assignment for the class is to undertake a mini research project of your choice in a group of 1-3 people.

Your project should be guided by the following deadlines:

By Friday, February 17, you should form a project team of 1-3 people (preferably 2) and email me, the course instructor, to let me know with whom you will be working.
Before Wednesday, February 22 you should schedule a meeting of your team with me (David Mazières) to discuss your proposed project. Once I have approved your proposal, send me a short (1 paragraph) description of what you want to do, which I might post on the web site so everybody knows what the different projects are.
Please schedule your meeting sooner rather than later, in case your team needs to iterate on the proposal. Note, in particular, that it's fine to meet with me if you don't have a concrete plan yet, as I can suggest some things for you to look into.
Note: If you want to combine your project with your research or with work for another class, this is in general fine, but please let me know at the time of the proposal.
By March 22 you must email me a paper describing and evaluating your project. The paper should be no more than 10 pages in at least 11-point font. I may post the papers on the class web site.
At 12:15pm Thursday, March 23, you will present your project to the class and demo what you have done. Your combined talk and demo should take no more than 20 minutes. You must also submit the source code for your project at this time. Note that this is a hard deadline, since I have to reserve a room for the presentations and submit grades.

Here are some ideas you might be interested in for projects. This list is by no means exhaustive.

Implement a distributed file system in which the server supports volume migration with copy-on-write snapshots, like AFS.
Build a replicated storage server.
Modify the NFS3 protocol to support better performance over high-latency, wide-area networks and better scalability to many clients. Implement the modified protocol in an open-source operating system kernel, or at user level using the xfs device driver from the ARLA AFS implementation.
Build a distributed file system layered on top of a simple block store abstraction (like Frangipani on Petal). Feel free to look at other classes' labs for some ideas on how to do this.
Modify OpenAFS to add features like security and maybe automounting self-certifying pathnames.
Build a distributed file system in which clients on a LAN cooperate to share their buffer caches, since network accesses are much faster than disk seeks.
Add support for integrity checking or group-readable files to CCFS.
Add support for compressing plaintext files before encrypting them in CCFS.
Design and implement a new buffer cache management scheme. Evaluate its performance on various workloads (e.g., large databases), compared to regular Linux or BSD.
Implement a file system that periodically (or consistently) snapshots users' files, so they can recover from accidental deletion.
Design and implement a kernel-level file system that takes advantage of rarely used hardware functionality (maybe the ability to have non-standard sector sizes).
Build a file-system front-end to another data access system, like the web, or a version control system.
Build a cooperative backup system (see, for instance, Pastiche).
Build an append-only (or append-mostly) file system that elides redundant data to make it reasonable to dump backups onto optical media like DVD+R on a daily basis. See Venti and Fossil for some ideas on this.
Build a file system, or modify an existing one, so that it reorganizes data placement to optimize performance. For instance, if a particular HTML file is always accessed in conjunction with several image files, they might be located near each other to reduce latency. Data that often causes buffer cache misses might even be replicated on several parts of the disk.
Build a storage infrastructure for accessing massive data sets in distributed computations over the wide area network.
Extend an existing file system to make NFS service faster through a small amount of non-volatile RAM.
Design and implement a convincing file system benchmark.
Design and implement a user-level file reconciliation protocol that merges changes to distributed copies of a directory tree.
Build a facility for persistent objects in C, C++ or some other high-level language. Use write-ahead logging to survive crashes. Optionally, you might extend the system to work across the network (perhaps using NFS combined with some user-level locking protocol).
Build a storage system that scales to many readers. For example, you might want to have clients upload data to each other, as in BitTorrent. Or you might target media files, like radio or TV shows, in which case users will want to play the file while downloading, and BitTorrent's random download order will not work.
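To make one of the ideas above more concrete, here is a minimal sketch of how an append-only store might elide redundant data, so that a daily backup only adds the blocks that changed. Everything in it is an assumption made for illustration: the class name AppendOnlyStore, its method names, the fixed block size, the in-memory log, and the choice of SHA-256 as the block fingerprint. It is not the design of Venti, Fossil, or any other existing system, and a real project would need an on-disk log, a persistent index, and crash recovery.

```python
import hashlib


class AppendOnlyStore:
    """Toy append-only block store that stores each distinct block once.

    Hypothetical illustration only: the log lives in memory and the
    index is a plain dictionary.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.log = bytearray()  # append-only log of unique blocks
        self.index = {}         # SHA-256 digest -> (offset, length) in log

    def put_block(self, data):
        """Append a block unless an identical one is already stored."""
        digest = hashlib.sha256(data).digest()
        if digest not in self.index:
            self.index[digest] = (len(self.log), len(data))
            self.log += data
        return digest

    def get_block(self, digest):
        """Return the block previously stored under this digest."""
        offset, length = self.index[digest]
        return bytes(self.log[offset:offset + length])

    def put_file(self, data):
        """Split data into fixed-size blocks; return the list of digests."""
        return [
            self.put_block(data[i:i + self.block_size])
            for i in range(0, len(data), self.block_size)
        ]

    def get_file(self, digests):
        """Reassemble a file from its list of block digests."""
        return b"".join(self.get_block(d) for d in digests)


if __name__ == "__main__":
    store = AppendOnlyStore(block_size=4)
    monday = store.put_file(b"aaaabbbbcccc")
    tuesday = store.put_file(b"aaaabbbbdddd")  # only the last block is new
    print(len(store.log))
    print(store.get_file(tuesday))
```

In this sketch a "file" is just the list of digests returned by put_file, so two backups that share most of their blocks share most of their storage. Fixed-size blocks are the simplest choice but handle insertions in the middle of a file poorly; a project could compare them against content-defined chunking.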

Note that if you want to do kernel-level work (for instance, for a performance evaluation), we have a limited amount of lab space and equipment you can use.