Distributed Systems Final project guidelines

The final lab assignment for the class is to undertake a mini research project of your choice in a group of 1-3 people.

Your project should be guided by the following deadlines:

By Friday, Oct 31 you should form a project team of 1-3 people and email the course staff to let us know with whom you will be working.
Before Friday, November 7 you should schedule a meeting of your team with the course staff to discuss your proposed project. Once your proposal has been approved, send the course staff a short (1 paragraph) description of what you want to do, which might be posted on the web site so everybody knows what the different projects are.
Please schedule your meeting sooner rather than later, in case your team needs to iterate on the proposal. Note, in particular, that it's fine to meet with us if you don't have a concrete plan yet, as we can suggest some things for you to look into.
Note: If you want to combine your project with your research or with work for another class, this is in general fine, but please let us know at the time of the proposal.
By Thursday, December 4 you must email the course staff a paper describing and evaluating your project. The paper should be no more than 6 pages in at least 11-point font. We may post the papers on the class web site.
At 7:00pm-10:00pm, Thursday, December 11th (possibly starting earlier for people who can make it), you will present your project to the class and demo what you have done. Your combined talk and demo should take 10-20 minutes depending on the number of groups. You must also submit the source code to your project at this time. Note that this is a hard deadline, since we have to reserve a room for the presentations and submit grades for people.

Here are some ideas you might be interested in for projects. This list is by no means exhaustive.

Build a network object system for C++.
Build something like Porcupine that addresses some of the paper's shortcomings.
Distributed protocols such as 2PC and Paxos are (1) short, (2) really hard to get right because of failures and uncertainty. Build a simple system that takes an implementation of these protocols and systematically explores their behavior in the face of crashes and network partitioning. See here for an example of how to do this for file systems. (I think this project could lead to a conference paper -DRE.)
Build a checking infrastructure than can plug into the many different RAFT implementations and find protocol errors. The nice trick you can use here is that you do not have to specify correctness: each of the protocols must do the same observable action given the same sequence of crashes, partitions, recoveries. You may want to look at what Kyle Kingsbury has done with Jepsen. (I think this project could lead to a conference paper -DRE.)
Build a clean, simple implementation of view stamped replication based on the updated Liskov paper that can be dropped into distributed systems in a way analogous to RAFT.
Raspberry/pi is a very popular embedded computing platform. Build a distributed system using r/pi nodes and some interesting cheap hardware. More ambitious: build a clean, simple "bare-metal" toolkit on r/pi that allows people to easily build such systems.
Build a simple, automatic distributed-parallel make implementation. Most makefiles are broken with spurious dependencies (slow) and missing dependencies (incorrect). Fortunately you can infer true dependencies automatically: kick off an existing (broken) build, intercept every "open()" system call to see which other files a given file depends on (e.g., all the files it #includes). Build a lightweight distributed system that does parallel distributed builds using these dependencies.
Build a large file store, like GFS, and possibly using RAID like Zebra.
Build a scalable virtual disk like Petal. (Maybe built using the Intel Open Storage Toolkit).
Build a simplified version of a synchronization service like Google's Chubby.
Build something like MogileFS but instead of having a centralized database, replicate the DB using Paxos.
Build a scalable web cache using consistent hashing.
Build a highly-available, replicated DNS server that uses Paxos to ensure consistency of updates.
Build a parallel debugger (ideally using some modification of GDB) that allows you to debug distributed systems. It should follow execution across message send and receive (analogously to procedure call/return).
Build a distributed profiler that allows you to observe where time really goes in a distributed system. You should use it to spot bottlenecks in at least one existing distributed system.
Build a system-call or message-level interposition library that can be slipped underneath an existing networked server and transparently be used to replicate these services so that they can survive failure and network partitioning. (Something similar but more complicated that what you would build: parrot.)
Build a similar message-level interposition library that can be slipped underneath existing networked services and add security (nonces, secure checksums, encryption, authentication). Relevant: VPNs.
In the old days, people who wanted to play the same music in different rooms of their house had to run cables through their walls. Downsides = obvious. A reasonable alternative now is to put a music server in each room and synchronize their playlists. Build a system (e.g., based on MPD) that does this in a simple way. The challenge here is to robustly, simply synchronize the play commands and transparently syncing music content by sending the fewest bits possible, possibly by using some of the tricks in LBFS, Shark, or Ori. (I think this is something that would be widely-used if done right. -DRE)