Distributed Systems: Final Project Guidelines
The final lab assignment for the class is to undertake a mini research project of your choice in a group of 1-3 people.
Your project should be guided by the following deadlines:
By [Friday, Oct 20]{.due} you should form a project team of 1-3 people and email the course staff to let us know with whom you will be working.
Before [Friday, November 3]{.due} you should schedule a meeting of your team with the course staff to discuss your proposed project. Once your proposal has been approved, send the course staff a short (1 paragraph) description of what you want to do, which may be posted on the web site so everybody knows what the different projects are.
Please schedule your meeting sooner rather than later, in case your team needs to iterate on the proposal. Note, in particular, that it's fine to meet with us if you don't have a concrete plan yet, as we can suggest some things for you to look into.
Note: If you want to combine your project with your research or with work for another class, this is in general fine, but please let us know at the time of the proposal. Make sure any other instructor or research supervisor involved knows that you plan to do this.
By [3pm Wednesday, December 6]{.due} you must email the course staff a final title and list of project team members, with email subject "final project title".
During the presentation session ([12:30pm-6:30pm, Monday, December 11th]{.due}) you will present your project to the class and demo what you have done. Your combined talk and demo should take no more than 10 minutes. You must also submit the source code for your project at this time. Note that this is a hard deadline, since we have to reserve a room for the presentations and submit grades.
By [noon Wednesday, December 13]{.due} you must email the course staff a paper describing and evaluating your project. The paper should be no more than 6 pages in at least 11-point font. We may post the papers on the class web site.
Here are some ideas you might be interested in for projects. This list is by no means exhaustive.
- Build a network object system for C++.
- Build something like Porcupine that addresses some of the paper's shortcomings.
- Distributed protocols such as 2PC and Paxos are (1) short and (2) really hard to get right because of failures and uncertainty. Build a simple system that takes an implementation of one of these protocols and systematically explores its behavior in the face of crashes and network partitions. See here for an example of how to do this for file systems.
- Build a checking infrastructure that can plug into the many different Raft implementations and find protocol errors. The nice trick you can use here is that you do not have to specify correctness: every implementation must produce the same observable actions given the same sequence of crashes, partitions, and recoveries. You may want to look at what Kyle Kingsbury has done with Jepsen.
- Build a clean, simple implementation of Viewstamped Replication, based on the updated Liskov paper, that can be dropped into distributed systems in a way analogous to Raft.
- The Raspberry Pi is a very popular embedded computing platform. Build a distributed system using Raspberry Pi nodes and some interesting cheap hardware. More ambitious: build a clean, simple "bare-metal" toolkit for the Raspberry Pi that allows people to easily build such systems.
- Build a simple, automatic distributed-parallel make implementation. Most makefiles are broken: they contain spurious dependencies (which make builds slow) and omit real dependencies (which makes builds incorrect). Fortunately you can infer the true dependencies automatically: kick off an existing (broken) build and intercept every "open()" system call to see which other files a given file depends on (e.g., all the files it #includes). Build a lightweight distributed system that does parallel distributed builds using these dependencies.
- Build a large file store, like GFS, possibly using RAID as Zebra does.
- Build a scalable virtual disk like Petal (perhaps using the Intel Open Storage Toolkit).
- Build a simplified version of a synchronization service like Google's Chubby.
- Build something like MogileFS but instead of having a centralized database, replicate the DB using Paxos.
- Build a scalable web cache using consistent hashing.
- Build a highly-available, replicated DNS server that uses Paxos to ensure consistency of updates.
- Build a parallel debugger (ideally using some modification of GDB) that allows you to debug distributed systems. It should follow execution across message send and receive (analogously to procedure call/return).
- Build a distributed profiler that allows you to observe where time really goes in a distributed system. You should use it to spot bottlenecks in at least one existing distributed system.
- Build a system-call or message-level interposition library that can be slipped underneath an existing networked service and transparently replicate it so that it can survive failures and network partitions. (Something similar to, but more complicated than, what you would build: parrot.)
- Build a similar message-level interposition library that can be slipped underneath existing networked services to add security (nonces, secure checksums, encryption, authentication). Relevant: VPNs.
- Build a file synchronization tool like tra.
- Design and implement a Byzantine-fault-tolerant version of Raft.
- Build a raftscope-like visualization tool for a different protocol.
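
To give a sense of the scale of a starting point, here is a minimal sketch of the consistent-hashing idea mentioned in the web cache suggestion above. It is only an illustration, not a required design: the class name `HashRing`, the `vnodes` parameter, the choice of SHA-256, and the cache names in the usage note are all assumptions made for this example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes  # ring points per physical node (assumed default)
        self._points = []     # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        """Place `vnodes` points for this node on the ring."""
        for i in range(self.vnodes):
            p = _hash(f"{node}#{i}")
            if p in self._owner:
                continue  # position already taken; skip (extremely unlikely)
            bisect.insort(self._points, p)
            self._owner[p] = node

    def remove(self, node):
        """Drop every ring point that belongs to this node."""
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key):
        """Return the node owning the first ring point clockwise of the key."""
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key))
        return self._owner[self._points[i % len(self._points)]]
```

For example, after `ring = HashRing(["cache-a", "cache-b", "cache-c"])`, calling `ring.add("cache-d")` changes the result of `ring.lookup(url)` only for URLs that now map to `cache-d`; every other URL stays on its previous cache. A real project would still need to handle replication, failure detection, and cache misses.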