Distributed Systems: Final Project Guidelines
The final lab assignment for the class is to undertake a mini research project of your choice in a group of 1-4 people.
Your project should be guided by the following deadlines:
By Monday, April 13 you must form a project team of 1-4 people and email the course staff to let us know with whom you will be working.
Before Monday, April 20 you should schedule a meeting of your team with the course staff to discuss your proposed project. Once your proposal has been approved, send the course staff a short (1 paragraph) description of what you want to do, which may be posted on the web site so everybody knows what the different projects are.
Please schedule your meeting sooner rather than later, in case your team needs to iterate on the proposal. Note, in particular, that it’s fine to meet with us if you don’t have a concrete plan yet, as we can suggest some things for you to look into.
Note: If you want to combine your project with your research or with work for another class, this is in general fine, but please let us know at the time of the proposal. Make sure any other instructor or research supervisor involved knows that you plan to do this.
By Friday, May 29 you must email the course staff a final title and list of project team members, with email subject “final project title”, including a link to the git repository for your source code and a one paragraph abstract of your project (which can be the same as your proposal, but things may have changed).
From June 3 to June 10 you will present your project to the class and demo what you have done.
On June 11 you must send the course staff an email with subject “final project submission” containing a paper that describes and evaluates your project, no more than 6 pages in at least 11-point font. Papers and git repositories will be published on the class web site.
Note that this is a hard deadline since we need to submit grades for people. Note further that use of git is mandatory since this is how we can evaluate individual members’ contributions to a project.
You can use whatever programming language you and your partners want for the project. Two good choices are C++ and Go. You may want to use Stanford’s shared computing cluster. Registering for the class should automatically bump your disk quota by 1 GB. You are also free to use cloud services, and you can even use your stanford.edu email address to get some free computing credits on AWS and GCP.
Here are some project ideas you might be interested in. This list is by no means exhaustive.
- Build better tools for remote collaboration (text, voice, or video chat).
- Build a network object system for C++.
- Build something like Porcupine that addresses some of the paper’s shortcomings.
- Distributed protocols such as 2PC and Paxos are (1) short but (2) really hard to get right because of failures and uncertainty. Build a simple system that takes an implementation of one of these protocols and systematically explores its behavior in the face of crashes and network partitions. See here for an example of how to do this for file systems.
- Build a checking infrastructure that can plug into the many different Raft implementations and find protocol errors. The nice trick you can use here is that you do not have to specify correctness: each implementation must perform the same observable actions given the same sequence of crashes, partitions, and recoveries. You may want to look at what Kyle Kingsbury has done with Jepsen.
- Build a clean, simple implementation of Viewstamped Replication based on the updated Liskov paper that can be dropped into distributed systems in a way analogous to Raft.
- The Raspberry Pi is a very popular embedded computing platform. Build a distributed system using Raspberry Pi nodes and some interesting cheap hardware. More ambitious: build a clean, simple “bare-metal” toolkit for the Raspberry Pi that allows people to easily build such systems.
- Build a simple, automatic distributed-parallel make implementation. Most makefiles are broken: spurious dependencies make builds slow, and missing dependencies make them incorrect. Fortunately you can infer true dependencies automatically: kick off an existing (broken) build and intercept every “open()” system call to see which other files a given file depends on (e.g., all the files it #includes). Then build a lightweight distributed system that does parallel distributed builds using these inferred dependencies.
- Build a large file store, like GFS, possibly using RAID-style striping as in Zebra.
- Build a scalable virtual disk like Petal (perhaps built using the Intel Open Storage Toolkit).
- Build a simplified version of a synchronization service like Google’s Chubby.
- Build something like MogileFS but instead of having a centralized database, replicate the DB using Paxos.
- Build a scalable web cache using consistent hashing.
- Build a highly-available, replicated DNS server that uses Paxos to ensure consistency of updates.
- Build a parallel debugger (ideally using some modification of GDB) that allows you to debug distributed systems. It should follow execution across message send and receive (analogously to procedure call/return).
- Build a distributed profiler that allows you to observe where time really goes in a distributed system. You should use it to spot bottlenecks in at least one existing distributed system.
- Build a system-call or message-level interposition library that can be slipped underneath an existing networked server and transparently be used to replicate these services so that they can survive failure and network partitioning. (Something similar to, but more complicated than, what you would build: Parrot.)
- Build a similar message-level interposition library that can be slipped underneath existing networked services and add security (nonces, secure checksums, encryption, authentication). Relevant: VPNs.
- Build a file synchronization tool like tra.
- Design and implement a Byzantine-fault-tolerant version of Raft.
- Build a raftscope-like visualization tool for a different protocol.
- Formally model and verify a consensus protocol, e.g., using TLA+ or IVy.
- Build a mobile-phone-based, privacy-preserving contact-tracing system for tracking the spread of infectious disease.
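As a concrete starting point for the distributed-make idea above, here is a minimal sketch of dependency inference from an intercepted build. It assumes you trace the build with something like `strace -f -e trace=openat -o trace.log make` and then parse the log; the helper name `true_deps` and the ignore-prefix list are illustrative choices, not a prescribed design.

```python
import re

# Matches lines like:
#   1234  openat(AT_FDCWD, "main.c", O_RDONLY) = 3
# capturing the opened path and the return value.
_OPEN_RE = re.compile(r'openat\([^,]+,\s*"([^"]+)"[^)]*\)\s*=\s*(-?\d+)')

def true_deps(strace_lines, ignore_prefixes=("/usr", "/lib", "/etc", "/proc")):
    """Extract files a build step actually read: its true dependencies."""
    deps = set()
    for line in strace_lines:
        m = _OPEN_RE.search(line)
        if not m:
            continue
        path, ret = m.group(1), int(m.group(2))
        if ret < 0:
            continue  # open failed, so the file was not actually read
        if path.startswith(ignore_prefixes):
            continue  # skip system headers and libraries
        deps.add(path)
    return sorted(deps)
```

A distributed build scheduler could then run any two targets with disjoint dependency sets on different machines in parallel.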
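The consistent-hashing web cache idea rests on one data structure: a hash ring. Below is a minimal sketch (the class and method names are made up for illustration). Each cache node is mapped to many virtual points on the ring, and a key belongs to the first node clockwise from the key's hash point, so adding or removing a node only remaps the keys that node owned.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a string to a point on the ring.
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (a sketch)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring points
        self._ring = []    # (point, node), kept parallel to _points
        for n in nodes:
            self.add(n)

    def add(self, node):
        # Each node gets many virtual points for better balance.
        for i in range(self.vnodes):
            p = _hash(f"{node}#{i}")
            idx = bisect.bisect(self._points, p)
            self._points.insert(idx, p)
            self._ring.insert(idx, (p, node))

    def remove(self, node):
        self._ring = [(p, n) for (p, n) in self._ring if n != node]
        self._points = [p for (p, _) in self._ring]

    def lookup(self, key):
        # First node clockwise from the key's hash point (wrapping around).
        p = _hash(key)
        idx = bisect.bisect(self._points, p) % len(self._ring)
        return self._ring[idx][1]
```

A scalable cache would layer HTTP fetching and replication on top of `lookup`, but the remapping property above is what makes node churn cheap.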