Coral
=====

Midterm results
Lab 2 eliminated/folded into project; find partners, meet with us
Welcome to second half of course

Now, from replication to partitioning...

Consistent hashing
  Say you want to cache web pages on a number of machines
  Hash URL, and choose machine based on hash: e.g., machine = hash % nservers
  What happens when you add and remove machines?
    If URL -> server mapping changes completely, most cache data invalid
  What happens if a client hasn't yet heard of a new machine?
    Could use entirely wrong hash function
  Consistent hashing: only invalidate data in proportion to reconfiguration
    Add one new server, should invalidate only 1/N of data
How to implement consistent hashing?
  Give each node a number of randomly chosen IDs (e.g., 160-bit numbers)
  Compute key = Hash(URL), which is also in same ID space
  Approach 1: map IDs onto circle [Chord]
    Store URL at node whose ID follows key on circle (successor)
    [Go over what happens when you add or remove a node]
  Approach 2: arrange node IDs in a tree of height 160 [Kademlia]
    Start with most significant bit of URL hash (key)
    Go down tree, choosing left or right branch based on next bit of key
    If only one branch, just choose that one
    In other words, find node whose ID minimizes key XOR ID
    [Go over what happens when you add or remove a node]
  Problem: neighbor of failed node may bear brunt of load spike
    How to distribute failed node's load more evenly?
    Give each node multiple "virtual IDs"
      Keys close to failed node's multiple virtual IDs will be spread out
    Note: in heterogeneous environment, give beefier machines more virtual IDs
  Another approach: CARP [http://icp.ircache.net/carp.txt], roughly:
    For each server ID id_1, id_2, ..., compute h_i := H(id_i, URL)
    Sort the h_i, and take the server i with the highest "score" h_i
    In heterogeneous environments, can also scale h_i by load
    Advantages of CARP:
      No longer need multiple virtual IDs
      Servers can decline requests under high load, so next nodes spread out load
Now, say you want to build a gigantic data cache
  So big that each machine only knows about a small fraction of other machines
  So big that nodes come and go all the time--nodes owned by many people
  Napster approach--centralized server coordinates; can be shut down
  Gnutella approach--ad-hoc configuration; broadcast queries very expensive
  Better approach: use consistent hashing for *routing* as well
    Store data as before on "closest" node to key in ID space
      Chord: closest node is next clockwise around circle
      Kademlia: closest node is closest treating key XOR ID as distance
    Build routing tables to allow ID-space navigation
      Each node knows about its ID-space neighbors
      Perhaps each node knows a few farther-away nodes
        To move long distances quickly
      For n nodes, can do this with log n RPCs and log n state at each node
    These data structures are called distributed hash tables (DHTs)
  How would you do this in the Chord circle?
    Each node must know next node around circle (successor)
    Each node should also know nodes almost 1/2 way around circle, 1/4 way, etc.
      This is called the "finger table"
    To build it, ask the closest node you know about for its routing table
  How would you do this in the Kademlia tree?
    If a node's (base 2) ID is b_0 b_1 b_2 ..., the node must know:
      - A node with first bit ~b_0 (other half of tree)
      - A node with prefix b_0 ~b_1 (other quarter in this half of tree)
      - A node with prefix b_0 b_1 ~b_2
      - etc.
    Then you can get one bit closer to the key with each RPC
    (Note: to handle unbalanced trees, each node actually must know all
     nodes under some point in the tree with two children.)
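A minimal sketch in Python of the two "closest node" rules above (names
like chord_successor and kademlia_closest are illustrative, not from
either paper):

    import hashlib

    def node_id(name):
        # 160-bit ID, matching the SHA-1-sized ID space in the notes
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

    def chord_successor(key, ids):
        # Chord: first ID clockwise from key; wrap to smallest ID if none follow
        following = [i for i in ids if i >= key]
        return min(following) if following else min(ids)

    def kademlia_closest(key, ids):
        # Kademlia: XOR metric; sharing high-order bits makes distance small
        return min(ids, key=lambda i: i ^ key)

    ids = [node_id("node%d" % i) for i in range(8)]
    key = node_id("http://example.com/page")    # key = Hash(URL)
    owner_chord = chord_successor(key, ids)     # stores under Chord's rule
    owner_kad = kademlia_closest(key, ids)      # stores under Kademlia's rule

Virtual IDs fall out naturally: hash "server/0", "server/1", ..., and put
all the resulting IDs into ids, remembering which server owns each one.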
In a very large system, machines will be failing all the time
  If there are N machines and a constant fraction fail, then subsets of
   size < O(log N) stand a good chance of losing all their nodes
  So pick "replication factor" k such that a k-sized set is unlikely to
   fail simultaneously
  In Chord, want to avoid "breaking" the circle
    So each node must know k successors around circle, in case k-1 fail
    Can always re-build finger tables as long as circle not broken
  In Kademlia, should know up to k nodes with each prefix
   ~b_0..., b_0 ~b_1..., b_0 b_1 ~b_2...
    I.e., k nodes in other half of tree, other quarter of this half, etc.
  Must do *maintenance* to ensure you learn of dead nodes
  Note: an advantage of Kademlia is symmetry
    You receive RPCs from nodes in same distribution as you send them to
    So will often already know that nodes in routing table are alive
    Chord goes clockwise around circle, so incoming RPCs less useful
Optimization
  Note Kademlia lets you contact *any* node in the appropriate subtree
    So for better performance, cache nodes with low round-trip time (RTT)
  Note: failure causes RPC timeouts, which can seriously hurt latency
    Can trade off bandwidth for latency with concurrent requests
Digression: bit shifting vs. bit correcting
  With bit shifting, can actually have a constant-size routing table
    E.g., node with ID x knows nodes with IDs 2x and 2x+1
  But makes failure recovery and optimization harder
Note Kademlia is used by trackerless BitTorrent
  Seems to scale to 1,000,000+ nodes

So what was the intended use of CoralCDN?
  Most web sites have peak usage many times their average
  Pool resources to absorb flash crowds, solving the "slashdot problem"
  Easy to use--just append ".nyud.net" to URLs
  Ideally avoid contributing to load spike by sharing data among cache servers
Can we solve this in a simpler way?
  Straw-man 1: put web objects into Kademlia
    Compute key = Hash(URL), value = contents of web object
    Store in Kademlia
    Just shifted the problem; nodes in Kademlia now subject to flash crowds
    So at a minimum, should use Kademlia only for metadata, not data
  Straw-man 2: network of web proxies
    When you have a URL, store a pointer to yourself in Kademlia
    When you want a URL, look up Hash(URL) in Kademlia, ask a proxy that has it
    If no one has it, then get data from origin web server
    Problems?
      1. Could require slow RPCs to find nearby copies
         E.g., USA node must query Asia node to find USA copy of data
      2. Could overload Kademlia nodes just with metadata requests
         Imagine everyone storing the same key to Kademlia
      3. If many copies, which one(s) should you download from?
      4. How to map clients to nearby proxies?
         E.g., sending client in Asia to proxy in USA very bad for performance
How did CoralCDN address these problems?
  1. Join multiple DHTs with different latency thresholds (hysteresis p. 13)
     Always begin search in low-latency cluster
     Even if low-latency cluster misses, it got you closer to target Kademlia ID
  2. Nodes "fill up" with references, so puts spill to previous nodes in
     search path (see sketch below)
     Stop Kademlia lookup when you find a node that has a reference to the URL
  3. Searching first in low-latency cluster anyway
     So likely to hit pointer stored by a node in your cluster
  4. Crazy DNS tricks that leverage clusters (paper doesn't have details)
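A rough sketch of points 2 and 3 (simplified, and not Coral's actual
algorithm; the Node class and MAX_REFS cap are made up for illustration):
puts spill outward from the node closest to the key once it fills up,
and gets stop at the first reference met on the way in:

    class Node:
        def __init__(self, id_):
            self.id = id_
            self.refs = {}        # key -> list of proxy addresses

    MAX_REFS = 4                  # illustrative per-key cap per node

    def sloppy_put(key, proxy_addr, nodes):
        # try the node closest to key first; if it is full, spill to the
        # previous (farther) nodes on the search path
        for node in sorted(nodes, key=lambda n: n.id ^ key):
            if len(node.refs.get(key, [])) < MAX_REFS:
                node.refs.setdefault(key, []).append(proxy_addr)
                return node       # (toy: drops the put if all are full)

    def sloppy_get(key, nodes):
        # a lookup visits farther nodes first, so it can stop at the first
        # reference it meets, never reaching the node closest to the key
        for node in sorted(nodes, key=lambda n: n.id ^ key, reverse=True):
            if node.refs.get(key):
                return node.refs[key]
        return None               # miss: fetch from origin web server

The effect: the hotter a URL, the farther out its references spread, so
lookups for popular keys terminate early and no single node melts down.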
- "long-term durability" - not great fit - anonymity/censorship avoidance - not great fit - long-term popular content such as adblockplus rules - overkill - flash crowds - works well (though crowds not as abrupt as anticipated) What are security issues and what does coral do? Sending spam, etc. Only support GET and HEAD requests Mark requests with User-Agent and X-Forwarded-For headers Resource exhaustion Block sites for excessive bandwidth or # requests (in redirect loops) Also maximum file size of 50MB IP-based access control Global blacklist But fast-flux-like attacks where evil.org -> acm.org Need to convince sites to check the Host: header Cookies (sites sloppy about providing access to nyud.net) Stripping cookie headers is only a partial solution because of Javascript What would be different if CoralCDN designed to run on < 1000 servers? Don't need DHTs, can do one-hop routing (globally, or in disjoint clusters) "Metadata flash crowds" a non-issue (at most 1000 requests) What prevents CoralCDN from running on 1,000,000+ servers? Need adopters (but could plausibly get people running software) System finally is closed (requires crypto key to join). Why? Biggest reason: No way to do content authentication