Practical Byzantine Fault Tolerance
===================================

Goal of paper:  Deal with compromise through replication
  Use state machine replication... what's this?
    Assume all replicas start in the same state
    Every operation deterministically modifies state
    If all replicas agree on operations and order, will maintain same state
  At very high level:  Want to keep n replicas of the server
    Client sends request (directly or indirectly) to all servers
    Some replicas may experience "Byzantine" failure
      Means they will act in worst possible way -- controlled by attacker
      Could just stop responding, or might send garbage responses back 
    Idea:  If "small" number fail, should be able to keep going
    But if most servers okay, clients can notice and ignore the bad ones

Want two specific properties of this replicated system:
  linearizability:  System must appear as though
    - all requests execute sequentially in some particular order
    - the execution order of requests looks the same to all clients
    - non-overlapping requests execute in temporal order
  liveness:
    - clients eventually receive a reply to each request

This paper presents protocol for asynchronous systems... what are then?
  Def.:  Asynchronous system has bound on message delay
  Why do we care?
    Means you can't tell the difference between failed and slow replicas
    Attacker might intentionally disrupt network to delay messages
    Have to be careful about violating security with "time outs"

What is the minimum replicas that we will need to recover from f failures?
  3f+1.  Why?
  Because replicas must agree on order of operations...

Let's digress for a sec with a military analogy...  Byzantine generals
  Say you have three armies, three generals, G0, G1, G2.
  Can communicate with trustworthy messengers.
  They need to agree on whether to attack at dawn.
  But one of the generals might be a traitor.
  Two loyal generals needed for victory; defeat if only one attacks.

Straw man:
  In advance the generals designate G0 as the leader.
  G0 sends an order to each of G1 and G2.
  G1 and G2 follow the order.
  If G1 or G2 is the traitor, this procedure works:
    G0 and the non-traitor both attack (or both retreat).
  What if G0 is the traitor?
    G0 could tell G1 to attack, and G2 to retreat.

Can G1 and G2 detect G0's treachery by comparing notes?
  G1 and G2 tell each other what G0 told them.
  If G1 and G2 got the same order, they'll obey it.
    Otherwise both retreat.
  If G0 is the traitor, this procedure will detect it.
  But does it still work if G1 (or G2) is the traitor?
    Suppose G0 tells both to attack.
    G1 tells G2 "G0 told me to retreat".
    So G2 decides G0 is the traitor, and retreats, while G0 attacks.

So now let's say generals use digital signatures
  G0 sends digitally signed message to G1 & G2 saying attack or retreat
    G1 & G2 relay signed message from G0 to each other
    Now G1 can't tell G2 "G0 says retreat" w/o signed message from G0
    And G0 will be discovered for telling different things to G1 & G2
    So G1 and G2 will retreat
  What happens if G0 bad, sends G1 signed "attack", and sends G2 nothing?
    G1 still sends G0's signed attack message to G2...
    so G1 and G2 will still attack
  But... we're assuming messages will get there before dawn!
    Suppose G1's message to G2 gets delayed... G1 will attack alone
    What if G2 acks message to G1?  Same problem... ack could be delayed
  In general, this consensus problem is impossible with only three generals

Why are we talking about generals?
  Same problem happens in a replicated system
  E.g., imagine replicas: R0, R1, R2
  Must agree on order in which to order operations
    Suppose R0 tells R1 "order operation A before B"
    R1 relays message to R2, waits for ack
    R2 Does not respond
    Two possibilities:
      - R2 is a bad or failed replica and will not respond
      - R0 is a bad replica, send different messages to R1, R2,
           but R2 is slow (e.g., net partition) and just hasn't responded yet
    No way to tell the difference

Conclusion:  We need at least 4 replicas to survive a failure
  More generally:  Need 3f+1 replicas to survive f Byzantine failures

Here's the basic idea underlying this paper:
  You have 3f+1 replicas, of which f may have failed
  You can never wait for more than 2f+1 replicas to respond to a message
    Because the last f replicas might have failed and so will never reply
  But the last f replicas might be honest, just slow
  But waiting for 2f+1 replicas assures at least f of them are honest

Now how to determine order of operations?
  System goes through a series of views
    In view v, replica number v mod (3f+1) is designated primary
    Primary is responsible for selecting the order of operations
      Assigns an increasing sequence number to each operation
  In normal-case operation, use two-phase protocol for request r:
    Phase 1 (pre-prepare, prepare) goal:
      Ensure at least f+1 honest replicas agree that
        If request r executes in view v, will execute with sequence no. n
    Phase 2 (prepare, commit) goal:
      Ensure at least f+1 honest replicas agree that
        Request r has executed in view v with sequence no. n

Protocol for normal-case operation
  Let c be client
      r_i be replica i, or p primary, b_i backup i
      R set of all replicas

   c -> p:  m = {REQUEST, req, o, t, c}_Kc
   p -> R:  {PRE-PREPARE, v, n, d}_Kp, m     (note d = H(m))
  b_i -> R: {PREPARE, v, n, d, i}

  replica r_i now waits for PRE-PREPARE + 2f matching PREPARE messages
     puts these messages in its log
     then we say prepared (m, v, n, i) is TRUE

  Note:  If prepared (m, v, n, i) is TRUE for honest replica r_i
     then prepared (m', v, n, j) where m' != m FALSE for any honest r_j
     So no other operation can execute with view v sequence number n

  Are we done?  Just reply to client?  No.
    Just because some other m' won't execute at (v,n) doesn't mean m will
    Suppose r_i is compromised right after prepared (m, v, n, i)
    Suppose no other replica received r_i's prepare message
    Suppose f replicas are slow and never even received the PRE-PREPARE
    No other honest replica will know the request prepared!
    Particularly if p fails, request might not get executed!

  So we say operation doesn't executes until
    prepared (m, v, n, i) is TRUE for 2f+1 non-fault replicas r_i
    We say committed (m, v, n) is TRUE when this property holds

  So how does a replica *know* committed (m, v, n) holds?
    Add one more message:

  r_i -> R: {COMMIT, v, n, d, i}

  replica r_i waits for 2f+1 identical COMMIT messages
    committed-local (m, v, n, i) is TRUE when:
      prepared (m, v, n, i) is TRUE, and
      r_i has 2f+1 matching commits in its log (including r_i's own)

  Note:  If committed-local (m, v, n, i) is TRUE for some non-faulty r_i
    Then means committed (m, v, n) is TRUE.
    r_i knows when committed-local is TRUE
    So committed-local is a replica's way of knowing when committed is TRUE

r_i replies to client when committed-local (m, v, n, i) is TRUE
  Client waits for f+1 matching replies, then returns to client
  Why f+1 and not 2f+1?
    Because of f+1, at least one replica r_i is non-faulty
    So client knows committed-local (m, v, n, i)
    Which in turn implies committed (m, v, n)

Checkpoints

View changes

Discussion
  what problem does BFS solve?
  - is IS going to run BFS to deal with byzantine failures?
  - what failures are we talking about?
    compromised servers
  - what about compromised clients?
    authentication and authorization
  how can we extend the system to allow for more than (n-1)/3
  failures over its lifetime?
  - detect failed replicas using  proactive recovery 
    - recover the system periodically, no matter what
    - makes bad nodes good again
  - tricky stuff
    - an attacker might steal compromised replica's keys
- with how many replicas will BFS work reasonably well?