HotStuff (2019)
===============

Administrivia: class poll on camera vs. screen share

PBFT is live under *weak synchrony*
    Network/node delays grow at a bounded rate (e.g., polynomially)
    Keep doubling timeout until eventually all nodes in same view for long enough
Another model: partial synchrony
    Initially, things may be very crazy
    Eventually reach some global stabilization time GST
    After GST, nodes can communicate within some time bound Delta
    The catch: don't know GST or Delta (so Byz. generals still not safe)
    Two models have the same power
        Increasing estimate of Delta is like increasing timeouts in PBFT

Blockchain digression: say we replace credit card #s w. public keys?
    Hold private key on cell phone
    Sign each transaction with amount, recipient, comment
    What would this fix?
        Credit card numbers would be harder to steal
        Lower fraud could lead to lower (but still non-zero) transaction fees
        Could also lower fraud-detection false positives
        Could give merchants protection (eliminate "card not present" transactions)
    How does this compare to cash?
        Worse privacy, third-party trust, bank control, transaction charges

Bitcoin attempts to get rid of the bank...
    Transfer ownership of digital BTC between different public keys
    Must solve 2 problems for this model to work:
    1. How do you distribute Bitcoin so people believe it has value?
    2. How do you prevent someone from double-spending a Bitcoin?
        Public key A is worth 1 BTC
        A pays 1 BTC to B, walks away with some goods
        A tries to pay 1 BTC to C, walks away with more goods
    Note that problem #2 is basically consensus--a "timestamp server"
        With timestamp server, everyone agrees A->C message later than A->B
        Hence A->C is not valid, because A is already spent
        Keep ordered transaction history as a giant replicated state machine

How does Bitcoin do consensus?  Proof-of-work mining
    Flood new transactions
    History progresses in blocks of new transactions:
        block = { H(previous-block), nonce, H*(new-transactions) }
        Note inclusion of previous block makes this a *block chain*
    By convention, only accept block if H(block) < target-value
        Currently H(block) must start with >=76 zero bits
        So must try an expected 75 sextillion (75*10^21) nonces to find a good one
    Finding these nonces, known as "mining", is computationally very hard
        (see the mining sketch at the end of this section)
    So incentivize finding the next block with bitcoins!
        First transaction in block is special "coinbase" transaction
        Creates 12.5 (soon to be 6.25) new BTC paid to block miner
        https://bitcoin.org/en/developer-reference#serialized-blocks
    Note: adjust target-value as more miners come on-line
        In practice, new blocks mined every 7-10 minutes or so
    Given two incompatible block chains, take the one with most work (~tallest)
    Note we've just solved both the coin distribution and consensus problems!
    How does this solve the double-spending problem?
        Well-behaved nodes ordinarily shouldn't fork block chain
        Assume ill-behaved nodes have less aggregate compute than well-behaved ones
            (Though 49% limit is inadequate http://arxiv.org/abs/1311.0243)
        If bad guys don't immediately fork chain and double-spend, can't catch up
        After receiving payment, before handing over goods, wait for several
            new blocks (e.g., ~1hr) to cement payment in history
        And note mining payouts incentivize honest behavior even by greedy miners
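To make the nonce search concrete, here is a minimal mining sketch in
Python.  It is only an illustration under simplifying assumptions: a toy
block encoding, a single SHA-256 (real Bitcoin double-hashes an 80-byte
header), and a small target_bits parameter so the search finishes quickly.

    import hashlib

    def mine(prev_hash: bytes, tx_root: bytes, target_bits: int = 20):
        """Brute-force a nonce so H(block) < target, i.e., the hash
        starts with target_bits zero bits.  Toy block encoding:
        block = prev_hash || nonce || tx_root."""
        target = 1 << (256 - target_bits)
        nonce = 0
        while True:
            block = prev_hash + nonce.to_bytes(8, "big") + tx_root
            h = hashlib.sha256(block).digest()
            if int.from_bytes(h, "big") < target:
                return nonce, h          # found a valid block
            nonce += 1                   # expected ~2^target_bits tries

    # Example: mine on top of an all-zero "genesis" hash.
    nonce, h = mine(b"\x00" * 32, hashlib.sha256(b"txs").digest())
    print(nonce, h.hex())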
Which of safety, liveness, and fault-tolerance does Bitcoin offer?
    Liveness and fault-tolerance (anyone can unilaterally mine a block)
        It's also randomized (since guessing nonces), so not subject to FLP
    But Bitcoin not safe w/o synchrony assumption
        Under network partition, miners create arbitrarily deep forks
        Also not safe against computationally powerful attackers
    Another problem: people want actual money, bank-like value guarantees

So Facebook created Calibra, trying to realize Libra
    Virtual coin backed by basket of currencies, blockchain for transactions
    But uses HotStuff for f-out-of-3f+1 BFT consensus among a closed consortium
Why not use PBFT for Libra?
    Short answer: you could (e.g., Hyperledger Fabric supports this model)
    But with many replicas, PBFT has large message sizes and counts

Let's explore the straw-man, 2-round HotStuff
    Replicas proceed through numbered views; node L_v is leader of view v
    Use Mn for messages broadcast by leader L_v to replicas {R_i}
    Use An for acknowledgments (replies) from replicas {R_i} to leader L_v
        Note: L_v is also a replica, so conceptually sends An messages, too
    Organize candidate values into blocks: b' = {v, H(b), x}
        v = new view number, x = candidate consensus output value
    Replica state:
        QC: N-f matching signed A2 messages for highest view seen
        locked: a "locked" block for which R_i sent A2, or NULL if none locked
    Protocol (step 2a's checks are sketched in code below):
    1.  R_i -> L_v: {A1, v, QC|NULL}
            QC = N-f signed A2 messages for some view v' < v
        L_v waits for N-f of these A1 messages, potentially updates QC
    2a. L_v -> R_i: {M2, v, block, highQC}
            if (!extends(block, highQC.block)) ignore M2 message
            if (highQC.v > locked.v && !extends(block, locked)) locked = NULL
            if (highQC.v > QC.v) QC = highQC
            if (locked != NULL && !extends(block, locked)) ignore M2 message
            set locked = block, proceed to 2b
    2b. R_i -> L_v: <A2, v, H(block)>    # <...> denotes signed message
            Leader waits for N-f of these
    3a. L_v -> R_i: {M3, v, QC2}
            QC2 = N-f signed A2 messages
            R_i drops unless v is latest view
            Updates QC and proceeds to 3b
    3b. R_i -> L_v: <A3, v, H(block)>
    4.  L_v -> R_i: {M4, v, QC3}
            QC3 = N-f A3 messages, proving block committed
    What's the actual consensus value to output?
        Use x from earliest block in chain with non-trivial x
        Because any future committed blocks guaranteed to extend existing chain
    To summarize:
    - Commit/externalize when you see N-f signed A3s
    - Don't send A3 unless you see N-f signed A2s
    - Don't send A2 if its block is incompatible with your locked block
    - Lock A2's block when sending A2
    - Unlock if you see QC with QC.v > locked.v and !extends(QC.block, locked)
    - Leader chooses block extending highest QC it knows

Is 2-round HotStuff safe?  Yes.  Why?
    Any N-f replicas always include a majority (N-2f) of the non-faulty replicas
    So in any given view, cannot commit conflicting blocks
    Say conflicting w and b committed with w.v < b.v
        w must have been locked at N-2f non-faulty replicas, later unlocked
        What QC could have unlocked the first such non-faulty replica?
            Must have contained N-f signed A2 messages incompatible with w
            But if N-2f honest nodes locked w, N-2f can't contradict in A2
Can 2-round HotStuff get permanently stuck?  No.  Why?
    Say leader hears A1 from *all* N-f non-faulty nodes
    Will learn the highest QC of any non-faulty node
    Hence sends an M2 that for each non-faulty R_i either
    - is compatible with R_i's locked block, or
    - has a QC allowing R_i to unlock whatever it previously locked
So is 2-round HotStuff live under partial synchrony?  Yes
    Leader just needs to wait Delta time to hear all non-faulty A1s
    Just keep increasing timeout until it exceeds Delta
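Pulling step 2a's checks together, a minimal sketch in Python of how a
replica might handle M2.  Block, QC, and extends() here are simplified
stand-ins for illustration (real HotStuff exchanges hashes and
signatures, not object references):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Block:
        v: int                     # view that proposed this block
        parent: Optional["Block"]  # stands in for H(parent)
        x: object                  # candidate consensus output value

    @dataclass
    class QC:
        v: int        # view of the N-f signed A2 messages
        block: Block  # block those A2 messages acknowledged

    def extends(child: Optional[Block], ancestor: Optional[Block]) -> bool:
        """True iff ancestor is child or on child's parent chain.
        A NULL ancestor (no lock) is extended by everything."""
        if ancestor is None:
            return True
        while child is not None:
            if child is ancestor:
                return True
            child = child.parent
        return False

    class Replica:
        def __init__(self):
            self.qc: Optional[QC] = None        # highest QC seen
            self.locked: Optional[Block] = None

        def on_m2(self, v: int, block: Block, high_qc: QC) -> Optional[Block]:
            """Apply step 2a; return the block to acknowledge with A2,
            or None to ignore the M2 message."""
            if not extends(block, high_qc.block):
                return None         # proposal must extend its own QC
            if (self.locked and high_qc.v > self.locked.v
                    and not extends(block, self.locked)):
                self.locked = None  # higher QC justifies unlocking
            if self.qc is None or high_qc.v > self.qc.v:
                self.qc = high_qc   # adopt the higher QC
            if self.locked and not extends(block, self.locked):
                return None         # still conflicts with our lock
            self.locked = block     # lock before sending A2
            return block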
So why isn't 2-round HotStuff good enough?
    You have to wait Delta even if you've heard from N-f non-faulty replicas
        Don't know they're non-faulty--so must wait for all N A1 msgs or Delta time
    Hence, protocol is not *responsive*
        But note it is "good enough" for settings like Casper and Tendermint
    Does PBFT lack responsiveness?  Yes (long timeout after network repaired)
    Does HoneyBadger?  No--that's the point of being asynchronous

What's the solution?  Add another round of messages
    Replica state:
        QC: N-f matching signed *A1* messages
        locked: still set when replica sends message A2
    Messages:
        R_i -> L_v: {A0, v, QC|NULL}
        L_v -> R_i: {M1, v, block, QC|NULL}
        R_i -> L_v: <A1, v, H(block)>
        ... (remaining rounds follow the 2-round pattern)
    Mapping: A0=NEW-VIEW, M1=PREPARE, M2=PRE-COMMIT, M3=COMMIT, M4=DECIDE
    To summarize:
    - Still commit/externalize when you see N-f signed A3s
    - Don't send A3 unless you see N-f signed A2s
    - Don't send A2 unless you see N-f signed A1s
    - Don't send A1 if its block is incompatible with your locked block
    - Still lock A2's block when sending A2
    - But unlock with N-f incompatible higher *A1* messages
How does adding a round increase responsiveness?
    2-round: lock when *you* know N-2f non-faulties can't lock conflicting value
        But the next leader might not know this...
        Might propose conflicting block that causes failed view
    3-round: lock when *N-2f non-faulty replicas* know this
        Say next leader only waits to hear N-f A0 messages
        N-2f will be from non-faulty replicas
        At least one will know of any block locked at any replica

How do we reduce message sizes?
    Use (N-f)-threshold signatures, so QCs contain 1 signature, not N-f
    Could PBFT also use threshold crypto?  Yes, but
        It would add more rounds and hence more latency
            PBFT targeted low-latency NFS and leveraged physical multicast
        Also, new-view messages would still be pretty big

How does chained version optimize for blockchains?
    Instead of taking oldest value in committed chain, take whole chain
        Each view number now corresponds to a specific log index
        If leader fails or nodes time out, fill in gaps with dummy blocks
    Piggyback earlier rounds of newer blocks on later rounds (Fig. 1)
        Write parent <- child when parent = {v, h, x}, child = {v+1, H(parent), x'}
        If have "three-chain" b1 <- b2 <- b3 <-* b4, then one message implies:
            <A1, b3>, <A2, b2>, <A3, b1>
        So can externalize b1 and all ancestors of b1
    If you have a stable leader, skip A0 to get 1 RPC/block (amortized)
    Why does three-chain require direct parents for b1 <- b2 <- b3?
        (see the commit-rule sketch at the end of these notes)
        Say b1 contradicts earlier committed w1
        Recall two prongs of proof:
        1. Can't commit conflicting blocks in the same view
        2. A committed block is locked and can't later be unlocked
        If w1 <- w2 <- w3 <-* w4 has three-chain, then w3 must precede b1
            So locked at w3 before unlocked at b1
        If sequence has gaps, b1 could have earlier view number than w1

What's a "Pacemaker" (Algorithm 5)?
    Blockchains aggregate blocks of transactions to amortize consensus cost
        So leader proposes blocks in onBeat (e.g., every N seconds)
    Want to retain stable leader to save NEW-VIEW messages
        If leader fails, onNextSyncView switches to new leader
    Pacemaker abstracts timing and leader-rotation policy into a single module
What's a simple Pacemaker guaranteed live under partial synchrony?
    Double onNextSyncView timeout for each failed view (sketched below)
    Is this better responsiveness than PBFT?  No
        But HotStuff easier to implement than PBFT's view change
        And HotStuff message sizes are smaller
    Modular pacemaker can use other, fancier ideas (e.g., Cogsworth)
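A minimal sketch in Python of that doubling-timeout pacemaker.  The
on_next_sync_view callback and the timer plumbing are assumptions for
illustration; the paper's Algorithm 5 also covers proposal beats and
leader election:

    import threading

    class SimplePacemaker:
        """Double the view timeout after every failed view; once the
        timeout exceeds Delta (post-GST), views stop failing."""

        def __init__(self, on_next_sync_view, base_timeout=1.0):
            self.on_next_sync_view = on_next_sync_view  # advance to next leader
            self.base_timeout = base_timeout
            self.view = 0
            self.failures = 0        # consecutive views that timed out
            self.timer = None

        def start_view(self, view):
            """Arm a timer for the new view: base_timeout * 2^failures."""
            self.view = view
            if self.timer:
                self.timer.cancel()
            timeout = self.base_timeout * (2 ** self.failures)
            self.timer = threading.Timer(timeout, self._timed_out, args=(view,))
            self.timer.start()

        def view_succeeded(self):
            """Saw progress (e.g., a QC formed), so reset the backoff."""
            self.failures = 0

        def _timed_out(self, view):
            if view == self.view:             # ignore stale timers
                self.failures += 1            # next view waits twice as long
                self.on_next_sync_view(view)  # rotate to a new leader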
Is HotStuff leader more scalable than PBFT leader?
    Messages are smaller, but the leader still has to communicate with all N replicas
    "BFTree" proposal partially aggregates signatures on other nodes
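Returning to the chained commit rule above, a minimal sketch in Python of
the direct-parent check.  The Block shape mirrors the earlier sketch; the
commit_point function and its framing are illustrative assumptions, not
the paper's exact pseudocode:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Block:
        v: int                     # view number (one log index per view)
        parent: Optional["Block"]
        x: object

    def commit_point(b3: Block) -> Optional[Block]:
        """Given b3, the block a new QC certifies (the QC may arrive in a
        descendant b4 that need not be b3's direct child), commit b3's
        grandparent b1 only if b1 <- b2 <- b3 are direct parents with
        consecutive views.  Without the consecutive-view check, b1 could
        carry an earlier view number than a conflicting committed block,
        breaking the safety argument above."""
        b2 = b3.parent
        b1 = b2.parent if b2 else None
        if b1 is None:
            return None
        if b3.v == b2.v + 1 and b2.v == b1.v + 1:
            return b1    # b1 and all its ancestors are now externalized
        return None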