HotStuff (2019)
===============

Administrivia: class poll on camera vs. screen share

PBFT is live under *weak synchrony*
    Network/node delays grow at a bounded rate (e.g., polynomially)
    Keep doubling timeout until eventually all nodes in same view for long enough
Another model: partial synchrony
    Initially, things may be very crazy
    Eventually reach some global stabilization time GST
    After GST, nodes can communicate within some time bound Delta
    The catch: don't know GST or Delta (so Byz. generals still not safe)
    Two models have the same power
        Increasing estimate of Delta is like increasing timeouts in PBFT

Blockchain digression: say we replace credit card #s w. public keys?
    Hold private key on cell phone
    Sign each transaction with amount, recipient, comment
    What would this fix?
        Credit card numbers would be harder to steal
        Lower fraud could lead to lower (but still non-zero) transaction fees
        Could also lower fraud-detection false positives
        Could give merchants protection (eliminate "card not present" transactions)
    How does this compare to cash?
        Worse privacy, third-party trust, bank control, transaction charges

Bitcoin attempts to get rid of the bank...
    Transfer ownership of digital BTC between different public keys
    Must solve 2 problems for this model to work:
    1. How do you distribute Bitcoin so people believe it has value?
    2. How do you prevent someone from double-spending a Bitcoin?
        Public key A is worth 1 BTC
        A pays 1 BTC to B, walks away with some goods
        A tries to pay 1 BTC to C, walks away with more goods
    Note that problem #2 is basically consensus--a "timestamp server"
        With timestamp server, everyone agrees A->C message later than A->B
        Hence A->C is not valid, because A is already spent
        Keep ordered transaction history as a giant replicated state machine

How does Bitcoin do consensus?  Proof-of-work mining
    Flood new transactions
    History progresses in blocks of new transactions:
        block = { H(previous-block), nonce, H*(new-transactions) }
        Note inclusion of previous block makes this a *block chain*
    By convention, only accept block if H(block) < target-value
        Currently H(block) must start with >=76 zero bits
        So must try an expected 75 sextillion (75*10^21) nonces to find a good one
    Finding these nonces, known as "mining", is computationally very hard
        (see the mining sketch at the end of this section)
    So incentivize finding the next block with bitcoins!
        First transaction in block is special "coinbase" transaction
        Creates 12.5 (soon to be 6.25) new BTC paid to block miner
        https://bitcoin.org/en/developer-reference#serialized-blocks
    Note: adjust target-value as more miners come on-line
        In practice, new blocks mined every 7-10 minutes or so
    Given two incompatible block chains, take the one with most work (~tallest)
    Note we've just solved both the coin distribution and consensus problems!
    How does this solve the double-spending problem?
        Well-behaved nodes ordinarily shouldn't fork block chain
        Assume ill-behaved nodes have less aggregate compute than well-behaved ones
            (Though 49% limit is inadequate http://arxiv.org/abs/1311.0243)
        If bad guys don't immediately fork chain and double-spend, can't catch up
        After receiving payment, before handing over goods, wait for several
            new blocks (e.g., ~1hr) to cement payment in history
        And note mining payouts incentivize honest behavior even by greedy miners
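To make the nonce search concrete, here is a minimal mining sketch in
Python.  It is only an illustration under simplifying assumptions: a toy
block encoding, a single SHA-256 (real Bitcoin double-hashes an 80-byte
header), and a small target_bits parameter so the search finishes quickly.

    import hashlib

    def mine(prev_hash: bytes, tx_root: bytes, target_bits: int = 20):
        """Brute-force a nonce so H(block) < target, i.e., the hash
        starts with target_bits zero bits.  Toy block encoding:
        block = prev_hash || nonce || tx_root."""
        target = 1 << (256 - target_bits)
        nonce = 0
        while True:
            block = prev_hash + nonce.to_bytes(8, "big") + tx_root
            h = hashlib.sha256(block).digest()
            if int.from_bytes(h, "big") < target:
                return nonce, h          # found a valid block
            nonce += 1                   # expected ~2^target_bits tries

    # Example: mine on top of an all-zero "genesis" hash.
    nonce, h = mine(b"\x00" * 32, hashlib.sha256(b"txs").digest())
    print(nonce, h.hex())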
Which of safety, liveness, and fault-tolerance does Bitcoin offer?
    Liveness and fault-tolerance (anyone can unilaterally mine a block)
        It's also randomized (since guessing nonces), so not subject to FLP
    But Bitcoin not safe w/o synchrony assumption
        Under network partition, miners create arbitrarily deep forks
        Also not safe against computationally powerful attackers
    Another problem: people want actual money, bank-like value guarantees

So Facebook created Calibra, trying to realize Libra
    Virtual coin backed by basket of currencies, blockchain for transactions
    But uses HotStuff for f-out-of-3f+1 BFT consensus among a closed consortium
Why not use PBFT for Libra?
    Short answer: you could (e.g., Hyperledger Fabric supports this model)
    But with many replicas, PBFT has large message sizes and counts

Let's explore the straw-man, 2-round HotStuff
    Replicas proceed through numbered views; node L_v is leader of view v
    Use Mn for messages broadcast by leader L_v to replicas {R_i}
    Use An for acknowledgments (replies) from replicas {R_i} to leader L_v
        Note: L_v is also a replica, so conceptually sends An messages, too
    Organize candidate values into blocks: b' = {v, H(b), x}
        v = new view number, x = candidate consensus output value
    Replica state:
        QC: N-f matching signed A2 messages for highest view seen
        locked: a "locked" block for which R_i sent A2, or NULL if none locked
    Protocol (step 2a's checks are sketched in code below):
    1.  R_i -> L_v: {A1, v, QC|NULL}
            QC = N-f signed A2 messages for some view v' < v
        L_v waits for N-f of these A1 messages, potentially updates QC
    2a. L_v -> R_i: {M2, v, block, highQC}
            if (!extends(block, highQC.block)) ignore M2 message
            if (highQC.v > locked.v && !extends(block, locked)) locked = NULL
            if (highQC.v > QC.v) QC = highQC
            if (locked != NULL && !extends(block, locked)) ignore M2 message
            set locked = block, proceed to 2b
    2b. R_i -> L_v: <A2, v, H(block)>    # <...> denotes signed message
            Leader waits for N-f of these
    3a. L_v -> R_i: {M3, v, QC2}
            QC2 = N-f signed A2 messages
            R_i drops unless v is latest view
            Updates QC and proceeds to 3b
    3b. R_i -> L_v: <A3, v, H(block)>
    4.  L_v -> R_i: {M4, v, QC3}
            QC3 = N-f A3 messages, proving block committed
    What's the actual consensus value to output?
        Use x from earliest block in chain with non-trivial x
        Because any future committed blocks guaranteed to extend existing chain
    To summarize:
    - Commit/externalize when you see N-f signed A3s
    - Don't send A3 unless you see N-f signed A2s
    - Don't send A2 if its block is incompatible with your locked block
    - Lock A2's block when sending A2
    - Unlock if you see QC with QC.v > locked.v and !extends(QC.block, locked)
    - Leader chooses block extending highest QC it knows

Is 2-round HotStuff safe?  Yes.  Why?
    Any N-f replicas always include a majority (N-2f) of the non-faulty replicas
    So in any given view, cannot commit conflicting blocks
    Say conflicting w and b committed with w.v < b.v
        w must have been locked at N-2f non-faulty replicas, later unlocked
        What QC could have unlocked the first such non-faulty replica?
            Must have contained N-f signed A2 messages incompatible with w
            But if N-2f honest nodes locked w, N-2f can't contradict in A2
Can 2-round HotStuff get permanently stuck?  No.  Why?
    Say leader hears A1 from *all* N-f non-faulty nodes
    Will learn the highest QC of any non-faulty node
    Hence sends an M2 that for each non-faulty R_i either
    - is compatible with R_i's locked block, or
    - has a QC allowing R_i to unlock whatever it previously locked
So is 2-round HotStuff live under partial synchrony?  Yes
    Leader just needs to wait Delta time to hear all non-faulty A1s
    Just keep increasing timeout until it exceeds Delta
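Pulling step 2a's checks together, a minimal sketch in Python of how a
replica might handle M2.  Block, QC, and extends() here are simplified
stand-ins for illustration (real HotStuff exchanges hashes and
signatures, not object references):

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Block:
        v: int                     # view that proposed this block
        parent: Optional["Block"]  # stands in for H(parent)
        x: object                  # candidate consensus output value

    @dataclass
    class QC:
        v: int        # view of the N-f signed A2 messages
        block: Block  # block those A2 messages acknowledged

    def extends(child: Optional[Block], ancestor: Optional[Block]) -> bool:
        """True iff ancestor is child or on child's parent chain.
        A NULL ancestor (no lock) is extended by everything."""
        if ancestor is None:
            return True
        while child is not None:
            if child is ancestor:
                return True
            child = child.parent
        return False

    class Replica:
        def __init__(self):
            self.qc: Optional[QC] = None        # highest QC seen
            self.locked: Optional[Block] = None

        def on_m2(self, v: int, block: Block, high_qc: QC) -> Optional[Block]:
            """Apply step 2a; return the block to acknowledge with A2,
            or None to ignore the M2 message."""
            if not extends(block, high_qc.block):
                return None         # proposal must extend its own QC
            if (self.locked and high_qc.v > self.locked.v
                    and not extends(block, self.locked)):
                self.locked = None  # higher QC justifies unlocking
            if self.qc is None or high_qc.v > self.qc.v:
                self.qc = high_qc   # adopt the higher QC
            if self.locked and not extends(block, self.locked):
                return None         # still conflicts with our lock
            self.locked = block     # lock before sending A2
            return block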
So why isn't 2-round HotStuff good enough?
    You have to wait Delta even if you've heard from N-f non-faulty replicas
        Don't know they're non-faulty--so must wait for all N A1 msgs or Delta time
    Hence, protocol is not *responsive*
        But note it is "good enough" for settings like Casper and Tendermint
    Does PBFT lack responsiveness?  Yes (long timeout after network repaired)
    Does HoneyBadger?  No--that's the point of being asynchronous

What's the solution?  Add another round of messages
    Replica state:
        QC: N-f matching signed *A1* messages
        locked: still set when replica sends message A2
    Messages:
        R_i -> L_v: {A0, v, QC|NULL}
        L_v -> R_i: {M1, v, block, QC|NULL}
        R_i -> L_v: <A1, v, H(block)>
        ... (remaining rounds follow the 2-round pattern)
    Mapping: A0=NEW-VIEW, M1=PREPARE, M2=PRE-COMMIT, M3=COMMIT, M4=DECIDE
    To summarize:
    - Still commit/externalize when you see N-f signed A3s
    - Don't send A3 unless you see N-f signed A2s
    - Don't send A2 unless you see N-f signed A1s
    - Don't send A1 if its block is incompatible with your locked block
    - Still lock A2's block when sending A2
    - But unlock with N-f incompatible higher *A1* messages
How does adding a round increase responsiveness?
    2-round: lock when *you* know N-2f non-faulties can't lock conflicting value
        But the next leader might not know this...
        Might propose conflicting block that causes failed view
    3-round: lock when *N-2f non-faulty replicas* know this
        Say next leader only waits to hear N-f A0 messages
        N-2f will be from non-faulty replicas
        At least one will know of any block locked at any replica

How do we reduce message sizes?
    Use (N-f)-threshold signatures, so QCs contain 1 signature, not N-f
    Could PBFT also use threshold crypto?  Yes, but
        It would add more rounds and hence more latency
            PBFT targeted low-latency NFS and leveraged physical multicast
        Also, new-view messages would still be pretty big

How does chained version optimize for blockchains?
    Instead of taking oldest value in committed chain, take whole chain
        Each view number now corresponds to a specific log index
        If leader fails or nodes time out, fill in gaps with dummy blocks
    Piggyback earlier rounds of newer blocks on later rounds (Fig. 1)
        Write parent <- child when parent = {v, h, x}, child = {v+1, H(parent), x'}
        If have "three-chain" b1 <- b2 <- b3 <-* b4, then one message implies:
            <A1, b3>, <A2, b2>, <A3, b1>
        So can externalize b1 and all ancestors of b1
    If you have a stable leader, skip A0 to get 1 RPC/block (amortized)
    Why does three-chain require direct parents for b1 <- b2 <- b3?
        (see the commit-rule sketch at the end of these notes)
        Say b1 contradicts earlier committed w1
        Recall two prongs of proof:
        1. Can't commit conflicting blocks in the same view
        2. A committed block is locked and can't later be unlocked
        If w1 <- w2 <- w3 <-* w4 has three-chain, then w3 must precede b1
            So locked at w3 before unlocked at b1
        If sequence has gaps, b1 could have earlier view number than w1

What's a "Pacemaker" (Algorithm 5)?
    Blockchains aggregate blocks of transactions to amortize consensus cost
        So leader proposes blocks in onBeat (e.g., every N seconds)
    Want to retain stable leader to save NEW-VIEW messages
        If leader fails, onNextSyncView switches to new leader
    Pacemaker abstracts timing and leader-rotation policy into a single module
What's a simple Pacemaker guaranteed live under partial synchrony?
    Double onNextSyncView timeout for each failed view (sketched below)
    Is this better responsiveness than PBFT?  No
        But HotStuff easier to implement than PBFT's view change
        And HotStuff message sizes are smaller
    Modular pacemaker can use other, fancier ideas (e.g., Cogsworth)
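A minimal sketch in Python of that doubling-timeout pacemaker.  The
on_next_sync_view callback and the timer plumbing are assumptions for
illustration; the paper's Algorithm 5 also covers proposal beats and
leader election:

    import threading

    class SimplePacemaker:
        """Double the view timeout after every failed view; once the
        timeout exceeds Delta (post-GST), views stop failing."""

        def __init__(self, on_next_sync_view, base_timeout=1.0):
            self.on_next_sync_view = on_next_sync_view  # advance to next leader
            self.base_timeout = base_timeout
            self.view = 0
            self.failures = 0        # consecutive views that timed out
            self.timer = None

        def start_view(self, view):
            """Arm a timer for the new view: base_timeout * 2^failures."""
            self.view = view
            if self.timer:
                self.timer.cancel()
            timeout = self.base_timeout * (2 ** self.failures)
            self.timer = threading.Timer(timeout, self._timed_out, args=(view,))
            self.timer.start()

        def view_succeeded(self):
            """Saw progress (e.g., a QC formed), so reset the backoff."""
            self.failures = 0

        def _timed_out(self, view):
            if view == self.view:             # ignore stale timers
                self.failures += 1            # next view waits twice as long
                self.on_next_sync_view(view)  # rotate to a new leader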
Is HotStuff leader more scalable than PBFT leader?
    Messages are smaller, but the leader still has to communicate with all N replicas
    "BFTree" proposal partially aggregates signatures on other nodes
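Returning to the chained commit rule above, a minimal sketch in Python of
the direct-parent check.  The Block shape mirrors the earlier sketch; the
commit_point function and its framing are illustrative assumptions, not
the paper's exact pseudocode:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Block:
        v: int                     # view number (one log index per view)
        parent: Optional["Block"]
        x: object

    def commit_point(b3: Block) -> Optional[Block]:
        """Given b3, the block a new QC certifies (the QC may arrive in a
        descendant b4 that need not be b3's direct child), commit b3's
        grandparent b1 only if b1 <- b2 <- b3 are direct parents with
        consecutive views.  Without the consecutive-view check, b1 could
        carry an earlier view number than a conflicting committed block,
        breaking the safety argument above."""
        b2 = b3.parent
        b1 = b2.parent if b2 else None
        if b1 is None:
            return None
        if b3.v == b2.v + 1 and b2.v == b1.v + 1:
            return b1    # b1 and all its ancestors are now externalized
        return None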