Practical Byzantine Fault Tolerance =================================== Goal of paper: Deal with compromise through replication Use state machine replication... what's this? Assume all replicas start in the same state Every operation deterministically modifies state If all replicas agree on operations and order, will maintain same state At very high level: Want to keep n replicas of the server Client sends request (directly or indirectly) to all servers Some replicas may experience "Byzantine" failure Means they will act in worst possible way -- controlled by attacker Could just stop responding, or might send garbage responses back Idea: If "small" number fail, should be able to keep going But if most servers okay, clients can notice and ignore the bad ones Want two specific properties of this replicated system: linearizability: System must appear as though - all requests execute sequentially in some particular order - the execution order of requests looks the same to all clients - non-overlapping requests execute in temporal order liveness: - clients eventually receive a reply to each request This paper presents protocol for asynchronous systems... what are then? Def.: Asynchronous system has bound on message delay Why do we care? Means you can't tell the difference between failed and slow replicas Attacker might intentionally disrupt network to delay messages Have to be careful about violating security with "time outs" What is the minimum replicas that we will need to recover from f failures? 3f+1. Why? Because replicas must agree on order of operations... Let's digress for a sec with a military analogy... Byzantine generals Say you have three armies, three generals, G0, G1, G2. Can communicate with trustworthy messengers. They need to agree on whether to attack at dawn. But one of the generals might be a traitor. Two loyal generals needed for victory; defeat if only one attacks. Straw man: In advance the generals designate G0 as the leader. G0 sends an order to each of G1 and G2. G1 and G2 follow the order. If G1 or G2 is the traitor, this procedure works: G0 and the non-traitor both attack (or both retreat). What if G0 is the traitor? G0 could tell G1 to attack, and G2 to retreat. Can G1 and G2 detect G0's treachery by comparing notes? G1 and G2 tell each other what G0 told them. If G1 and G2 got the same order, they'll obey it. Otherwise both retreat. If G0 is the traitor, this procedure will detect it. But does it still work if G1 (or G2) is the traitor? Suppose G0 tells both to attack. G1 tells G2 "G0 told me to retreat". So G2 decides G0 is the traitor, and retreats, while G0 attacks. So now let's say generals use digital signatures G0 sends digitally signed message to G1 & G2 saying attack or retreat G1 & G2 relay signed message from G0 to each other Now G1 can't tell G2 "G0 says retreat" w/o signed message from G0 And G0 will be discovered for telling different things to G1 & G2 So G1 and G2 will retreat What happens if G0 bad, sends G1 signed "attack", and sends G2 nothing? G1 still sends G0's signed attack message to G2... so G1 and G2 will still attack But... we're assuming messages will get there before dawn! Suppose G1's message to G2 gets delayed... G1 will attack alone What if G2 acks message to G1? Same problem... ack could be delayed In general, this consensus problem is impossible with only three generals Why are we talking about generals? Same problem happens in a replicated system E.g., imagine replicas: R0, R1, R2 Must agree on order in which to order operations Suppose R0 tells R1 "order operation A before B" R1 relays message to R2, waits for ack R2 Does not respond Two possibilities: - R2 is a bad or failed replica and will not respond - R0 is a bad replica, send different messages to R1, R2, but R2 is slow (e.g., net partition) and just hasn't responded yet No way to tell the difference Conclusion: We need at least 4 replicas to survive a failure More generally: Need 3f+1 replicas to survive f Byzantine failures Here's the basic idea underlying this paper: You have 3f+1 replicas, of which f may have failed You can never wait for more than 2f+1 replicas to respond to a message Because the last f replicas might have failed and so will never reply But the last f replicas might be honest, just slow But waiting for 2f+1 replicas assures at least f of them are honest Now how to determine order of operations? System goes through a series of views In view v, replica number v mod (3f+1) is designated primary Primary is responsible for selecting the order of operations Assigns an increasing sequence number to each operation In normal-case operation, use two-phase protocol for request r: Phase 1 (pre-prepare, prepare) goal: Ensure at least f+1 honest replicas agree that If request r executes in view v, will execute with sequence no. n Phase 2 (prepare, commit) goal: Ensure at least f+1 honest replicas agree that Request r has executed in view v with sequence no. n Protocol for normal-case operation Let c be client r_i be replica i, or p primary, b_i backup i R set of all replicas c -> p: m = {REQUEST, req, o, t, c}_Kc p -> R: {PRE-PREPARE, v, n, d}_Kp, m (note d = H(m)) b_i -> R: {PREPARE, v, n, d, i} replica r_i now waits for PRE-PREPARE + 2f matching PREPARE messages puts these messages in its log then we say prepared (m, v, n, i) is TRUE Note: If prepared (m, v, n, i) is TRUE for honest replica r_i then prepared (m', v, n, j) where m' != m FALSE for any honest r_j So no other operation can execute with view v sequence number n Are we done? Just reply to client? No. Just because some other m' won't execute at (v,n) doesn't mean m will Suppose r_i is compromised right after prepared (m, v, n, i) Suppose no other replica received r_i's prepare message Suppose f replicas are slow and never even received the PRE-PREPARE No other honest replica will know the request prepared! Particularly if p fails, request might not get executed! So we say operation doesn't executes until prepared (m, v, n, i) is TRUE for 2f+1 non-fault replicas r_i We say committed (m, v, n) is TRUE when this property holds So how does a replica *know* committed (m, v, n) holds? Add one more message: r_i -> R: {COMMIT, v, n, d, i} replica r_i waits for 2f+1 identical COMMIT messages committed-local (m, v, n, i) is TRUE when: prepared (m, v, n, i) is TRUE, and r_i has 2f+1 matching commits in its log (including r_i's own) Note: If committed-local (m, v, n, i) is TRUE for some non-faulty r_i Then means committed (m, v, n) is TRUE. r_i knows when committed-local is TRUE So committed-local is a replica's way of knowing when committed is TRUE r_i replies to client when committed-local (m, v, n, i) is TRUE Client waits for f+1 matching replies, then returns to client Why f+1 and not 2f+1? Because of f+1, at least one replica r_i is non-faulty So client knows committed-local (m, v, n, i) Which in turn implies committed (m, v, n) Checkpoints View changes Discussion what problem does BFS solve? - is IS going to run BFS to deal with byzantine failures? - what failures are we talking about? compromised servers - what about compromised clients? authentication and authorization how can we extend the system to allow for more than (n-1)/3 failures over its lifetime? - detect failed replicas using proactive recovery - recover the system periodically, no matter what - makes bad nodes good again - tricky stuff - an attacker might steal compromised replica's keys - with how many replicas will BFS work reasonably well?