The ordering of events in distributed systems High level plan Maintain replicated data in a distributed system Consistency: Keep the copies identical Conflict handling: Updates from different clients may conflict This is an application-specific idea Conflicts: avoid or resolve Example problem: 1. I look at your calendar and decide you're free at 2pm 2. I look at the meeting room calendar and see it's available at 2pm 3. I reserve the room for 2pm 4. I mark your calendar as "busy" for 2pm What if somebody else is reserving the room at the same time? Execute all operations on a central server (NFS, SFS, ...) Simple, but scales badly, requires full-time network connections Serialize with locks (System R, AFS, ...) Result is as if operations were executed in some serial order Applications must lock, but otherwise no changes Locking requires tightly coupled communication Or allow conflicts, recognize later and resolve Optimistic like Bayou (Also FSes Coda, Ficus, ...) Does not require permanent net connectivity But often harder to resolve than to avoid Consistency Want to keep replicas identical Example (with time-lines): 1. A buys IBM at 107 2. B sees A's trade, B buys IBM at 108 3. C sees B's trade, C buys IBM at 109 4. What order do D and E see these transactions? What's the current price? Is it going up, down, or just being random? The basic approach: Make sure everybody agrees on the order to apply updates. Make sure everybody sees all the updates. The details: Can you make progress while you're missing some updates? How do you know you have all the updates? Who assigns order? Does the order preserve any real-world ordering & external consistency? Remember sequential consistency from System R? Everybody must see the same order of operations If I see update 27, then create update 28 Then everybody sees my update after update 27 Or at least knows mine should follow 27, even if mine arrives first That is, we've preserved causality Example: 27 creates a meeting, my update adds a participant Harp provides similar semantics in a distributed system "Primary" server determines order by log position E.g., 27 and 28 could be positions in the log That's not scalable, and requires good network connectivity Can we preserve causal order without a central time-stamper? I.e. in some distributed, decentralized way? What is causality? We want a notion of X "happens before" Y. Powerful enough to help us write distributed applications. Weak enough to be efficiently implemented. I.e. no global clock or central time-stamp server. - What does "happening before" mean in a distributed system? - In normal life, we look at a physical clock and say something happens before or after. - Using a physical clocks in a distributed system to determine whether an event happens before or after is difficult----it requires having physical clocks on each computer, and synchronizing clocks accurately is difficult. - The "happens-before" relationship in a distributed system: Model: Processes are a sequence of events, where events are an abstract notion, which could be a single instruction, a procedure, sending a message, receiving an interrupt, etc. Happens before within a process: an event A that precedes event B in the sequence is said to happen before B. Lets assume two events: send and receiving messages. A happens before B (1) if A and B are events in the same process and A comes before B, or (2) if A is the sending of a message by one process and B is the receipt of the same message by another process. Notation: A happens before B is written as A -> B. Two events are concurrent if A didn't happen before B and B didn't happen before A. Happens-before defines an irreflexive partial ordering. We want *other* nodes to see events in an order that obeys ->. - Logical clocks -- called Lamport Clocks - A distributed algorithm to order events in a way that preserves causality. - Using the happens-before relationship we can define a logical clock. - Define Ci, a *logical* clock for process Pi as follows: Ci(A) = a number for event A. The number can be a counter. - The global Clock C for the system must adhere to: if A -> B, then C(A) < C(B) This condition is satisfied if: (1) if A and B are events in Pi, then Ci(A) < Ci(B) (2) if A is the sending of a message by process Pi and B is the receipt of that message by process Pj, then Ci(A) < Cj(B). - Implementation rules: (1) Pi increments Ci between any two successive events (2)(a) If event A is the sending of a message M by process Pi, the the message contains a timestamp Tm = Ci(A). (b) Upon receiving a message M, process Pj sets Cj >= its present value and greater than Tm, MAX (Cj, Ci(A)+1). - Example: use logical clocks to order messages in our IBM example. - A -> B implies C(A) < C(B), but C(A) < C(B) DOES NOT IMPLY A -> B (events may happen concurrently) - Note that two concurrent events may have the same time stamp - Total ordering of events - Order events A -> B by their logical clocks, Break ties by ordering them in some arbitrary order - A ==> B if and only if: (1) Ci(B) < Cj(B), or (2) Ci(A) == Cj(B) and Pi < Pj. ==> is a total order.