Coda
====

Goals:
  Constant data availability
  Transparency
  Work on off-the-shelf hardware
  Scale gracefully
  Trade off availability vs. consistency

What is the target environment?
  Academic/research environment
  Portables--people already worked "disconnected"
  Based on file access patterns with no fine-grained concurrent access

Architecture (fig 1):
  Untrusted clients, trusted servers, replication amongst servers

Scalability:
  Callback-based coherence & whole-file caching (like AFS)
  Avoid rapid system-wide change
  Place functionality on clients, not servers (avoid election schemes)

First- vs. second-class replication:
  2nd class = clients: turned off at will, limited disk, untrusted, no backups
    High-performance (local to processes), available (on client)
    Idea: trade quality for availability
  1st class = servers: known, persistent, secure, available, complete, accurate
    "High-quality", but expensive (requires additional hardware)

Optimistic vs. pessimistic replication:
  What is pessimistic replication?
    At any time, either one reader/writer or many readers
    Must agree on which before any partition
    Problem: can't always predict a disconnection
    Problem: writing client disappears, others locked out (leases might fix this)
  What is optimistic replication?
    Coda guarantee: always see the latest data in your accessible universe
    Problem: update conflicts
  Why are second-class replicas optimistic?
  Why are first-class replicas optimistic?
    Transparency: else, a user could edit a file disconnected but not connected

Client structure:
  User-level Venus client, like AFS; minicache improves performance
  Client states: hoarding, emulation, reintegration

What is hoarding?
  Hoard useful data in anticipation of a disconnection
  Complications for the caching policy:
  - File reference behavior
  - Possibility of unanticipated disconnection
  - Hard to quantify the cost of a cache miss
  - Must ensure you cache the latest version
  - Hoarding reduces available cache (is this a problem today?)
  Want to keep the cache in equilibrium
  Assign each file a caching priority
    Explicit information: hoard DB (fig 4) priority; children, all descendants, + future children
    Implicit information: recent reference history
    Infinite priority for parent directories of cached files
  Equilibrium: no uncached object has higher priority than a cached object

What is hoard walking?
  Every 10 minutes (or on demand before a disconnection), bring the cache to equilibrium
  Phase 1: process the HDB (in case new directories were created, etc.)
  Phase 2: fetch/evict files as needed

How is the cache kept in sync with other clients?
  Callbacks invalidate the client cache
  Files/symlinks: re-fetch on reference or at the next hoard walk
    (unavailable if a disconnection comes first)
  Directories: mark the name-cache entry as "suspicious"

How does emulation work?
  Client takes over the server's responsibilities: access checks, generating fids
  Modifications affect the cache only
  Deleted files are purged from the cache; modified files get infinite priority
  What happens on a cache miss when disconnected?
    On read or write? Cannot happen (whole-file caching)
    On open, an error is returned (unless configured to block)
  Venus keeps track of all modifications in a per-volume replay log
    Logs system call arguments and version state... but not read/write arguments
    Logs a store on file close, but the record points to the cache file; it does not hold the data
  Allocating fids for newly created files:
    Preallocate a certain number of fids for the client before disconnection
    After those run out, the client uses temporary fids
  Log optimizations (a sketch follows this list):
    Discard overwritten stores
    Discard operations that overwrite each other (store->unlink/trunc, ...)
    Cancel both of two inverting operations (mkdir->rmdir)
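A minimal sketch of these log optimizations in Python; the Record type, its fields,
and the operation names are illustrative, not Venus's actual replay-log structures:

    from dataclasses import dataclass

    @dataclass
    class Record:
        op: str   # "store", "unlink", "truncate", "mkdir", "rmdir", ...
        fid: int  # identifier of the object the operation targets

    def optimize(log):
        """Return a reduced replay log with superseded records dropped."""
        out = []
        for rec in log:
            if rec.op in ("store", "unlink", "truncate"):
                # A later store supersedes earlier stores of the same file;
                # unlink/truncate make earlier stores of that file pointless.
                out = [r for r in out
                       if not (r.op == "store" and r.fid == rec.fid)]
                out.append(rec)
            elif rec.op == "rmdir" and any(r.op == "mkdir" and r.fid == rec.fid
                                           for r in out):
                # mkdir->rmdir within one log: the two operations cancel,
                # along with anything else logged for that directory.
                out = [r for r in out if r.fid != rec.fid]
            else:
                out.append(rec)
        return out

With this, [store f, store f, unlink f] collapses to just the unlink, and
[mkdir d, rmdir d] collapses to nothing.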
What crash recovery semantics do we want from Coda?
  Data loss should be no greater when disconnected than when connected
  Must have a persistent cache that survives crashes
  Cached file data is stored in local Unix files
  Metadata uses recoverable virtual memory ("RVM")
    Memory-mapped region with local, non-nested transactions
    Can independently toggle atomicity, permanence, serializability
    Use no-flush commits to reduce latency (get bounded persistence)
    For performance, flush the log less often when connected than when disconnected
  Resource exhaustion? Bad news; could possibly be fixed

How does reintegration work?
  First obtain permanent fids for any temporary ones
  Next ship the replay log to all servers in the AVSG
  Each server atomically performs the actions in the replay log:
  1. Parse the log; lock all files named in it
  2. Validate and execute each operation in the log
     - Is access permitted? Does sufficient disk space exist?
     - Is there an update conflict?
     - Is it a store? Yes: create an empty "shadow file". Else: do the operation.
  3. Transfer data into the shadow files--"back-fetching"
  4. Commit the transaction, release locks
  What happens if the client crashes before back-fetching completes?
    Revert all changes--replay is transactional

Conflicts
  What kinds of conflicts are possible? (read after write, write after write)
    We don't care about read after write
  How are conflicts detected?
    Every version of an object has a storeid, which changes with each store
    If the client's pre-operation storeid does not match the server's, then conflict
    If A and B both have copies of f, and LSID is the latest storeid:
      A dominates B if A's LSID != B's AND B's LSID is in A's history
      Conflict if neither LSID is in the other's history
    To approximate this algorithm w/o keeping the complete history:
      Each replica maintains an estimate of every other replica's history length
      Compare these history-length vectors to see which replica has the more recent
      state (see the version-vector sketch at the end of these notes)
    Special-case directories; only declare a conflict if:
    - Two clients created files with the same name
    - An object updated in one partition was deleted in another
    - Directory attributes were separately changed in two partitions
  What happens on a conflict?
    Reintegration of the entire volume is aborted
    Coda dumps the contents of the replay log into a tar-like file
    The user must go through and manually reintegrate parts of the log
    Could improve with finer granularity (make only dependent subsequences atomic)

For weak connections: "trickle reintegration" allows connected use while
reintegration proceeds in the background

Saving bandwidth: operation-based updates
  Set up a proxy client strongly connected to the server
  Build up a log of what commands have been run
  Re-execute the commands on the proxy client--hope the output is NEARLY identical
    (why nearly? because many files contain the date, etc.)
  Use forward error correction to fix up the differences
  What is forward error correction?

Do you buy the argument that Coda offers a useful paradigm?
  What are the other alternatives?
    Copy files back and forth
    CVS
    Application-specific sync (e.g., IMAP)
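Referenced above from the conflict-detection discussion: a minimal sketch of
comparing the history-length ("version vector") estimates. The representation
here (a dict mapping replica name to update count) is an assumption for
illustration, not Coda's actual version-vector code:

    def compare(a, b):
        """Compare two version vectors (dicts: replica -> update count)."""
        replicas = set(a) | set(b)
        a_ge = all(a.get(r, 0) >= b.get(r, 0) for r in replicas)
        b_ge = all(b.get(r, 0) >= a.get(r, 0) for r in replicas)
        if a_ge and b_ge:
            return "equal"        # identical histories
        if a_ge:
            return "a dominates"  # b's latest update is already in a's history
        if b_ge:
            return "b dominates"
        return "conflict"         # neither history contains the other

    # e.g. compare({"s1": 3, "s2": 1}, {"s1": 2, "s2": 2}) -> "conflict"

Dominance means one replica's counts are componentwise >= the other's; an
incomparable pair means each replica saw an update the other did not, i.e. a
write/write conflict.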