Coda
====

Goals:
  Constant data availability
  Transparency
  Work on off-the-shelf hardware
  Scale gracefully
  Trade off availability vs. consistency

What is the target environment?
  Academic/research environment
  Portables--people already worked "disconnected"
  Based on file access patterns with no fine-grained concurrent access

Architecture (fig 1):
  Untrusted clients, trusted servers, replication amongst servers

Scalability:
  Callback-based coherence & whole-file caching (like AFS)
  Avoid rapid system-wide change
  Place functionality on clients, not servers (avoid election schemes)

First- vs. second-class replication:
  2nd class = clients: turned off at will, limited disk, untrusted, no backups
    High-performance (local to processes), available (on client)
    Idea: trade quality for availability
  1st class = servers: known, persistent, secure, available, complete, accurate
    "High-quality", but expensive (requires additional hardware)

Optimistic vs. pessimistic replication:
  What is pessimistic replication?
    At any time, either one reader/writer or many readers
    Must agree on which before any partition
    Problem: can't always predict a disconnection
    Problem: writing client disappears, others locked out (leases might fix this)
  What is optimistic replication?
    Coda guarantee: always see the latest data in your accessible universe
    Problem: update conflicts
  Why are second-class replicas optimistic?
  Why are first-class replicas optimistic?
    Transparency: else, a user could edit a file disconnected but not connected

Client structure:
  User-level Venus client, like AFS; minicache improves performance
  Client states: hoarding, emulation, reintegration

What is hoarding?
  Hoard useful data in anticipation of a disconnection
  Complications for the caching policy:
  - File reference behavior
  - Possibility of unanticipated disconnection
  - Hard to quantify the cost of a cache miss
  - Must ensure you cache the latest version
  - Hoarding reduces available cache (is this a problem today?)
  Want to keep the cache in equilibrium
  Assign each file a caching priority
    Explicit information: hoard DB (fig 4) priority; children, all descendants, + future children
    Implicit information: recent reference history
    Infinite priority for parent directories of cached files
  Equilibrium: no uncached object has higher priority than a cached object

What is hoard walking?
  Every 10 minutes (or on demand before a disconnection), bring the cache to equilibrium
  Phase 1: process the HDB (in case new directories were created, etc.)
  Phase 2: fetch/evict files as needed

How is the cache kept in sync with other clients?
  Callbacks invalidate the client cache
  Files/symlinks: re-fetch on reference or at the next hoard walk
    (unavailable if a disconnection comes first)
  Directories: mark the name-cache entry as "suspicious"

How does emulation work?
  Client takes over the server's responsibilities: access checks, generating fids
  Modifications affect the cache only
  Deleted files are purged from the cache; modified files get infinite priority
  What happens on a cache miss when disconnected?
    On read or write? Cannot happen (whole-file caching)
    On open, an error is returned (unless configured to block)
  Venus keeps track of all modifications in a per-volume replay log
    Logs system call arguments and version state... but not read/write arguments
    Logs a store on file close, but the record points to the cache file; it does not hold the data
  Allocating fids for newly created files:
    Preallocate a certain number of fids for the client before disconnection
    After those run out, the client uses temporary fids
  Log optimizations (a sketch follows this list):
    Discard overwritten stores
    Discard operations that overwrite each other (store->unlink/trunc, ...)
    Cancel both of two inverting operations (mkdir->rmdir)
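A minimal sketch of these log optimizations in Python; the Record type, its fields,
and the operation names are illustrative, not Venus's actual replay-log structures:

    from dataclasses import dataclass

    @dataclass
    class Record:
        op: str   # "store", "unlink", "truncate", "mkdir", "rmdir", ...
        fid: int  # identifier of the object the operation targets

    def optimize(log):
        """Return a reduced replay log with superseded records dropped."""
        out = []
        for rec in log:
            if rec.op in ("store", "unlink", "truncate"):
                # A later store supersedes earlier stores of the same file;
                # unlink/truncate make earlier stores of that file pointless.
                out = [r for r in out
                       if not (r.op == "store" and r.fid == rec.fid)]
                out.append(rec)
            elif rec.op == "rmdir" and any(r.op == "mkdir" and r.fid == rec.fid
                                           for r in out):
                # mkdir->rmdir within one log: the two operations cancel,
                # along with anything else logged for that directory.
                out = [r for r in out if r.fid != rec.fid]
            else:
                out.append(rec)
        return out

With this, [store f, store f, unlink f] collapses to just the unlink, and
[mkdir d, rmdir d] collapses to nothing.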
What crash recovery semantics do we want from Coda?
  Data loss should be no greater when disconnected than when connected
  Must have a persistent cache that survives crashes
  Cached file data is stored in local Unix files
  Metadata uses recoverable virtual memory ("RVM")
    Memory-mapped region with local, non-nested transactions
    Can independently toggle atomicity, permanence, serializability
    Use no-flush commits to reduce latency (get bounded persistence)
    For performance, flush the log less often when connected than when disconnected
  Resource exhaustion? Bad news; could possibly be fixed

How does reintegration work?
  First obtain permanent fids for any temporary ones
  Next ship the replay log to all servers in the AVSG
  Each server atomically performs the actions in the replay log:
  1. Parse the log; lock all files named in it
  2. Validate and execute each operation in the log
     - Is access permitted? Does sufficient disk space exist?
     - Is there an update conflict?
     - Is it a store? Yes: create an empty "shadow file". Else: do the operation.
  3. Transfer data into the shadow files--"back-fetching"
  4. Commit the transaction, release locks
  What happens if the client crashes before back-fetching completes?
    Revert all changes--replay is transactional

Conflicts
  What kinds of conflicts are possible? (read after write, write after write)
    We don't care about read after write
  How are conflicts detected?
    Every version of an object has a storeid, which changes with each store
    If the client's pre-operation storeid does not match the server's, then conflict
    If A and B both have copies of f, and LSID is the latest storeid:
      A dominates B if A's LSID != B's AND B's LSID is in A's history
      Conflict if neither LSID is in the other's history
    To approximate this algorithm w/o keeping the complete history:
      Each replica maintains an estimate of every other replica's history length
      Compare these history-length vectors to see which replica has the more recent
      state (see the version-vector sketch at the end of these notes)
    Special-case directories; only declare a conflict if:
    - Two clients created files with the same name
    - An object updated in one partition was deleted in another
    - Directory attributes were separately changed in two partitions
  What happens on a conflict?
    Reintegration of the entire volume is aborted
    Coda dumps the contents of the replay log into a tar-like file
    The user must go through and manually reintegrate parts of the log
    Could improve with finer granularity (make only dependent subsequences atomic)

For weak connections: "trickle reintegration" allows connected use while
reintegration proceeds in the background

Saving bandwidth: operation-based updates
  Set up a proxy client strongly connected to the server
  Build up a log of what commands have been run
  Re-execute the commands on the proxy client--hope the output is NEARLY identical
    (why nearly? because many files contain the date, etc.)
  Use forward error correction to fix up the differences
  What is forward error correction?

Do you buy the argument that Coda offers a useful paradigm?
  What are the other alternatives?
    Copy files back and forth
    CVS
    Application-specific sync (e.g., IMAP)
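Referenced above from the conflict-detection discussion: a minimal sketch of
comparing the history-length ("version vector") estimates. The representation
here (a dict mapping replica name to update count) is an assumption for
illustration, not Coda's actual version-vector code:

    def compare(a, b):
        """Compare two version vectors (dicts: replica -> update count)."""
        replicas = set(a) | set(b)
        a_ge = all(a.get(r, 0) >= b.get(r, 0) for r in replicas)
        b_ge = all(b.get(r, 0) >= a.get(r, 0) for r in replicas)
        if a_ge and b_ge:
            return "equal"        # identical histories
        if a_ge:
            return "a dominates"  # b's latest update is already in a's history
        if b_ge:
            return "b dominates"
        return "conflict"         # neither history contains the other

    # e.g. compare({"s1": 3, "s2": 1}, {"s1": 2, "s2": 2}) -> "conflict"

Dominance means one replica's counts are componentwise >= the other's; an
incomparable pair means each replica saw an update the other did not, i.e. a
write/write conflict.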