Frangipani
==========

Goals of paper
- Scalable storage
  Many clients, lots of disk space
  Just add more servers and everything works
- Simplicity of design: leverage Petal
  Why not just use AdvFS on Petal?

How does Frangipani use Petal?
- Figure 2 shows architecture of the system
- Everything stored on Petal - even local server logs!
  Couldn't servers store their own logs on local disk? (would hamper recovery)
- Frangipani exploits the large virtual address space
- Figure 4 shows storage layout
  - 64KB allocation size allows it to have many clusters of allocated space
  - Why are inodes 512 bytes? locking granularity, eliminates false sharing

How to deal with concurrency?
- Virtual disk segments covered by shared-read/exclusive-write locks
  - Are segments contiguous? No: one lock covers inode and data
- Every server has its own address space for its log (no contention)
- Split allocation bitmap amongst servers -- no allocation contention
- When do you have contention? concurrent write sharing, file deletes

How to implement locks?
- Centralized server? Not fault-tolerant
- Store them in Petal? Disk writes too expensive
- Distributed lock server, and "clerks" on each Frangipani server

How is the distributed lock server implemented?
- Split locks into ~100 different lock groups
- Clerks get a lease on the lock table
- Servers check each other with "heartbeat" messages
- How to agree on which server is responsible for which lock group?
  Danger: two servers both think they are responsible for the same locks!
  Use a consensus alg. to assign lock groups to servers -- how might this work?
  E.g., use something like the view change algorithm from Harp
  - Coordinator asks other servers to participate in view change
  - Available servers will agree if not participating in another v.c.
  - Phase 2: Coordinator tells other servers about the new view
  Slow view change protocol OK, since only rarely needed
- What about lost state when a server crashes?
  Can retrieve lost state from clerks
  Will know all clerks that have a lease on the lock table (leases replicated)
- Does the above guarantee two servers are never responsible for the same lock?
  Not quite; need to ensure the view change happens atomically
  First, servers release all locks they are losing responsibility for
  Then find out the state of new locks from clients
  What is the hazard if messages are delayed (3rd-to-last par. in sec 6)?

What about deadlock?
- Is it a problem? e.g. rename
- Acquire locks in two phases
  1. Figure out what locks you need (acquiring & releasing as needed)
  2. Get all the locks in increasing block order
  3. Figure out if step 1 was still correct; if not, start over

What happens if you trip over the Ethernet cord?
- If the network is out for more than 30 seconds, the lease will expire
- If any dirty data is in the cache, the file system must be unmounted and remounted

In normal operation, does Frangipani preserve Echo's => ordering?
- Almost (writes that change file size can still be reordered). How?
- Always log metadata before updating permanent locations
- Always write permanent locations before returning a lock

What happens when a Frangipani server crashes?
- Detect server failure. How? (client notices, or lock lease expires)
- Recovery daemon gains ownership of the failed server's locks
  includes log, inodes, etc.
- Finds beginning of log. How? version number decrease
- Replays log, releases locks
- How many Frangipani servers can fail? All, as long as Petal is still up

How to maintain update order after a crash?
- What if the log contains changes that were already overwritten?
  Server applied change, released lock, someone else changed it,
  recovery server reacquired lock, rolled back the other server's change
- Solution? Never replay the same log entry more than once
  Version number in every metadata block (a replay sketch follows this list)
  What if a metadata block is replaced by a data block w/o a version? don't do that
  How do you guarantee metadata blocks are not recycled for data?
  They don't say... several possibilities:
    Could allocate metadata from the bottom of the bitmap region, data from the top
    Could keep extra bits in the bitmap table
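To make the version-number check concrete, here is a minimal Go sketch of log replay. The MetadataBlock and LogRecord types, their fields, and replayLog are hypothetical stand-ins rather than Frangipani's actual on-disk formats; the point is only that a log record is applied if and only if its version is newer than the version already stored in the block it would overwrite.

```go
package main

import "fmt"

// Hypothetical in-memory stand-ins for metadata blocks on the shared
// virtual disk and for records in a crashed server's log.
type MetadataBlock struct {
	Version uint64 // version number stored in every metadata block
	Data    string
}

type LogRecord struct {
	BlockID int
	Version uint64 // version the block should have after this update
	Data    string
}

// replayLog applies a crashed server's log during recovery. An entry is
// applied only if its version is newer than the block's current version,
// so replay never rolls back a later change made by another server after
// the crashed server released its lock.
func replayLog(log []LogRecord, disk map[int]*MetadataBlock) {
	for _, rec := range log {
		blk, ok := disk[rec.BlockID]
		if !ok {
			continue // block not present as metadata; skip
		}
		if rec.Version > blk.Version {
			blk.Data = rec.Data
			blk.Version = rec.Version
		}
		// else: the block already reflects this or a later update.
	}
}

func main() {
	disk := map[int]*MetadataBlock{
		1: {Version: 7, Data: "old inode"},
		2: {Version: 9, Data: "dir entry rewritten later by another server"},
	}
	log := []LogRecord{
		{BlockID: 1, Version: 8, Data: "inode update from crashed server"},
		{BlockID: 2, Version: 5, Data: "stale dir update, already superseded"},
	}
	replayLog(log, disk)
	fmt.Println(disk[1].Data) // applied: 8 > 7
	fmt.Println(disk[2].Data) // skipped: 5 <= 9
}
```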
What happens if a Frangipani server and a lock server crash simultaneously?
- Recovery server is supposed to acquire the locks of the failed Frangipani server
- Lock servers are supposed to reconstruct lock state from the (crashed) clerk
- Don't know what locks to give the recovery server!
- Why is this bad? Someone else could get a lock and see an incomplete operation
- How can you deal with this? Paper doesn't say. Possibilities:
  Could give the entire set of missing locks to the recovery server
    Will guarantee it has all locks the crashed server had
    Disadvantages? Slow. Plus what if two servers and the lock server crash?
    Must wait until both Frangipani servers' logs are replayed before
      giving out any locks in the missing lock groups
    Okay, because the two crashed Frangipani servers didn't hold the same lock

How to back up a Frangipani file system?
- Petal offers snapshots. Is that good enough? yes, logs are in Petal
- Restore the entire Petal snapshot (including logs), and do crash recovery
- What about recovering individual files? Painful; search all logs
- What's the alternate scheme? Block everyone with a global lock, also imperfect

Security plan
- Export Frangipani with another network file system protocol
- Why not just authenticate users to the Petal servers?

Performance
- What are the goals?
  Good single-client performance
  Scale with number of clients
- Figures 5, 6: good
- Figure 7: Why are writes slower? (must write to two Petal servers)
- Figure 8: Why is performance so bad under contention?
  Must flush cache/write back all data when giving up a lock (see the sketch at the end)
- Why does readahead hurt?

How does Frangipani compare to Zebra?
- Very easy to implement (two months)
- No need for a central file manager
- Frangipani has very heavy-weight sharing
  Must flush cache before returning a lock
  Would Zebra do better on the Figure 8 benchmark? Why?
  Zebra consistency is on block pointers, not blocks themselves
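As a rough illustration of why contention is so costly (Figure 8) and why Frangipani's sharing is heavier-weight than Zebra's, here is a Go sketch of what a clerk might do when a conflicting request forces it to give up a lock: write back every dirty cached block to Petal and invalidate its cache before releasing. The Clerk and CachedBlock types and the writeToPetal helper are invented for this sketch and are not the paper's interfaces.

```go
package main

import "fmt"

// CachedBlock is a hypothetical cached copy of a block covered by one lock.
type CachedBlock struct {
	Data  []byte
	Dirty bool
}

// Clerk holds the cached blocks protected by a single exclusive lock.
type Clerk struct {
	cache map[int]*CachedBlock
}

// writeToPetal stands in for a write to the shared virtual disk.
func writeToPetal(blockID int, data []byte) {
	fmt.Printf("petal write: block %d (%d bytes)\n", blockID, len(data))
}

// revoke is invoked when another server requests a conflicting lock.
// The expensive part: every dirty block must reach Petal before the lock
// can be handed over, and the clean copies are discarded, which is why
// write sharing degrades so badly in the Figure 8 benchmark.
func (c *Clerk) revoke() {
	for id, blk := range c.cache {
		if blk.Dirty {
			writeToPetal(id, blk.Data)
		}
		delete(c.cache, id) // invalidate; next access re-reads from Petal
	}
	fmt.Println("lock released")
}

func main() {
	c := &Clerk{cache: map[int]*CachedBlock{
		1: {Data: []byte("dirty inode"), Dirty: true},
		2: {Data: []byte("clean data"), Dirty: false},
	}}
	c.revoke()
}
```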