WAFS
====

What is WAFS?
  Simple file system designed to store file system and application journals
  Can be mounted synchronously or async (if you want async creates, etc.)
  WAFS can be on a different disk from the file system it is journaling (faster)

Why WAFS?
  Both applications and file systems need a logging facility
  If a database logs to a journaling file system, you have two layers of logs
    Bad for performance, particularly with frequent flushes or without group commit
    Might actually decrease reliability--both logs must be intact after a crash
  WAFS initially implemented for LFFS, a journaling version of FFS

WAFS interface
  Looks kind of like an ordinary file system: can mount/unmount
    Except only one vnode for the whole file system
  Limited operations on the WAFS file:
    register - registers a new application that will use the log
      Arguments: unique name, inform method, recovery command
        unique name - string naming the client
        inform method - specifies how the application wants to deal with log wraps
          Usually, will send a signal to all clients before the log wraps
        recovery command - app-specific program to run after a crash
      Returns: 32-bit rmid (resource manager ID)
    append - appends a log entry to the file, returns a 64-bit LSN
    read - takes an LSN as an argument, returns the log entry
    fsync - force all records to disk, or all records up to some LSN

PFS
===

What is the storage model for PFS?
  - Storage system (e.g., network-attached disk) managed by an untrusted entity
  - Want to make sure any tampering with data gets detected
  - Want the system to be adaptable to many file systems

Straw man 1: MAC every block you store in the system
  - Block size is no longer a power of two (or updates are not atomic)
  - An intruder can swap two blocks, X and Y, and each may still look correct

Straw man 2: Store hashes with block pointers (like SFSRO)
  - Makes updating more expensive
  - Indirect references to blocks (i-numbers) would need to be changed, too

How does PFS solve the problem?
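The WAFS interface above can be sketched as a tiny in-memory model. The operation names (register, append, read, fsync) and return types (32-bit rmid, 64-bit LSN) come from the notes; the internal representation is an assumption for illustration, not the real on-disk format.

```python
class WAFS:
    """Minimal in-memory sketch of the WAFS client interface (illustrative only)."""

    def __init__(self):
        self.clients = {}      # rmid -> (name, inform_method, recovery_cmd)
        self.log = {}          # LSN -> (rmid, record bytes)
        self.next_rmid = 1
        self.next_lsn = 1
        self.flushed_lsn = 0   # highest LSN forced to stable storage

    def register(self, name, inform_method, recovery_cmd):
        """Register a new client; returns its resource manager ID (rmid)."""
        rmid = self.next_rmid
        self.next_rmid += 1
        self.clients[rmid] = (name, inform_method, recovery_cmd)
        return rmid

    def append(self, rmid, record):
        """Append a log record on behalf of rmid; returns the record's LSN."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log[lsn] = (rmid, bytes(record))
        return lsn

    def read(self, lsn):
        """Return the log record stored at the given LSN."""
        return self.log[lsn][1]

    def fsync(self, lsn=None):
        """Force records up to lsn (or all records, if lsn is None) to disk."""
        target = lsn if lsn is not None else self.next_lsn - 1
        self.flushed_lsn = max(self.flushed_lsn, target)
```

A client like LFFS would register once, then append journal records and fsync up to the LSN of its latest committed transaction.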
  - Have a block map, which contains the hash of every block on disk
    - For a 16GB partition, 8KB blocks, 16-byte hashes -> 32MB
  - When you read a block, must check it against its entry in the block map
  - When you write a block, must update the map

How is the map stored?
  Stored on WAFS.
  Key idea: Can merge the file system journal with the block map log

How do you trust the contents of WAFS?
  - Can store it on a local disk
  - Log entries are all MACed, so untrusted storage is OK, too

When you update a block...
  1. Update the contents of the block in the buffer cache
  2. Flag the buffer as not yet hashed
  3. hashd hashes the block, writes the result to the head of the journal
  4. Clear the flag and set the buffer header to contain the journal entry's LSN
  5. Do not write the block back until the log has been flushed past that LSN
  - Only steps 1 and 2 must happen synchronously with the system call

How do you checkpoint the block map?
  - Can't write the whole thing to disk atomically; it's way too big
  - Use partial checkpoints:
    Break the map into chunks, checkpoint chunks in round-robin fashion

What happens if you crash after writing the log entry, but before writing the block?
  Danger: Could have the new hash value in the block map, but old data in the block
  Solution: Each checkpoint must be followed by an "async map"
    The async map contains the old values of blocks not yet written

How to recover block contents after a crash?
  - Some data exists on disk--but must verify its integrity
  - The log will contain some number of hash values for any particular block
  - The block's contents could match any of those hash values
  - How to bound the number of possible hash values during recovery?
    Could write a log record on every buffer I/O completion
      But don't want to lock the head of the log at interrupt level
  - Solution: Store two LSNs in each buffer header:
      buffer-end: the LSN you must write before writing the block
      buffer-begin: the LSN of the last time the block was written
    hashd logs the oldest buffer-begin LSN of any dirty block
    Thus, do not believe any hashes in log records earlier than that LSN

Could we do better than this? Attacks on the system?
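The block-update protocol above can be sketched as follows. This is a simplified model under stated assumptions: the block map and log live in plain Python structures, SHA-256 truncated to 16 bytes stands in for the paper's per-block hash, and the names PFSCache, write_block, hashd, and can_write_back are illustrative, not the real API.

```python
import hashlib

class PFSCache:
    """Sketch of PFS's five-step block-update protocol (illustrative only)."""

    def __init__(self):
        self.block_map = {}  # block number -> trusted 16-byte hash
        self.log = []        # (block number, hash) entries; index+1 = LSN
        self.flushed_lsn = 0
        self.buffers = {}    # block number -> {"data", "hashed", "lsn"}

    @staticmethod
    def hash_block(data):
        return hashlib.sha256(data).digest()[:16]  # 16-byte hash, as in the notes

    def write_block(self, bn, data):
        # Steps 1-2 (synchronous with the system call): update the buffer
        # and flag it as not yet hashed.
        self.buffers[bn] = {"data": data, "hashed": False, "lsn": None}

    def hashd(self):
        # Steps 3-4 (asynchronous): hash each un-hashed buffer, append the
        # hash to the journal head, record the entry's LSN in the buffer
        # header, and update the in-memory block map.
        for bn, buf in self.buffers.items():
            if not buf["hashed"]:
                self.log.append((bn, self.hash_block(buf["data"])))
                buf["lsn"] = len(self.log)   # LSN of the new log entry
                buf["hashed"] = True
                self.block_map[bn] = self.log[-1][1]

    def can_write_back(self, bn):
        # Step 5: the block may go to disk only once the log has been
        # flushed past the buffer's journal-entry LSN.
        buf = self.buffers[bn]
        return buf["hashed"] and self.flushed_lsn >= buf["lsn"]
```

Note that write_block alone never permits write-back: the write-ahead rule forces the hash into the durable log before the block itself can reach the untrusted disk.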
  Freshness--can roll back any block to any value since the logged LSN
    (i.e., the oldest buffer-begin value)
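The freshness attack can be demonstrated concretely: because recovery must accept any hash logged for a block after the oldest buffer-begin LSN, an attacker who controls the untrusted storage can present a stale version and still pass verification. The verify function below is a hypothetical stand-in for that recovery check, not PFS's actual code.

```python
import hashlib

def h(data):
    """16-byte truncated SHA-256, standing in for the per-block hash."""
    return hashlib.sha256(data).digest()[:16]

# Block 7 is updated three times; each update logs a (block, hash) entry.
# In PFS these entries are MACed, so the attacker cannot forge new ones --
# but old genuine entries remain valid.
log = [(7, h(v)) for v in (b"v1", b"v2", b"v3")]

def verify(bn, data, log):
    # Recovery accepts any hash logged for the block after the oldest
    # buffer-begin LSN (here, for simplicity, the whole log).
    return any(e_bn == bn and e_h == h(data) for e_bn, e_h in log)

# A stale version still verifies -> rollback goes undetected:
assert verify(7, b"v1", log)
# Data never logged is still rejected:
assert not verify(7, b"evil", log)
```

The window of vulnerability is exactly the span of log entries newer than the oldest buffer-begin LSN: the tighter hashd keeps that bound, the fewer versions an attacker can roll back to.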