WAFS
====

What is WAFS?
  Simple file system designed to store file system and application journals
  Can be mounted synchronously or async (if you want async creates, etc.)
  WAFS can be on a different disk from the file system it is journaling (faster)

Why WAFS?
  Both applications and file systems need a logging facility
  If a database logs to a journaling file system, you have two layers of logs
    Bad for performance, particularly with frequent flushes or without group commit
    Might actually decrease reliability--both logs must be intact after a crash
  WAFS initially implemented for LFFS, a journaling version of FFS

WAFS interface
  Looks kind of like an ordinary file system: can mount/unmount
    Except only one vnode for the whole file system
  Limited operations on the WAFS file:
    register - registers a new application that will use the log
      Arguments: unique name, inform method, recovery command
        unique name - string naming the client
        inform method - specifies how the application wants to deal with log wraps
          Usually, will send a signal to all clients before the log wraps
        recovery command - app-specific program to run after a crash
      Returns: 32-bit rmid (resource manager ID)
    append - appends a log entry to the file, returns a 64-bit LSN
    read - takes an LSN as an argument, returns the log entry
    fsync - force all records to disk, or all records up to some LSN

PFS
===

What is the storage model for PFS?
  - Storage system (e.g., network-attached disk) managed by an untrusted entity
  - Want to make sure any tampering with data gets detected
  - Want the system to be adaptable to many file systems

Straw man 1: MAC every block you store in the system
  - Block size is no longer a power of two (or updates are not atomic)
  - An intruder can swap two blocks, X and Y, and each may still look correct

Straw man 2: Store hashes with block pointers (like SFSRO)
  - Makes updating more expensive
  - Indirect references to blocks (i-numbers) would need to be changed, too

How does PFS solve the problem?
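The WAFS interface above can be sketched as a tiny in-memory model. The operation names (register, append, read, fsync) and return types (32-bit rmid, 64-bit LSN) come from the notes; the internal representation is an assumption for illustration, not the real on-disk format.

```python
class WAFS:
    """Minimal in-memory sketch of the WAFS client interface (illustrative only)."""

    def __init__(self):
        self.clients = {}      # rmid -> (name, inform_method, recovery_cmd)
        self.log = {}          # LSN -> (rmid, record bytes)
        self.next_rmid = 1
        self.next_lsn = 1
        self.flushed_lsn = 0   # highest LSN forced to stable storage

    def register(self, name, inform_method, recovery_cmd):
        """Register a new client; returns its resource manager ID (rmid)."""
        rmid = self.next_rmid
        self.next_rmid += 1
        self.clients[rmid] = (name, inform_method, recovery_cmd)
        return rmid

    def append(self, rmid, record):
        """Append a log record on behalf of rmid; returns the record's LSN."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log[lsn] = (rmid, bytes(record))
        return lsn

    def read(self, lsn):
        """Return the log record stored at the given LSN."""
        return self.log[lsn][1]

    def fsync(self, lsn=None):
        """Force records up to lsn (or all records, if lsn is None) to disk."""
        target = lsn if lsn is not None else self.next_lsn - 1
        self.flushed_lsn = max(self.flushed_lsn, target)
```

A client like LFFS would register once, then append journal records and fsync up to the LSN of its latest committed transaction.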
  - Have a block map, which contains the hash of every block on disk
    - For a 16GB partition, 8KB blocks, 16-byte hashes -> 32MB
  - When you read a block, must check it against its entry in the block map
  - When you write a block, must update the map

How is the map stored?
  Stored on WAFS.
  Key idea: Can merge the file system journal with the block map log

How do you trust the contents of WAFS?
  - Can store it on a local disk
  - Log entries are all MACed, so untrusted storage is OK, too

When you update a block...
  1. Update the contents of the block in the buffer cache
  2. Flag the buffer as not yet hashed
  3. hashd hashes the block, writes the result to the head of the journal
  4. Clear the flag and set the buffer header to contain the journal entry's LSN
  5. Do not write the block back until the log has been flushed past that LSN
  - Only steps 1 and 2 must happen synchronously with the system call

How do you checkpoint the block map?
  - Can't write the whole thing to disk atomically; it's way too big
  - Use partial checkpoints:
    Break the map into chunks, checkpoint chunks in round-robin fashion

What happens if you crash after writing the log entry, but before writing the block?
  Danger: Could have the new hash value in the block map, but old data in the block
  Solution: Each checkpoint must be followed by an "async map"
    The async map contains the old values of blocks not yet written

How to recover block contents after a crash?
  - Some data exists on disk--but must verify its integrity
  - The log will contain some number of hash values for any particular block
  - The block's contents could match any of those hash values
  - How to bound the number of possible hash values during recovery?
    Could write a log record on every buffer I/O completion
      But don't want to lock the head of the log at interrupt level
  - Solution: Store two LSNs in each buffer header:
      buffer-end: the LSN you must write before writing the block
      buffer-begin: the LSN of the last time the block was written
    hashd logs the oldest buffer-begin LSN of any dirty block
    Thus, do not believe any hashes in log records earlier than that LSN

Could we do better than this? Attacks on the system?
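The block-update protocol above can be sketched as follows. This is a simplified model under stated assumptions: the block map and log live in plain Python structures, SHA-256 truncated to 16 bytes stands in for the paper's per-block hash, and the names PFSCache, write_block, hashd, and can_write_back are illustrative, not the real API.

```python
import hashlib

class PFSCache:
    """Sketch of PFS's five-step block-update protocol (illustrative only)."""

    def __init__(self):
        self.block_map = {}  # block number -> trusted 16-byte hash
        self.log = []        # (block number, hash) entries; index+1 = LSN
        self.flushed_lsn = 0
        self.buffers = {}    # block number -> {"data", "hashed", "lsn"}

    @staticmethod
    def hash_block(data):
        return hashlib.sha256(data).digest()[:16]  # 16-byte hash, as in the notes

    def write_block(self, bn, data):
        # Steps 1-2 (synchronous with the system call): update the buffer
        # and flag it as not yet hashed.
        self.buffers[bn] = {"data": data, "hashed": False, "lsn": None}

    def hashd(self):
        # Steps 3-4 (asynchronous): hash each un-hashed buffer, append the
        # hash to the journal head, record the entry's LSN in the buffer
        # header, and update the in-memory block map.
        for bn, buf in self.buffers.items():
            if not buf["hashed"]:
                self.log.append((bn, self.hash_block(buf["data"])))
                buf["lsn"] = len(self.log)   # LSN of the new log entry
                buf["hashed"] = True
                self.block_map[bn] = self.log[-1][1]

    def can_write_back(self, bn):
        # Step 5: the block may go to disk only once the log has been
        # flushed past the buffer's journal-entry LSN.
        buf = self.buffers[bn]
        return buf["hashed"] and self.flushed_lsn >= buf["lsn"]
```

Note that write_block alone never permits write-back: the write-ahead rule forces the hash into the durable log before the block itself can reach the untrusted disk.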
  Freshness--can roll back any block to any value since the logged LSN
    (i.e., the oldest buffer-begin value)
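The freshness attack can be demonstrated concretely: because recovery must accept any hash logged for a block after the oldest buffer-begin LSN, an attacker who controls the untrusted storage can present a stale version and still pass verification. The verify function below is a hypothetical stand-in for that recovery check, not PFS's actual code.

```python
import hashlib

def h(data):
    """16-byte truncated SHA-256, standing in for the per-block hash."""
    return hashlib.sha256(data).digest()[:16]

# Block 7 is updated three times; each update logs a (block, hash) entry.
# In PFS these entries are MACed, so the attacker cannot forge new ones --
# but old genuine entries remain valid.
log = [(7, h(v)) for v in (b"v1", b"v2", b"v3")]

def verify(bn, data, log):
    # Recovery accepts any hash logged for the block after the oldest
    # buffer-begin LSN (here, for simplicity, the whole log).
    return any(e_bn == bn and e_h == h(data) for e_bn, e_h in log)

# A stale version still verifies -> rollback goes undetected:
assert verify(7, b"v1", log)
# Data never logged is still rejected:
assert not verify(7, b"evil", log)
```

The window of vulnerability is exactly the span of log entries newer than the oldest buffer-begin LSN: the tighter hashd keeps that bound, the fewer versions an attacker can roll back to.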