Generalized File System Dependencies ==================================== Brief review of FFS/USF/ext2-like file systems Superblock - gives overall layout, free counts Inodes - Metadata + block maps for files Bitmaps - Record free/used status of inodes and blocks Data blocks - Chunks of files (and directories) What goes if ext2-like file system writes everything asynchronously? Power failure or crash leaves disk in inconsistent state, E.g.: Cross-allocated blocks Chunks of deleted file show up in another file Lose renamed file (or shows up in lost+found, not correct place) Garbage shows up as metadata (directory/indirect block) What are the three rules of metadata consistency (p. 4) 1. Never write pointer before initializing the structure it points to 2. Never reuse a resource before nullifying all pointers to it 3. Never clear last pointer to live resource before setting new one What is journaling, and how does it ensure the three rules? Write data twice, once to journal (a.k.a., write-ahead log), once in place Group multiple operations into a single log entry w. commit record Replay committed but uncompleted log entries after a crash. Either - all entries in a log entry will execute (after replay), or - none will execute (because entry not committed, and log write-ahead) Atomic operations make rules 1-3 easy to respect E.g., write pointer and initialize structure atomically What are soft updates? Keep track of dependencies between blocks E.g., write indirect block before writing inode that points to it "B depends on A" written "B -> A", means A must be written before B Example: Figure 2a (page 4) Inode depends on data block (rule 1) Why does data block depend on bitmap block? (easy way to enforce rule 3) Must write bitmap, then data, then inode But also keep track of undo information for changes--why? Even if dependencies don't have cycle, block granularity causes false sharing and cycles between blocks Undo changes to break block dependency cycles Example: Figure 2b 3 dirty blocks, none is safe to write (all have outgoing arrows) Must undo changes, write a block, re-dirty block, then do 3 more writes What is a patch in featherstitch? Similar to a soft-update, but implemented in a file-system-independent way System maintains some invariants (refer to Figure 1, p. 3, for notation) No cycles are allowed in dependencies: p ~> q ~> p is illegal But can still block dependencies (p ~> q ~> r with p & r on same block) Can't write patch to disk before its invariants: dep[C] \subseteq C Every in-flight block can commit without violating previous, i.e.: dep[F_B] \subseteq (C \union F_B) OK to depend on patches in same block if assuming block atomicity Also: Only one version of block can be in-flight at a time The buffer cache must eventually write all dirty blocks How are patches implemented? "~>" is transitive, so only track direct dependencies ("->") Keep track of old and new data for byte range within block Special cases for small patches: <= 4 bytes, 1-bit patch Automatically detect overlap: q -> p if q written after p and overlaps it System supports empty patches that prevent others from being committed Empty patches can make quadratic dependencies linear (p. 5) how? How do patches represent journal? Figure 2c Create empty placeholder patch All in place writes depend on placeholder (which can't commit) Then write changes to the log Then write commit record, dependent on all log entries Finally make placeholder dependent on commit record and release it to commit Also make completion record dependent on in-place writes What's wrong with naive featherstitch implementation? Uses way to much memory for patches and dependencies (Fig. 3a) What's a hard patch, and when is it safe? (4.1) Hard patch p stores no undo information Safe if p will never need to be reverted to break block dependencies Happens if p not the head of any block-level dependency cycle Strategy to determine when it is safe to make p a hard patch - Ensure block-level cycles with p at head are created with p, not later - Be conservative: OK if no dependencies on p's block from other blocks What is hard patch merging? (4.2) Make two hard patches into one larger hard patch Merge dependencies and prune any intra-block cycles Note: no inter-block cycles, since both are hard patches Can always merge two hard patches on same block (will be written together) Might be able to convert old soft patch into hard, merge with new hard patch What is overlap merging, and when is it safe? (4.3) Merge soft patch with another patch on same block (inode fields, bitmap bits) Safe if you never need to roll back one patch without the other Say p2 ~> p1 and p1 and p2 overlap Danger if p2 ~> x ~> p1 ~> p3 where p3 on same block, x on different block Cannot commit p2 and p1 at same time--must commit p1, then x, then p2 Use conservative heuristic to avoid exhaustive search for x where expensive What are ready patch lists? Keep dependency count, update (using back pointers) Blocks with no outgoing arrows are "ready", keep on special ready list Saves CPU time for buffer cache Soft updates optimizes creation and deletion of same file. Featherstitch? Sec 4.5 (p. 8): Optimization FS specific, but base on whether patches merge What is a patchgroup and why would you want this? User-level interface to dependencies engage, disengage patchgroup in process so all FS activity can be grouped add dependencies between patchgroups How to avoid cycles? First add P ~> *, then engage/disengage P, then add * ~> P What is sort example on p. 8? What can go wrong without depend command? What is the problem with loopback file systems (e.g., encrypted home dir)? How could you make cycles okay by changing abstraction layers (p. 3.6)? Journals have the ability to write arbitrarily many patches atomically By writing a cycle What evaluation questions should we ask? Performance? Correctness? Usability/performance of patchgroups API? Impact on kernel code, structure