A Comparison of File System Workloads
=====================================

How do you study file system workloads?
* Static analysis
  - Limited in what it can do
* Observe network file system traffic
  - Unobtrusive, don't need source/access to modify software
  - Miss some information--local files, closes (sync. vs. async. writes)
* Instrument operating system
  - HPUX: use system call auditing
  - NT: interpose a file system (kind of like dumb CCFS)
  Drawbacks:
  - Big traces to deal with
  - When a file is mmapped, can't tell which blocks are accessed
    (the OS does the I/O on page faults, not through system calls)
  - Hard to differentiate read from read-ahead (heuristics)
  - Overhead will slow things down
    harder to find willing subjects
    time dilation may affect trace (disk appears faster than it actually is)

Huge number of file attribute reads
* Attribute reads highly clustered by directory (ls, make)
* Does it matter?
  - How much memory is needed for most to hit in cache? (paper doesn't say)
  If significant, how might you exploit this fact?
    CFFS -- physical file system embeds inodes in directories
  What about when designing a network file system protocol?
    NFS3 READDIRPLUS -- prefetch attributes when listing a directory
    Echo-like leases on attributes probably a good idea

File lifetimes--how do you measure?
* Delete-based (old paper)
* Create-based (this work)
* Delete-based shows shorter lifetimes.  Why the difference?
  People create more files than they delete (disk space used keeps growing),
  so delete-based measurement never sees the long-lived survivors

What do we learn from Figure 2?
- Most Unix blocks last less than one hour
- Most NT blocks last less than 1 sec or more than 1 day
  (recycle bin holds things for a long time)
- A single application can have a big effect on the shape of the curve
  (WEB: database and log files; Netscape database files)

Does this matter?
* sec 4.2.3: "file system designers will need to explore alternatives..."
  Do you agree?
* If no fsync, could save some writes on Unix with a 1-hour write buffer
  (see the coalescing sketch after the next section)
  - Figure 3 shows potential write savings
  - But if the disk is ever idle, the data could have been written in idle time
  - Write-buffer size probably doesn't affect end-to-end application performance
* Maybe it's important for functionality
  - Backup/snapshot services? (Don't back up files less than 1 hour old?)
* Note that sync/fsync bandwidth is small (<10%)
  - So in general delayed writes (including metadata) are a good idea
    XFS/LFS

Most blocks die due to overwrites rather than deletion
* Most overwritten files are multiply overwritten
* Can you take advantage of this predictability?
  - How would XFS/FFS deal with overwrites?
    Is it truncate-and-overwrite, or just overwrite in place?
      probably truncate and overwrite
      XFS/FFS free and reallocate the blocks
      (the two patterns are sketched after this section)
  - Maybe don't free blocks immediately
    (allow "pre-allocated" blocks beyond the file size)
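To make the overwrite question above concrete, here is a minimal C sketch (mine, not from the paper) contrasting the two application patterns: truncate-and-overwrite, which makes an FFS-style allocator free and reallocate the file's blocks, versus overwrite-in-place, which reuses the existing blocks. The path and data are made up for illustration.

    /* Sketch: two ways an application can "overwrite" a file.
     * POSIX only; error handling trimmed for brevity. */
    #include <fcntl.h>
    #include <unistd.h>

    static const char data[8192] = "new contents...";

    /* Pattern 1: truncate and overwrite.
     * O_TRUNC drops the old blocks, so an FFS-style allocator frees them
     * and then allocates fresh blocks for the new data. */
    static void truncate_and_overwrite(const char *path)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return;
        write(fd, data, sizeof data);
        close(fd);
    }

    /* Pattern 2: overwrite in place.
     * The file keeps its size and, in most file systems, its existing
     * block assignments; only the block contents change. */
    static void overwrite_in_place(const char *path)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) return;
        pwrite(fd, data, sizeof data, 0);
        close(fd);
    }

    int main(void)
    {
        truncate_and_overwrite("/tmp/example-file");  /* creates + writes */
        overwrite_in_place("/tmp/example-file");      /* rewrites same blocks */
        return 0;
    }

If overwrites dominate and are usually of the truncate-and-overwrite form, deferring the free (keeping the old blocks "pre-allocated") could avoid repeated allocation work, which is the suggestion in the bullet above.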
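Going back to the fsync/write-delay point in the previous section: below is a toy sketch (my own model, not the paper's methodology) of how a write-behind buffer coalesces repeated overwrites of the same block, so only a handful of writes reach disk. The table size, flush interval, and synthetic workload are all arbitrary.

    /* Toy write-behind buffer: repeated writes to the same (file, block)
     * within the flush interval count as one disk write. */
    #include <stdio.h>

    #define NBUF        128   /* dirty-block table size (arbitrary) */
    #define FLUSH_DELAY 30    /* seconds a block may stay dirty     */

    struct dirty { int file, block, dirty_since; int used; };

    static struct dirty table[NBUF];
    static int disk_writes;   /* writes that actually hit disk */

    /* Record a logical write at time `now` (seconds). */
    static void buffer_write(int file, int block, int now)
    {
        int free_slot = -1;
        for (int i = 0; i < NBUF; i++) {
            if (table[i].used && table[i].file == file && table[i].block == block)
                return;                    /* overwrite coalesced: no new I/O */
            if (!table[i].used && free_slot < 0)
                free_slot = i;
        }
        if (free_slot < 0) { disk_writes++; return; }  /* table full: write through */
        table[free_slot] = (struct dirty){ file, block, now, 1 };
    }

    /* Push out blocks that have been dirty longer than FLUSH_DELAY. */
    static void flush_old(int now)
    {
        for (int i = 0; i < NBUF; i++)
            if (table[i].used && now - table[i].dirty_since >= FLUSH_DELAY) {
                table[i].used = 0;
                disk_writes++;
            }
    }

    int main(void)
    {
        /* An application overwriting block 0 of file 7 once a second for
         * two minutes: 120 logical writes, but only a few reach disk. */
        for (int t = 0; t < 120; t++) {
            buffer_write(7, 0, t);
            flush_old(t);
        }
        flush_old(1 << 30);               /* final flush */
        printf("disk writes: %d\n", disk_writes);
        return 0;
    }

Raising FLUSH_DELAY to 3600 (the paper's one-hour point) reduces this run to a single disk write, which is the kind of saving Figure 3 estimates; an fsync from the application would defeat it by forcing the dirty block out immediately.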
Read cache size
* 4.4.1: "For [non-web] workloads, there is little benefit to increasing
  the cache beyond 16MB."  Do you believe this?
  - Miss rates for some workloads still decrease significantly from 16MB -> 256MB
  - The important question is how it would affect end-to-end performance
    A cache miss is >1,000x more expensive than a hit, so average cost is
    roughly h + 1000(1-h) hit-times: ~41 at a 96% hit rate vs. ~81 at 92%,
    i.e., 96% could be twice as good as 92%
* What are "file read misses"?
* What does the interleaved/non-interleaved comparison tell us?
  - Only 2% difference when multiple clients access the FS concurrently
  - What does this mean for anticipatory scheduling on a file server?
    Hard to say

Read/write domination depends on workload
* What does this mean for LFS?

Effect of memory mapping files
* How do they measure this?
  - In NT it's hard, don't have enough information
  - In Unix: have a trace of mmap, fork, exit system calls
  - Keep track of all files mmapped in one or more processes
  - Don't know page access patterns, but total mmapped bytes are very small
  - Explanation?  Just shared libraries

Dynamic size information
* Most files accessed are small (<16KB)
* More accesses to large files
* Big files are bigger
* They suggest it may be worth redesigning the inode structure (XFS seems good)

Access patterns
* Very few accesses contain both reads and writes
* Except in RES, large files mostly accessed randomly
  - Need a heuristic for prefetching
* <20KB files are read in their entirety
  - Could modify file systems to do this always
    (see the sketch at the end of these notes)
    How would this interact with anticipatory scheduling?
      ordinarily prefetching doesn't kick in immediately
      this was a big benefit of anticipatory scheduling in the web workload
* Figure 9: bimodal access patterns

UBM revisited
* How does this compare to the results in the UBM paper?
  - UBM paper had much lower hit rates than, e.g., Figure 5
  - It concentrated on disk-bound workloads
* In practice, looping is far more common than sequential access
  - Marginal gain from keeping SEQ blocks != 0
* Flush buffers from multiply-overwritten files that aren't read?
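The "small files are read in their entirety" observation suggests a simple policy: on the first access to a small file, fetch the whole thing. Below is a minimal user-level approximation (my sketch, not a proposal from the paper) using standard POSIX calls; the 20KB threshold comes from the notes above, and posix_fadvise is only a hint, so an in-kernel policy would look different.

    /* Sketch: if a file is small, ask the OS to prefetch all of it at open
     * time, on the theory (from the trace data) that small files are almost
     * always read in their entirety. */
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define SMALL_FILE_LIMIT (20 * 1024)   /* 20KB threshold from the notes */

    int open_with_small_file_prefetch(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0 && st.st_size <= SMALL_FILE_LIMIT) {
            /* Hint: we expect to read the whole file soon.  The kernel may
             * start readahead for the entire file instead of ramping up. */
            posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
        }
        return fd;
    }

One connection to the anticipatory-scheduling question above: fetching the whole file up front means prefetching kicks in on the first access rather than only after a sequential run has been detected.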