A Comparison of File System Workloads
=====================================

How do you study file system workloads?
* Static analysis
  - Limited in what it can do
* Observe network file system traffic
  - Unobtrusive, don't need source/access to modify software
  - Miss some information--local files, closes (sync. vs. async. writes)
* Instrument operating system
  - HPUX: use system call auditing
  - NT: interpose a file system (kind of like dumb CCFS)
  Drawbacks:
  - Big traces to deal with
  - When a file is mmapped, can't tell which blocks are accessed
    (the OS does the I/O on page faults, not through system calls)
  - Hard to differentiate read from read-ahead (heuristics)
  - Overhead will slow things down
    harder to find willing subjects
    time dilation may affect trace (disk appears faster than it actually is)

Huge number of file attribute reads
* Attribute reads highly clustered by directory (ls, make)
* Does it matter?
  - How much memory is needed for most to hit in cache? (paper doesn't say)
  If significant, how might you exploit this fact?
    CFFS -- physical file system embeds inodes in directories
  What about when designing a network file system protocol?
    NFS3 READDIRPLUS -- prefetch attributes when listing a directory
    Echo-like leases on attributes probably a good idea

File lifetimes--how do you measure?
* Delete-based (old paper)
* Create-based (this work)
* Delete-based shows shorter lifetimes.  Why the difference?
  People create more files than they delete (disk space used keeps growing),
  so delete-based measurement never sees the long-lived survivors

What do we learn from Figure 2?
- Most Unix blocks last less than one hour
- Most NT blocks last less than 1 sec or more than 1 day
  (recycle bin holds things for a long time)
- A single application can have a big effect on the shape of the curve
  (WEB: database and log files; Netscape database files)

Does this matter?
* sec 4.2.3: "file system designers will need to explore alternatives..."
  Do you agree?
* If no fsync, could save some writes on Unix with a 1-hour write buffer
  (see the coalescing sketch after the next section)
  - Figure 3 shows potential write savings
  - But if the disk is ever idle, the data could have been written in idle time
  - Write-buffer size probably doesn't affect end-to-end application performance
* Maybe it's important for functionality
  - Backup/snapshot services? (Don't back up files less than 1 hour old?)
* Note that sync/fsync bandwidth is small (<10%)
  - So in general delayed writes (including metadata) are a good idea
    XFS/LFS

Most blocks die due to overwrites rather than deletion
* Most overwritten files are multiply overwritten
* Can you take advantage of this predictability?
  - How would XFS/FFS deal with overwrites?
    Is it truncate-and-overwrite, or just overwrite in place?
      probably truncate and overwrite
      XFS/FFS free and reallocate the blocks
      (the two patterns are sketched after this section)
  - Maybe don't free blocks immediately
    (allow "pre-allocated" blocks beyond the file size)
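To make the overwrite question above concrete, here is a minimal C sketch (mine, not from the paper) contrasting the two application patterns: truncate-and-overwrite, which makes an FFS-style allocator free and reallocate the file's blocks, versus overwrite-in-place, which reuses the existing blocks. The path and data are made up for illustration.

    /* Sketch: two ways an application can "overwrite" a file.
     * POSIX only; error handling trimmed for brevity. */
    #include <fcntl.h>
    #include <unistd.h>

    static const char data[8192] = "new contents...";

    /* Pattern 1: truncate and overwrite.
     * O_TRUNC drops the old blocks, so an FFS-style allocator frees them
     * and then allocates fresh blocks for the new data. */
    static void truncate_and_overwrite(const char *path)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return;
        write(fd, data, sizeof data);
        close(fd);
    }

    /* Pattern 2: overwrite in place.
     * The file keeps its size and, in most file systems, its existing
     * block assignments; only the block contents change. */
    static void overwrite_in_place(const char *path)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) return;
        pwrite(fd, data, sizeof data, 0);
        close(fd);
    }

    int main(void)
    {
        truncate_and_overwrite("/tmp/example-file");  /* creates + writes */
        overwrite_in_place("/tmp/example-file");      /* rewrites same blocks */
        return 0;
    }

If overwrites dominate and are usually of the truncate-and-overwrite form, deferring the free (keeping the old blocks "pre-allocated") could avoid repeated allocation work, which is the suggestion in the bullet above.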
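Going back to the fsync/write-delay point in the previous section: below is a toy sketch (my own model, not the paper's methodology) of how a write-behind buffer coalesces repeated overwrites of the same block, so only a handful of writes reach disk. The table size, flush interval, and synthetic workload are all arbitrary.

    /* Toy write-behind buffer: repeated writes to the same (file, block)
     * within the flush interval count as one disk write. */
    #include <stdio.h>

    #define NBUF        128   /* dirty-block table size (arbitrary) */
    #define FLUSH_DELAY 30    /* seconds a block may stay dirty     */

    struct dirty { int file, block, dirty_since; int used; };

    static struct dirty table[NBUF];
    static int disk_writes;   /* writes that actually hit disk */

    /* Record a logical write at time `now` (seconds). */
    static void buffer_write(int file, int block, int now)
    {
        int free_slot = -1;
        for (int i = 0; i < NBUF; i++) {
            if (table[i].used && table[i].file == file && table[i].block == block)
                return;                    /* overwrite coalesced: no new I/O */
            if (!table[i].used && free_slot < 0)
                free_slot = i;
        }
        if (free_slot < 0) { disk_writes++; return; }  /* table full: write through */
        table[free_slot] = (struct dirty){ file, block, now, 1 };
    }

    /* Push out blocks that have been dirty longer than FLUSH_DELAY. */
    static void flush_old(int now)
    {
        for (int i = 0; i < NBUF; i++)
            if (table[i].used && now - table[i].dirty_since >= FLUSH_DELAY) {
                table[i].used = 0;
                disk_writes++;
            }
    }

    int main(void)
    {
        /* An application overwriting block 0 of file 7 once a second for
         * two minutes: 120 logical writes, but only a few reach disk. */
        for (int t = 0; t < 120; t++) {
            buffer_write(7, 0, t);
            flush_old(t);
        }
        flush_old(1 << 30);               /* final flush */
        printf("disk writes: %d\n", disk_writes);
        return 0;
    }

Raising FLUSH_DELAY to 3600 (the paper's one-hour point) reduces this run to a single disk write, which is the kind of saving Figure 3 estimates; an fsync from the application would defeat it by forcing the dirty block out immediately.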
Read cache size
* 4.4.1: "For [non-web] workloads, there is little benefit to increasing
  the cache beyond 16MB."  Do you believe this?
  - Miss rates for some workloads still decrease significantly from 16MB -> 256MB
  - The important question is how it would affect end-to-end performance
    A cache miss is >1,000x more expensive than a hit, so average cost is
    roughly h + 1000(1-h) hit-times: ~41 at a 96% hit rate vs. ~81 at 92%,
    i.e., 96% could be twice as good as 92%
* What are "file read misses"?
* What does the interleaved/non-interleaved comparison tell us?
  - Only 2% difference when multiple clients access the FS concurrently
  - What does this mean for anticipatory scheduling on a file server?
    Hard to say

Read/write domination depends on workload
* What does this mean for LFS?

Effect of memory mapping files
* How do they measure this?
  - In NT it's hard, don't have enough information
  - In Unix: have a trace of mmap, fork, exit system calls
  - Keep track of all files mmapped in one or more processes
  - Don't know page access patterns, but total mmapped bytes are very small
  - Explanation?  Just shared libraries

Dynamic size information
* Most files accessed are small (<16KB)
* More accesses to large files
* Big files are bigger
* They suggest it may be worth redesigning the inode structure (XFS seems good)

Access patterns
* Very few accesses contain both reads and writes
* Except in RES, large files mostly accessed randomly
  - Need a heuristic for prefetching
* <20KB files are read in their entirety
  - Could modify file systems to do this always
    (see the sketch at the end of these notes)
    How would this interact with anticipatory scheduling?
      ordinarily prefetching doesn't kick in immediately
      this was a big benefit of anticipatory scheduling in the web workload
* Figure 9: bimodal access patterns

UBM revisited
* How does this compare to the results in the UBM paper?
  - UBM paper had much lower hit rates than, e.g., Figure 5
  - It concentrated on disk-bound workloads
* In practice, looping is far more common than sequential access
  - Marginal gain from keeping SEQ blocks != 0
* Flush buffers from multiply-overwritten files that aren't read?
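The "small files are read in their entirety" observation suggests a simple policy: on the first access to a small file, fetch the whole thing. Below is a minimal user-level approximation (my sketch, not a proposal from the paper) using standard POSIX calls; the 20KB threshold comes from the notes above, and posix_fadvise is only a hint, so an in-kernel policy would look different.

    /* Sketch: if a file is small, ask the OS to prefetch all of it at open
     * time, on the theory (from the trace data) that small files are almost
     * always read in their entirety. */
    #include <fcntl.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define SMALL_FILE_LIMIT (20 * 1024)   /* 20KB threshold from the notes */

    int open_with_small_file_prefetch(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0 && st.st_size <= SMALL_FILE_LIMIT) {
            /* Hint: we expect to read the whole file soon.  The kernel may
             * start readahead for the entire file instead of ramping up. */
            posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);
        }
        return fd;
    }

One connection to the anticipatory-scheduling question above: fetching the whole file up front means prefetching kicks in on the first access rather than only after a sequential run has been detected.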