======================= Anticipatory Scheduling =======================

Consider two programs, p & q
  p makes random, synchronous writes to a 10MB file, takes 5 seconds to run
  q makes random, synchronous writes to a different 10MB file, takes 5 seconds
  How long does it take to run p then q?  10 seconds
  How long does it take to run p & q simultaneously?  Longer than 10 seconds
    Requests alternate between the two processes
    Longer seeks -> lower disk throughput

What happens if you run three programs, p, q, & r?
  Depends on the scheduling algorithm
    CSCAN - services the processes in round-robin order
    SPTF - picks two processes and alternates between them?

What is a proportional-share scheduler?
  Virtual clock that ticks more slowly for high-priority processes
  Scheduler attempts to keep the processes' virtual clocks in sync
  Say p & q have priorities assigned in the ratio 1:2
    Will q make significantly faster progress than p?
    No, because disk requests will still alternate between the two processes
    Problem: it is a work-conserving scheduler

What is a work-conserving disk scheduler?
  One that always keeps the disk busy whenever there are pending requests

How to do better?  Basic idea of anticipatory scheduling (AS): p. 4
  Choose the request to schedule as usual
  Possibly delay dispatching the next request, if the process that issued the
    previous request may issue another one soon

What assumptions does the paper make?
  E.g., would AS work if random processes accessed random files?  No; assume:
    1. Dependent disk requests come from the same process
    2. Most of a process's requests have similar spatial/temporal locality
  Why 1?  Why 2?

What kinds of workloads benefit from AS?
  Two processes each sequentially reading data from a large file?
    - AS won't have much benefit beyond readahead
  Two processes each randomly reading from a large file?
    - Yes!  Blocks within one file are probably closer together than blocks
      in different files
  Two processes each randomly writing to a large file?
    - No, buffering will coalesce the writes anyway
  Untarring a directory?
    - Yes, many synchronous writes to the same directory/cylinder group

SPTF: When should you delay a disk request?
  When the benefit is greater than the cost!  Estimate (equations p. 5):
    benefit = (positioning time of the best queued request)
              - (positioning time of the request you may get by waiting)
    cost = max(0, expected median "think time" - elapsed time)
  How to estimate think time?  (p. 5)
    Keep 30 per-process "buckets" representing 500 usec intervals
    Decay the buckets to 90% of their value at each new request by the process
  How do you predict the positioning time of the next queued request?
    (p. 6 / sec. 3.6)  You estimate it to 75%--but that's good enough.  Why?
  How do you predict the positioning time of the future request you may get
    by waiting?
    Weighted average per process (measured on request completion), decayed so
    that 95% of the old value is forgotten after 10 requests
  How long should you wait for the next request?
    (Expected 95th-percentile think time - elapsed time)
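To make the SPTF cost/benefit test concrete, here is a minimal Python sketch
of the wait decision, assuming the estimates above (median think time from
decayed 500 usec buckets, positioning time from a decayed per-process
average).  The names (ProcStats, sptf_should_wait, etc.) are invented for
illustration; this is not the paper's implementation.

    BUCKET_USEC = 500              # each think-time bucket covers 500 microseconds
    NUM_BUCKETS = 30               # 30 per-process buckets; the last catches the rest
    BUCKET_DECAY = 0.90            # decay buckets to 90% at each new request
    POS_DECAY = 0.05 ** (1 / 10)   # forget 95% of the old positioning estimate
                                   # over 10 requests

    class ProcStats:
        """Per-process statistics used by the anticipation heuristic (invented)."""
        def __init__(self):
            self.buckets = [0.0] * NUM_BUCKETS   # decayed think-time histogram
            self.avg_pos_usec = 0.0              # decayed average positioning time

        def record_think_time(self, usec):
            # Age the histogram, then credit the bucket for this think time.
            self.buckets = [BUCKET_DECAY * b for b in self.buckets]
            self.buckets[min(int(usec // BUCKET_USEC), NUM_BUCKETS - 1)] += 1.0

        def record_positioning_time(self, usec):
            # Weighted average, measured when a request from this process completes.
            self.avg_pos_usec = POS_DECAY * self.avg_pos_usec + (1 - POS_DECAY) * usec

        def think_time_percentile(self, p):
            # Smallest bucket boundary covering fraction p of the decayed mass.
            total = sum(self.buckets)
            if total == 0:
                return 0.0
            running = 0.0
            for i, count in enumerate(self.buckets):
                running += count
                if running / total >= p:
                    return (i + 1) * BUCKET_USEC
            return NUM_BUCKETS * BUCKET_USEC

    def sptf_should_wait(stats, elapsed_usec, best_queued_pos_usec):
        # benefit: positioning time saved if the anticipated request arrives
        benefit = best_queued_pos_usec - stats.avg_pos_usec
        # cost: expected remaining idle time, from the median think time
        cost = max(0.0, stats.think_time_percentile(0.5) - elapsed_usec)
        return benefit > cost

    def max_wait_usec(stats, elapsed_usec):
        # Bound the wait by the 95th-percentile think time, minus time waited.
        return max(0.0, stats.think_time_percentile(0.95) - elapsed_usec)

In this sketch the scheduler would call sptf_should_wait whenever the disk
goes idle with requests still queued, and give up once max_wait_usec drops
to zero.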
Aged SPTF
  Like SPTF, except don't wait if another queued request has aged sufficiently

CSCAN
  Additionally maintain the expected direction of the next seek
  If the next seek is 80% likely to go backwards, don't wait
  How well does this work?  For random reads, not very well

Proportional-share schedulers
  Delay the request if
    (a) the last process has no pending request, and
    (b) its expected think time is less than 3 ms, and
    (c) the last process has a lower virtual clock than minclock
        (minclock = lowest clock of any process with a pending request)
  Limitation: never waits for processes with a lower share
    (so only a partial reduction in seeks)
  Solution: combine the proportional-share and SPTF anticipation heuristics
    - Wait if either the proportional-share or the SPTF heuristic would wait
    - Use a "relaxation threshold T" to change (c) to:
        last process has a lower clock than minclock + T
    - Compute the wait time only up to the point where clock >= minclock + T
    (A sketch of the combined test appears at the end of these notes.)

Performance
  - Figure 4: Is less disk utilization less good?
    No, because shorter seeks give more throughput for less utilization
    Why the distinction between read and mmap?
      The OS does not prefetch disk blocks brought in with mmap
    Why does AS do better even on reads, where prefetch is enabled?
      The OS splits large files over "cylinder groups" and won't prefetch
      across groups (could possibly be fixed)
  - Explain Figure 5 (seq/mmap with varying think times for both processes)
  - Explain Figure 6 (seq/mmap with varying think time for one process)
    Why does "original" throughput go up with increased think time?
      The fast process can issue two disk requests while the slow one computes
    Why is there a dip in AS performance?
      It matches the disk idleness in the right graph--the disk has to wait
      for the computation
      Once the think time is very large, AS stops waiting
    Is the AS behavior really what you want here?
      Depends--high throughput, but very unfair scheduling
      How might you tweak AS to be more fair?
  - Why does the compile go slower in Figure 8?
    CPU processing of timer interrupts and of the heuristic computation
  - How well does the Apache web server fit the AS assumptions?
    Files accessed by a web server don't have locality, but readahead on
    medium-sized files does not kick in soon enough!
    This could be fixed without AS.  How?

How is the evaluation?  Would you use this?
  Any hidden catches?  (A workload where you might do very poorly?)
  Fig. 7 shows an application that purposefully exceeds the maximum wait time
    AS throughput is not too bad
    Note the handling of large requests, which get broken up
  Is Fig. 7 really the worst-case scenario?
    Note the cost estimate is based on the median, not the mean, think time
    So the worst case would be to create a low median but very high
      95th-percentile think time (e.g., usually issue the next request almost
      immediately but occasionally compute for a long time: the median-based
      cost stays near zero, so the scheduler keeps waiting, yet the disk idles
      for the full anticipation timeout on the slow requests)

Will anticipatory scheduling continue to be useful in the future?
  + Seek time is improving more slowly than other system aspects
    -> deceptive idleness becomes a bigger problem
  + Faster CPUs -> shorter think times
  - Bigger track buffers / more readahead -> data may be getting cached anyway
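A similarly hedged sketch of the combined proportional-share + SPTF test with
relaxation threshold T, referenced above.  SchedState, clock_rate_per_usec,
and the assumption that the waiting process's virtual clock is charged for
idle time are inventions of this sketch, not taken from the paper; sptf_wait
stands for the result of sptf_should_wait from the earlier sketch.

    from dataclasses import dataclass

    THINK_LIMIT_USEC = 3000          # condition (b): expected think time under 3 ms

    @dataclass
    class SchedState:                # invented container for per-process state
        has_pending_request: bool
        expected_think_usec: float
        virtual_clock: float         # ticks more slowly for higher-share processes

    def prop_share_should_wait(last, pending, T):
        # Conditions (a)-(c), with (c) relaxed by the threshold T.
        if last.has_pending_request:                      # (a) fails: just dispatch
            return False
        if last.expected_think_usec >= THINK_LIMIT_USEC:  # (b) fails: won't ask soon
            return False
        minclock = min(p.virtual_clock for p in pending)  # lowest clock among procs
                                                          # with pending requests
        return last.virtual_clock < minclock + T          # (c), relaxed by T

    def combined_should_wait(last, pending, T, sptf_wait):
        # Wait if either the proportional-share or the SPTF heuristic would wait.
        return prop_share_should_wait(last, pending, T) or sptf_wait

    def prop_share_max_wait_usec(last, pending, T, clock_rate_per_usec):
        # Wait only until the last process's clock would reach minclock + T,
        # assuming (an assumption of this sketch) that its clock advances at
        # clock_rate_per_usec while the disk idles on its behalf.
        minclock = min(p.virtual_clock for p in pending)
        return max(0.0, (minclock + T - last.virtual_clock) / clock_rate_per_usec)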