======================= Anticipatory Scheduling =======================

Consider two programs, p & q
  p makes random, synchronous writes to a 10MB file, takes 5 seconds to run
  q makes random, synchronous writes to a different 10MB file, takes 5 seconds
  How long does it take to run p then q?  10 seconds
  How long does it take to run p & q simultaneously?  Longer than 10 seconds
    Requests alternate between the two processes
    Longer seeks -> lower disk throughput

What happens if you run three programs, p, q, & r?
  Depends on the scheduling algorithm
    CSCAN - services the processes in round-robin order
    SPTF - picks two processes and alternates between them?

What is a proportional-share scheduler?
  Virtual clock that ticks more slowly for high-priority processes
  Scheduler attempts to keep the processes' virtual clocks in sync
  Say p & q have priorities assigned in the ratio 1:2
    Will q make significantly faster progress than p?
    No, because disk requests will still alternate between the two processes
    Problem: it is a work-conserving scheduler

What is a work-conserving disk scheduler?
  One that always keeps the disk busy whenever there are pending requests

How to do better?  Basic idea of anticipatory scheduling (AS): p. 4
  Choose the request to schedule as usual
  Possibly delay dispatching the next request, if the process that issued the
    previous request may issue another one soon

What assumptions does the paper make?
  E.g., would AS work if random processes accessed random files?  No; assume:
    1. Dependent disk requests come from the same process
    2. Most of a process's requests have similar spatial/temporal locality
  Why 1?  Why 2?

What kinds of workloads benefit from AS?
  Two processes each sequentially reading data from a large file?
    - AS won't have much benefit beyond readahead
  Two processes each randomly reading from a large file?
    - Yes!  Blocks within one file are probably closer together than blocks
      in different files
  Two processes each randomly writing to a large file?
    - No, buffering will coalesce the writes anyway
  Untarring a directory?
    - Yes, many synchronous writes to the same directory/cylinder group

SPTF: When should you delay a disk request?
  When the benefit is greater than the cost!  Estimate (equations p. 5):
    benefit = (positioning time of the best queued request)
              - (positioning time of the request you may get by waiting)
    cost = max(0, expected median "think time" - elapsed time)
  How to estimate think time?  (p. 5)
    Keep 30 per-process "buckets" representing 500 usec intervals
    Decay the buckets to 90% of their value at each new request by the process
  How do you predict the positioning time of the next queued request?
    (p. 6 / sec. 3.6)  You estimate it to 75%--but that's good enough.  Why?
  How do you predict the positioning time of the future request you may get
    by waiting?
    Weighted average per process (measured on request completion), decayed so
    that 95% of the old value is forgotten after 10 requests
  How long should you wait for the next request?
    (Expected 95th-percentile think time - elapsed time)
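To make the SPTF cost/benefit test concrete, here is a minimal Python sketch
of the wait decision, assuming the estimates above (median think time from
decayed 500 usec buckets, positioning time from a decayed per-process
average).  The names (ProcStats, sptf_should_wait, etc.) are invented for
illustration; this is not the paper's implementation.

    BUCKET_USEC = 500              # each think-time bucket covers 500 microseconds
    NUM_BUCKETS = 30               # 30 per-process buckets; the last catches the rest
    BUCKET_DECAY = 0.90            # decay buckets to 90% at each new request
    POS_DECAY = 0.05 ** (1 / 10)   # forget 95% of the old positioning estimate
                                   # over 10 requests

    class ProcStats:
        """Per-process statistics used by the anticipation heuristic (invented)."""
        def __init__(self):
            self.buckets = [0.0] * NUM_BUCKETS   # decayed think-time histogram
            self.avg_pos_usec = 0.0              # decayed average positioning time

        def record_think_time(self, usec):
            # Age the histogram, then credit the bucket for this think time.
            self.buckets = [BUCKET_DECAY * b for b in self.buckets]
            self.buckets[min(int(usec // BUCKET_USEC), NUM_BUCKETS - 1)] += 1.0

        def record_positioning_time(self, usec):
            # Weighted average, measured when a request from this process completes.
            self.avg_pos_usec = POS_DECAY * self.avg_pos_usec + (1 - POS_DECAY) * usec

        def think_time_percentile(self, p):
            # Smallest bucket boundary covering fraction p of the decayed mass.
            total = sum(self.buckets)
            if total == 0:
                return 0.0
            running = 0.0
            for i, count in enumerate(self.buckets):
                running += count
                if running / total >= p:
                    return (i + 1) * BUCKET_USEC
            return NUM_BUCKETS * BUCKET_USEC

    def sptf_should_wait(stats, elapsed_usec, best_queued_pos_usec):
        # benefit: positioning time saved if the anticipated request arrives
        benefit = best_queued_pos_usec - stats.avg_pos_usec
        # cost: expected remaining idle time, from the median think time
        cost = max(0.0, stats.think_time_percentile(0.5) - elapsed_usec)
        return benefit > cost

    def max_wait_usec(stats, elapsed_usec):
        # Bound the wait by the 95th-percentile think time, minus time waited.
        return max(0.0, stats.think_time_percentile(0.95) - elapsed_usec)

In this sketch the scheduler would call sptf_should_wait whenever the disk
goes idle with requests still queued, and give up once max_wait_usec drops
to zero.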
Aged SPTF
  Like SPTF, except don't wait if another queued request has aged sufficiently

CSCAN
  Additionally maintain the expected direction of the next seek
  If the next seek is 80% likely to go backwards, don't wait
  How well does this work?  For random reads, not very well

Proportional-share schedulers
  Delay the request if
    (a) the last process has no pending request, and
    (b) its expected think time is less than 3 ms, and
    (c) the last process has a lower virtual clock than minclock
        (minclock = lowest clock of any process with a pending request)
  Limitation: never waits for processes with a lower share
    (so only a partial reduction in seeks)
  Solution: combine the proportional-share and SPTF anticipation heuristics
    - Wait if either the proportional-share or the SPTF heuristic would wait
    - Use a "relaxation threshold T" to change (c) to:
        last process has a lower clock than minclock + T
    - Compute the wait time only up to the point where clock >= minclock + T
    (A sketch of the combined test appears at the end of these notes.)

Performance
  - Figure 4: Is less disk utilization less good?
    No, because shorter seeks give more throughput for less utilization
    Why the distinction between read and mmap?
      The OS does not prefetch disk blocks brought in with mmap
    Why does AS do better even on reads, where prefetch is enabled?
      The OS splits large files over "cylinder groups" and won't prefetch
      across groups (could possibly be fixed)
  - Explain Figure 5 (seq/mmap with varying think times for both processes)
  - Explain Figure 6 (seq/mmap with varying think time for one process)
    Why does "original" throughput go up with increased think time?
      The fast process can issue two disk requests while the slow one computes
    Why is there a dip in AS performance?
      It matches the disk idleness in the right graph--the disk has to wait
      for the computation
      Once the think time is very large, AS stops waiting
    Is the AS behavior really what you want here?
      Depends--high throughput, but very unfair scheduling
      How might you tweak AS to be more fair?
  - Why does the compile go slower in Figure 8?
    CPU processing of timer interrupts and of the heuristic computation
  - How well does the Apache web server fit the AS assumptions?
    Files accessed by a web server don't have locality, but readahead on
    medium-sized files does not kick in soon enough!
    This could be fixed without AS.  How?

How is the evaluation?  Would you use this?
  Any hidden catches?  (A workload where you might do very poorly?)
  Fig. 7 shows an application that purposefully exceeds the maximum wait time
    AS throughput is not too bad
    Note the handling of large requests, which get broken up
  Is Fig. 7 really the worst-case scenario?
    Note the cost estimate is based on the median, not the mean, think time
    So the worst case would be to create a low median but very high
      95th-percentile think time (e.g., usually issue the next request almost
      immediately but occasionally compute for a long time: the median-based
      cost stays near zero, so the scheduler keeps waiting, yet the disk idles
      for the full anticipation timeout on the slow requests)

Will anticipatory scheduling continue to be useful in the future?
  + Seek time is improving more slowly than other system aspects
    -> deceptive idleness becomes a bigger problem
  + Faster CPUs -> shorter think times
  - Bigger track buffers / more readahead -> data may be getting cached anyway
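A similarly hedged sketch of the combined proportional-share + SPTF test with
relaxation threshold T, referenced above.  SchedState, clock_rate_per_usec,
and the assumption that the waiting process's virtual clock is charged for
idle time are inventions of this sketch, not taken from the paper; sptf_wait
stands for the result of sptf_should_wait from the earlier sketch.

    from dataclasses import dataclass

    THINK_LIMIT_USEC = 3000          # condition (b): expected think time under 3 ms

    @dataclass
    class SchedState:                # invented container for per-process state
        has_pending_request: bool
        expected_think_usec: float
        virtual_clock: float         # ticks more slowly for higher-share processes

    def prop_share_should_wait(last, pending, T):
        # Conditions (a)-(c), with (c) relaxed by the threshold T.
        if last.has_pending_request:                      # (a) fails: just dispatch
            return False
        if last.expected_think_usec >= THINK_LIMIT_USEC:  # (b) fails: won't ask soon
            return False
        minclock = min(p.virtual_clock for p in pending)  # lowest clock among procs
                                                          # with pending requests
        return last.virtual_clock < minclock + T          # (c), relaxed by T

    def combined_should_wait(last, pending, T, sptf_wait):
        # Wait if either the proportional-share or the SPTF heuristic would wait.
        return prop_share_should_wait(last, pending, T) or sptf_wait

    def prop_share_max_wait_usec(last, pending, T, clock_rate_per_usec):
        # Wait only until the last process's clock would reach minclock + T,
        # assuming (an assumption of this sketch) that its clock advances at
        # clock_rate_per_usec while the disk idles on its behalf.
        minclock = min(p.virtual_clock for p in pending)
        return max(0.0, (minclock + T - last.virtual_clock) / clock_rate_per_usec)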