Stride Scheduling
=================

What's wrong with the BSD scheduler?
- Hard to provide isolation / prevent interference
- Priorities are absolute
- Can't transfer priority (e.g., to a server on RPC)
- No flexible control
  - E.g., in Monte Carlo simulations, the error is 1/sqrt(N) after N trials
    - Want to get a quick estimate from a new computation
    - Leave a bunch running for a while to get more accurate results
  - Multimedia applications
    - Often fall back to degraded quality levels depending on resources
    - Want to control the quality of different streams

What's wrong with lottery scheduling?
- Randomization: expected relative and absolute error are O(sqrt(n_a)) after
  n_a allocations

What are relative & absolute error?  How do you calculate them?
- E.g., process A should have had 1/3 of the CPU, yet after 1 minute has had
  only 19 seconds (its fair share was 20 seconds, so it is 1 second behind)
- Absolute error = sum of all such errors
- Relative error = the same difference computed between a pair of processes

Why does lottery scheduling have O(sqrt(n_a)) error?
- The probability of getting k out of n quanta is the binomial distribution:
    (n choose k) * p^k * (1-p)^(n-k)      [n choose k = n!/(k!(n-k)!)]
- For large n, the binomial distribution is approximately normal
- Variance for a single allocation (a Bernoulli trial with success
  probability p):
    p(1-p)^2 + (1-p)p^2 = p(1-p)(1-p+p) = p(1-p)
- Variance for n allocations = np(1-p), so the standard deviation is
  O(sqrt(n))

What is stride scheduling?
- Idea: every process has:
    tickets = priority assigned by the administrator
    stride  = stride1 / tickets      (stride1 is a large fixed constant)
    pass    = roughly how much CPU time the process has used
- Schedule the process c with the lowest pass, then set c->pass += c->stride
- Example: look at Figure 2
- Benefits:
  - Good control over resource allocation
  - Can transfer tickets on an IPC
  - Currency, inflation - a nice model for divvying up resources among
    users, letting users then sub-allocate whatever resources they have

What happens if you give tickets to a process or remove them?
- Can you just recompute the stride and be done with it?
- No.  E.g., if a low-priority process gets more tickets, it still sees a
  long latency before its old (large) pass value comes up again
  - Bad for IPC, since a ticket transfer won't help you much
- Idea: the remaining portion of your pass should accrue at the new ticket
  rate:
    old_stride = c->stride;
    c->tickets = new_tickets;
    c->stride = stride1 / c->tickets;
    remain = c->pass - global_pass;
    remain *= c->stride / old_stride;    /* i.e., new_stride / old_stride */
    c->pass = global_pass + remain;

What if you go to sleep and don't use your full quantum?
- If you use fraction f of the time quantum, advance pass by f * stride
- To go to sleep, why not just set your # of tickets to zero?
  - Sort of what you want to do, except it would require division by zero
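To make this concrete, here is a minimal C sketch of the mechanics above:
picking the client with the minimum pass, charging for partial quanta, and
rescaling pass on a ticket change.  The struct layout, names, QUANTUM
scaling, and STRIDE1 value are illustrative assumptions rather than the
paper's exact code, and maintenance of global_pass is elided:

    #include <stddef.h>
    #include <stdint.h>

    #define STRIDE1 (1u << 20)   /* large constant; stride = STRIDE1/tickets */
    #define QUANTUM 1024          /* 'used' == QUANTUM means a full quantum */

    struct client {
        uint32_t tickets;         /* set by the administrator */
        uint32_t stride;          /* STRIDE1 / tickets */
        uint64_t pass;            /* virtual time of next service */
    };

    static uint64_t global_pass;  /* advances at the aggregate ticket rate;
                                     its maintenance is omitted here */

    /* Pick the runnable client with the minimum pass value. */
    struct client *pick_next(struct client *c, size_t n)
    {
        struct client *best = &c[0];
        for (size_t i = 1; i < n; i++)
            if (c[i].pass < best->pass)
                best = &c[i];
        return best;
    }

    /* Charge a client after it runs; a client that slept early is charged
       only for the fraction of the quantum it actually used. */
    void charge(struct client *c, uint32_t used)
    {
        c->pass += (uint64_t)c->stride * used / QUANTUM;
    }

    /* Change a client's ticket allocation: scale the remaining portion of
       its pass so it accrues at the new rate from now on.  new_tickets must
       be nonzero (cf. the division-by-zero note above). */
    void set_tickets(struct client *c, uint32_t new_tickets)
    {
        uint32_t old_stride = c->stride;
        c->tickets = new_tickets;
        c->stride = STRIDE1 / new_tickets;
        uint64_t remain = c->pass - global_pass;
        c->pass = global_pass + remain * c->stride / old_stride;
    }

A real implementation would keep clients in a priority queue ordered by
pass, making each decision O(log n) rather than the O(n) scan shown here.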
Does it work?  What are we seeing in Figure 9?
- Lottery scheduling seems to drift from its allocations
- The curves look O(sqrt(n_a)), so this is as expected

Why do we care?  Is this useful?
- What questions should we be asking when we evaluate stride scheduling?
- How will it impact applications we actually care about?  In particular:
  1. Overall system throughput
  2. Response time for applications that need it
  3. Fairness / adherence to the prescribed policy

How does the system stack up?
1. Maybe 0.2% slower than Linux... but both schedulers are suboptimal
   (So probably we don't care here, if no one bothered to optimize)
2. What evidence do we have that response time is good?
   - Figures 11 and 12 -- looks better than lottery scheduling
   - How would the BSD scheduler have done?
     - Better, but of course the allocations would have been 1:1
   - Are we interested in these workloads?
     - If I'm compiling tcpproxy, will my emacs respond quickly?
     - Hard to say based purely on the data in the paper
3. How about adherence to policy?
   - Looks pretty good, but only on the workloads they show

What happens if 100 processes have ticket allocations 100:1:1:...:1?
- High-priority process P0 runs for 100 quanta, then each low-priority
  process gets one quantum, then repeat
- Might alternatively have scheduled P0, P1, P0, P2, P0, P3, P0, P4, ...
- Is the basic 100q/1q/1q/... schedule good or bad?  (take a vote)
  - Good: for a CPU-bound workload, fewer context switches => fewer cache
    misses, TLB misses, page faults, etc.
  - Bad: if you care about response time, P0 has to wait 100 quanta

Do we care about response time?  Depends.  When do we care?
- Interactive workloads--100 quanta is too long to wait for keys to echo
- Network protocols--want to overlap latencies
  - A web server might not need a lot of CPU time, but we want to issue
    disk requests as soon as we get network requests
- Multimedia applications--e.g., keep a constant video frame rate

So how do we give high-priority processes low response time?
- Hierarchical stride scheduling (see the sketch at the end of these notes)
  [picture of tree, see how quanta get allocated]
- Now the absolute error is O(log n_c), not O(n_c)

What are we seeing in Figure 13?
- 8 processes, ticket ratios 7:1:1:1:1:1:1:1
- How often do low-priority processes get scheduled?  (every 14 quanta)
- The graph shows the waiting time for the high-priority process
- Hierarchical is better, as we would expect
- How would lottery scheduling look on this experiment?
  - Low-priority processes would have more variance
  - The high-priority process would probably do much better than under
    non-hierarchical stride

Can this be extended to other resources?
- Yes, they discuss an application to networking
- What about VM?  Is it as useful?  (Every page is equivalent; not every
  quantum is)

Consider the following experiment:
- Process A reads random blocks from file F_A (which doesn't fit in cache)
- Process B reads random blocks from file F_B (which doesn't fit in cache)
- Running A or B alone takes 10 seconds; running A then B takes 20 seconds.
  How long does it take to run A & B concurrently?
  - Probably much longer than 20 seconds
  - The file system groups a file's blocks together; two files => longer
    seeks
- What if A has 10 tickets and B has 1 ticket--what are the relative
  allocations?
  - Probably 1:1, NOT 10:1
  - The processes are I/O-bound, not CPU-bound:
      read, sleep, switch, read, sleep, switch, ...
- What is the desired behavior?  Any way to make this work better?

Discussion: Is stride scheduling a good idea?
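The sketch promised above: one way to realize hierarchical stride
scheduling as a binary tree in C.  Each internal node holds the ticket sum
of its children and runs an ordinary two-way stride scheduler over them, so
an allocation is a root-to-leaf walk.  The node layout and STRIDE1 value
are assumptions for illustration, not the paper's code:

    #include <stddef.h>
    #include <stdint.h>

    #define STRIDE1 (1u << 20)

    /* A full binary tree: internal nodes aggregate their children's
       tickets; leaves are the actual clients. */
    struct node {
        struct node *left, *right;   /* both NULL iff this is a leaf */
        uint32_t tickets;            /* leaf: own; internal: left + right */
        uint32_t stride;             /* STRIDE1 / tickets */
        uint64_t pass;
    };

    /* Allocate one quantum: walk from the root, at each level descending
       into the child with the smaller pass, charging every node on the
       path its own stride.  Touches O(log n_c) nodes per quantum. */
    struct node *schedule(struct node *n)
    {
        while (n->left) {            /* internal node */
            n->pass += n->stride;
            n = (n->left->pass <= n->right->pass) ? n->left : n->right;
        }
        n->pass += n->stride;        /* charge the chosen leaf */
        return n;                    /* the client to run */
    }

Ticket changes likewise propagate from a leaf up to the root, updating
tickets, stride, and pass at each ancestor, so dynamic operations also cost
O(log n_c).  Intuitively, each level is a two-client stride scheduler with
constant absolute error, and a client's error is the sum over the
O(log n_c) levels on its path--hence the improved error bound.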