Easy question to start: title?

When was it written? 1 year ago? 10? 20?
  1991 ASPLOS
  Why does it matter when it was written?

What kind of people wrote it -- CPU or O/S designers?
  O/S.
  They have much more to say about what CPU screws up than O/S.

What's the main *point* of this paper?
  What are the authors trying to convince us of?
  O/S performance has improved less than app perf w/ faster CPUs
  They explain why
  They suggest what to do about it

What are the main reasons they claim?
  New RISC machines don't support microkernels well.
  CPU designs that improve apps irrelevant to kernel speed.
    Because kernels are different...
    We're going to want to know how.

Quick microkernel overview
  tiny kernel, most stuff in servers
  many more system calls
  some tricks harder to play, since most o/s code isn't privileged
    e.g. shared address space

What kind of evidence or reasoning could we expect at this point?
  1. year-to-year performance of apps, o/s
     What does that tell us?
     Just that there's a problem, not why.
  2. detailed CPU time breakdowns for operations
     risc vs cisc
     monolithic vs micro-kernel
     to help understand why
  How are we going to decide if this is an important problem?
    Suppose system calls are getting relatively slower
    Is that actually a big deal?
  3. So we're looking for big-picture evaluation as well.
     They do cite "lots of time in O/S" studies.

What kinds of solution might we look for?
  Fix O/S to work better with RISC.
  Fix RISC to work better with O/S.

Let's make two tables:
  Problems with CPUs.
    large register sets (sparc windows)
    deep pipelines (88000 o/s must save 30 regs of pipeline state)
    no h/w vectoring (in MIPS)
    limited write buffers (R2000, fixed in R3000)
    cpu speed vs memory speed
    i860 page fault handler must interpret to find faulting addr
    caches that have to be flushed during address space switches
  Problems with O/S.
    none mentioned?

Now let's look at the evidence they present.

What does Table 1 tell us?
  Where do the "Time" numbers come from?
  Where do the "Relative Speed" numbers come from?
  In an ideal world, what would Table 1 look like?
  How can we assign blame based on Table 1?
    Are numbers fundamental to h/w?
    Or is the point that we could optimize s/w?
    (remember, they tuned the s/w, they think it's the best possible)
  Do they explain *why* Table 1 looks the way it does?
    Or what to do about it?
    In succeeding sections...

Let's focus on
  2.2: Local communication
  2.3 and 2.4: System calls
  (this is where the real meat is)

What are the steps required to send msg from P1 to P2? (Table 4...)
  P1 makes system call
  kernel copies data from P1?
  P1 sleeps in kernel
  kernel switches to (waiting?) P2 kernel half
  kernel copies data to P2?
  return from P2 system call into P2

We're looking for
  Ways in which RISC supports this less well than CISC
  Ways in which microkernel implements this less well than monolithic

What are the problems they mention? (2.3, Table 5, 2.4)
  large register sets (sparc windows)
  deep pipelines (88000 o/s must save 30 regs of pipeline state)
  no h/w vectoring (in MIPS)
  limited write buffers (R2000, fixed in R3000)
  cpu speed vs memory speed
  caches that have to be flushed during address space switches

How do they establish that the mentioned problems are actually responsible?
  For the most part they do not!
  Would this have been straightforward?

Section 5: proof that all this matters?
  Doesn't matter if traps are slow if they are rare.
  Section 5 a little weak.
  Counts how frequent traps &c are.
  Extrapolates w/ Mach 2.5 -> Mach 3.0. Lame.
  Table 7 shows that maybe 20% of total CPU time in o/s primitives.
    Is this a lot? Maybe not.

They claim:
  O/S using traps &c more (microkernels).
  CPUs making traps &c relatively more expensive.
  Can't continue both trends indefinitely.
  That was 1991. Which trend won?

What did we learn from this paper?
  Lots of performance details.
  Choice of O/S and CPU abstractions matters.
  System-level view: combined CPU-O/S-application behavior

Was it a good paper?
  Clearly written?
  Clear statement of goals/problem/method/ideas?

Does performance matter?
  When does performance matter?
  Google runs service on 10,000 PCs...
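
Back-of-envelope for the Section 5 question ("Doesn't matter if traps are slow if they are rare"):
  A rough Amdahl-style sketch, not anything from the paper.
  The 20% figure is the "maybe 20%" these notes quote from Table 7.
  The speedup factors (apps 4x, o/s primitives 2x) are made-up numbers for illustration.
  Function names are hypothetical.

```python
def overall_speedup(os_fraction, app_speedup, os_speedup):
    """Whole-workload speedup when the application part and the
    O/S-primitive part of run time speed up by different factors.

    os_fraction is the share of the ORIGINAL run time spent in O/S
    primitives (traps, context switches, ...), between 0 and 1.
    """
    new_time = (1.0 - os_fraction) / app_speedup + os_fraction / os_speedup
    return 1.0 / new_time


def os_fraction_after(os_fraction, app_speedup, os_speedup):
    """Share of the NEW run time spent in O/S primitives."""
    app_time = (1.0 - os_fraction) / app_speedup
    os_time = os_fraction / os_speedup
    return os_time / (app_time + os_time)


if __name__ == "__main__":
    # Assumed inputs: 20% from the notes; 4x and 2x are hypothetical.
    frac, app_x, os_x = 0.20, 4.0, 2.0
    print(f"overall speedup: {overall_speedup(frac, app_x, os_x):.2f}x")
    print(f"o/s share after: {os_fraction_after(frac, app_x, os_x):.0%}")
```

  With these made-up inputs: overall speedup is about 3.3x instead of 4x,
  and the o/s share of run time grows from 20% to about 33%.
  So "is 20% a lot?" depends on how far the two speedups diverge.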