Nooks
=====

What is the problem this paper is addressing?
  Why drivers?
Does this seem like a viable approach?
  How does it compare to last week's paper on static analysis?
  What about software fault isolation?
  Virtual machines?
    Challenge is virtualizing kernel/driver interface, not hardware
  How might you combine Nooks with previous approaches?

Nooks is trying to find a new design point between unprotected and safe:
  - fault resistance, not fault isolation
  - design for mistakes, not abuse

What are the benefits and drawbacks of this approach?
  + works with today's code
  + performance impact possibly less than, say, full microkernel
  - not 100% effective
  - only applies to extensions, not core kernel (unlike, say, Metal)

What are the three goals of the system?
  Isolation - don't let fault in one extension infect rest of system
  Recovery - support automatic recovery after a fault
  Backwards compatibility - e.g., work with Linux

How does vanilla Linux deal with bugs/assertion failures in kernel?
  Make a distinction between running in process vs. interrupt context
  Process context - kill process
    Note this isn't quite "fair", because bug is kernel bug, not user bug
    But allows some degree of recoverability
  Interrupt context - crash & reboot machine.  Why?

How does Nooks achieve Isolation?
  Use paging hardware to protect kernel & extensions against bad extensions
    See Fig 3: Kernel can write everything, extensions only write themselves
  Use Extension Procedure Call (XPC)
  What do you have to do to call into an extension? (Fig. 4)
    Copy any argument data structures to where extension can write them
      Might need to follow/adjust any pointers in data structures
    Adjust stack pointer
    Load %cr3 with address space of extension
    === run extension
    Switch %cr3 and stack back
    Copy results back; synchronize any modified structures
  What about modifications to non-argument kernel data structures?
    Fortunately, often done through macros and inline functions
    Can change these into XPCs
  Where do page tables come from when loading %cr3?
    Nooks has to maintain a set of "shadow" page tables
    Just change code where Linux touches page tables
    Have to modify page fault handler... how?
    Task structure on kernel stack?
  Could you optimize this process on the x86?
    If extensions are in different 4 MB regions... maybe re-use page tables
      (Just clear PTE_W in page directory entry)
    Or at least do this for some regions (might not work for buffer cache)
    Also, maybe targeted TLB flush instead of %cr3 load?
  What is deferred XPC mechanism?  Where/why does this come in?

What are wrappers?  How do they work?
  Three purposes:
    Check parameters for validity
    Implement call-by-value-result?  (What's this vs. call by reference?)
    Perform XPC
  Basically works through linker
  Who writes a wrapper?
    Tool auto-generates skeleton from header
    Fill in by hand
    Need to know properties (Perhaps extractable by Metal)
  How specific to each extension is the wrapper?  See Fig. 5

What is Object Tracking and why?
  Records address/type of all objects in use by an extension
    If used for call, just attach to stack
    If held, keep in per-extension hash table
    If ext. might write object, keep association between kern & ext. versions
  How do you know lifetime of objects?
    By hand inspection - determine type of object
      passed in for call, allocated/deallocated by ext., special (timer), ...
  Do you always copy objects?
    No... more efficient just to re-map network & disk buffers

How do you detect a fault?
  Easy cases...
    page fault or other exception in extension
  What about harder cases... e.g., no network packets received
    User can detect and initiate recovery

How do you recover from a fault?
  - Disable any interrupts vectored to the extension, if driver
    (what if you didn't do this... could get livelock or worse)
  - Invoke user-mode recovery agent
    Perform extension-specific recovery, notify sysadmin,
    change configuration, disable after repeated failures, ...
    By default, unloads and re-loads module
  What's this about interruptible vs. non-interruptible state?
  What about allocated memory?  (This is why we need object tracking)
  What about things like network buffers with pending DMA?
    Only free buffers after re-loading driver,
      once it has re-initialized the device

How could an extension bypass Nooks to corrupt system?
  Set %esp to something bad and take an exception (what happens on x86?)
  DMA to physical memory you can't write
  Move something to %cr3
  Disable interrupts and loop forever
  Logic bugs that don't involve trashing memory

How do you evaluate something like this?
  Care about whether it improves Reliability, and cost in Performance
  How to measure reliability?
    Is this realistic?
    How do results look?  Too optimistic or pessimistic?
  Performance... Let's look at Table 4:
    Play-mp3 looks good
    Why does send-stream have more XPCs than receive-stream?
      Why does this not matter for performance?
    Why does compile take bigger hit than send-stream (which has more XPCs)?
      Compile is CPU-bound
  How did they produce graph in Figure 8?
    What is statistical profiling?
    What does this tell us?
    Why don't they show user-mode execution time?
  Where is CPU time going?
    Extra code -- e.g., XPC, object tracking
    Existing code running more slowly?  Why?
      TLB misses; What are "Pentium 4 performance counters"?
  Why does khttpd do so much worse under Nooks?  (60% worse, ouch)
    CPU problem, like compile
    Also, transactional, not buffered... how does this affect things?
    Do we care?
      khttpd does sound like a bogus project
      Maybe use exokernel/Cheetah on dedicated hardware if you care so much...

Would same ideas apply to other OSes?
  Authors claim Linux is worst case scenario?  Why?
  Do we believe this?
    In terms of lots of ill-defined extension interfaces, probably true
    That Linux doesn't reboot on process-context panic might help, though
  Could Nooks be applied to the JOS kernel?
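
Appendix sketch: the wrapper/XPC steps discussed in these notes (check parameters, copy arguments to where the extension can write them, run the extension, copy results back) can be illustrated with a small user-level simulation. This is only a sketch in Python, not Nooks code. `ObjectTracker`, `xpc_call`, `good_driver`, and `buggy_driver` are hypothetical names, plain dicts stand in for kernel structures, and a Python exception stands in for a page fault in the extension. It is meant to show why call-by-value-result keeps a faulting extension from leaving a half-modified kernel object behind.

```python
import copy


class ObjectTracker:
    """Hypothetical per-extension table: kernel object -> extension's copy."""

    def __init__(self):
        self.table = {}  # id(kernel object) -> extension's private copy

    def copy_in(self, kern_obj):
        # Deep copy stands in for following/adjusting embedded pointers.
        ext_obj = copy.deepcopy(kern_obj)
        self.table[id(kern_obj)] = ext_obj
        return ext_obj

    def copy_out(self, kern_obj):
        # Synchronize the extension's modifications back into the kernel object.
        ext_obj = self.table.pop(id(kern_obj))
        kern_obj.clear()
        kern_obj.update(ext_obj)

    def release_all(self):
        # Recovery path: drop every copy the failed extension still held.
        dropped = len(self.table)
        self.table.clear()
        return dropped


def xpc_call(tracker, ext_fn, kern_obj):
    """Wrapper sketch: validate, copy in, run the extension, copy back."""
    if not isinstance(kern_obj, dict):       # 1. check parameters
        raise TypeError("expected a dict standing in for a kernel struct")
    ext_obj = tracker.copy_in(kern_obj)      # 2. copy arguments in (value)
    try:
        result = ext_fn(ext_obj)             # 3. run in the "extension domain"
    except Exception:
        tracker.release_all()                # fault: discard extension state
        return None                          # kernel object left untouched
    tracker.copy_out(kern_obj)               # 4. copy results back (result)
    return result


def good_driver(dev):
    dev["mtu"] = 1500
    return 0


def buggy_driver(dev):
    dev["mtu"] = -1  # partial update to the extension's copy only ...
    raise MemoryError("stand-in for a page fault in the extension")


tracker = ObjectTracker()
netdev = {"name": "eth0", "mtu": 0}
ok = xpc_call(tracker, good_driver, netdev)       # update is copied back
failed = xpc_call(tracker, buggy_driver, netdev)  # update is discarded
```

With call by reference, the buggy driver's write would have landed directly in the kernel's structure. The sketch ignores the cases the notes mention where Nooks re-maps rather than copies (network and disk buffers), and it does not model the %cr3 or stack switch.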