Exokernel
=========

Many papers published about new OS ideas
  Fancy schedulers
  Better thread systems
  Better I/O prefetching and caching
  Useful VM primitives & implementations
But many of these ideas don't have impact. Why?
  Hard to modify OSes
  Many OS abstractions come with trade-offs: e.g., speed vs. generality
  No single choice is optimal for all applications

What is the Exokernel architecture?
  Basic idea: separate protection from management of resources
  Why? (...or, as stated in previous paper: "Exterminate all OS abstractions")
    Applications may know better how to manage resources
    In fact, applications may benefit from knowing what resources they have
    Paging, buffering often *hide* information from apps
  What this means in practice:
    Expose allocation
    Expose names
    Expose revocation

What is the end-to-end argument? How does this apply to Exokernel?

Main approach is based on three things, shown in Fig 3. What are these?
  Secure bindings - decouple authorization from use of a resource
    Hardware mechanisms
    Software caching
    Downloading application code
  Visible resource revocation--what is this and why?
    Ask the application to choose which resource to give up
    E.g., when looping through a file, want to evict MRU, not LRU
  Abort protocol--what's this?
    Used when your more friendly revocation upcall takes too long
    Load "repossession vector" with resources for drastic situations
      (e.g., disk blocks where pages can be written)

How is physical memory multiplexed in Aegis?
  Go over what MIPS VM looks like:
    Hardware has 64-entry TLB
    References to addresses not in TLB trap to kernel
    Each TLB entry has the following fields:
      Virtual page, Pid, Page frame, NC, D, V, Global
    Kernel itself unpaged
      All of physical memory contiguously mapped in high VM
      Kernel uses these pseudo-physical addresses
    User TLB fault handler very efficient
      Two hardware registers reserved for it
      utlb miss handler can itself fault--allow paged page tables
What does Aegis's interface do? ...
How is performance of VM?
  Look at Table 10
  Why are prot100 and unprot100 slower on Aegis?
    Two data structures may have to be updated: page table & STLB
    Maybe also "immaturity of the implementation"

More generally, what does the environment abstraction consist of in Aegis?
  Exception context
    For each exception (e.g., page fault), entry point & where to save regs
  Interrupt context
    Same for interrupts (e.g., when time quantum has expired)
  Protected entry context
    Entry point for IPC from other processes
  Addressing context
    Small number of pages guaranteed to be mapped

How is processor multiplexed?
  Round robin; allocate slots
How did the stride scheduler implementation work? (sec 7.3)

How is the network multiplexed?
  DPF - dynamic packet filter
Is regular scheduler good enough to process received packets?
  No, high latency. Why is this important for protocols like TCP?
  ASHes (application-specific safe handlers). What is the motivation for ASHes?
    Direct, dynamic message vectoring
      No need to copy to intermediary kernel buffers
    Dynamic integrated layer processing -- e.g., checksum while copying
    Message initiation -- e.g., send TCP ack immediately
    Control initiation -- e.g., create/activate thread, acquire lock, etc.
  Does this work? Look at Figure 2
    Good. Other benefits besides latency: don't have to pre-specify buffers, etc.

How does IPC work?
  Built on protected entry context of environment
What's going on in Table 8?
  pipe - passes word in circular shared-memory buffer
  pipe' - same as above, but inline calls to read and write (only on Aegis)
  shm - relies on directed yield to bump counter in shared memory
  lrpc - do RPC to increment counter

What is the plan for revoking resources?
  Expose information so that application can do the right thing
  Ask applications politely to release resources of a given type
  Ask applications with force to release resources

Let's consider how we might do things on x86:
  How to implement a minimal protected control transfer?
    Set up a specific handler to be called when an environment wants to call this environment
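To make the protected-control-transfer answer concrete, here is a minimal, hypothetical model in C. It is not JOS or Aegis code. The names `kernel`, `pct_register`, `pct_call`, and `counter_server` are invented for illustration. The model assumes that the kernel hands the callee the caller's id and that the caller's remaining time slice goes with the transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENVS 8

/* Hypothetical handler type: arguments arrive "in registers" (plain
   parameters here); the kernel saves nothing on either side's behalf. */
typedef uint32_t (*pct_entry)(int caller, uint32_t r0, uint32_t r1);

/* Hypothetical kernel state: one protected entry point per environment,
   plus which environment currently owns the CPU time slice. */
typedef struct {
    pct_entry entry[MAX_ENVS];  /* NULL = environment accepts no calls */
    int current;                /* environment running right now */
} kernel;

/* Install env's protected entry point (cf. the protected entry context). */
void pct_register(kernel *k, int env, pct_entry fn)
{
    assert(env >= 0 && env < MAX_ENVS);
    k->entry[env] = fn;
}

/* Transfer control from k->current to `target`. The only check is that the
   target registered an entry point. The caller's remaining slice is donated
   (k->current switches; no scheduler decision). Returns 0 or -1. */
int pct_call(kernel *k, int target, uint32_t r0, uint32_t r1, uint32_t *result)
{
    if (target < 0 || target >= MAX_ENVS || k->entry[target] == NULL)
        return -1;                      /* target accepts no calls */
    int caller = k->current;
    k->current = target;                /* callee runs on the caller's slice */
    *result = k->entry[target](caller, r0, r1);
    k->current = caller;                /* simplification: a real callee would
                                           issue its own transfer back */
    return 0;
}

/* Example callee: a counter server, in the spirit of the lrpc row of Table 8. */
static uint32_t counter;
static int last_caller = -1;

uint32_t counter_server(int caller, uint32_t r0, uint32_t r1)
{
    (void)r1;                /* second argument register unused */
    last_caller = caller;    /* caller id: an assumption of this model */
    counter += r0;
    return counter;
}
```

The interesting part is what `pct_call` leaves out. There is no register save/restore (as in the Table 12 RPC system, the server handles that), no message copy, and no pass through the scheduler.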
  How does this impact scheduling of environments?
  How to dispatch exceptions (e.g., page fault) to user space?
  How would you implement fork? Copy on write?

What are the examples of extensibility in this paper?
  RPC system in which server saves and restores registers (Table 12)
  Different page table, and stride scheduler
How would you do buffer cache?
How would you do sleep/wakeup on various events?
How would you do file system?

Some cool exokernel hacks from later on:
  Fast, simple binary emulation of other OSes
    Emulator runs in same address space as process
    System call interrupts vectored back to user space
    Therefore, emulation can actually be faster than original OS (e.g., getpid)
    In general, emulator can avoid expensive checks, because it trusts the app
  XCP - highly optimized file copy
    All writes can be delayed until you attach file tree at end of copy
  Cheetah - highly optimized web server
    Special file system keeps TCP checksums with the data
    File system co-locates files like images based on HTML grouping
    Combine the disk buffer cache with the TCP retransmission cache
      Never need to stream data through processor cache (true 0-copy)
      Never need more than one copy of data in memory

Lessons learned in retrospect several years later:
  Exposing kernel data structures is a big win (e.g., for wake predicates)
  Exokernel interface design is hard
    Even before exokernel, things like scheduler activations not obvious
    DPF, buf cache, XN, wake predicates, all non-trivial
  Information loss can put libOSes at a disadvantage
    E.g., UNIX can implement LRU paging across applications
    Solution: Exokernel can keep statistics, but leave interpretation to apps
    Provide space for application data in kernel structures
  Fast applications don't require good microbenchmark numbers
  Cheap critical sections useful
  User-level page tables were very hard
    E.g., when an ASH accesses VM, might need app-level fault handler
    Even with kernel page tables, self-paging is complicated
  ASHes might not have been necessary
    Yes, upcalls are expensive, but maybe not that expensive
  Downloaded code is powerful
    But not so much for performance reasons, like fewer upcalls
    Rather, because you can control and reason about the execution
      Check packet filters for conflicts, merge packet filters
      XN (file system) needs to know code is deterministic
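Cheetah keeps TCP checksums with the data, and the reason this works is a property of the Internet checksum. The TCP checksum covers a pseudo-header, the TCP header, and the payload. It is the complement of a ones'-complement sum of 16-bit words, and that addition is associative and commutative. So a sum over a file chunk can be computed once, when the file is written, and stored. At send time it only has to be folded together with the sum over the per-packet headers.

The C sketch below shows that arithmetic (RFC 1071). The function names are hypothetical, and it assumes each stored sum covers exactly the bytes sent in one segment and starts at an even offset. It does not describe Cheetah's actual on-disk layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones'-complement sum of big-endian 16-bit words, folded to 16 bits
   (the core of the Internet checksum, RFC 1071). Not yet complemented. */
uint16_t ocsum(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint64_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;        /* odd trailing byte, zero-padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold carries back in */
    return (uint16_t)sum;
}

/* Combine two partial sums. Valid only when the second block starts at an
   even byte offset of the whole message, so its bytes pair up the same way. */
uint16_t ocsum_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Value placed in the checksum field: complement of the folded sum. */
uint16_t csum_finish(uint16_t sum)
{
    return (uint16_t)~sum;
}
```

Usage under these assumptions: store `ocsum(chunk, n)` next to each chunk when the file is written. When sending, compute `csum_finish(ocsum_add(header_sum, stored_sum))`, where `header_sum` covers the pseudo-header and TCP header with the checksum field zeroed. The payload bytes never pass through the processor for checksumming, which fits the "never stream data through processor cache" point above.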