Quiz correction: question 6D could be true
Vote on extra lecture: Synthesis vs. Singularity
Events, handlers, and guards in SPIN (didn't get to last time)

Exokernel
=========

Many papers published about new OS ideas
  Fancy schedulers
  Better thread systems
  Better I/O prefetching and caching
  Useful VM primitives & implementations
But many of these ideas don't have impact.  Why?
  Hard to modify OSes
  Many OS abstractions come with trade-offs:
    E.g., speed vs. generality
    No single choice is optimal for all applications
  Examples (should be familiar by now)?
    Databases & garbage collectors interact badly with LRU paging
Hence, the third in a series of papers on extensible OSes:
  L3 - make microkernels viable with fast IPC
  SPIN - put extensions in the kernel, avoiding IPC to servers
  Exokernel - put extensions in the *application/library*, avoiding servers

What is the Exokernel architecture?
  Basic idea: separate protection from management of resources
  Why?  (...or, as stated in the previous paper: "Exterminate all OS abstractions")
    Applications may know better how to manage resources
    In fact, applications may benefit from knowing what resources they have
    Paging, buffering often *hide* information from apps
  What this means in practice:
    Expose allocation
    Expose names
    Expose revocation

What is the end-to-end argument?  How does it apply to the Exokernel?

Main approach is based on three techniques (p. 1).  What are they?
  Secure bindings - decouple authorization from use of a resource
    Hardware mechanisms
    Software caching
    Downloading application code
  Visible resource revocation--what is this and why?
    Ask the application to choose which resources to give up
    E.g., when looping through a file, want to evict MRU, not LRU
  Abort protocol--what's this?
    For when the friendlier revocation upcall takes too long
    Load a "repossession vector" with resources for drastic situations
      (e.g., disk blocks where pages can be written)

What does the environment abstraction consist of in Aegis (p. 8)?
  Exception context (think sys_env_set_pgfault_upcall in JOS; see the sketch below)
    For each exception (e.g., page fault), entry point & where to save regs
  Interrupt context
    Same, but for interrupts (e.g., when the time quantum has expired)
    Note: also has an "interrupt enable" flag, to disable interrupts
    In JOS, exception state is saved on the stack, so faults can recurse
    Aegis stored state at fixed addresses in the lower 64K
      So potentially nowhere to store state in a recursive interrupt
  Protected entry context
    Entry point for IPC from other processes
  Addressing context
    Small number of guaranteed page mappings
      (so your user-level TLB fault handler is always present)
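Aside: a minimal sketch of what setting up an exception context looks like from
the library's side, modeled on JOS (sys_env_set_pgfault_upcall, sys_page_alloc,
and UXSTACKTOP are real JOS names; the trapframe layout and asm stub are
simplified/hypothetical here).  Aegis's per-exception entry points follow the
same pattern: register an entry point and a place to save state.

#include <stdint.h>

struct utrapframe {             /* simplified stand-in for JOS's UTrapframe: */
    uint32_t fault_va;          /* state the kernel pushes on the user       */
    uint32_t err, regs[8];      /* exception stack before jumping to the     */
    uint32_t eip, eflags, esp;  /* registered upcall entry point             */
};

static void (*pgfault_handler)(struct utrapframe *);

extern void _pgfault_upcall(void);   /* asm stub: calls pgfault_handler, then
                                        resumes the faulting instruction */
int sys_env_set_pgfault_upcall(int envid, void *upcall);   /* JOS syscalls; */
int sys_page_alloc(int envid, void *va, int perm);         /* envid 0 = self */

#define UXSTACKTOP 0xeec00000   /* JOS's user exception stack location */
#define PGSIZE     4096
#define PTE_P 0x1
#define PTE_W 0x2
#define PTE_U 0x4

void
set_pgfault_handler(void (*handler)(struct utrapframe *))
{
    if (pgfault_handler == 0) {
        /* First registration: allocate the exception stack and tell the
         * kernel where the upcall entry point lives. */
        sys_page_alloc(0, (void *)(UXSTACKTOP - PGSIZE), PTE_P | PTE_U | PTE_W);
        sys_env_set_pgfault_upcall(0, _pgfault_upcall);
    }
    pgfault_handler = handler;   /* _pgfault_upcall dispatches through this */
}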
How is physical memory multiplexed in Aegis?
  Go over what MIPS VM looks like:
    Hardware has a 64-entry TLB
    References to addresses not in the TLB trap to the kernel
    Each TLB entry has the following fields:
      Virtual page, Pid, Page frame, NC, D, V, Global
  Kernel itself is unpaged
    All of physical memory contiguously mapped in high VM
    Kernel uses these pseudo-physical addresses
  User TLB fault handler very efficient
    Two hardware registers reserved for it
    utlb miss handler can itself fault--allows paged page tables

How does Aegis's VM interface work on page faults (see p. 9)?
  Application VM divided into two segments (why?)
    Segment 1 normal, Segment 2 may contain guaranteed mappings
    Why?  To take the fast path for the common case of segment 1

How is the performance of VM?  Look at Table 10
  Why are prot100 and unprot100 slower on Aegis?
    Two data structures may have to be updated, the page table & the STLB
    Maybe also "immaturity of the implementation"

How is the processor multiplexed?
  Round robin; allocate slots
How did the stride scheduler implementation work? (sec 7.3, p. 11)
  (the core algorithm is sketched below)
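A sketch of the core of stride scheduling (Waldspurger & Weihl), the algorithm
behind the sec. 7.3 application-level scheduler.  Each client gets CPU time in
proportion to its tickets: run the client with the minimum "pass", then advance
its pass by its stride.  Names and the array representation are illustrative,
not the paper's code.

#include <stdint.h>
#include <stddef.h>

#define STRIDE1 (1 << 20)       /* fixed-point scaling constant */

struct client {
    uint64_t pass;              /* virtual time of the client's next quantum */
    uint32_t stride;            /* STRIDE1 / tickets */
};

void
client_init(struct client *c, uint32_t tickets)
{
    c->stride = STRIDE1 / tickets;   /* more tickets => smaller stride */
    c->pass = c->stride;             /* start one stride in the future */
}

/* Choose whom to run for the next quantum and charge them for it. */
struct client *
schedule(struct client *clients, size_t n)
{
    struct client *best = &clients[0];
    for (size_t i = 1; i < n; i++)
        if (clients[i].pass < best->pass)
            best = &clients[i];
    best->pass += best->stride;
    return best;
}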
How is the network multiplexed?
  DPF - dynamic packet filter
  Is the regular scheduler good enough to process received packets?
    No--latency is too high.  Why does this matter for protocols like TCP?
  ASHes.  What is the motivation for ASHes?
    Direct, dynamic message vectoring
      No need to copy to intermediary kernel buffers
    Dynamic integrated layer processing (ILP)--e.g., checksum while copying
      Used a pseudo-assembly language to specify this
      In setting up an ASH you would specify regular code + some ILP code
    Message initiation--e.g., send a TCP ack immediately
    Control initiation--e.g., create/activate thread, acquire lock, etc.
  Does this work?  Look at Figure 2
    Good.  Other benefits besides latency: don't have to pre-specify buffers, etc.

Protected control transfer (sec 5.5, p. 9)
  Synchronous vs. asynchronous (rescheduled next time quantum)
  Note: no access control--how would you implement it in a library?
    Not necessarily obvious without some notion of identity...
  How is performance?  Table 6 looks good.  Is this meaningful?
    Scaling L3 is a little bogus
      IPC stresses aspects of CPUs that don't improve with MIPS

How does IPC work?
  Built on the protected entry context of the environment
  What's going on in Table 8?
    pipe - passes a word through a circular shared-memory buffer
      (see the pipe sketch at the end of these notes)
    pipe' - same as above, but with inline calls to read and write (only on Aegis)
    shm - relies on directed yield to bump a counter in shared memory
    lrpc - does an RPC to increment a counter

What is the plan for revoking resources?
  Expose information so that the application can do the right thing.
  Ask applications politely to release resources of a given type.
  Ask applications with force to release resources.

What are the examples of extensibility in this paper?
  RPC system in which the server saves and restores registers (Table 12)
  A different page table, and the stride scheduler
  How would you do a buffer cache?
  How would you do sleep/wakeup on various events?
  How would you do a file system?

Some cool exokernel hacks from later on:
  Fast, simple binary emulation of other OSes
    Emulator runs in the same address space as the process
    System call ints are vectored back to user space
    Therefore, emulation can actually be faster than the original OS (e.g., getpid)
    In general, the emulator can avoid expensive checks, because it trusts the app
  XCP - highly optimized file copy
    All writes can be delayed until you attach the file tree at the end of the copy
  Cheetah - highly optimized web server
    Special file system keeps TCP checksums in with the data
    File system co-locates files (like images) based on HTML grouping
    Combines the disk buffer cache with the TCP retransmission cache
      Never need to stream data through the processor cache (true 0-copy)
      Never need more than one copy of the data in memory

Lessons learned in retrospect several years later:
  Exposing kernel data structures is a big win (e.g., for wake predicates)
  Exokernel interface design is hard
    Even before the exokernel, things like scheduler activations were not obvious
    DPF, buffer cache, XN, wake predicates: all non-trivial
  Information loss can put libOSes at a disadvantage
    E.g., UNIX can implement LRU paging across applications
    Solution: the exokernel can keep statistics, but leave interpretation to apps
      Provide space for application data in kernel structures
  Fast applications don't require good microbenchmark numbers
  Cheap critical sections are useful
  User-level page tables were very hard
    E.g., when an ASH accesses VM, it might need an app-level fault handler
    Even with kernel page tables, self-paging is complicated
  ASHes might not have been necessary
    Yes, upcalls are expensive, but maybe not that expensive
  Downloaded code is powerful
    But not so much for performance reasons, like fewer upcalls
    Rather, because you can control and reason about the execution
      Check packet filters for conflicts, merge packet filters
        (a toy conflict check is sketched at the end of these notes)
      XN (the file system) needs to know downloaded code is deterministic
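As promised above, a sketch of what the Table 8 "pipe" case might look like: a
single-producer/single-consumer circular buffer of words mapped into both
environments, with a hypothetical yield_to() standing in for Aegis's directed
yield.  This is a reconstruction under assumptions, not the paper's code.

#include <stdint.h>

#define SLOTS 64

struct pipe {
    volatile uint32_t head;     /* free-running count: next slot writer fills  */
    volatile uint32_t tail;     /* free-running count: next slot reader drains */
    uint32_t buf[SLOTS];
};

extern void yield_to(int env);  /* hypothetical: donate the rest of this
                                   quantum to a specific environment */

void
pipe_write(struct pipe *p, int reader_env, uint32_t word)
{
    while (p->head - p->tail == SLOTS)    /* full: let the reader drain */
        yield_to(reader_env);
    p->buf[p->head % SLOTS] = word;
    /* A real implementation needs a compiler/memory barrier here so the
     * data store is visible before the head update publishes it. */
    p->head++;
}

uint32_t
pipe_read(struct pipe *p, int writer_env)
{
    while (p->head == p->tail)            /* empty: let the writer run */
        yield_to(writer_env);
    uint32_t w = p->buf[p->tail % SLOTS];
    p->tail++;                            /* frees the slot for the writer */
    return w;
}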
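And a toy version of why downloaded code can be reasoned about: if a packet
filter is just a list of (offset, mask, value) tests, in the spirit of DPF, the
kernel can conservatively check two filters for conflicts before installing
them.  The representation and check below are illustrative, not DPF's actual
algorithm.

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* One test: pkt[off] & mask == value (assume value == (value & mask)). */
struct atom { uint16_t off; uint8_t mask, value; };

struct filter { const struct atom *atoms; size_t n; };

bool
filter_matches(const struct filter *f, const uint8_t *pkt, size_t len)
{
    for (size_t i = 0; i < f->n; i++) {
        const struct atom *a = &f->atoms[i];
        if (a->off >= len || (pkt[a->off] & a->mask) != a->value)
            return false;
    }
    return true;
}

/* Conservative conflict check: f and g might both accept some packet
 * unless they place contradictory demands on a shared masked byte. */
bool
filters_conflict(const struct filter *f, const struct filter *g)
{
    for (size_t i = 0; i < f->n; i++)
        for (size_t j = 0; j < g->n; j++) {
            const struct atom *a = &f->atoms[i], *b = &g->atoms[j];
            uint8_t shared = a->mask & b->mask;
            if (a->off == b->off && (a->value & shared) != (b->value & shared))
                return false;   /* provably disjoint: no packet passes both */
        }
    return true;                /* no contradiction found; may overlap */
}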