V

This paper should flesh out some of the motivation of L3 and microkernels
  One advantage of microkernels is adaptability to distributed environments

What is the goal of V?
  Make a powerful cluster of machines look like a single system

How does this compare to the goals of Plan 9?
  Plan 9's model had resource-poor workstations and CPU servers
    Gives users control of workstations while harnessing the power of servers
  V has a model of powerful workstations, maybe a more uniform global image

What are the central design tenets of V?
  High-performance communication is critical
  The protocols, not the software, define the system
  A small kernel can provide a *software backplane*
  Shared state is the key problem; exploit problem-oriented shared memory
  Groups of entities are fundamental; should be dealt with the same as single entities

What do the system call and application interfaces look like?
  Syscalls are basically Send & Receive, similar to L3
  Most traditional system abstractions are implemented outside the kernel:
    processes, address spaces, TCP/IP, even the scheduler to a large extent
  Even abstractions within the kernel are accessed with Send
    No need to add different trap types for different syscalls
    Extensible: can add new services to the kernel without a new syscall interface
      (particularly important in a distributed setting--don't change the backplane)
  Things like C stdio are implemented in libraries that use IPC

pp. 316-317 talk about the procedure vs. message model. What's this?
  In the message model, handle messages one at a time
  In the procedure model, requests call procedures that can run in threads

How do communications work? (sketched at the end of this section)
  Messages consist of 32 bytes (fixed size) + up to a 16K data segment
    The 32 bytes correspond to general-purpose registers, for short IPCs
  The VMTP network protocol consists of a request and a response
    No explicit connection setup / teardown
    The response acknowledges the request, subsuming flow control, etc.
    Supports multicast, datagrams, forwarding, streaming, security, priority
      (the latter few are not really detailed in the paper)
    Implemented using a template VMTP header for each process
      No need to allocate buffers, etc.; just put registers in the appropriate place
  The same interface is used for IPC within a single machine -- why?
    Since syscalls also use the same IPC interface, this makes them fast
    This is also the key to making distributed operation transparent:
      it basically guarantees anything you can do locally you can do remotely

What are process groups?
  Processes are members of groups; a process can be in multiple groups
    Examples: all file servers, processes in some distributed job, etc.
  Multi-destination delivery: can send a message to all members of a group
    Can get multiple replies, or possibly only one
  Logical addressing: send a message to "the file servers"
    Might not know the ID of any file server process
    But all file servers are part of a well-known group

What is the "co-resident" qualifier in a message -- why do you want this?
  Specifies a particular process; delivers only to the machine with that process
  Say you want to suspend process P
    Must send a message to the process manager on P's machine
    All process managers are members of the process manager group
    But you don't want to interrupt all machines
    So send to the process manager group co-resident with P (second sketch below)
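Below is a minimal C sketch of the messaging primitives just described. Everything here -- the Message layout, the EntityId type, and the Send/Receive/Reply/SendSeg signatures -- is invented for illustration; the paper describes these semantics but not these exact declarations:

    #include <stdint.h>

    /* Fixed-size 32-byte message: the payload maps onto the
       general-purpose registers, so short IPCs never touch
       memory buffers. */
    typedef struct {
        uint32_t w[8];            /* 8 x 4 bytes = 32 bytes */
    } Message;

    typedef uint32_t EntityId;    /* process, group, or endpoint */

    /* Hypothetical kernel traps.  Send blocks until the receiver
       Replies; over the network, the VMTP response acknowledges the
       request, so there is no separate ACK or connection setup. */
    extern void     Send(Message *m, EntityId dest);
    extern EntityId Receive(Message *m);
    extern void     Reply(Message *m, EntityId to);

    /* A data segment of up to 16K can ride along with the message. */
    extern void SendSeg(Message *m, EntityId dest,
                        void *seg, uint32_t seglen);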
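Continuing with the types above, a sketch of the co-resident suspend example; CoResidentSend and the request encoding are likewise made up:

    /* Deliver only to the group member on the same host as `near`. */
    extern void CoResidentSend(Message *m, EntityId group, EntityId near);

    enum { OP_SUSPEND = 1 };      /* made-up request code */

    /* Suspend process p without interrupting every machine's
       process manager. */
    void suspend_process(EntityId process_manager_group, EntityId p) {
        Message m = {0};
        m.w[0] = OP_SUSPEND;
        m.w[1] = p;
        CoResidentSend(&m, process_manager_group, p);
    }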
How does the system use multicast?
  Naming protocol
  Clock synchronization
  Distributing load information for the distributed scheduler
  Used as part of the atomic transaction protocol
  Used for writes in the replicated file update protocol

What is UIO?
  The IPC interface for most types of I/O--files, pipes, devices, kernel services
  Goal: uniformity of interface across all these forms of I/O
  UIO is based on objects; 4 operations are possible on an object:
    read, write, query, modify (see the sketch at the end of this section)
  Block-oriented data access
  Stateful interface (for things like pipes, windows)
    State provides support for locking, atomic transactions
  Each UIO object can have up to three classes of functionality:
    Compulsory - operations for a sequential stream (like a pipe)
    Optional - e.g., files or disk devices might have two attributes:
      RANDOM_ACCESS
      STORAGE - means successive reads return the same data until overwritten
      Attributes let you check that a use is sane (don't use a pipe for VM)
      They also allow for optimizations
    Exceptional - a control function allows device-specific operations
      Presumably this is similar to Unix ioctl -- e.g., for ejecting a CD, etc.
  Can you implement a disk driver entirely at user level with the IPC interface?
    Security problem: it could initiate DMA over any physical memory
  Example: the mouse -- a thread blocks and gets updates from it (sketched below)
  Note: the UIO interface is implemented by services that don't strictly need it
    The program manager (team server) exports a directory via UIO (-> Plan 9 /proc)
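A rough sketch of the UIO object interface in the same illustrative C; the UioObject type, the attribute flags, and the four operation signatures are guesses at the shape, not V's real declarations:

    #include <stdint.h>

    /* Attributes a server can advertise for a UIO object. */
    #define UIO_RANDOM_ACCESS 0x1   /* blocks addressable by number */
    #define UIO_STORAGE       0x2   /* successive reads return the same
                                       data until overwritten */

    typedef struct UioObject UioObject;  /* opaque handle to a server object */

    /* The four block-oriented operations. */
    extern int uio_read  (UioObject *o, uint32_t block, void *buf);
    extern int uio_write (UioObject *o, uint32_t block, const void *buf);
    extern int uio_query (UioObject *o, uint32_t *attrs_out);
    extern int uio_modify(UioObject *o, uint32_t attrs);

    /* The sanity check from the notes: before using an object to back
       virtual memory, confirm it behaves like storage.  A pipe has
       neither attribute and is rejected. */
    int usable_as_backing_store(UioObject *o) {
        uint32_t attrs;
        if (uio_query(o, &attrs) != 0)
            return 0;
        return (attrs & UIO_RANDOM_ACCESS) && (attrs & UIO_STORAGE);
    }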
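And the mouse example, continuing the sketch above: a dedicated thread blocks in uio_read until the mouse server has an update to deliver; the event layout is invented:

    typedef struct {
        int16_t dx, dy;           /* motion since last event */
        uint8_t buttons;          /* button bitmask */
    } MouseEvent;

    /* Runs in its own thread; each read blocks until the mouse
       device server delivers the next update. */
    void mouse_loop(UioObject *mouse) {
        MouseEvent ev;
        while (uio_read(mouse, 0, &ev) == 0) {
            /* handle_event(&ev); */
        }
    }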
How are processes managed?
  Process initiation is separate from address space initialization
    Saves space by keeping only one copy of address-space-specific info
  Also, only one kernel stack per CPU, not one per process as in Unix
  Process termination doesn't require garbage collection
    In a traditional system, the kernel would close all your file descriptors, etc.
    With V, the responsibility lies with each server
      E.g., the file server has a garbage collector that periodically checks clients
    Note: the normal user-level exit routine notifies servers to minimize garbage

How does scheduling work?
  The kernel has a simple priority-based scheduler
  An external process adjusts processes' priorities
    User-level schedulers can use IPC to coordinate across nodes
  How does MP scheduling work?
    One ready queue for each CPU, periodically re-adjusted
    Avoids contention/locking issues
  Exceptions are also handled at user level, by an exception server
    Usually invokes a debugger on the faulting process
    Also has the benefit of network transparency; the handler could be on another node
  Process migration - what's this?

What is the VM interface like?
  An address space consists of ranges of addresses called regions
  Each region is bound to a UIO object
    The kernel basically functions as a cache for UIO objects
  How does program loading work? No special mechanism!
    Just bind the program file into the address space
  Note the debugger can also talk through IPC to the VM system to manipulate memory
  What about consistency?
    Implemented with a simple ownership protocol & lock manager at the backing server
  Note that the page cache is also the buffer cache

How does naming work?
  Three levels of names: string names, object identifiers, entity identifiers
  Character string names
    In general, want names managed by the same servers that manage the objects
      E.g., the file server implements directories, getting the name server out of the loop
      Also simplifies consistency:
        detect inconsistency on use of objects (problem-specific characteristic)
      Makes the directory as replicated as the file server
      Adds extensibility, as "foreign" services come with their own directories
    Character string names have "mount point" prefixes
    To find a server, multicast QueryName(character name) to the name group
      Programs cache bindings to reduce QueryNames
        (initialized on startup, like Unix environment variables)
    Say you multicast a QueryName and don't get a response -- bad name? or net error?
      Use the global directory mechanism to track which names are actually managed
  Object identifiers: (manager-id, local-object-id)
    manager-id says where to send a message
    local-object-id is just interpreted by the manager to find the object
  Entity identifiers: process / process group / communication endpoint
    Host-address independent - because processes can move around
    The kernel caches (entity id -> host address) mappings
      Uses multicast to find entity ids it doesn't know about
    For groups, hash the group id down to a multicast group
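Finally, a sketch of the entity-id handling just described, in the same illustrative C: a small kernel cache of id-to-host mappings, a multicast fallback on a miss, and the hash from group id to multicast group. MulticastWhereIs, the cache shape, and the address plan are all assumptions:

    #include <stdint.h>

    typedef uint32_t EntityId;
    typedef uint32_t HostAddr;

    #define CACHE_SLOTS 256
    #define NGROUPS      64

    /* Hypothetical: multicast a query for an entity id and return the
       host that answers. */
    extern HostAddr MulticastWhereIs(EntityId id);

    /* Kernel cache of (entity id -> host address).  Entries may go
       stale when a process migrates; a wrong answer just forces the
       sender back onto the multicast path.  (A real cache would also
       track slot validity; elided here.) */
    static struct { EntityId id; HostAddr host; } cache[CACHE_SLOTS];

    HostAddr resolve(EntityId id) {
        unsigned slot = id % CACHE_SLOTS;
        if (cache[slot].id != id) {          /* miss: ask the network */
            cache[slot].host = MulticastWhereIs(id);
            cache[slot].id   = id;
        }
        return cache[slot].host;
    }

    /* A group id names many hosts, so it can't resolve to one address:
       hash it down to one of a fixed set of multicast groups. */
    HostAddr group_destination(EntityId gid) {
        return 0xE0000000u + gid % NGROUPS;  /* made-up address plan */
    }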