Scheduler Activations
=====================

What is the goal of this work?
  Functionality of kernel threads, with the performance and flexibility
    of user-level threads

What's wrong with user-level threads?
  A blocking system call blocks all threads
    What about epoll/select?  Doesn't work for all syscalls
  A page fault blocks all threads
  Hard to run as many threads as CPUs.  Why?
    Don't know how many CPUs
    Don't know when a thread blocks
  Deadlock?  (p. 59 bottom)
    Example:
      One uthread holds file lock
      Another uthread tries to acquire lock
      Second uthread blocks kthread--if only 1 kthread, bad!

What about kernel threads?
  Handle blocking syscalls/page faults well
  Add many user/kernel crossings--expensive
    Thread switch, create, exit, lock, signal, wait, ...
    On 2.4GHz Athlon 3400+:  getpid: 359 cycles, fn call: 6 cycles
    Typically 10x-30x slower than user threads

User-level threads multiplexed on kernel threads?
  Different apps have different needs (thread priorities, etc.)
    Kernel doesn't know best thread to run
    Kernel doesn't know about user-level locks, priority inversion
      (preempt while in critical section)
    Too much info changing too quickly to notify kernel
  Hard to keep same number of kthreads as CPUs
    Neither kernel nor user knows how many runnable threads
    User doesn't even know number of CPUs available
  Can even have deadlock, as previously discussed

How do scheduler activations address the problem?
  Let user program schedule threads
    (most thread ops are just a function call)
  Run same number of threads as you have CPUs
    (know exactly which threads you can run and which are in blocking
     syscalls or page faults)
  Minimize number of user/kernel crossings

What is a scheduler activation?
  A virtual CPU
  Always begins execution in the user scheduler
    User scheduler then uses the activation to run a thread
  Preempted by kernel, but never directly resumed

How many scheduler activations does a process need?
  One for each CPU
  One for each blocked thread.  Why?
    Kernel might need its stack when the blocking op completes

When must kernel call into user space (upcalls)?
  New processor available
  Processor has been preempted
  Thread has blocked
  Thread has unblocked

When must user call into kernel?
  Need more CPUs
  CPU is idle
  Preempt thread on another CPU (for higher-priority thread)
  Return unused scheduler activation for recycling
    (after user thread system has extracted necessary state)
  (For concreteness, a sketch of what these upcalls/downcalls might look
   like appears at the end of these notes)

How does this compare to # of u/k crossings with user-level and
kernel-level thread packages?

What happens during preemption (in detail)?
  To revoke 1 CPU, kernel will actually preempt 2 threads
    Preempt thread on CPU being reclaimed
    Preempt second thread to notify process that first thread was preempted
  User-level thread scheduler then resumes one of the preempted threads
  What if a third CPU is running a lower-priority thread?
    Example:  Thread 1 running on CPU A, 2 on B, 3 on C
      Thread 3 is lower priority than the other two
      Kernel preempts 1 (revoking CPU A), notifies with upcall on B
      User thread scheduler will ask kernel to preempt CPU C
  How does kernel notify process when taking away its last CPU?
    Delays until process is rescheduled
  Why not just resume the last preempted scheduler activation?
    Might be on a different CPU, messing up cache affinity
    Application might want to service high-priority timeouts
    Point is:  Give user-level thread system all the information

What if a preempted scheduler activation is in a critical section?
  Does this matter?
    Might kill performance if holding spinlock
    Could hold lock on ready list -> deadlock
  How to deal with this?
    Detect thread in critical section
    Finish critical section, then return to scheduler.  How?
      Record addresses of critical sections
      Generate copy of code that returns to scheduler
      Resume in copy of critical section
    What if critical section calls another function?
      Then must bracket call with set/clear flag to detect critical section
      (a minimal sketch of the flag approach follows)
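A minimal sketch of the flag-bracketing idea, under stated assumptions:
every name here (struct uthread, the uthread_* helpers, on_preempted) is
hypothetical rather than the paper's FastThreads code, and the paper's
actual implementation instead resumes a copy of the critical-section code
so the common, non-preempted case pays no flag overhead:

  #include <stdbool.h>

  struct uthread {
      volatile bool in_critical;    /* set while holding any user-level lock */
      volatile bool must_yield;     /* scheduler wants the CPU back ASAP     */
      /* ... saved registers, stack, run-queue links, etc. ... */
  };

  /* Assumed to be provided by the user-level thread library (not shown): */
  void uthread_switch_to(struct uthread *t);    /* user-level context switch */
  void uthread_make_runnable(struct uthread *t);
  void uthread_schedule(void);                  /* pick next thread to run   */
  void uthread_yield(void);                     /* re-enter the scheduler    */

  /* Every lock acquire/release -- and any function called while the lock
   * is held -- is bracketed by these two calls. */
  static inline void enter_critical(struct uthread *self)
  {
      self->in_critical = true;
  }

  static inline void exit_critical(struct uthread *self)
  {
      self->in_critical = false;
      if (self->must_yield) {       /* we were preempted mid-section...      */
          self->must_yield = false;
          uthread_yield();          /* ...so hand the CPU back right away    */
      }
  }

  /* Called from the processor-preempted upcall:  thread t lost its CPU. */
  void on_preempted(struct uthread *t)
  {
      if (t->in_critical) {
          /* t may hold a spinlock, possibly even the ready-list lock, so
           * parking it on the run queue could deadlock the scheduler.
           * Run it a bit longer; exit_critical() will yield back to us.  */
          t->must_yield = true;
          uthread_switch_to(t);
      } else {
          uthread_make_runnable(t); /* ordinary case: queue it for later     */
          uthread_schedule();       /* and run the best ready thread now     */
      }
  }

The point of the check in on_preempted is that the upcall never just
queues a thread that might hold the ready-list lock; it runs the thread
to the end of its critical section first.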
What if a thread in a critical section is blocked in a page fault?
  Performance might be suboptimal, but at least it's correct

What if the scheduler activation entry point causes a page fault?
  Create an infinite # of scheduler activations?
  Kernel checks for this special case and resumes the activation

What abstractions besides threads might you build on scheduler activations?
  Can construct a non-blocking I/O API
    (see the sketch at the end of these notes)

Evaluation of paper
  Statement of claims?
  Evaluation of functionality?
  Performance evaluation?

What is the kernel overhead of an upcall (Section 5.2)?
  2.4 milliseconds -- much worse than Topaz's 441 usec signal-wait
    Worse even than the 1.84 milliseconds for Unix processes
  Why do we care?
    If it's slower, then some applications will be slower, particularly
      on a uniprocessor, even if the common case is faster
    Also adds latency when a preempted thread holds some critical
      resource--could block other threads
  Why is it slower?
    Implementation based on Topaz saves more state than necessary
      Could have done better by designing from scratch
    Modula-2+ implementation, compared to optimized assembly for Topaz

Fig. 2:  Why does SA do better than the original user-level threads?
  Sometimes system daemons use the CPU, causing more preemptions in the
    original package
    With SA, the user process knows how many CPUs it has
  Also, affinity scheduling might help
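As referenced above (the upcall/downcall lists and the non-blocking I/O
point): a minimal sketch of what the kernel/user interface might look
like.  These are hypothetical names, not the paper's actual
Topaz/FastThreads API, and the uthread_* helpers are again assumed to be
supplied by the user-level thread library:

  #include <unistd.h>
  #include <sys/types.h>

  /* Assumed user-level thread library pieces (not defined here): */
  struct uthread;
  void uthread_make_runnable(struct uthread *t);
  void uthread_schedule(void);                /* picks a ready thread to run */

  /* --- Downcalls: user -> kernel (system calls the library would make) --- */
  void sa_add_more_processors(int how_many);  /* runnable threads are waiting */
  void sa_this_processor_is_idle(void);       /* nothing left to run here     */
  void sa_preempt_cpu(int cpu);               /* it runs lower-priority work  */
  void sa_return_activation(int id);          /* state extracted; recycle it  */

  /* --- Upcalls: kernel -> user.  Each arrives on a fresh activation and --- */
  /* --- enters the user scheduler; user code is never resumed directly. ---- */

  void sa_new_processor(void)                 /* another CPU was granted      */
  {
      uthread_schedule();
  }

  void sa_processor_preempted(int old_activation, struct uthread *was_running)
  {
      /* Queue the interrupted thread (unless it is in a critical section --
       * see the earlier sketch), recycle its activation, and reschedule.    */
      uthread_make_runnable(was_running);
      sa_return_activation(old_activation);
      uthread_schedule();
  }

  void sa_activation_blocked(int activation)  /* e.g. page fault or slow read */
  {
      /* The kernel keeps that activation (and its thread's state) until the
       * operation completes; meanwhile keep this CPU busy with other work.  */
      (void)activation;
      uthread_schedule();
  }

  void sa_activation_unblocked(int old_activation, struct uthread *t)
  {
      uthread_make_runnable(t);               /* t will return from its read() */
      sa_return_activation(old_activation);
      uthread_schedule();
  }

  /* Non-blocking I/O "for free": to the calling user thread this looks like
   * an ordinary synchronous read(), but if it blocks in the kernel only this
   * activation stops -- the upcalls above keep the rest of the process
   * running, with no need for epoll/select wrappers.                        */
  ssize_t uio_read(int fd, void *buf, size_t n)
  {
      return read(fd, buf, n);
  }

Note that every upcall ends by entering the user-level scheduler rather
than resuming user code directly, matching the "never directly resumed"
point earlier in these notes.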