Eliminating Receive Livelock in an Interrupt-Driven Kernel
==========================================================

Why are we reading this paper?
  We already talked about scheduling last lecture...
  Don't the algorithms we looked at solve the problem?
  We only looked at scheduling different processes / top-half kernel threads.
  Scheduling device interrupts is a different story.

What functionality does a network interface offer to the driver?
  - Read packets
  - Poke hardware to send packets
  - Interrupts when a packet is received / a transmit completes
  - Buffer many input packets

What is the point of interrupts?

What is the problem the paper addresses?
  Receive livelock.
  Example: IP forwarding.
  Example: run tcpdump to see who's hosing the network, and the machine wedges.

Explain Figure 2.
  Why do the black dots (no screend) go up?
  What determines how high the peak is?
  Why do they go down?
  What determines how fast they go down?
    (fraction of packets discarded) * (work invested in each discarded packet)
      / (total work the CPU is capable of)
  What happens when we add screend?

Suppose I wanted to test an NFS server for livelock.
  Run a client with this loop:
    while (1) {
      send NFS READ RPC;
      wait for response;
    }
  What would I see? Is the NFS server probably subject to livelock?
  No--the offered load is subject to feedback.

What other problems are we trying to address?
  Increased latency for packet delivery and forwarding
    E.g., start the disk head moving when the first NFS read request arrives.
  Transmit starvation
  User-level CPU starvation

What does a typical Unix network stack look like?
  Look at Figure 3:
    - Receive processing, appends packets to ipintrq
    - IP forwarding layer (IPL softnet, or just a kernel thread, depending on the OS)
    - Transmit processing, takes packets from the output queue
  Look at Figure 11 (p. 240):
    - Device interrupt (send/receive): highest priority
    - Software interrupt (ipintr): higher priority
    - Top half (read/write syscalls)
    - IP input queue
    - Device output queue
    - Socket buffering

What's wrong with typical Unix interrupt usage?
  What is the overhead of an interrupt? ~10 usec.
    Can't that support 100,000 packets/sec?
  Priority:
    Why do receive interrupts have high priority? Is this good?
  What does the kernel do on input overload?
    Queues fill up, packets are dropped.
    When are packets dropped? When should they be dropped?
    Causes throughput to go down as offered load goes up!

Why not just tell the scheduler to give interrupts lower priority?
  Interrupts are non-preemptible.

Why not completely process each packet in the interrupt handler?
  I.e., forward it right there?
  Other parts of the kernel don't expect to run at high interrupt level.
    E.g., some packet-processing code might call a function that sleeps.
  Still might want an output queue.

What about using polling instead of interrupts?
  Solves the overload problem.
  Killer latency.

What's the paper's solution? (See the sketch after the Figure 5 discussion below.)
  Do most packet processing in a "top half" kernel thread.
  Receive interrupt arrives: poke the thread, disable the interrupt.
  Thread processes packets fully, puts them on the output queue.
  Thread checks for more work; if there is none, it re-enables interrupts.
  Eliminates the IP input queue (packets are dropped before work is invested).

Why does this work?
  What happens when packets arrive too fast?
  What happens when packets arrive slowly?

Explain Figure 5.
  Why does "Polling (no quota)" work badly?
    Input still starves xmit-complete processing.
  Why does it do even worse than the unmodified kernel?
    Because now packets are dropped at the device output queue.
    We're actually investing more work in each dropped packet!
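Before looking at Figure 6, here is a minimal sketch of the scheme described above: a quota-limited polling thread that the receive interrupt merely wakes. All names (rx_intr_handler, net_poll_thread, nic_rx_pkt, and the other helpers) are hypothetical, not the authors' code or any real driver API.

    /* Sketch of the modified receive path, assuming hypothetical
     * driver/kernel helpers declared below. */

    struct nic;
    struct pkt;

    /* Assumed helpers (not real APIs): */
    extern void        nic_disable_rx_intr(struct nic *);
    extern void        nic_enable_rx_intr(struct nic *);
    extern struct pkt *nic_rx_pkt(struct nic *);       /* NULL if no packet */
    extern void        nic_tx_complete(struct nic *);  /* reap finished xmits */
    extern void        ip_forward(struct pkt *);
    extern void        output_enqueue(struct nic *, struct pkt *);
    extern void        wake_poll_thread(void);
    extern void        sleep_until_woken(void);

    enum { RX_QUOTA = 10 };   /* packets processed per polling round */

    /* Receive interrupt handler: do almost nothing here. */
    void rx_intr_handler(struct nic *nic)
    {
        nic_disable_rx_intr(nic);   /* no more interrupts until we catch up */
        wake_poll_thread();
    }

    /* Kernel thread: processes each packet to completion, so a dropped
     * packet is dropped at the device, before any work is invested. */
    void net_poll_thread(struct nic *nic)
    {
        for (;;) {
            int n = 0;
            struct pkt *p;

            while (n < RX_QUOTA && (p = nic_rx_pkt(nic)) != NULL) {
                ip_forward(p);              /* full protocol processing */
                output_enqueue(nic, p);     /* straight to device output queue */
                n++;
            }
            nic_tx_complete(nic);           /* don't starve the transmit side */

            if (n < RX_QUOTA) {
                /* Caught up: re-enable the interrupt and sleep until the
                 * next receive interrupt wakes us. */
                nic_enable_rx_intr(nic);
                sleep_until_woken();
            }
            /* else: quota exhausted; loop again.  As an ordinary kernel
             * thread it can be preempted, so other work gets the CPU. */
        }
    }

The quota bounds how long the thread can monopolize the CPU in one round, and processing to completion eliminates the IP input queue where half-processed packets used to be dropped.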
Explain Figure 6.
  Here the white squares have a quota of 10 packets.
  Why does "Polling, no feedback" behave badly?
    There's a queue in front of screend.
    We can still give 100% of the CPU to the input thread and 0% to screend.
  Why does "Polling w/ feedback" behave well?
    The input thread yields when the queue to screend fills.
    (A sketch of this feedback check appears at the end of these notes.)

What if screend hangs? What about other consumers of packets?
  E.g., can you ssh to the machine to fix screend?
  Fortunately screend is typically the only application.
  Also, input is re-enabled after a timeout.

Why are the two solutions different?
  1. Polling thread *with quotas*.
  2. Feedback from the full queue.
  (I believe they should have used #2 for both.)

If we apply their fixes, does the phenomenon totally go away?
  E.g., what about a web server that waits for disk, &c.?
  Can the network device throw away packets without slowing down the host?
  Problem: we want to drop packets destined for applications with big queues,
    but it takes work to determine which application a packet belongs to.
  Solution: NI-LRP (have the network interface sort packets).

What about the latency question?
  Look at Figure 14, p. 243.
  The 1st packet looks like an improvement over non-polling.
  But the 2nd packet is transmitted later with polling. Why?
    No new packets are added to the xmit buffer until the xmit interrupt.
    Why? In traditional BSD, to amortize the cost of poking the device.
    Maybe better to poke a second time anyway.

How is the paper? Claims/evaluation/contribution.
  Do we care about this problem?
    Yes; e.g., it's used by NASDAQ.
  What should a network stack deliver?
    - Throughput
    - Good latency
    - Graceful degradation under high load
  Do they convincingly show their system delivers this?
    Yes.

Can we formulate any general principles from the paper?
  Don't spend time on new work before completing existing work.
  Or: give new work lower priority than partially-completed work.
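To make the feedback idea concrete, here is a minimal sketch of the check the input thread might make before pulling more packets from the device: back off when the queue feeding screend is nearly full, resume when it drains, and time out so a hung screend can't shut off input forever. Names, thresholds, and helpers are hypothetical; this is not the paper's code.

    /* Queue-feedback check, assuming the hypothetical helpers below. */

    #include <stdbool.h>

    #define SCREEND_Q_HIGH   64      /* stop pulling packets at this depth */
    #define SCREEND_Q_LOW    16      /* resume once drained to this depth  */
    #define INPUT_TIMEOUT_MS 500     /* safety valve if screend never drains */

    struct queue;                            /* assumed queue type */
    extern int  queue_len(struct queue *q);  /* assumed helper */
    extern long now_ms(void);                /* assumed clock helper */

    static bool input_blocked;
    static long blocked_since_ms;

    /* Called by the input thread before each batch of receive processing. */
    bool input_may_proceed(struct queue *screend_q)
    {
        if (!input_blocked && queue_len(screend_q) >= SCREEND_Q_HIGH) {
            input_blocked = true;            /* back off: let screend run */
            blocked_since_ms = now_ms();
        } else if (input_blocked &&
                   (queue_len(screend_q) <= SCREEND_Q_LOW ||
                    now_ms() - blocked_since_ms > INPUT_TIMEOUT_MS)) {
            input_blocked = false;           /* queue drained, or timed out */
        }
        return !input_blocked;
    }

This is the "give new work lower priority than partially-completed work" principle in miniature: packets already queued for screend are finished before the input thread accepts new ones.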