Code Red ======== First Internet work, Morris worm Nov '88 Brought many machines to a standstill Long history since then, but CodeRed takes the cake w. "$2.6B" damage Background: Microsoft IIS contained buffer overflow vulnerability Announced June 18, 2001 Patched June 26, 2001 July 12, 2001: CodeRedI released Only memory resident First 19 days of month: Scans IP addresses in fixed, pseudo-random order Tries to infect any other IIS hosts it finds From 20-28th days of month, DoS attack on www1.whitehouse.gov July 13, 2001: CodeRedI v2 pseudo-random host sequence no longer fixed managed to infect 359,000 still not memory resident but if you rebooted and patched, often infected before patch applied August 4, 2001: CodeRedII No longer just memory resident Sets up a backdoor with administrative access to machine Doesn't do anything for first 24 hours Then reboots machine, tries infecting other machines Now tries addresses in an interestingly distributed fashion w. 1/2 probability tries machines in same /8 network w. 3/8 probability tries machines in same /16 network w. 1/8 probability tries random non-class-D non-loopback address Delay makes things harder (can't correlate recent flows w. infection) Dynamic Taint Analysis ====================== What is goal of this paper? What is basic approach? Keep 4-byte pointer to "taint structure" for each byte of memory * TaintSeed - Taints network data coming back from system calls (actually, could vary policy, but this makes the most sense) * TaintTracker - Keeps track of which bytes/registers are tainted and how Note: Does not track condition code registers Note2: Must special-case instructions like xor %eax,%eax * TaintAssert - Checks that tainted data isn't used illegitimately What is valgrind? Open-source x86 emulator that supports extensions called "skins" Translates x86 into RISC-like code called UCode Skins can modify UCode Gets translated back to x86 What kinds of vulnerabilities can this approach discover? (See Fig. 2) A. Format string B. Buffer overflow C. Double free D. Heap Smash Review what A, B, and D are from lecture 7 How might you exploit a double free? Can result in memory double allocated Free(ptr), Alloc() [returns ptr], Double-Free(ptr), Alloc() [returns ptr] Suppose old structure contains function pointer New structure contains buffer with user supplied data Can make function pointer designate shell code What kinds of attack on these vulnerabilities? Return address - value on stack Jump address - e.g., value in jump table for dynamic linking Function pointer - in user data structure Function pointer offset - index into function pointer table System call args - e.g., arg to execve Function call args - e.g., arg to system What's an example of a false negative? if (x == 0) y = 0; if (x == 1) y = 1; Or worse, data gets converted by a table (IIS converts ASCII to unicode) Or tainted data could be written to a file Can you come up with contrived example, that would untaint an entire 32-bit word? (E.g., could do one bit at a time in a loop) What's an example of a false positive? Program may sanity-check data before using it: E.g., user supplies function table index if (index >= 0 && index < table_size) (*table[index]) (); How to handle false positives? 1. Disable check at location that triggers false positive E.g., (*table[index]) () call is okay 2. Untaint data immediately after the check Which is best? 2, because with 1 data could be tainted along other path How does TaintCheck relate to worm detection (sec. 6)? Automatic semantic analysis Idea: Find parts of payload likely to be constant across attacks Overwrite values: Most code injection attacks must overwrite return address w. fixed value Or at least upper 3 bytes must be the same So worm payload will often contain those three bytes Or will contain those three bytes encoded as URL Keep track of whether bytes are used in significant way Distinguish "filler" from important bytes and from bytes used only after the attack Could also flip payload bits and see if attack still works Allow worm to spread in confined environment Trap all its network I/O, and use samples for signature generation How is performance of taintcheck? Bzip2 is 37x times slower. Ouch! Does this mean taintcheck is useless? Mitigating factors: bzip2 CPU bound, while some server workloads I/O bound (valgrind dilates CPU time, not I/O time) This is a prototype implementation Commercial alternatives to valgrind are much more efficient but still that's a factor of 3x better, not 37x Don't need to run taintcheck on every connection Honeypots have no legitimate use, so who cares if bad performance Or could randomly sample, particularly with multiple sites But, still need to worry that work could detect slow response time In a distributed setting, Taintcheck allows you to verify signatures! E.g., with distributed autograph, someone might poison your fingerprints But if they give you sample payload, run w. taintcheck to verify exploit