Code Red
========

First Internet work, Morris worm Nov '88
  Brought many machines to a standstill
Long history since then, but CodeRed takes the cake w. "$2.6B" damage

Background:  Microsoft IIS contained buffer overflow vulnerability
  Announced June 18, 2001
  Patched June 26, 2001

July 12, 2001:  CodeRedI released
  Only memory resident
  First 19 days of month:
    Scans IP addresses in fixed, pseudo-random order
    Tries to infect any other IIS hosts it finds
  From 20-28th days of month, DoS attack on www1.whitehouse.gov

July 13, 2001:  CodeRedI v2
  pseudo-random host sequence no longer fixed
  managed to infect 359,000
  still not memory resident
  but if you rebooted and patched, often infected before patch applied

August 4, 2001:  CodeRedII
  No longer just memory resident
  Sets up a backdoor with administrative access to machine
  Doesn't do anything for first 24 hours
  Then reboots machine, tries infecting other machines
    Now tries addresses in an interestingly distributed fashion
      w. 1/2 probability tries machines in same /8 network
      w. 3/8 probability tries machines in same /16 network
      w. 1/8 probability tries random non-class-D non-loopback address
  Delay makes things harder (can't correlate recent flows w. infection)

Dynamic Taint Analysis
======================

What is goal of this paper?

What is basic approach?
  Keep 4-byte pointer to "taint structure" for each byte of memory
  * TaintSeed - Taints network data coming back from system calls
      (actually, could vary policy, but this makes the most sense)
  * TaintTracker - Keeps track of which bytes/registers are tainted and how
      Note: Does not track condition code registers
      Note2:  Must special-case instructions like xor %eax,%eax
  * TaintAssert - Checks that tainted data isn't used illegitimately

What is valgrind?
  Open-source x86 emulator that supports extensions called "skins"
  Translates x86 into RISC-like code called UCode
  Skins can modify UCode
  Gets translated back to x86

What kinds of vulnerabilities can this approach discover?  (See Fig. 2)
  A. Format string
  B. Buffer overflow
  C. Double free
  D. Heap Smash
  Review what A, B, and D are from lecture 7
  How might you exploit a double free?
    Can result in memory double allocated
      Free(ptr), Alloc() [returns ptr], Double-Free(ptr), Alloc() [returns ptr]
    Suppose old structure contains function pointer
    New structure contains buffer with user supplied data
    Can make function pointer designate shell code

What kinds of attack on these vulnerabilities?
  Return address - value on stack
  Jump address - e.g., value in jump table for dynamic linking
  Function pointer - in user data structure
  Function pointer offset - index into function pointer table
  System call args - e.g., arg to execve
  Function call args - e.g., arg to system

What's an example of a false negative?
  if (x == 0) y = 0; if (x == 1) y = 1;
  Or worse, data gets converted by a table
    (IIS converts ASCII to unicode)
  Or tainted data could be written to a file
  Can you come up with contrived example, that would untaint an entire
      32-bit word?  (E.g., could do one bit at a time in a loop)

What's an example of a false positive?
  Program may sanity-check data before using it:
    E.g., user supplies function table index
    if (index >= 0 && index < table_size) (*table[index]) ();
  How to handle false positives?
    1. Disable check at location that triggers false positive
        E.g., (*table[index]) () call is okay
    2. Untaint data immediately after the check
    Which is best?
      2, because with 1 data could be tainted along other path

How does TaintCheck relate to worm detection (sec. 6)?
  Automatic semantic analysis
    Idea:  Find parts of payload likely to be constant across attacks
  Overwrite values:
    Most code injection attacks must overwrite return address w. fixed value
      Or at least upper 3 bytes must be the same
    So worm payload will often contain those three bytes
    Or will contain those three bytes encoded as URL
  Keep track of whether bytes are used in significant way
    Distinguish "filler" from important bytes
      and from bytes used only after the attack
    Could also flip payload bits and see if attack still works
  Allow worm to spread in confined environment
    Trap all its network I/O, and use samples for signature generation

How is performance of taintcheck?
  Bzip2 is 37x times slower.  Ouch!
  Does this mean taintcheck is useless?
  Mitigating factors:
    bzip2 CPU bound, while some server workloads I/O bound
      (valgrind dilates CPU time, not I/O time)
    This is a prototype implementation
    Commercial alternatives to valgrind are much more efficient
      but still that's a factor of 3x better, not 37x
  Don't need to run taintcheck on every connection
    Honeypots have no legitimate use, so who cares if bad performance
    Or could randomly sample, particularly with multiple sites
    But, still need to worry that work could detect slow response time

In a distributed setting, Taintcheck allows you to verify signatures!
  E.g., with distributed autograph, someone might poison your fingerprints
  But if they give you sample payload, run w. taintcheck to verify exploit