Bugs as Deviant Behavior
========================

Background:  Metal language
  Allows you to load arbitrary code into the compiler
  Can be used to check correctness rules
    E.g., free (ent); ... return ent; -> compiler reports using ent after free

Premise of paper is that programs often violate correctness rules
  - Never call blocking functions in bottom half of the kernel
  - Check NULL pointers returned from routines
  - Do not allocate large variables on the kernel stack
  - Do not make inconsistent assumptions about whether pointer is NULL
  - Check array/loop bounds derived from user data
  - Release acquired locks, and do not double-acquire
  - Restore disabled interrupts
  - Do not use freed memory
  - Do not use floating point in the kernel
  - Check realloc result so as not to leak memory
  - Do not dereference untrusted user pointers
  - Allocate enough memory for type that is stored there

In previous work, found many violations of correctness rules
Problem:  How to find such rules if you don't understand the system
  Real programs have 100s of unspecified rules
  Idea:  Try to infer the rules from the source code itself

Why should this work?
Intuition:  What if a detective interrogates witnesses?
    Doesn't know what actually happened
    But if two people contradict each other, one must be lying
  Also if 100 people say X and 1 person doesn't, probably the 1 is incorrect
Same with code (p. 3 of paper):
  struct mxser_struct *info = tty->driver_data; // MUST: tty not NULL
  //...
  if (!tty || ...) ...  // Here tty may or may not be NULL

In general can infer beliefs:
  x = *p;      // MUST:  programmer believes p is not NULL
               //  MAY:  x may be protected by l
  free (p);    // MUST:  p is heap allocated
               // MUST:  p not used any more
  unlock (l);  // MUST:  l acquired

One technique:  Statistical analysis
  Find lots of repeated idioms:
    lock(a); x++; unlock(a); ... lock(a); x++; unlock(a); ... 
      Then an unprotected x++ means MUST: x not protected by l
    If x almost always protected by lock(a), unprotected access probably a bug
       (MAY belief conflicts with MUST belief)
Another technique:  Internal consistenty
  Check for self-contradiction
    if (card==NULL)
      printk (KERN_ERR, "capidrv-%d: ... %d!\n", card->contrnr, id)

Example:  Can't dereference user-supplied pointers in the kernel
  Can only access them through special paranoid routines
  Self-contradiction in line of linux 2.4.4 kernel (p. 11):
      if (copy_to_user (rt, ipddp_find_route(rt), ...))
    First use of rt "taints" the variable -- MUST be user-supplied
    Second use of rt (in find_route) is uncheck -- MUST be trusted

General approach for infering bugs:
  User supplies some TEPLATES
    Cannot use a pointer after passed to function <f>
    <a> must be paired with <b>
  <a>, <b>, and <f> are SLOTS
  Actual code that could fit in <a> is called a SLOT INSTANCE

Example:  use after free -- don't want to use a pointer after freeing it
  Problem:  There are many "free-like" functions in the kernel--how to find?
  To find slot instances, find functions that take last use of pointer
    -- they MAY be free-like functions
  Then use statistical analysis

Another problem:  Finding too candidate slot instances
    E.g., what pairs of functions to check for lock/unlock-like behavior
  LATENT SPECIFICATIONS can solve the problem (sec 5.2)
  E.g., Use substrings in function names:  lock, alloc, release, assert, ...

Ranking output - What to do if you aren't sure it's a bug
  Rank the possible bugs by likelihood that they are
  Then investigate in order, stop when too many false positives