The dominant paradigm

By far the most widely used style of testing today is unit testing.

Invent a "state of the world".
Run the unit we're testing (e.g. a function).
Check the modified state of the world to see if it looks like it should.
Profit!

Shameless Wikipedia pillaging

If you're used to unit testing, this may look reasonable.

public class TestAdder {
    public void testSum() {
        Adder adder = new AdderImpl();
        assert(adder.add(1, 1) == 2);
        assert(adder.add(1, 2) == 3);
        assert(adder.add(2, 2) == 4);
        assert(adder.add(0, 0) == 0);
        assert(adder.add(-1, -2) == -3);
        assert(adder.add(-1, 1) == 0);
        assert(adder.add(1234, 988) == 2222);
    }
}

The "real world" adds tons of cruft: mock objects, preprocessor abuse to expose "test seams", etc.

Problem: unit testing is only as good as your patience

The example from the previous slide contains 7 tests.

It's not hard to imagine a world in which we lose the will to continue inventing new unit tests long before we've exhausted our search of the space of possible bugs.

Let's sort a list

mergeSort :: (a -> a -> Bool) -> [a] -> [a]
mergeSort pred = go
  where
    go []  = []
    go [x] = [x]
    go xs  = merge (go xs1) (go xs2)
      where (xs1,xs2) = split xs

    merge xs [] = xs
    merge [] ys = ys
    merge (x:xs) (y:ys)
      | pred x y  = x : merge xs (y:ys)
      | otherwise = y : merge (x:xs) ys

The split function

The purpose of splitting is to produce two sublists of roughly equal length, so that they can be merged (where the sorting occurs).

split :: [a] -> ([a],[a])
split []       = ([],[])
split [x]      = ([x],[])
split (x:y:zs) = (x:xs,y:ys)
  where (xs,ys) = split zs

Sorting: a unit test perspective

sort [1,2,3,4] == [1,2,3,4]
sort [4,3,2,1] == [1,2,3,4]
sort [1,4,2,3] == [1,2,3,4]
...

This gets a little dull.

Let's talk about properties

A property is nothing more than a predicate that should always hold.

What's an obvious property for sorts?

t_idempotent = sort . sort == sort

We can't define this, since we can't compare functions for equality.

However, we can cheat:

t_idempotent xs = 
    sort (sort xs) == sort xs

Why don't we do it in the interpreter

t_idempotent [3,2,1]

Okay! Let's mechanize some exhaustive testing.

To exhaustively test over all lists containing the above elements, we need some help.

import Data.List (permutations)

t_permute :: ([a] -> Bool) -> [a] -> Bool
t_permute prop = all prop . permutations

Over what list sizes is this practical?

Exhausted testing?

Clearly, exhaustive testing is impractical for most interesting purposes.

The insight of QuickCheck:

Combine property-based testing with randomly generated data.
This obviously can't give us the same assurance as exhaustive testing, but it's practical.
We can choose the amount of data to throw at our properties.
This lets us tune the tradeoff between degree of assurance and cost.

Let's give it a try

At the ghci prompt:

>> import Test.QuickCheck
>> quickCheck (t_idempotent compare)
+++ OK, passed 100 tests.

As the output suggests, QuickCheck generated 100 random lists, and tested our property over them.

Yay!

Something funky is going on

The type of compare is polymorphic:

>> :type t_idempotent compare
t_idempotent compare :: Ord a => [a] -> Bool

So how are we able to generate a list?

Type defaulting - a recap

Haskell's usual defaulting rules take each group of constraints (C1 a, C2 a, ..., Cn a) for each type variable a, and defaults the type variable if all of the following conditions hold:

The type variable a appears in no other constraints.
All the classes Ci are standard.
At least one of the classes Ci is numeric.

That's not enough for us lazy programmers!

To reduce the number of types we're forced to specify by hand, ghci relaxes the standard rules (changes in italics):

The type variable a appears in no other constraints. Unchanged.
All the classes Ci are standard, and all are single-parameter type classes.
At least one of the classes Ci is numeric, or is Show, Eq, or Ord.

It also adds another critical step when defaulting:

The type () becomes the first of the the standard list of types tried when doing type defaulting.

Peek inside

We can use the verboseCheck function to see all the test data that QuickCheck is generating for us.

>> verboseCheck (t_idempotent compare)

Notice that we have endless lists of ()?

Our supposedly reassuring test isn't very useful!

Polymorphic testing

When we're testing a polymorphic property, we need to specify which concrete type we're testing at.

>> import Data.Word (Word8)
>> verboseCheck (t_idempotent (compare :: Word8 -> Word8 -> Ordering))

That's verbose and unhappy-making, right?

Enter the type synonym

Here's one approach to reducing our work:

type Cmp a = a -> a -> Ordering

Trying this at the ghci prompt:

>> import Data.Word (Word8)
>> verboseCheck (t_idempotent (compare :: Cmp Word8))

Witness the fitness

Here's an alternative approach:

t_witnessed p a xs = mergeSort p (mergeSort p xs) == mergeSort p xs
  where _witness = a < head xs

What's that _witness variable for?

It's a type witness, a value that exists to express a constraint between several types (it "witnesses" the constraint).
Thanks to the use of <, this witness forces the type of a and the type of the elements of xs to be the same.

(We prefix the name with an underscore to tell GHC that it's an unused wild card.)

Instantiating our new polymorphic test

We can supply a value for a of the appropriate type to test over:

>> verboseCheck (t_witnessed compare 'a')

Of course, the value of a is never used.

As a result, we don't even need to supply a working value, provided the type of what we supply is correct:

>> verboseCheck (t_witnessed compare (undefined::Double))

Where do random values come from?

To generate random values of some type, we must write an Arbitrary instance for it.

class Arbitrary a where
  arbitrary :: Gen a

Here's an example, making use of the fact that this unknown type Gen is an instance of Monad:

data Point = Point Int Int

instance Arbitrary Point where
    arbitrary = do
      x <- arbitrary
      y <- arbitrary
      return (Point x y)

Lifting

We're hopefully by now familiar with the Functor class:

class Functor f  where
    fmap :: (a -> b) -> f a -> f b

This takes a pure function and "lifts" it into the functor f.

In general, "lifting" takes a concept and transforms it to work in a different (sometimes more general) setting.

For instance, we can define lifting of functions with the Monad class too:

liftM :: (Monad m) => (a -> b) -> m a -> m b
liftM f action = do
  b <- action
  return (f b)

fmap and liftM

Notice the similarities between the type signatures:

fmap  :: (Functor f) => (a -> b) -> f a -> f b
liftM :: (Monad m)   => (a -> b) -> m a -> m b

All instances of Monad can possibly be instances of Functor. Ideally, they'd be defined in terms of each other:

class (Functor m) => Monad m where
    {- blah blah -}

For historical reasons, the two classes are separate, so we write separate instances for them and just reuse the code:

instance Monad MyThingy where
    {- whatever -}

instance Functor MyThingy where
    fmap = liftM

Why the apparent digression?

It turns out that lifting pure functions into monads is really common.

So common, in fact, that Control.Monad defines a bunch of extra combinators for us.

liftM2 :: (Monad m) => (a -> b -> c) -> m a -> m b -> m b
liftM2 f act1 act2 = do
  a <- act1
  b <- act2
  return (f a b)

These combinators go all the way up to liftM5.

Look familiar? Useful?

A tighter Arbitrary instance

Before:

instance Arbitrary Point where
    arbitrary = do
      x <- arbitrary
      y <- arbitrary
      return (Point x y)

After:

import Control.Monad (liftM2)

instance Arbitrary Point where
    arbitrary = liftM2 Point arbitrary arbitrary

Trying something else

Suppose we want to verify that the sum of two odd integers is always even.

Here's a quick and obvious property:

p_sum_odd a b
    | odd a && odd b = even (a + b)
    | otherwise      = True

This looks a little crufty, though.

We're returning True even in instances where the property really isn't defined.

Conditional properties

It would be nice if we could express the idea "check this property only if the inputs satisfy these constraints".

In fact, there's a combinator for that: ==>

p_sum_odd1 a b =
    odd a && odd b ==> 
    even (a+b)

This specifies that the property on the right should hold whenever the Bool-valued test on the left succeeds.

QuickCheck will discard inputs for which the test fails.

Limits on conditional properties

Suppose we try to test this property:

p_odd3 a b c = odd a && odd b && odd c ==> odd (a+b+c)

We'll get a strange error from QuickCheck:

>> quickCheck p_odd
*** Gave up! Passed only 79 tests.

None of our tests failed, but QuickCheck puts an upper limit on the number of test cases it will generate.

This avoids an infinite loop if the condition before the property never holds.
Here, we hit the limit because almost 90% of our inputs are discarded. Why are they discarded?

Correctness by construction

Instead of filtering out data that isn't right for us, it's better to generate only data that is right.

newtype Odd a = Odd a
    deriving (Show)

instance (Integral a, Arbitrary a) => Arbitrary (Odd a) where
    arbitrary = do
      a <- arbitrary
      return $! Odd (if even a then a + 1 else a)

It's clear from inspection that the Arbitrary instance for Odd a will only generate odd-valued integers.

Can I get a witness?

We no longer need a conditional property now, since all of our property's parameters must by necessity be odd:

p_odd1 x (Odd a) (Odd b) (Odd c) = odd (a+b+c)
  where _witness = x == a

Let's run it:

>> quickCheck $ p_odd1 (1::Int)
+++ OK, passed 100 tests.

Sweet! We saved ourselves from generating a ton of useless data, made the testing process faster, and cleaned up the property some too.

Sizing a test

Test data generators have an implicit size parameter, hidden inside the Gen type.

QuickCheck starts by generating small test cases; it increases the size as testing progresses.

The meaning of "size" is specific to the needs of an Arbitrary instance.

The Arbitrary instance for lists interprets it as "the maximum length of a list of arbitrary values".

We can find the current size using the sized function, and modify it locally using resize:

sized  :: (Int -> Gen a) -> Gen a
resize ::  Int -> Gen a  -> Gen a

Testing a recursive data type

Suppose we have a tree type:

data Tree a = Node (Tree a) (Tree a)
            | Leaf a
              deriving (Show)

Here's an obvious Arbitrary instance:

instance (Arbitrary a) => Arbitrary (Tree a) where
    arbitrary = oneof [
                  liftM Leaf arbitrary
                , liftM2 Node arbitrary arbitrary
                ]

The oneof combinator chooses a generator at random.

oneof :: [Gen a] -> Gen a

What's up, Doc?

Potential trouble:

This generator may not terminate at all!
It's likely to produce huge trees.

We can use the sample function to generate and print some arbitrary data.

sample :: (Show a) => Gen a -> IO ()

This helps us to explore what's going on.

A safer instance

Here's where the sizing mechanism comes to the rescue.

instance (Arbitrary a) => Arbitrary (Tree a) where
    arbitrary = sized tree

tree :: (Arbitrary a) => Int -> Gen (Tree a)
tree 0 = liftM Leaf arbitrary
tree n = oneof [
           liftM  Leaf arbitrary
         , liftM2 Node subtree subtree
         ]
  where subtree = tree (n `div` 2)

Sorting revisited

A very desirable property of a sorting algorithm is stability.

The built-in sort and sortBy functions are designed to be stable.

What about our mergeSort? Let's find out.

import Data.Function (on)
import Data.List (sortBy)
import Data.Word (Word8)

p_stable xs = merged == sorted
  where merged = mergeSort (compare `on` fst) xs
        sorted = sortBy    (compare `on` fst) xs
        _witness = xs :: [(Word8, Word8)]

That on function is pretty neat:

on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
(*) `on` f = \x y -> f x * f y