Let's think about a programming pattern we've seen, but not paid attention to.

Patterns: I

0 + n  ==  n
n + 0  ==  n

(a + b) + c  ==  a + (b + c)

Patterns: II

1 * n  ==  n
n * 1  ==  n

(a * b) * c  ==  a * (b * c)

Patterns: III

[]

[] ++ n  ==  n
n ++ []  ==  n

(a ++ b) ++ c  ==  a ++ (b ++ c)

Patterns: IV

True

True && n  ==  n
n && True  ==  n

(a && b) && c == a && (b && c)

Patterns, abstracted

Typeclass:

class Monoid a where
    -- A "zero element"
    mempty  :: a
    -- An associative operation
    mappend :: a -> a -> a

Where can you find this typeclass?

import Data.Monoid

Monoids

Instances of Monoid must obey some rules.

Rule 1: identity element

mempty `mappend` n  ==  n
n `mappend` mempty  ==  n

Rule 2: our associative operation must actually associate.

(a `mappend` b) `mappend` c  ==
a `mappend` (b `mappend` c)

Rules?

Monoids come from abstract algebra.

In abstract algebra, rules that must be true are called axioms.

Also called laws.

In Haskell, how are these rules/axioms/laws enforced?

They are not.

Monoids for lists

Here's the easiest and most familiar-to-Haskellers case:

instance Monoid [a] where
     mempty           = []
     xs `mappend` ys  = xs ++ ys

Pop quiz:

What other definition(s) would follow the Monoid laws?
Do they make any sense?

Monoids for numbers?

Numbers are an interesting case.

Addition as monoid:

Identity 0
Associative operator +

Multiplication as monoid:

Identity 1
Associative operator *

When do we use typeclasses?

Suppose you want to abstract a code pattern into a typeclass.

Under what circumstances is this likely to work best?

When there is just one "canonical" behaviour you expect for a given type.

For lists, our Monoid instance is canonical:

Any other behaviour that follows the laws is just weird.

For numbers, we have two sensible behaviours:

No one Monoid instance can be called canonical!

Monoids for multiplication

newtype Product a = Product { getProduct :: a }
    deriving (Eq, Ord, Read, Show, Bounded)

instance Num a => Monoid (Product a) where
    mempty                        = Product 1

    Product x `mappend` Product y = Product (x * y)

Monoids for addition

newtype Sum a = Sum { getSum :: a }
    deriving (Eq, Ord, Read, Show, Bounded)

instance Num a => Monoid (Sum a) where
    mempty                = Sum 0

    Sum x `mappend` Sum y = Sum (x + y)

The `Either` type

There exists a built-in type named Either.

data Either a b = Left a | Right b

By convention:

Left means "something went wrong"
Right means "result was a success"

Often used as follows:

type Result a = Either String a

(where the String carries an error message)

Coding exercise

Create a Monoid instance that will give the first success from a chain of Either values.

Desired behaviour:

Left "you goofed" `mappend`
Left "i win!"     `mappend`
Right "rats! you won!"

   ==

Right "rats! you won!"

You have five minutes.

Ambient machinery for the coding exercise

If you import Data.Monoid you will have the following definitions available:

class Monoid a where
  mempty :: a
  mappend :: a -> a -> a

data Either a b = Left a | Right b

Language hitch

Did you try to write code like this?

instance Monoid (Either a b) where
    mempty = Left {- what ??? -}

    Right a `mappend` _ = Right a
    _       `mappend` b = b

You surely ran into trouble while trying to define mempty.

Why?

Type quantification

In Haskell, type variables are quantified.

They stand in for all types in a given domain.

If there's no typeclass mentioned, a type variable is implicitly universally quantified.

We can write these quantifiers explicitly:

length :: forall a. [a] -> Int

"The length function must accept any list, no matter what type of data it contains."

Universal quantification

Why is universal quantification relevant here?

instance Monoid (Either a b) where
    mempty = Left {- what ??? -}

Universal quantification

Why is universal quantification relevant here?

instance Monoid (Either a b) where
    mempty = Left {- what ??? -}

Since mempty gives a "zero element", it must somehow produce a zero element for the type a.

But since a is universally quantified, it stands in for every type.

Clearly there is no one legal value that is of every type.

It is impossible to write a sensible instance.

A possible fix

This won't typecheck either:

instance Monoid (Either String a) where
    mempty = Left "fnord"

    Right a `mappend` _ = Right a
    _       `mappend` b = b

However, we can make it compile by adding the following to the top of our source file:

{-# LANGUAGE FlexibleInstances #-}

Pragmas

This is a specially formatted comment:

{- i am a normal comment -}

{-# i am a special comment #-}

"Special" comments usually contain directives ("pragmas") that change the compiler's behaviour.

The LANGUAGE pragma enables non-standard language features.

{-# LANGUAGE FlexibleInstances #-}

FlexibleInstances makes the compiler consider more typeclass instances as legal than the Haskell 98 standard allows.

More about pragmas

You'll see a few more pragmas as we progress.

Some are widely used, others are not.

Some are safe, others are not...

up to and including allowing the typechecker to go into an infinite loop! (UndecidableInstances)

FlexibleInstances is widely used and often safe.

Back to our fix

This will typecheck:

{-# LANGUAGE FlexibleInstances #-}

instance Monoid (Either String a) where
    mempty = Left "fnord"

    Right a `mappend` _ = Right a
    _       `mappend` b = b

But is it canonical?

Canonicality

Why worry about our Monoid instance being canonical?

Any time you declare an instance of any typeclass:

It is automatically made available to every module that imports your module.
You can't say "I don't want to import instance X" :-(

If you define a weird instance of a popular typeclass, you'll "infect" people who import your module.

Make sure your instances make sense!

Finally!

Via use of newtype, we don't accidentally associate a silly Monoid instance with Either String a.

{-# LANGUAGE FlexibleInstances #-}

import Data.Monoid

newtype FirstRight a b = FirstRight {
    getFirstRight :: Either a b
  }

instance Monoid (FirstRight String a) where
  mempty = FirstRight (Left "suxx0rz")

  a@(FirstRight (Right _)) `mappend` _ = a
  _                        `mappend` b = b

HTTP POST

Let's upload some vitally important data to a server.

curl --data foo=bar --verbose \
  http://httpbin.org/post

Multipart form upload

When we POST multipart data to a form (e.g. uploading a photo), some information is mandatory, while other stuff is optional.

data Part = Part {
    -- name of the <input> tag this belongs to
      name        :: String
    -- filename of file we're uploading
    , fileName    :: Maybe FilePath
    -- type of file
    , contentType :: Maybe ContentType
    -- file contents
    , body        :: String
    } deriving (Show)

Uploading data

Suppose we want to build a HTTP client that supports POST.

Web pages tend to expect multipart form data, while REST APIs have different needs.

Here are some types that let us represent a POST body.

type Param = (String, String)

type ContentType = String

data Payload = NoPayload
             | Raw ContentType String
             | Params [Param]
             | FormData [Part]
               deriving (Show)

Can you write a Monoid instance for Payload?

Decide for yourself, then discuss with a partner for 2 minutes.

Huh

This part is easy enough:

instance Monoid Payload where
    mempty = NoPayload

    mappend NoPayload b = b
    mappend a NoPayload = a

    mappend (Params a) (Params b) = Params (a++b)

    {- ... -}

What about the rest of mappend?

Semantic problems

It is easy to see how we can glom together Params or FormData.

data Payload = NoPayload
             | Raw ContentType String
             | Params [Param]
             | FormData [Part]

However, mixing Raw with Params, or Params with FormData, is nonsensical.

A straightforward Monoid instance will have to crash (!!!) if we try this.

Handling failure (badly)

What if we use the Maybe type to represent a failed attempt to mappend?

{-# LANGUAGE FlexibleInstances #-}

-- I dropped the NoPayload constructor. Why?
data Payload = Raw ContentType String
             | Params [Param]
             | FormData [Part]
               deriving (Show)

instance Monoid (Maybe Payload) where
  mempty = Nothing

  mappend Nothing b = b
  mappend a Nothing = a

  mappend (Just (Params a)) (Just (Params b))
    = Just (Params (a++b))
  mappend (Just (FormData a)) (Just (FormData b))
    = Just (FormData (a++b))
  mappend _ _ = Nothing

Yay?

This compiles, but it has a conceptual problem.

Every time we use mappend, we have to pattern-match the result to see if the mappend succeeded.

In API design circles, this is called "crappy".

But wait, it gets worse!

O error message, where art thou?

Let me try this in ghci:

Just (Params []) `mappend` Just (Params [])

Overlapping instances

Remember FlexibleInstances?

It allowed us to write a Monoid instance for the type Maybe Payload.

Trouble is, Data.Monoid already defined an instance for Maybe a.

FlexibleInstances allows these two definitions to coexist happily.

But when we want to use an instance, GHC doesn't know which one to use!

Overlapping instances

Enter the OverlappingInstances pragma:

{-# LANGUAGE FlexibleInstances, OverlappingInstances #-}

This allows multiple instances to coexist and be used.

The most specific instance that is visible will be used.

A very handy extension!

Also a big semantic gun pointing at your foot.

Problems with overlapping instances

Why worry about OverlappingInstances?

Makes it very easy for incorrect programs to still typecheck.
Can cause confusing error messages.
A program that typechecks can have its meaning changed by adding an instance declaration in some remote module.

On the plus side, you can publish papers about their problems, so they're not bad for an academic career.

Checking in

We have a Monoid instance that:

Has a janky API
Uses a dodgy language extension

Can we do better?

Phantom types

Let's add a type parameter on the left hand side of our Payload type.

data Payload a = NoPayload
               | Raw ContentType String
               | Params [Param]
               | FormData [Part]
               deriving (Show)

The type variable a does not appear in the RHS.

We call this a phantom type.

What's it for?

A tiny upload API

param :: String -> String -> Payload [Param]
param name value = Params [(name, value)]

filePart :: String -> FilePath -> IO (Payload [Part])
filePart name path = do
  body <- readFile name
  return (FormData [Part name (Just path) Nothing body])

Consider the types

param :: String -> String
      -> Payload [Param]

filePart :: String -> FilePath
         -> IO (Payload [Part])

Notice:

The first function returns a Payload [Param]
The second returns a Payload [Part]

The phantom parameter makes these distinct types.

The runtime representation is the same in each case.
The compiler prevents us from mixing the two by accident.

Code moment

Please write a body for addParams below.

instance Monoid (Payload [Param]) where
    mempty = NoPayload
    mappend = addParams

Download the code you'll need:

curl -L http://cs240h.scs.stanford.edu/PayloadPhantom.hs

You have five minutes.

Making this all work

We have a constrained public API for creating Payload values.

param :: String -> String -> Payload [Param]

filePart :: String -> FilePath -> IO (Payload [Part])

fileString :: String -> Maybe FilePath -> String -> (Payload [Part])

How do we enforce this?

We export the name of the type Part, but not any of its constructors.

Exporting a type

The (..) notation below means "export the type Part and all of its constructors".

module PayloadPhantom
    (
      Part(..)
    {- ... trimmed out ... -}
    ) where

Exporting a type

The (..) notation below means "export the type Part and all of its constructors".

module PayloadPhantom
    (
      Part(..)
    {- ... trimmed out ... -}
    ) where

Notice that we omit the (..) below, meaning "export the type Payload, but not any of its constructors".

module PayloadPhantom
    (
      Part(..)
    , Payload -- no constructors
    {- ... trimmed out ... -}
    ) where

Exporting a type

The (..) notation below means "export the type Part and all of its constructors".

module PayloadPhantom
    (
      Part(..)
    {- ... trimmed out ... -}
    ) where

So we export the Payload type, and only the functions that we defined and control ("smart constructors") that construct values of this type.

module PayloadPhantom
    (
      Part(..)
    , Payload -- no constructors
    , param
    , filePart
    , fileString
    {- ... trimmed out ... -}
    ) where

Trying it out

In ghci:

ghci> param "foo" "bar" <> param "baz" "quux"
Params [("foo","bar"),("baz","quux")]

This uses my favourite operator from Data.Monoid:

(<>) :: Monoid m => m -> m -> m
(<>) = mappend

What do we get if we try this?

param "foo" "bar" <> fileString "baz" Nothing "quux"

Last of the monoids

For which of the following should we write Monoid instances?

data Payload a = NoPayload
               | Raw ContentType String
               | Params [Param]
               | FormData [Part]
               deriving (Show)

Why care so much about monoids?

Monoids have many merits:

Simple
Easy for clients to use
Force you to address API design problems early on

Monoids without an identity

Like the abstract algebraic approach?

A package on Hackage named semigroups gives us monoids without an identity operation: semigroups.

Alas:

The Monoid type was developed before the semigroups package
The two should be related, but thanks to this accident of history are not

Principles

Why care about phantom types and monoids?

We want to build the simplest correct libraries we can

Monoids help us focus on simplicity.

Phantom types make it easier to build APIs where flat-out broken behaviours can be made impossible by the compiler.

Mutable variables

We've already seen the very handy MVar type, which represents a "blocking mutable box": we can put a value in or take one out, but we'll block if we put when full or take when empty.

Even though MVars are the fastest blocking concurrent structure in the industry (they made the the Kessel Run in less than twelve parsecs!), we don't always want blocking semantics.

For cases where we want non-blocking updates, there's the IORef type, which gives us mutable references.

import Data.IORef

newIORef    :: a -> IO (IORef a)

readIORef   :: IORef a -> IO a
writeIORef  :: IORef a -> a -> IO ()

modifyIORef :: IORef a -> (a -> a) -> IO ()

Managing mutation

Application writers are often faced with a question like this:

I have a big app, and parts of it need their behaviour tweaked by an administrator at runtime.

There are of course many ways to address this sort of problem.

Let's consider one where we use a reference to a piece of config data.

Any code that's executing in the IO monad can, if it knows the name of the config reference, retrieve the current config:

curCfg <- readIORef cfgRef

The trouble is, ill-behaved code could clearly also modify the current configuration, and leave us with a debugging nightmare.

Phantom types to the rescue!

Let's create a new type of mutable reference.

We use a phantom type t to statically track whether a piece of code is allowed to modify the reference or not.

import Data.IORef

newtype Ref t a = Ref (IORef a)

Remember, our use of newtype here means that the Ref type only exists at compile time: it imposes no runtime cost.

Since we are using a phantom type, we don't even need values of our access control types:

data ReadOnly
data ReadWrite

We're already in a good spot! Not only are we creating compiler-enforced access control, but it will have zero runtime cost.

Creating a mutable reference

To create a new reference, we just have to ensure that it has the right type.

newRef :: a -> IO (Ref ReadWrite a)
newRef a = Ref `fmap` newIORef a

Reading and writing a mutable reference

Since we want to be able to read both read-only and read-write references, we don't need to mention the access mode when writing a type signature for readRef.

readRef :: Ref t a -> IO a
readRef (Ref ref) = readIORef ref

Of course, code can only write to a reference if the compiler can statically prove (via the type system) that it has write access.

writeRef :: Ref ReadWrite a -> a -> IO ()
writeRef (Ref ref) v = writeIORef ref v

Converting a reference to read-only

This function allows us to convert any kind of reference into a read-only reference:

readOnly :: Ref t a -> Ref ReadOnly a
readOnly (Ref ref) = Ref ref

In order to prevent clients from promoting a reference from read-only to read-write, we do not provide a function that goes in the opposite direction.

We also use the familiar technique of constructor hiding at the top of our source file:

module Ref
    (
      Ref, -- export type ctor, but not value ctor
      newRef, readOnly,
      readRef, writeRef
    ) where

Phantoms

Patterns: I

Patterns: II

Patterns: III

Patterns: IV

Patterns, abstracted

Monoids

Rules?

Monoids for lists

Monoids for numbers?

When do we use typeclasses?

Monoids for multiplication

Monoids for addition

The Either type

Coding exercise

Ambient machinery for the coding exercise

Language hitch

Type quantification

Universal quantification

Universal quantification

A possible fix

Pragmas

More about pragmas

Back to our fix

Canonicality

Finally!

HTTP POST

Multipart form upload

Uploading data

Huh

Semantic problems

Handling failure (badly)

Yay?

O error message, where art thou?

Overlapping instances

Overlapping instances

Problems with overlapping instances

Checking in

Phantom types

A tiny upload API

Consider the types

Code moment

Making this all work

Exporting a type

Exporting a type

Exporting a type

Trying it out

Last of the monoids

Why care so much about monoids?

Monoids without an identity

Principles

Mutable variables

Managing mutation

Phantom types to the rescue!

Creating a mutable reference

Reading and writing a mutable reference

Converting a reference to read-only

Further reading

The `Either` type