# A Monday recap

Someone asked a thoughtful question on Monday about folds, specifically about `foldr` vs `foldl` and `foldl'`.

I wasn't very happy with the answers I came up with, because I was in a bit of a rush.

• Always be suspicious of teachers who get into a muddle if you spring an unexpected question!

• Do they really know what they're talking about, or are they just good at reading slides?

Today, let's slow down and think about folds a little more.

# Folding from the right

Here's the definition of `foldr`:

``foldr :: (a -> b -> b) -> b -> [a] -> bfoldr f = go  where    go (x:xs) = f x (go xs)    go z      = z``

It associates to the right, so here's what a use of `foldr` looks like, expanded:

``foldr f z [a,b,c,d] = f a (f b (f c (f d z)))``

# foldr and strictness: 0 of n

Suppose we want to sum the elements of a list using `foldr`:

``foldr (+) 0 [3..6]``

In order for the `(+)` operator to add two numbers, it has to know what each one is, so at least for the usual types we work with, it's strict in both of its arguments.

(Note: `(+)` is not required to be strict. That behaviour depends on the instance of `Num` in use.)

# foldr and strictness: 1 of n

Let's do a stepwise expansion of the `foldr`-based sum.

``foldr (+) 0 (3:{-...-}) = 3 + {-...-}``

We can't produce an answer after consuming the first element of the list, because `+` is strict in its arguments.

# foldr and strictness: 2 of n

``foldr (+) 0 (3:4:{-...-}) = 3 + (4 + {-...-})``

Neither can we produce an answer after the second element.

In fact, we're building up an expression that we can't reduce to a smaller form, because we're associating to the right and we haven't seen the rest of the list yet (see the definition of `foldr`).

# foldr and strictness: n of n

In fact, not until we reach the end of the list do we have a number on the right hand side of a `+`:

``foldr (+) 0 (3:4:5:6:[]) = 3 + (4 + (5 + (6 + 0)))``

What we've done is build up a big expression that we can't evaluate until we've seen the entire list.

This is due in equal part to two factors:

• `foldr` is right associative.

• The function we're folding with is strict in both arguments.

If `foldr` was getting us all screwed up due to the associativity, maybe `foldl` will do better.

``foldl :: (a -> b -> a) -> a -> [b] -> afoldl f = go  where    go z (x:xs) = go (f z x) xs    go z _      = z``

Let's see an expansion.

``foldl f z [a,b,c,d] = f (f (f (f z a) b) c) d``

Ouch! Maybe we can make that clearer with infix syntax?

``foldl (+) z [a,b,c,d] = (((z + a) + b) + c) + d``

# foldl and strictness

For summing elements, this looks better than `foldr` on the face of things, because at every step through the list, the function we're folding with can see both of its arguments.

But there's a catch.

None of the intermediate results are visible to the caller of `foldl`, so the caller cannot force those results to be evaluated.

In other words, we're still building up a big thunk!

# A strict left fold

This function is defined for us in `Data.List`, and it has the same type as `foldl`:

``{-# LANGUAGE BangPatterns #-}foldl' :: (a -> b -> a) -> a -> [b] -> afoldl' f = go  where    go !z (x:xs) = go (f z x) xs    go !z  _     = z``

The crucial difference lies with the strictness annotation, which forces each intermediate result to be evaluated to WHNF.

# foldl' and strictness: 1 of n

That same sum again.

``foldl' (+) 0 [1,2,3]``

First step:

``foldl' (+) 0 (1:{-...-}) = foldl' (0+1) {-...-}                         = foldl' 1     {-...-}``

# foldl' and strictness: 2 of n

Second step:

``foldl' (+) 1 (2:{-...-}) = foldl' (1+2) {-...-}                         = foldl' 2     {-...-}``

The structure of `foldl'` causes us to replace the accumulator `z` with a new value at each step.

The strictness annotation (we could have used `seq` too) causes that accumulator to be evaluated to WHNF before we continue.

# foldl' and strictness: n of n

Last step:

``foldl' (+) 3 [] = 3``

I hope this makes the difference between `foldl'` and `foldr` more apparent!

# Don't use foldl

It should be clear by now that plain old `foldl` is very rarely useful.

In fact, I've never personally found a situation where it was the right kind of fold to use.

If you find yourself thinking "I need a left fold", it's safe to assume that `foldl'` is what you'll need.

# foldr and laziness

We've discussed the fact that using `foldr` with a strict function is a bad idea, because it generates a chain of thunks.

What if we used a function that was lazy in its second argument?

Hmm. Conveniently enough, we have an everyday example of just such a function: the list constructor!

What does this function do?

``foldr (:) []``

# A handy way to think about foldr

Let's do another expansion, but this time we'll align the input list and result expression lexically:

``foldr (<+>) z   (a  :  (b  :  (c  :  (d  :  [])))) =  (a <+> (b <+> (c <+> (d <+> z ))))``

Notice that each `(:)` in the input list is "replaced" by an application of `(<+>)` in the result, and the `[]` at the end is replaced with `z`.

(It doesn't matter what `(<+>)` actually does here. We're purely interested in the structures of the two expressions.)

# Quick exercise 1: map

Write a version of `map` that does not pattern match on its input list, but instead uses `foldr`:

``map :: (a -> b) -> [a] -> [b]map (+1) [1,2,3] == [2,3,4]``

# Quick exercise 1: my implementation

``myMap f = foldr go []  where go x xs = f x : xs``

# Quick exercise 2: filter

Write a version of `filter` that again uses `foldr`:

``filter :: (a -> Bool) -> [a] -> [a]filter odd [0..5] == [1,3,5]``

# Quick exercise 2: my version

``myFilter p = foldr go []  where    go x xs      | p x       = x : xs      | otherwise = xs``

# Quick exercise 3: append

Write an `append` function that has the same behaviour as the `(++)` operator:

``(++) :: [a] -> [a] -> [a][1,2,3] ++ [4,5,6] == [1,2,3,4,5,6]``

For bonus points, make your implementation as short as possible.

# Quick exercise 3: mine

``append = flip (foldr (:))``

# Parallelism vs concurrency

A handy rule of thumb for distinguishing the two:

• Concurrency: "I have a number of potentially unrelated tasks to deal with."

• Parallelism: "I want my single question to be answered sooner if I can throw more resources at the problem."

Many languages blur these notions together, by providing a "one size fits all" toolkit for concurrency and parallelism.

# What does a concurrent system need?

We do not need multiple CPU cores to achieve concurrent processing, nor do we even need threads.

However, a good language runtime and programming environment will offer improved throughput and latency on concurrent problems if it can use multiple cores.

# What about a parallel system?

The essence of parallel programming is to have multiple computing elements process parts of a problem simultaneously.

GHC provides a "baked in" way of using multiple CPU cores to do this.

To take advantage of this, we must build our executables with `-threaded`.

This links a Haskell program against the "threaded runtime", which can schedule lightweight Haskell threads across multiple CPU cores, and which can also run parallel programs on those cores.

# Low-level parallel programming

In the `Control.Parallel` module lives an intriguing function.

``par :: a -> b -> b``

This is the parallel cousin to `seq`. It tells the threaded runtime that it may be beneficial to evaluate the first argument in parallel with the second.

Like `seq`, `par` returns the value of its second argument.

# pseq

As it happens, we need another oddball function to work with `par`.

``pseq :: a -> b -> b``

This is semantically identical to `seq`, but with a subtle operational difference.

`seq` is strict in both arguments, so the compiler can take an expression like this:

``a `seq` b ``

And transform it into this:

``b `seq` a `seq` b``

This can be a problem when annotating code for parallelism, when we need more control over the order of evaluation.

In contrast to `seq`, `pseq` is only strict in its first argument.

This restricts the transformations that the compiler can perform, and lets us retain control of the evaluation order.

# When to use par

We use `par` when two conditions are true:

• The first argument is somewhat expensive to compute.

• We don't have an immediate need for the result of the first computation.

If the first argument is too cheap, then the book-keeping overhead to evaluate it in parallel will diminish or erase any performance gain.

# Futures

The `par` operator makes use of the overlap between lazy evaluation and futures.

To implement lazy evaluation, we need thunks: a representation for expressions which are not yet evaluated but whose value may later be demanded.

A future is a computation whose value is being evaluated in parallel and which we may wait for.

`par` offers a way to annotate a lazy computation as being potentially profitable to evaluate in parallel.

In other words, it turns a lazy computation into a future.

# O rly?

In practice, `par` and `pseq` are a pain in the neck to use.

To use `par` effectively, we have to get some surprisingly subtle things right:

1. Pass an unevaluated computation to `par`

2. The unevaluated computation must be somewhat expensive

3. Ensure that its value will not be required by the enclosing computation for a while

4. Ensure that the result is shared by the rest of the program

If we miss any of 1 through 3, then we achieve little or no speedup.

If we miss 4, then the GC may get rid of the computed result before it can be used.

• In practice, even experts fall afoul of these requirements on a regular basis.

The `monad-par` package provides a library that makes parallelism easier to achieve and reason about.

``````cabal install monad-par
``````

It gives us a type named `Par`:

``newtype Par ainstance Functor Parinstance Applicative Parinstance Monad Par``

To evaluate a `Par` computation, we use `runPar`:

``runPar :: Par a -> a``

Notice that this has no side effects, so it will run deterministically.

# Building blocks

To start a parallel computation, we use the `fork` action:

``fork :: Par () -> Par ()``

Forked tasks need a way to communicate with each other, for which we use the `IVar` type:

``data IVar a    deriving (Eq)new :: Par (IVar a)get :: IVar a -> Par aput :: (NFData a) => IVar a -> a -> Par ()``

The `IVar` type is a write-once mutable reference. The `get` function blocks until a `put` has been performed.

# The NFData class

We know that `seq` evaluates an expression to WHNF, but sometimes we want to completely evaluate an expression to normal form.

Enter the `deeqseq` package:

``import Control.DeepSeqclass NFData a where    rnf :: a -> ()``

We can write an `NFData` instance for any custom type. Our `rnf` implementation must reduce an expression to normal form.

``data Foo a = Foo Int ainstance (NFData a) => NFData (Foo a) where    rnf (Foo x y) = rnf x `seq` rnf y``

# Higher level operations

An extremely common pattern is for a thread to `fork` several children and wait for them to finish.

We can easily capture this idea with a suitable combinator.

``spawn :: (NFData a) => Par a -> Par (IVar a)spawn act = do  i <- new  fork (put i =<< act)  return i``

In fact, usually all we want is to simply wait for all children and return a list of their results.

``parMapM :: (NFData b) => (a -> Par b) -> [a] -> Par [b]parMapM f acts = do   ivars <- mapM (spawn . f) acts  mapM get ivars``

# Computing the mean of a sample

Suppose we need the mean of a list of numbers. The naive way to do so would be `sum xs / length xs`, but this has numerical stability problems.

Here's a more stable method:

``import Data.Listdata T a = T !a !Intmean :: (RealFrac a) => [a] -> amean = fini . foldl' go (T 0 0)  where    fini (T a _) = a    go (T m n) x = T m' n'      where m' = m + (x - m) / fromIntegral n'            n' = n + 1``

# Questionable numbers

Supposing we have a set of sample data that we know little about, in particular its precision and variance.

This is exactly the kind of problem that the criterion benchmarking library faces: we have performance data, but it's dirty and doesn't follow any particular statistical distribution.

A technique called the jackknife lets us estimate these parameters.

We successively recompute a function (such as `mean`) over subsets of a sample, each time with a sliding window cut out.

For a w-width window and k samples, the jackknife has a cost of O((k - w)2).

``jackknife :: ([a] -> b) -> [a] -> [b]jackknife f = map f . resamples 500``

# Resampling

It's easy to write a resampling function using familiar building blocks.

``resamples :: Int -> [a] -> [[a]]resamples k xs =    take (length xs - k) \$    zipWith (++) (inits xs) (map (drop k) (tails xs))``

Our function resamples a list with a window size of `k`.

``>> resamples 2 [0..5][[    2,3,4,5], [0,    3,4,5], [0,1,    4,5], [0,1,2,    5]]``

# Speeding up the jackknife

The nice thing about the jackknife is that each element of the result list is independent, so it's an "embarrassingly parallel" problem.

The `monad-par` package is very helpful in making this trivial to parallelize.

``import Control.Monad.Par (runPar, parMap)jackknifeP f = runPar . parMap f . resamples 500``

Let's try it out!

# Suitably noisy data Noise!

# A test program

``import System.Random.Mersennecrud = zipWith (\x a -> sin (x / 300)**2 + a) [0..]main = do  (xs,ys) <- splitAt 1500 . take 6000 <\$> (randoms =<< getStdGen)  let rs = crud xs ++ ys  putStrLn \$ "sample mean:    " ++ show (mean rs)  let j = jackknifeP mean rs  putStrLn \$ "jack mean min:  " ++ show (minimum j)  putStrLn \$ "jack mean max:  " ++ show (maximum j)``

# Compiling

We have to remember to build using both `-threaded` and `-rtsopts`.

``````ghc -O --make -rtsopts -threaded Jackknife
``````
• `-threaded` gives us the threaded runtime.

• We'll need `-rtsopts` to tell the RTS how many cores to use. It will default to just one if we do not specify otherwise.

To use e.g. 3 cores:

``````./Jackknife +RTS -N3
``````

If we want to use all cores:

``````./Jackknife +RTS -N
``````

# Other approaches to parallelism

Meanwhile, `par` and the `monad-par` packages are not the only ways to achieve parallel speedups in Haskell.

There are other more "researchy" projects under way:

• For multicore machines: Data Parallel Haskell

• For GPU computing: the accelerate package

DPH is the name of a research project to support nested data parallelism under GHC on multicore CPUs.

Nested data parallelism extends the programming model of flat data parallelism, as known from modern Fortran dialects, to:

• Irregular parallel computations (such as divide-and-conquer algorithms) and

• Irregular data structures (such as sparse matrices and tree structures).

This project is still a work in progress, and is quite technically intricate.

# Accelerate

The accelerate library defines an embedded language for regular, multi-dimensional array computations with multiple backends to facilitate high-performance implementations.

Currently, there are two backends:

• An interpreter that serves as a reference implementation of the intended semantics of the language.

• A CUDA backend generating code for CUDA-capable NVIDIA GPUs.

# Busting a myth

You'll sometimes hear uninformed people claim that functional programming is some kind of magic pixie dust for parallel programming.

This is not really true.

Certainly, aspects of the programming model make incidental parts of the problem easier.

However, tricky considerations like data dependencies, memory placement, and thread scheduling have every bit as insidious an effect on parallel performance in Haskell as in C.

Parallel programming is a subtle art no matter which language you do it in.