![](graph.png)

* Graph reduction allows lazy evaluation and sharing
* _let_: adds new node to graph
* _case_: expression evaluation, causes the graph to be reduced
* when a node is reduced, it is replaced (or _updated_) with its
result

# Terminology 102
* _redex(es)_:
reducible expression. An expression that can be evaluated further
* _normal form_:
an expression without any redexes
* _head normal form_:
an expression where the top level (head) is neither a redex NOR
a lambda abstraction with a reducible body
* _weak head normal form_:
an expression where the top level (head) isn't a redex
* _unfolding_:
the unfolding of a function f is just the body of f.
* Unfolding = Inlining.
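A quick illustration of the WHNF distinction (a runnable sketch, nothing GHC-specific is assumed): `seq` only forces its argument to WHNF, so a constructor at the head is enough, even if the fields are bottom.

~~~~ {.haskell}
main :: IO ()
main = do
  -- The pair constructor is already the head: p is in WHNF, so
  -- forcing it with seq never touches the undefined fields.
  let p = (undefined, undefined) :: (Int, Int)
  p `seq` putStrLn "pair already in WHNF"
  -- By contrast, (undefined :: Int) `seq` ... would crash:
  -- reaching WHNF there means evaluating to bottom.
~~~~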
# Terminology 103
* evaluation strategies:
* _call-by-value_: arguments evaluated before function entered
(copied)
* _call-by-name_: arguments passed unevaluated
* _call-by-need_: arguments passed unevaluated but an expression
is only evaluated once (sharing)
* _non-strict evaluation_ vs. _lazy evaluation_:
* non-strict: Includes both call-by-name and call-by-need, general
term for evaluation strategies that don't evaluate arguments
before entering a function
* lazy evaluation: Specific type of non-strict evaluation. Uses
call-by-need (for sharing).
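The sharing in call-by-need can be observed directly with `Debug.Trace` (a small runnable sketch; `trace` prints its message at the moment the thunk is forced):

~~~~ {.haskell}
import Debug.Trace (trace)

main :: IO ()
main = do
  let x = trace "x forced" (1 + 2 :: Int)  -- one shared thunk
  -- Under call-by-need "x forced" appears once, not twice: the first
  -- use evaluates the thunk, the second reuses the stored result.
  print (x + x)
~~~~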
# Front End: _Haskell -> Core_
Let's now look at how Haskell is compiled to
[Core](http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/CoreSynType).
* Core is a small, lazy functional language
* Learning Core is the most useful thing to get out of this lecture;
experienced Haskell programmers who care about performance will
look at Core.
* Think of it as a functional assembly language. Reasoning about the
behaviour and performance of Core is much simpler.
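To look at the Core for a module yourself, GHC's `-ddump-simpl` flag dumps the simplifier output, and the `-dsuppress-*` flags cut down the metadata noise (these are all real flags; `A.hs` is a placeholder file name):

~~~~
ghc -O2 -ddump-simpl -dsuppress-all -dsuppress-uniques A.hs
~~~~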
# Terminology 104
* _kernel_: A kernel, in the programming-language domain, means the
essential subset of the language: the 'base' of the language on
which all other constructs are defined.
* Core is a kernel for Haskell.
* _CAF_: (Constant Applicative Form) A top-level constant, allocated
for the lifetime of the program and shared. Since CAFs are statically
allocated, the garbage collector has to [treat them specially](http://hackage.haskell.org/trac/ghc/wiki/Commentary/Rts/Storage/GC/CAFs?redirectedfrom=Commentary/Rts/Storage/CAFs)
* _scrutinee_: The expression you are case'ing on in a case statement
# Functions -> Core
Haskell
~~~~ {.haskell}
idChar :: Char -> Char
idChar c = c
id :: a -> a
id x = x
idChar2 :: Char -> Char
idChar2 = id
~~~~
Core
~~~~ {.haskell}
idChar :: GHC.Types.Char -> GHC.Types.Char
[GblId, Arity=1, Caf=NoCafRefs]
idChar = \ (c :: GHC.Types.Char) -> c
id :: forall a. a -> a
id = \ (@ a) (x :: a) -> x
idChar2 :: GHC.Types.Char -> GHC.Types.Char
idChar2 = id @ GHC.Types.Char
~~~~
* [GblId...] specifies various metadata about the function
* Functions are all lambda abstractions
* Explicit passing and instantiation of type variables
* type variables are preceded by the @ symbol (read them as 'at type
...')
* they are abstracted and passed around just like value
variables
* this is known as the second-order lambda calculus (System F)
* GHC uses this representation because it makes preserving type
information during optimization easy
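Surface Haskell can mimic Core's explicit type instantiation with the `TypeApplications` extension (a real GHC extension; this mirrors the `idChar2` example above):

~~~~ {.haskell}
{-# LANGUAGE TypeApplications #-}

idChar2 :: Char -> Char
idChar2 = id @Char   -- instantiate id's type variable at Char, as Core does

main :: IO ()
main = print (idChar2 'x')
~~~~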
# Functions -> Core
Haskell
~~~~ {.haskell}
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x:xs) = f x : map f xs
~~~~
Core
~~~~ {.haskell}
map :: forall a b. (a -> b) -> [a] -> [b]
map =
\ (@ a) (@ b) (f :: a -> b) (xs :: [a]) ->
case xs of _ {
[] -> GHC.Types.[] @ b;
: y ys -> GHC.Types.: @ b (f y) (map @ a @ b f ys)
}
~~~~
* case statements are the only place evaluation happens; read them as
'evaluate'
* they bind an extra variable just after `of` that captures the
evaluated value of the scrutinee
* names are fully qualified
# Data -> Core
Haskell
~~~~ {.haskell}
data Maybe a = Nothing | Just a
none = Nothing
some = Just (1 :: Int)
~~~~
Core
~~~~ {.haskell}
none :: forall a. Maybe a
none = Nothing
n :: GHC.Types.Int
n = GHC.Types.I# 1
some :: Maybe GHC.Types.Int
some = Just @ GHC.Types.Int n
~~~~
* Data types don't explicitly appear in Core
* Core supports data types, there is just no syntax for declaring
them at this level
* Can see how GHC lifts constants out to the top level (CAFs)
* Can also see boxing and primitive types
* In general Core follows the same syntactic rules as Haskell (e.g.
uppercase = data constructor, # = unboxed value / type)
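The boxing above can be played with directly: with the `MagicHash` extension, the `I#` constructor and the primitive operations are importable from `GHC.Exts` (a small runnable sketch):

~~~~ {.haskell}
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), (+#))

main :: IO ()
main = print (I# (3# +# 4#))  -- unboxed addition, reboxed with I#; prints 7
~~~~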
# Handling where
Haskell
~~~~ {.haskell}
dox :: Int -> Int
dox n = x * x
where x = (n + 2) * 4
~~~~
Core
~~~~ {.haskell}
dox :: GHC.Types.Int -> GHC.Types.Int
dox =
\ (n :: GHC.Types.Int) ->
let {
x :: GHC.Types.Int
x =
GHC.Num.* @ GHC.Types.Int GHC.Num.$fNumInt
(GHC.Num.+ @ GHC.Types.Int GHC.Num.$fNumInt n (GHC.Types.I# 2))
(GHC.Types.I# 4) }
in GHC.Num.* @ GHC.Types.Int GHC.Num.$fNumInt x x
~~~~
* `where` becomes `let`
# Patterns & Guards
Haskell
~~~~ {.haskell}
iff :: Bool -> a -> a -> a
iff True x _ = x
iff False _ y = y
~~~~
Core
~~~~ {.haskell}
iff :: forall a. GHC.Bool.Bool -> a -> a -> a
iff =
\ (@ a) (d :: GHC.Bool.Bool) (x :: a) (y :: a) ->
case d of _
GHC.Bool.False -> y
GHC.Bool.True -> x
~~~~
* Patterns and guards become `case` statements
# Sharing & Updating
Haskell
~~~~ {.haskell}
sum100 :: Int -> Int
sum100 n = n * foldr (+) 0 [1..100]
~~~~
Core
~~~~ {.haskell}
-- Unoptimized
sum100n = \ (n :: Int) -> * n (foldr + (I# 0) (enumFromTo (I# 1) (I# 100)))
-- Optimized
sum100n = \ (n :: Int) -> GHC.Base.timesInt n sum100n1
sum100n1 = case $wgo 1 of r { __DEFAULT -> GHC.Types.I# r }
$wgo :: Int# -> Int#
$wgo = \ (w :: Int#) ->
case w of w'
__DEFAULT -> case $wgo (GHC.Prim.+# w' 1) of r
__DEFAULT -> GHC.Prim.+# w' r
100 -> 100
~~~~
* For the optimized case GHC lifts the constant expression out so it's
only computed once and then shared
* The optimized version creates a new function called `$wgo`, which means
'worker'. This version works with unboxed types for efficiency.
# Partial Evaluation -> Core
Haskell
~~~~ {.haskell}
add :: Int -> Int -> Int
add x y = x + y
add2 :: Int -> Int
add2 = add 2
~~~~
Core (unoptimized)
~~~~ {.haskell}
add :: GHC.Types.Int -> GHC.Types.Int -> GHC.Types.Int
add =
\ (x :: GHC.Types.Int) (y :: GHC.Types.Int) ->
GHC.Num.+ @ GHC.Types.Int GHC.Num.$fNumInt x y
x :: GHC.Types.Int
x = GHC.Types.I# 2
add2 :: GHC.Types.Int -> GHC.Types.Int
add2 =
\ (y :: GHC.Types.Int) ->
GHC.Num.+ @ GHC.Types.Int GHC.Num.$fNumInt x y
~~~~
* The (+) function used is the polymorphic `GHC.Num.+` variant
* `GHC.Num.+ @ GHC.Types.Int GHC.Num.$fNumInt` means: select the
(+) method for GHC.Types.Int from the Num type class dictionary,
which is GHC.Num.$fNumInt
# Partial Evaluation -> Core
Haskell
~~~~ {.haskell}
add :: Int -> Int -> Int
add x y = x + y
add2 :: Int -> Int
add2 = add 2
~~~~
Core (optimized)
~~~~ {.haskell}
add :: GHC.Types.Int -> GHC.Types.Int -> GHC.Types.Int
Hs2Core.add = GHC.Base.plusInt
x :: GHC.Types.Int
x = GHC.Types.I# 2
add2 :: GHC.Types.Int -> GHC.Types.Int
add2 = GHC.Base.plusInt x
~~~~
* The type class dictionary method has been inlined.
# (+) -> Core
The function `GHC.Base.plusInt` is implemented as:
~~~~ {.haskell}
+ :: Int -> Int -> Int
+ = \ a b -> case a of _
I# a_ -> case b of _
I# b_ -> I# (GHC.Prim.+# a_ b_)
~~~~
* Notice the evaluation and unboxing of each argument, followed
finally by reboxing.
# Type Classes -> Core
Haskell
~~~~ {.haskell}
class MyEnum a where
toId :: a -> Int
fromId :: Int -> a
instance MyEnum Int where
toId = id
fromId = id
instance (MyEnum a) => MyEnum (Maybe a) where
toId (Nothing) = 0
toId (Just n) = 1 + toId n
fromId 0 = Nothing
fromId n = Just (fromId $ n - 1)
~~~~
Core
~~~~ {.haskell}
toId :: forall a. MyEnum a => a -> GHC.Types.Int
toId =
\ (@ a) (d :: MyEnum a) ->
case d of _ { D:MyEnum f1 _ -> f1 }
fromId :: forall a. MyEnum a => GHC.Types.Int -> a
fromId =
\ (@ a) (d :: MyEnum a) ->
case d of _ { D:MyEnum _ f2 -> f2 }
~~~~
# Type Classes -> Core
Core
~~~~ {.haskell}
$fMyEnumInt :: MyEnum GHC.Types.Int
$fMyEnumInt = D:MyEnum @ GHC.Types.Int (id @ GHC.Types.Int) (id @ GHC.Types.Int)
$fMyEnumMaybe :: forall a. MyEnum a => MyEnum (Maybe a)
$fMyEnumMaybe =
\ (@ a) ($dMyEnum_arR :: MyEnum a) ->
D:MyEnum @ (Maybe a)
($fMyEnumMaybe_$ctoId @ a $dMyEnum_arR)
($fMyEnumMaybe_$cfromId @ a $dMyEnum_arR)
$fMyEnumMaybe_$ctoId :: forall a. Hs2Core.MyEnum a => Hs2Core.Maybe a -> GHC.Types.Int
$fMyEnumMaybe_$ctoId =
\ (@ a) ($dMyEnum_arR :: MyEnum a) (ds :: Maybe a) ->
case ds of _
Nothing -> GHC.Types.I# 0
Just n -> case toId @ a $dMyEnum_arR n of _
GHC.Types.I# y -> GHC.Types.I# (GHC.Prim.+# 1 y)
~~~~
* Type classes are implemented via _dictionaries_
* A dictionary is just a data structure storing the function for each field
* Functions that have type class constraints take an extra dictionary argument
* GHC will optimize away this dictionary passing when it can
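The dictionary translation can be written out by hand. This sketch uses hypothetical names (`MyEnumDict`, `toId'`, `double` are mine, not GHC's), but the shape matches what GHC generates for `MyEnum`:

~~~~ {.haskell}
-- A hand-rolled dictionary for the MyEnum class: a record of methods.
data MyEnumDict a = MyEnumDict
  { toId'   :: a -> Int
  , fromId' :: Int -> a
  }

-- Corresponds to $fMyEnumInt.
dictInt :: MyEnumDict Int
dictInt = MyEnumDict id id

-- A 'MyEnum a =>' constraint becomes an explicit dictionary argument.
double :: MyEnumDict a -> a -> Int
double d x = 2 * toId' d x

main :: IO ()
main = print (double dictInt 21)  -- prints 42
~~~~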
# IO -> Core
* Monads are just type classes, so much of the previous material applies.
* The IO Monad is basically a state-passing monad. It passes around the
'Real World' so that IO actions can transform it.
~~~~ {.haskell}
newtype IO a = IO (State# RealWorld -> (# State# RealWorld, a #))
~~~~
* The 'Real World' is represented in GHC by a special token
* At the base, there are some primitive IO actions.
* The IO Monad builds on top of `State# RealWorld` and the primitive IO
actions.
Haskell
~~~~ {.haskell}
f :: IO ()
f = do
putStrLn "Hello World"
putStrLn "What's up today?"
~~~~
# IO -> Core
Core (Unoptimized)
~~~~ {.haskell}
f :: GHC.Types.IO ()
f =
GHC.Base.>> @ GHC.Types.IO GHC.Base.$fMonadIO @ () @ ()
(System.IO.putStrLn (GHC.Base.unpackCString# "Hello World"))
(System.IO.putStrLn (GHC.Base.unpackCString# "What's up today?"))
~~~~
Core (optimized)
~~~~ {.haskell}
f :: GHC.Prim.State# GHC.Prim.RealWorld -> (# GHC.Prim.State# GHC.Prim.RealWorld, () #)
f =
\ (world :: GHC.Prim.State# GHC.Prim.RealWorld) ->
case hPutStr2 stdout f1 True world of _
(# new_world, _ #) -> hPutStr2 stdout f2 True new_world
f1 :: [GHC.Types.Char]
f1 = GHC.Base.unpackCString# "Hello World"
f2 :: [GHC.Types.Char]
f2 = GHC.Base.unpackCString# "What's up today?"
~~~~
* `unpackCString#` takes a C style string and turns it into a Haskell
String
# Lazy Evaluation -> Core
Haskell
~~~~ {.haskell}
foldl :: (a -> b -> a) -> a -> [b] -> a
foldl' :: (a -> b -> a) -> a -> [b] -> a
force :: a -> b -> b
force = seq
~~~~
Core
~~~~ {.haskell}
foldl = \ (f :: a -> b -> a) (z :: a) (d :: [b]) ->
case d of _
[] -> z;
: x xs -> foldl f (f z x) xs
foldl' = \ (f :: a -> b -> a) (z :: a) (d :: [b]) ->
case d of _
[] -> z;
: x xs ->
case f z x of z'
__DEFAULT -> foldl' f z' xs
force = \ (x :: a) (y :: b) -> case x of _ { __DEFAULT -> y }
~~~~
* Notice the extra `case` statement in foldl' to force evaluation
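That extra `case` is why `foldl'` runs in constant space on long lists, while `foldl` builds a chain of `(+)` thunks first (a runnable comparison; the list length is just an example):

~~~~ {.haskell}
import Data.List (foldl')

main :: IO ()
main = print (foldl' (+) 0 [1 .. 1000000 :: Int])  -- prints 500000500000
~~~~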
# Core Summary
* Look at Core to get an idea of how your code will perform
* Lots of noise in Core, so it's best to clean it up manually (or play
with flags like `-dsuppress-all` to suppress some of the noise)
* Some rules:
* Pattern matching and guards are translated to case statements
* `where` statements become `let` statements
* the language is still lazy, but looking for `let` and `case` gives you a
good idea of evaluation order
* `case` means evaluation (e.g. `seq` is translated to `case`)
* `let` statements are allocations of closures
* a let-bound function application allocates a thunk
* operations involving unboxed types are eager
# Middle of GHC: _Core -> Core_
Hopefully you have a decent idea of how Haskell is reduced to Core
now. Once we have the Core IR we can do a lot of optimization work:
* Inlining, CSE, DCE
* Strictness
* Float In
* Full Laziness
* Specialise
* Spec Constr
* Liberate Case
* Lambda Eta Expansion
* Do Eta Reduction
* Case Merge
* Static Argument Transformation
# Some standard optimisations
* GHC does some stock standard optimisations: Inlining, Common
Subexpression Elimination, Dead Code Elimination
* A large set of simple, local optimisations (e.g. constant folding)
are done in one pass called the _simplifier_. It is run repeatedly
until no further changes can be made (up to a fixed maximum number
of iterations).
* These are only the basic, big-win ones. The other standard stuff
(e.g. strength reduction, loop induction...) is missing.
* We get a lot of this for free though if we use the LLVM backend.
The rest of the optimisations GHC does are fairly specific to a functional
language. Let's look at a few of them.
~~~~
Fun Fact: It is estimated that functional languages gain a 20 - 40%
improvement from inlining vs. imperative languages, which gain 10 - 15%
~~~~
# STG Code
* In the next few slides the code I'll be showing isn't exactly Core
but an IR GHC uses after Core, called STG. (I've cleaned up the STG
though, so it's not the true syntax)
* STG is very similar to Core but has one nice additional property:
* laziness is 'explicit'
* `case` = _evaluation_ and ONLY place evaluation occurs (true in
Core)
* `let` = _allocation_ and ONLY place allocation occurs (not true
in Core)
* So in STG we can explicitly see thunks being allocated for
laziness using `let`
* To view STG use:
~~~~
ghc -ddump-stg A.hs > A.stg
~~~~
# Strictness & Unboxing
* Consider the expression `x + y`, where `x` and `y` have type `Int`.
* In Haskell `x` & `y` must be represented by pointers to
possibly unevaluated objects
* Even if evaluated, they are still represented by "boxed" values on the
heap
* So the addition operation must unbox `x` & `y`, add them, and box
the result
* This can be a huge performance penalty in numeric heavy code if the
implementation is naive
* If we can we want to work with unboxed values as long and as much as
possible
* We can only do this though when we have determined that a value
`x` will always be evaluated (i.e. is 'strict'), to avoid breaking
the lazy semantics of Haskell
# Naive compilation of factorial
Consider this factorial implementation in Haskell:
~~~~ {.haskell}
fac :: Int -> Int -> Int
fac a 0 = a
fac a n = fac (n*a) (n-1)
~~~~
STG
~~~~ {.haskell}
fac = \ a n -> case n of
I# n# -> case n# of
0# -> a
_ -> let one = I# 1;
x = n - one
y = n * a;
in fac y x
~~~~
* We allocate thunks before the recursive call and box arguments
* But `fac` will immediately evaluate the thunks and unbox the values!
* With this strictness knowledge, the boxing and thunk creation are
unnecessary overhead
# GHC with strictness analysis and unboxing
If we compile in GHC with optimisations turned on:
~~~~ {.haskell}
one = I# 0#
-- worker :: Int# -> Int# -> Int#
$wfac = \ a# n# -> case n# of
0# -> a#
n'# -> case (n'# -# 1#) of
m# -> case (n'# *# a#) of
x# -> $wfac x# m#
-- wrapper :: Int -> Int -> Int
fac = \ a n -> case a of
I# a# -> case n of
I# n# -> case ($wfac a# n#) of
r# -> I# r#
~~~~
* Strictness analysis has discovered that `fac` is strict in both
arguments
* So creates a new 'worker' variant of `fac` that uses unboxed types
and no thunks
* Keeps original function `fac` though, referred to as the 'wrapper'
to supply the correct type interface for other code.
* As the worker uses unboxed types and is tail recursive, it will
compile to a tight loop in machine code!
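You don't have to rely on the analysis: with the `BangPatterns` extension the accumulator can be made strict by hand, giving GHC the same worker/wrapper opportunity (a sketch of the `fac` from the previous slides):

~~~~ {.haskell}
{-# LANGUAGE BangPatterns #-}

-- Strict accumulator: the bang forces a before each recursive call,
-- so no thunk chain builds up.
fac :: Int -> Int -> Int
fac !a 0 = a
fac !a n = fac (n * a) (n - 1)

main :: IO ()
main = print (fac 1 10)  -- prints 3628800
~~~~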
# SpecConstr
The idea of the SpecConstr pass is to extend the strictness analysis and
unboxing from before to functions whose arguments aren't strict in
every code path.
Consider this Haskell function:
~~~~ {.haskell}
drop :: Int -> [a] -> [a]
drop n [] = []
drop 0 xs = []
drop n (x:xs) = drop (n-1) xs
~~~~
* Would like to pass `n` unboxed but it isn't strict in the first
pattern
So we get this code in STG:
~~~~ {.haskell}
drop n xs = case xs of
[] -> []
(y:ys) -> case n of
I# n# -> case n# of
0 -> []
_ -> drop (I# (n# -# 1#)) ys
~~~~
* Notice how after the first time this function is called and we start
recursing, we could pass `n` unboxed
# SpecConstr
The SpecConstr pass takes advantage of this to create a specialised
version of `drop` that is only called after we have passed the first
check where we may not want to evaluate `n`.
Basically we aren't specialising the whole function but a particular
branch of it that is heavily used (i.e. recursive)
~~~~ {.haskell}
drop n xs = case xs of
[] -> []
(y:ys) -> case n of
I# n# -> case n# of
0 -> []
_ -> drop' (n# -# 1#) ys
-- works with unboxed n
drop' n# xs = case xs of
[] -> []
(y:ys) -> case n# of
0# -> []
_ -> drop' (n# -# 1#) ys
~~~~
* To stop the code size blowing up, GHC limits the number of
specialised functions it creates, controlled with the
`-fspec-constr-threshold` and `-fspec-constr-count` flags
# Backend: _Core -> Assembly_
The final stage of GHC is compiling Core to an executable. The backend is
in two parts:
* STG -> Cmm: called the code generator in GHC
Cmm is a low-level imperative language used in GHC: basically a very
simple C-like language, just enough to abstract away hardware
registers and calling conventions.
* Cmm exists to provide a common (easy) IR for the final backends to
work with.
* There are three _Cmm -> Object code_ backends in GHC: a C code
generator using GCC, a native assembly code generator (NCG), and an LLVM
code generator.
* C is for portability, the NCG is for compilation speed, and LLVM is for
'performance' and the future
# _STG -> Cmm_
So what has been handled and what is left to handle?
* By the STG stage we have:
* Simplified Haskell to a handful of constructs (variables,
literals, let, lambda, case and application)
* type classes and monads have all been dealt with
* laziness is nearly explicit, through `let` constructs for
allocation and `case` for evaluation
* So we still have to deal with:
* Compiling these constructs efficiently, big focus will be
handling closures and garbage collection (lazy functional
languages involve a lot of allocation of short lived objects)
* Partial application (only remaining implicit allocation)
* Evaluating thunks and handling updates
# The STG Machine
The operational semantics of the STG language is defined by an
abstract machine called 'The STG Machine'.
* The idea of an abstract machine is to give an operational semantics
to a language (STG in this case) that the source language (Haskell
in this case) can be 'easily' mapped to.
* But the abstract machine should also define an efficient way it
itself can be implemented on standard hardware.
* So basically it's a virtual-machine stepping stone. LLVM is a good
modern-day example of this that is fairly widely known.
# _STG Machine -> Cmm_
Let's just look at some of the details of the code generator. The final
backends are all pretty straightforward (think simple C compiler).
The important parts of the code generator are:
* Closure representation
* Heap and Stack layout
* Call convention & partial application
* Graph reduction: thunks, update frames and black holes
* Case statements
* Pointer tagging and evaluation
* RTS and Garbage Collection
# Closure Representation
The STG machine represents function and data values as heap allocated
_closures_. Delayed computations, _thunks_, are also represented by
closure objects.
In GHC all Heap objects have the same layout:
| Closure | Info Table |
| --- | --- |
| ![](heap-object.png) | ![](basic-itbl.png) |