Faster persistent data structures through hashing

Johan Tibell

November 30, 2011

About me

This lecture

Motivating problem: Twitter data analysis

"I'm computing a communication graph from Twitter data and then scan it daily to allocate social capital to nodes behaving in a good karmic manner. The graph is culled from 100 million tweets and has about 3 million nodes."

We need a data structure that is

Persistent maps in Haskell

Real world performance of Map

Hash tables

Main idea: IntMap as a sparse array

class Hashable a where
hash :: a -> Int

Aside: collisions are easy to deal with

HashMap implemented using an IntMap

Naive implementation:

newtype HashMap k v = HashMap (IntMap [(k, v)])

We use a list of key-value pairs to handle collisions.

Reasoning about space usage

Knowing how GHC represents values in memory is useful because

Memory usage for data constructors

Rule of thumb: a constructor uses one word for a header, and one word for each field. So e.g.

data Uno = Uno a
data Due = Due a b

an Uno takes 2 words, and a Due takes 3.

Memory layout

Here's how GHC represents the list [1,2] in memory:

Refresher: unboxed types

GHC defines a number of unboxed types. These typically represent primitive machine types.

data Int = I# Int#

Poll

How many machine words is needed to store a value of this data type:

data IntPair = IP Int Int

Tip: Draw a boxes-and-arrows diagram.

IntPair memory layout

So an IntPair value takes 7 words.

Refresher: unpacking

GHC gives us some control over data representation via the UNPACK pragma.

The pragma is added just before the bang pattern:

data Foo = Foo {-# UNPACK #-} !SomeType

GHC 7 and later will warn if an UNPACK pragma cannot be used because it fails the use constraint.

Unpacking example

data IntPair = IP !Int !Int

data IntPair = IP {-# UNPACK #-} !Int
{-# UNPACK #-} !Int

Benefits of unpacking

When the pragma applies, it offers the following benefits:

Caveat: There are (rare) cases where unpacking hurts performance e.g. if the value is passed to a non-strict function, as it needs to be reboxed.

Unpacking is one of the most important optimizations available to us.

A structural comparison with C

By reference:

-- Haskell
data A = A !Int
// C
struct A {
int *a;
};

By value:

-- Haskell
data A = A {-# UNPACK #-} !Int
// C
struct A {
int a;
};

If you can figure out which C representation you want, you can figure out which Haskell representation you want.

Exercise: HashMap memory layout

Here are the data types used in our naive HashMap implementation:

newtype HashMap k v = HashMap (IntMap [(k, v)])

data IntMap a
= Bin {-# UNPACK #-} !SuffixMask
!(IntMap a)
!(IntMap a)
| Tip {-# UNPACK #-} !Key a
| Nil

type SuffixMask = Int
type Key = Int

Exercise:

Solution

30 words! 22 (73%) of them overhead.

Can we do better?

Yes! We can make use of the following:

data List k v = Nil | Cons k v (List k v)

is more memory efficient than [(k, v)], as the pair constructor has been unpacked into the Cons constructor.

An improved HashMap data type

data HashMap k v
= Bin {-# UNPACK #-} !SuffixMask
!(HashMap k v)
!(HashMap k v)
| Tip {-# UNPACK #-} !Hash
{-# UNPACK #-} !(FullList k v) -- now monomorphic
| Nil

type SuffixMask = Int
type Hash = Int

data FullList k v = FL k v !(List k v)
data List k v = Nil | Cons k v !(List k v)

Improved HashMap data type memory layout

22 words. 14 (64%) of them overhead.

In general: 5N + 4(N − 1) words + size of keys & values

Remaining sources of inefficiency

Reasoning about laziness

A function application is only evaluated if its result is needed, therefore:

These two properties allow us to use "back-to-front" analysis (known as demand/strictness analysis) to figure which arguments a function is strict in.

Reasoning about laziness: example

max :: Int -> Int -> Int
max x y
| x > y = x
| x < y = y
| otherwise = x -- arbitrary

Poll

data Tree = Leaf | Node Int Tree Tree

insert :: Int -> Tree -> Tree
insert x Leaf = Node x Leaf Leaf
insert x (Node y l r)
| x < y = Node y (insert x l) r
| x > y = Node y l (insert x r)
| otherwise = Node x l r

Which argument(s) is insert strict in?

Solution

Only the second, as inserting into an empty tree can be done without comparing the value being inserted. For example, this expression

insert (1 `div` 0) Leaf

does not raise a division-by-zero expression but

insert (1 `div` 0) (Node 2 Leaf Leaf)

does.

Strictness annotations in the real world

delete :: (Eq k, Hashable k) => k -> HashMap k v -> HashMap k v
delete k0 = go h0 k0
where
h0 = hash k0
go !h !k t@(Bin sm l r)
| nomatch h sm = t
| zero h sm = bin sm (go h k l) r
| otherwise = bin sm l (go h k r)
go h k t@(Tip h' l)
| h == h' = case FL.delete k l of
Nothing -> Nil
Just l' -> Tip h' l'
| otherwise = t
go _ _ Nil = Nil
{-# INLINABLE delete #-}

Benchmark

So, is HashMap faster than Map? Benchmark: 212 random ByteString keys of length 8

benchmarking Map/lookup/ByteString
mean: 1.590200 ms
std dev: 30.69466 us

benchmarking HashMap/lookup/ByteString
mean: 575.9371 us
std dev: 8.790398 us

benchmarking Map/insert/ByteString
mean: 2.957678 ms
std dev: 451.8105 us

benchmarking HashMap/insert/ByteString
mean: 1.506817 ms
std dev: 301.2400 us

Yes!

Memory usage

Benchmark: 220 key-value pairs of type Int, on a 64-bit machine

Estimated: 8 * (5N + 4(N − 1) + 4N) = 104 MB

Real:

   716,158,856 bytes allocated in the heap
 1,218,205,432 bytes copied during GC
   106,570,936 bytes maximum residency (16 sample(s))
     3,636,304 bytes maximum slop
           269 MB total memory in use (0 MB lost due to fragmentation)

Maximum residency is the number we care about (note that the value is sampled and thus not 100% accurate).

Summary

Bonus: memory footprint of some common data types

Write this down on an index card and keep around for back-of-the-envelope calculations.

Data type Memory footprint
Data.ByteString 9 words + N bytes
Data.Text 6 words + 2N bytes
String 5N words
Data.Map 6N words + size of keys & values
Data.Set 5N words + size of elements
Data.IntMap 3N + 5(N-1) words + size of values
Data.IntSet 2N + 5(N-1) words
Data.HashMap 5N + 4(N-1) words + size of keys & values
Data.HashSet 5N + 4(N-1) words + size of elements

(Some caveats apply.)

Bonus: Use libraries tuned for performance

Non-exhaustive list: