Bad actors on the internet

No, not that kind.

Bad actors on the internet

Fighting spam is an arms race.

Bad actors on the internet

Fighting spam is an arms race.

Bad actors on the internet

The attack landscape is constantly evolving:

…and many more

Why does spam appear to be “solved”?

Fighting spam at Facebook

Sigma is a rule execution engine

Every interaction on FB has associated rules

Sigma evaluates each interaction to identify and block malicious acts

An example policy

“A user who is less than a week old posts a photo tagging >= 5 non-friends”


“A user who is less than a week old posts a photo tagging >= 5 non-friends”

The old Sigma rule language, FXL:

If (AgeInHours(Account) < 168 &&
                      Friends(Account))) >= 5)
Then [BlockAction, LogRequest]
Else []

History: FXL


Minuses became an increasing drag:

Purely functional and strongly typed

Automatically batch and overlap data fetches

Policies fetch data from many other systems

Concurrency must be implicit

Fast turnaround

Push code to production in minutes

Support for interactive development

Our desired endpoint

Our ideal for expressiveness:

length (intersect (friendsOf x) (friendsOf y))

Some requirements

We have many varied data sources

Must minimise network roundtrips

Need to abstract all this away

A terrible hack

{-# LANGUAGE NoImplicitPrelude #-}
module HackyPrelude where

length :: IO [a] -> IO Int
intersect :: Eq a => IO [a] -> IO [a] -> IO [a]
friendsOf :: UserID -> IO [UserID]

This approach “works”, but it’s awful.

What’s wrong?

Everything gets lifted into IO:

length :: IO [a] -> IO Int

No more pure, safe code :-(

No concurrency or batching when we execute two friendsOf actions:

length (intersect (friendsOf x) (friendsOf y))

Explicit concurrency

What if we use MVar to express this?

length (intersect (friendsOf x) (friendsOf y))

Can’t see what’s happening for all the MVar machinery!

  m1 <- newEmptyMVar
  m2 <- newEmptyMVar
  forkIO (friendsOf x >>= putMVar m1)
  forkIO (friendsOf y >>= putMVar m2)
  fx <- takeMVar m1
  fy <- takeMVar m2
  return (length (intersect fx fy))

The central problem

We need something where:

What do we want?

  1. Inspect a rule to find all data fetches, without executing them

  2. Re-organize fetches to use batching, concurrency, and caching

  3. Execute fetches, wait for results

  4. Resume execution once results come in,
    inspecting and re-organizing the next round of fetches

Static analysis

We want to be able to inspect the structure of code without executing it

Monads and static analysis

Let’s think briefly of a monad m as the “structure” of a computation.

Look at the flipped version of bind:

(=<<) :: Monad m => (a -> m b) -> m a -> m b

It transforms an m a into an m b.

To do so, it takes its instructions from a -> m b.

That function can use a to decide what m b it returns.

Thus we can’t analyse this statically (at least not easily):

Functors, applicatives, and static analysis

Compare this:

(=<<) :: Monad m       => (a -> m b) -> m a -> m b

With these rather similar type signatures:

(<$>) :: Functor f     =>   (a -> b) -> f a -> f b
(<*>) :: Applicative f => f (a -> b) -> f a -> f b

What’s the crucial difference?

Neither Functor nor Applicative can affect the “structure” f that gets returned.

They can only change b inside f.

Therefore they’re friendly to static analysis!

Enter Haxl

newtype Haxl a = Haxl { unHaxl :: IO (Result a) }

data Result a = Done a
              | Blocked (Haxl a)

instance Monad Haxl where
  return a = Haxl (return (Done a))

  m >>= k = Haxl $ do
    a <- unHaxl m
    case a of
      Done a    -> unHaxl (k a)
      Blocked r -> return (Blocked (r >>= k))

What’s going on?

data Result a = Done a
              | Blocked (Haxl a)

A computation has either

What is the Haxl monad?

Alternate names for this construction:



countCommonFriends :: UserID -> UserID -> Haxl Int

countCommonFriends x y = do
  fx <- friendsOf x
  fy <- friendsOf y
  return (length (intersect fx fy))

We run out of AST to explore as soon as we hit the first friendsOf, thanks to use of >>= (via do desugaring).

Can’t see the second one, so no concurrent execution.

Applicatives to the rescue!

Can we rewrite our function to make more of the fetches statically visible?


countCommonFriends x y =
  length <$> (intersect <$> friendsOf x <*> friendsOf y)

A little bit of sequencing

instance Applicative Haxl where
  pure = return
  Haxl f <*> Haxl a = Haxl $ do
    r <- f
    case r of
    Done f' -> do
      ra <- a
      case ra of
        Done a'    -> pure (Done    (f' a'))
        Blocked a' -> pure (Blocked (f' <$> a'))
    Blocked f' -> do
      ra <- a
      case ra of
        Done a'    -> pure (Blocked (f' <*> pure a'))
        Blocked a' -> pure (Blocked (f' <*> a'))

Simplifying coding

Two complementary approaches:

Fancy language support

{-# LANGUAGE ApplicativeDo #-}

Facebook-developed language extension

When ApplicativeDo is turned on:


  x <- a
  y <- b
  return (f x y)

Translates to:

f <$> a <*> b

Why does caching matter?

Consider our example fragment of code:

length (intersect (friendsOf x) (friendsOf y))

What if this was executed as two fetches?

Data sources

Several “core data” systems at Facebook

Other data sources also involved in Sigma


A data source interacts with Haxl core in 3 ways


-- Core abstraction
class DataSource req where
  {- ... -}

-- An example data source
data ExampleReq a where
  CountAardvarks :: String -> ExampleReq Int
  ListWombats :: Id -> ExampleReq [Id]
  deriving Typeable

-- Fetch data generically
dataFetch :: DataSource req => req a -> Haxl a

Refinement: blocked computations

dataFetch :: DataSource req => req a -> Haxl a

Haxl core has to manage requests submitted with dataFetch

Once an entire round of fetching has stopped making progress, we retrieve the pending requests

class DataSource req where
  fetch :: [BlockedFetch req] -> IO ()

data BlockedFetch req =
  forall a . BlockedFetch (req a) (MVar a)

Refinement: acquiring pending requests

Remember: Haxl core is agnostic to data sources

We use dynamic typing (Typeable) to manage them

-- hack to support parameterised types
class Eq1 req where
  eq1 :: req a -> req a -> Bool

class (Typeable1 req, Hashable1 req, Eq1 req) =>
      DataSource req where
  data DataState req

  fetch :: DataState req
      -> [BlockedFetch req]
      -> IO ()

DataState is an associated type

Is that it?

A lot of other demanding and intricate work went into making Haxl run effectively at scale.

As of June 2015, Haxl-powered Sigma was handling over a million RPS.

For many more juicy details, see the article on code.facebook.com.

Code is open sourced on Hackage as haxl.

Haxl inspired Twitter’s Stitch project.

Dynamic types at scale

Facebook is famous for having been built in PHP.

As our code base grew to millions of lines in size:

We observed the same growing pains in Javascript.

Gradual typing

In between static and dynamic type systems lies the interesting middle ground of gradual types.

Languages that start all-dynamic can acquire gradual typing

…as can languages that start all-static!

Gradual typing of PHP

At Facebook, we developed Hack: hacklang.org


Well over 90% of Facebook PHP code is now statically typed via Hack.

Some code is still too tricky to statically type.

Type system ergonomics

PHP programmers rightly value rapid feedback.

With Hack, this short cycle time had to be preserved.

Implications are huge:

Implementing Hack

The Hack type checker is built in OCaml.

OCaml had, until recently, zero support for multiple CPUs.

But multiple CPUs can speed up type checking, and we care greatly about speed …

Gradually typed Javascript: Flow

Following the success of Hack at Facebook, we started work on the same problem space of types in Javascript.

The result is the Flow type checker: flowtype.org

Similar ergonomics:

Philosophical differences



Flow vs TypeScript: pragmatics

Somewhat similar ideas, different tradeoffs

Pros for TS:

Pros for Flow: