May 01, 2019

Purity in Haskell

Haskell can do anything your mainstream programming language can.
Purity is not about preventing side effects (a database query or an HTTP request). It's about having a clear boundary between code with side effects (impure) and pure code.

What are Pure functions?

Haskell is a purely functional programming language.

Today we're mostly interested in the pure bit. We could get away with saying that functional means you can only work with functions. There are no objects sending messages to one another: a Haskell program is just a big composition of functions put together like Lego blocks.

A lot of people associate purity with lack of side effects. I certainly did when I started learning about Haskell. It was something that confused me for quite some time. If everything is pure, how can I write to a file? How do I send an email?

One of the things you immediately come across is this thing called IO. In the beginning I wasn't clear exactly what it represented, but I understood that if you wanted to do useful stuff you had to do things in IO.

Let's look at an example. A function that takes a ShoppingCart and returns an Int (a number) representing the total number of items in the cart would have this type signature:

numberOfItems :: ShoppingCart -> Int

I don't want to assume you know too much about Haskell, so let's write this function in JavaScript. I have a cart which is just a list of items, each with some quantity.

const numberOfItems = cart => {
  // The cart is a plain list of items, so we reduce over it directly.
  return cart.reduce(
    (acc, item) => acc + item.quantity, 0
  )
}

const sampleCart = [
  { item: "A book",  quantity: 1 },
  { item: "A chair", quantity: 3 },
]

console.log(numberOfItems(sampleCart)) // 4

Simple, right? Now, this function doesn't have any side effects. This is what we call a pure function.

There are many definitions of purity out there, but the one I like to use to introduce the concept is:

A function is pure when, given the same input, it always returns the same output.

It is as simple as that! No matter how many times we call numberOfItems, if we give it the same ShoppingCart we'll always get the same quantity back. This function doesn't read from a file or generate a random number, so its output is determined only by the input you feed it.
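For contrast, here is a small sketch of a function that breaks this rule. The name numberOfItemsWithGifts and the freeGifts variable are made up for illustration. The cart is treated as a plain list of items, like sampleCart.

```javascript
const sampleCart = [
  { item: "A book",  quantity: 1 },
  { item: "A chair", quantity: 3 },
]

// Pure: the result depends only on the argument.
const numberOfItems = cart =>
  cart.reduce((acc, item) => acc + item.quantity, 0)

// Impure (hypothetical): the result also depends on `freeGifts`,
// a variable that lives outside the function and is not part of its input.
let freeGifts = 0
const numberOfItemsWithGifts = cart => numberOfItems(cart) + freeGifts

console.log(numberOfItemsWithGifts(sampleCart)) // 4
freeGifts = 1
console.log(numberOfItemsWithGifts(sampleCart)) // 5
```

Same input, two different outputs. You can't tell what the second function returns without also knowing what happened to freeGifts somewhere else in the program.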

Pure functions are highly desirable precisely because they are predictable. In other words they are:

Easy to reason about
Easy to test

A program made of only pure functions is useless

I know what you're thinking. This is all well and good but in the real world I need to do messy stuff! I need to connect to a database to fetch the items in the shopping cart and my function wouldn't be pure anymore. So what's the point?

The point is creating a clear boundary between pure and impure functions.

Let's expand on our simple example and pretend that a cart is stored in some Postgres table.

# Carts table
-------------------------------
cart_id | product_id | quantity
-------------------------------
ab341   | A book     | 1
ab341   | A chair    | 3

Our numberOfItems function can no longer be pure. We need to write some SQL query and perform a side effect by executing it. We lost purity because the output of our function is no longer determined by its input. There's now a database involved which could be empty or have hundreds of carts.

The state of the database is in no way provided as input so the output is no longer deterministic! Theoretically, if we could pass the entire content of the database as an input then this function would still be pure. Obviously that's not practical, but it's a nice thought experiment nonetheless. :)

const numberOfItems = cartId => {
  return db.query(
      'select * from Carts where cart_id = ?',
      cartId
    )
    .then(items => {
      return items.reduce(
        (acc, item) => acc + item.quantity, 0
      )
    })
}

numberOfItems('ab341')
.then(count => console.log(count)) // 4

The corresponding function in Haskell would have to change its type signature. We finally get to IO — the massive hammer we have available to do anything and everything.

-- We can no longer simply return `Int`.
-- The output needs to be wrapped in `IO`.
numberOfItems :: ShoppingCartId -> IO Int

Functions are nice and pure by default. When there are side effects, the function needs to be in IO, otherwise the compiler will refuse to compile your program. Reading from a file? You'll get an IO String. Generating a random number? You'll get an IO Int.

You can think of IO as a way of marking a function as impure. You know that a function is impure (it might perform some side effects) when its output is wrapped in IO.

Impure functions are not desirable.

We said pure functions are easy to reason about and to test. Impure functions are the exact opposite!

1. Impure code is harder to reason about

By making numberOfItems impure, we lost the ability to easily reason about it. We can no longer determine the result just by looking at the input, because the result depends on external state, i.e. the content of the Carts table in the database.

2. Impure code is harder to test

Now that a database is involved, how are we going to test that function? The first version (the pure one) is extremely easy to unit test: we can check our implementation just by verifying that a certain input produces a certain output.

When we have side effects, we lose determinism. That means we'll need some disgusting way of mocking the database, or we'll have to populate a test database with the data we need. Not great.
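Here is a rough sketch of what that mocking could look like in JavaScript. To make it possible at all, I'm assuming db is passed in as a parameter instead of being a global, and fakeDb is a hand-rolled stand-in invented for this example, not a real library.

```javascript
// The impure version, except that `db` is now a parameter
// (an assumption made here so a test can swap it out).
const numberOfItems = (db, cartId) =>
  db.query('select * from Carts where cart_id = ?', cartId)
    .then(items => items.reduce((acc, item) => acc + item.quantity, 0))

// Hypothetical stand-in for the database client. It ignores the SQL
// and returns canned rows for one known cart id.
const fakeDb = {
  query: (_sql, cartId) => Promise.resolve(
    cartId === 'ab341'
      ? [
          { cart_id: 'ab341', product_id: 'A book',  quantity: 1 },
          { cart_id: 'ab341', product_id: 'A chair', quantity: 3 },
        ]
      : []
  ),
}

numberOfItems(fakeDb, 'ab341')
  .then(count => console.log(count)) // 4
```

That's a lot of scaffolding just to check a sum, and the fake never runs the actual query, so the test says nothing about whether the SQL is right.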

A clear boundary between pure and impure code

All hope is not lost! We can refactor our code so that we keep the nice pure numberOfItems implementation and compose it with an impure function that just pulls data out of the database.

The type signatures might look like:

-- Pure
numberOfItems :: ShoppingCart -> Int

-- Impure
fetchShoppingCart :: ShoppingCartId -> IO ShoppingCart

-- And now for the composition!
numberOfItemsByCartId :: ShoppingCartId -> IO Int

Now, if I look at numberOfItems I can be 100% sure that it's not going to have any nasty side effects.

I accept that fetchShoppingCart will do something bad (made explicit by its output wrapped in IO) but I don't really care about testing it because I factored all of the logic out. Finally, we define numberOfItemsByCartId by composing the two other functions together.

Remember, when you compose pure and impure functions, the result is always going to be impure. IO is infectious, it spreads through your program like a virus. We want to push side effects to the edge of our program so that our core can remain pure. This is the Functional core, Imperative shell pattern.
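The same split can be sketched in JavaScript, using the same three names as the type signatures. This is only an illustration: db is a hypothetical client passed in as a parameter, assumed to have a query method that returns a Promise of rows.

```javascript
// Pure: all of the logic lives here.
const numberOfItems = cart =>
  cart.reduce((acc, item) => acc + item.quantity, 0)

// Impure: only moves data out of the database, no logic.
// `db` is a hypothetical client whose `query` returns a Promise of rows.
const fetchShoppingCart = (db, cartId) =>
  db.query('select * from Carts where cart_id = ?', cartId)

// Composition: impure as well, because it depends on an impure function.
const numberOfItemsByCartId = (db, cartId) =>
  fetchShoppingCart(db, cartId).then(numberOfItems)
```

The Promise plays a loosely similar role to IO here: once fetchShoppingCart returns one, everything built on top of it returns one too. The difference is that JavaScript won't complain if you sneak a side effect into numberOfItems.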

Conclusion

Write your business logic as pure functions.

Your code will be easier to reason about and to test. Keep impure code as dumb as possible. Impure functions should only be transferring data in and out of pure functions!

Pure functions are great and you need to learn how to use them to your advantage. We could have done the same refactoring in our JavaScript code, and that would have been a massive improvement. But there, nothing prevents you from mixing pure and impure code: you have to be vigilant and disciplined about separating them yourself. That's what makes Haskell amazing: the compiler is there to tell you when you're doing something wrong.

This has been eye-opening to me. In Haskell you must be explicit about side effects. A program without side effects would be useless, but the point is not preventing side effects.

The point is having a clear separation between pure and impure code. Pure functions help us write robust and correct software because they're easy to reason about and test.

I'm writing a series for people who want to learn Haskell without the bullshit.
There are exercises and videos as well!
Check out Zero Bullshit Haskell on GitHub.

Thanks to Giulio @giuliocanti and Tom @am_i_tom for the amazing feedback.

You can follow me on Twitter @_alpacaaa.