Haskell: A Great Procedural Language

There are many catchphrases about Haskell.

  • “Avoid success at all costs.”
  • “Haskell is useless.”
  • “Haskell is a great procedural language.”

These sound like dismissals or absurdities from the outside, but once you learn what they really mean, they take on a new light. In this article, I want to explain the third. (See the appendix if you are curious about the first two.)

This article really, really tried to become a monad i/o tutorial, but I think I stopped it in time. (By that I mean I had to rewrite it twice and delete large chunks of monad i/o tutorial material.) Here, we are going to jump right in and focus on the interesting stuff.

Side effects as first class values

Effectful computations in Haskell are first class values. This means we can store them in variables or data structures for later use. There is a Haskell function

randomRIO :: (Int, Int) -> IO Int

which, when given two integers as arguments, picks a random integer between them. We can put calls to this function into a list, like so:

some_dice = [ randomRIO(1, 6), randomRIO(1, 6) ]

This is a list of two calls to randomRIO. What surprises non-Haskellers is that when this list is created, no random numbers are generated. Coming from other programming languages, we are used to side effects (such as random generation) being executed directly when the side effectful function is called. You may think Haskell is different here due to laziness, but that’s also not true. Even if we put these calls into a strict data structure, no randomness would happen.

We can add more random generation to the list:

more_dice = some_dice <> [ randomRIO(1, 6) ]

and still no random numbers will be generated. We can go ahead and manipulate this list in all sorts of ways, and still no random numbers will be generated.
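
As a small illustration (rearranged_dice is a made-up name for this sketch), we can slice and reorder the list with ordinary list functions, and nothing random will happen:

rearranged_dice :: [IO Int]
rearranged_dice = reverse (more_dice <> take 1 some_dice)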

To be clear, the randomRIO function could well be called (whether this actually happens is a question of lazy evaluation, optimisation, etc.), and when it is called it returns a value of type IO Int. It’s just that this value is not an integer. If anything, we can think of it as a set of instructions for eventually, somehow, getting an integer. It’s an object encapsulating a side effect. When this side effect object executes, it will produce a random integer, but the object itself just describes the computation; it is not an integer.

In other words, in Haskell, it is not enough to call a side effectful function to execute its side effects. When we call the side effectful function, it produces an object encapsulating the side effect, and this object can be executed in the future to produce the result of the side effect. Readers familiar with JavaScript promises will recognise this concept. Indeed, promises are modeled after side effects in Haskell.

The common way we teach beginners to execute side effect objects is by calling them from a do block, using the special <- assignment operator to extract their result. As a first approximation, we can think of the following code as the way to force side effects to execute.

dice_print = do
  side <- randomRIO(1, 6)
  printf "It landed on %d\n" side

We can imagine that the <- arrow executes the side effect object returned by randomRIO and captures the value it produces. Similarly, the side effect object returned by printf gets executed, but we don’t capture the result; we don’t care about the value produced by it, we only care about the side effect itself.

The lie-to-children here is that we pretend the do block is magical and that when it executes, it also executes side effects of functions called in it. This mental model will take the beginner a long way, but at some point, one will want to break free of it. That is when Haskell starts to really shine as a procedural language.

This article features another lie-to-children: it will have type signatures specialised to IO a and [a]. All the functions I mention are more generic than I’m letting on.

  • Anywhere this article says IO a it will work with any type of side effect (like Maybe a, Rand g a, StateT s m a, etc.)
  • Anywhere this article says [a] it probably also works with other collection/container types (like Maybe a, Array i a, HashMap k a, Tree a, etc.)

The reason this article uses more direct type signatures is to hopefully be readable also to someone who does not use Haskell. If the reader has never seen ml-style syntax before, I think the most important thing to know is that function calls aren’t written like isInfixOf("llo", "hello, world\n") but rather with spaces, as in isInfixOf "llo" "hello, world\n".

De-mystifying do blocks

In order to drive the point home, we’ll start by seeing what the do block actually does, because it’s not magic at all. In fact, every do block can be converted to just two operators. If you already know this, skip ahead to the next section.

then

If we want to combine two side effects into one, we can use the *> operator, which is pronounced then or sequence right. It takes two side effect objects and creates a new side effect object that executes both when it itself is executed. The value produced by this new composite object is the value produced by the second of the two objects it’s constructed from. In that sense, the *> operator is a lot like the comma operator in C: it chains together statements and returns the result of the last.

(*>) :: IO a -> IO b -> IO b

For example, here we combine two print statements into a single side effectful function.

greeting :: IO ()
greeting =
  putStr "hello, " *> putStrLn "world"

This is a single side effect object called greeting, but its execution will involve the execution of two separate print statements.

We teach beginners to write this as

greeting = do
  putStr "hello, "
  putStrLn "world"

which is the exact same thing, although arguably easier to read. The interesting thing is the implication for how we look at do blocks. It turns out they may not be magical at all; maybe they are just inserting implicit commas, i.e. a pretty way of taking multiple side effect objects and combining them into a single, bigger side effect object.

bind

The one thing we cannot do with *> is take the result from the left-hand side effect and use it to influence the right-hand side function call, because the *> operator discards the first result before executing the second effect – just like the comma operator in C. The die-throwing code we saw previously,

dice_print = do
  side <- randomRIO(1, 6)
  printf "It landed on %d\n" side

cannot be implemented with just *>. We need the additional operator >>= which takes a side effect object and plugs the value it produces into another side effectful function. This operator is widely known as bind.

(>>=) :: IO a -> (a -> IO b) -> IO b

Using this operator, we could write the above do block as

print_side :: Int -> IO ()
print_side side =
  printf "It landed on %d\n" side

dice_print :: IO ()
dice_print =
  randomRIO(1, 6) >>= print_side

and this would take the result of executing the side effect of randomRIO and plug it into another side effectful function, namely print_side.

Two operators are all of do blocks

This illustrates that do blocks are built from only two operators. If we stick with do blocks for all side effects, we will never learn why Haskell is the greatest procedural programming language in the world, because we are limiting ourselves to just two operators for dealing with side effects.

Let’s lift our gaze and see what happens when we look beyond those. There are more functions for dealing with side effects.
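
To make this concrete, here is the dice-printing do block from earlier next to a hand-desugared version (dice_print' is a name made up for this sketch), built from nothing but >>= and a lambda:

dice_print :: IO ()
dice_print = do
  side <- randomRIO(1, 6)
  printf "It landed on %d\n" side

dice_print' :: IO ()
dice_print' =
  randomRIO(1, 6) >>= \side ->
    printf "It landed on %d\n" side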

Functions that operate on side effects

We’ll start with the basics and work our way up.

pure

If we want to construct a new side effect object that always produces a specific value, we can use the function pure.

pure :: a -> IO a

For example, this creates a side effect object that always produces the integer 4.

loaded_die :: IO Int
loaded_die =
  -- Chosen by fair dice roll.
  -- Guaranteed to be random.
  pure 4

Creating side effect objects that always produce a known value might not seem very useful, but it comes up all the time when bridging the worlds of pure code and side effects.
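
Here is a minimal sketch of that bridging (username_or_default is a made-up example; lookupEnv and getEnv live in System.Environment). When one branch already has a pure value at hand, pure wraps it into a side effect object so that both branches have the same type:

username_or_default :: IO String
username_or_default =
  lookupEnv "USER" >>= \found ->
    case found of
      Just user -> pure user        -- wrap the pure value back into IO
      Nothing   -> getEnv "LOGNAME" -- fall back to another side effect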

fmap

One of the most used functions when working with side effects in Haskell is fmap.

fmap :: (a -> b) -> IO a -> IO b

This takes a pure function and a side effect object, and returns a new side effect object similar to the one it got, except the value it produces will have been transformed by the function. Transforming the results of side effects is so common that fmap has an operator alias: <$>. For example, to get the length of the path to the user’s home directory, we can do

path_length :: IO Int
path_length = length <$> getEnv "HOME"
-- equivalent to
-- path_length = fmap length (getEnv "HOME")

This creates a new side effect object which will produce the result of applying the length function to the result of the side effect of getEnv.

liftA2, liftA3, …

Where fmap allows us to transform the value produced by a single side effect, sometimes we need to create a side effect object that produces something based on multiple other side effects. This is where liftA2 and friends come in.

liftA2 :: (a -> b -> c)      -> IO a -> IO b -> IO c
liftA3 :: (a -> b -> c -> d) -> IO a -> IO b -> IO c -> IO d

These can be thought of as fmap generalised to several side effects: they combine the results of multiple side effects with one function, and create a new side effect object that produces that function’s return value. This is also common enough that it can be written with two operators: the <$> we already saw for the first argument, and then <*> for the rest of the arguments. To check if a user’s home directory contains the user name, we do

home_username :: IO Bool
home_username = isInfixOf <$> getEnv "LOGNAME" <*> getEnv "HOME"
-- equivalent to
-- home_username = liftA2 isInfixOf (getEnv "LOGNAME") (getEnv "HOME")

This has created a new side effect object that will produce True if $LOGNAME is a part of $HOME, and False otherwise.

Intermission: what’s the point?

I am raving ecstatic about this. Combining the results of side effects through liftA2 and friends is such a fundamental technique that the favicon of this website is a stylised version of the <*> operator.

But the reader may understandably be a little underwhelmed. It seems like we have only learned to do in Haskell what we can do in Python and every other programming language already. All popular programming languages let us use the results of side effects as arguments to other functions. There are already two reasons we should care about Haskell, though, even before we see what’s to come. These are refactorability and discipline.


The biggest benefit is probably refactorability. We might have the following code that throws two dice and sums them up:

sum_dice :: IO Int
sum_dice =
  liftA2 (+) (randomRIO(1,6)) (randomRIO(1,6))

and we think the repetition is annoying, so we put the actual die-tossing code into a variable.

sum_dice :: IO Int
sum_dice =
  let
    toss_die = randomRIO(1,6)
  in
    liftA2 (+) toss_die toss_die

If we did this in Python, we would accidentally store the result of the die toss in the toss_die variable! If the die lands on 3, we will compute 3+3, rather than the intended sum of two different tosses. With the way side effects are first class values in Haskell, we are always free to blindly (!) extract code into a variable name, and this will never change how the code runs. This is called equational reasoning, and it is incredibly powerful: it is one of the things that make Haskell such a strong enterprise language, and so nice to write procedural code in.

A second benefit is that by being structured in how we allow side effects to affect subsequent computation, we have a lower risk of accidentally introducing side effects where they were not intended. We also have greater control over what can affect what, reducing the risk of interaction bugs.

The reader might, for example, argue that the previous refactoring example works just fine in Python, by using an anonymous function as a faux side effect object:

import random

def sum_dice():
    toss_die = lambda: random.randint(1, 6)  # randint is inclusive on both ends
    return toss_die() + toss_die()

but this (a) requires being careful in refactoring, (b) does not prevent accidentally triggering the side effect where it was not intended, and (c) requires a language feature (sequence points) to disambiguate in which order side effects get executed. With Haskell, we just don’t have to care. We can rest easy in knowing that the compiler and libraries have our backs.

sequenceA

We are starting to see the benefits of side effects as first class values, so let’s shift to a higher gear. We have seen that Haskell allows us to store side effect objects in variables without accidentally executing their effects. The next step is storing these side effect objects in data structures.

We could, for example, create a list of three different ways to get the username of the currently logged in user.

getting_usernames :: [IO (Maybe String)]
getting_usernames = [lookupEnv "USER", lookupEnv "LOGNAME", lookupEnv "SUDO_USER"]

This list cannot be executed for its side effects directly, because the list itself is not a side effect – it’s a list of side effects. There are library functions to deal with this, though. One is

sequenceA :: [IO a] -> IO [a]

This is what we need in this case: it takes a list of side effect objects and creates a new side effect object that executes all the side effects of the list, and then produces a list of all the values produced by those side effects. To make a side effect that produces a list of candidate usernames, we define

actual_usernames :: IO [Maybe String]
actual_usernames = sequenceA getting_usernames

If we execute this side effect and print the result (either by connecting it up with print using the >>= operator, or in a do block), then on my system we get the result

[Just "kqr", Just "kqr", Nothing]
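
For the record, the >>= wiring mentioned above can be as short as this (print_usernames is a made-up name):

print_usernames :: IO ()
print_usernames = actual_usernames >>= print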

Sometimes we have a list of side effects but we don’t care about the value they produce. This might be the case if we have collected a bunch of log statements from a pure function. We want to execute their side effects (the actual logging action) but we don’t care about what the log function itself returns (it is usually a void or unit-type value.)

log_statements :: [IO ()]
log_statements = [
    log Info "Creating user",
    log Warn "User already found",
    log Info "Updating user"
]

As a reminder: these function calls to the log function do not cause anything to be logged. The log function returns a side effect object that, when executed, makes the logging happen. The log_statements variable contains a list of such side effect objects – it is not itself a side effect object.

To execute these, we can again combine the side effects in the list into one side effect object with sequenceA. When we do so, we get a side effect object that produces the value [(), (), ()]. To get the code to type check, we may have to discard this value. We already know how to do this, because discarding values is what the *> operator does.

execute_logged :: IO ()
execute_logged =
  sequenceA log_statements *> pure ()

When the side effect of execute_logged runs, it will run the side effects of the log statements and then discard the dummy values produced in the process.

Remember that loaded die from before? Now we can check that it indeed always returns the same number. First we create a list by repeating the same side effect object an infinite number of times.

many_loaded_dice :: [IO Int]
many_loaded_dice =
  repeat loaded_die

Then we construct a side effect object that executes the first few of these and keeps the values they produced.

some_thrown_dice :: IO [Int]
some_thrown_dice =
  sequenceA (take 20 many_loaded_dice)

If we connect this up with a print (again, do block or the >>= operator) and execute it, we get

[4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4]

We could do the same thing with a better die:

many_good_dice :: [IO Int]
many_good_dice =
  repeat (randomRIO (1,6))

some_thrown_dice :: IO [Int]
some_thrown_dice =
  sequenceA (take 20 many_good_dice)

If we print this once, we get

[2,1,4,1,3,2,2,2,1,4,6,4,1,4,6,4,5,3,4,6]

If we print it again, we might get

[4,5,3,2,4,2,5,4,6,1,1,5,1,3,6,4,4,5,1,4]

Even though we constructed the list by repeating the same side effect object, we get a fresh set of random numbers every time the composite side effect object is executed. This is additional proof that what we store in the list is not the result of the side effect, but the side effect itself.

But also note what we did. We used list functions (repeat, take 20) to manipulate a data structure of side effect objects as if they were regular values – because they are! Then we used a side effect manipulation function (sequenceA) to combine the side effects in the list into one new side effect object that executes all of them. This is a kind of metaprogramming, except it is not done in a special macro language; it is performed at the level of regular values.
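
Here is one more tiny sketch in that style (steps and run_steps are made-up names): ordinary list functions assemble the side effect objects, and sequenceA fuses them into one.

steps :: [IO ()]
steps =
  zipWith (\n s -> printf "%d. %s\n" n s) [1 :: Int ..] ["unlock", "open", "enter"]

run_steps :: IO ()
run_steps = sequenceA steps *> pure ()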

Interlude: convenience functions

The Haskell standard libraries also contain a few convenience functions that do what we have done above, but more easily. For example, when we discarded the value produced by a side effect, we used object *> pure (). This exists as a shortcut function called void:

void :: IO a -> IO ()

We could wrap the call in this instead:

execute_logged :: IO ()
execute_logged =
  void (sequenceA log_statements)

But there is an even more convenient way to do it. Many of the functions I will show have a variant that ends with an underscore. These variants throw away the result. So we could instead have written

execute_logged :: IO ()
execute_logged =
  sequenceA_ log_statements

and this would run the effects and throw away the results. One benefit of using this variant is that it actually works with slightly more collection types than the one without the underscore; the drawback is, of course, that we don’t get the values produced, only the side effects. See the appendix for more details.

Separately, we used list manipulation functions repeat and take 20 to create an appropriately-sized list of side effect objects, and then executed it with sequenceA. This specific combination is common enough to exist as a library function

replicateM :: Int -> IO a -> IO [a]

With this, we can draw 20 loaded dice quite succinctly.

some_thrown_dice :: IO [Int]
some_thrown_dice =
  replicateM 20 loaded_die

There is also a result-discarding variant that runs a side effect object n times but does not collect the values produced by the side effect.

replicateM_ :: Int -> IO a -> IO ()

Back when I was in university I had a coursemate who had just learned programming and found the following joke very funny:

Teacher: Write “I will not cheat again” 500 times on the blackboard. That should get you to learn your lesson.

Student: for (int counter = 0; counter < 500; counter++) { System.out.println("I will not cheat again."); }

I didn’t find it funny – I found it sad. Is this really the type of low-level programming we are teaching students? Here’s how it should go.

Teacher: Write “I will not cheat again” 500 times on the blackboard. That should get you to learn your lesson.

Student: replicateM_ 500 (putStrLn "I will not cheat again.")

As a programmer, I do not want to babysit my computer and spell out exactly how it should count in order to perform a side effect 500 times. I just want to tell it to perform the side effect 500 times, and the standard libraries should know what the detailed steps are.

Anyway, let’s get back to the good stuff. Now we’re going into advanced techniques.

traverse

In a previous example, we had code that amounted to

usernames :: IO [Maybe String]
usernames =
  sequenceA [lookupEnv "USER", lookupEnv "LOGNAME", lookupEnv "SUDO_USER"]

This contains some duplication, namely the application of lookupEnv to each of the strings. We could write this instead as

usernames :: IO [Maybe String]
usernames =
  sequenceA (fmap lookupEnv ["USER", "LOGNAME", "SUDO_USER"])

relying on the fact that fmap works on lists as well (for a brief explanation of this, see the appendix). There is a more efficient way to express this, though, using the function

traverse :: (a -> IO b) -> [a] -> IO [b]

which constructs a side effect object that produces the result of applying a side effectful function to each element in a collection. The above side effect object could then be constructed as

usernames :: IO [Maybe String]
usernames =
  traverse lookupEnv ["USER", "LOGNAME", "SUDO_USER"]

This is a fairly fundamental operation, and we can define e.g. sequenceA in terms of traverse:

sequenceA :: [IO a] -> IO [a]
sequenceA = traverse id

where id is the standard identity function that returns its argument unchanged.

As with sequenceA, there exists an underscored version of traverse that throws away the values produced by the side effect, for cases where we are only interested in the side effect itself.

traverse_ :: (a -> IO b) -> [a] -> IO ()

We can use this e.g. to create a side effect object that emits several log messages when it executes.

mass_log :: IO ()
mass_log =
  traverse_ (log Info) [
    "System startup in experimental mode.",
    "This means it will attempt to divine in which order to run startup scripts.",
    "The startup procedure may crash if things are performed in the wrong order.",
    "Please see the manual for more details."
  ]

for

We have learned that when we have a collection of things, and we want to perform some side effectful work for each of those things, we use traverse. However, when we call traverse we have to give the function first, and the collection second. Sometimes it’s convenient to be able to pass the arguments the other way around, and for this, we have an alias called for that is exactly like traverse except with its arguments swapped around. In other words, this takes a collection as its first argument, and the scare-quotes “loop body” last.

for :: [a] -> (a -> IO b) -> IO [b]

As with traverse, it exists in an underscored version that throws away the values produced by the side effect.

for_ :: [a] -> (a -> IO b) -> IO ()

This swapped-parameters version of traverse is convenient whenever we need something that resembles a for loop in other languages. The general pattern of the Haskell for loop is

for collection $ \item -> do
  things

This, again, highlights how side-effects-as-first-class-values lets us do meta programming. The for loop is not syntax. There is nothing (other than side effects) built into the language supporting this construct. It is made entirely from library functions.
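
As a minimal, concrete instance of the pattern (loop_demo is a made-up name):

loop_demo :: IO ()
loop_demo =
  for_ [1 :: Int, 2, 3] $ \i ->
    printf "iteration %d\n" i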

My imagination is running dry, so we will use a convoluted example to illustrate this. We are trying to break weak cryptography, and we have a list of numbers for which we want to get the largest prime factor of each. This is a pure function (no side effects) at its core, but we know factorisation is expensive, and we suspect we will receive the same numbers repeatedly, so we will cache the results of the computation. Once we involve the cache, the function becomes side effectful. Here’s a first draft of such a function.

factorise :: [Int] -> IO [(Int, Maybe Int)]
factorise numbers = do
  -- Create a mutable variable with an empty dictionary.
  cache <- newIORef M.empty
  -- Loop through all numbers we received.
  for numbers $ \n -> do
    -- Get what's in the cache for this number.
    cached <- M.lookup n <$> readIORef cache

    factor <- case cached of
      -- If the cache contained the factor for n, use it
      -- as it is.
      Just f -> pure f
      -- If the cache did not contain the factor, compute
      -- it fresh and cache it.
      Nothing -> do
        let f = greatest_factor n
        modifyIORef cache (M.insert n f)
        pure f

    -- If the greatest factor is the number itself, it is
    -- prime. Otherwise, it can be divided by factor.
    if factor == n then
      pure (n, Nothing)
    else
      pure (n, Just factor)

This seems rather … procedural. Even though we get all the nice guarantees of working with side effectful functions in Haskell, the code itself reads like any other procedural language would. With Haskell, we get the best of both worlds.

Leaning into the first classiness of effects

We can really lean into this and simplify the code further, though.

  • We create the helper function caching_compute, which computes the greatest factor of a number, inserts it into the cache, and then returns the computed factor. This allows us to replace the big pattern match with a call to the maybe function. This works because – you guessed it – side effects are first class values in Haskell.
  • We reduce duplication in the last few lines, by recognising that the first part of the return tuple is always the number we started with, and the second is either Just factor or Nothing based on a condition.
factorise :: [Int] -> IO [(Int, Maybe Int)]
factorise numbers =
  let
    caching_compute cache n = do
      let computed_factor = greatest_factor n
      -- Cache the computed factor and return it.
      modifyIORef cache (M.insert n computed_factor)
      pure computed_factor
  in do
    cache <- newIORef M.empty
    for numbers $ \n -> do
      -- Get the cached factor if it exists, otherwise compute it.
      cached <- M.lookup n <$> readIORef cache
      factor <- maybe (caching_compute cache n) pure cached
      -- Return n along with a Maybe value containing the factor
      -- if the number is composite.
      pure (n, guard (factor /= n) $> factor)

We’ll note that we don’t actually perform any side effect other than keeping a mutable variable around. There is a special type of side effect object designed for exactly that, and it’s called State. (State is actually just one of many options, depending on what guarantees one wants around backtracking, multithreading, atomicity, etc. Notably, State is not the highest-performance alternative, but in our case the factorisation is what takes time, not state management.) We can switch to that type of mutable state instead of IO, to make it harder to accidentally launch missiles from this function.

This will make the cache variable the implicit target of our mutation (modify and gets), and it will also allow for streaming results. That means we can plug an infinite list into this function, and it will keep emitting results one element at a time. (This depends a little on our choice of data structure for the cache, as well as the type of side effect object we choose for keeping the mutable variable around. Some combinations will not allow streaming results, and those tend to have better behaviour and performance characteristics for finite inputs.)

type FactorCache = M.Map Int Int

factorise :: [Int] -> State FactorCache [(Int, Maybe Int)]
factorise numbers =
  for numbers $ \n ->
    let
      -- Rely on laziness to not compute this value other than
      -- where it is used, which is only when the cache is dry.
      computed = greatest_factor n
    in do
      cached <- gets (M.lookup n)
      factor <- maybe (modify (M.insert n computed) $> computed)
        pure cached
      pure (n, guard (factor /= n) $> factor)

Since we are no longer using IO, we cannot extract the result from this side effectful function the way we are used to. Instead, we can convert its result to a pure value with the evalState function, seeding the state with an empty dictionary:

factored :: [(Int, Maybe Int)]
factored = evalState (factorise some_numbers) M.empty

The result here is a pure, streaming list, meaning we can feed this function an infinite sequence of numbers and it will keep categorising them – although it will go faster and faster as its cache fills up.
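
A sketch of that streaming behaviour, assuming the lazy State monad from Control.Monad.State.Lazy (the strict variant would not stream):

first_few :: [(Int, Maybe Int)]
first_few = take 5 (evalState (factorise [2 ..]) M.empty)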

This also illustrates another reason to go for something like State rather than IO: we can guarantee that any side effects executed as part of a State object are only visible within that State object, which in turn means we can convert the side effectful function to a pure function. Haskell is happy to let us have side effects inside pure functions – as long as we can prove to the compiler that the side effects will not affect anything outside of those functions. (There are many of these specialised side effect types in Haskell, which we can use to get local side effects that look pure from the outside.)
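
One such type is ST, which permits real mutable references inside a function that is pure from the outside. A minimal sketch (sum_list is a made-up example; runST comes from Control.Monad.ST and the STRef functions from Data.STRef):

sum_list :: [Int] -> Int
sum_list xs = runST $ do
  -- A mutable accumulator, visible only inside this function.
  acc <- newSTRef 0
  for_ xs $ \x -> modifySTRef acc (+ x)
  readSTRef acc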

Since numbers is given as an argument to this function and then immediately used as an argument to for, I’d be tempted at this point to actually go back to traverse and eta reduce, resulting in the following definition.

factorise :: [Int] -> State FactorCache [(Int, Maybe Int)]
factorise = traverse $ \n ->
  let
    computed = greatest_factor n
  in do
    cached <- gets (M.lookup n)
    factor <- maybe (modify (M.insert n computed) $> computed)
      pure cached
    pure (n, guard (factor /= n) $> factor)

This no longer looks anything like procedural code in other languages. This is procedural on a higher level: it’s defining a stateful traversal. A stateful traversal is a distinctly procedural operation, but most other procedural languages don’t have support for defining stateful traversals. Haskell does, because Haskell is the greatest procedural language in the world.

Things you never need to care about

There are some historic functions you might run across in this context. You never need these, and you can almost always translate them into their modern, better equivalents.

  • >> is an old name for *>.
  • return is an old name for pure.
  • map and liftM are old names for fmap.
  • liftM2, liftM3, etc. are old names for liftA2, liftA3, etc.
  • ap is an old name for the <*> operator.
  • msum is an old name for asum.
  • sequence and sequence_ are old names for sequenceA and sequenceA_.
  • mapM and mapM_ are old names for traverse and traverse_.
  • forM and forM_ are old names for for and for_.

Back in the dark ages of the 1990s, we thought monads were the greatest thing since sorting machines for punch cards. It turns out it is possible to extract a highly useful subset of monads, called applicative functors, and they are responsible for many of the cool things we did with monads. These old names generally come from the time before we learned about applicative functors.

Appendix A: Avoiding success and uselessness

It is sometimes said the Haskell community is ready to do anything to avoid success. This is a misreading of “avoid success at all costs”. What the phrase really means is that some languages are ready to sacrifice their values to become more successful – they aim for success at all costs – and Haskell does not. It prefers to stick to its values rather than chucking them out for a moment of fame.

“Haskell is useless” was said in a discussion on how different programming languages approach giving authority to the programmer. At the time it was said, the popular approach to programming language design was to assume that the programmer was always right and good, and should be given maximum authority over their code. (Think php, Ruby, Python 2, JavaScript.) It has long been recognised that this sort of freedom often leads to programmers shooting themselves in the foot, so languages restrict this flexibility by adding prohibitions into their design. The idea is “you are allowed to do everything except X, Y, Z.” Some examples:

  • In C, we can no longer jump to arbitrary addresses like we could in assembly. (Popular C compilers do have support for this, but it’s not in the language standard.)
  • In Ada, we can no longer think of a pointer as a trenchcoated integer.
  • In Java, we are no longer allowed to manually deallocate memory.
  • In Python 3, we can no longer treat a sequence of bytes as text.

With every generation, more restrictions are piled onto previous ones, making the language safer, but also less powerful – or, some would say, less useful.

Haskell approached this from the other direction. It started by saying no to arbitrary side effects: when computing a return value, a function is only allowed to rely on its arguments, nothing else. Functions cannot read from the system environment, and certainly not write to it. This eliminates a large class of safety problems, but it also makes the language completely useless for almost everything practical.

Then Haskell went on and found a way to allow code to describe side effects without actually executing them in such a way that they can be executed in the right order by an external runtime, and Haskell gained some usefulness. This is in contrast to other languages that started out useful but lost some of their usefulness with time, in the name of safety: Haskell started out useless, but has since gained usefulness.

Appendix B: Why fmap maps over both side effects and lists

We have already seen that

fmap :: (a -> b) -> IO a -> IO b

but it is equally true that

fmap :: (a -> b) -> [a] -> [b]

I.e. fmap can take a pure function and apply it to all elements of a list.

The way it works is that fmap is not written to work specifically with side effects, or lists, or anything else. It is written to work against anything that is a functor. Its most general type signature is actually

fmap :: Functor f => (a -> b) -> f a -> f b

The name functor is a bit nondescriptive, but we can think of it as the name for all types that support being “mapped over”, in some sense.

  • Mapping over a side effect object means transforming the result it produces.
  • Mapping over a list means transforming each element of the list.
  • Mapping over a nullable value means transforming the value if it exists, otherwise doing nothing.

Lists, side effect objects, and nullable values are, therefore, functors. Thus, they all support the fmap operation.
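
A quick sketch of the same fmap at all three types (the values are chosen arbitrarily):

plus_one_list :: [Int]
plus_one_list = fmap (+ 1) [1, 2, 3]       -- [2, 3, 4]

plus_one_maybe :: Maybe Int
plus_one_maybe = fmap (+ 1) (Just 41)      -- Just 42

home_length :: IO Int
home_length = fmap length (getEnv "HOME")  -- transforms the produced value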

Appendix C: Foldable and Traversable

The result-preserving variants of the functions we’ve discussed – sequenceA, traverse, etc. – guarantee that the data structure produced by the side effect they create will have the same shape as the original data structure we passed in. This is easy in the case of e.g. a list, because given the same number of items, the list has the same shape regardless of what we put into it. However, for some data structures, this is a more severe requirement. Typical examples are sets and search trees: their structure depends on their contents. When we traverse a search tree and construct a new search tree from the values produced by the side effect, we might end up with a completely different tree shape than what we started from.

In contrast, if we are not interested in getting back the values produced by side effects, we don’t need to be able to reconstruct the same data structure we started with. All that takes is being able to iterate the elements of the data structure, and this can be supported by more data structures – including sets and search trees. This is why sequenceA_, traverse_, and friends can work with more collection types.

One hack if we want the values produced by the side effect, but do not care about the structure, is to first convert the collection to a list (this only requires being able to iterate the collection) and then call traverse on that list. Then we have retained the results, but lost the original structure of the collection. Such is life if we work with structures that do not support traversal.
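
As a sketch of that hack (lookup_all is a made-up example; toList comes from Data.Foldable, and Data.Set – imported qualified as Set – is foldable but not traversable):

lookup_all :: Set.Set String -> IO [Maybe String]
lookup_all vars = traverse lookupEnv (toList vars)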

The technical names here are foldable (can produce elements one at a time in a specified order) and traversable (can reconstruct its structure even when given new elements). All traversables are foldable, but not all foldables are traversable.