Haskell: A Great Procedural Language
Table of Contents
- Side effects as first class values
- De-mystifying do blocks
- Functions that operate on side effects
- Leaning into the first classiness of effects
- Things you never need to care about
- Appendix A: Avoiding success and uselessness
- Appendix B: Why fmap maps over both side effects and lists
- Appendix C: Foldable and Traversable
There are many catchphrases about Haskell.
- Haskell is useless.
- Haskell aims to avoid success at all costs.
- Haskell is the best procedural language in the world.
These sound like dismissals or absurdities from the outside, but once you learn what they really mean, they take on a new light. In this article, I want to explain the third. (See the appendix if you are curious about the first two.)
This article really, really tried to become a monad I/O tutorial, but I think I stopped it in time. (By that I mean I had to rewrite it twice and delete large chunks of monad I/O tutorial material.) Here, we are going to jump right in and focus on the interesting stuff.
Side effects as first class values
Effectful computations in Haskell are first class values. This means we can store them in variables or data structures for later use. There is a Haskell function
randomRIO :: (Int, Int) -> IO Int
which, when given two integers as arguments, picks a random integer between them. We can put calls to this function into a list, like so:
some_dice = [ randomRIO (1, 6), randomRIO (1, 6) ]
This is a list of two calls to randomRIO. What surprises non-Haskellers is that when this list is created, no random numbers are generated. Coming from other programming languages, we are used to side effects (such as random generation) being executed directly when the side effectful function is called. You may think Haskell is different here due to laziness, but that’s also not true: even if we put these calls into a strict data structure, no randomness would happen.
We can add more random generation to the list:
more_dice = some_dice <> [ randomRIO (1, 6) ]
and still no random numbers will be generated. We can go ahead and manipulate this list in all sorts of ways, and still no random numbers would be generated.
To be clear, the randomRIO function could well be called (whether this actually happens is a question of lazy evaluation, optimisation, etc.), and when it is called it returns a value of type IO Int. It’s just that this value is not an integer. If anything, we can think of it as a set of instructions for eventually, somehow, getting an integer. It’s not an actual integer; it’s an object encapsulating a side effect. When this side effect object executes, it will produce a random integer, but the object itself just describes the computation.
In other words, in Haskell, it is not enough to call a side effectful function to execute its side effects. When we call the side effectful function, it produces an object encapsulating the side effect, and this object can be executed in the future to produce the result of the side effect. (Readers familiar with JavaScript promises will recognise this concept. Indeed, promises are modeled after side effects in Haskell.)
The common way we teach beginners to execute side effect objects is by calling them from a do block, using the special <- assignment operator to extract their result. As a first approximation, we can think of the following code as the way to force side effects to execute.

dice_print = do
  side <- randomRIO (1, 6)
  printf "It landed on %d\n" side
We can imagine that the <- arrow executes the side effect object returned by randomRIO and captures the value it produces. Similarly, the side effect object returned by printf gets executed, but we don’t capture the result; we don’t care about the value produced by it, we only care about the side effect itself.
The lie-to-children here is that we pretend the do block is magical and that when it executes, it also executes side effects of functions called in it. This mental model will take the beginner a long way, but at some point, one will want to break free of it. That is when Haskell starts to really shine as a procedural language.
This article features another lie-to-children: it will have type signatures specialised to IO a and [a]. All the functions I mention are more generic than I’m letting on.

- Anywhere this article says IO a, it will work with any type of side effect (like Maybe a, Rand g a, StateT s m a, etc.)
- Anywhere this article says [a], it probably also works with other collection/container types (like Maybe a, Array i a, HashMap k a, Tree a, etc.)
The reason this article uses more direct type signatures is to hopefully be readable also to someone who does not use Haskell. If the reader has never seen ML-style syntax before, I think the most important thing to know is that function calls aren’t written like isInfixOf("llo", "hello, world\n") but rather with spaces, as in isInfixOf "llo" "hello, world\n".
De-mystifying do blocks
In order to drive the point home, we’ll start by seeing what the do block actually does, because it’s not magic at all. In fact, every do block can be converted to just two operators. If you already know this, skip ahead to the next section.
then
If we want to combine two side effects into one, we can use the *> operator, which is pronounced then or sequence right. It takes two side effect objects and creates a new side effect object that executes both when it is itself executed. The value produced by this new composite object is going to be the value produced by the second object it’s constructed from. In that sense, the *> operator is a lot like the comma operator in C: it chains together statements, and returns the result of the last.
(*>) :: IO a -> IO b -> IO b
For example, here we combine two print statements into a single side effectful function.
greeting :: IO ()
greeting = putStr "hello, " *> putStrLn "world"
This is a single side effect object called greeting, but its execution will involve the execution of two separate print statements.
We teach beginners to write this as
greeting = do
  putStr "hello, "
  putStrLn "world"
which is the exact same thing, although arguably easier to read. The interesting thing is the implication for how we look at do blocks. It turns out they may not be magical at all; maybe they are just inserting implicit commas, i.e. a pretty way of taking multiple side effect objects and combining them into a single, bigger, side effect object.
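To make the “implicit commas” view concrete, here is a small runnable sketch (not from the article; record and the IORef log are made-up helpers) in which effects append to a mutable list, so we can observe that a do block and the equivalent *> chain run the same effects in the same order:

```haskell
import Data.IORef

-- Hypothetical helper: append a message to a mutable log.
record :: IORef [String] -> String -> IO ()
record ref msg = modifyIORef ref (++ [msg])

greet_do :: IORef [String] -> IO ()
greet_do ref = do
  record ref "hello, "
  record ref "world"

-- The same side effect object, written with explicit "commas".
greet_then :: IORef [String] -> IO ()
greet_then ref = record ref "hello, " *> record ref "world"
```

Running both against fresh logs produces identical histories, which is exactly what we would expect if the do block merely inserts *> between the statements.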
bind
The one thing we cannot do with *> is take the result from the left-hand side effect and use it to influence the right-hand side function call, because the *> operator discards the first result before executing the second effect – just like the comma operator in C. The die throwing code we saw previously,

dice_print = do
  side <- randomRIO (1, 6)
  printf "It landed on %d\n" side

cannot be implemented with just *>. We need the additional operator >>= (widely known as bind), which takes a side effect object and plugs the value it produces into another side effectful function.
(>>=) :: IO a -> (a -> IO b) -> IO b
Using this operator, we could write the above do block as
print_side :: Int -> IO ()
print_side side = printf "It landed on %d\n" side

dice_print :: IO ()
dice_print = randomRIO (1, 6) >>= print_side

and this takes the result of executing the side effect of randomRIO and plugs it into another side effectful function, namely print_side.
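Since these two operators work for any type of side effect (not just IO, as noted earlier), we can check the desugaring with Maybe, where everything is pure and easy to inspect. A sketch, not from the article:

```haskell
-- A do block over Maybe...
with_do :: Maybe Int
with_do = do
  x <- Just 3
  Just (x + 1)

-- ...and the same computation desugared into >>= by hand.
with_bind :: Maybe Int
with_bind = Just 3 >>= \x -> Just (x + 1)
```

Both evaluate to the same value, as the desugaring promises.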
Two operators are all of do blocks
This illustrates that do blocks are built from only two operators. If we stick with do blocks for all side effects, we will never learn why Haskell is the greatest procedural programming language in the world, because we are limiting ourselves to just two operators for dealing with side effects.
Let’s lift our gaze and see what happens when we look beyond those. There are more functions for dealing with side effects.
Functions that operate on side effects
We’ll start with the basics and work our way up.
pure
If we want to construct a new side effect object that always produces a specific value, we can use the function pure.
pure :: a -> IO a
For example, this creates a side effect object that always produces the integer 4.
loaded_die :: IO Int
loaded_die =
  -- Chosen by fair dice roll.
  -- Guaranteed to be random.
  pure 4
Creating side effect objects that always produce a known value might not seem very useful, but it comes up all the time when bridging the worlds of pure code and side effects.
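As a sketch of that bridging (not from the article; the environment variable name is hypothetical and assumed to be unset), pure lets us fall back to a plain value in the middle of an effectful pipeline:

```haskell
import System.Environment (lookupEnv)

-- If the variable is unset, fall back to a pure default.
editor :: IO String
editor = lookupEnv "HOPEFULLY_UNSET_EXAMPLE_VAR" >>= maybe (pure "vi") pure
```

Both branches handed to maybe must be side effect objects, so the plain default "vi" has to be wrapped in pure.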
fmap
One of the most used functions when working with side effects in Haskell is fmap.
fmap :: (a -> b) -> IO a -> IO b
This takes a pure function, and a side effect object, and returns a new side effect
object that is similar to the one it got, except the value produced will be
transformed by the function first.
Transforming the results of side effects is so common that fmap has an operator alias: <$>. For example, to get the length of the path to the user’s home directory, we can do

path_length :: IO Int
path_length = length <$> getEnv "HOME"
-- equivalent to
-- path_length = fmap length (getEnv "HOME")
This creates a new side effect object which will produce the result of applying the length function to the result of the side effect of getEnv.
liftA2, liftA3, …
Where fmap allows us to transform the value produced by a single side effect, sometimes we need to create a side effect object that produces something based on multiple other side effects. This is where liftA2 and friends come in.

liftA2 :: (a -> b -> c) -> IO a -> IO b -> IO c
liftA3 :: (a -> b -> c -> d) -> IO a -> IO b -> IO c -> IO d
These can be thought of like fmap, except they don’t transform the result of
just one side effect, but they combine the results of multiple side effects with
one function, and create a new side effect object that produces the return value
of that function.
This is also common enough that it can be written with two operators: the <$> we already saw for the first argument, and then <*> for the rest of the arguments. To check if a user’s home directory contains the user name, we do

home_username :: IO Bool
home_username = isInfixOf <$> getEnv "LOGNAME" <*> getEnv "HOME"
-- equivalent to
-- home_username = liftA2 isInfixOf (getEnv "LOGNAME") (getEnv "HOME")
This has created a new side effect object that will produce true if $LOGNAME is a part of $HOME, and false otherwise.
Intermission: what’s the point?
I am raving ecstatic about this. Combining the results of side effects through liftA2 and friends is such a fundamental technique that the favicon of this website is a stylised version of the <*> operator.
But the reader may understandably be a little underwhelmed. It seems like we have only learned to do in Haskell what we can do in Python and every other programming language already. All popular programming languages let us use the results of side effects as arguments to other functions. There are already two reasons we should care about Haskell, though, even before we see what’s to come. These are refactorability and discipline.
The biggest benefit is probably refactorability. We might have the following code that throws two dice and sums them up:
sum_dice :: IO Int
sum_dice = liftA2 (+) (randomRIO (1, 6)) (randomRIO (1, 6))
and we think the repetition is annoying, so we put the actual die-tossing code into a variable.
sum_dice :: IO Int
sum_dice =
  let toss_die = randomRIO (1, 6)
  in  liftA2 (+) toss_die toss_die
If we did this in Python, we would accidentally store the result of the die toss in the toss_die variable! If the die lands on 3, we will compute 3+3, rather than the intention of summing two different tosses. With the way side effects are first class values in Haskell, we are always free to blindly (!) extract code into a variable name, and this will never change how the code runs. This is called equational reasoning, and it’s incredibly powerful – one of the things that make Haskell such a strong enterprise language, and so nice to write procedural code in.
A second benefit is that by being structured in how we allow side effects to affect subsequent computation, we have a lower risk of accidentally introducing side effects where they were not intended. We also have greater control over what can affect what, reducing the risk of interaction bugs.
The reader might, for example, argue that the previous refactoring example works just fine in Python, by using an anonymous function as a faux side effect object:
def sum_dice():
    toss_die = lambda: random.randint(1, 6)
    return toss_die() + toss_die()
but this (a) requires being careful in refactoring, (b) does not prevent accidentally triggering the side effect where it was not intended, and (c) requires a language feature (sequence points) to disambiguate in which order side effects get executed. With Haskell, we just don’t have to care. We can rest easy in knowing that the compiler and libraries have our backs.
sequenceA
We are starting to see the benefits of side effects as first class values, so let’s shift to a higher gear. We have seen that Haskell allows us to store side effect objects in variables without accidentally executing their effects. The next step is storing these side effect objects in data structures.
We could, for example, create a list of three different ways to get the username of the currently logged in user.
getting_usernames :: [IO (Maybe String)]
getting_usernames =
  [ lookupEnv "USER"
  , lookupEnv "LOGNAME"
  , lookupEnv "SUDO_USER"
  ]
This list cannot be executed for its side effects directly, because the list itself is not a side effect – it’s a list of side effects. There are library functions to deal with this, though. One is
sequenceA :: [IO a] -> IO [a]
This is what we need in this case: it takes a list of side effect objects and creates a new side effect object that executes all the side effects of the list, and then produces a list of all the values produced by those side effects. To make a side effect that produces a list of candidate usernames, we define
actual_usernames :: IO [Maybe String]
actual_usernames = sequenceA getting_usernames
If we execute this side effect and print the result (either by connecting it up with print using the >>= operator, or in a do block), then on my system we get the result
[Just "kqr", Just "kqr", Nothing]
Sometimes we have a list of side effects but we don’t care about the value they produce. This might be the case if we have collected a bunch of log statements from a pure function. We want to execute their side effects (the actual logging action) but we don’t care about what the log function itself returns (it is usually a void or unit-type value.)
log_statements :: [IO ()]
log_statements =
  [ log Info "Creating user"
  , log Warn "User already found"
  , log Info "Updating user"
  ]
As a reminder: these function calls to the log function do not cause anything to be logged. The log function returns a side effect object that, when executed, makes the logging happen. The log_statements variable contains a list of such side effect objects – it is not itself a side effect object.
To execute these, we can again combine the side effects in the list into one side effect object with sequenceA. When we do so, we get a side effect object that produces the value [(), (), ()]. To get the code to type check, we may have to discard this value. We already know how to do this, because discarding values is what the *> operator does.
execute_logged :: IO ()
execute_logged = sequenceA log_statements *> pure ()
When the side effect of execute_logged runs, it will run the side effects of the log statements and then discard the dummy values produced in the process.
Remember that loaded die from before? Now we can check that it indeed always returns the same number. First we create a list by repeating the same side effect object an infinite number of times.
many_loaded_dice :: [IO Int]
many_loaded_dice = repeat loaded_die
Then we construct a side effect object that executes the first few of these and keeps the values they produced.
some_thrown_dice :: IO [Int]
some_thrown_dice = sequenceA (take 20 many_loaded_dice)
If we connect this up with a print (again, via a do block or the >>= operator) and execute it, we get
[4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4]
We could do the same thing with a better die:
many_good_dice :: [IO Int]
many_good_dice = repeat (randomRIO (1, 6))

some_thrown_dice :: IO [Int]
some_thrown_dice = sequenceA (take 20 many_good_dice)
If we print this once, we get
[2,1,4,1,3,2,2,2,1,4,6,4,1,4,6,4,5,3,4,6]
If we print it again, we might get
[4,5,3,2,4,2,5,4,6,1,1,5,1,3,6,4,4,5,1,4]
Even though we constructed the list by repeating the same side effect object, we get a fresh set of random numbers every time the composite side effect object is executed. This is additional proof that what we store in the list is not the result of the side effect, but the side effect itself.
But also note what we did. We used list functions (repeat, take 20) to manipulate a data structure of side effect objects as if they were regular values – because they are! Then we used a side effect manipulation function (sequenceA) to combine the side effects in the list into one new side effect object that executes all of them. This is a kind of meta programming, except it’s not using a special macro language; it is performed at the level of regular values.
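The same point can be made without randomness, which also makes it testable. In this sketch (not from the article; tick is a made-up helper), a counter stands in for the random generator:

```haskell
import Data.IORef

-- Each execution increments the counter and produces the new value.
tick :: IORef Int -> IO Int
tick ref = modifyIORef ref (+ 1) *> readIORef ref

counting_dice :: IO [Int]
counting_dice = do
  ref <- newIORef 0
  let toss = tick ref          -- one single side effect object...
  sequenceA (replicate 5 toss) -- ...executed five times
```

One stored side effect object, five executions, five distinct results.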
Interlude: convenience functions
The Haskell standard libraries also contain a few convenience functions that make what we did above easier. For example, when we discarded the value produced by a side effect, we used object *> pure (). This exists as a shortcut function called void:
void :: IO a -> IO ()
We could wrap the call in this instead:
execute_logged :: IO ()
execute_logged = void (sequenceA log_statements)
But there is an even more convenient way to do it. Many of the functions I will show have a variant that ends with underscore. These variants throw away the result. So we could instead have written
execute_logged :: IO ()
execute_logged = sequenceA_ log_statements
and this would run the effects and throw away the results. One benefit of using this variant is that it works with slightly more collection types than the one without the underscore; the drawback is, of course, that we don’t get the results produced, only the side effects. (See the appendix for more details.)
Separately, we used the list manipulation functions repeat and take 20 to create an appropriately-sized list of side effect objects, and then executed it with sequenceA. This specific combination is common enough to exist as a library function
replicateM :: Int -> IO a -> IO [a]
With this, we can draw 20 loaded dice quite succinctly.
some_thrown_dice :: IO [Int]
some_thrown_dice = replicateM 20 loaded_die
This also exists in a result-discarding variant that runs a side effect object n times but does not collect the values produced by the side effect.
replicateM_ :: Int -> IO a -> IO ()
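Here is a minimal runnable sketch of replicateM, using the loaded die from earlier so it needs no random package:

```haskell
import Control.Monad (replicateM)

loaded_die :: IO Int
loaded_die = pure 4

three_loaded_dice :: IO [Int]
three_loaded_dice = replicateM 3 loaded_die
```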
Back when I was in university I had a coursemate who had just learned programming and found the following joke very funny:
Teacher: Write “I will not cheat again” 500 times on the blackboard. That should get you to learn your lesson.
Student:
for (int counter = 0; counter < 500; counter++) {
    System.out.println("I will not cheat again.");
}
I didn’t find it funny – I found it sad. Is this really the type of low-level programming we are teaching students? Here’s how it should go.
Teacher: Write “I will not cheat again” 500 times on the blackboard. That should get you to learn your lesson.
Student:
replicateM_ 500 (putStrLn "I will not cheat again.")
As a programmer, I do not want to babysit my computer and instruct it in exactly how to count to perform a side effect 500 times. I just want to tell it to perform the side effect 500 times and the standard libraries should know what the detailed steps are.
Anyway, let’s get back to the good stuff. Now we’re going into advanced techniques.
traverse
In a previous example, we had code that amounted to
usernames :: IO [Maybe String]
usernames = sequenceA
  [ lookupEnv "USER"
  , lookupEnv "LOGNAME"
  , lookupEnv "SUDO_USER"
  ]
This contains some duplication, namely the application of lookupEnv to each of the strings. We could write this instead as

usernames :: IO [Maybe String]
usernames = sequenceA (fmap lookupEnv ["USER", "LOGNAME", "SUDO_USER"])
relying on the fact that fmap works on lists as well (for a brief explanation of this, see the appendix). There is a more efficient way to express this, though, using the function

traverse :: (a -> IO b) -> [a] -> IO [b]
which constructs a side effect object that produces the result of applying a side effectful function to each element in a collection. The above side effect object could then be constructed as
usernames :: IO [Maybe String]
usernames = traverse lookupEnv ["USER", "LOGNAME", "SUDO_USER"]
This is a fairly fundamental operation, and we can define e.g. sequenceA in terms of traverse:

sequenceA :: [IO a] -> IO [a]
sequenceA = traverse id

where id is the identity function that returns its argument unchanged.
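Because traverse, too, works for any effect type, we can verify this relationship purely with Maybe (a sketch, not from the article; halve is a made-up function):

```haskell
-- Fails for odd numbers, succeeds for even ones.
halve :: Int -> Maybe Int
halve n = if even n then Just (n `div` 2) else Nothing

-- traverse applies the effectful function to each element...
all_halved :: Maybe [Int]
all_halved = traverse halve [2, 4, 6]

-- ...and sequenceA (fmap f xs) does the same thing in two steps.
same_thing :: Maybe [Int]
same_thing = sequenceA (map halve [2, 4, 6])
```

A single odd element would make the whole result Nothing, because the composite effect fails if any constituent effect does.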
As with sequenceA, there exists an underscored version of traverse that throws away the values produced by the side effect, for cases where we are only interested in the side effect itself.

traverse_ :: (a -> IO b) -> [a] -> IO ()
We can use this e.g. to create a side effect object that emits several log messages when it executes.
mass_log :: IO ()
mass_log = traverse_ (log Info)
  [ "System startup in experimental mode."
  , "This means it will attempt to divine in which order to run startup scripts."
  , "The startup procedure may crash if things are performed in the wrong order."
  , "Please see the manual for more details."
  ]
for
We have learned that when we have a collection of things, and we want to perform some side effectful work for each of those things, we use traverse. However, when we call traverse we have to give the function first, and the collection second. Sometimes it’s convenient to be able to pass the arguments the other way around, and for this, we have an alias called for that is exactly like traverse except with its arguments swapped around. In other words, this takes a collection as its first argument, and the scare-quotes “loop body” last.

for :: [a] -> (a -> IO b) -> IO [b]
As with traverse, it exists in an underscored version that throws away the values produced by the side effect.

for_ :: [a] -> (a -> IO b) -> IO ()
This swapped-parameters version of traverse is convenient whenever we need something that resembles a for loop in other languages. The general pattern of the Haskell for loop is

for collection $ \item -> do
  things
This, again, highlights how side-effects-as-first-class-values lets us do meta programming. The for loop is not syntax. There is nothing (other than side effects) built into the language supporting this construct. It is made entirely from library functions.
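As a runnable sketch of this (not from the article), here is a classic accumulate-in-a-loop pattern built entirely from library functions – for_ from Data.Foldable plus a mutable variable:

```haskell
import Data.Foldable (for_)
import Data.IORef

-- Sum a list "procedurally", through a mutable accumulator.
sum_loop :: [Int] -> IO Int
sum_loop xs = do
  total <- newIORef 0
  for_ xs $ \x ->
    modifyIORef total (+ x)
  readIORef total
```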
My imagination is running dry, so we will use a convoluted example to illustrate this. We are trying to break weak cryptography, and we have a list of numbers for which we want to get the largest prime factor of each. This is a pure function (no side effects) at its core, but we know factorisation is expensive, and we suspect we will receive the same numbers repeatedly, so we will cache the results of the computation. Once we involve the cache, the function becomes side effectful. Here’s a first draft of such a function.
factorise :: [Int] -> IO [(Int, Maybe Int)]
factorise numbers = do
  -- Create a mutable variable with an empty dictionary.
  cache <- newIORef M.empty
  -- Loop through all numbers we received.
  for numbers $ \n -> do
    -- Get what's in the cache for this number.
    cached <- M.lookup n <$> readIORef cache
    factor <- case cached of
      -- If the cache contained the factor for n, use it as it is.
      Just f -> pure f
      -- If the cache did not contain the factor, compute it fresh
      -- and cache it.
      Nothing -> do
        let f = greatest_factor n
        modifyIORef cache (M.insert n f)
        pure f
    -- If the greatest factor is the number itself, it is prime.
    -- Otherwise, it can be divided by factor.
    if factor == n
      then pure (n, Nothing)
      else pure (n, Just factor)
This seems rather … procedural. Even though we get all the nice guarantees of working with side effectful functions in Haskell, the code itself reads like any other procedural language would. With Haskell, we get the best of both worlds.
Leaning into the first classiness of effects
We can really lean into this and simplify the code further, though.

- We create the helper function caching_compute, which computes the greatest factor of a number, inserts it into the cache, and then returns the computed factor. This allows us to replace the big pattern match with a call to the maybe function. This works because – you guessed it – side effects are first class values in Haskell.
- We reduce duplication in the last few lines, by recognising that the first part of the return tuple is always the number we started with, and the second is either Just factor or Nothing based on a condition.
factorise :: [Int] -> IO [(Int, Maybe Int)]
factorise numbers =
  let
    caching_compute cache n = do
      let computed_factor = greatest_factor n
      -- Cache the computed factor and return it.
      modifyIORef cache (M.insert n computed_factor)
      pure computed_factor
  in do
    cache <- newIORef M.empty
    for numbers $ \n -> do
      -- Get the cached factor if it exists, otherwise compute it.
      cached <- M.lookup n <$> readIORef cache
      factor <- maybe (caching_compute cache n) pure cached
      -- Return n along with a Maybe value containing the factor
      -- if the number is composite.
      pure (n, guard (factor /= n) $> factor)
We’ll note that we don’t actually perform any side effect other than keeping a mutable variable around. There is a special type of side effect object designed for keeping a mutable variable around, and it’s called State. (State is actually just one of many options, depending on what guarantees one wants around backtracking, multithreading, atomicity, etc. Notably, State is not the highest-performance alternative, but in our case the factorisation is what takes time, not state management.) We can switch to that type of mutable state instead of IO, to make it harder to accidentally launch missiles from this function. This will make the cache variable the implicit target of our mutation (modify and gets), and it will also allow for streaming results. That means we can plug an infinite list into this function, and it will keep emitting results one element at a time. (This depends a little on our choice of data structure for the cache, as well as the type of side effect object we choose for keeping the mutable variable around. Some combinations will not allow streaming results, and those tend to have better behaviour and performance characteristics for finite inputs.)
type FactorCache = M.Map Int Int

factorise :: [Int] -> State FactorCache [(Int, Maybe Int)]
factorise numbers =
  for numbers $ \n ->
    let
      -- Rely on laziness to not compute this value other than
      -- where it is used, which is only when the cache is dry.
      computed = greatest_factor n
    in do
      cached <- gets (M.lookup n)
      factor <- maybe (modify (M.insert n computed) $> computed) pure cached
      pure (n, guard (factor /= n) $> factor)
Since we are no longer using IO, we cannot extract the result from this side effectful function the way we are used to. Instead, we can convert its result to a pure value with the evalState function, seeding the state with an empty dictionary:
factored :: [(Int, Maybe Int)]
factored = evalState (factorise some_numbers) M.empty
The result here is a pure, streaming list, meaning we can feed this function an infinite sequence of numbers and it will keep categorising them – although it will go faster and faster as its cache fills up.
This also illustrates another reason to go for something like State rather than IO: we can guarantee that any side effects executed as part of a State object are only visible within that State object, which in turn means we can convert the side effectful function to a pure function. Haskell is happy to let us have side effects inside pure functions – as long as we can prove to the compiler that the side effects will not affect anything outside of those functions. (There are many of these specialised side effect types in Haskell which we can use to get local side effects but which look pure from the outside.)
Since numbers is given as an argument to this function and then immediately used as an argument to for, I’d be tempted at this point to go back to traverse and eta reduce, resulting in the following definition.
factorise :: [Int] -> State FactorCache [(Int, Maybe Int)]
factorise = traverse $ \n ->
  let
    computed = greatest_factor n
  in do
    cached <- gets (M.lookup n)
    factor <- maybe (modify (M.insert n computed) $> computed) pure cached
    pure (n, guard (factor /= n) $> factor)
This no longer looks anything like procedural code in other languages. This is procedural on a higher level: it’s defining a stateful traversal. A stateful traversal is a distinctly procedural operation, but most other procedural languages don’t have support for defining stateful traversals. Haskell does, because Haskell is the greatest procedural language in the world.
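As a self-contained sketch of a stateful traversal (not from the article; it uses Control.Monad.Trans.State from the transformers package, which ships with GHC), here is one that numbers the elements of a list by threading a counter:

```haskell
import Control.Monad.Trans.State (State, evalState, get, put)

-- Pair each element with its position, starting from a seed counter.
number :: [a] -> State Int [(Int, a)]
number = traverse $ \x -> do
  n <- get
  put (n + 1)
  pure (n, x)

numbered :: [(Int, Char)]
numbered = evalState (number "abc") 1
```

No IO is involved: the mutation is local to the State object, so numbered is an ordinary pure value.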
Things you never need to care about
There are some historic functions you might run across in this context. You never need these, and you can almost always translate them into their modern, better equivalents.
- >> is an old name for *>.
- return is an old name for pure.
- map and liftM are old names for fmap.
- liftM2, liftM3, etc. are old names for liftA2, liftA3, etc.
- ap is an old name for the <*> operator.
- msum is an old name for asum.
- sequence and sequence_ are old names for sequenceA and sequenceA_.
- mapM and mapM_ are old names for traverse and traverse_.
- forM and forM_ are old names for for and for_.
Back in the dark ages of the 1990s, we thought monads were the greatest thing since sorting machines for punch cards. It turns out it is possible to extract a highly useful subset of monads, called applicative functors, and they are responsible for many of the cool things we did with monads. These old names generally come from the time before we learned about applicative functors.
Appendix A: Avoiding success and uselessness
It is sometimes said the Haskell community is ready to do anything to avoid success. This is a misreading of “avoid success at all costs”. What the phrase really means is that some languages are ready to sacrifice their values to become more successful – they aim for success at all costs – and Haskell does not. It prefers to stick to its values rather than chucking them out for a moment of fame.
“Haskell is useless” was said in a discussion on how different programming languages approach giving authority to the programmer. At the time it was said, the popular approach to programming language design was to assume that the programmer was always right, good, and should be given maximum authority over their code. (Think PHP, Ruby, Python 2, JavaScript.) It has long been recognised that this sort of freedom often leads to programmers shooting themselves in the foot, so languages restrict this flexibility by adding prohibitions into their design. The idea is “you are allowed to do everything except X, Y, Z.” Some examples:
- In C, we can no longer jump to arbitrary addresses like we could in assembly. (Popular C compilers do have support for this, but it’s not in the language standard.)
- In Ada, we can no longer think of a pointer as a trenchcoated integer.
- In Java, we are no longer allowed to manually deallocate memory.
- In Python 3, we can no longer treat a sequence of bytes as text.
With every generation, more restrictions are piled onto previous ones, making the language safer, but also less powerful – or, some would say, less useful.
Haskell approached this from the other direction. It started by saying no to arbitrary side effects: when computing a return value, a function is only allowed to rely on its arguments, nothing else. Functions cannot read from the system environment, and certainly not write to it. This eliminates a large class of safety problems, but it also makes the language completely useless for almost everything practical.
Then Haskell went on and found a way to allow code to describe side effects without actually executing them in such a way that they can be executed in the right order by an external runtime, and Haskell gained some usefulness. This is in contrast to other languages that started out useful but lost some of their usefulness with time, in the name of safety: Haskell started out useless, but has since gained usefulness.
Appendix B: Why fmap maps over both side effects and lists
We have already seen that
fmap :: (a -> b) -> IO a -> IO b
but it is equally true that
fmap :: (a -> b) -> [a] -> [b]
I.e. fmap can take a pure function and apply it to all elements of a list. The way it works is that fmap is not written to work specifically with side effects, or lists, or anything else. It is written to work against anything that is a functor. Its most general type signature is actually
fmap :: Functor f => (a -> b) -> f a -> f b
The name functor is a bit nondescriptive, but we can think of it as the name for all types that support being “mapped over”, in some sense.
- Mapping over a side effect object means transforming the result it produces.
- Mapping over a list means transforming each element of the list.
- Mapping over a nullable value means transforming the value of it, if it exists, otherwise doing nothing.
Lists, side effect objects, and nullable values are, therefore, functors. Thus, they all support the fmap operation.
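A quick runnable check of the list and nullable cases (a sketch, not from the article):

```haskell
over_list :: [Int]
over_list = fmap (+ 1) [1, 2, 3]

over_just :: Maybe Int
over_just = fmap (+ 1) (Just 1)

over_nothing :: Maybe Int
over_nothing = fmap (+ 1) Nothing
```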
Appendix C: Foldable and Traversable
The result-preserving variants of the functions we’ve discussed – sequenceA, traverse, etc. – guarantee that the data structure produced by the side effect they create will have the same shape as the original data structure we passed in. This is easy e.g. in the case of a list, because given the same number of items, the list has the same shape regardless of what we put into it. However, for some data structures, this is a more severe requirement. Typical examples are sets and search trees: their structure depends on their contents. When we traverse a search tree and construct a new search tree with the values produced by the side effect, we might end up getting a completely different tree shape than what we started from.
In contrast, if we are not interested in getting back the values produced by side effects, we don’t need to be able to reconstruct the same data structure we started with. All that takes is being able to iterate the elements of the data structure, and this can be supported by more data structures – including sets and search trees. This is why sequenceA_, traverse_, and friends can work with more collection types.
One hack, if we want the values produced by the side effect but do not care about the structure, is to first convert the collection to a list (this only requires being able to iterate the collection) and then call traverse on that list. Then we have retained the results, but lost the original structure of the collection. Such is life if we work with structures that do not support traversal.
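A sketch of this hack (not from the article; it assumes the containers package, which ships with GHC): Data.Set is foldable but not traversable, so we traverse its toList instead, using Maybe as the effect to keep the example pure:

```haskell
import qualified Data.Set as Set
import Data.Foldable (toList)

-- Traverse the elements of a set by going through a list first.
doubled :: Maybe [Int]
doubled = traverse (\x -> Just (x * 2)) (toList (Set.fromList [3, 1, 2]))
```

The set’s structure is lost (we get back a list, in ascending element order), but the results survive.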
The technical names here are foldable (can produce elements one at a time in a specified order) and traversable (can reconstruct its structure even when given new elements.) All traversables are foldable, but not all foldables are traversable.