Entropic Thoughts

Data-directed programming in Haskell (SICP 2.4.3)

Data-directed programming in Haskell (SICP 2.4.3)

sicp-2-4-data-directed-programming-in-haskell.jpg

I have a copy of sicp, or as it is also known, The Wizard Book.1 Structure and Interpretation of Computer Programs; Abelson and Sussman; mit Press; 1996. This book is widely praised, but I can’t take the time to work my way through all of it. Instead, I’m going to occasionally jump into the parts of it that look interesting. Last week, we looked at tagged data in Haskell. The authors of sicp weren’t convinced that’s the best approach, so they move on to data-directed programming. We’ll do the same.

Complex numbers have four operations

Complex numbers can be stored in their rectangular form, with a real and an imaginary part. They can also be stored in polar form, where there’s a magnitude and an angle. Whichever way a complex number is stored, we would like to be able to query it for all of these four quantities:

  • The real coordinate in the rectangular form of the complex number.
  • The imaginary coordinate in the rectangular form.
  • The magnitude in the polar form.
  • The angle in the polar form.

Last time we implemented these on top of a tagged value. To return one of the four coordinates above, a function would introspect the tag to figure out how to satisfy the caller for the representation involved. This works, but any time we want to add a new representation, we have to change all four existing operations.

To get around that problem, Abelson and Sussman suggest a data-directed approach instead. We won’t start at the lowest level of Lisp like in the previous article, but jump in around halfway, when we are using compiler-recognised tuples and strings for tagging our types. That means we are starting from this code:

attach_tag tag contents = (tag, contents)
type_tag (tag, _) = tag
contents (_, value) = value

is_rectangular z = type_tag z == "rectangular"
is_polar z = type_tag z == "polar"

make_rectangular re im = (attach_tag "rectangular" (re, im))
make_polar r a = (attach_tag "polar" (r, a))

In the book, the authors use two magical functions get and put to store and retrieve operations from an implicitly declared table of operations. Here’s what those two functions look like in our Haskell code:

put op tag fn =
  State.modify (Map.insert (op, tag) fn)

get op tag =
  State.gets (Map.lookup (op, tag))

We will use them to install operations for the rectangular representation, just like in the Lisp code in sicp.

install_rectangular =
  let
    real_part (re, _) = re
    imag_part (_, im) = im
    magnitude (re, im) = sqrt (re^2 + im^2)
    angle (re, im) = atan2 im re
  in do
    put "real_part" "rectangular" real_part
    put "imag_part" "rectangular" imag_part
    put "magnitude" "rectangular" magnitude
    put "angle" "rectangular" angle

We do the same for the polar representation.

install_polar =
  let
    real_part (r, a) = r * cos a
    imag_part (r, a) = r * sin a
    magnitude (r, _) = r
    angle (_, a) = a
  in do
    put "real_part" "polar" real_part
    put "imag_part" "polar" imag_part
    put "magnitude" "polar" magnitude
    put "angle" "polar" angle

We re-implement the apply_generic function from sicp to pick the right operation from this table.

apply_generic op arg = do
  value <- get op (type_tag arg)
  pure $ case value of
    Just fn -> fn (contents arg)
    Nothing -> error "No method for these types."

This operation can be used to implement the generic coordinate extraction functions.

real_part z = apply_generic "real_part" z
imag_part z = apply_generic "imag_part" z
magnitude z = apply_generic "magnitude" z
angle z = apply_generic "angle" z

Then, because we are writing this in Haskell, the complex number arithmetic is going to look a little clumsy. Don’t worry, we’ll fix that soon.

add_complex za zb = do
  za_re <- real_part za
  za_im <- imag_part za
  zb_re <- real_part zb
  zb_im <- imag_part zb
  pure $ make_rectangular (za_re + zb_re) (za_im + zb_im)

mul_complex za zb = do
  za_r <- magnitude za
  za_a <- angle za
  zb_r <- magnitude zb
  zb_a <- angle zb
  pure $ make_polar (za_r * zb_r) (za_a + zb_a)

To run a computation with this, we have to run it as a stateful computation, where we first install the operations, and then perform the computations. It might look like this.

main = flip State.evalStateT Map.empty $ do
  install_rectangular
  install_polar
  zc <- add_complex (make_polar 4 0.3) (make_rectangular 3 2)
  s <- show_complex zc
  liftIO (putStrLn s)

Great! We can do what Lisp can do.

Moving to pure tables

Although it might seem like a step backwards, we will move away from the implicit operation table, and instead specify it explicitly. Instead of having installation functions that mutate a table of operations, each installation function will return the part of the table they are responsible for.

rectangular =
  let
    real_part (re, _) = re
    imag_part (_, im) = im
    magnitude (re, im) = sqrt (re^2 + im^2)
    angle (re, im) = atan2 im re
  in
    Map.fromList
      [ (("real_part", "rectangular"), real_part)
      , (("imag_part", "rectangular"), imag_part)
      , (("magnitude", "rectangular"), magnitude)
      , (("angle", "rectangular"), angle)
      ]

polar =
  let
    real_part (r, a) = r * cos a
    imag_part (r, a) = r * sin a
    magnitude (r, _) = r
    angle (_, a) = a
  in
    Map.fromList
      [ (("real_part", "polar"), real_part)
      , (("imag_part", "polar"), imag_part)
      , (("magnitude", "polar"), magnitude)
      , (("angle", "polar"), angle)
      ]

When we change the apply_generic function to take the table as an explicit argument, we have to update the other generic methods too.

apply_generic table op arg = do
  case Map.lookup (op, type_tag arg) table of
    Just fn -> fn (contents arg)
    Nothing -> error "No method for these types."

real_part table z = apply_generic table "real_part" z
imag_part table z = apply_generic table "imag_part" z
magnitude table z = apply_generic table "magnitude" z
angle table z = apply_generic table "angle" z

Finally, we have to do the same for the arithmetic.

add_complex table za zb =
  make_rectangular
    (real_part table za + real_part table zb)
    (imag_part table za + imag_part table zb)

mul_complex table za zb =
  make_polar
    (magnitude table za * magnitude table zb)
    (angle table za + angle table zb)

Now we can compute with these things fully purely, without any implicit state. We union the tables for each type of value before we begin.

main =
  let
    table = rectangular <> polar
    za = make_polar 4 0.3
    zb = make_rectangular 3 2
    zc = add_complex table za zb
  in
    putStrLn (show_complex table zc)

The explicit table references make this a little more verbose, but they serve as a nice stepping stone into the next language feature we’ll use.

The language supports this too

With the explicit table references, we have constructed a type class in disguise. We can define it formally, to let the compiler know what we’re up to.

class Complex z where
  real_part :: z -> Double
  imag_part :: z -> Double
  magnitude :: z -> Double
  angle :: z -> Double

Then we have to define custom types for our two representations of complex numbers. First we define the type for rectangular storage, and implement for it the generic methods of the type class.

data Rectangular = Rectangular Double Double

instance Complex Rectangular where
  real_part (Rectangular re _) = re
  imag_part (Rectangular _ im) = im
  magnitude (Rectangular re im) = sqrt (re^2 + im^2)
  angle (Rectangular re im) = atan2 im re

Then we do the same for polar coordinates.

data Polar = Polar Double Double

instance Complex Polar where
  real_part (Polar r a) = r * cos a
  imag_part (Polar r a) = r * sin a
  magnitude (Polar r _) = r
  angle (Polar _ a) = a

In the previous article, Rectangular and Polar were two constructors of the same type Complex. Here, Rectangular and Polar are two completely different types. The only thing they have in common, as far as the compiler cares, is that they implement the Complex interface.

Now that we’ve hooked into the language, we can remove all other code, and replace the arithmetic with these functions.

add_complex za zb =
  Rectangular
    (real_part za + real_part zb)
    (imag_part za + imag_part zb)

mul_complex za zb =
  Polar
    (magnitude za * magnitude zb)
    (angle za + angle zb)

Neat!

Treating different representations as the same

The drawback with this representation in Haskell is that since Rectangular and Polar are considered different types, we cannot create homogeneous data structures like lists with both representations in them. In other words, this is a type error, even if we only care about the elements of the list as generic complex numbers:

let
  za = Polar 4 0.3
  zb = Rectangular 3 2
  zs = [za, zb]  -- ! Couldn't match expected type …
in
  foldl1 add_complex zs

This was something that puzzled me for a long time as a Haskell beginner, but fortunately, Haskell has a very clean solution for this as well. We can create what’s called an existentially quantified type to contain this notion of “here’s a value we don’t know anything about, other than that it is a complex number.”

data AnyComplex = forall z. Complex z => AnyComplex z

Implementing the generic methods for complex numbers on this type is as easy as dispatching them to the underlying value. We can do that because although we don’t know what type the underlying value is, we know it supports these operations.

instance Complex AnyComplex where
  real_part (AnyComplex z) = real_part z
  imag_part (AnyComplex z) = imag_part z
  magnitude (AnyComplex z) = magnitude z
  angle (AnyComplex z) = angle z

Now we can create that list, and operate on it as the list of complex numbers that it is:

let
  za = Polar 4 0.3
  zb = Rectangular 3 2
  zs = [AnyComplex za, AnyComplex zb]
in
  foldl1 add_complex zs

The trick is that the AnyComplex wrapper has no type parameter, because it has hidden the concrete type inside itself. This means any two values of the AnyComplex type look the same to the compiler, even though they are different underneath.

Making return values generic

One thing Abelson and Sussman never did in sicp, but which is easy now that we have the type class machinery installed, is make the return value of the arithmetic functions generic. Right now, if we call add_complex, we will get a complex number back stored in rectangular form. Maybe we don’t want that.

To avoid that, we can add generic constructor functions to our type class definition. These will let us construct complex numbers stored in polar representation from rectangular coordinates, and vice versa.

class Complex z where
  --------------- >8 -----
  fromRectangular :: Double -> Double -> z
  fromPolar :: Double -> Double -> z

We implement these for both rectangular and polar types.

instance Complex Rectangular where
  --------------- >8 -----
  fromRectangular re im = Rectangular re im
  fromPolar r a =
    Rectangular (r * cos a) (r * sin a)

instance Complex Polar where
  --------------- >8 -----
  fromPolar r a = Polar r a
  fromRectangular re im =
    Polar (sqrt (re^2 + im^2)) (atan2 im re)

We also implement them for the AnyComplex wrapper. For the wrapper, we are free to choose the Rectangular representation for fromRectangular and similar for fromPolar.

instance Complex AnyComplex where
  --------------- >8 -----
  fromRectangular re im = AnyComplex (Rectangular re im)
  fromPolar r a = AnyComplex (Polar r a)

Now we can implement the arithmetic functions with these new generic conversions.

add_complex za zb =
  fromRectangular
    (real_part za + real_part zb)
    (imag_part za + imag_part zb)

mul_complex za zb =
  fromPolar
    (magnitude za * magnitude zb)
    (angle za + angle zb)

This lets the calling code decide which representation they want back. If they want the result to be stored in rectangular form, they can go

zc :: Rectangular = add_complex za zb

but if they’d rather have the polar representation, they would ask for that:

zc :: Polar = add_complex za zb

If they want the result as an AnyComplex value, that’s possible too! It’s one function that supports returning all representations. That’s very cool.

Type classes are secretly dictionaries of functions

What happens behind the scenes with the type classes is actually very similar to what we implemented manually when we had the dictionary of operations. If we think about what the type of add_complex would have been in that code, we’d arrive at something like

add_complex
  :: Map (Op_Name, Type_tag) (Complex -> Double)
  -> Complex
  -> Complex
  -> Complex
add_complex table za zb =
  make_rectangular
    (real_part table za + real_part table zb)
    (imag_part table za + imag_part table zb)

The first argument, the dictionary, carries the implementations of the interface, which lets us work on the arguments regardless of how they are stored. In the type class-based version we can instead think of it as having a type signature like

add_complex :: Complex z => z -> z -> z
add_complex za zb =
  fromRectangular
    (real_part za + real_part zb)
    (imag_part za + imag_part zb)

The first type class constraint, before the double-barred arrow, carries that dictionary of operations implicitly. But it’s effectively the same thing.

Things went well

Thus, the 80 or so lines of Lisp code it took to implement this got boiled down to just over 35 lines to do the same thing with type classes. Make that 45 lines if we include the additional functionality of being generic over the return value, something the Lisp code didn’t support.

In addition to shorter code, we also turn our mistakes into compile-time errors rather than midnight alerts. The compiler knows exactly which methods are in the type class dictionary, and it knows exactly which values implement that type class, so we literally cannot accidentally reach for a method that doesn’t exist, or try to call a method on a value that doesn’t support it.

The one thing the explicit dictionary supports, which the type class does not, is dynamically extending the set of generic operations available. Type classes have their set of operations fixed at compile time2 Good for safety, performance, etc.. The typical solution to extending type classes is to create new type classes with the extensions, and then require those at call sites where the extended method set is desired. This is not as big a problem as it may seem, because we will never need to make the extended set of methods available to existing code without first changing that code.