Entropic Thoughts

Esqueleto Tutorial

Esqueleto Tutorial

When interacting with databases in Haskell, we use a library called Persistent to create mappings between database content and Haskell data types. This library can also query for records and update them, as long as the operations involved are very basic.1 Get a record with an id, get a record matching a unique constraint, see if a record matching a unique constrant exists, upsert by unique constraint, insert a batch of records, select all records from a table matching a filter, and a few more along those lines.

Once operations become more complicated, we turn to Esqueleto, a lower-level library which reuses Persistent data mappings but let us write nearly raw sql queries. The main difference between raw sql and Esqueleto is that Esqueleto is type safe, meaning the compiler will complain if we write invalid Esqueleto queries. If we accidentally try to cram a varchar column into an UTCTime field in a Haskell object, the compiler will let us know. Not the pager going off at 3 AM.

Another strength of Esqueleto is that it is, in a sense, plain Haskell code. This is also its drawback. I have long struggled with learning to write Esqueleto fluently. Some colleagues suggested that maybe the problem is I don’t practice writing it enough. So I picked up an arbitrary sql tutorial I found on the web, and followed it but writing Esqueleto instead.

I’ll write down what I learned, and hopefully it helps someone else in the same situation as I was. If you’re just curious what production Haskell database access can look like, you might also enjoy skimming this article.

Haskell and SQL are prerequisites

Note that although I hope this tutorial will help someone else, it’s most decidedly written for someone in my situation: someone who has a decent grasp of both Haskell and sql, but struggle to merge the two into an understanding of Esqueleto.

In that situation, it’s probably best to replicate what I did: pick any sql tutorial you find, and work through its exercises in Esqueleto. Running into compiler errors and being forced to follow the types is probably one of the best ways to figure out how Esqueleto works.2 Follow the types is an expression that means to learn a library by figuring out how functions can be called based on their type signatures and how the types interlock to produce the desired result. This is a good general skill to have, because not all third-party libraries are well documented in any ecosystem, but almost all Haskell third-party libraries have good types. That said, if you still want this article as a reference or a guide along the way, go ahead.

We’ll use new style Esqueleto

Just before the pandemic, Esqueleto received a new module called Database.Esqueleto.Experimental. This contained a new syntax for specifying from clauses in a way that is both safer and more sensible than the old style. The new style is what I have found most of in the production code I read, and it became the default in the Esqueleto library a few months ago, after a five-year deprecation period of the legacy style.

Thus, this tutorial covers the new style. However, if you know the new style, adapting to the old style should be fairly simple. The following query gets the most expensive departments in the organisation:

select
  d.name,
  sum(p.salary)
from
  payroll p
  inner join department d
  on p.department = d.id
group by
  d.name
order by
  sum(p.salary) desc;

The new style of Esqueleto binds identifiers to table with the from function and a monadic bind operator.

select $ do
  (p :& d) <- from $
    table @Payroll
    `innerJoin` table @Department
    `on` (\(p :& d) -> p^.PayrollDepartment ==. d^.Id)
  groupBy (d^.DepartmentName)
  orderBy [desc (sum (p^.PayrollSalary))]
  pure (d^.DepartmentName, sum (p^.PayrollSalary))

in contrast, the old style of Esqueleto takes the entire rest of the query as a function which receives table identifiers as arguments:

select . from $ \(p `InnerJoin` d) -> do
  on (p^.PayrollDepartment ==. d^.Id)
  groupBy (d^.DepartmentName)
  orderBy [desc (sum (p^.PayrollSalary))]
  pure (d^.DepartmentName, sum (p^.PayrollSalary))

This code looks less clumsy, but it is also less flexible and relies on the programmer remembering to specify the on clause. The new style lets the compiler automatically detect a missing on clause.

Setting up the sandbox

You will get the most out of this tutorial if you follow along actively: look at the sql query, and try to translate it to Esqueleto before looking at the solution in this article. To be able to follow along, you need to download the sample sqlite database chinook.db, used in the tutorial above.3 If the file is no longer available at that domain, it exists in the Internet Archive. Create a Haskell project with dependencies esqueleto, persistent, persistent-sqlite, template-haskell, and text.

We’ll scaffold the sandbox application with something like

{-# LANGUAGE TypeApplications #-}

module Main where

import Control.Monad.IO.Class (MonadIO, liftIO)
import Data.Foldable (for_)
import qualified Data.Text as Text
import Database.Esqueleto.Experimental
import Database.Persist.Sqlite (runSqlite)
import Text.Printf (printf)

-- | Print an action label, run the IO action,
-- then print the returned values, numbered
-- on their own lines.
printResults :: (Show a, MonadIO m) => String -> m [a] -> m ()
printResults label query = do
  liftIO (putStrLn ("# " <> label))
  results <- query
  liftIO . for_ (zip [1..] results) $ \(i, v) -> do
    printf "%8i. " (i :: Int)
    print v

main :: IO ()
main = runSqlite (Text.pack "chinook.db") $ do
  printResults "traditional greeting" $
    pure ["Hello, world!"]

Once we have placed the sample database in the same directory as this program, it should print out a warning about an unused import and then run and print

# traditional greeting
       1. "Hello, world!"

Congrats! This connects to the sqlite database and then does nothing with it. Let’s do something.

Part 1: Querying data with select

It might seem overkill to start at this very basic level, but it is beneficial because it allows us to really dig into the fundamentals, and learn differences which will become important later on.

Section 1.1: Selecting some arithmetic

The sqlite tutorial starts off by selecting 1+1, to which sqlite helpfully responds with 2. We can do the same in Haskell – but it does get complicated right away. This is typical of type safe libraries for interacting with external components: they are built for handling a bunch of edge cases, so the most basic examples become a little more complicated than we’d like.

After the traditional greeting, we’ll add a sql query.

printResults "select 1 + 1;" $
  select $ pure (val @Int 1 +. val @Int 1)

This will print

# select 1 + 1;
       1. Value {unValue = 2}

which is correct, so that is good!

To figure out how this works, let’s start from the inside and work our way out. We need a way to get the number 1 into a query. The val function is provided by Esqueleto to convert a Haskell value into a sql expression. In the specific case of numeric literals, their types are ambiguous (is 1 an Int or a Word or a Ratio or …?) so we disambiguate with the type application @Int.

Thus, the Haskell code val @Int 1 generates the sql expression 1. The Esqueleto operator +. translates to the sql operator +. The complete Haskell expression

val @Int 1 +. val @Int 1

translates to the sql expression 1+1. Esqueleto, however, does not allow us to select sql expressions directly. We have to convert the expression into a query first, and the pure function does this.

Whereas the previous code represented a sql expression, by making it

pure (val @Int 1 +. val @Int 1)

we have turned it into a sql query. When we pass this to the select function, it gets run in the database and the results are returned.

But what is returned, really? This prints out

# select 1 + 1;
       1. Value {unValue = 2}

The Value thing is a newtype wrapper around plain values returned from the database. It turns out the select function can also return other things than plain values, but for now we’ll just accept that this is what we get out of the database.

Section 1.2: Selecting multiple things

Based on what we already know, we can probably guess the result of this query where we select a tuple of values.

printResults "select 10/5, 2*4;" $
  select $ pure
    ( val @Int 10 /. val @Int 5
    , val @Int 2 *. val @Int 4
    )

this prints out the desired4 It does really print out the unValue deconstructor, but I’m going to ignore it in the rest of this tutorial for ease of reading.

# select 10/5, 2*4;
       1. (Value 2, Value 8)

Section 1.3: Selecting from a table

Now, in order to select from a table in a type safe way, Esqueleto needs to know the schema of our tables. By using Persistent macros for generating code from a schema, we will also get Haskell data types and other goodies for free.5 The Persistent macros use a lot of language extensions, but fortunately the compiler is of great help in figuring out which are needed. Most of the list is given up front, and then when one is missing, it clearly indicates which. We put these macro calls in a separate module called Model.hs.

This is a faithful translation of the parts we will use from the sqlite tutorial database. Note that if we ran this in production, with the right settings, Persistent would complain that the database schema does not match the Haskell schema. We are okay with this here, because we are only looking at a subset of the database in this tutorial.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DerivingStrategies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE UndecidableInstances #-}

module Model where

import Database.Persist.TH

import ChinookPersistSettings

share [mkPersist sqlSettings] [chinookPersistSettings|
Track                sql=tracks
  Id                 sql=TrackId
  name               String
  albumId            AlbumId Maybe
  mediaTypeId        Int
  composer           String Maybe
  milliseconds       Int
  unitPrice          Double
  deriving Show

Customer             sql=customers
  Id                 sql=CustomerId
  city               String Maybe

Invoice              sql=invoices
  Id                 sql=InvoiceId
  billingAddress     String Maybe
  billingCity        String Maybe
  total              Double

Album                sql=albums
  Id                 sql=AlbumId
  title              String
  artistId           ArtistId

Artist               sql=artists
  Id                 sql=ArtistId
  name               String
|]

This also references a tiny module called ChinookPersistSettings6 Named Chinook after the name of the database, to be clear. which we use to help Persistent pick the right database column names for the tutorial data.7 It needs to be a separate module thanks to a ghc stage restriction, which prevents top-level splices to reference values defined earlier in the same module. I understand nothing of this. The compiler asked me to put the definition in a separate module and so I did.

module ChinookPersistSettings where

import Control.Arrow (first)
import qualified Data.Text as Text
import Database.Persist.Quasi
import Database.Persist.TH
import Language.Haskell.TH.Quote (QuasiQuoter)

chinookPersistSettings :: QuasiQuoter
chinookPersistSettings =
  persistWith $ flip setPsToDBName lowerCaseSettings $
    uncurry (<>) . first Text.toUpper . Text.splitAt 1

Now we can select plain values from that table in Esqueleto. We use the select function as before, but we construct the sql query from two things:

  1. We bind an identifier for a table using the from function, and
  2. We extract values from that table using the ^. operator.

It looks like

printResults "select t.trackid, t.name, t.composer, t.unitprice from tracks t;" $
  select $ do
    t <- from (table @Track)
    pure
      ( t^.TrackId
      , t^.TrackName
      , t^.TrackComposer
      , t^.TrackUnitPrice
      )

This starts to print

# select t.trackid, t.name, t.composer, t.unitprice from tracks t;
       1. (Value (TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 1}}), Value "For Those About To Rock (We Salute You)", Value (Just "Angus Young, Malcolm Young, Brian Johnson"), Value 0.99)
       2. (Value (TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 2}}), Value "Balls to the Wall", Value Nothing, Value 0.99)
       3. (Value (TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 3}}), Value "Fast As a Shark", Value (Just "F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman"), Value 0.99)
       ...

and then goes on for a long time.

Section 1.4: Selecting an entire record

The sql tutorial I’m basing this on warns the user against writing select * queries in application logic.

What if someone removes a column, your application would not be working properly, because it assumes that there are three columns returned, and the logic to process those three columns would be broken.

This warning does not apply to us, because Persistent and Esqueleto are type safe. If the schema changes, Persistent and the Haskell compiler will be on our side, and help us work out all the places where we have assumed a field exists but it no longer does. This is not a midnight page, it is a failed ci build.

To select a full record, we replace the tuple from the previous example by returning the identifier brought into scope with from.

printResults "select t.* from tracks t;" $
  select $ do
    t <- from (table @Track)
    pure t

When we do this, we will not get a Value newtype wrapper from Esqueleto, but proper Persistent Entity objects.

The above can, following monad laws, be simplified to

printResults "select * from tracks;" $
  select $ from (table @track)

Section 1.5: Reducing the number of records returned

I’m going to cheat a little and jump straight to the limit keyword. The sqlite tutorial does not present this until later, but I wish to be able to show output. To restrict the number of records returned to a fixed amount, we add a call to the Esqueleto limit function somewhere in the expression where the query is constructed.

printResults "select t.* from tracks t limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    pure t

This prints the more sensible-length

# select t.* from tracks t limit 5;
       1. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 1}}, entityVal = Track {trackName = "For Those About To Rock (We Salute You)", trackAlbumId = 1, trackMediaTypeId = 1, trackGenreId = 1, trackComposer = Just "Angus Young, Malcolm Young, Brian Johnson", trackMilliseconds = 343719, trackBytes = 11170334, trackUnitPrice = 0.99}}
       2. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 2}}, entityVal = Track {trackName = "Balls to the Wall", trackAlbumId = 2, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Nothing, trackMilliseconds = 342562, trackBytes = 5510424, trackUnitPrice = 0.99}}
       3. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 3}}, entityVal = Track {trackName = "Fast As a Shark", trackAlbumId = 3, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Just "F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman", trackMilliseconds = 230619, trackBytes = 3990994, trackUnitPrice = 0.99}}
       4. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 4}}, entityVal = Track {trackName = "Restless and Wild", trackAlbumId = 3, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Just "F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman", trackMilliseconds = 252051, trackBytes = 4331779, trackUnitPrice = 0.99}}
       5. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 5}}, entityVal = Track {trackName = "Princess of the Dawn", trackAlbumId = 3, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Just "Deaffy & R.A. Smith-Diesel", trackMilliseconds = 375418, trackBytes = 6290521, trackUnitPrice = 0.99}}

Note that in order to get the effect of select * from tracks we still need to return the object we bring into scope with the from function. We could simplify the query above to

select $ do
  limit 5
  from (table @Track)

because the limit function sets a limit to the query – it doesn’t matter where it is called. We could also argue that do notation is not needed here, and write the query as

select $ from (table @Track) <* limit 5

As long as we only add things to the query that does not depend on fields in the table we are selecting from, and we want to return full records, we can get by with applicative sequencing (<*) and we don’t need any monadic bind.

Section 1.6: Sorting results

Following the tutorial, we will make a query that sorts on album id ascending, and then track length in milliseconds descending.

printResults "select t.name, t.milliseconds, t.albumid from tracks t order by t.albumid asc, t.milliseconds desc limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    orderBy [asc (t^.TrackAlbumId), desc (t^.TrackMilliseconds)]
    pure (t^.TrackName, t^.TrackMilliseconds, t^.TrackAlbumId)

Here, we have to give a name to the value returned by from (i.e. we use a monadic bind), since we need the name to extract the column values used for sorting. This prints

# select t.name, t.milliseconds, t.albumid from tracks t order by t.albumid asc, t.milliseconds desc limit 5;
     1. (Value "For Those About To Rock (We Salute You)", Value 343719, Value 1)
     2. (Value "Spellbound", Value 270863, Value 1)
     3. (Value "Evil Walks", Value 263497, Value 1)
     4. (Value "Breaking The Rules", Value 263288, Value 1)
     5. (Value "Let's Get It Up", Value 233926, Value 1})

as expected.

We note that in sql, we specify first what we select, then where it is selected from. In Esqueleto, we specify first from where we will select, and then we optionally extract individual components from those things. This is similar to e.g. linq queries in C#.

Part 2: Filtering in select queries

At this point, the tutorial dives into what makes sql powerful: the ability to cut out the data of interest from other stuff.

Section 2.1: Getting unique results with a distinct query

The Esqueleto distinct function takes a query object and makes it into a distinct sql query. The Esqueleto function is intended to be called composed with the select function, as in this example.

printResults "select distinct c.city from customers c order by c.city limit 5;" $
  select . distinct $ do
    c <- from (table @Customer)
    limit 5
    orderBy [asc (c^.CustomerCity)]
    pure (c^.CustomerCity)
# select distinct c.city from customers c order by c.city limit 5;
       1. Value Just "Amsterdam"
       2. Value Just "Bangalore"
       3. Value Just "Berlin"
       4. Value Just "Bordeaux"
       5. Value Just "Boston"

However, it is worth knowing that the Esqueleto distinct function really only sets a flag in the query object, so it could also be called like limit, by supplying a noop query as an argument. That would look like

select $ do
  c <- from (table @Customer)
  limit 5
  distinct (pure ())
  orderBy [asc (c^.CustomerCity)]
  pure (c^.CustomerCity)

and it has the same effect.

Section 2.2: Adding a where clause

The Esqueleto where_ function takes a sql conditional object8 Technically this is just a sql expression object that is known to have type Bool. and limits the query to the items matching that conditional expression. The conditional expression is formed by using what looks like regular Haskell operators (==, &&, >, etc.) but appending a period to their end. These get translated to the corresponding sql operators.

printResults "select t.name from tracks t where t.albumid = 2 and t.milliseconds > 250000 limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    where_ $
      t^.TrackAlbumId ==. val (Just (AlbumKey 1))
      &&. t^.TrackMilliseconds >. val 250000
    pure (t^.TrackName)
# select t.name from tracks t where t.albumid = 2 and t.milliseconds > 250000 limit 5;
       1. Value "For Those About To Rock (We Salute You)"
       2. Value "Evil Walks"
       3. Value "Breaking The Rules"
       4. Value "Spellbound"

As was perhaps clear from the above, the trackAlbumId field in our code is not a plain Int, but a type safe AlbumId. To select based on this, we have to convert the integer 1 into the right type, by passing it through the AlbumKey constructor.

Section 2.3: Primitive string matching with the like operator

The Esqueleto like function takes two sql expression objects that return strings, and creates a conditional expression from them, where they are compared with the sql like operator. This is a place where the type safety of Esqueleto shines. In raw sql, we can accidentally call like on an integer. In Haskell, that would be a compiler error.

printResults "select t.name, t.albumid, t.composer from tracks t where t.composer like '%Smith%' order by t.albumid limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    orderBy [asc (t^.TrackAlbumId)]
    where_ $
      like (t^.TrackComposer) (val (Just "%Smith%"))
    pure (t^.TrackName, t^.TrackAlbumId, t^.TrackComposer)

Here, we called like as a regular function. We could also have called it with infix backticks to make it look like an operator:

t^.TrackComposer `like` val (Just "%Smith%")

However, I don’t like infix backticks and avoid them in my code.9 I think for the most part they make code more difficult to follow. People struggle with precedence rules, and it’s annoying enough to have to deal with them with operators, but with functions also?! Either way, we get the results we expect:

# select t.name, t.albumid, t.composer from tracks t where t.composer like '%Smith%' order by t.albumid limit 5;
       1. (Value "Restless and Wild", Value (Just 3), Value (Just "F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman"))
       2. (Value "Princess of the Dawn", Value (Just 3), Value (Just "Deaffy & R.A. Smith-Diesel"))
       3. (Value "Killing Floor", Value (Just 19), Value (Just "Adrian Smith"))
       4. (Value "Machine Men", Value (Just 19), Value (Just "Adrian Smith"))
       5. (Value "2 Minutes To Midnight", Value (Just 95), Value (Just "Adrian Smith/Bruce Dickinson"))

Section 2.4: Combining conditional expressions

We combine conditional expressions and ensure precedence between operators using regular Haskell parentheses.

printResults "select i.total from invoices i where i.total > 5 and (i.billingcity = 'Chicago' or i.billingaddress like '% Broadway');" $
  select $ do
    i <- from (table @Invoice)
    limit 5
    where_ $
      i^.InvoiceTotal >. val 5
      &&. (
        i^.InvoiceBillingCity ==. val (Just "Chicago")
        ||. like (i^.InvoiceBillingAddress) (val (Just "% Broadway"))
      )
    pure (i^.InvoiceTotal)
# select i.total from invoices i where i.total > 5 and (i.billingcity = 'Chicago' or i.billingaddress like '% Broadway');
       1. Value 15.86
       2. Value 5.94
       3. Value 8.91
       4. Value 7.96
       5. Value 5.94

Section 2.5: Querying based on set inclusion

The Esqueleto in_ function works as one might expect by now. The second argument must be a sql expression object that returns a value list. The easiest way to construct such a list from Haskell literals is through valList.

printResults "select t.name, t.albumid, t.mediatypeid from tracks t where t.mediatypeid in (2, 3) limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    where_ $
      in_ (t^.TrackMediaTypeId) (valList [2, 3])
    pure (t^.TrackName, t^.TrackAlbumId, t^.TrackMediaTypeId)
# select name, albumid, mediatypeid from tracks where mediatypeid in (2, 3) limit 5
       1. (Value "Balls to the Wall", Value (Just 2), Value 2)
       2. (Value "Fast As a Shark", Value (Just 3), Value 2})
       3. (Value "Restless and Wild", Value (Just 3), Value 2})
       4. (Value "Princess of the Dawn", Value (Just 3), Value 2)
       5. (Value "Welcome to the Jungle", Value (Just 90), Value 2)

Again, the in_ function could have been called with infix backticks:

t^.TrackMediaTypeId `in_` valList [2, 3]

It saves some parentheses at the cost of making people confused about the order in which code runs.

Section 2.6: Checking for membership in subquery

We just used the Esqueleto in_ function to check for membership in a literal list using the valList function. We can also use the results of a subquery as the list to check for membership in. This query finds all tracks that come from albums by the artist with id 12.

printResults "select t.name from tracks t where t.albumid in (select a.albumid from albums a where a.artistid = 12) limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    where_ $
      in_ (t^.TrackAlbumId) . subList_select $ do
        a <- from (table @Album)
        where_ $
          a^.AlbumArtistId ==. val (ArtistKey 12)
        pure (just (a^.AlbumId))
    pure (t^.TrackName)
# select t.name from tracks t where t.albumid in (select a.albumid from albums a where a.artistid = 12) limit 5;
       1. Value "Black Sabbath"
       2. Value "The Wizard"
       3. Value "Behind The Wall Of Sleep"
       4. Value "N.I.B."
       5. Value "Evil Woman"

There’s a lot going on in that query, but the new bit is the subquery:

subList_select $ do
  a <- from (table @Album)
  where_ $
    a^.AlbumArtistId ==. val 12
  pure (just (a^.AlbumId))

The Esqueleto select function executes a query, but the subList_select function merely translates code into a sql expression that can be inserted into another query where a value list is expected. The only new thing about this subquery is the just function, which we use to convert the AlbumId into a Maybe AlbumId, in order to make it possible to compare against TrackAlbumId which is nullable in the database.10 We might be tempted to fmap the Just constructor over a^.AlbumId, but the SqlExpr type is not a real functor, so this cannot be done. Thus, the Esqueleto authors have given us a just function to perform that specific operation in a sql-aware way.

We compose this subquery with the in_ function as the outer filter. Since the subList_select function returns a sql expression object, we could also have assigned it to a variable separately, like-a-so:

printResults "select t.name from tracks t where t.albumid in (select a.albumid from albums a where a.artistid = 12) limit 5;" $
  let
    albumIdSubquery = subList_select $ do
      a <- from (table @Album)
      where_ $
        a^.AlbumArtistId ==. val 12
      pure (just (a^.AlbumId))
  in
    select $ do
      t <- from (table @Track)
      limit 5
      where_ $ in_ (t^.TrackAlbumId) albumIdSubquery
      pure (t^.TrackName)

This is why Esqueleto queries being plain Haskell code is a strength. It gives us more opportunities to organise our code and break it apart into manageable chunks.

Section 2.7: Checking for nullity

The Esqueleto isNothing and not_ . isNothing functions can be used to check for nullness and non-nullness respectively. For example

printResults "select t.name from tracks t where t.composer is null order by t.name limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    orderBy [asc (t^.TrackName)]
    where_ $
      isNothing (t^.TrackComposer)
    pure (t^.TrackName)

or

printResults "select t.name from tracks t where t.composer is not null order by t.name limit 5;" $
  select $ do
    t <- from (table @Track)
    limit 5
    orderBy [asc (t^.TrackName)]
    where_ $
      not_ (isNothing (t^.TrackComposer))
    pure (t^.TrackName)

Part 3: Joins

This is the part I was looking forward to learning. Being able to filter data in one table is very powerful, but being able to cross-reference tables is what really sets sql apart as a powerful data storage tool.

Section 3.1: A regular inner join between two tables

Let’s say we want to know the track title and album name for the shortest tracks that belong to an album in our database.

printResults "select t.name, a.title from tracks t inner join albums a on a.albumid = t.albumid order by t.milliseconds limit 5;" $
  select $ do
    (t :& a) <- from $
      innerJoin (table @Track) . on (table @Album) $ \(t :& a) ->
        t^.TrackAlbumId ==. just (a^.AlbumId)
    limit 5
    orderBy [asc (t^.TrackMilliseconds)]
    pure (t^.TrackName, a^.AlbumTitle)
# select t.name, a.title from tracks t inner join albums a on a.albumid = t.albumid order by t.milliseconds limit 5;
       1. (Value "\201 Uma Partida De Futebol", Value "O Samba Pocon\233")
       2. (Value "Now Sports", Value "Body Count")
       3. (Value "A Statistic", Value "Body Count")
       4. (Value "Oprah", Value "Body Count")
       5. (Value "Commercial 1", Value "House of Pain")

At this point, we finally see why we have been giving our tables short identifiers (t, c, i, a etc.) in both sql and Haskell code. When we are juggling multiple tables at once in a join, we can access fields from either table by choosing the right prefix, and it’s clear where the data is coming from.

The new part in this query is the join. I’ve written it this way:

(t :& a) <- from $
  innerJoin (table @Track) . on (table @Album) $ \(t :& a) ->
    t^.TrackAlbumId ==. just (a^.AlbumId)

The way you’ll find it written by other people is using infix backticks.

(t :& a) <- from $
  table @Track
  `innerJoin` table @Album
  `on` (\(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId))

However such a from clause is written, its innermost expression is going to be a function that produces a sql conditional expression indicating how to join rows from both tables with each other. In our case, the function is

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
in
  -- ...

The :& data constructor serves practically the same purpuse as a tuple – it binds together two table objects that are joined in a query, and by deconstructing that pair we can access each table object individually, as we do here when asserting equivalence between album ids.11 Note also the just function, required because TrackAlbumId is nullable.

This function is paired up with the second table we are joining using the on function, as in

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
  hasOnClause =
    on (table @Album) joinRowsOn
in
  -- ...

Then the innerJoin function takes the first table and the result of the on function and produces an object that can be passed to the Esqueleto from function.

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
  hasOnClause =
    on (table @Album) joinRowsOn
  completeJoinExpression =
    innerJoin (table @Track) hasOnClause
in
  select $ do
    joinedTables <- from completeJoinExpression
    -- ...

This will get translated to an inner join sql query.

To get the identifiers of the individual tables out of the joinedTables variable, we can add a pattern match on it, since it is one of those :& pairs that we can deconstruct.

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
  hasOnClause =
    on (table @Album) joinRowsOn
  completeJoinExpression =
    innerJoin (table @Track) hasOnClause
in
  select $ do
    joinedTables <- from completeJoinExpression
    case joinedTables of
      t :& a -> do
        -- ...

Going backwards again, we can now inline the pattern match:

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
  hasOnClause =
    on (table @Album) joinRowsOn
  completeJoinExpression =
    innerJoin (table @Track) hasOnClause
in
  select $ do
    (t :& a) <- from completeJoinExpression

and then inline the completeJoinExpression.

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
  hasOnClause =
    on (table @Album) joinRowsOn
in
  select $ do
    (t :& a) <- from $
      innerJoin (table @Track) hasOnClause

From here, we continue to inline the hasOnClause variable.

let
  joinRowsOn (t :& a) =
    t^.TrackAlbumId ==. just (a^.AlbumId)
in
  select $ do
    (t :& a) <- from $
      innerJoin (table @Track)
      . on (table @Album)
      $ joinRowsOn

We inline the last variable too.

select $ do
  (t :& a) <- from $
    innerJoin (table @Track)
    . on (table @Album)
    $ \(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId)

Now we’re back at the query we started with. Hopefully that gives a sense of how these joins are constructed.

Section 3.2: Adding another table to the join

We now want to do something simliar to the above, except replace the album name with the artist name in the listing. This requires yet another join, to go from the artist id of the album to the artist name. We could do this without infix backticks, and it might look something like

printResults "select t.name, r.name from tracks t inner join albums l on l.albumid = t.albumid inner join artists r on l.artistid = r.artistid order by t.milliseconds limit 5 offset 30;" $
  select $ do
    (t :& _ :& r) <- from $
      let
        trackAlbum = on (table @Album) $ \(t :& l) ->
          t^.TrackAlbumId ==. just (l^.AlbumId)
        albumArtist = on (table @Artist) $ \(_ :& l :& r) ->
          l^.AlbumArtistId ==. r^.ArtistId
      in
        innerJoin (innerJoin (table @Track) trackAlbum) albumArtist
    limit 5
    offset 30
    orderBy [asc (t^.TrackMilliseconds)]
    pure (t^.TrackName, r^.ArtistName)

but at this point I might have to give up my crusade and bow before the backticks. The triple join reads more naturally that way:

(t :& _ :& r) <- from $
  table @Track
  `innerJoin` table @Album
  `on` (\(t :& l) -> t^.TrackAlbumId ==. just (l^.AlbumId))
  `innerJoin` table @Artist
  `on` (\(_ :& l :& r) -> l^.AlbumArtistId ==. r^.ArtistId)

The other joins work much like this one so we’ll ignore them and march right ahead onto the next part.

Part 4: Data aggregation

The most basic form of aggregation is counting within groups, so that’s what the tutorial goes to and where we’ll pick up.

Section 4.1: Counting within groups

We can guess how to group by something based on what we’ve seen so far. As for the count function, it is generic over all numbers so we specify we want an integer with a type application.

printResults "select t.albumid, count(t.trackid) from tracks t group by t.albumid limit 5;" $
  select $ do
    t <- from $ table @Track
    limit 5
    groupBy (t^.TrackAlbumId)
    pure (t^.TrackAlbumId, count @Int (t^.TrackId))

We could also order by count to get the most trackful albums.

printResults "select t.albumid, count(t.trackid) from tracks t group by t.albumid order by count(t.trackid) desc limit 5;" $
  select $ do
    t <- from $ table @Track
    limit 5
    groupBy (t^.TrackAlbumId)
    orderBy [desc (count @Int (t^.TrackId))]
    pure (t^.TrackAlbumId, count @Int (t^.TrackId))

We can list the names of the first five albums with more than 15 tracks on them, combining grouping, joining, and a having filter.

printResults "select a.title from tracks t inner join albums a on t.albumid = a.albumid group by t.albumid having count(t.trackid) > 15 limit 5;" $
  select $ do
    (t :& a) <- from $
      table @Track
      `innerJoin` table @Album
      `on` (\(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId))
    limit 5
    groupBy (t^.TrackAlbumId)
    having (count (t^.TrackId) >. val @Int 15)
    pure (a^.AlbumTitle)

More aggregation, set operations, CTEs, etc.

There are many more things we can do12 I’m a big fan of recursive ctes after having used them to speed up application logic by many orders of magnitude at one of my first jobs., both in sql and Esqueleto, but I’m leaving this off here. After having gone through the above examples, we ought to feel much more confident in being able to both read and write an Esqueleto query when we need it.

Maybe we’ll come back to this in the future and show how more advanced features work. Maybe not.