Esqueleto Tutorial
Table of Contents
When interacting with databases in Haskell, we use a library called Persistent to create mappings between database content and Haskell data types. This library can also query for records and update them, as long as the operations involved are very basic.1 Get a record with an id, get a record matching a unique constraint, see if a record matching a unique constrant exists, upsert by unique constraint, insert a batch of records, select all records from a table matching a filter, and a few more along those lines.
Once operations become more complicated, we turn to Esqueleto, a lower-level
library which reuses Persistent data mappings but let us write nearly raw
sql queries. The main difference between raw sql and Esqueleto is that
Esqueleto is type safe, meaning the compiler will complain if we write invalid
Esqueleto queries. If we accidentally try to cram a varchar
column into an
UTCTime
field in a Haskell object, the compiler will let us know. Not the
pager going off at 3 AM.
Another strength of Esqueleto is that it is, in a sense, plain Haskell code. This is also its drawback. I have long struggled with learning to write Esqueleto fluently. Some colleagues suggested that maybe the problem is I don’t practice writing it enough. So I picked up an arbitrary sql tutorial I found on the web, and followed it but writing Esqueleto instead.
I’ll write down what I learned, and hopefully it helps someone else in the same situation as I was. If you’re just curious what production Haskell database access can look like, you might also enjoy skimming this article.
Haskell and SQL are prerequisites
Note that although I hope this tutorial will help someone else, it’s most decidedly written for someone in my situation: someone who has a decent grasp of both Haskell and sql, but struggle to merge the two into an understanding of Esqueleto.
In that situation, it’s probably best to replicate what I did: pick any sql tutorial you find, and work through its exercises in Esqueleto. Running into compiler errors and being forced to follow the types is probably one of the best ways to figure out how Esqueleto works.2 Follow the types is an expression that means to learn a library by figuring out how functions can be called based on their type signatures and how the types interlock to produce the desired result. This is a good general skill to have, because not all third-party libraries are well documented in any ecosystem, but almost all Haskell third-party libraries have good types. That said, if you still want this article as a reference or a guide along the way, go ahead.
We’ll use new style Esqueleto
Just before the pandemic, Esqueleto received a new module called
Database.Esqueleto.Experimental
. This contained a new syntax for specifying
from
clauses in a way that is both safer and more sensible than the old style.
The new style is what I have found most of in the production code I read, and it
became the default in the Esqueleto library a few months ago, after a five-year
deprecation period of the legacy style.
Thus, this tutorial covers the new style. However, if you know the new style, adapting to the old style should be fairly simple. The following query gets the most expensive departments in the organisation:
select d.name, sum(p.salary) from payroll p inner join department d on p.department = d.id group by d.name order by sum(p.salary) desc;
The new style of Esqueleto binds identifiers to table with the from
function
and a monadic bind operator.
select $ do (p :& d) <- from $ table @Payroll `innerJoin` table @Department `on` (\(p :& d) -> p^.PayrollDepartment ==. d^.Id) groupBy (d^.DepartmentName) orderBy [desc (sum (p^.PayrollSalary))] pure (d^.DepartmentName, sum (p^.PayrollSalary))
in contrast, the old style of Esqueleto takes the entire rest of the query as a function which receives table identifiers as arguments:
select . from $ \(p `InnerJoin` d) -> do on (p^.PayrollDepartment ==. d^.Id) groupBy (d^.DepartmentName) orderBy [desc (sum (p^.PayrollSalary))] pure (d^.DepartmentName, sum (p^.PayrollSalary))
This code looks less clumsy, but it is also less flexible and relies on the
programmer remembering to specify the on
clause. The new style lets the
compiler automatically detect a missing on
clause.
Setting up the sandbox
You will get the most out of this tutorial if you follow along actively: look at
the sql query, and try to translate it to Esqueleto before looking at the
solution in this article. To be able to follow along, you need to download the
sample sqlite database chinook.db
, used in the tutorial above.3 If the
file is no longer available at that domain, it exists in the Internet Archive.
Create a Haskell project with dependencies esqueleto, persistent,
persistent-sqlite, template-haskell, and text.
We’ll scaffold the sandbox application with something like
{-# LANGUAGE TypeApplications #-} module Main where import Control.Monad.IO.Class (MonadIO, liftIO) import Data.Foldable (for_) import qualified Data.Text as Text import Database.Esqueleto.Experimental import Database.Persist.Sqlite (runSqlite) import Text.Printf (printf) -- | Print an action label, run the IO action, -- then print the returned values, numbered -- on their own lines. printResults :: (Show a, MonadIO m) => String -> m [a] -> m () printResults label query = do liftIO (putStrLn ("# " <> label)) results <- query liftIO . for_ (zip [1..] results) $ \(i, v) -> do printf "%8i. " (i :: Int) print v main :: IO () main = runSqlite (Text.pack "chinook.db") $ do printResults "traditional greeting" $ pure ["Hello, world!"]
Once we have placed the sample database in the same directory as this program, it should print out a warning about an unused import and then run and print
# traditional greeting 1. "Hello, world!"
Congrats! This connects to the sqlite database and then does nothing with it. Let’s do something.
Part 1: Querying data with select
It might seem overkill to start at this very basic level, but it is beneficial because it allows us to really dig into the fundamentals, and learn differences which will become important later on.
Section 1.1: Selecting some arithmetic
The sqlite tutorial starts off by selecting 1+1, to which sqlite helpfully
responds with 2
. We can do the same in Haskell – but it does get complicated
right away. This is typical of type safe libraries for interacting with external
components: they are built for handling a bunch of edge cases, so the most basic
examples become a little more complicated than we’d like.
After the traditional greeting, we’ll add a sql query.
printResults "select 1 + 1;" $ select $ pure (val @Int 1 +. val @Int 1)
This will print
# select 1 + 1; 1. Value {unValue = 2}
which is correct, so that is good!
To figure out how this works, let’s start from the inside and work our way out.
We need a way to get the number 1
into a query. The val
function is provided
by Esqueleto to convert a Haskell value into a sql expression. In the specific
case of numeric literals, their types are ambiguous (is 1
an Int
or a Word
or a Ratio
or …?) so we disambiguate with the type application @Int
.
Thus, the Haskell code val @Int 1
generates the sql expression 1
. The
Esqueleto operator +.
translates to the sql operator +
. The complete
Haskell expression
val @Int 1 +. val @Int 1
translates to the sql expression 1+1
. Esqueleto, however, does not allow us
to select
sql expressions directly. We have to convert the expression into
a query first, and the pure
function does this.
Whereas the previous code represented a sql expression, by making it
pure (val @Int 1 +. val @Int 1)
we have turned it into a sql query. When we pass this to the select
function, it gets run in the database and the results are returned.
But what is returned, really? This prints out
# select 1 + 1; 1. Value {unValue = 2}
The Value
thing is a newtype wrapper around plain values returned from the
database. It turns out the select
function can also return other things than
plain values, but for now we’ll just accept that this is what we get out of the
database.
Section 1.2: Selecting multiple things
Based on what we already know, we can probably guess the result of this query where we select a tuple of values.
printResults "select 10/5, 2*4;" $ select $ pure ( val @Int 10 /. val @Int 5 , val @Int 2 *. val @Int 4 )
this prints out the desired4 It does really print out the unValue
deconstructor, but I’m going to ignore it in the rest of this tutorial for ease
of reading.
# select 10/5, 2*4; 1. (Value 2, Value 8)
Section 1.3: Selecting from a table
Now, in order to select from a table in a type safe way, Esqueleto needs to know
the schema of our tables. By using Persistent macros for generating code from a
schema, we will also get Haskell data types and other goodies for free.5 The
Persistent macros use a lot of language extensions, but fortunately the compiler
is of great help in figuring out which are needed. Most of the list is given up
front, and then when one is missing, it clearly indicates which. We put these
macro calls in a separate module called Model.hs
.
This is a faithful translation of the parts we will use from the sqlite tutorial database. Note that if we ran this in production, with the right settings, Persistent would complain that the database schema does not match the Haskell schema. We are okay with this here, because we are only looking at a subset of the database in this tutorial.
{-# LANGUAGE DataKinds #-} {-# LANGUAGE DerivingStrategies #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE OverloadedStrings #-} {-# LANGUAGE QuasiQuotes #-} {-# LANGUAGE StandaloneDeriving #-} {-# LANGUAGE TemplateHaskell #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE TypeOperators #-} {-# LANGUAGE UndecidableInstances #-} module Model where import Database.Persist.TH import ChinookPersistSettings share [mkPersist sqlSettings] [chinookPersistSettings| Track sql=tracks Id sql=TrackId name String albumId AlbumId Maybe mediaTypeId Int composer String Maybe milliseconds Int unitPrice Double deriving Show Customer sql=customers Id sql=CustomerId city String Maybe Invoice sql=invoices Id sql=InvoiceId billingAddress String Maybe billingCity String Maybe total Double Album sql=albums Id sql=AlbumId title String artistId ArtistId Artist sql=artists Id sql=ArtistId name String |]
This also references a tiny module called ChinookPersistSettings
6 Named
Chinook after the name of the database, to be clear. which we use to help
Persistent pick the right database column names for the tutorial data.7 It
needs to be a separate module thanks to a ghc stage restriction, which
prevents top-level splices to reference values defined earlier in the same
module. I understand nothing of this. The compiler asked me to put the
definition in a separate module and so I did.
module ChinookPersistSettings where import Control.Arrow (first) import qualified Data.Text as Text import Database.Persist.Quasi import Database.Persist.TH import Language.Haskell.TH.Quote (QuasiQuoter) chinookPersistSettings :: QuasiQuoter chinookPersistSettings = persistWith $ flip setPsToDBName lowerCaseSettings $ uncurry (<>) . first Text.toUpper . Text.splitAt 1
Now we can select plain values from that table in Esqueleto. We use the select
function as before, but we construct the sql query from two things:
- We bind an identifier for a table using the
from
function, and - We extract values from that table using the
^.
operator.
It looks like
printResults "select t.trackid, t.name, t.composer, t.unitprice from tracks t;" $ select $ do t <- from (table @Track) pure ( t^.TrackId , t^.TrackName , t^.TrackComposer , t^.TrackUnitPrice )
This starts to print
# select t.trackid, t.name, t.composer, t.unitprice from tracks t; 1. (Value (TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 1}}), Value "For Those About To Rock (We Salute You)", Value (Just "Angus Young, Malcolm Young, Brian Johnson"), Value 0.99) 2. (Value (TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 2}}), Value "Balls to the Wall", Value Nothing, Value 0.99) 3. (Value (TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 3}}), Value "Fast As a Shark", Value (Just "F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman"), Value 0.99) ...
and then goes on for a long time.
Section 1.4: Selecting an entire record
The sql tutorial I’m basing this on warns the user against writing select *
queries in application logic.
What if someone removes a column, your application would not be working properly, because it assumes that there are three columns returned, and the logic to process those three columns would be broken.
This warning does not apply to us, because Persistent and Esqueleto are type safe. If the schema changes, Persistent and the Haskell compiler will be on our side, and help us work out all the places where we have assumed a field exists but it no longer does. This is not a midnight page, it is a failed ci build.
To select a full record, we replace the tuple from the previous example by
returning the identifier brought into scope with from
.
printResults "select t.* from tracks t;" $ select $ do t <- from (table @Track) pure t
When we do this, we will not get a Value
newtype wrapper from Esqueleto, but
proper Persistent Entity
objects.
The above can, following monad laws, be simplified to
printResults "select * from tracks;" $ select $ from (table @track)
Section 1.5: Reducing the number of records returned
I’m going to cheat a little and jump straight to the limit
keyword. The
sqlite tutorial does not present this until later, but I wish to be able to show
output. To restrict the number of records returned to a fixed amount, we add a
call to the Esqueleto limit
function somewhere in the expression where the
query is constructed.
printResults "select t.* from tracks t limit 5;" $ select $ do t <- from (table @Track) limit 5 pure t
This prints the more sensible-length
# select t.* from tracks t limit 5; 1. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 1}}, entityVal = Track {trackName = "For Those About To Rock (We Salute You)", trackAlbumId = 1, trackMediaTypeId = 1, trackGenreId = 1, trackComposer = Just "Angus Young, Malcolm Young, Brian Johnson", trackMilliseconds = 343719, trackBytes = 11170334, trackUnitPrice = 0.99}} 2. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 2}}, entityVal = Track {trackName = "Balls to the Wall", trackAlbumId = 2, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Nothing, trackMilliseconds = 342562, trackBytes = 5510424, trackUnitPrice = 0.99}} 3. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 3}}, entityVal = Track {trackName = "Fast As a Shark", trackAlbumId = 3, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Just "F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman", trackMilliseconds = 230619, trackBytes = 3990994, trackUnitPrice = 0.99}} 4. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 4}}, entityVal = Track {trackName = "Restless and Wild", trackAlbumId = 3, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Just "F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman", trackMilliseconds = 252051, trackBytes = 4331779, trackUnitPrice = 0.99}} 5. Entity {entityKey = TrackKey {unTrackKey = SqlBackendKey {unSqlBackendKey = 5}}, entityVal = Track {trackName = "Princess of the Dawn", trackAlbumId = 3, trackMediaTypeId = 2, trackGenreId = 1, trackComposer = Just "Deaffy & R.A. Smith-Diesel", trackMilliseconds = 375418, trackBytes = 6290521, trackUnitPrice = 0.99}}
Note that in order to get the effect of select * from tracks
we still need to
return the object we bring into scope with the from
function. We could
simplify the query above to
select $ do limit 5 from (table @Track)
because the limit
function sets a limit to the query – it doesn’t matter where
it is called. We could also argue that do
notation is not needed here, and
write the query as
select $ from (table @Track) <* limit 5
As long as we only add things to the query that does not depend on fields in the
table we are selecting from, and we want to return full records, we can get by
with applicative sequencing (<*
) and we don’t need any monadic bind.
Section 1.6: Sorting results
Following the tutorial, we will make a query that sorts on album id ascending, and then track length in milliseconds descending.
printResults "select t.name, t.milliseconds, t.albumid from tracks t order by t.albumid asc, t.milliseconds desc limit 5;" $ select $ do t <- from (table @Track) limit 5 orderBy [asc (t^.TrackAlbumId), desc (t^.TrackMilliseconds)] pure (t^.TrackName, t^.TrackMilliseconds, t^.TrackAlbumId)
Here, we have to give a name to the value returned by from
(i.e. we use a
monadic bind), since we need the name to extract the column values used for
sorting. This prints
# select t.name, t.milliseconds, t.albumid from tracks t order by t.albumid asc, t.milliseconds desc limit 5; 1. (Value "For Those About To Rock (We Salute You)", Value 343719, Value 1) 2. (Value "Spellbound", Value 270863, Value 1) 3. (Value "Evil Walks", Value 263497, Value 1) 4. (Value "Breaking The Rules", Value 263288, Value 1) 5. (Value "Let's Get It Up", Value 233926, Value 1})
as expected.
We note that in sql, we specify first what we select, then where it is selected from. In Esqueleto, we specify first from where we will select, and then we optionally extract individual components from those things. This is similar to e.g. linq queries in C#.
Part 2: Filtering in select queries
At this point, the tutorial dives into what makes sql powerful: the ability to cut out the data of interest from other stuff.
Section 2.1: Getting unique results with a distinct query
The Esqueleto distinct
function takes a query object and makes it into a
distinct
sql query. The Esqueleto function is intended to be called composed
with the select
function, as in this example.
printResults "select distinct c.city from customers c order by c.city limit 5;" $ select . distinct $ do c <- from (table @Customer) limit 5 orderBy [asc (c^.CustomerCity)] pure (c^.CustomerCity)
# select distinct c.city from customers c order by c.city limit 5; 1. Value Just "Amsterdam" 2. Value Just "Bangalore" 3. Value Just "Berlin" 4. Value Just "Bordeaux" 5. Value Just "Boston"
However, it is worth knowing that the Esqueleto distinct
function really only
sets a flag in the query object, so it could also be called like limit
, by
supplying a noop query as an argument. That would look like
select $ do c <- from (table @Customer) limit 5 distinct (pure ()) orderBy [asc (c^.CustomerCity)] pure (c^.CustomerCity)
and it has the same effect.
Section 2.2: Adding a where clause
The Esqueleto where_
function takes a sql conditional
object8 Technically this is just a sql expression object that is known
to have type Bool
. and limits the query to the items matching that
conditional expression. The conditional expression is formed by using what looks
like regular Haskell operators (==
, &&
, >
, etc.) but appending a period to
their end. These get translated to the corresponding sql operators.
printResults "select t.name from tracks t where t.albumid = 2 and t.milliseconds > 250000 limit 5;" $ select $ do t <- from (table @Track) limit 5 where_ $ t^.TrackAlbumId ==. val (Just (AlbumKey 1)) &&. t^.TrackMilliseconds >. val 250000 pure (t^.TrackName)
# select t.name from tracks t where t.albumid = 2 and t.milliseconds > 250000 limit 5; 1. Value "For Those About To Rock (We Salute You)" 2. Value "Evil Walks" 3. Value "Breaking The Rules" 4. Value "Spellbound"
As was perhaps clear from the above, the trackAlbumId
field in our code is not
a plain Int
, but a type safe AlbumId
. To select based on this, we have to
convert the integer 1 into the right type, by passing it through the AlbumKey
constructor.
Section 2.3: Primitive string matching with the like operator
The Esqueleto like
function takes two sql expression objects that return
strings, and creates a conditional expression from them, where they are compared
with the sql like
operator. This is a place where the type safety of
Esqueleto shines. In raw sql, we can accidentally call like
on an integer.
In Haskell, that would be a compiler error.
printResults "select t.name, t.albumid, t.composer from tracks t where t.composer like '%Smith%' order by t.albumid limit 5;" $ select $ do t <- from (table @Track) limit 5 orderBy [asc (t^.TrackAlbumId)] where_ $ like (t^.TrackComposer) (val (Just "%Smith%")) pure (t^.TrackName, t^.TrackAlbumId, t^.TrackComposer)
Here, we called like
as a regular function. We could also have called it with
infix backticks to make it look like an operator:
t^.TrackComposer `like` val (Just "%Smith%")
However, I don’t like infix backticks and avoid them in my code.9 I think for the most part they make code more difficult to follow. People struggle with precedence rules, and it’s annoying enough to have to deal with them with operators, but with functions also?! Either way, we get the results we expect:
# select t.name, t.albumid, t.composer from tracks t where t.composer like '%Smith%' order by t.albumid limit 5; 1. (Value "Restless and Wild", Value (Just 3), Value (Just "F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman")) 2. (Value "Princess of the Dawn", Value (Just 3), Value (Just "Deaffy & R.A. Smith-Diesel")) 3. (Value "Killing Floor", Value (Just 19), Value (Just "Adrian Smith")) 4. (Value "Machine Men", Value (Just 19), Value (Just "Adrian Smith")) 5. (Value "2 Minutes To Midnight", Value (Just 95), Value (Just "Adrian Smith/Bruce Dickinson"))
Section 2.4: Combining conditional expressions
We combine conditional expressions and ensure precedence between operators using regular Haskell parentheses.
printResults "select i.total from invoices i where i.total > 5 and (i.billingcity = 'Chicago' or i.billingaddress like '% Broadway');" $ select $ do i <- from (table @Invoice) limit 5 where_ $ i^.InvoiceTotal >. val 5 &&. ( i^.InvoiceBillingCity ==. val (Just "Chicago") ||. like (i^.InvoiceBillingAddress) (val (Just "% Broadway")) ) pure (i^.InvoiceTotal)
# select i.total from invoices i where i.total > 5 and (i.billingcity = 'Chicago' or i.billingaddress like '% Broadway'); 1. Value 15.86 2. Value 5.94 3. Value 8.91 4. Value 7.96 5. Value 5.94
Section 2.5: Querying based on set inclusion
The Esqueleto in_
function works as one might expect by now. The second
argument must be a sql expression object that returns a value list. The
easiest way to construct such a list from Haskell literals is through valList
.
printResults "select t.name, t.albumid, t.mediatypeid from tracks t where t.mediatypeid in (2, 3) limit 5;" $ select $ do t <- from (table @Track) limit 5 where_ $ in_ (t^.TrackMediaTypeId) (valList [2, 3]) pure (t^.TrackName, t^.TrackAlbumId, t^.TrackMediaTypeId)
# select name, albumid, mediatypeid from tracks where mediatypeid in (2, 3) limit 5 1. (Value "Balls to the Wall", Value (Just 2), Value 2) 2. (Value "Fast As a Shark", Value (Just 3), Value 2}) 3. (Value "Restless and Wild", Value (Just 3), Value 2}) 4. (Value "Princess of the Dawn", Value (Just 3), Value 2) 5. (Value "Welcome to the Jungle", Value (Just 90), Value 2)
Again, the in_
function could have been called with infix backticks:
t^.TrackMediaTypeId `in_` valList [2, 3]
It saves some parentheses at the cost of making people confused about the order in which code runs.
Section 2.6: Checking for membership in subquery
We just used the Esqueleto in_
function to check for membership in a literal
list using the valList
function. We can also use the results of a subquery as
the list to check for membership in. This query finds all tracks that come from
albums by the artist with id 12.
printResults "select t.name from tracks t where t.albumid in (select a.albumid from albums a where a.artistid = 12) limit 5;" $ select $ do t <- from (table @Track) limit 5 where_ $ in_ (t^.TrackAlbumId) . subList_select $ do a <- from (table @Album) where_ $ a^.AlbumArtistId ==. val (ArtistKey 12) pure (just (a^.AlbumId)) pure (t^.TrackName)
# select t.name from tracks t where t.albumid in (select a.albumid from albums a where a.artistid = 12) limit 5; 1. Value "Black Sabbath" 2. Value "The Wizard" 3. Value "Behind The Wall Of Sleep" 4. Value "N.I.B." 5. Value "Evil Woman"
There’s a lot going on in that query, but the new bit is the subquery:
subList_select $ do a <- from (table @Album) where_ $ a^.AlbumArtistId ==. val 12 pure (just (a^.AlbumId))
The Esqueleto select
function executes a query, but the subList_select
function merely translates code into a sql expression that can be inserted
into another query where a value list is expected. The only new thing about this
subquery is the just
function, which we use to convert the AlbumId
into a
Maybe AlbumId
, in order to make it possible to compare against TrackAlbumId
which is nullable in the database.10 We might be tempted to fmap
the Just
constructor over a^.AlbumId
, but the SqlExpr
type is not a real functor, so
this cannot be done. Thus, the Esqueleto authors have given us a just
function
to perform that specific operation in a sql-aware way.
We compose this subquery with the in_
function as the outer filter. Since the
subList_select
function returns a sql expression object, we could also have
assigned it to a variable separately, like-a-so:
printResults "select t.name from tracks t where t.albumid in (select a.albumid from albums a where a.artistid = 12) limit 5;" $ let albumIdSubquery = subList_select $ do a <- from (table @Album) where_ $ a^.AlbumArtistId ==. val 12 pure (just (a^.AlbumId)) in select $ do t <- from (table @Track) limit 5 where_ $ in_ (t^.TrackAlbumId) albumIdSubquery pure (t^.TrackName)
This is why Esqueleto queries being plain Haskell code is a strength. It gives us more opportunities to organise our code and break it apart into manageable chunks.
Section 2.7: Checking for nullity
The Esqueleto isNothing
and not_ . isNothing
functions can be used to check
for nullness and non-nullness respectively. For example
printResults "select t.name from tracks t where t.composer is null order by t.name limit 5;" $ select $ do t <- from (table @Track) limit 5 orderBy [asc (t^.TrackName)] where_ $ isNothing (t^.TrackComposer) pure (t^.TrackName)
or
printResults "select t.name from tracks t where t.composer is not null order by t.name limit 5;" $ select $ do t <- from (table @Track) limit 5 orderBy [asc (t^.TrackName)] where_ $ not_ (isNothing (t^.TrackComposer)) pure (t^.TrackName)
Part 3: Joins
This is the part I was looking forward to learning. Being able to filter data in one table is very powerful, but being able to cross-reference tables is what really sets sql apart as a powerful data storage tool.
Section 3.1: A regular inner join between two tables
Let’s say we want to know the track title and album name for the shortest tracks that belong to an album in our database.
printResults "select t.name, a.title from tracks t inner join albums a on a.albumid = t.albumid order by t.milliseconds limit 5;" $ select $ do (t :& a) <- from $ innerJoin (table @Track) . on (table @Album) $ \(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId) limit 5 orderBy [asc (t^.TrackMilliseconds)] pure (t^.TrackName, a^.AlbumTitle)
# select t.name, a.title from tracks t inner join albums a on a.albumid = t.albumid order by t.milliseconds limit 5; 1. (Value "\201 Uma Partida De Futebol", Value "O Samba Pocon\233") 2. (Value "Now Sports", Value "Body Count") 3. (Value "A Statistic", Value "Body Count") 4. (Value "Oprah", Value "Body Count") 5. (Value "Commercial 1", Value "House of Pain")
At this point, we finally see why we have been giving our tables short
identifiers (t
, c
, i
, a
etc.) in both sql and Haskell code. When we
are juggling multiple tables at once in a join, we can access fields from either
table by choosing the right prefix, and it’s clear where the data is coming from.
The new part in this query is the join. I’ve written it this way:
(t :& a) <- from $ innerJoin (table @Track) . on (table @Album) $ \(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId)
The way you’ll find it written by other people is using infix backticks.
(t :& a) <- from $ table @Track `innerJoin` table @Album `on` (\(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId))
However such a from clause is written, its innermost expression is going to be a function that produces a sql conditional expression indicating how to join rows from both tables with each other. In our case, the function is
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) in -- ...
The :&
data constructor serves practically the same purpuse as a tuple – it
binds together two table objects that are joined in a query, and by
deconstructing that pair we can access each table object individually, as we do
here when asserting equivalence between album ids.11 Note also the just
function, required because TrackAlbumId
is nullable.
This function is paired up with the second table we are joining using the on
function, as in
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) hasOnClause = on (table @Album) joinRowsOn in -- ...
Then the innerJoin
function takes the first table and the result of the
on
function and produces an object that can be passed to the Esqueleto from
function.
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) hasOnClause = on (table @Album) joinRowsOn completeJoinExpression = innerJoin (table @Track) hasOnClause in select $ do joinedTables <- from completeJoinExpression -- ...
This will get translated to an inner join
sql query.
To get the identifiers of the individual tables out of the joinedTables
variable, we can add a pattern match on it, since it is one of those :&
pairs
that we can deconstruct.
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) hasOnClause = on (table @Album) joinRowsOn completeJoinExpression = innerJoin (table @Track) hasOnClause in select $ do joinedTables <- from completeJoinExpression case joinedTables of t :& a -> do -- ...
Going backwards again, we can now inline the pattern match:
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) hasOnClause = on (table @Album) joinRowsOn completeJoinExpression = innerJoin (table @Track) hasOnClause in select $ do (t :& a) <- from completeJoinExpression
and then inline the completeJoinExpression
.
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) hasOnClause = on (table @Album) joinRowsOn in select $ do (t :& a) <- from $ innerJoin (table @Track) hasOnClause
From here, we continue to inline the hasOnClause
variable.
let joinRowsOn (t :& a) = t^.TrackAlbumId ==. just (a^.AlbumId) in select $ do (t :& a) <- from $ innerJoin (table @Track) . on (table @Album) $ joinRowsOn
We inline the last variable too.
select $ do (t :& a) <- from $ innerJoin (table @Track) . on (table @Album) $ \(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId)
Now we’re back at the query we started with. Hopefully that gives a sense of how these joins are constructed.
Section 3.2: Adding another table to the join
We now want to do something simliar to the above, except replace the album name with the artist name in the listing. This requires yet another join, to go from the artist id of the album to the artist name. We could do this without infix backticks, and it might look something like
printResults "select t.name, r.name from tracks t inner join albums l on l.albumid = t.albumid inner join artists r on l.artistid = r.artistid order by t.milliseconds limit 5 offset 30;" $ select $ do (t :& _ :& r) <- from $ let trackAlbum = on (table @Album) $ \(t :& l) -> t^.TrackAlbumId ==. just (l^.AlbumId) albumArtist = on (table @Artist) $ \(_ :& l :& r) -> l^.AlbumArtistId ==. r^.ArtistId in innerJoin (innerJoin (table @Track) trackAlbum) albumArtist limit 5 offset 30 orderBy [asc (t^.TrackMilliseconds)] pure (t^.TrackName, r^.ArtistName)
but at this point I might have to give up my crusade and bow before the backticks. The triple join reads more naturally that way:
(t :& _ :& r) <- from $ table @Track `innerJoin` table @Album `on` (\(t :& l) -> t^.TrackAlbumId ==. just (l^.AlbumId)) `innerJoin` table @Artist `on` (\(_ :& l :& r) -> l^.AlbumArtistId ==. r^.ArtistId)
The other joins work much like this one so we’ll ignore them and march right ahead onto the next part.
Part 4: Data aggregation
The most basic form of aggregation is counting within groups, so that’s what the tutorial goes to and where we’ll pick up.
Section 4.1: Counting within groups
We can guess how to group by something based on what we’ve seen so far. As
for the count
function, it is generic over all numbers so we specify we
want an integer with a type application.
printResults "select t.albumid, count(t.trackid) from tracks t group by t.albumid limit 5;" $ select $ do t <- from $ table @Track limit 5 groupBy (t^.TrackAlbumId) pure (t^.TrackAlbumId, count @Int (t^.TrackId))
We could also order by count to get the most trackful albums.
printResults "select t.albumid, count(t.trackid) from tracks t group by t.albumid order by count(t.trackid) desc limit 5;" $ select $ do t <- from $ table @Track limit 5 groupBy (t^.TrackAlbumId) orderBy [desc (count @Int (t^.TrackId))] pure (t^.TrackAlbumId, count @Int (t^.TrackId))
We can list the names of the first five albums with more than 15 tracks on them,
combining grouping, joining, and a having
filter.
printResults "select a.title from tracks t inner join albums a on t.albumid = a.albumid group by t.albumid having count(t.trackid) > 15 limit 5;" $ select $ do (t :& a) <- from $ table @Track `innerJoin` table @Album `on` (\(t :& a) -> t^.TrackAlbumId ==. just (a^.AlbumId)) limit 5 groupBy (t^.TrackAlbumId) having (count (t^.TrackId) >. val @Int 15) pure (a^.AlbumTitle)
More aggregation, set operations, CTEs, etc.
There are many more things we can do12 I’m a big fan of recursive ctes after having used them to speed up application logic by many orders of magnitude at one of my first jobs., both in sql and Esqueleto, but I’m leaving this off here. After having gone through the above examples, we ought to feel much more confident in being able to both read and write an Esqueleto query when we need it.
Maybe we’ll come back to this in the future and show how more advanced features work. Maybe not.