What even are GADTs?

Posted on November 26, 2017

Note: This post assumes some casual familiarity with Haskell, especially with (plain ol’) algebraic data types. If you’re familiar with the way the data keyword works, you’ll be fine.

a common GHC extension

I decided to write this post because GADTs are pretty popular in Haskell, but I found that most of the literature covers what they are and how they work, but not what they’re actually useful for.

It wasn’t until I saw an example from Haxl that it really hit me how useful GADTs can be in real-world applications. What follows is an attempt to distill that use case into a simplified form and show how it improves on standard algebraic data types.

the goal

Let’s say we’re working on an application that queries some live data source, and we want to:

model queries as data
run those queries in a type-safe manner

Ideally, we’d end up with a function that more or less works like this:

runQuery :: Query a -> IO a

In other words, it might take a Query [Transaction] and return an IO [Transaction], or it could take a Query (Maybe User) and return IO (Maybe User), etc. You get the picture.

a first try

Let’s try writing a phantom type to represent queries:

data Query' a =
    SelectUserById' Int
  | SelectTransactionsForUser' Int

There’s something kinda off about this. We have a type parameter a, but it’s not used in any of the constructors (hence the term “phantom”). Since it doesn’t appear in the type’s definition, there are no restrictions on what it can be.

In other words, the a in Query' a is whatever we claim it is. That seems weird, but it does give us the ability to do this:

selectUser' :: Int -> Query' (Maybe User)
selectUser' = SelectUserById'

selectTransactions' :: Int -> Query' [Transaction]
selectTransactions' = SelectTransactionsForUser'

With these helper functions, we technically have everything we need in order to write runQuery, but nothing’s stopping someone from writing up some garbage like this:

wtfQuery :: Query' (Either Unicorn Void)
wtfQuery = SelectTransactionsForUser' 666

that kinda sucks

With the above Query' a type, any relation between the type parameter a and the actual result is purely a mirage. Why even have types at all if they’re just smoke and mirrors?

use a dang GADT

This is exactly the situation that GADTs (Generalized Algebraic Data Types) are made for.

-- requires {-# LANGUAGE GADTs #-}

data Query a where ①
  SelectUserById            :: Int  -> Query User ②
  SelectTransactionsForUser :: User -> Query [Transaction]

Let’s start by comparing Query (the GADT version) to the bogus Query' definition from the beginning of this post. First of all, there’s the where keyword (①), which is GADT-specific syntax that replaces the = in a normal ADT definition.

Second, and much more importantly, the constructors are fully annotated with their types. If you look at ②, you can see that SelectUserById can only ever return a Query User.

The only other constructor returns Query [Transaction], so the only possible forms of Query a are:

Query User
Query [Transaction]

We don’t have to worry about absurd cases like Query (Either Unicorn Void) because there’s simply no way to construct them. We don’t need to play weird “hide the constructor” games, because these constructors are thoroughly safe.

tying it together

With the new Query type defined, we can now use it to write the query function:

query :: Query a -> IO a
query (SelectUserById uid)             = getUser uid ①
query (SelectTransactionsForUser user) = getTransactions user ②

The type Query a -> IO a looks very general, but it really expands to just two potential cases:

Query User -> IO User (①)
Query [Transaction] -> IO [Transaction] (②)

If you try to return anything but IO User from ①, you’ll get a type error. Same if you try to return anything but IO [Transaction] from ②.

What’s interesting is that those restrictions are ultimately determined by the Query data type. Functions are great, but anything we can represent in pure data is even better. When you use this pattern, it puts that much more emphasis on the data model, and lets that drive your function implementations.

Note: the complete source used in this post is available here.