Kindness for Mean Girls
A primer on the cruel, tacit laws of type-level programming in Haskell
Haskell programs are constructed at the junction of two worlds: one of types, and the other of values. Values are operational run-time entities that are denoted syntactically by terms in the code. To this end, types classify terms. They provide a powerful abstraction to organize data, determine how it should be stored in memory, passed through various operations, and more. On a superficial level, the difference between these worlds is simple: types refer to datatypes, which are either built-in (such as Integer
and String
), or user-declared, for example:
data SpringFlingQueen = Cady | Regina
By comparison, values are denoted by terms those types categorize (such as 1
and "abc"
described by the standard library’s Integer
and String
types, or Cady
and Regina
in the case of our user-declared type, SpringFlingQueen
). Thus, the term 1
denotes a value that exists in memory only when the program is run.
In addition to these two worlds, a third, darker, and more elusive underworld of kinds also lurks in the shadows of Haskell programs. Haskell kinds can be as unpredictable and insidious as the soulless alpha-female of an elite high school clique. While daunting, understanding the coaction between these three worlds lays the foundation for type-level programming, a topic worth learning as it strengthens overall Haskell intuition.
Kindness is a virtue
Much like counter-intuitive adolescent social dynamics, mastery of Haskell bears a steep learning curve because of its underlying complexity. It is difficult to wrap one’s head around the language’s typeclass hierarchies, category-theoretic roots, or the intricacies of compiler behavior. However, an intimate understanding of the type system lets programmers look beyond the chaos of code and apply a recognizable template by which to understand it.
Kindness takes patience
This topic is also difficult to master because it is both vast and subtle. Ideas underlying type-level programming are scattered across many disconnected resources focused on several corners of the Haskell ecosystem. Authoritative references on the topic (such as relevant library documentation) tend to assume familiarity with domain-specific terminology that may be unknown to non-experts. Due to the paucity of approachable materials on the subject, I’ve attempted to break down type-level programming into explanations of its constituent parts. While I don’t go into extensive depth on any individual section below, I hope to provide a valuable starting point with which readers can explore ideas in greater detail.
Types vs. Values
As mentioned above, Haskell programs are divided into two worlds: the type-level and the value-level. The type-level refers to the part of a program analyzed by the static type-checking phase during compilation. Since every expression is assigned a type, code is checked against Haskell’s type system to ensure it fits GHC’s specified correctness criteria.
If the program is well-typed, the program will compile. However, this type-information evaporates once the compilation process terminates, leaving only values behind at run-time. In this way, types and values can be better distinguished by the phasing distinction, with types being compile-time entities and values being run-time entities.
For example, if a function takes a String
, the type-checker doesn’t care if the String
has a value denoted by "abc"
or "123"
or "get in loser"
—so long as it’s a String
. Type information gets discarded at run-time, leaving only the string’s value (ex.,"get in loser"
). In some cases, however, we want the type system to care about what those values are and distinguish between them. The DataKinds
extension, which we’ll explore below, lets us do that by allowing us to shove more information about our program into the type system. This lets us add meaning to the value of a String
at the type-level, moving them from their usual, strictly run-time existence, into the compile-time phase. The technique allows us to use more information statically in the logical development and abstraction of the program’s behavior. We’re also able to add consequences that can halt compilation if the specific value of String
doesn’t conform to the rules we’ve specified, or define classes over them.
Distinguishing types and values in GHCi
Let’s examine our aforementioned SpringFlingQueen
datatype in GHCi. When we query the type of one of its constructors using :t
(a handy shorthand for :type
), we see that it is indeed of type SpringFlingQueen
:
λ data SpringFlingQueen = Cady | Regina
λ :t Cady
Cady :: SpringFlingQueen
Now consider literals 1
, 2
, 3
. These are values of a type that parameterizes the Num
class:
Typing :t 1
into GHCi gives us Num p => p
, indicating that a literal such as 1
can be of any polymorphic type p
as long as the Num
type class has instances for that type (for example, Integer
and Float
both have Num
instances and therefore are valid types that p
could be instantiated with). This is possible due to type inference.
Kinds
Just like types classify terms, kinds classify types, and therefore are frequently described as “types of types”, or referred to be “one level up.” The “star” syntax (i.e., *
) denotes kinds. It is defiantly used in this post despite recent syntactic changes. A prerequisite for understanding the kind system necessitates comprehending the differences between three pairs of ideas:
-
Data constructors vs. type constructors: data constructors create values, whereas type constructors create types. Type constructors take one or more type arguments and produce a datatype when enough arguments are provided. This means that through currying, a type constructor can be partially applied. For example, the list type constructor
[]
may take a single type argument (ex.String
) to denote the elements of the list (i.e.,[String]
).[String]
is simply syntactic sugar for[] String
, where the type[]
is applied toString
. -
Full vs. partial application A partially-applied type, like a partially-applied function, is one that is missing some of its data constructors. For instance, consider the list type
[]
. The kind of[]
is* -> *
. A fully-applied list has data constructors, such as[Int]
, and its kind is*
. -
Inhabited types vs. uninhabited types: An inhabitant of a type is precisely a value of that type. This means that inhabited types refer to the types that contain concrete values, such as the value denoted by the term
1 :: Int
. This suggests that the typeInt
is inhabited by value denoted by1
. By contrast, uninhabited types refer to type constructors to which no values are abstracted. For instance,Void
is uninhabited because it has no data constructors, and thus can not be used to construct a valid term. WhileVoid
may seem pointless at first, it can be a useful way to represent that a container is empty (ex.,[Void]
, which is inhabited by the term[]
and the value this term denotes).
How do these ideas relate to the kind system? Well, all fully-applied runtime values are of kind *
and they are inhabited. You can confirm this by typing :k Int
and :k String
in GHCi. However, this doesn’t work the other way around—just because a type is inhabited, doesn’t necessarily mean it’s fully-applied (ex., []
is not fully-applied but it is inhabited, and its kind signature is * -> *
). Conversely, all partially-applied types are uninhabited (since they don’t correspond to a value), but not all uninhabited types are partially applied. Void
for instance is not partially-applied, but it is uninhabited.
To get the hang of this idea, consider the following examples:
Type | Inhabited? | Fully-applied? | Kind |
---|---|---|---|
Int |
Yes | Yes | * |
String |
Yes | Yes | * |
[] |
Yes | No | * -> * |
[Int] |
Yes | Yes | * |
(,) |
Yes | No | * -> * -> * |
Either |
Yes | No | * -> * -> * |
Void |
No | N/A | * |
Either Void Void |
Yes | Yes | * |
Kind signatures can be specified manually in GHCi using the XKindSignatures
extension. Try extending the above table by investigating the kind signatures of various types.
Higher-kinded types
Just like there are higher-order functions (functions that take other functions as arguments), there are higher-kinded types (types constructors that take other type constructors as arguments). Type constructors such as []
are a first-class type, but also a higher-kinded type, given they take another type constructor to be reified:
Let’s consider Functor
:
λ :k Functor
Functor :: (* -> *) -> Constraint
We see that Functor
takes a type constructor * -> *
and returns a Constraint
. Let’s use :info
to examine Functor
a bit more closely:
λ :info Functor
class Functor (f :: * -> *) where
fmap :: (a -> b) -> f a -> f b
(<$) :: a -> f b -> f a
{-# MINIMAL fmap #-}
-- Defined in ‘GHC.Base’
instance Functor (Either a) -- Defined in ‘Data.Either’
instance Functor ((,,,) a b c) -- Defined in ‘Data.Orphans’
instance Functor ((,,) a b) -- Defined in ‘Data.Orphans’
instance Functor [] -- Defined in ‘GHC.Base’
instance Functor Maybe -- Defined in ‘GHC.Base’
instance Functor IO -- Defined in ‘GHC.Base’
instance Functor ((->) r) -- Defined in ‘GHC.Base’
instance Functor ((,) a) -- Defined in ‘GHC.Base’
We see that Functor
allows type constructors like Maybe
and []
to have Functor
instances, but not Int
or String
. This is because Maybe
and []
are * -> *
, while Int
has kind *
. Similarly, Either
has * -> * -> *
, which is why its instance above is parameterized with a type parameter denoting something of kind * -> *
: Either a
.
Constraint kinds
We got a glimpse of constraint kinds above in the Functor
example. Constraint kinds are another flavor of kinds, denoting typeclass constraints left of the =>
arrow. This allows the programmer to extend the power of constraints to more types, using the -XConstraintKinds
extension.
Kind polymorphism
Parametric polymorphism is everywhere in Haskell; it allows us to abstract at the type level. Using type variables such as a
that can be of any type instead of concrete types such as Int
, we’re able to define more generic data and functions that offer greater reusability across our code. Kind polymorphism works similarly, but at the kind level. Using the PolyKinds
extension, we’re able to define functions and types that work over a variety of kinds, as opposed to those whose implementations are tied to specific kind signatures. The classic example of Typeable
is widely used to demonstrate why this is useful:
class Typeable (t :: *) where
typeOf :: t -> TypeRep
class Typeable1 (t :: * -> *) where
typeOf1 :: t a -> TypeRep
class Typeable2 (t :: * -> * -> *) where
typeOf2 :: t a b -> TypeRep
The classes above are defined for multiple arities, with each class specifying a kind signature that caters to a specific arity: *
, * -> *
or * -> * -> *
. The type variables a
and b
are both of kind *
by default, which is why the Typeable1
has one type variable, and Typeable2
has two. Accommodating for arguments with varying kinds is cool, but this implementation necessitates having to type out separate definitions over and over again. It also obliges the programmer to consult the provided kind signatures to determine the appropriate type we can use with Typeable
vs. Typeable1
vs. Typeable2
to ensure well-kindedness:
instance Typeable Int where typeOf _ = TypeRep
instance Typeable1 [] where typeOf1 _ = TypeRep
instance Typeable2 Either where typeOf2 _ = TypeRep
Basic programming intuition dictates that not only is it unsustainable and tedious to have separate implementations for every kind, but it also increases the surface area of our code, thereby increasing the likelihood of errors. What if we could have one definition for all kinds? The PolyKinds
extension lets us do exactly that. This lets us unify the three classes into a single representation:
{-# LANGUAGE PolyKinds #-}
data Proxy t = Proxy
class Typeable (t :: k) where
typeOf :: Proxy t -> TypeRep
We now have the flexibility to pass any type-level entity to typeOf
, not just one that is kind *
like Int
or * -> *
like []
or * -> * -> *
like Either
. This is because Proxy
has kind forall. k -> *
. This allows us to instantiate a single Typeable
class for a variety of kinds:
instance Typeable Int where typeOf _ = TypeRep
instance Typeable [] where typeOf _ = TypeRep
instance Typeable Either where typeOf _ = TypeRep
DataKinds and type-level literals
The DataKinds
language extension lets us reason about data at the type-level by providing type-level literals. Normally, using :k
to query the kind of literals such as 1
or "foo"
doesn’t work. That’s because these entities are at the value level:
λ :k 1
<interactive>:1:1: error:
Illegal type: ‘1’ Perhaps you intended to use DataKinds
As the error suggests, to be able to inspect the kind of a value, we need to enable the DataKinds
extension. Recall, 1
exists at the value-level. Haskell’s type system doesn’t care about values at compile-time, in that they are run-time entities only. However, with XDataKinds
enabled, investigating the kinds of literals (such as 1
or "foo"
) gives us the following:
λ :set -XDataKinds
λ :k 1
1 :: GHC.Types.Nat
λ :k "foo"
"foo" :: GHC.Types.Symbol
The integer 1
is of kind Nat
, and the string "foo"
is of kind Symbol
. Nat
and Symbol
are constructors defined by the GHC.TypeLits
module (explored below, and are therefore out of scope unless we import it.
The DataKinds
extension also enables datatype promotion. When we define datatypes, we create custom types. With DataKinds
turned on, that type becomes a custom kind, and its data constructors become types, thus getting “promoted” through the ranks of the hierarchy of values -> types -> kinds.
Returning to our beloved SpringFlingQueen
example, let’s consider what normally happens when we create a simple type constructor in the REPL:
λ data SpringFlingQueen = Cady | Regina
λ :t Cady
Cady :: SpringFlingQueen
λ :k SpringFlingQueen
SpringFlingQueen :: *
λ :k Cady
<interactive>:1:1: error:
Not in scope: type constructor or class ‘Cady’
A data constructor of that name is in scope; did you mean DataKinds?
We can inspect the type of its data constructors, and the kind of its type constructor. However, when we try to query for the kind of Cady
, we get an error. However, when we turn on XDataKinds
, we see the following:
λ :set -XDataKinds
λ :t Cady
Cady :: SpringFlingQueen
λ :k Cady
<interactive>:1:1: warning: [-Wunticked-promoted-constructors]
Unticked promoted constructor: ‘Cady’.
Use ‘'Cady’ instead of ‘Cady’.
Cady :: SpringFlingQueen
There are two things to note in the experiment with XDataKinds
turned on:
-
We can query for the kind of
Cady
, since it is now a type-level value andSpringFlingQueen
is now a kind-level type. -
It compiles! However, if you have the
-Wunticked-promoted-constructors
warning on like I do, that will get triggered. While ticks are not required (your code compiles without them), you’ll typically want to prefix your data constructors with a tick to disambiguate them from regular, unpromoted data constructors:
data SpringFlingQueen = 'Cady | 'Regina
Here’s a quick summary of what you can and can’t promote:
Things that get promoted | Things that don’t get promoted |
---|---|
<li>data types</li><li>newtypes</li><li>GADTs</li> | <li>type synonyms</li><li>type/data families</li><li>higher-kinded types (such as data Fix f = In (f (Fix f)) </li><li>datatypes whose kinds involve promoted types(such as Vec :: * -> Nat -> * )</li><li>data constructors that are kind polymorphic, involve constraints, mention type or data families</li> |
Relationship between values, terms, types, and kinds
Drawing the distinction between these worlds can be confusing. While each category deals in different entities, they all use the same literal syntax. Recall that values only exist at run-time, and are denoted by terms which appear in the syntax (1
). I made the following chart to demonstrate the relationship between a few simple values. Typing the leftmost entity into GHCi using :t
if it’s a value and :k
if it’s a type yields the following results:
value-level (run-time) constructs denoted by terms | type-level (compile-time) constructs | kind-level constructs |
---|---|---|
1 |
Num p => p |
Nat |
1 :: Int |
Int |
* |
True |
Bool |
N/A |
N/A | Bool |
* |
N/A | 'True |
Bool |
Just 1 |
Maybe |
Maybe Nat |
Just 1 |
Maybe |
N/A |
Just True |
Maybe Bool |
Maybe Bool |
N/A | Just 'True |
Maybe Bool |
N/A | 'Just True |
Maybe Bool |
N/A | 'Just 'True |
Maybe Bool |
N/A | [] |
* -> * |
N/A | Either |
* -> * -> * |
N/A | Either Int |
* -> * |
N/A | Either Int Int |
* |
The difference between “type” and “type-level”
There’s a subtle but essential difference between something being a type, vs. being at the type-level. Although datatype promotion allows us to represent constructors that otherwise denote values at the type-level (such as True
being promoted to 'True'
), this does not make them types. For something to be a type, it must have the ability to be inhabited, meaning that a value of its type exists. Bool
, for example, is a type because it can have inhabitants (i.e., True
and False
), whereas 'True
is not a type because it can not have inhabitants, despite having compile-time significance.
Type Families
Functions are to values what type families are to types. Enabled with the -XTypeFamilies
flag in GHCi or the {-# LANGUAGE TypeFamilies #-}
option in your program, they provide a way to specify how to map one type to another. For example, string concatenation would normally be accomplished through the ++
or <>
functions, but those operators are specifically for values. To concatenate type-level strings, GHC.TypeLits
defines a type family called AppendSymbol
:
type family AppendSymbol (m :: Symbol) (n :: Symbol) :: Symbol
AppendSymbol
works similarly to <>
, but over types instead of values:
λ :kind! AppendSymbol "so " "fetch!"
= "so fetch!"
The type system can’t run regular Haskell functions. It can, however, evaluate type families. Since type families map types to other types, they behave a lot like functions. For example, the type family +
(of kind Nat -> Nat -> Nat
), when applied to the type-level naturals 5
and 2
, evaluates to the Nat
values 7
just as the value-level function (+)
(of type Num p => p -> p -> p
) would behave when applied to Int
values.
There are two types of type families: closed and open.
Closed type families
Closed type families are defined all in one place. They’re closed in the sense that they can not be extended: you can’t add more cases to them somewhere else. These are defined using the where
clause and a list of their cases, also known as “type instances”:
type family Wednesday a where
Pink Int = [Int]
Pink Char = (String, Int)
Open type families
Open type families are defined without the where
clause followed by a list of instances, and can appear either at the top-level or in the body of a typeclass. Unlike closed type families, which are a black-box, open type families can be extended.
Here is an example of a type family declared at the top-level:
type family Skirts a
Below is another example of a type family declared at the top-level, this time with instances:
type family UnsignedVersionOf t
type instance UnsignedVersionOf Int = Word
type instance UnsignedVersionOf Int32 = Word32
type instance UnsignedVersionOf Int64 = Word64
Notice that type families are parametric, in that they contain a polymorphic type variable t
which can be instantiated with different parameters, allowing specialized representations. In the above example, Int
, Int32
and Int64
all result in different assignments. This is powerful if you want to yield different results depending on the type.
When they appear in the body of a typeclass, they’re also called associated types. For example:
class IronicTees a where
type Pink a
In this case, we don’t have to say type family
, just type
, because any type associated with a class makes an open type family. Another example of an associated type is Rep
from Generic
or Rep1
from Generic1
.
Note, this looks like filling out cases of a function, and that’s because it is. We’re declaring that the UnsignedVersionOf
type family maps types to other types, and that this mapping occurs at compile-time. This is how functions work on the type-level.
Associated types
Similarly, Rep
is an associated type of the Generic
class that maps a type to the generic information about that type available to the compiler. It is defined by the GHC.Generics
class as:
type Rep a :: Type -> Type
When you invoke Rep Int
in the type-level, GHC does some computation and gives you back some type describing Int#
, an unboxed type. This type is usually based on one of the generic types denoting the shape of a data type, (for example, M1
, K
, etc.)
Unboxed types have the kind #
. Usually, values are contained in a “box”, in that they are represented by a pointer to a heap-allocated object. This way of boxing values enables lazy evaluation, but can result in slow compile times. Conversely, unboxed values get at the raw values directly.
GHC.TypeLits
Importing GHC.TypeLits
exposes the names of kinds Nat
and Symbol
(the ones we saw earlier when surveying the kinds for integer and string values). The Num
class defines operations (such as +
) that can be used on types that instantiate it (such as Integer
). Functions like +
, however, can only be used to perform computations over values (like 6+2
which yields 8
). If we wanted to perform these computations at the type-level, we’d need type families. GHC.TypeLits
provides the type families that can be used over type-level literals. For example, if we wanted binary addition via +
, we have (type family (m :: Nat) + (n :: Nat) :: Nat
), allowing us to perform these computations over type-level values:
This module provides us with constructors and operations that add type information to the literals that otherwise hold no meaning at compile-time.
If we want the run-time value of a type-level number like 5
, we can use natVal :: forall n proxy. KnownNat n => proxy n -> Integer
. To do this, we will also need Data.Proxy
:
Proxy a
allows us to use a value whose type us parameterized by some arbitrary thing of arbitrary kind. For example, Proxy a
could be Proxy Int
—Int :: *
and so Proxy :: * -> *
. Proxy Show
is another valid example since Show :: * -> Constraint
, so Proxy :: (* -> Constraint -> *)
. Proxy :: forall k (a :: k) . a -> *
meaning that k
and a
are instantiated to the relevant types and kinds.
If we import Data.Proxy
and type the following into the REPL, we’ll see that it returns the value, but it is able to perform the computation at the type-level.
> natVal (Proxy :: Proxy 5)
5
Proxy
doesn’t hold any data, and it has one constructor (also known as Proxy
). To drive home the relationship between value and type level paradigms, consider a simple addition operation. Addition using the type operator +
provided by Prelude
, (+) :: a -> a -> a
, we know it is a function that works on values. However, what if we wanted to perform this not over values but over types? GHC.TypeLits
gives us a series of type families that can be used in place of these.. In this case, we want to invoke +
, not the function from Prelude but the type-family from TypeLits
: type family (m :: Nat) + (n :: Nat) :: Nat
which provides addition of type-level natural numbers. For example,
> natVal (Proxy :: Proxy (5+5))
10
This allows us to do 5+5
at the type-level, and return a value, 10
.
Distinguishing between DataKinds and GHC.TypeLits
At a high level, one is a language extension while the other is a module. Modules never affect the meaning of syntax, only the availability of symbols and instances. Conversely, language extensions can add syntax and change semantics, but usually don’t affect the available symbols (with the exception of NoImplicitPrelude
).
GHC.TypeLits
gives us the identifiers associated with GHC’s type literal support: the Nat
and Symbol
kinds, the natVal
and symbolVal
functions, operations such as +
, -
, *
, etc. for forth type families.
-XDataKinds
gives us the language features we need to use these: the syntactic support for 1
,2
,3
literals for Nat
values and "pink"
literals for Symbols, as well as the 'Ticked
constructor syntax for user-defined kinds.
You rarely ever use TypeLits
without DataKinds
, but you could use DataKinds
without TypeLits
(if you didn’t need type-level naturals or symbols). For example, here’s what we can get away with without importing GHC.TypeLits
, assuming we have DataKinds
and PolyKinds
on:
class Pink a where
pink :: proxy a -> String -- like Show, but doesn’t need to take a value, just a proxy to sort out the type
instance Pink 1 where
pink _ = "1"
Let’s try the same, but for KnownNat
:
class Pink a where
pink :: proxy a -> String -- like Show, but doesn’t need to take a value, just a proxy to sort out the type
instance KnownNat n => Pink n where
pink n = show $ natVal n
In GHCi when you turn on the language extension XDataKinds
and look at its kind:
> :k "a"
"a" :: ghc-prim-0.5.3:GHC.Types.Symbol
The hype of dependently-typed languages
While this post won’t make you the Regina George of advanced type-level programming techniques overnight, I hope it provides readers with enough threads to pull on as they move in that direction. Moreover, it provides sufficient context to start watching a cool trend: dependently-typed languages are growing in popularity. While type systems let us validate programs with respect to a fixed set of criteria, dependent types are much more flexible because they extend the guarantees expected of regular type systems to other aspects of program behavior. Users can load up their type systems with more information about values, thereby introducing more constraints than the status quo which can provide stronger safety guarantees.
References
Special shout out to @patrickt and @robrix for meticulous edits, for being bottomless pits of Haskell knowledge, and a pleasure to work with daily.
from Hacker News https://ift.tt/2ZcxIqD
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.