# Typeable - A long journey to type-safe dynamic type representation

Haskell has grown and changed amazingly in recent years, with great contribution from community and industrial adoption. There are many new ideas and changes in new GHC versions, as well as breaking changes. The most influent library is `base`

package, which contains the Standard Haskell Prelude, and its support libraries, and a large collection of useful libraries ranging from data structures to parsing combinators and debugging utilities. (hackage)

Typeable is a module in `base`

package. Maybe you already know, the `Typeable`

class is used to create runtime type information for arbitrary types. Explicitly, `Typeable`

is a unique Fingerprint (or hash) of type coupled with a coercion. This module has big influence in `Data.Dynamic`

, `Data.Data`

module as well as generic programming libraries. This post gives you a short story about the innovation of `Typeable`

along with GHC extensions.

## A long time ago...

`Typeable`

was originally a part of `Dynamic`

, which is the solution for Dynamic typing in Statically typed language. In real world, there are programming situations we need to deal with data whose type cannot be determined at compile time, for example, fetch data from API, parse JSON string, query from database...

Since mid-1960s, early programming languages (Algol-68, Pascal, Simula-67) used similar technique of `Dynamic`

. Until 1989, `Dynamic`

and `typecase`

term was introduced in "Dynamic Typing in a Statically Typed Language" paper, with lambda calculus notation.

In Haskell, from what I saw in git.haskell.org, the first commit was in June 28, 2001 by Simon Marlow. However, I'm not sure this proof is correct. Maybe the module was implemented before that. Unfortunately I can't find another reliable source. In this version, `Typeable`

and `Dynamic`

were in same module. In July 24 2003, `Typeable`

was moved to `Data.Typeable`

package.

In the early stage of Haskell, `TypeRep`

was constructed of `TyCon`

and child `TypeRep`

s. `TyCon`

hold `Key`

as an unique `HashTable`

, and a `String`

to determine type name at user level.

`data TypeRep = TypeRep !Key TyCon [TypeRep] `

data TyCon = TyCon !Key String

mkTyCon :: String -> TyCon

boolTc :: TyCon

boolTc = mkTyCon "Bool"

class Typeable a where

typeOf :: a -> TypeRep

> show $ typeOf (True :: Bool) -- Bool

FFI C macros can help generating instances, with some exceptions, such as `Tuple`

. `undefined`

was used as an alternative for Proxy. `unsafeCoerce`

was used anywhere for coercing types.

INSTANCE_TYPEABLE0(Bool,boolTc,"Bool")

tup2Tc :: TyCon

tup2Tc = mkTyCon ","

instance (Typeable a, Typeable b) => Typeable (a,b) where

typeOf tu = mkAppTy tup2Tc [typeOf ((undefined :: (a,b) -> a) tu),

typeOf ((undefined :: (a,b) -> b) tu)]

tup3Tc :: TyCon

tup3Tc = mkTyCon ",,"

instance ( Typeable a , Typeable b , Typeable c) => Typeable (a,b,c) where

...

tup4Tc :: TyCon

tup4Tc = mkTyCon ",,," -- too much boilerplate

## From Key to Fingerprint

`TyCon`

was stored in unsafe IORef cache, which internally kept a `HashTable`

mapping `String`

s to `Int`

s. So that each `TyCon`

could be given a unique Int for fast comparison, the `String`

has to be unique across all types in the program. However, derived instances of typeable used the qualified original name (e.g. `GHC.Types.Int`

) which is not necessarily unique, is non-portable, and exposes implementation details.

`newtype Key = Key Int deriving( Eq )`

data KeyPr = KeyPr !Key !Key deriving( Eq )

data Cache = Cache { next_key :: !(IORef Key),

tc_tbl :: !(HT.HashTable String Key),

ap_tbl :: !(HT.HashTable KeyPr Key)

}

In July 2011, Simon Marlow replaced `Key`

with Fingerprint - , and moved `TyCon`

as well as `TypeRep`

internally. The fields of TyCon are not exposed via the public API. Together the three fields `tyConPackage`

, `tyConModule`

and `tyConName`

uniquely identify a `TyCon`

, and the Fingerprint is a hash of the concatenation of these three Strings (so no more internal cache to map strings to unique IDs). This implementation is also easier for GHC to generate derived instances

`data TyCon = TyCon {`

tyConHash :: {-# UNPACK #-} !Fingerprint,

tyConPackage :: String,

tyConModule :: String,

tyConName :: String

}

data TypeRep = TypeRep {-# UNPACK #-} !Fingerprint TyCon [TypeRep]

## Merge all the things

With the rise of Type-level programming, GHC compiler is more and more better. Many boilerplate codes were refactored and optimized in safe and well-performed. `Typeable`

also took advantage of new GHC features:

### PolyKinds

- PolyKinds extension has been implemented since GHC 7.4.1 (February 2012). Until GHC 7.6.1 (September 2012) release, Polymorphic kinds and data promotion were fully implemented and supported features.
- November 2012, José Pedro Magalhães @dreixel adapted poly-kinded into Typeable, then released in GHC 7.8.1 (April 2014) - base-4.7.0.0

Before Poly-kinds, there are many duplicated code to support variant for N-ary type constructors. To represent type constructors with kind
`* -> *`

, such as `Maybe`

or `[]`

, we could create a separate type class, called `Typeable1`

. However, this approach is ugly and inflexible. What about tuples? Do we need a `Typeable2`

, `Typeable3`

... for them?

`class Typeable (t :: *) where`

typeOf :: t -> TypeRep

class Typeable1 (t :: * -> *) where

typeOf1 :: t a -> TypeRep

...

-- The maximum number of parameters was 7

class Typeable7 (t :: * -> * -> * -> * -> * -> * -> * -> *) where

typeOf7 :: t a b c d e f g -> TypeRep

With kind polymorphism we can write:

`class Typeable a where`

typeRep :: proxy a -> TypeRep

typeOf :: forall a. Typeable a => a -> TypeRep

typeOf _ = typeRep (Proxy :: Proxy a)

We have generalized in two ways here.

- First,
`Typeable`

gets`a`

polymorphic kind:`forall a. a -> Constraint`

, so that it can be used for types of any kind. - Second, we need some way to generalize the argument of
`typeRep`

, which we have done via a poly-kinded data type Proxy:

`data Proxy a = Proxy``

The idea is that, say `typeRep (Proxy :: Proxy Int)`

will return the type representation for `Int`

, while `typeRep (Proxy :: Proxy Maybe)`

will do the same for `Maybe`

. The proxy argument carries no information—the type has only one, nullary constructor and is only present so that the programmer can express the type at which to invoke `typeRep`

. Because there are no constraints on the kind of `a`

, it is safe to assign `Proxy`

the polymorphic kind `forall a. a -> *`

.

### TypeLevelReasoning

Richard A. Eisenberg also proposed and implemented TypeLevelReasoning. New module `Data.Type.Equality`

was born. The idea is defines the type of equality witnesses for two types of any kind `k`

:

`data a :~: b where`

Refl :: a :~: a

Pattern matching on this generalized algebraic datatype (GADT) allows GHC to discover the equality between two types. If `a :=: b`

is inhabited by some terminating value, then the type `a`

is the same as the type `b`

. To use this equality in practice, pattern-match on the `a :=: b`

to get out the `Refl`

constructor:

`coerce :: (a :~: b) -> a -> b`

coerce Refl x = x

We pattern-match on `Refl`

. This exposes the fact that `a`

and `b`

must be the same. Then, GHC happily uses `x`

of type `a`

in a context expecting something of type `b`

.

With this, we can remove unsafe hack when convert Proxy:

`-- no equality`

cast :: (Typeable a, Typeable b) => a -> Maybe b

cast x = r

where

r = if typeOf x == typeOf (fromJust r)

then Just $ unsafeCoerce x

else Nothing

-- with equality

eqT :: forall a b. (Typeable a, Typeable b) => Maybe (a :~: b)

eqT = if typeRep (Proxy :: Proxy a) == typeRep (Proxy :: Proxy b)

then Just $ unsafeCoerce Refl

else Nothing

cast :: forall a b. (Typeable a, Typeable b) => a -> Maybe b

cast x

| Just Refl <- ta `eqT` tb = Just x

| otherwise = Nothing

where

ta = typeRep :: TypeRep a

tb = typeRep :: TypeRep b

### And more...

This release also added is the `AutoDeriveTypeable`

language extension, which will automatically derive `Typeable`

for all types and classes declared in that module.

The update could caused breaking changes, although backwards compatibility interfaces were kept. José Pedro Magalhães also backed up old codes into `Data.OldTypeable`

module, which was removed later in GHC 7.10.1 - base-4.8.0.0

- Since GHC 8.2, GHC has supported type-indexed type representations.
`Data.Typeable`

provides type representations which are qualified over this index. To keep`Typeable`

backwards compatibility, the interface of this module is similar to old releases.`Type.Reflection`

module is available for the type-indexed interface.

## Safer and more expressive type representations

With the motivation of distribution programming, mainly focusing on Cloud Haskell, new challenges occurred: Typeable wasn't safe enough.

### Open world's challenge

Cloud Haskell is an Erlang-style library for programming a distributed system in Haskell. It is based on the message-passing model of Erlang, but with additional advantages that stem from Haskell’s purity, types, and monads. If you are from Scala land, Cloud Haskell has similar features with Akka, which use Actor model for communicating across processes. However, Cloud Haskell also supports thread in same machine. You can conveniently do things the old way.

Because the information that is transmitted between machines by network must be in bit, the data must be `Serializable`

. This ensures two properties: The data can be encoded and decoded to binary fore and back, and can produce a `TypeRep`

that captures the item’s type.

`class (Typeable a, Binary a) => Serializable a`

class Binary a where

put :: t -> PutM ()

get :: Get t

class Typeable a where

typeOf :: a -> TypeRep

encode :: a -> ByteString

decode :: ByteString -> Maybe (a, ByteString)

To guarantee that the code is then applied to appropriately typed values, the receiving node must perform a dynamic type test. That way, even if the code pointer was corrupted in transit, by accident or malice, the receiving node will be type-sound. You can think of it like this: a decoder is simply a parser for the bits in the `ByteString`

, so a decoder for `Int`

can fail to parse a full `Int`

(returning `Nothing`

), but it can't return a non-Int. A simple way to do this is to serialize a code pointer as a key into a static pointer table containing Dynamic values (RemoteTable). When receiving code pointer with key `k`

, the recipient can lookup up entry `k`

in the table, find a `Dynamic`

, and check that it has the expected type.

To do that, `Typeable`

needed to evolve further.

`TypeRep`

was not indexed, so there was no connection between the `TypeRep`

stored in a `Dynamic`

and the corresponding value. Indeed, accessing the `typeRep`

required a proxy argument to specify the type that should be represented.

`data TypeRep `

data Typeable a where

typeRep :: proxy a -> TypeRep

data Dynamic where

Dyn :: TypeRep -> a -> Dynamic

data Proxy a = Proxy

Because there is no connection between types and their representations, this implementation of `Dynamic`

requires `unsafeCoerce`

. For example, here is the old `fromDynamic`

:

`fromDynamic :: forall d. Typeable d => Dynamic -> Maybe d `

fromDynamic (Dyn trx x)

| typeRep (Proxy :: Proxy d) == trx = Just (unsafeCoerce x)

| otherwise = Nothing

Client code is un-trusted. The current design (GHC 7.8) couldn't hide all such uses of `unsafeCoerce`

from the user. If they could write a `Typeable`

instance, they must use `unsafeCoerce`

, and defeat type safety. So only GHC is allowed write `Typeable`

instances.

### Kind-indexed GADTs

The key to our approach is our type-indexed type representation `TypeRep`

. But what is a type-indexed type representation? That is, the index in a type-indexed type representation is itself the represented type. For example:

- The representation of
`Int`

is the value of type`TypeRep Int`

.

– The representation of `Bool`

is the value of type `TypeRep Bool`

.

- And so on.

The idea of kind-indexed is originally from GADTs. For example, we consider designing a GADT for closed type representations:

`data TyRep :: * -> * where`

TyInt :: TyRep Int

TyBool :: TyRep Bool

GADTs differ from ordinary algebraic data types in that they allow each data constructor to constrain the type parameters to the datatype. For example, the `TyInt`

constructor requires that the single parameter to `TyRep`

be `Int`

.

We can use type representations for type-indexed programming a simple example liked computing a default element for each type.

`zero :: forall a. TyRep a -> a`

zero TyInt = 0 -- ‘a’ must be Int

zero TyBool = False -- ‘a’ must be Bool

This code pattern matches the type representation to determine what value to return. Because of the nonuniform type index, pattern matching recovers the identity of the type variable `a`

.

- In the first case, because the data constructor is
`TyInt`

, this parameter must be`Int`

, so`0`

can be returned. - In the second case, the parameter
`a`

must be equal to`Bool`

, so returning`False`

is well-typed.

However, the GADT above can only be used to represent types of kind `*`

. To represent type constructors with kind `* -> *`

, such as `Maybe`

or `[]`

, we could create a separate datatype, perhaps called `TyRep1`

, `TyRep2`

,... Kind polymorphism which allows data types to be parameterized by kind variables as well as type variables, could be the solution. However, it is not enough to unify the representations for `TyRep`

. the type representation (shown below) should constrain its kind parameter.

`data TyRep (a :: k) where`

TyInt :: TyRep Int

TyBool :: TyRep Bool

TyMaybe :: TyRep Maybe

TyApp :: TyRep b -> TyRep c -> TyRep (b c)

This `TyRep`

type takes two parameters, a kind `k`

and a type of that kind (not named in the kind annotation). The data constructors constrain
`k`

to a concrete kind. For the example to be well-formed, `TyInt`

must constrain the kind parameter to *. Similarly, `TyMaybe`

requires the kind parameter to be `* -> *`

. We call this example a kind-indexed GADT because the datatype is indexed by both kind and type information.

Pattern matching with this datatype refines kinds as well as types—determining whether a type is of the form `TyApp`

makes new kind and type equalities available. For example, consider the zero function extended with a default value of the `Maybe`

type.

`zero :: forall (a :: *). TyRep a -> a`

zero TyInt = 0

zero TyBool = False

zero TyApp (TyMaybe _) = Nothing

In the last case, the `TyApp`

pattern introduces the kind variable `k`

, the type variables `b :: k -> *`

and `c :: k`

, and the type equality `a ∼ b c`

. The `TyMaybe`

pattern adds the kind equality `k ~ *`

and type equality `b ∼ Maybe`

. Combining the equality, we can show that `Maybe c`

, the type of `Nothing`

, is well-kinded and equal to `a`

.

With this design, we also enable type decomposition feature. Finally, new `TypeRep`

will be:

`data TypeRep (a :: k) where`

TrTyCon :: !Fingerprint -> TyCon -> TypeRep k -> TypeRep a

TrApp :: !Fingerprint -> TypeRep a -> TypeRep b -> TypeRep (a b)

data TyCon = TyCon { tc_module :: Module, tc_name :: String }

data Module = Module { mod_pkg :: String, mod_name :: String }

The current implementation of `TypeRep`

allows constant-time comparison based on fingerprints. To support this in the new scheme we would want to add a fingerprint to every TypeRep node. But we would not want clients to see those fingerprints.

The `TyCon`

type, which is a runtime representation of the “identity” of a type constructor, is now silently generates a binding for a suitable instance for every datatype declaration by GHC. For example, for `Maybe`

GHC will generate:

`$tcMaybe :: TyCon `

$tcMaybe = TyCon { tc_module = Module { mod_pkg = "base"

, mod_name = "Data.Maybe"

}

, tc_name = "Maybe"

}

The name `$tcMaybe`

is not directly available to the programmer. Instead (this is the second piece of built-in support), GHC’s type-constraint solver has special behavior for `Typeable`

constraints, as follows.

To solve `Typeable(t1 t2)`

, GHC simply solves `Typeable t1`

and `Typeable t2`

, and combines the results with `TrApp`

. To solve `Typeable T`

where `T`

is a type constructor, the solver uses `TrTyCon`

. The first argument of `TrTyCon`

is straight-forward: it is the (runtime representation of the) type constructor itself, e.g `$tcMaybe`

.

But `TrTyCon`

also stores the representation of the kind of this very constructor, of type `TypeRep k`

. Recording the kind representations is important, otherwise we would not be able to distinguish, say, `Proxy :: * -> *`

from `Proxy :: (* -> *) -> *`

, where Proxy has a polymorphic kind `(Proxy :: forall k. k -> *)`

. We do not support direct representations of kind-polymorphic constructors like `Proxy`

, rather
`TrTyCon`

encodes the instantiation of a kind-polymorphic constructor (such as Proxy).

Notice that `TrTyCon`

is fundamentally insecure: you could use it to build a `TypeRep t`

for any `t`

whatsoever. That is why we do not expose the representation of `TypeRep`

to the programmer. Instead the part of GHC’s Typeable solver that builds `TrTyCon`

applications is part of GHC’s trusted code base.

`TypeRep`

is abstract, and thus we don't use proxy to determine the `TypeRep`

value anymore:

`-- before`

class Typeable a where

typeRep :: proxy a -> TypeRep

-- after

data Typeable a where

typeRep :: TypeRep

### Kind equalities

We also need to recompute representation equality function `eqT`

. It is easy, just need to compare equality between 2 `TypeRep`

fingerprint:

`data (a :: k1) :~~: (a :: k2) where`

HRefl :: forall (a :: k). a :~~: a

eqT :: forall k1 k2 (a :: k1) (b :: k2)

. TypeRep a -> TypeRep b -> Maybe (a :~~: b)

eqT a b

| typeRepFingerprint a == typeRepFingerprint b = Just (unsafeCoerce Refl)

| otherwise = Nothing

It is critical that this function returns `(:~~:)`

, not `(:~:)`

. GHC can't compile. This is because `TyCon`

s exist at many different kinds. For example, `Int`

is at kind `*`

, and `Maybe`

is at kind `* -> *`

. Thus, when comparing two `TyCon`

representations for equality, we want to learn whether the types and the kinds are equal. If we used type equalities `(:~:)`

here, the `eqTypeRep`

function could be used only when we know, from some other source, that the kinds are equal.

Richard A. Eisenberg proposed and implemented kind heterogeneous equalities (2013 - 2015). It enable new, useful features such as kind-indexed GADTs, promoted GADTs and kind families. This extension was experiment in GHC 8.0.1, then was provided in `Data.Type.Equality`

module.

The restriction above exists because GHC reasons about only type equality, never kind equality. The solution to all of these problems is simple to state: merge the concepts of type and kind. If types and kinds are the same, then we surely have kind equalities. In order to overcome those challenges, it has been necessary to augment GHC’s internal language, `System FC`

. This is beyond the scope of this post. If you need to dig into detail, let's read this paper.

`fromDynamic`

turns out like this:

`fromDynamic :: forall d. Typeable d => Dynamic -> Maybe d`

fromDynamic (Dyn (ra :: TypeRep a) (x :: a))

= case eqT ra (typeRep :: TypeRep d) of

Nothing -> Nothing

Just HRefl -> Just x

We use `eqT`

to compare the two `TypeRep`

s, and pattern-match on `HRefl`

, so that in the second case alternative we know that `a`

and `d`

are equal, so we can return `Just x`

where a value of type `Maybe d`

is needed. More generally, `eqT`

allows to implement type-safe cast, a useful operation in its own right.

### Decomposing polykinds representations

So far, we have discussed type representations for only types of kind `*`

. The only operation we have provided over `TypeRep`

is `eqT`

, which compares two type representations for equality. Does `(,)`

which has kind `(* -> *)`

too have a `TypeRep`

? For example, how can we decompose the type representation, to check that it indeed represents a pair, and extract its first component?

`dynFst :: Dynamic -> Maybe Dynamic`

dynFst (Dyn rpab x)

= ...

Of course it must. Since types in Haskell are built via a sequence of type applications (much like how an expression applying a function to multiple arguments is built with several nested term applications), the natural dual is to provide a way to decompose type applications. Let's take a look at `TrApp`

definition:

`data TypeRep a where `

TrCon :: !Fingerprint -> TyCon -> TypeRep k -> TypeRep a

TrApp :: !Fingerprint -> TypeRep a -> TypeRep b -> TypeRep (a b)

`TrApp`

allows us to observe the structure of types and expose the type equalities it has discovered to the type checker. Now we can implement
`dynFst`

:

`dynFst :: Dynamic -> Maybe Dynamic`

dynFst (Dyn (TrApp _ rpa rb) x)

= case rpa of

TrApp rp ra -> case eqT rp (typeRep :: TypeRep (,)) of

Just Refl -> Just $ Dyn ra (fst x)

Nothing -> Nothing

_ -> Nothing

dynFst _ = Nothing

We check that the `TypeRep`

of x is of form `(,) a b`

by decomposing it twice. Then we must check that `rp`

, the `TypeRep`

of the function part of this application, is indeed the pair type constructor `(,)`

; we can do that using `eqT`

. These three GADT pattern matches combine to tell the type checker that the type of `x`

, which began life in the `(Dyn rpab x)`

pattern match as an existentially-quantified type variable, is indeed a pair type `(a, b)`

. So we can safely apply `fst`

to `x`

, to get a result whose type representation `ra`

we have in hand.

The code is simple enough, but the type checker has to work remarkably hard behind the scenes to prove that it is sound. Let us take a closer look with kind signatures added:

`data TypeRep a where `

TrApp :: forall k1 k2 (a :: k1 -> k2) (b :: k1)

. !Fingerprint -> TypeRep a -> TypeRep b -> TypeRep (a b)

Note that `k1`

, the kind of `b`

is *existentially* bound in this data structure, meaning that it does not appear in the kind of the result type `(a b)`

. We know the result kind of the type application but there is no way to know the kinds of the sub-components.

With kind polymorphism in mind, let’s add some type annotations to see what existential variables are introduced in `dynFst`

:

dynFst :: Dynamic -> Maybe Dynamic

-- dynFst (Dyn (rpab :: TypeRep pab) (x :: pab))

dynFst (Dyn (TrApp _ (rpa :: TypeRep pa) (rb :: TypeRep b)) (x :: pab))

-- introduces kind k2, and types pa :: k2 -> *; b :: k2

= case rpa of

TrApp (rp :: TypeRep p) (ra :: TypeRep a) ->

-- introduces kind k1, and types p :: k1 -> k2 -> *, a :: k1

case eqT rp (typeRep :: TypeRep (,)) of

Just Refl -> Just $ Dyn ra (fst x)

-- introduces p ~ (,) and (k1 -> k2 -> *) ~ (* -> * -> *)

Nothing -> Nothing

_ -> Nothing

dynFst _ = Nothing

Focus on the arguments to the call to `eqT`

in the third line. We know that:

- rp :: TypeRep p and p :: k1 -> k2 -> *
- typeRep :: TypeRep (,) and (,) :: * -> * -> *

So `eqT`

must compare the `TypeRep`

s for two types of different kinds; if the runtime test succeeds, we know not only that `p ~ (,)`

, but also that `(k1 ~ *)`

and `(k2 ~ *)`

. That is, the pattern match on `Refl`

GADT constructor brings local kind equalities into scope, as well as type equalities.

### Better Pattern matching with PatternSynonyms

The code above is ugly. Moreover, we don't want to expose `TypeRep`

constructor to users. Earliest solution is returning another GADT:

`data AppResult t where`

App :: TypeRep a -> TypeRep b -> AppResult (a b)

splitApp :: TypeRep a -> Maybe (AppResult a)

splitApp (TrApp _ ra rb) = Just $ App ra rb

splitApp _ = Nothing

dynFst :: Dynamic -> Maybe Dynamic

dynFst (Dyn rpab x) = do

App rpa rb <- splitApp rpab

App rp ra <- splitApp rpa

Refl <- eqT rp (typeRep :: TypeRep (,))

return $ Dyn ra (fst x)

However, we can make it better with `PatternSynonyms`

extension which was provided since GHC 7.8.

Pattern synonyms enable giving names to parametrized pattern schemes. They can also be thought of as abstract constructors that don’t have a bearing on data representation. GHC User's Guide

`pattern App :: forall k2 (t :: k2). ()`

=> forall k1 (a :: k1 -> k2) (b :: k1). (t ~ a b)

=> TypeRep a -> TypeRep b -> TypeRep t

pattern App f x <- TrApp _ f x

where App f x = mkTrApp f x

dynFst :: Dynamic -> Maybe Dynamic

dynFst (Dyn (App (App rp ra) rb) x) = do

Refl <- eqT rp (typeRep :: TypeRep (,))

return $ Dyn ra (fst x)

dynFst _ = Nothing

With this extension, you can not only hide the representation of the datatype, but also use it in pattern matching. Our code is much cleaner.

### Decompose function type, TypeInType and Dependent Types

We also need to implement `dynApply`

, which applies a function Dynamic to an argument Dynamic. It is necessary in real-world application, e.g, send an object with a function type, say `Bool -> Int`

, over the network.

`dynApply :: Dynamic -> Dynamic -> Maybe Dynamic`

dynApply (Dyn rf f) (Dyn rx x) = ?

In theory, We can use `TrApp`

and `App`

pattern to construct and decompose function type `(->)`

too. The definition of this function is fairly
straightforward:

`dynApply :: Dynamic -> Dynamic -> Maybe Dynamic`

dynApply (Dyn (TrApp (TrApp tr targ) tres) fun) (Dyn targ' arg)

| Just HRefl <- eqT tr (typeRep :: TypeRep (->))

, Just HRefl <- eqT targ targ'

= Just (Dyn tres (fun arg))

dynApply _ _ = Nothing

However, because functions are quite ubiquitous, we should define another constructor for the sake of efficiency:

`data TypeRep (a :: k) where `

TrFun :: TypeRep a -> TypeRep b -> TypeRep (a -> b)

This definition wasn't compiled before GHC 8.0. First, `TrApp`

and `TrFun`

is ambiguous that can be solved with explicit quantification. Second, GHC can't know that kind `TypeRep :: k -> *`

is identical with `* -> *`

.

Kind equality extends the idea of kind polymorphism by declaring that types and kinds are indeed one and the same. Nothing within GHC distinguishes between types and kinds. Another way of thinking about this is that the type `Bool`

and the “promoted kind” `'Bool`

are actually identical.

One simplification allowed by combining types and kinds is that the type of

`Type`

is just`Type`

. It is true that the`Type :: Type`

axiom can lead to non-termination, but this is not a problem in GHC, as we already have other means of non-terminating programs in both types and expressions. This decision (among many, many others) does mean that despite the expressiveness of GHC’s type system, a “proof” you write in Haskell is not an irrefutable mathematical proof. GHC promises only partial correctness, that if your programs compile and run to completion, their results indeed have the types assigned. It makes no claim about programs that do not finish in a finite amount of time. GHC User's Guide

To enable `(* ::*)`

axiom, you have to enable `TypeInType`

extension, which is a deprecated alias of `PolyKinds`

, `DataKinds`

and `KindSignatures`

. Its functionality has been integrated into these other extensions. With this extension, GHC can know that `k ~ *`

, and the code can compile.

`data TypeRep (a :: k) where`

TrTyCon :: !Fingerprint -> TyCon -> TypeRep k -> TypeRep (a :: k)

TrApp :: forall k1 k2 (a :: k1 -> k2) (b :: k1).

!Fingerprint

-> TypeRep (a :: k1 -> k2)

-> TypeRep (b :: k1)

-> TypeRep (a b)

TrFun :: forall (r1 :: RuntimeRep) (r2 :: RuntimeRep)

(a :: TYPE r1) (b :: TYPE r2).

!Fingerprint

-> TypeRep a

-> TypeRep b

-> TypeRep (a -> b)

### Levity Polymorphism

You may notice, there are `RuntimeRep`

and `TYPE`

kind signatures in TrFun constructor. They relates to Levity Polymorphism, which is implemented in GHC version 8.0.1, released early 2016.

In brief, most of types we use everyday (Int, Bool, AST, ...) are boxed value, which is represented by a pointer into the heap. The main advantage of boxed types are supporting polymorphism. The disadvantage is slow performance. Most polymorphic languages also support some form of unboxed primitive values that are represented not by a pointer but by the value itself. In Haskell, unboxed types are denoted with `MagicHash`

suffix.

`{-# LANGUAGE MagicHash #-}`

f :: Int# -> Int#

Haskell also categorizes types into `lifted`

, and `unlifted`

. Lifted types is one that is lazy. It is considered `lifted`

because it has one extra element beyond the usual ones, representing a non-terminating computation. For example, Haskell’s `Bool`

type is lifted, meaning that three `Bool`

s are possible: `True`

, `False`

,and `⊥`

. An `unlifted`

type, on the other hand, is strict. The element `⊥`

does not exist in an unlifted type.

Because Haskell represents lazy computation as thunks at runtime, all lifted types must also be boxed. However, it is possible to have boxed, unlifted types.

- Lifted - Boxed:
`Int`

,`Bool`

,... - Unlifted - Boxed:
`ByteArray#`

- Unlifted - Unboxed:
`Int#`

,`Bool#`

Given these unboxed values, the boxed versions can be defined in Haskell itself; GHC does not treat them specially. For example:

`data Char = C# Char#`

data Int = I# Int#

plusInt :: Int -> Int -> Int

plusInt (I# i1) (I# i2) = I# (i1 +# i2)

Here `Int`

is an ordinary algebraic data type, with one data constructor `I#`

, that has one field of type `Int#`

. The function `plusInt`

simply
pattern matches on its arguments, fetches their contents (`i1`

and `i2`

, both of type `Int#`

), adds them using `(+#)`

, and boxes the result with `I#`

.

The issue is, like many other compilers for a polymorphic language, GHC assumes that a value of polymorphic type, such as `x :: a`

s represented uniformly by a heap pointer, or lifted type. The compiler adopts `The Instantiation Principle`

:

You cannot instantiate a polymorphic type variable with an unlifted type.

That is tiresome for programmers, but in return they get solid performance guarantees.

How can the compiler implement the instantiation principle? For example, how does it even know if a type is unlifted? By kinds, much the same way that terms are classified by types. `Type`

, or `*`

kind which we use every day is lifted. In contrast, `#`

is a new kind that classifies unlifted types, e.g: `Int#`

, `Bool#`

.

In default, polymorphism functions assume kind of parameters is `Type`

. If we attempt to instantiate it at type `Float# :: #`

, we will get a
kind error because `Type`

and `#`

are different kinds. The function arrow type `(->)`

is the same. It is just a binary type constructor with kind:

`(->) :: Type -> Type -> Type`

But now we have a serious problem: a function over unlifted types, such as `sumTo# :: Int# -> Int# -> Int#`

, becomes ill-kinded!.

For many years its “solution” was to support a sub-kinding relation. That is, GHC had a kind `OpenKind`

, a super-kind of both `Type`

and `#`

. We could then say that:

`(->) :: OpenKind -> OpenKind -> Type `

However, there are drawbacks. The combination of type inference, polymorphism, and sub-typing, is problematic. And indeed GHC’s implementation of type inference was riddled with awkward and unprincipled special cases caused by sub-kinding. Moreover, The kind `OpenKind`

would embarrassingly appear in error messages.

Levity polymorphism bring new idea: replace sub-kinding with kind polymorphism. New primitive type-level constant, `TYPE :: RuntimeRep -> Type`

is introduced with the following supporting definitions:

`data RuntimeRep = VecRep VecCount VecElem -- ^ a SIMD vector type`

| TupleRep [RuntimeRep] -- ^ An unboxed tuple of the given reps

| SumRep [RuntimeRep] -- ^ An unboxed sum of the given reps

| LiftedRep -- ^ lifted; represented by a pointer

| UnliftedRep -- ^ unlifted; represented by a pointer

| IntRep -- ^ signed, word-sized value

| WordRep -- ^ unsigned, word-sized value

| Int64Rep -- ^ signed, 64-bit value (on 32-bit only)

| Word64Rep -- ^ unsigned, 64-bit value (on 32-bit only)

| AddrRep -- ^ A pointer, but /not/ to a Haskell value

| FloatRep -- ^ a 32-bit floating point number

| DoubleRep -- ^ a 64-bit floating point number

type Type = TYPE 'LiftedRep

RuntimeRep is a type that describes the runtime representation of values of a type. `Type`

, the kind that classifies the types of values, was
previously treated as primitive, but now becomes a synonym for `TYPE Lifted`

, where `Lifted :: RuntimeRep`

. It is easiest to see how these
definitions work using examples:

`Int :: Type`

Int :: TYPE `LiftedRep -- Expanding Type

Int# :: TYPE 'IntRep

Float# :: TYPE 'FloatRep

(Int, Bool) :: Type

Maybe Int :: Type

Maybe :: Type -> Type

Any type that classifies values, whether boxed or unboxed, lifted or unlifted, has kind `TYPE r`

for some `r :: RuntimeRep`

. The type
`RuntimeRep`

tells us the runtime representation of values of that type. This datatype encodes the choice of runtime value.

We can now give proper types to `(->)`

, same as `TrFun`

. This enables polymorphism for both lifted and unlifted types.

`import GHC.Exts`

(->) :: forall (r1 :: RuntimeRep) (r2 :: RuntimeRep)

. TYPE r1 -> TYPE r2 -> Type

**Note**: Unboxed and Levity Polymorphism types are imported in `GHC.Exts`

module.

If you are more curious about Levity polymorphism, let's take a look at original paper.

### Compare TypeReps

It is sometimes necessary to use type representations in the key of a map. For example, Shake uses a map keyed on type representations to look up class instances (dictionaries) at runtime; these instances define class operations for the types of data stored in a collection of `Dynamic`

s. Storing the class operations once per type, instead of with each `Dynamic`

package, is much more efficient.

More specifically, we would like to implement the following interface:

`data TyMap `

empty :: TyMap

insert :: Typeable a => a -> TyMap -> TyMap

lookup :: Typeable a => TyMap -> Maybe a

But how should we implement these type-indexed maps? One option is to use `HashMap`

. We can define the typed-map as a map between the type representation and a dynamic value.

`data SomeTypeRep where`

SomeTypeRep :: forall k (a :: k). !(TypeRep a) -> SomeTypeRep

type TyMap = HashMap SomeTypeRep Dynamic

Notice that we must wrap the `TypeRep`

key in an existential `SomeTypeRep`

, otherwise all the keys would be for the same type, which would rather defeat the purpose! The `insert`

and `lookup`

functions can then use `toDynamic`

and `fromDynamic`

to ensure that the right type of value is stored with each key.

`import Data.HashMap.Strict as Map`

insert :: Typeable a => a -> TyMap -> TyMap

insert x = Map.insert (SomeTypeRep (typeRep :: TypeRep a)) (toDynamic x)

lookup :: Typeable a => TyMap -> Maybe a

lookup = fromDynamic <=< Map.lookup (SomeTypeRep (typeRep :: TypeRep a))

Because `Map`

family uses balanced binary trees to achieve efficient lookup, `SomeTypeRep`

must be an instance of `Ord`

. This is straightforward, since `TypeRep`

use fingerprint for unique hash, it can be compared too.

`instance Ord SomeTypeRep where`

SomeTypeRep a `compare` SomeTypeRep b =

typeRepFingerprint a `compare` typeRepFingerprint b

Notice that we cannot make an instance for `Ord (TypeRep a)`

: if we compare two values both of type `TypeRep t`

, following the signature of
compare, they should always be equal!

### Other Changes

- Type-indexed type representation interface is in
`Type.Reflection`

module.`Data.Typeable`

provides type representations which are qualified over this index, providing an interface very similar to the`Typeable`

notion seen in previous releases for backward compatibility. `Data.Typeable.TypeRep`

and`Type.Reflection.TypeRep`

are different.`Data.Typeable.TypeRep`

is alias of`SomeTypeRep`

`TypeRep`

definition is replaced with record, to easier to extend parameters

`data TypeRep (a :: k) where`

TrType :: TypeRep Type

TrTyCon :: { trTyConFingerprint :: {-# UNPACK #-} !Fingerprint

, trTyCon :: !TyCon

, trKindVars :: [SomeTypeRep]

, trTyConKind :: !(TypeRep k)

}

-> TypeRep (a :: k)

TrApp :: forall k1 k2 (a :: k1 -> k2) (b :: k1).

{

trAppFingerprint :: {-# UNPACK #-} !Fingerprint

, trAppFun :: !(TypeRep (a :: k1 -> k2))

, trAppArg :: !(TypeRep (b :: k1))

, trAppKind :: !(TypeRep k2) }

-> TypeRep (a b)

TrFun :: forall (r1 :: RuntimeRep) (r2 :: RuntimeRep)

(a :: TYPE r1) (b :: TYPE r2).

{

trFunFingerprint :: {-# UNPACK #-} !Fingerprint

, trFunArg :: !(TypeRep a)

, trFunRes :: !(TypeRep b) }

-> TypeRep (a -> b)

- The kind of the TypeRep in each
`TrTyCon`

and`TrApp`

constructor is cached. This is necessary to ensure that typeRepKind (which is used, at least, in deserialization and dynApply) is cheap, because calculating the kind of type constructor and nested type applications is pricy, - We need to be able to represent
`TypeRep Type`

. This is a bit tricky because`typeRepKind (typeRep @Type) = typeRep @Type`

, so if we actually cache the`TypeRep`

of the kind of`Type`

, we will have a loop. One simple way to do this is to make the cached kind fields lazy and allow`TypeRep Type`

to be cyclical.

## Limitation and unexplored future

Our interface does not support representations of polymorphic types, such as `TypeRep (∀ a. a -> a)`

. Although plausible, supporting those in our setting brings in a whole new range of design decisions that are as of yet unexplored (e.g. higher-order representations vs. de-Bruijn?). Furthermore, it requires the language to support impredicative polymorphism (the ability to instantiate quantified variables with polymorphic types, for instance the `a`

variable in `TypeRep a`

or `Typeable a`

), which GHC currently does not. Finally, representations of polymorphic types have implications on semantics and possibly parametricity.

Similarly, constructors with polymorphic kinds would require impredicative kind polymorphism. A representation of type `TypeRep (Proxy :: ∀ kp. kp -> *)`

would require the kind parameter `k`

of `TypeRep (a :: k)`

to be instantiated to the polymorphic kind `∀ kp. kp -> *`

. Type inference for impredicative kind polymorphism is no easier than for impredicative type polymorphism and we have thus excluded this possibility.

## Summary

`Typeable`

take a long journey for definition, design, refactor and redesign, from runtime term level to type-safe. However, the journey doesn't stop here. GHC is still growing, better and safer, with innovation and contribution of brilliant researchers. It motivates us to study and do great software and bring toward the industry.

## Acknowledgment

Thanks to Richard A. Eisenberg and his team for great research and contribution. This post take many reference in their papers.

## Appendix

- The complete code in "A reflection on types" paper: link
- Homogeneous equality ~ Type equality: (
`a :~: b`

) - Heterogeneous equality ~ Kind equality: (
`a :~~: b`

) - Universal quantification:
`forall a. a -> a`

- Existentially quantified type:
`forall a. Show a => a -> String`

## References

- Dynamic Typing in a Statically Typed Language - Martin Abadi, Luca Cardelli, Benjamin Pierce, Gordon Plotkin (1989)
- Kind polymorphism - GHC Language Features. Chapter 7
- Scrap your boilerplate: a practical approach to generic programming - Ralf Lämmel, Simon Peyton Jones (January 2003)
- A Lightweight Implementation of Generics and Dynamics - James Cheney, Ralf Hinze (October 3, 2002)
- Giving Haskell a Promotion - Brent A. Yorgey, Stephanie Weirich, Julien Cretin, Simon Peyton Jones, Dimitrios Vytiniotis, Jose Pedro Magalhaes (2012)
- Who invented proxy passing and when? - Stackoverflow
- What is Levity polymorphism - Stackoverflow
- Levity polymorphism (extended version) - Richard A. Eisenberg, Simon Peyton Jones (2017)
- TypeLevelReasoning Proposal - Richard A. Eisenberg
- An overabundance of equality: Implementing kind equalities into Haskell - Richard A. Eisenberg (September 22, 2015)
- Typed reflection in Haskell - Simon Peyton Jones, Stephanie Weirich, Richard A. Eisenberg, Dimitrios Vytiniotis - Proc Philip Wadler's 60th birthday Festschrift, Edinburgh, April 2016
- Safer and more expressive type representations - Haskell Trac Wiki
- Types-safe Distributed Haskell - Haskell Trac Wiki
- System FC with Explicit Kind Equality
- A reflection on types
- What does the exclamation mark mean in a Haskell declaration? - Stackoverflow
- Haskell Git Repository
- Towards Haskell in the Cloud - Jeff Epstein, Andrew Black, Simon Peyton Jones (September 2011)
- GHC User's Guide