Thomas M. Kehrenberg

Basic building blocks of dependent type theory


This post is the first in a sequence of posts meant to be an introduction to pragmatic type theory. “Type theory” here refers to Martin-Löf dependent type theory (MLTT), as opposed to theories like the simply-typed lambda calculus; the main difference being that in MLTT, functions can also act on types and not just values, which makes it a more powerful theory.

I was initially motivated to study type theory because of a remark in the Alignment Research Field Guide about how defining terms and adding type annotations to them can make the topic of research discussions more precise, as an intermediate step between vague idea and fully formalized proof.

You might also be aware that type theory can serve as an alternative foundation for mathematics, in place of set theory. And some people think type theory is the more elegant choice of the two; this article, for example, argues for that point. Another reason to learn type theory is that it serves as the basis of most theorem provers.

I’m not aware of any short-ish but still pretty comprehensive introductions to MLTT, so that’s why I wrote this sequence of posts. The content is mostly based on the book Homotopy Type Theory, with the main difference that the book tries to convince you to do constructive mathematics, whereas I won’t. (I also won’t talk about any homotopies at all.) This means though that if you’re interested in constructive mathematics (or homotopies), you should probably just read the book instead.

I will sometimes compare type concepts to concepts found in programming languages, but if you don’t have any programming experience, you can just ignore that. What you should have, though, is a good understanding of set theory, because I will reference it frequently.

With that said, let’s get into it!

Type judgements

A simple type annotation looks like this:

x:T

where T is a type and x is an element of this type. This looks very similar to the statement “x\in T” where T is a set, and it is quite similar, but there are some important differences.

For one, types are usually stricter than sets when it comes to membership. In set theory, an object can easily be a member of multiple sets. In type theory, an object usually only belongs to a single type (though maybe that’s not quite the case when union types are involved).

The other difference is that “x\in T” can be used as a proposition within a logic formula like this:

\forall x(x\not\in A \lor x\in B)

but you cannot do that in type theory. For example, something like “(x:T)\Rightarrow (x<y)” doesn’t make any sense. Type judgements (i.e., the “:T” part) can only be applied from the outside to a full statement. For example, you could have some formula \Phi, describing the result of some computation, and then you can claim “\Phi:T” (“formula \Phi is of type T”) and that is then either true (more precisely, “well-typed”) or false (“not well-typed”) as seen from the outside. But type judgments cannot be used inside a statement as a true-or-false-valued term.

It’s like if you had a programming language that lacked any kind of type introspection, such that at runtime you couldn’t write an if-statement that branches based on the type of some object. But external programs like the compiler and static analysis tools identify the type, and tell us if we wrote our program correctly; for example, whether the values we are passing to functions have the appropriate type.

There is a caveat to the rule that something like “x:A” may not appear within formulas and this concerns universal and existential quantification. Later, we will define these two quantifiers to write formulas like

\forall(x:A).\exists(y:A).(y>x)

but even here, “x:A” is not something that can be true or false; it’s just an indicator for the “domain” of the quantifier.

So, membership in a type cannot be used as a proposition in type theory, but then how do we express things like “(x:S)\Rightarrow(x<0)” where S is, for example, some “subset” of the integers? The solution we’ll use is to define “subsets” as predicates (or characteristic functions) on some base type. So, you could, for example, define a “set” in \mathbb{N} via a predicate function, i.e., a function that takes in an n:\mathbb{N} and returns true or false. And then you could ask questions about membership in this “set” by just applying the predicate function. We will later see how this works in detail.
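To make this concrete, here is a minimal sketch in Lean 4 (a theorem prover built on a closely related dependent type theory) of a “subset” of \mathbb{N} given as a predicate function; the name isEven is just an illustrative choice:

```lean
-- A "subset" of the natural numbers, represented as a predicate function:
-- it takes an n : Nat and returns true or false.
def isEven : Nat → Bool :=
  fun n => n % 2 == 0

#eval isEven 4  -- true
#eval isEven 7  -- false
```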

Product types

Speaking of functions, how do we type-annotate those? Before we get there, let’s talk about something simpler first: pairs of values. If we have two values: a:A and b:B, then we can form a pair that has the type A\times B:

(a, b):A\times B~.

In contrast to (the usual definition of) set theory, type theory has ordered pairs as a native building block. The definition of ordered pairs goes along with a definition of projection functions (going from (a, b) to, e.g., just a) but we’ll get there later.

For n-tuples, we can pair repeatedly:

(a, (b, (c, (d, e)))): A\times(B\times(C\times(D\times E)))~.

However, we usually leave out the extra parentheses:

(a, b, c, d, e): A\times B\times C\times D\times E

while always implying that parentheses open after any “\times” and close at the very end of the expression.
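As an illustration, here is a rough sketch of pairs and tuples in Lean 4, whose × notation follows the same right-nesting convention (the names p and triple are just illustrative):

```lean
-- A pair of a natural number and a string.
def p : Nat × String := (3, "three")

-- Lean's × associates to the right, so these two are the same type and value.
def triple : Nat × Nat × Nat := (1, 2, 3)
def tripleNested : Nat × (Nat × Nat) := (1, (2, 3))

#eval p       -- (3, "three")
#eval triple  -- (1, 2, 3)
```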

Now, back to functions. If we have a function f(x), one thing we can annotate is its return type. We could maybe do it like this:

f(x):Y~.

However, we would also like to know the type of the argument x, so that we can know what values we can pass to this function. Say that our function f maps values from X to Y. Type theorists have developed a brilliant notation for this:

f:\prod_{x:X}Y

Now, you might be saying: what the heck is this? And I’d agree with you, but there is some justification for writing a function type like this. You see, a function could be seen as a very long ordered tuple, where each entry of the tuple is associated with a particular input value. The entries of the tuple all have type Y, so the type of the tuple is kind of

Y\times Y\times Y\times Y\times Y\times \dots

with one entry for every value x:X. Hence, we write it as a product: \prod_{x:X}Y. In set theory, the set of all functions from X to Y is written Y^X for a similar reason. (Why are we not writing it like that in type theory? We’ll see later how the x:X is useful.)

In contrast to set theory, functions are a fundamental concept in type theory, and are not defined in terms of other things. There are three basic building blocks in type theory: values, functions, and types. (And then you can additionally construct ordered pairs out of any combination of these.) Values could also be seen as constant functions that don’t accept any arguments, so, in a sense there are only functions and types, but we will talk about “values” as separate things nevertheless.

The basic theory of functions that type theory is built on is lambda calculus. I will just quickly summarize it here. Feel free to skip this next section if you are already familiar.

Aside on Lambda Calculus

Say you have a function like:

f(x):\equiv x\cdot x~.

If you want to apply this function now to the number 4, then the rules of lambda calculus tell you to replace all x’s with “4”:

f(4)\equiv 4\cdot 4

which probably doesn’t come as a surprise.

Now, in this example, f was a named function (its name is “f”), but often it’s inconvenient to give every function a name, so we need a notation for anonymous functions. This is most useful when some function takes another function as an argument.

For example, the Fourier transform transforms functions, and if we let the transform be denoted by \mathcal{F} then applying it to a function f looks like this: \mathcal{F}(f). However, we might not always want to give the function passed to \mathcal{F} a name. In this case we can pass it an anonymous function. My preferred notation for anonymous functions is something like

x\mapsto x\cdot x

where x is mapped to x\cdot x. So that you can write \mathcal{F}(x\mapsto x\cdot x) for its Fourier transform. But the notation used in lambda calculus is this:

\lambda x.x\cdot x

(and so we’ll be using this notation). The lambda (\lambda) is not meant to be a variable here, but a marker that marks the beginning of a function. The variable that comes after the \lambda is the input argument and what comes after the dot (.) is the function body. Applying the Fourier transform to it looks like this: \mathcal{F}(\lambda x.x\cdot x).

Anonymous functions (and named functions too, actually) may only take a single argument, but as we’ll later see, we can get around this restriction by passing in n-tuples as arguments or by using nested functions. I know this notation takes some getting used to, but it won’t feel strange after a while. As before, we’ll have the convention that the function body extends until the end of the expression (if there are no parentheses).

The following two are completely equivalent then:

\begin{aligned} &f(x):\equiv 3+ x\\ &f:\equiv(\lambda x.\,3+ x) \end{aligned}

(However, whenever possible, we will prefer the first notation.) Applying anonymous functions works exactly like named functions; all occurrences of the input argument are replaced and we can drop the part before the dot:

(\lambda x.x\cdot x)(4)\equiv 4\cdot 4~.

The variable following the \lambda is a temporary variable and the concrete letter used doesn’t matter. So the following two functions are completely identical:

(\lambda x.2\cdot x)\equiv (\lambda y.2\cdot y)~.

Sometimes you have to be a bit careful in function application when there is a risk of a naming clash. This can happen when you have nested functions.[1]
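For comparison, here is a small sketch of the same ideas in Lean 4, where fun x => ... plays the role of \lambda x.\dots (the names f and f' are just illustrative):

```lean
-- A named function and the equivalent definition via an anonymous function.
def f (x : Nat) : Nat := 3 + x
def f' : Nat → Nat := fun x => 3 + x

-- Applying an anonymous function directly, as in (λx. x·x)(4) ≡ 4·4.
#eval (fun x : Nat => x * x) 4  -- 16
```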

Function types

The identity function on the integers \mathbb{Z} has the following type:

(\lambda x. x):\prod_{x:\mathbb{Z}}\mathbb{Z}

Notice that functions are instances of some function type; a single function type can be associated with many different concrete functions.

Sometimes we also explicitly annotate the type of the input argument:

(\lambda (x:\mathbb{Z}). x):\prod_{x:\mathbb{Z}}\mathbb{Z}

However, we usually only do this if the type is not fixed, as in the following example.

As it is, our identity function only works for one type: \mathbb{Z}. But it would be nice if it worked for any type. We can achieve this by constructing a function that takes in a type as an additional argument, which in programming languages is known as a generic function. To do this, we need to talk about the type of types, so that we can specify the type of that additional argument.

The type of a type is called a universe \mathcal{U}. However, there can’t be a single universe that contains all types because then it would have to contain itself, which leads to problems like Russell’s paradox. So, instead, there will be an infinite hierarchy of type universes where each universe is contained in the next universe in the hierarchy: \mathcal{U}_i:\mathcal{U}_{i+1}. Additionally, we’ll say this hierarchy is cumulative, so if T:\mathcal{U}_i then also T:\mathcal{U}_{i+1}. These details don’t really matter to us though. What’s important to know is that for any type T, there is some universe \mathcal{U}_i in the hierarchy of universes that contains this type: T:\mathcal{U}_i. As the exact universe is usually not important, we’ll usually drop the index i.

With this knowledge, we can now define the generic identity function:

\mathsf{id}:\equiv(\lambda T. (\lambda (x:T). x))\,:\,\prod_{T:\mathcal{U}}\left(\prod_{x:T}T\right)

Let’s take a moment to understand this. We have nested functions: the outer function takes a type parameter T and then returns the inner function, which is just the identity function, but on type T (which we annotated explicitly). To get the identity function on the integers, you would do: \mathsf{id}(\mathbb{Z}), and then \mathsf{id}(\mathbb{Z})(3)\equiv 3. For notational convenience, people often write the first type argument as a subscript:

\mathsf{id}_\mathbb{Z}(3)\equiv 3
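As a sketch of how this looks in Lean 4: the type argument is simply an explicit first argument (named id' here to avoid clashing with the built-in id). Lean usually makes such type arguments implicit, which plays roughly the role of the subscript convention above.

```lean
-- The generic identity function: the first argument is a type T, the second an element of T.
def id' : (T : Type) → T → T :=
  fun T x => x

#eval id' Nat 3      -- 3
#eval id' Bool true  -- true
```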

In general, functions with more than one argument can be realized in two ways: either by modeling them as nested functions that each only take one argument (also called currying), or by modeling them as a single function that takes an ordered pair or n-tuple as argument. However, we could not have realized the generic identity function using the latter method, because the type of the second argument depended on the value of the first argument. Currying is strictly more flexible than using tuples. Additionally, you always need projection functions when working with tuples. Speaking of which, here they are for an ordered pair:

\begin{aligned} \mathsf{pr}_1:\equiv \lambda(a, b). a : \prod_{p:A\times B}A\\ \mathsf{pr}_2:\equiv \lambda(a, b). b : \prod_{p:A\times B}B \end{aligned}

The way we have defined these functions here is called function definition by pattern matching, because we have written the input argument in an unpacked form. The non-pattern-matching notation would have been \lambda p.(\dots), where p is an ordered pair. However, with that notation we cannot express what this function does because… we haven’t defined any way to unpack a tuple yet; that’s what we’re trying to do right now! So, pattern matching is a very convenient way to describe new functionality that our theory didn’t have yet, even though it’s not technically legal in lambda calculus. (Don’t worry though, pattern matching can be defined rigorously; we just won’t do that here.) We will use pattern matching for later function definitions as well.
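Here is a sketch of the two projection functions in Lean 4, defined by pattern matching on the pair (Lean already ships these as Prod.fst and Prod.snd; pr1 and pr2 are just illustrative names):

```lean
-- Projections on an ordered pair, defined by matching the argument against (a, b).
def pr1 {A B : Type} : A × B → A
  | (a, _) => a

def pr2 {A B : Type} : A × B → B
  | (_, b) => b

#eval pr1 (3, "hi")  -- 3
#eval pr2 (3, "hi")  -- "hi"
```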

Just as a further example, here is the type of the add function on the reals, using currying:

\mathsf{add}: \prod_{x:\mathbb{R}} \prod_{y:\mathbb{R}} \mathbb{R}

We left out the parentheses in the type, as is customary. And here it is using pairs:

\mathsf{add}: \prod_{p:\mathbb{R}\times\mathbb{R}} \mathbb{R}

Multiplication has the same type, of course.
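A sketch of the two styles in Lean 4, with Int standing in for the reals (addCurried and addPaired are just illustrative names):

```lean
-- Currying: a function that takes one argument and returns another function.
def addCurried : Int → Int → Int :=
  fun x y => x + y

-- The tupled version: a single function that takes an ordered pair.
def addPaired : Int × Int → Int :=
  fun p => p.1 + p.2

#eval addCurried 2 3    -- 5
#eval addPaired (2, 3)  -- 5
```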

Sum types

In the beginning I said that type theory lacks the type introspection that most programming languages have. But actually, through some cleverness, we can get something very similar. The setup is that we have some value x, but we want to allow for it to be either of type A or of type B. The problem is that nothing within our theory can query the type. But we can solve this through tagged unions. Instead of a bare value x, we’ll have a pair (t, x) where the first element is a tag and the type of the second element depends on the value of the tag. This way, we can find out the type of x simply by checking the value of the tag! (Assuming of course that we tagged everything correctly.)

You might think we could just use the pair type for this – like A\times B – but that’s not possible because we need the type of the second entry to depend on the value of the first entry. This can’t be done with what we already have, so we need to introduce a new type formation rule. And that is the sum type, also called dependent pair type (because it’s a pair where the type of the second entry is dependent on the first entry):

(t, x):\sum_{t:A}B(t)

This means that t is of type A, and the type of x is B(t), where B is a function that takes in t and returns a type (so, it’s a type-valued function). Now, why is this written as a sum? It makes sense in a certain way: As we learned, the type of a pair is X\times Y, which is kind of like a multiplication. But a multiplication is just repeated addition, so we could say

X\times Y \equiv\sum_{x:X}Y~.

Thus, if we have an ordered pair where the second element depends on the value of the first, then the sum \sum_{x:X}Y(x) is a somewhat reasonable way to write this. The sum can also be seen as a disjoint union over the family of “sets” B(t) that is indexed by t:A where the tag ensures that the “sets” are definitely disjoint. The resulting type can have elements from all the individual types B(t).

In order to do anything with an instance of this tagged union, we need to handle all possible types that the second element of the pair (t, x) can have. This means we can’t easily write down a projection function for the second element, because we don’t know what the return type will be.

So, instead of defining a projection function, we’ll define how some function g could consume a tagged union. g will need to accept two arguments: the tag t and the actual value x. The type has to be something like this:

g:\prod_{t:A}\prod_{x:B(t)}C

where C is simply an arbitrary return type of g (which could be another product or sum type). We can now define the function \mathsf{rec}, of which every sum type has its own version; it takes in a return type C, a function g, and an element of the sum type:

\begin{aligned} \mathsf{rec}_{\Sigma_{t:A}B(t)}:\prod_{C:\mathcal{U}}~\prod_{g:\Pi_{t:A}\Pi_{x:B(t)}C}~\prod_{p:\Sigma_{t:A}B(t)}C\\ \mathsf{rec}_{\Sigma_{t:A}B(t)}(C, g, (t, x)) :\equiv g(t)(x) \end{aligned}

Note that we had to make \mathsf{rec}(\cdot) generic in C in order to make sure that the overall return type matches the return type of g. Note also that this is another instance of function definition by pattern matching; we’re unpacking the (t, x) pair in the input argument definition, which isn’t normally a legal operation.
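If it helps, here is a rough sketch of the same recursor in Lean 4, where the dependent pair type \sum_{t:A}B(t) is written (t : A) × B t and recSigma is just an illustrative name:

```lean
-- The recursor for a dependent pair type: to consume a pair ⟨t, x⟩, supply a
-- function g that can handle the value x : B t for every possible tag t : A.
def recSigma {A : Type} {B : A → Type} (C : Type)
    (g : (t : A) → B t → C) : ((t : A) × B t) → C
  | ⟨t, x⟩ => g t x
```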

If you only have a handful of types that you want to take the union over, then this whole \sum_{x:X}Y(x) setup can be a bit overkill. For that case, there is a simpler notation that doesn’t mention the tags explicitly. If A and B are types, then the following is also a type:

A+B

It’s called a binary sum type and the notation follows straightforwardly from the idea that we were taking a sum over types.

Even though you can’t see any explicit tags in this definition, they are still kind of there, because for example if a:A then we do not have a:A+B. The elements of A+B are different from the elements of A and B. In order to turn an a:A into an element of A+B, we have to use a special function, one of which is associated with every binary sum type, and there is of course also one for B. These “tagging functions” will get the names \mathsf{left} and \mathsf{right}:

\begin{aligned} \mathsf{left}&:\prod_{a:A}(A+B)\\ \mathsf{right}&:\prod_{b:B}(A+B)\\ \mathsf{left}(a):A+B~&,\quad\mathsf{right}(b):A+B \end{aligned}

You can see how they can turn an element of A into an element of A+B. In order to use one of the values of A+B, you again need a way to handle both possibilities. So, say you have functions g_0 and g_1 which take an argument of type A and B respectively, but both return something of type C:

\begin{aligned} g_0:\prod_{a:A}C\\ g_1:\prod_{b:B}C \end{aligned}

Then we can use a function called \mathsf{rec} to map an element of A+B to the right function:

\mathsf{rec}_{A+B}:\prod_{C:\mathcal{U}}~\prod_{g_0:\Pi_{a:A}C}~\prod_{g_1:\Pi_{b:B}C}~\prod_{p:A+B}C

It’s a bit tricky to give the implementation of \mathsf{rec}_{A+B}, because we don’t have explicit tags – we only have the tagging functions. This means we can only describe from the outside how \mathsf{rec}_{A+B} will work: if the element of A+B was tagged with \mathsf{left}, then we execute g_0, and if it was tagged with \mathsf{right}, then we execute g_1:

\begin{aligned} \mathsf{rec}_{A+B}(C, g_0, g_1, \mathsf{left}(a)) :\equiv g_0(a)\\ \mathsf{rec}_{A+B}(C, g_0, g_1, \mathsf{right}(b)) :\equiv g_1(b) \end{aligned}

The knowledge of whether an element of A+B comes from A or B is never encoded anywhere within the system, but from the outside, we can choose the correct function definition to execute. We have to assert that this function exists, because it cannot be defined with the other building blocks that we have. This is again an instance of a function definition by pattern matching; the suitable definition can be found by pattern matching any value against \mathsf{left}(a) and \mathsf{right}(b).

It’s important to state here that in practice, you would likely not use the \mathsf{rec}_{A+B} function explicitly. Rather, you can just allude to the fact that it’s possible to do two different things depending on which original type an element of a union had. Like, you would say “let x:A+B, if x came from A, do this; otherwise do that.”
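Here is a sketch of the binary sum type and its recursor in Lean 4, where Sum.inl and Sum.inr play the roles of \mathsf{left} and \mathsf{right} (recSum is an illustrative name):

```lean
-- The recursor for A ⊕ B: run g0 if the element was tagged with "left",
-- and g1 if it was tagged with "right".
def recSum {A B : Type} (C : Type) (g0 : A → C) (g1 : B → C) : A ⊕ B → C
  | .inl a => g0 a
  | .inr b => g1 b

#eval recSum String (fun n : Nat => toString n) (fun b : Bool => toString b) (.inl 42)  -- "42"
```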

Examples

An example of a function that can be modeled as returning a union is division on the rational numbers, where the divisor is allowed to be 0. As we know, division by 0 is not defined, but you could return a special symbol when dividing by 0. So, let’s say \mathbf{1} is the type that only contains one element, for which we could use the symbol \star:\mathbf{1}, and we return that symbol when dividing by 0. Then the function would have this type:

\mathsf{div}:\prod_{x:\mathbb{Q}}\prod_{y:\mathbb{Q}}\mathbb{Q}+\mathbf{1}

When dealing with the output of this function in your derivation, you could (sloppily) say: “If the output is some z:\mathbb{Q}, then we do X next; otherwise we handle it with Y.”
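A minimal sketch of such a division function in Lean 4, using Nat in place of \mathbb{Q} to stay within the core library, and Unit in the role of the one-element type \mathbf{1} (safeDiv is an illustrative name):

```lean
-- Division that returns a tagged union: a number on success, () as the error symbol.
def safeDiv (x y : Nat) : Nat ⊕ Unit :=
  if y = 0 then .inr ()   -- the special symbol ⋆ for division by zero
  else .inl (x / y)

#eval safeDiv 6 3  -- Sum.inl 2
#eval safeDiv 6 0  -- Sum.inr ()
```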

An example of a use case for non-binary sum types is the definition of mathematical “data structures” like a ring or a group. These data structures are usually defined as a tuple, where the first element is a set, and the other elements are functions or other structures based on that set. In type theory, the set would be a type, obviously, but when we try to just define the data structure as a tuple, we will encounter the problem that the type of later tuple entries depends on the value of the first tuple entry.

To make this concrete, let’s consider groups. A group is a set with a binary operation, \cdot, and a neutral element, e. So, groups should have this type:

(T,\cdot, e)~:~\mathcal{U}\times \left(\prod_{x:T}\prod_{y:T}T \right)\times T

A tuple of a type, a binary operation and an element of the type. However, how do we ensure that T is actually the type given in the first entry of the tuple? Clearly, we need a sum type here; after all, a sum type is just a pair where the type of the second element depends on the value of the first element:

\mathsf{Group}~:\equiv~ \sum_{T:\mathcal{U}}\; \left(\prod_{x:T}\prod_{y:T}T \right)\times T

As usual, we left out the parentheses around the last two terms. This type of all groups is a union over all pairs of a binary operation and a neutral element, tagged with the type that they’re operating in. With this type, we could now write a function that takes in an arbitrary group, and it would know exactly what to do with it, because the “tag” (the first element of this 3-tuple) would tell the function what the underlying type is and what to expect with the binary operation and the neutral element.
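A sketch of this group type in Lean 4, where the dependent pair tags the operation and neutral element with their carrier type (GroupData is an illustrative name, and the group laws are omitted, just as above):

```lean
-- A "group" as a dependent triple: a carrier type T, a binary operation on T,
-- and a neutral element of T. Because the first entry is a type, this lives in Type 1.
def GroupData : Type 1 := (T : Type) × ((T → T → T) × T)

-- The integers with addition and 0 form such a triple.
def intAddGroup : GroupData := ⟨Int, (· + ·), 0⟩
```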

Similarly, in set theory, an ordered set is technically a tuple of a set and an order relation. In type theory, we can analogously have a dependent pair (aka a sum type) of a type and an order relation on that type: (T, >) where “>” is a relation on T. However, we haven’t discussed how to define relations yet.


This concludes the first part of this series. In the next post, we’ll learn how to define functions that return truth values, which includes predicate functions and relations.

Thanks to Simon Fischer for comments on a draft of this post.


  1. For example, the following function returns another function that always returns a constant, which is set by the outer function:

    \lambda x.(\lambda y.x)

    So, for example, this returns a function that always returns 4:

    (\lambda x.(\lambda y.x))(4)\equiv\lambda y.4

    It might be a little clearer if we write this with the other notation:

    \big(x\mapsto (y\mapsto x)\big)(4)\equiv(y\mapsto 4)~.

    Now, the problem appears if we re-use one of the inner argument variables outside:

    (\lambda x.(\lambda y.x))(y)\equiv \;?

    If we were just blindly following the substitution rule, we would get this:

    (\lambda x.(\lambda y.x))(y)\equiv \lambda y.y

    which is of course not correct, because the result of a function application shouldn’t change just because we renamed some variables. To get the correct result, we need to avoid the naming clash by renaming the inner argument variable:

    (\lambda x.(\lambda z.x))(y)\equiv \lambda z.y~.


 

 

