## Overview

This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.

Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.

## Theories

The first concept we’ll need is that of a theory. If you’ve ever implemented an interpreter for even the simplest language, then most of what follows modulo some terminological differences should be both familiar and very basic. If you are familiar with algebraic semantics, then that is exactly what is happening here only restricting to unary (but multi-sorted) algebraic theories.

For us, a theory, ccT, is a collection of sorts, a collection of (unary) function symbols1, and a collection of equations. Each function symbol has an input sort and an output sort which we’ll call the source and target of the function symbol. We’ll write ttf : A -> B to say that ttf is a function symbol with source A and target B. We define "src"(ttf) -= A and "tgt"(ttf) -= B. Sorts and function symbols are just symbols. Something is a sort if it is in the collection of sorts. Nothing else is required. A function symbol is not a function, it’s just a, possibly structured, name. Later, we’ll map those names to functions, but the same name may be mapped to different functions. In programming terms, a theory defines an interface or signature. We’ll write bb "sort"(ccT) for the collection of sorts of ccT and bb "fun"(ccT) for the collection of function symbols.

A (raw) term in a theory is either a variable labelled by a sort, bbx_A, or it’s a function symbol applied to a term, tt "f"(t), such that the sort of the term t matches the source of ttf. The sort or target of a term is the sort of the variable if it’s a variable or the target of the outermost function symbol. The source of a term is the sort of the innermost variable. In fact, all terms are just sequences of function symbol applications to a variable, so there will always be exactly one variable. All this is to say the expressions need to be “well-typed” in the obvious way. Given a theory with two function symbols ttf : A -> B and ttg : B -> A, bbx_A, bbx_B , tt "f"(bbx_A), and tt "f"(tt "g"(tt "f"(bbx_A))) are all examples of terms. tt "f"(bbx_B) and tt "f"(tt "f"(bbx_A)) are not terms because they are not “well-typed”, and ttf by itself is not a term simply because it doesn’t match the syntax. Using Haskell syntax, we can define a data type representing this syntax if we ignore the sorting:

data Term = Var Sort $Apply FunctionSymbol Term Using GADTs, we could capture the sorting constraints as well: data Term (s :: Sort) (t :: Sort) where Var :: Term t t Apply :: FunctionSymbol x t -> Term s x -> Term s t An important operation on terms is substitution. Given a term t_1 with source A and a term t_2 with target A we define the substitution of t_2 into t_1, written t_1[bbx_A$-> t_2], as:

If t_1 = bbx_A then bbx_A[bbx_A $-> t_2] -= t_2. If t_1 = tt "f"(t) then tt "f"(t)[bbx_A$-> t_2] -= tt "f"(t[bbx_A $-> t_2]). Using the theory from before, we have: tt "f"(bbx_A)[bbx_A$-> tt "g"(bbx_B)] = tt "f"(tt "g"(bbx_B))

As a shorthand, for arbitrary terms t_1 and t_2, t_1(t_2) will mean t_1[bbx_("src"(t_1)) $-> t_2]. Finally, equations2. An equation is a pair of terms with equal source and target, for example, (: tt "f"(tt "g"(bbx_B)), bbx_B :). The idea is that we want to identify these two terms. To do this we quotient the set of terms by the congruence generated by these pairs, i.e. by the reflexive-, symmetric-, transitive-closure of the relation generated by the equations which further satisfies “if s_1 ~~ t_1 and s_2 ~~ t_2 then s_1(s_2) ~~ t_1(t_2)”. From now on, by “terms” I’ll mean this quotient with “raw terms” referring to the unquotiented version. This means that when we say “tt "f"(tt "g"(bbx_B)) = bbx_B”, we really mean the two terms are congruent with respect to the congruence generated by the equations. We’ll write ccT(A, B) for the collection of terms, in this sense, with source A and target B. To make things look a little bit more normal, I’ll write s ~~ t as a synonym for (: s, t :) when the intent is that the pair represents a given equation. Expanding the theory from before, we get the theory of isomorphisms, ccT_{:~=:}, consisting of two sorts, A and B, two function symbols, ttf and ttg, and two equations tt "f"(tt "g"(bbx_B)) ~~ bbx_B and tt "g"(tt "f"(bbx_A)) ~~ bbx_A. The equations lead to equalities like tt "f"(tt "g"(tt "f"(bbx_A))) = tt "f"(bbx_A). In fact, it doesn’t take much work to show that this theory only has four distinct terms: bbx_A, bbx_B, tt "f"(bbx_A), and tt "g"(bbx_B). In traditional model theory or universal algebra we tend to focus on multi-ary operations, i.e. function symbols that can take multiple inputs. By restricting ourselves to only unary function symbols, we expose a duality. For every theory ccT, we have the opposite theory, ccT^(op) defined by using the same sorts and function symbols but swapping the source and target of the function symbols which also requires rewriting the terms in the equations. The rewriting on terms is the obvious thing, e.g. if ttf : A -> B, ttg : B -> C, and tth : C -> D, then the term in ccT, tt "h"(tt "g"(tt "f"(bbx_A))) would become the term tt "f"(tt "g"(tt "h"(bbx_D))) in ccT^(op). From this it should be clear that (ccT^(op))^(op) = ccT. ## Product Theories Given two theories ccT_1 and ccT_2 we can form a new theory ccT_1 xx ccT_2 called the product theory of ccT_1 and ccT_2. The sorts of this theory are pairs of sorts from ccT_1 and ccT_2. The collection of function symbols is the disjoint union bb "fun"(ccT_1) xx bb "sort"(ccT_2) + bb "sort"(ccT_1) xx bb "fun"(ccT_2). A disjoint union is like Haskell’s Either type. Here we’ll write tt "inl" and tt "inr" for the left and right injections respectively. tt "inl" takes a function symbol from ccT_1 and a sort from ccT_2 and produces a function symbol of ccT_1 xx ccT_2 and similarly for tt "inr". If tt "f" : A -> B in ccT_1 and C is a sort of ccT_2, then tt "inl"(f, C) : (A, C) -> (B, C) and similarly for tt "inr". The collection of equations for ccT_1 xx ccT_2 consists of the following: • for every equation, l ~~ r of ccT_1 and every sort, C, of ccT_2 we produce an equation l’ ~~ r’ by replacing each function symbol ttf in l and r with tt "inl"(tt "f", C) • similarly for equations of ccT_2 • for every pair of function symbols ttf : A -> B from ccT_1 and ttg : C -> D from ccT_2, we produce the equation tt "inl"(tt "f", D)(tt "inr"(A, tt "g")(bbx_{:(A, C")":})) ~~ tt "inr"(B, tt "g")("inl"(tt "f", C)(bbx_{:(A, C")":})) The above is probably unreadable. If you work through it, you can show that every term of ccT_1 xx ccT_2 is equivalent to a pair of terms (t_1, t_2) where t_1 is a term in ccT_1 and t_2 is a term in ccT_2. Using this equivalence, the first bullet is seen to be saying that if l = r in ccT_1 and C is a sort in ccT_2 then (l, bbx_C) = (r, bbx_C) in ccT_1 xx ccT_2. The second is similar. The third then states (t_1, bbx_C)((bbx_A, t_2)(bbx_{:"(A, C)":})) = (t_1, t_2)(bbx_{:"(A, C)":}) = (bbx_A, t_2)((t_1, bbx_C)(bbx_{:"(A, C)":})). To establish the equivalence between terms of ccT_1 xx ccT_2 and pairs of terms from ccT_1 and ccT_2, we use the third bullet to move all the tt "inl"s outward at which point we’ll have a sequence of ccT_1 function symbols followed by a sequence of ccT_2 function symbols each corresponding to term. The above might seem a bit round about. An alternative approach would be to define the function symbols of ccT_1 xx ccT_2 to be all pairs of all the terms from ccT_1 and ccT_2. The problem with this approach is that it leads to an explosion in the number of function symbols and equations required. In particular, it easily produces an infinitude of function symbols and equations even when provided with theories that only have a finite number of sorts, function symbols, and equations. As a concrete and useful example, consider the theory ccT_bbbN consisting of a single sort, 0, a single function symbol, tts, and no equations. This theory has a term for each natural number, n, corresponding to n applications of tts. Now let’s articulate ccT_bbbN xx ccT_bbbN. It has one sort, (0, 0), two function symbols, tt "inl"(tt "s", 0) and tt "inr"(0, tt "s"), and it has one equation, tt "inl"(tt "s", 0)(tt "inr"(0, tt "s")(bbx_{:(0, 0")":})) ~~ tt "inr"(0, tt "s")(tt "inl"(tt "s", 0)(bbx_{:(0, 0")":})). Unsurprisingly, the terms of this theory correspond to pairs of natural numbers. If we had used the alternative definition, we’d have had an infinite number of function symbols and an infinite number of equations. Nevertheless, for clarity I will typically write a term of a product theory as a pair of terms. As a relatively easy exercise — easier than the above — you can formulate and define the disjoint sum of two theories ccT_1 + ccT_2. The idea is that every term of ccT_1 + ccT_2 corresponds to either a term of ccT_1 or a term of ccT_2. Don’t forget to define what happens to the equations. Related to these, we have the theory ccT_{:bb1:}, which consists of one sort and no function symbols or equations, and ccT_{:bb0:} which consists of no sorts and thus no possibility for function symbols or equations. ccT_{:bb1:} has exactly one term while ccT_{:bb0:} has no terms. ## Collages Sometimes we’d like to talk about function symbols whose source is in one theory and target is in another. As a simple example, that we’ll explore in more depth later, we may want function symbols whose sources are in a product theory. This would let us consider terms with multiple inputs. The natural way to achieve this is to simply make a new theory that contains sorts from both theories plus the new function symbols. A collage, ccK, from a theory ccT_1 to ccT_2, written ccK : ccT_1 ↛ ccT_2, is a theory whose collection of sorts is the disjoint union of the sorts of ccT_1 and ccT_2. The function symbols of ccK consist for each function symbol ttf : A -> B in ccT_1, a function symbol tt "inl"(ttf) : tt "inl"(A) -> tt "inl"(B), and similarly for function symbols from ccT_2. Equations from ccT_1 and ccT_2 are likewise taken and lifted appropriately, i.e. ttf is replaced with tt "inl"(ttf) or tt "inr"(ttf) as appropriate. Additional function symbols of the form k : tt "inl"(A) -> tt "inr"(Z) where A is a sort of ccT_1 and Z is a sort of ccT_2, and potentially additional equations involving these function symbols, may be given. (If no additional function symobls are given, then this is exactly the disjoint sum of ccT_1 and ccT_2.) These additional function symbols and equations are what differentiate two collages that have the same source and target theories. Note, there are no function symbols tt "inr"(Z) -> tt "inl"(A), i.e. where Z is in ccT_2 and A is in ccT_1. That is, there are no function symbols going the “other way”. To avoid clutter, I’ll typically assume that the sorts and function symbols of ccT_1 and ccT_2 are disjoint already, and dispense with the tt "inl"s and tt "inr"s. Summarizing, we have ccK(tt "inl"(A), "inl"(B)) ~= ccT_1(A, B), ccK(tt "inr"(Y), tt "inr"(Z)) ~= ccT_2(Y, Z), and ccK(tt "inr"(Z), tt "inl"(A)) = O/ for all A, B, Y, and Z. ccK(tt "inl"(A), tt "inr"(Z)) for any A and Z is arbitrary generated. To distinguish them, I’ll call the function symbols that go from one theory to another bridges. More generally, an arbitrary term that has it’s source in one theory and target in another will be described as a bridging term. Here’s a somewhat silly example. Consider ccK_+ : ccT_bbbN xx ccT_bbbN ↛ ccT_bbbN that has one bridge tt "add" : (0, 0) -> 0 with the equations tt "add"(tt "inl"(tts, 0)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")"))) and tt "add"(tt "inr"(0, tts)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")"))). More usefully, if a bit degenerately, every theory induces a collage in the following way. Given a theory ccT, we can build the collage ccK_ccT : ccT ↛ ccT where the bridges consist of the following. For each sort, A, of ccT, we have the following bridge: tt "id"_A : tt "inl"(A) -> tt "inr"(A). Then, for every function symbol, ttf : A -> B in ccT, we have the following equation: tt "inl"(tt "f")(tt "id"_A(bbx_(tt "inl"(A)))) ~~ tt "id"_B(tt "inr"(tt "f")(bbx_(tt "inl"(A)))). We have ccK_ccT(tt "inl"(A), tt "inr"(B)) ~= ccT(A, B). You can think of a bridging term in a collage as a sequence of function symbols partitioned into two parts by a bridge. Naturally, we might consider partitioning into more than two parts by having more than one bridge. It’s easy to generalize the definition of collage to combine an arbitrary number of theories, but I’ll take a different, probably worse, route. Given collages ccK_1 : ccT_1 ↛ ccT_2 and ccK_2 : ccT_2 ↛ ccT_3, we can make the collage ccK_2 @ ccK_1 : ccT_1 ↛ ccT_3 by defining its bridges to be triples of a bridge of ccK_1, k_1 : A_1 -> A_2, a term, t : A_2 -> B_2 of ccT_2, and a bridge of ccK_2, k_2 : B_2 -> B_3 which altogether will be a bridge of ccK_2 @ ccK_1 going from A_1 -> B_3. These triples essentially represent a term like k_2(t(k_1(bbx_(A_1)))). With this intuition we can formulate the equations. For each equation t'(k_1(t_1)) ~~ s'(k'_1(s_1)) where k_1 and k'_1 are bridges of ccK_1, we have for every bridge k_2 of ccK_2 and term t of the appropriate sorts (k_2, t(t'(bbx)), k_1)(t_1) ~~ (k_2, t(s'(bbx)), k'_1)(s_1) and similarly for equations involving the bridges of ccK_2. This composition is associative… almost. Furthermore, the collages generated by theories, ccK_ccT, behave like identities to this composition… almost. It turns out these statements are true, but only up to isomorphism of theories. That is, (ccK_3 @ ccK_2) @ ccK_1 ~= ccK_3 @ (ccK_2 @ ccK_1) but is not equal. To talk about isomorphism of theories we need the notion of… ## Interpretations An interpretation of a theory gives meaning to the syntax of a theory. There are two nearly identical notions of interpretation for us: interpretation (into sets) and interpretation into a theory. I’ll define them in parallel. An interpretation (into a theory), ccI, is a mapping, written ⟦-⟧^ccI though the superscript will often be omitted, which maps sorts to sets (sorts) and function symbols to functions (terms). The mapping satisfies: ⟦"src"(f)⟧ = "src"(⟦f⟧) and ⟦"tgt"(f)⟧ = "tgt"(⟦f⟧) where "src" and "tgt" on the right are the domain and codomain operations for an interpretation. We extend the mapping to a mapping on terms via: • ⟦bbx_A⟧ = x$-> x, i.e. the identity function, or, for interpretation into a theory, ⟦bbx_A⟧ = bbx_{:⟦A⟧:}
• ⟦tt "f"(t)⟧ = ⟦tt "f"⟧ @ ⟦t⟧ or, for interpretation into a theory, ⟦tt "f"(t)⟧ = ⟦tt "f"⟧(⟦t⟧)

and we require that for any equation of the theory, l ~~ r, ⟦l⟧ = ⟦r⟧. (Technically, this is implicitly required for the extension of the mapping to terms to be well-defined, but it’s clearer to state it explicitly.) I’ll write ccI : ccT -> bb "Set" when ccI is an interpretation of ccT into sets, and ccI’ : ccT_1 -> ccT_2 when ccI’ is an interpretation of ccT_1 into ccT_2.

An interpretation of the theory of isomorphisms produces a bijection between two specified sets. Spelling out a simple example where bbbB is the set of booleans:

• ⟦A⟧ -= bbbB
• ⟦B⟧ -= bbbB
• ⟦tt "f"⟧ -= x $-> not x • ⟦tt "g"⟧ -= x$-> not x

plus the proof not not x = x.

As another simple example, we can interpret the theory of isomorphisms into itself slightly non-trivially.

• ⟦A⟧ -= B
• ⟦B⟧ -= A
• ⟦tt "f"⟧ -= tt "g"(bbx_B)
• ⟦tt "g"⟧ -= tt "f"(bbx_A)

As an (easy) exercise, you should define pi_1 : ccT_1 xx ccT_2 -> ccT_1 and similarly pi_2. If you defined ccT_1 + ccT_2 before, you should define iota_1 : ccT_1 -> ccT_1 + ccT_2 and similarly for iota_2. As another easy exercise, show that an interpretation of ccT_{:~=:} is a bijection. In Haskell, an interpretation of ccT_bbbN would effectively be foldNat. Something very interesting happens when you consider what an interpretation of the collage generated by a theory, ccK_ccT, is. Spell it out. In a different vein, you can show that a collage ccK : ccT_1 ↛ ccT_2 and an interpretation ccT_1^(op) xx ccT_2 -> bb "Set" are essentially the same thing in the sense that each gives rise to the other.

Two theories are isomorphic if there exists interpretations ccI_1 : ccT_1 -> ccT_2 and ccI_2 : ccT_2 -> ccT_1 such that ⟦⟦A⟧^(ccI_1)⟧^(ccI_2) = A and vice versa, and similarly for function symbols. In other words, each is interpretable in the other, and if you go from one interpretation and then back, you end up where you started. Yet another way to say this is that there is a one-to-one correspondence between sorts and terms of each theory, and this correspondence respects substitution.

As a crucially important example, the set of terms, ccT(A, B), can be extended to an interpretation. In particular, for each sort A, ccT(A, -) : ccT -> bb "Set". It’s action on function symbols is the following:

⟦tt "f"⟧^(ccT(A, -)) -= t $-> tt "f"(t) We have, dually, ccT(-, A) : ccT^(op) -> bb "Set" with the following action: ⟦tt "f"⟧^(ccT(-, A)) -= t$-> t(tt "f"(bbx_B))

We can abstract from both parameters making ccT(-, =) : ccT^(op) xx ccT -> bb "Set" which, by an early exercise, can be shown to correspond with the collage ccK_ccT.

Via an abuse of notation, I’ll identify ccT^(op)(A, -) with ccT(-, A), though technically we only have an isomorphism between the interpretations, and to talk about isomorphisms between interpretations we need the notion of…

## Homomorphisms

The theories we’ve presented are (multi-sorted) universal algebra theories. Universal algebra allows us to specify a general notion of “homomorphism” that generalizes monoid homomorphism or group homomorphism or ring homomorphism or lattice homomorphism.

In universal algebra, the algebraic theory of groups consists of a single sort, a nullary operation, 1, a binary operation, *, a unary operation, tt "inv", and some equations which are unimportant for us. Operations correspond to our function symbols except that they’re are not restricted to being unary. A particular group is a particular interpretation of the algebraic theory of groups, i.e. it is a set and three functions into the set. A group homomorphism then is a function between those two groups, i.e. between the two interpretations, that preserves the operations. In a traditional presentation this would look like the following:

Say alpha : G -> K is a group homomorphism from the group G to the group K and g, h in G then:

• alpha(1_G) = 1_K
• alpha(g *_G h) = alpha(g) *_K alpha(h)
• alpha(tt "inv"_G(g)) = tt "inv"_K(alpha(g))

Using something more akin to our notation, it would look like:

• alpha(⟦1⟧^G) = ⟦1⟧^K
• alpha(⟦*⟧^G(g,h)) = ⟦*⟧^K(alpha(g), alpha(h))
• alpha(⟦tt "inv"⟧^G(g)) = ⟦tt "inv"⟧^K(alpha(g))

The tt "inv" case is the most relevant for us as it is unary. However, for us, a function symbol ttf may have a different source and target and so we made need a different function on each side of the equation. E.g. for ttf : A -> B, alpha : ccI_1 -> ccI_2, and a in ⟦A⟧^(ccI_1) we’d have:

alpha_B(⟦tt "f"⟧^(ccI_1)(a)) = ⟦tt "f"⟧^(ccI_2)(alpha_A(a))

So a homomorphism alpha : ccI_1 -> ccI_2 : ccT -> bb "Set" is a family of functions, one for each sort of ccT, that satisfies the above equation for every function symbol of ccT. We call the individual functions making up alpha components of alpha, and we have alpha_A : ⟦A⟧^(ccI_1) -> ⟦A⟧^(ccI_2). The definition for an interpretation into a theory, ccT_2, is identical except the components of alpha are terms of ccT_2 and a can be replaced with bbx_(⟦A⟧^(ccI_1)). Two interpretations are isomorphic if we have homomorphism alpha : ccI_1 -> ccI_2 such that each component is a bijection. This is the same as requiring a homomorphism beta : ccI_2 -> ccI_1 such that for each A, alpha_A(beta_A(x)) = x and beta_A(alpha_A(x)) = x. A similar statement can be made for interpretations into theories, just replace x with bbx_(⟦A⟧).

Another way to look at homomorphisms is via collages. A homomorphism alpha : ccI_1 -> ccI_2 : ccT -> bb "Set" gives rise to an interpretation of the collage ccK_ccT. The interpretation ccI_alpha : ccK_ccT -> bb "Set" is defined by:

• ⟦tt "inl"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_1)
• ⟦tt "inr"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_2)
• ⟦tt "inl"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_1)
• ⟦tt "inr"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_2)
• ⟦tt "id"_A⟧^(ccI_alpha) -= alpha_A

The homomorphism law guarantees that it satisfies the equation on tt "id". Conversely, given an interpretation of ccK_ccT, we have the homomorphism, ⟦tt "id"⟧ : ⟦tt "inl"(-)⟧ -> ⟦tt "inr"(-)⟧ : ccT -> bb "Set". and the equation on tt "id" is exactly the homomorphism law.

## Yoneda

Consider a homomorphism alpha : ccT(A, -) -> ccI. The alpha needs to satisfy for every sort B and C, every function symbol ttf : C -> D, and every term t : B -> C:

alpha_D(tt "f"(t)) = ⟦tt "f"⟧^ccI(alpha_C(t))

Looking at this equation, the possibility of viewing it as a recursive “definition” leaps out suggesting that the action of alpha is completely determined by it’s action on the variables. Something like this, for example:

alpha_D(tt "f"(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(alpha_C(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(alpha_B(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(⟦tt "h"⟧(alpha_A(bbx_A))))

We can easily establish that there’s a one-to-one correspondence between the set of homomorphisms ccT(A, -) -> ccI and the elements of the set ⟦A⟧^ccI. Given a homomorphism, alpha, we get an element of ⟦A⟧^ccI via alpha_A(bbx_A). Inversely, given an element a in ⟦A⟧^ccI, we can define a homomorphism a^** via:

• a_D^**(tt "f"(t)) -= ⟦tt "f"⟧^ccI(a_C^**(t))
• a_A^**(bbx_A) -= a

which clearly satisfies the condition on homomorphisms by definition. It’s easy to verify that (alpha_A(bbx_A))^** = alpha and immediately true that a^**(bbx_A) = a establishing the bijection.

We can state something stronger. Given any homomorphism alpha : ccT(A, -) -> ccI and any function symbol ttg : A -> X, we can make a new homomorphism alpha * ttg : ccT(X, -) -> ccI via the following definition:

(alpha * ttg)(t) = alpha(t(tt "g"(bbx_A)))

Verifying that this is a homomorphism is straightforward:

(alpha * ttg)(tt "f"(t)) = alpha(tt "f"(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧(alpha(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧((alpha * ttg)(t))

and like any homomorphism of this form, as we’ve just established, it is completely determined by it’s action on variables, namely (alpha * ttg)_A(bbx_A) = alpha_X(tt "g"(bbx_A)) = ⟦tt "g"⟧(alpha_A(bbx_A)). In particular, if alpha = a^**, then we have a^** * ttg = (⟦tt "g"⟧(a))^**. Together these facts establish that we have an interpretation ccY : ccT -> bb "Set" such that ⟦A⟧^ccY -= (ccT(A, -) -> ccI), the set of homomorphisms, and ⟦tt "g"⟧^ccY(alpha) -= alpha * tt "g". The work we did before established that we have homomorphisms (-)(bbx) : ccY -> ccI and (-)^** : ccI -> ccY that are inverses. This is true for all theories and all interpretations as at no point did we use any particular facts about them. This statement is the (dual form of the) Yoneda lemma. To get the usual form simply replace ccT with ccT^(op). A particularly important and useful case (so useful it’s usually used tacitly) occurs when we choose ccI = ccT(B,-), we get (ccT(A, -) -> ccT(B, -)) ~= ccT(B, A) or, choosing ccT^(op) everywhere, (ccT(-, A) -> ccT(-, B)) ~= ccT(A, B) which states that a term from A to B is equivalent to a homomorphism from ccT(-, A) to ccT(-, B).

There is another result, dual in a different way, called the co-Yoneda lemma. It turns out it is a corollary of the fact that for a collage ccK : ccT_1 ↛ ccT_2, ccK_(ccT_2) @ ccK ~= ccK and the dual is just the composition the other way. To get (closer to) the precise result, we need to be able to turn an interpretation into a collage. Given an interpretation, ccI : ccT -> bb "Set", we can define a collage ccK_ccI : ccT_bb1 ↛ ccT whose bridges from 1 -> A are the elements of ⟦A⟧^ccI. Given this, the co-Yoneda lemma is the special case, ccK_ccT @ ccK_ccI ~= ccK_ccI.

Note, that the Yoneda and co-Yoneda lemmas only apply to interpretations into sets as ccY involves the set of homomorphisms.

## Representability

The Yoneda lemma suggests that the interpretations ccT(A, -) and ccT(-, A) are particularly important and this will be borne out as we continue.

We call an interpretation, ccI : ccT^(op) -> bb "Set" representable if ccI ~= ccT(-, X) for some sort X. We then say that X represents ccI. What this states is that every term of sort X corresponds to an element in one of the sets that make up ccI, and these transform appropriately. There’s clearly a particularly important element, namely the image of bbx_X which corresponds to an element in ⟦X⟧^ccI. This element is called the universal element. The dual concept is, for ccI : ccT -> bb "Set", ccI is co-representable if ccI ~= ccT(X, -). We will also say X represents ccI in this case as it actually does when we view ccI as an interpretation of (ccT^(op))^(op).

As a rather liberating exercise, you should establish the following result called parameterized representability. Assume we have theories ccT_1 and ccT_2, and a family of sorts of ccT_2, X, and a family of interpretations of ccT_2^(op), ccI, both indexed by sorts of ccT_1, such that for each A in bb "sort"(ccT_1), X_A represents ccI_A, i.e. ccI_A ~= ccT_2(-, X_A). Given all this, then there is a unique interpretation ccX : ccT_1 -> ccT_2 and ccI : ccT_1 xx ccT_2^(op) -> bb "Set" where ⟦A⟧^(ccX) -= X_A and "⟦("A, B")⟧"^ccI -= ⟦B⟧^(ccI_A) such that ccI ~= ccT_2(=,⟦-⟧^ccX). To be a bit more clear, the right hand side means (A, B) $-> ccT_2(B, ⟦A⟧^ccX). Simply by choosing ccT_1 to be a product of multiple theories, we can generalize this result to an arbitrary number of parameters. What makes this result liberating is that we just don’t need to worry about the parameters, they will automatically transform homomorphically. As a technical warning though, since two interpretations may have the same action on sorts but a different action on function symbols, if the family X_A was derived from an interpretation ccJ, i.e. X_A -= ⟦A⟧^ccJ, it may not be the case that ccX = ccJ. Let’s look at some examples. As a not-so-special case of representability, we can consider ccI -= ccK(tt "inl"(-), tt "inr"(Z)) where ccK : ccT_1 ↛ ccT_2. Saying that A represents ccI in this case is saying that bridging terms of sort tt "inr"(Z), i.e. sort Z in ccT_2, in ccK, correspond to terms of sort A in ccT_1. We’ll call the universal element of this representation the universal bridge (though technically it may be a bridging term, not a bridge). Let’s write varepsilon for this universal bridge. What representability states in this case is given any bridging term k of sort Z, there exists a unique term $~ k ~$ of sort A such that k = varepsilon($~ k ~$). If we have an interpretation ccX : ccT_2 -> ccT_1 such that ⟦Z⟧^ccX represents ccK(tt "inl"(-), tt "inr"(Z)) for each sort Z of ccT_2 we say we have a right representation of ccK. Note, that the universal bridges become a family varepsilon_Z : ⟦Z⟧^ccX -> Z. Similarly, if ccK(tt "inl"(A), tt "inr"(-)) is co-representable for each A, we say we have a left representation of ccK. The co-universal bridge is then a bridging term eta_A : A -> ⟦A⟧ such that for any bridging term k with source A, there exists a unique term $__ k __$ in ccT_2 such that k =$__ k __$(eta_A). For reference, we’ll call these equations universal properties of the left/right representation. Parameterized representability implies that a left/right representation is essentially unique. Define ccI_bb1 via ⟦A⟧^(ccI_bb1) -= bb1 where bb1 is some one element set. ⟦ttf⟧^(ccI_bb1) is the identity function for all function symbols ttf. We’ll say a theory ccT has a unit sort or has a terminal sort if there is a sort that we’ll also call bb1 that represents ccI_bb1. Spelling out what that means, we first note that there is nothing notable about the universal element as it’s the only element. However, writing the homomorphism ! : ccI_bb1 -> ccT(-, bb1) and noting that since there’s only one element of ⟦A⟧^(ccI_bb1) we can, with a slight abuse of notation, also write the term ! picks out as ! which gives the equation: !_B(tt "g"(t)) = !_A(t) for any function symbol ttg : A -> B and term, t, of sort A, note !_A : A -> bb1. This equation states what the isomorphism also fairly directly states: there is exactly one term of sort bb1 from any sort A, namely !_A(bbx_A). The dual notion is called a void sort or an initial sort and will usually be notated bb0, the analog of ! will be written as 0. The resulting equation is: tt "f"(0_A) = 0_B for any function symbol ttf : A -> B, note 0_A : bb0 -> A. For the next example, I’ll leverage collages. Consider the collage ccK_2 : ccT ↛ ccT xx ccT whose bridges from A -> (B, C) consist of pairs of terms t_1 : A -> B and t_2 : A -> C. ccT has pairs if ccK_2 has a right representation. We’ll write (B, C)$-> B xx C for the representing interpretation’s action on sorts. We’ll write the universal bridge as (tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C))). The universal property then looks like (tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))((: t_1, t_2 :)) = (t_1, t_2) where (: t_1, t_2 :) : A -> B xx C is the unique term induced by the bridge (t_1, t_2). The universal property implies the following equations:

• (: tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C))) = bbx_(B xx C)
• tt "fst"((: t_1, t_2 :)) = t_1
• tt "snd"((: t_1, t_2 :)) = t_2

One aspect of note is regardless of whether ccK_2 has a right representation, i.e. regardless of whether ccT has pairs, it always has a left representation. The co-universal bridge is (bbx_A, bbx_A) and the unique term $__(t_1, t_2)__$ is tt "inl"(t_1, bbx_A)(tt "inr"(bbx_A, t_2)(bbx_("("A,A")"))).

Define an interpretation Delta : ccT -> ccT xx ccT so that ⟦A⟧^Delta -= (A,A) and similarly for function symbols. Delta left represents ccK_2. If the interpretation (B,C) $-> B xx C right represents ccK_2, then we say we have an adjunction between Delta and (- xx =), written Delta ⊣ (- xx =), and that Delta is left adjoint to (- xx =), and conversely (- xx =) is right adjoint Delta. More generally, whenever we have the situation ccT_1(⟦-⟧^(ccI_1), =) ~= ccT_2(-, ⟦=⟧^(ccI_2)) we say that ccI_1 : ccT_2 -> ccT_1 is left adjoint to ccI_2 : ccT_1 -> ccT_2 or conversely that ccI_2 is right adjoint to ccI_1. We call this arrangement an adjunction and write ccI_1 ⊣ ccI_2. Note that we will always have this situation if ccI_1 left represents and ccI_2 right represents the same collage. As we noted above, parameterized representability actually determines one adjoint given (its action on sorts and) the other adjoint. With this we can show that adjoints are unique up to isomorphism, that is, given two left adjoints to an interpretation, they must be isomorphic. Similarly for right adjoints. This means that stating something is a left or right adjoint to some other known interpretation essentially completely characterizes it. One issue with adjunctions is that they tend to be wholesale. Let’s say the pair sort A xx B existed but no other pair sorts existed, then the (no longer parameterized) representability approach would work just fine, but the adjunction would no longer exist. Here’s a few of exercises using this. First, a moderately challenging one (until you catch the pattern): spell out the details to the left adjoint to Delta. We say a theory has sums and write those sums as A + B if (- + =) ⊣ Delta. Recast void and unit sorts using adjunctions and/or left/right representations. As a terminological note, we say a theory has finite products if it has unit sorts and pairs. Similarly, a theory has finite sums or has finite coproducts if it has void sorts and sums. An even more challenging exercise is the following: a theory has exponentials if it has pairs and for every sort A, (A xx -) ⊣ (A => -) (note, parameterized representability applies to A). Spell out the equations characterizing A => B. ## Finite Product Theories Finite products start to lift us off the ground. So far the theories we’ve been working with have been extremely basic: a language with only unary functions, all terms being just a sequence of applications of function symbols. It shouldn’t be underestimated though. It’s more than enough to do monoid and group theory. A good amount of graph theory can be done with just this. And obviously we were able to establish several general results assuming only this structure. Nevertheless, while we can talk about specific groups, say, we can’t talk about the theory of groups. Finite products change this. A theory with finite products allows us to talk about multi-ary function symbols and terms by considering unary function symbols from products. This allows us to do all of universal algebra. For example, the theory of groups, ccT_(bb "Grp"), consists of a sort S and all it’s products which we’ll abbreviate as S^n with S^0 -= bb1 and S^(n+1) -= S xx S^n. It has three function symbols tte : bb1 -> S, ttm : S^2 -> S, and tti : S -> S plus the ones that having finite products requires. In fact, instead of just heaping an infinite number of sorts and function symbols into our theory — and we haven’t even gotten to equations — let’s define a compact set of data from which we can generate all this data. A signature, Sigma, consists of a collection of sorts, sigma, a collection of multi-ary function symbols, and a collection of equations. Equations still remain pairs of terms, but we need to now extend our definition of terms for this context. A term (in a signature) is either a variable, bbx_i^[A_0,A_1,...,A_n] where A_i are sorts and 0 <= i <= n, the operators tt "fst" or tt "snd" applied to a term, the unit term written (::)^A with sort A, a pair of terms written (: t_1, t_2 :), or the (arity correct) application of a multi-ary function symbol to a series of terms, e.g. tt "f"(t_1, t_2, t_3). As a Haskell data declaration, it might look like: data SigTerm = SigVar [Sort] Int$ Fst SigTerm
$Snd SigTerm$ Unit Sort
$Pair SigTerm SigTerm$ SigApply FunctionSymbol [SigTerm]

At this point, sorting (i.e. typing) the terms is no longer trivial, though it is still pretty straightforward. Sorts are either bb1, or A xx B for sorts A and B, or a sort A in sigma. The source of function symbols or terms are lists of sorts.

• bbx_i^[A_0, A_1, ..., A_n] : [A_0, A_1, ..., A_n] -> A_i
• (::)^A : [A] -> bb1
• (: t_1, t_2 :) : bar S -> T_1 xx T_2 where t_i : bar S -> T_i
• tt "fst"(t) : bar S -> T_1 where t : bar S -> T_1 xx T_2
• tt "snd"(t) : bar S -> T_2 where t : bar S -> T_1 xx T_2
• tt "f"(t_1, ..., t_n) : bar S -> T where t_i : bar S -> T_i and ttf : [T_1,...,T_n] -> T

The point of a signature was to represent a theory so we can compile a term of a signature into a term of a theory with finite products. The theory generated from a signature Sigma has the same sorts as Sigma. The equations will be equations of Sigma, with the terms compiled as will be described momentarily, plus for every pair of sorts the equations that describe pairs and the equations for !. Finally, we need to describe how to take a term of the signature and make a function symbol of the theory, but before we do that we need to explain how to convert those sources of the terms which are lists. That’s just a conversion to right nested pairs, [A_0,...,A_n] $-> A_0 xx (... xx (A_n xx bb1) ... ). The compilation of a term t, which we’ll write as ccC[t], is defined as follows: • ccC[bbx_i^[A_0, A_1, ..., A_n]] = tt "snd"^i(tt "fst"(bbx_(A_i xx(...)))) where tt "snd"^i means the i-fold application of tt "snd" • ccC[(::)^A] = !_A • ccC[(: t_1, t_2 :)] = (: ccC[t_1], ccC[t_2] :) • ccC[tt "fst"(t)] = tt "fst"(ccC[t]) • ccC[tt "snd"(t)] = tt "snd"(ccC[t]) • ccC[tt "f"(t_1, ..., t_n)] = tt "f"((: ccC[t_1], (: ... , (: ccC[t_n], ! :) ... :) :)) As you may have noticed, the generated theory will have an infinite number of sorts, an infinite number of function symbols, and an infinite number of equations no matter what the signature is — even an empty one! Having an infinite number of things isn’t a problem as long as we can algorithmically describe them and this is what the signature provides. Of course, if you’re a (typical) mathematician you nominally don’t care about an algorithmic description. Besides being compact, signatures present a nicer term language. The theories are like a core or assembly language. We could define a slightly nicer variation where we keep a context and manage named variables leading to terms-in-context like: x:A, y:B$-- tt "f"(x, x, y)

which is

tt "f"(bbx_0^[A,B], bbx_0^[A,B], bbx_1^[A,B])

for our current term language for signatures. Of course, compilation will be (slightly) trickier for the nicer language.

The benefit of having compiled the signature to a theory, in addition to being able to reuse the results we’ve established for theories, is we only need to define operations on the theory, which is simpler since we only need to deal with pairs and unary function symbols. One example of this is we’d like to extend our notion of interpretation to one that respects the structure of the signature, and we can do that by defining an interpretation of theories that respects finite products.

A finite product preserving interpretation (into a finite product theory), ccI, is an interpretation (into a finite product theory) that additionally satisfies:

• ⟦bb1⟧^ccI ~~ bb1
• ⟦A xx B⟧^ccI ~~ ⟦A⟧^ccI xx ⟦B⟧^ccI
• ⟦!_A⟧^ccI = !_(⟦A⟧^ccI)
• ⟦tt "fst"(t)⟧^ccI = tt "fst"(⟦t⟧^ccI)
• ⟦tt "snd"(t)⟧^ccI = tt "snd"(⟦t⟧^ccI)
• ⟦(: t_1, t_2 :)⟧^ccI = (: ⟦t_1⟧^ccI, ⟦t_2⟧^ccI :)

where, for bb "Set", bb1 -= {{}}, xx is the cartesian product, tt "fst" and tt "snd" are the projections, !_A -= x $-> \{\}, and (: f, g :) -= x$-> (: f(x), g(x) :).

With signatures, we can return to our theory, now signature, of groups. Sigma_bb "Grp" has a single sort S, three function symbols tte : [bb1] -> S, tti : [S] -> S, and ttm : [S, S] -> S, with the following equations:

• tt "m"(tt "e"((::)^S), bbx_0^S) ~~ bbx_0^S
• tt "m"(tt "i"(bbx_0^S), bbx_0^S) ~~ tt "e"((::)^S)
• tt "m"(tt "m"(bbx_0^[S,S,S], bbx_1^[S,S,S]), bbx_2^[S,S,S]) ~~ tt "m"(bbx_0^[S,S,S], tt "m"(bbx_1^[S,S,S], bbx_2^[S,S,S]))

or using the nicer syntax:

• x:S $-- tt "m"(tt "e"(), x) ~~ x • x:S$-- tt "m"(tt "i"(x), x) ~~ tt "e"()
• x:S, y:S, z:S $-- tt "m"(tt "m"(x, y), z) ~~ tt "m"(x, tt "m"(y, z)) An actual group is then just a finite product preserving interpretation of (the theory generated by) this signature. All of universal algebra and much of abstract algebra can be formulated this way. ## The Simply Typed Lambda Calculus and Beyond We can consider additionally assuming that our theory has exponentials. I left articulating exactly what that means as an exercise, but the upshot is we have the following two operations: For any term t : A xx B -> C, we have the term tt "curry"(t) : A -> C^B. We also have the homomorphism tt "app"_(AB) : B^A xx A -> B. They satisfy: • tt "curry"(tt "app"(bbx_(B^A xx A))) = bbx_(B^A) • tt "app"((: tt "curry"(t_1), t_2 :)) = t_1((: bbx_A, t_2 :)) where t_1 : A xx B -> C and t_2 : A -> B. We can view these, together with the the product operations, as combinators, and it turns out we can compile the simply typed lambda calculus into the above theory. This is exactly what the Categorical Abstract Machine did. The “Caml” in “O’Caml” stands for “Categorical Abstract Machine Language”, though O’Caml no longer uses the CAM. Conversely, every term of the theory can be expressed as a simply typed lambda term. This means we can view the simply typed lambda calculus as just a different presentation of the theory. At this point, this presentation of category theory starts to connect to the mainstream categorical literature on universal algebra, internal languages, sketches, and internal logic. This page gives a synopsis of the relationship between type theory and category theory. For some reason, it is unusual to talk about the internal language of a plain category, but that is exactly what we’ve done here. I haven’t talked about finite limits or colimits beyond products and coproducts, nor have I talked about even the infinitary versions of products and coproducts, let alone arbitrary limits and colimits. These can be handled the same way as products and coproducts. Formulating a language like signatures or the simply typed lambda calculus is a bit more complicated, but not that hard. I may make a follow-up article covering this among other things. I also have a side project (don’t hold your breath), that implements the internal language of a category with finite limits. The result looks roughly like a simple version of an algebraic specification language like the OBJ family. The RING theory described in the Maude manual gives an idea of what it would look like. In fact, here’s an example of the current actual syntax I’m using.3 theory Categories type O type A given src : A -> O given tgt : A -> O given id : O -> A satisfying o:O$ src (id o) = o, tgt (id o) = o

given c : { f:A, g:A $src f = tgt g } -> A satisfying (f, g):{ f:A, g:A$ src f = tgt g }
$tgt (c (f, g)) = tgt f, src (c (f, g)) = src g satisfying "left unit" (o, f):{ o:O, f:A$ tgt f = o }
$c (id o, f) = f satisfying "right unit" (o, f):{ o:O, f:A$ src f = o }
$c (f, id o) = f satisfying "associativity" (f, g, h):{ f:A, g:A, h:A$ src f = tgt g, src g = tgt h }
| c (c (f, g), h) = c (f, c (g, h))
endtheory

It turns out this is a particularly interesting spot in the design space. The fact that the theory of theories with finite limits is itself a theory with finite limits has interesting consequences. It is still relatively weak though. For example, it’s not possible to describe the theory of fields in this language.

There are other directions one could go. For example, the internal logic of monoidal categories is (a fragment of) ordered linear logic. You can cross this bridge either way. You can look at different languages and consider what categorical structure is needed to support the features of the language, or you can add features to the category and see how that impacts the internal language. The relationship is similar to the source language and a core/intermediate language in a compiler, e.g. GHC Haskell and System Fc.

## Decoder

If you’ve looked at category theory at all, you can probably make most of the connections without me telling you. The table below outlines the mapping, but there are some subtleties. First, as a somewhat technical detail, my definition of a theory corresponds to a small category, i.e. a category which has a set of objects and a set of arrows. For more programmer types, you should think of “set” as Set in Agda, i.e. similar to the * kind in Haskell. Usually “category” means “locally small category” which may have a proper class of objects and between any two objects a set of arrows (though the union of all those sets may be a proper class). Again, for programmers, the distinction between “class” and “set” is basically the difference between Set and Set1 in Agda.4 To make my definition of theory closer to this, all that is necessary is instead of having a set of function symbols, have a family of sets indexed by pairs of objects. Here’s what a partial definition in Agda of the two scenarios would look like:

-- Small category (the definition I used)
record SmallCategory : Set1 where
field
objects : Set
arrows : Set
src : arrows -> objects
tgt : arrows -> objects
...

-- Locally small category
record LocallySmallCategory : Set2 where
field
objects : Set1
hom : objects -> objects -> Set
...

-- Different presentation of a small category
record SmallCategory' : Set1 where
field
objects : Set
hom : objects -> objects -> Set
...

The benefit of the notion of locally small category is that Set itself is a locally small category. The distinction I was making between interpretations into theories and interpretations into Set was due to the fact that Set wasn’t a theory. If I used a definition theory corresponding to a locally small category, I could have combined the notions of interpretation by making Set a theory. The notion of a small category, though, is still useful. Also, an interpretation into Set corresponds to the usual notion of a model or semantics, while interpretations into other theories was a less emphasized concept in traditional model theory and universal algebra.

A less technical and more significant difference is that my definition of a theory doesn’t correspond to a category, but rather to a presentation of a category, from which a category can be generated. The analog of arrows in a category is terms, not function symbols. This is a bit more natural route from the model theory/universal algebra/programming side. Similarly, having an explicit collection of equations, rather than just an equivalence relation on terms is part of the presentation of the category but not part of the category itself.

model theory category theory
sort object
term arrow
function symbol generating arrow
theory presentation of a (small) category
collage collage, cograph of a profunctor
bridge heteromorphism
signature presentation of a (small) category with finite products
interpretation into sets, aka models a functor into Set, a (co)presheaf
interpretation into a theory functor
homomorphism natural transformation
simply typed lambda calculus (with products) a cartesian closed category

## Conclusion

In some ways I’ve stopped just when things were about to get good. I may do a follow-up to elaborate on this good stuff. Some examples are: if I expand the definition so that Set becomes a “theory”, then interpretations also form such a “theory”, and these are often what we’re really interested in. The category of finite-product preserving interpretations of the theory of groups essentially is the category of groups. In fact, universal algebra is, in categorical terms, just the study of categories with finite products and finite-product preserving functors from them, particularly into Set. It’s easy to generalize this in many directions. It’s also easy to make very general definitions, like a general definition of a free algebraic structure. In general, we’re usually more interested in the interpretations of a theory than the theory itself.

While I often do advocate thinking in terms of internal languages of categories, I’m not sure that it is a preferable perspective for the very basics of category theory. Nevertheless, there are a few reasons for why I wrote this. First, this very syntactical approach is, I think, more accessible to someone coming from a programming background. From this view, a category is a very simple programming language. Adding structure to the category corresponds to adding features to this programming language. Interpretations are denotational semantics.

Another aspect about this presentation that is quite different is the use and emphasis on collages. Collages correspond to profunctors, a crucially important and enabling concept that is rarely covered in categorical introductions. The characterization of profunctors as collages in Vaughn Pratt’s paper (not using that name) was one of the things I enjoyed about that paper and part of what prompted me to start writing this. In earlier drafts of this article, I was unable to incorporate collages in a meaningful way as I was trying to start from profunctors. This approach just didn’t add value. Collages just looked like a bizarre curio and weren’t integrated into the narrative at all. For other reasons, though, I ended up revisiting the idea of a heteromorphism. My (fairly superficial) opinion is that once you have the notion of functors and natural transformations, adding the notion of heteromorphisms has a low power-to-weight ratio, though it does make some things a bit nicer. Nevertheless, in thinking of how best to fit them into this context, it was clear that collages provided the perfect mechanism (which isn’t a big surprise), and the result works rather nicely. When I realized a fact that can be cryptically but compactly represented as ccK_ccT ≃ bbbI xx ccT where bbbI is the interval category, i.e. two objects with a single arrow joining them, I realized that this is actually an interesting perspective. Since most of this article was written at that point, I wove collages into the narrative replacing some things. If, though, I had started with this perspective from the beginning I suspect I would have made a significantly different article, though the latter sections would likely be similar.

1. It’s actually better to organize this as a family of collections of function symbols indexed by pairs of sorts.↩︎

2. Instead of having equations that generate an equivalence relation on (raw) terms, we could simply require an equivalence relation on (raw) terms be directly provided.↩︎

3. Collaging is actually quite natural in this context. I already intend to support one theory importing another. A collage is just a theory that imports two others and then adds function symbols between them.↩︎

4. For programmers familiar with Agda, at least, if you haven’t made this connection, this might help you understand and appreciate what a “class” is versus a “set” and what “size issues” are, which is typically handled extremely vaguely in a lot of the literature.↩︎