This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.
I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.
The first concept we’ll need is that of a theory. If you’ve ever implemented an interpreter for even the simplest language, than most of what follows modulo some terminological differences should be both familiar and very basic. If you are familiar with algebraic semantics, then that is exactly what is happening here only restricting to unary (but multi-sorted) algebraic theories.
For us, a theory, #ccT#, is a collection of sorts, a collection of (unary) function symbols^{1}, and a collection of equations. Each function symbol has an input sort and an output sort which we’ll call the source and target of the function symbol. We’ll write #ttf : A -> B# to say that #ttf# is a function symbol with source #A# and target #B#. We define #"src"(ttf) -= A#
and #"tgt"(ttf) -= B#
. Sorts and function symbols are just symbols. Something is a sort if it is in the collection of sorts. Nothing else is required. A function symbol is not a function, it’s just a, possibly structured, name. Later, we’ll map those names to functions, but the same name may be mapped to different functions. In programming terms, a theory defines an interface or signature. We’ll write #bb "sort"(ccT)#
for the collection of sorts of #ccT# and #bb "fun"(ccT)#
for the collection of function symbols.
A (raw) term in a theory is either a variable labelled by a sort, #bbx_A#, or it’s a function symbol applied to a term, #tt "f"(t)#
, such that the sort of the term #t# matches the source of #ttf#. The sort or target of a term is the sort of the variable if it’s a variable or the target of the outermost function symbol. The source of a term is the sort of the innermost variable. In fact, all terms are just sequences of function symbol applications to a variable, so there will always be exactly one variable. All this is to say the expressions need to be “well-typed” in the obvious way. Given a theory with two function symbols #ttf : A -> B# and #ttg : B -> A#, #bbx_A#
, #bbx_B#
, #tt "f"(bbx_A)#
, and #tt "f"(tt "g"(tt "f"(bbx_A)))#
are all examples of terms. #tt "f"(bbx_B)#
and #tt "f"(tt "f"(bbx_A))#
are not terms because they are not “well-typed”, and #ttf# by itself is not a term simply because it doesn’t match the syntax. Using Haskell syntax, we can define a data type representing this syntax if we ignore the sorting:
data Term = Var Sort | Apply FunctionSymbol Term
Using GADTs, we could capture the sorting constraints as well:
data Term (s :: Sort) (t :: Sort) where
Var :: Term t t
Apply :: FunctionSymbol x t -> Term s x -> Term s t
An important operation on terms is substitution. Given a term #t_1# with source #A# and a term #t_2# with target #A# we define the substitution of #t_2# into #t_1#, written #t_1[bbx_A |-> t_2]#, as:
If #t_1 = bbx_A# then #bbx_A[bbx_A |-> t_2] -= t_2#.
If
#t_1 = tt "f"(t)#
then#tt "f"(t)[bbx_A |-> t_2] -= tt "f"(t[bbx_A |-> t_2])#
.
Using the theory from before, we have:
#tt "f"(bbx_A)[bbx_A |-> tt "g"(bbx_B)] = tt "f"(tt "g"(bbx_B))#
As a shorthand, for arbitrary terms #t_1# and #t_2#, #t_1(t_2)# will mean #t_1[bbx_("src"(t_1)) |-> t_2]#
.
Finally, equations^{2}. An equation is a pair of terms with equal source and target, for example, #(: tt "f"(tt "g"(bbx_B)), bbx_B :)#
. The idea is that we want to identify these two terms. To do this we quotient the set of terms by the congruence generated by these pairs, i.e. by the reflexive-, symmetric-, transitive-closure of the relation generated by the equations which further satisfies “if #s_1 ~~ t_1# and #s_2 ~~ t_2# then #s_1(s_2) ~~ t_1(t_2)#
”. From now on, by “terms” I’ll mean this quotient with “raw terms” referring to the unquotiented version. This means that when we say “#tt "f"(tt "g"(bbx_B)) = bbx_B#
”, we really mean the two terms are congruent with respect to the congruence generated by the equations. We’ll write #ccT(A, B)# for the collection of terms, in this sense, with source #A# and target #B#. To make things look a little bit more normal, I’ll write #s ~~ t# as a synonym for #(: s, t :)# when the intent is that the pair represents a given equation.
Expanding the theory from before, we get the theory of isomorphisms, #ccT_{:~=:}#
, consisting of two sorts, #A# and #B#, two function symbols, #ttf# and #ttg#, and two equations #tt "f"(tt "g"(bbx_B)) ~~ bbx_B#
and #tt "g"(tt "f"(bbx_A)) ~~ bbx_A#
. The equations lead to equalities like #tt "f"(tt "g"(tt "f"(bbx_A))) = tt "f"(bbx_A)#
. In fact, it doesn’t take much work to show that this theory only has four distinct terms: #bbx_A#, #bbx_B#, #tt "f"(bbx_A)#
, and #tt "g"(bbx_B)#
.
In traditional model theory or universal algebra we tend to focus on multi-ary operations, i.e. function symbols that can take multiple inputs. By restricting ourselves to only unary function symbols, we expose a duality. For every theory #ccT#, we have the opposite theory, #ccT^(op)# defined by using the same sorts and function symbols but swapping the source and target of the function symbols which also requires rewriting the terms in the equations. The rewriting on terms is the obvious thing, e.g. if #ttf : A -> B#, #ttg : B -> C#, and #tth : C -> D#, then the term in #ccT#, #tt "h"(tt "g"(tt "f"(bbx_A)))#
would become the term #tt "f"(tt "g"(tt "h"(bbx_D)))#
in #ccT^(op)#. From this it should be clear that #(ccT^(op))^(op) = ccT#
.
Given two theories #ccT_1# and #ccT_2# we can form a new theory #ccT_1 xx ccT_2# called the product theory of #ccT_1# and #ccT_2#. The sorts of this theory are pairs of sorts from #ccT_1# and #ccT_2#. The collection of function symbols is the disjoint union #bb "fun"(ccT_1) xx bb "sort"(ccT_2) + bb "sort"(ccT_1) xx bb "fun"(ccT_2)#
. A disjoint union is like Haskell’s Either
type. Here we’ll write #tt "inl"#
and #tt "inr"#
for the left and right injections respectively. #tt "inl"#
takes a function symbol from #ccT_1# and a sort from #ccT_2# and produces a function symbol of #ccT_1 xx ccT_2# and similarly for #tt "inr"#
. If #tt "f" : A -> B#
in #ccT_1# and #C# is a sort of #ccT_2#, then #tt "inl"(f, C) : (A, C) -> (B, C)#
and similarly for #tt "inr"#
.
The collection of equations for #ccT_1 xx ccT_2# consists of the following:
#tt "inl"(tt "f", C)#
#tt "inl"(tt "f", D)(tt "inr"(A, tt "g")(bbx_{:(A, C")":})) ~~ tt "inr"(B, tt "g")("inl"(tt "f", C)(bbx_{:(A, C")":}))#
The above is probably unreadable. If you work through it, you can show that every term of #ccT_1 xx ccT_2# is equivalent to a pair of terms #(t_1, t_2)# where #t_1# is a term in #ccT_1# and #t_2# is a term in #ccT_2#. Using this equivalence, the first bullet is seen to be saying that if #l = r# in #ccT_1# and #C# is a sort in #ccT_2# then #(l, bbx_C) = (r, bbx_C)# in #ccT_1 xx ccT_2#. The second is similar. The third then states
#(t_1, bbx_C)((bbx_A, t_2)(bbx_{:"(A, C)":})) = (t_1, t_2)(bbx_{:"(A, C)":}) = (bbx_A, t_2)((t_1, bbx_C)(bbx_{:"(A, C)":}))#
.
To establish the equivalence between terms of #ccT_1 xx ccT_2# and pairs of terms from #ccT_1# and #ccT_2#, we use the third bullet to move all the #tt "inl"#
s outward at which point we’ll have a sequence of #ccT_1# function symbols followed by a sequence of #ccT_2# function symbols each corresponding to term.
The above might seem a bit round about. An alternative approach would be to define the function symbols of #ccT_1 xx ccT_2# to be all pairs of all the terms from #ccT_1# and #ccT_2#. The problem with this approach is that it leads to an explosion in the number of function symbols and equations required. In particular, it easily produces an infinitude of function symbols and equations even when provided with theories that only have a finite number of sorts, function symbols, and equations.
As a concrete and useful example, consider the theory #ccT_bbbN# consisting of a single sort, #0#, a single function symbol, #tts#, and no equations. This theory has a term for each natural number, #n#, corresponding to #n# applications of #tts#. Now let’s articulate #ccT_bbbN xx ccT_bbbN#. It has one sort, #(0, 0)#, two function symbols, #tt "inl"(tt "s", 0)#
and #tt "inr"(0, tt "s")#
, and it has one equation, #tt "inl"(tt "s", 0)(tt "inr"(0, tt "s")(bbx_{:(0, 0")":})) ~~ tt "inr"(0, tt "s")("inl"(tt "s", 0)(bbx_{:(0, 0")":}))#
. Unsurprisingly, the terms of this theory correspond to pairs of natural numbers. If we had used the alternative definition, we’d have had an infinite number of function symbols and an infinite number of equations.
Nevertheless, for clarity I will typically write a term of a product theory as a pair of terms.
As a relatively easy exercise — easier than the above — you can formulate and define the disjoint sum of two theories #ccT_1 + ccT_2#. The idea is that every term of #ccT_1 + ccT_2# corresponds to either a term of #ccT_1# or a term of #ccT_2#. Don’t forget to define what happens to the equations.
Related to these, we have the theory #ccT_{:bb1:}#, which consists of one sort and no function symbols or equations, and #ccT_{:bb0:}# which consists of no sorts and thus no possibility for function symbols or equations. #ccT_{:bb1:}# has exactly one term while #ccT_{:bb0:}# has no terms.
Sometimes we’d like to talk about function symbols whose source is in one theory and target is in another. As a simple example, that we’ll explore in more depth later, we may want function symbols whose sources are in a product theory. This would let us consider terms with multiple inputs.
The natural way to achieve this is to simply make a new theory that contains sorts from both theories plus the new function symbols. A collage, #ccK#, from a theory #ccT_1# to #ccT_2#, written #ccK : ccT_1 ↛ ccT_2#, is a theory whose collection of sorts is the disjoint union of the sorts of #ccT_1# and #ccT_2#. The function symbols of #ccK# consist for each function symbol #ttf : A -> B# in #ccT_1#, a function symbol #tt "inl"(ttf) : tt "inl"(A) -> tt "inl"(B)#
, and similarly for function symbols from #ccT_2#. Equations from #ccT_1# and #ccT_2# are likewise taken and lifted appropriately, i.e. #ttf# is replaced with #tt "inl"(ttf)#
or #tt "inr"(ttf)#
as appropriate. Additional function symbols of the form #k : tt "inl"(A) -> tt "inr"(Z)#
where #A# is a sort of #ccT_1# and #Z# is a sort of #ccT_2#, and potentially additional equations involving these function symbols, may be given. (If no additional function symobls are given, then this is exactly the disjoint sum of #ccT_1# and #ccT_2#.) These additional function symbols and equations are what differentiate two collages that have the same source and target theories. Note, there are no function symbols #tt "inr"(Z) -> tt "inl"(A)#
, i.e. where #Z# is in #ccT_2# and #A# is in #ccT_1#. That is, there are no function symbols going the “other way”. To avoid clutter, I’ll typically assume that the sorts and function symbols of #ccT_1# and #ccT_2# are disjoint already, and dispense with the #tt "inl"#
s and #tt "inr"#
s.
Summarizing, we have #ccK(tt "inl"(A), "inl"(B)) ~= ccT_1(A, B)#
, #ccK(tt "inr"(Y), tt "inr"(Z)) ~= ccT_2(Y, Z)#
, and #ccK(tt "inr"(Z), tt "inl"(A)) = O/#
for all #A#, #B#, #Y#, and #Z#. #ccK(tt "inl"(A), tt "inr"(Z))#
for any #A# and #Z# is arbitrary generated. To distinguish them, I’ll call the function symbols that go from one theory to another bridges. More generally, an arbitrary term that has it’s source in one theory and target in another will be described as a bridging term.
Here’s a somewhat silly example. Consider #ccK_+ : ccT_bbbN xx ccT_bbbN ↛ ccT_bbbN# that has one bridge #tt "add" : (0, 0) -> 0#
with the equations #tt "add"(tt "inl"(tts, 0)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")")))#
and #tt "add"(tt "inr"(0, tts)(bbx_("("0, 0")"))) ~~ tts(tt "add"(bbx_("("0, 0")")))#
.
More usefully, if a bit degenerately, every theory induces a collage in the following way. Given a theory #ccT#, we can build the collage #ccK_ccT : ccT ↛ ccT# where the bridges consist of the following. For each sort, #A#, of #ccT#, we have the following bridge: #tt "id"_A : tt "inl"(A) -> tt "inr"(A)#
. Then, for every function symbol, #ttf : A -> B# in #ccT#, we have the following equation: #tt "inl"(tt "f")(tt "id"_A(bbx_(tt "inl"(A)))) ~~ tt "id"_B(tt "inr"(tt "f")(bbx_(tt "inl"(A))))#
. We have #ccK_ccT(tt "inl"(A), tt "inr"(B)) ~= ccT(A, B)#
.
You can think of a bridging term in a collage as a sequence of function symbols partitioned into two parts by a bridge. Naturally, we might consider partitioning into more than two parts by having more than one bridge. It’s easy to generalize the definition of collage to combine an arbitrary number of theories, but I’ll take a different, probably less good, route. Given collages #ccK_1 : ccT_1 ↛ ccT_2# and #ccK_2 : ccT_2 ↛ ccT_3#, we can make the collage #ccK_2 @ ccK_1 : ccT_1 ↛ ccT_3# by defining its bridges to be triples of a bridge of #ccK_1#, #k_1 : A_1 -> A_2#, a term, #t : A_2 -> B_2# of #ccT_2#, and a bridge of #ccK_2#, #k_2 : B_2 -> B_3# which altogether will be a bridge of #ccK_2 @ ccK_1# going from #A_1 -> B_3#. These triples essentially represent a term like #k_2(t(k_1(bbx_(A_1))))#
. With this intuition we can formulate the equations. For each equation #t'(k_1(t_1)) ~~ s'(k'_1(s_1))#
where #k_1# and #k'_1#
are bridges of #ccK_1#, we have for every bridge #k_2# of #ccK_2# and term #t# of the appropriate sorts #(k_2, t(t'(bbx)), k_1)(t_1) ~~ (k_2, t(s'(bbx)), k'_1)(s_1)#
and similarly for equations involving the bridges of #ccK_2#.
This composition is associative… almost. Furthermore, the collages generated by theories, #ccK_ccT#, behave like identities to this composition… almost. It turns out these statements are true, but only up to isomorphism of theories. That is, #(ccK_3 @ ccK_2) @ ccK_1 ~= ccK_3 @ (ccK_2 @ ccK_1)# but is not equal.
To talk about isomorphism of theories we need the notion of…
An interpretation of a theory gives meaning to the syntax of a theory. There are two nearly identical notions of interpretation for us: interpretation (into sets) and interpretation into a theory. I’ll define them in parallel. An interpretation (into a theory), #ccI#, is a mapping, written #⟦-⟧^ccI# though the superscript will often be omitted, which maps sorts to sets (sorts) and function symbols to functions (terms). The mapping satisfies:
#⟦"src"(f)⟧ = "src"(⟦f⟧)#
and#⟦"tgt"(f)⟧ = "tgt"(⟦f⟧)#
where#"src"#
and#"tgt"#
on the right are the domain and codomain operations for an interpretation.
We extend the mapping to a mapping on terms via:
- #⟦bbx_A⟧ = x |-> x#, i.e. the identity function, or, for interpretation into a theory,
#⟦bbx_A⟧ = bbx_{:⟦A⟧:}#
#⟦tt "f"(t)⟧ = ⟦tt "f"⟧ @ ⟦t⟧#
or, for interpretation into a theory,#⟦tt "f"(t)⟧ = ⟦tt "f"⟧(⟦t⟧)#
and we require that for any equation of the theory, #l ~~ r#, #⟦l⟧ = ⟦r⟧#. (Technically, this is implicitly required for the extension of the mapping to terms to be well-defined, but it’s clearer to state it explicitly.) I’ll write #ccI : ccT -> bb "Set"#
when #ccI# is an interpretation of #ccT# into sets, and #ccI’ : ccT_1 -> ccT_2# when #ccI’# is an interpretation of #ccT_1# into #ccT_2#.
An interpretation of the theory of isomorphisms produces a bijection between two specified sets. Spelling out a simple example where #bbbB# is the set of booleans:
- #⟦A⟧ -= bbbB#
- #⟦B⟧ -= bbbB#
#⟦tt "f"⟧ -= x |-> not x#
#⟦tt "g"⟧ -= x |-> not x#
plus the proof #not not x = x#.
As another simple example, we can interpret the theory of isomorphisms into itself slightly non-trivially.
- #⟦A⟧ -= B#
- #⟦B⟧ -= A#
#⟦tt "f"⟧ -= tt "g"(bbx_B)#
#⟦tt "g"⟧ -= tt "f"(bbx_A)#
As an (easy) exercise, you should define #pi_1 : ccT_1 xx ccT_2 -> ccT_1# and similarly #pi_2#. If you defined #ccT_1 + ccT_2# before, you should define #iota_1 : ccT_1 -> ccT_1 + ccT_2# and similarly for #iota_2#. As another easy exercise, show that an interpretation of #ccT_{:~=:}#
is a bijection. In Haskell, an interpretation of #ccT_bbbN# would effectively be foldNat
. Something very interesting happens when you consider what an interpretation of the collage generated by a theory, #ccK_ccT#, is. Spell it out. In a different vein, you can show that a collage #ccK : ccT_1 ↛ ccT_2# and an interpretation #ccT_1^(op) xx ccT_2 -> bb "Set"#
are essentially the same thing in the sense that each gives rise to the other.
Two theories are isomorphic if there exists interpretations #ccI_1 : ccT_1 -> ccT_2#
and #ccI_2 : ccT_2 -> ccT_1#
such that #⟦⟦A⟧^(ccI_1)⟧^(ccI_2) = A#
and visa versa, and similarly for function symbols. In other words, each is interpretable in the other, and if you go from one interpretation and then back, you end up where you started. Yet another way to say this is that there is a one-to-one correspondence between sorts and terms of each theory, and this correspondence respects substitution.
As a crucially important example, the set of terms, #ccT(A, B)#, can be extended to an interpretation. In particular, for each sort #A#, #ccT(A, -) : ccT -> bb "Set"#
. It’s action on function symbols is the following:
#⟦tt "f"⟧^(ccT(A, -)) -= t |-> tt "f"(t)#
We have, dually, #ccT(-, A) : ccT^(op) -> bb "Set"#
with the following action:
#⟦tt "f"⟧^(ccT(-, A)) -= t |-> t(tt "f"(bbx_B))#
We can abstract from both parameters making #ccT(-, =) : ccT^(op) xx ccT -> bb "Set"#
which, by an early exercise, can be shown to correspond with the collage #ccK_ccT#.
Via an abuse of notation, I’ll identify #ccT^(op)(A, -)# with #ccT(-, A)#, though technically we only have an isomorphism between the interpretations, and to talk about isomorphisms between interpretations we need the notion of…
The theories we’ve presented are (multi-sorted) universal algebra theories. Universal algebra allows us to specify a general notion of “homomorphism” that generalizes monoid homomorphism or group homomorphism or ring homomorphism or lattice homomorphism.
In universal algebra, the algebraic theory of groups consists of a single sort, a nullary operation, #1#, a binary operation, #*#
, a unary operation, #tt "inv"#
, and some equations which are unimportant for us. Operations correspond to our function symbols except that they’re are not restricted to being unary. A particular group is a particular interpretation of the algebraic theory of groups, i.e. it is a set and three functions into the set. A group homomorphism then is a function between those two groups, i.e. between the two interpretations, that preserves the operations. In a traditional presentation this would look like the following:
Say #alpha : G -> K# is a group homomorphism from the group #G# to the group #K# and #g, h in G# then:
- #alpha(1_G) = 1_K#
#alpha(g *_G h) = alpha(g) *_K alpha(h)#
#alpha(tt "inv"_G(g)) = tt "inv"_K(alpha(g))#
Using something more akin to our notation, it would look like:
- #alpha(⟦1⟧^G) = ⟦1⟧^K#
#alpha(⟦*⟧^G(g,h)) = ⟦*⟧^K(alpha(g), alpha(h))#
#alpha(⟦tt "inv"⟧^G(g)) = ⟦tt "inv"⟧^K(alpha(g))#
The #tt "inv"#
case is the most relevant for us as it is unary. However, for us, a function symbol #ttf# may have a different source and target and so we made need a different function on each side of the equation. E.g. for #ttf : A -> B#, #alpha : ccI_1 -> ccI_2#, and #a in ⟦A⟧^(ccI_1)# we’d have:
#alpha_B(⟦tt "f"⟧^(ccI_1)(a)) = ⟦tt "f"⟧^(ccI_2)(alpha_A(a))#
So a homomorphism #alpha : ccI_1 -> ccI_2 : ccT -> bb "Set"#
is a family of functions, one for each sort of #ccT#, that satisfies the above equation for every function symbol of #ccT#. We call the individual functions making up #alpha# components of #alpha#, and we have #alpha_A : ⟦A⟧^(ccI_1) -> ⟦A⟧^(ccI_2)#
. The definition for an interpretation into a theory, #ccT_2#, is identical except the components of #alpha# are terms of #ccT_2# and #a# can be replaced with #bbx_(⟦A⟧^(ccI_1))#. Two interpretations are isomorphic if we have homomorphism #alpha : ccI_1 -> ccI_2# such that each component is a bijection. This is the same as requiring a homomorphism #beta : ccI_2 -> ccI_1# such that for each #A#, #alpha_A(beta_A(x)) = x# and #beta_A(alpha_A(x)) = x#. A similar statement can be made for interpretations into theories, just replace #x# with #bbx_(⟦A⟧)#
.
Another way to look at homomorphisms is via collages. A homomorphism #alpha : ccI_1 -> ccI_2 : ccT -> bb "Set"#
gives rise to an interpretation of the collage #ccK_ccT#. The interpretation #ccI_alpha : ccK_ccT -> bb "Set"#
is defined by:
#⟦tt "inl"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_1)#
#⟦tt "inr"(A)⟧^(ccI_alpha) -= ⟦A⟧^(ccI_2)#
#⟦tt "inl"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_1)#
#⟦tt "inr"(ttf)⟧^(ccI_alpha) -= ⟦ttf⟧^(ccI_2)#
#⟦tt "id"_A⟧^(ccI_alpha) -= alpha_A#
The homomorphism law guarantees that it satisfies the equation on #tt "id"#
. Conversely, given an interpretation of #ccK_ccT#, we have the homomorphism, #⟦tt "id"⟧ : ⟦tt "inl"(-)⟧ -> ⟦tt "inr"(-)⟧ : ccT -> bb "Set"#
. and the equation on #tt "id"#
is exactly the homomorphism law.
Consider a homomorphism #alpha : ccT(A, -) -> ccI#. The #alpha# needs to satisfy for every sort #B# and #C#, every function symbol #ttf : C -> D#, and every term #t : B -> C#:
#alpha_D(tt "f"(t)) = ⟦tt "f"⟧^ccI(alpha_C(t))#
Looking at this equation, the possibility of viewing it as a recursive “definition” leaps out suggesting that the action of #alpha# is completely determined by it’s action on the variables. Something like this, for example:
#alpha_D(tt "f"(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(alpha_C(tt "g"(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(alpha_B(tt "h"(bbx_A)))) = ⟦tt "f"⟧(⟦tt "g"⟧(⟦tt "h"⟧(alpha_A(bbx_A))))#
We can easily establish that there’s a one-to-one correspondence between the set of homomorphisms #ccT(A, -) -> ccI# and the elements of the set #⟦A⟧^ccI#. Given a homomorphism, #alpha#, we get an element of #⟦A⟧^ccI# via #alpha_A(bbx_A)#. Inversely, given an element #a in ⟦A⟧^ccI#, we can define a homomorphism #a^**#
via:
#a_D^**(tt "f"(t)) -= ⟦tt "f"⟧^ccI(a_C^**(t))#
#a_A^**(bbx_A) -= a#
which clearly satisfies the condition on homomorphisms by definition. It’s easy to verify that #(alpha_A(bbx_A))^** = alpha#
and immediately true that #a^**(bbx_A) = a#
establishing the bijection.
We can state something stronger. Given any homomorphism #alpha : ccT(A, -) -> ccI# and any function symbol #ttg : A -> X#, we can make a new homomorphism #alpha * ttg : ccT(X, -) -> ccI# via the following definition:
#(alpha * ttg)(t) = alpha(t(tt "g"(bbx_A)))#
Verifying that this is a homomorphism is straightforward:
#(alpha * ttg)(tt "f"(t)) = alpha(tt "f"(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧(alpha(t(tt "g"(bbx_A)))) = ⟦tt "f"⟧((alpha * ttg)(t))#
and like any homomorphism of this form, as we’ve just established, it is completely determined by it’s action on variables, namely #(alpha * ttg)_A(bbx_A) = alpha_X(tt "g"(bbx_A)) = ⟦tt "g"⟧(alpha_A(bbx_A))#
. In particular, if #alpha = a^**#
, then we have #a^** * ttg = (⟦tt "g"⟧(a))^**#
. Together these facts establish that we have an interpretation #ccY : ccT -> bb "Set"#
such that #⟦A⟧^ccY -= (ccT(A, -) -> ccI)#
, the set of homomorphisms, and #⟦tt "g"⟧^ccY(alpha) -= alpha * tt "g"#
. The work we did before established that we have homomorphisms #(-)(bbx) : ccY -> ccI# and #(-)^** : ccI -> ccY#
that are inverses. This is true for all theories and all interpretations as at no point did we use any particular facts about them. This statement is the (dual form of the) Yoneda lemma. To get the usual form simply replace #ccT# with #ccT^(op)#. A particularly important and useful case (so useful it’s usually used tacitly) occurs when we choose #ccI = ccT(B,-)#, we get #(ccT(A, -) -> ccT(B, -)) ~= ccT(B, A)# or, choosing #ccT^(op)# everywhere, #(ccT(-, A) -> ccT(-, B)) ~= ccT(A, B)# which states that a term from #A# to #B# is equivalent to a homomorphism from #ccT(-, A)# to #ccT(-, B)#.
There is another result, dual in a different way, called the co-Yoneda lemma. It turns out it is a corollary of the fact that for a collage #ccK : ccT_1 ↛ ccT_2#, #ccK_(ccT_2) @ ccK ~= ccK#
and the dual is just the composition the other way. To get (closer to) the precise result, we need to be able to turn an interpretation into a collage. Given an interpretation, #ccI : ccT -> bb "Set"#
, we can define a collage #ccK_ccI : ccT_bb1 ↛ ccT# whose bridges from #1 -> A# are the elements of #⟦A⟧^ccI#. Given this, the co-Yoneda lemma is the special case, #ccK_ccT @ ccK_ccI ~= ccK_ccI#.
Note, that the Yoneda and co-Yoneda lemmas only apply to interpretations into sets as #ccY# involves the set of homomorphisms.
The Yoneda lemma suggests that the interpretations #ccT(A, -)# and #ccT(-, A)# are particularly important and this will be borne out as we continue.
We call an interpretation, #ccI : ccT^(op) -> bb "Set"#
representable if #ccI ~= ccT(-, X)# for some sort #X#. We then say that #X# represents #ccI#. What this states is that every term of sort #X# corresponds to an element in one of the sets that make up #ccI#, and these transform appropriately. There’s clearly a particularly important element, namely the image of #bbx_X# which corresponds to an element in #⟦X⟧^ccI#. This element is called the universal element. The dual concept is, for #ccI : ccT -> bb "Set"#
, #ccI# is co-representable if #ccI ~= ccT(X, -)#. We will also say #X# represents #ccI# in this case as it actually does when we view #ccI# as an interpretation of #(ccT^(op))^(op)#
.
As a rather liberating exercise, you should establish the following result called parameterized representability. Assume we have theories #ccT_1# and #ccT_2#, and a family of sorts of #ccT_2#, #X#, and a family of interpretations of #ccT_2^(op)#, #ccI#, both indexed by sorts of #ccT_1#, such that for each #A in bb "sort"(ccT_1)#
, #X_A# represents #ccI_A#, i.e. #ccI_A ~= ccT_2(-, X_A)#. Given all this, then there is a unique interpretation #ccX : ccT_1 -> ccT_2# and #ccI : ccT_1 xx ccT_2^(op) -> bb "Set"#
where #⟦A⟧^(ccX) -= X_A# and #"⟦("A, B")⟧"^ccI -= ⟦B⟧^(ccI_A)#
such that #ccI ~= ccT_2(=,⟦-⟧^ccX)#. To be a bit more clear, the right hand side means #(A, B) |-> ccT_2(B, ⟦A⟧^ccX)#. Simply by choosing #ccT_1# to be a product of multiple theories, we can generalize this result to an arbitrary number of parameters. What makes this result liberating is that we just don’t need to worry about the parameters, they will automatically transform homomorphically. As a technical warning though, since two interpretations may have the same action on sorts but a different action on function symbols, if the family #X_A# was derived from an interpretation #ccJ#, i.e. #X_A -= ⟦A⟧^ccJ#, it may not be the case that #ccX = ccJ#.
Let’s look at some examples.
As a not-so-special case of representability, we can consider #ccI -= ccK(tt "inl"(-), tt "inr"(Z))#
where #ccK : ccT_1 ↛ ccT_2#. Saying that #A# represents #ccI# in this case is saying that bridging terms of sort #tt "inr"(Z)#
, i.e. sort #Z# in #ccT_2#, in #ccK#, correspond to terms of sort #A# in #ccT_1#. We’ll call the universal element of this representation the universal bridge (though technically it may be a bridging term, not a bridge). Let’s write #varepsilon# for this universal bridge. What representability states in this case is given any bridging term #k# of sort #Z#, there exists a unique term #|~ k ~|# of sort #A# such that #k = varepsilon(|~ k ~|)#. If we have an interpretation #ccX : ccT_2 -> ccT_1# such that #⟦Z⟧^ccX# represents #ccK(tt "inl"(-), tt "inr"(Z))#
for each sort #Z# of #ccT_2# we say we have a right representation of #ccK#. Note, that the universal bridges become a family #varepsilon_Z : ⟦Z⟧^ccX -> Z#
. Similarly, if #ccK(tt "inl"(A), tt "inr"(-))#
is co-representable for each #A#, we say we have a left representation of #ccK#. The co-universal bridge is then a bridging term #eta_A : A -> ⟦A⟧# such that for any bridging term #k# with source #A#, there exists a unique term #|__ k __|#
in #ccT_2# such that #k = |__ k __|(eta_A)#
. For reference, we’ll call these equations universal properties of the left/right representation. Parameterized representability implies that a left/right representation is essentially unique.
Define #ccI_bb1# via #⟦A⟧^(ccI_bb1) -= bb1# where #bb1# is some one element set. #⟦ttf⟧^(ccI_bb1)# is the identity function for all function symbols #ttf#. We’ll say a theory #ccT# has a unit sort or has a terminal sort if there is a sort that we’ll also call #bb1# that represents #ccI_bb1#. Spelling out what that means, we first note that there is nothing notable about the universal element as it’s the only element. However, writing the homomorphism #! : ccI_bb1 -> ccT(-, bb1)# and noting that since there’s only one element of #⟦A⟧^(ccI_bb1)# we can, with a slight abuse of notation, also write the term #!# picks out as #!# which gives the equation:
#!_B(tt "g"(t)) = !_A(t)#
for any function symbol #ttg : A -> B# and term, #t#, of sort #A#, note#!_A : A -> bb1#
.
This equation states what the isomorphism also fairly directly states: there is exactly one term of sort #bb1# from any sort #A#, namely #!_A(bbx_A)#
. The dual notion is called a void sort or an initial sort and will usually be notated #bb0#, the analog of #!# will be written as #0#. The resulting equation is:
#tt "f"(0_A) = 0_B#
for any function symbol #ttf : A -> B#, note #0_A : bb0 -> A#.
For the next example, I’ll leverage collages. Consider the collage #ccK_2 : ccT ↛ ccT xx ccT# whose bridges from #A -> (B, C)# consist of pairs of terms #t_1 : A -> B# and #t_2 : A -> C#. #ccT# has pairs if #ccK_2# has a right representation. We’ll write #(B, C) |-> B xx C# for the representing interpretation’s action on sorts. We’ll write the universal bridge as #(tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))#
. The universal property then looks like #(tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C)))((: t_1, t_2 :)) = (t_1, t_2)#
where #(: t_1, t_2 :) : A -> B xx C# is the unique term induced by the bridge #(t_1, t_2)#. The universal property implies the following equations:
#(: tt "fst"(bbx_(B xx C)), tt "snd"(bbx_(B xx C))) = bbx_(B xx C)#
#tt "fst"((: t_1, t_2 :)) = t_1#
#tt "snd"((: t_1, t_2 :)) = t_2#
One aspect of note, is regardless of whether #ccK_2# has a right representation, i.e. regardless of whether #ccT# has pairs, it always has a left representation. The co-universal bridge is #(bbx_A, bbx_A)# and the unique term #|__(t_1, t_2)__|#
is #tt "inl"(t_1, bbx_A)(tt "inr"(bbx_A, t_2)(bbx_("("A,A")")))#
.
Define an interpretation #Delta : ccT -> ccT xx ccT# so that #⟦A⟧^Delta -= (A,A)# and similarly for function symbols. #Delta# left represents #ccK_2#. If the interpretation #(B,C) |-> B xx C# right represents #ccK_2#, then we say we have an adjunction between #Delta# and #(- xx =)#, written #Delta ⊣ (- xx =)#, and that #Delta# is left adjoint to #(- xx =)#, and conversely #(- xx =)# is right adjoint #Delta#.
More generally, whenever we have the situation #ccT_1(⟦-⟧^(ccI_1), =) ~= ccT_2(-, ⟦=⟧^(ccI_2))#
we say that #ccI_1 : ccT_2 -> ccT_1# is left adjoint to #ccI_2 : ccT_1 -> ccT_2# or conversely that #ccI_2# is right adjoint to #ccI_1#. We call this arrangement an adjunction and write #ccI_1 ⊣ ccI_2#. Note that we will always have this situation if #ccI_1# left represents and #ccI_2# right represents the same collage. As we noted above, parameterized representability actually determines one adjoint given (its action on sorts and) the other adjoint. With this we can show that adjoints are unique up to isomorphism, that is, given two left adjoints to an interpretation, they must be isomorphic. Similarly for right adjoints. This means that stating something is a left or right adjoint to some other known interpretation essentially completely characterizes it. One issue with adjunctions is that they tend to be wholesale. Let’s say the pair sort #A xx B# existed but no other pair sorts existed, then the (no longer parameterized) representability approach would work just fine, but the adjunction would no longer exist.
Here’s a few of exercises using this. First, a moderately challenging one (until you catch the pattern): spell out the details to the left adjoint to #Delta#. We say a theory has sums and write those sums as #A + B# if #(- + =) ⊣ Delta#. Recast void and unit sorts using adjunctions and/or left/right representations. As a terminological note, we say a theory has finite products if it has unit sorts and pairs. Similarly, a theory has finite sums or has finite coproducts if it has void sorts and sums. An even more challenging exercise is the following: a theory has exponentials if it has finite products and for every sort #A#, #(A xx -) ⊣ (A => -)# (note, parameterized representability applies to #A#). Spell out the equations characterizing #A => B#.
Finite products start to lift us off the ground. So far the theories we’ve been working with have been extremely basic: a language with only unary functions, all terms being just a sequence of applications of function symbols. It shouldn’t be underestimated though. It’s more than enough to do monoid and group theory. A good amount of graph theory can be done with just this. And obviously we were able to establish several general results assuming only this structure. Nevertheless, while we can talk about specific groups, say, we can’t talk about the theory of groups. Finite products change this.
A theory with finite products allows us to talk about multi-ary function symbols and terms by considering unary function symbols from products. This allows us to do all of universal algebra. For example, the theory of groups, #ccT_(bb "Grp")#
, consists of a sort #S# and all it’s products which we’ll abbreviate as #S^n# with #S^0 -= bb1# and #S^(n+1) -= S xx S^n#. It has three function symbols #tte : bb1 -> S#, #ttm : S^2 -> S#, and #tti : S -> S# plus the ones that having finite products requires. In fact, instead of just heaping an infinite number of sorts and function symbols into our theory — and we haven’t even gotten to equations — let’s define a compact set of data from which we can generate all this data.
A signature, #Sigma#, consists of a collection of sorts, #sigma#, a collection of multi-ary function symbols, and a collection of equations. Equations still remain pairs of terms, but we need to now extend our definition of terms for this context. A term (in a signature) is either a variable, #bbx_i^[A_0,A_1,...,A_n]#
with #A_i# are sorts and #0 <= i <= n#, the operators #tt "fst"#
or #tt "snd"#
applied to a term, the unit term written #(::)^A# with sort #A#, a pair of terms written #(: t_1, t_2 :)#, or the (arity correct) application of a multi-ary function symbol to a series of terms, e.g. #tt "f"(t_1, t_2, t_3)#
. As a Haskell data declaration, it might look like:
data SigTerm
= SigVar [Sort] Int
| Fst SigTerm
| Snd SigTerm
| Unit Sort
| Pair SigTerm SigTerm
| SigApply FunctionSymbol [SigTerm]
At this point, sorting (i.e. typing) the terms is no longer trivial, though it is still pretty straightforward. Sorts are either #bb1#, or #A xx B# for sorts #A# and #B#, or a sort #A in sigma#. The source of function symbols or terms are lists of sorts.
#bbx_i^[A_0, A_1, ..., A_n] : [A_0, A_1, ..., A_n] -> A_i#
#(::)^A : [A] -> bb1#
#(: t_1, t_2 :) : bar S -> T_1 xx T_2#
where #t_i : bar S -> T_i##tt "fst"(t) : bar S -> T_1#
where #t : bar S -> T_1 xx T_2##tt "snd"(t) : bar S -> T_2#
where #t : bar S -> T_1 xx T_2##tt "f"(t_1, ..., t_n) : bar S -> T#
where #t_i : bar S -> T_i# and#ttf : [T_1,...,T_n] -> T#
The point of a signature was to represent a theory so we can compile a term of a signature into a term of a theory with finite products. The theory generated from a signature #Sigma# has the same sorts as #Sigma#. The equations will be equations of #Sigma#, with the terms compiled as will be described momentarily, plus for every pair of sorts the equations that describe pairs and the equations for #!#. Finally, we need to describe how to take a term of the signature and make a function symbol of the theory, but before we do that we need to explain how to convert those sources of the terms which are lists. That’s just a conversion to right nested pairs, #[A_0,...,A_n] |-> A_0 xx (... xx (A_n xx bb1) ... )#
. The compilation of a term #t#, which we’ll write as #ccC[t]#
, is defined as follows:
#ccC[bbx_i^[A_0, A_1, ..., A_n]] = tt "snd"^i(tt "fst"(bbx_(A_i xx(...))))#
where#tt "snd"^i#
means the #i#-fold application of#tt "snd"#
#ccC[(::)^A] = !_A#
#ccC[(: t_1, t_2 :)] = (: ccC[t_1], ccC[t_2] :)#
#ccC[tt "fst"(t)] = tt "fst"(ccC[t])#
#ccC[tt "snd"(t)] = tt "snd"(ccC[t])#
#ccC[tt "f"(t_1, ..., t_n)] = tt "f"((: ccC[t_1], (: ... , (: ccC[t_n], ! :) ... :) :))#
As you may have noticed, the generated theory will have an infinite number of sorts, an infinite number of function symbols, and an infinite number of equations no matter what the signature is — even an empty one! Having an infinite number of things isn’t a problem as long as we can algorithmically describe them and this is what the signature provides. Of course, if you’re a (typical) mathematician you nominally don’t care about an algorithmic description. Besides being compact, signatures present a nicer term language. The theories are like a core or assembly language. We could define a slightly nicer variation where we keep a context and manage named variables leading to terms-in-context like:
#x:A, y:B |-- tt "f"(x, x, y)#
which is
#tt "f"(bbx_0^[A,B], bbx_0^[A,B], bbx_1^[A,B])#
for our current term language for signatures. Of course, compilation will be (slightly) trickier for the nicer language.
The benefit of having compiled the signature to a theory, in addition to being able to reuse the results we’ve established for theories, is we only need to define operations on the theory, which is simpler since we only need to deal with pairs and unary function symbols. One example of this is we’d like to extend our notion of interpretation to one that respects the structure of the signature, and we can do that by defining an interpretation of theories that respects finite products.
A finite product preserving interpretation (into a finite product theory), #ccI#, is an interpretation (into a finite product theory) that additionally satisfies:
#⟦bb1⟧^ccI = bb1#
#⟦A xx B⟧^ccI = ⟦A⟧^ccI xx ⟦B⟧^ccI#
#⟦!_A⟧^ccI = !_(⟦A⟧^ccI)#
#⟦tt "fst"(t)⟧^ccI = tt "fst"(⟦t⟧^ccI)#
#⟦tt "snd"(t)⟧^ccI = tt "snd"(⟦t⟧^ccI)#
#⟦(: t_1, t_2 :)⟧^ccI = (: ⟦t_1⟧^ccI, ⟦t_2⟧^ccI :)#
where, for #bb "Set"#
, #bb1 -= {{}}#, #xx# is the cartesian product, #tt "fst"#
and #tt "snd"#
are the projections, #!_A -= x |-> \{\}#
, and #(: f, g :) -= x |-> (: f(x), g(x) :)#.
With signatures, we can return to our theory, now signature, of groups. #Sigma_bb "Grp"#
has a single sort #S#, three function symbols #tte : [bb1] -> S#
, #tti : [S] -> S#
, and #ttm : [S, S] -> S#
, with the following equations (written as equations rather than pairs):
#tt "m"(tt "e"((::)^S), bbx_0^S) = bbx_0^S#
#tt "m"(tt "i"(bbx_0^S), bbx_0^S) = tt "e"((::)^S)#
#tt "m"(tt "m"(bbx_0^[S,S,S], bbx_1^[S,S,S]), bbx_2^[S,S,S]) = tt "m"(bbx_0^[S,S,S], tt "m"(bbx_1^[S,S,S], bbx_2^[S,S,S]))#
or using the nicer syntax:
#x:S |-- tt "m"(tt "e"(), x) = x#
#x:S |-- tt "m"(tt "i"(x), x) = tt "e"()#
#x:S, y:S, z:S |-- tt "m"(tt "m"(x, y), z) = tt "m"(x, tt "m"(y, z))#
An actual group is then just a finite product preserving interpretation of (the theory generated by) this signature. All of universal algebra and much of abstract algebra can be formulated this way.
We can consider additionally assuming that our theory has exponentials. I left articulating exactly what that means as an exercise, but the upshot is we have the following two operations:
For any term #t : A xx B -> C#, we have the term #tt "curry"(t) : A -> C^B#
. We also have the homomorphism #tt "app"_(AB) : B^A xx A -> B#
. They satisfy:
#tt "curry"(tt "app"(bbx_(B^A xx A))) = bbx_(B^A)#
#tt "app"((: tt "curry"(t_1), t_2 :)) = t_1((: bbx_A, t_2 :))#
where #t_1 : A xx B -> C# and #t_2 : A -> B#.
We can view these, together with the the product operations, as combinators and it turns out we can compile the simply typed lambda calculus into the above theory. This is exactly what the Categorical Abstract Machine did. The “Caml” in “O’Caml” stands for “Categorical Abstract Machine Language”, though O’Caml no longer uses the CAM. Conversely, every term of the theory can be expressed as a simply typed lambda term. This means we can view the simply typed lambda calculus as just a different presentation of the theory.
At this point, this presentation of category theory starts to connect to the mainstream categorical literature on universal algebra, internal languages, sketches, and internal logic. This page gives a synopsis of the relationship between type theory and category theory. For some reason, it is unusual to talk about the internal language of a plain category, but that is exactly what we’ve done here.
I haven’t talked about finite limits or colimits beyond products and coproducts, nor have I talked about even the infinitary versions of products and coproducts, let alone arbitrary limits and colimits. These can be handled the same way as products and coproducts. Formulating a language like signatures or the simply typed lambda calculus is a bit more complicated, but not that hard. I may make a follow-up article covering this among other things. I also have a side project (don’t hold your breath), that implements the internal language of a category with finite limits. The result looks roughly like a simple version of an algebraic specification language like the OBJ family. The RING
theory described in the Maude manual gives an idea of what it would look like. In fact, here’s an example of the current actual syntax I’m using.^{3}
theory Categories
type O
type A
given src : A -> O
given tgt : A -> O
given id : O -> A
satisfying o:O | src (id o) = o, tgt (id o) = o
given c : { f:A, g:A | src f = tgt g } -> A
satisfying (f, g):{ f:A, g:A | src f = tgt g }
| tgt (c (f, g)) = tgt f, src (c (f, g)) = src g
satisfying "left unit" (o, f):{ o:O, f:A | tgt f = o }
| c (id o, f) = f
satisfying "right unit" (o, f):{ o:O, f:A | src f = o }
| c (f, id o) = f
satisfying "associativity" (f, g, h):{ f:A, g:A, h:A | src f = tgt g, src g = tgt h }
| c (c (f, g), h) = c (f, c (g, h))
endtheory
It turns out this is a particularly interesting spot in the design space. The fact that the theory of theories with finite limits is itself a theory with finite limits has interesting consequences. It is still relatively weak though. For example, it’s not possible to describe the theory of fields in this language.
There are other directions one could go. For example, the internal logic of monoidal categories is (a fragment of) ordered linear logic. You can cross this bridge either way. You can look at different languages and consider what categorical structure is needed to support the features of the language, or you can add features to the category and see how that impacts the internal language. The relationship is similar to the source language and a core/intermediate language in a compiler, e.g. GHC Haskell and System Fc.
If you’ve looked at category theory at all, you can probably make most of the connections without me telling you. The table below outlines the mapping, but there are some subtleties. First, as a somewhat technical detail, my definition of a theory corresponds to a small category, i.e. a category which has a set of objects and a set of arrows. For more programmer types, you should think of “set” as Set
in Agda, i.e. similar to the *
kind in Haskell. Usually “category” means “locally small category” which may have a proper class of objects and between any two objects a set of arrows (though the union of all those sets may be a proper class). Again, for programmers, the distinction between “class” and “set” is basically the difference between Set
and Set1
in Agda.^{4} To make my definition of theory closer to this, all that is necessary is instead of having a set of function symbols, have a family of sets indexed by pairs of objects. Here’s what a partial definition in Agda of the two scenarios would look like:
-- Small category (the definition I used)
record SmallCategory : Set1 where
field
objects : Set
arrows : Set
src : arrows -> objects
tgt : arrows -> objects
...
-- Locally small category
record LocallySmallCategory : Set2 where
field
objects : Set1
hom : objects -> objects -> Set
...
-- Different presentation of a small category
record SmallCategory' : Set1 where
field
objects : Set
hom : objects -> objects -> Set
...
The benefit of the notion of locally small category is that Set
itself is a locally small category. The distinction I was making between interpretations into theories and interpretations into Set was due to the fact that Set wasn’t a theory. If I used a definition theory corresponding to a locally small category, I could have combined the notions of interpretation by making Set a theory. The notion of a small category, though, is still useful. Also, an interpretation into Set corresponds to the usual notion of a model or semantics, while interpretations into other theories was a less emphasized concept in traditional model theory and universal algebra.
A less technical and more significant difference is that my definition of a theory doesn’t correspond to a category, but rather to a presentation of a category, from which a category can be generated. The analog of arrows in a category is terms, not function symbols. This is a bit more natural route from the model theory/universal algebra/programming side. Similarly, having an explicit collection of equations, rather than just an equivalence relation on terms is part of the presentation of the category but not part of the category itself.
model theory | category theory |
---|---|
sort | object |
term | arrow |
function symbol | generating arrow |
theory | presentation of a (small) category |
collage | collage, cograph of a profunctor |
bridge | heteromorphism |
signature | presentation of a (small) category with finite products |
interpretation into sets, aka models | a functor into Set, a (co)presheaf |
interpretation into a theory | functor |
homomorphism | natural transformation |
simply typed lambda calculus (with products) | a cartesian closed category |
In some ways I’ve stopped just when things were about to get good. I may do a follow-up to elaborate on this good stuff. Some examples are: if I expand the definition so that Set becomes a “theory”, then interpretations also form such a “theory”, and these are often what we’re really interested in. The category of finite-product preserving interpretations of the theory of groups essentially is the category of groups. In fact, universal algebra is, in categorical terms, just the study of categories with finite products and finite-product preserving functors from them, particularly into Set. It’s easy to generalize this in many directions. It’s also easy to make very general definitions, like a general definition of a free algebraic structure. In general, we’re usually more interested in the interpretations of a theory than the theory itself.
While I often do advocate thinking in terms of internal languages of categories, I’m not sure that it is a preferable perspective for the very basics of category theory. Nevertheless, there are a few reasons for why I wrote this. First, this very syntactical approach is, I think, more accessible to someone coming from a programming background. From this view, a category is a very simple programming language. Adding structure to the category corresponds to adding features to this programming language. Interpretations are denotational semantics.
Another aspect about this presentation that is quite different is the use and emphasis on collages. Collages correspond to profunctors, a crucially important and enabling concept that is rarely covered in categorical introductions. The characterization of profunctors as collages in Vaughn Pratt’s paper (not using that name) was one of the things I enjoyed about that paper and part of what prompted me to start writing this. In earlier, drafts of this article, I was unable to incorporate collages in a meaningful way as I was trying to start from profunctors. This approach just didn’t add value. Collages just looked like a bizarre curio and weren’t integrated into the narrative at all. For other reasons, though, I ended up revisiting the idea of a heteromorphism. My (fairly superficial) opinion is that once you have the notion of functors and natural transformations, adding the notion of heteromorphisms has a low power-to-weight ratio, though it does make some things a bit nicer. Nevertheless, in thinking of how best to fit them into this context, it was clear that collages provided the perfect mechanism (which isn’t a big surprise), and the result works rather nicely. When I realized a fact that can be cryptically but compactly represented as #ccK_ccT ≃ bbbI xx ccT# where #bbbI# is the interval category, i.e. two objects with a single arrow joining them, I realized that this is actually an interesting perspective. Since most of this article was written at that point, I wove collages into the narrative replacing some things. If, though, I had started with this perspective from the beginning I suspect I would have made a significantly different article, though the latter sections would likely be similar.
It’s actually better to organize this as a family of collections of function symbols indexed by pairs of sorts.↩
Instead of having equations that generate an equivalence relation on (raw) terms, we could simply require an equivalence relation on (raw) terms be directly provided.↩
Collaging is actually quite natural in this context. I already intend to support one theory importing another. A collage is just a theory that imports two others and then adds function symbols between them.↩
For programmers familiar with Agda, at least, if you haven’t made this connection, this might help you understand and appreciate what a “class” is versus a “set” and what “size issues” are, which is typically handled extremely vaguely in a lot of the literature.↩
I’ve been watching the Spring 2012 lectures for MIT 6.851 Advanced Data Structures with Prof. Erik Demaine. In lecture 12, “Fusion Trees”, it mentions a constant time algorithm for finding the index of the first most significant 1 bit in a word, i.e. the binary logarithm. Assuming word operations are constant time, i.e. in the Word RAM model, the below algorithm takes 27 word operations (not counting copying). When I compiled it with GHC 8.0.1 -O2 the core of the algorithm was 44 straight-line instructions. The theoretically interesting thing is, other than changing the constants, the same algorithm works for any word size that’s an even power of 2. Odd powers of two need a slight tweak. This is demonstrated for Word64
, Word32
, and Word16
. It should be possible to do this for any arbitrary word size w
.
The clz
instruction can be used to implement this function, but this is a potential simulation if that or a similar instruction wasn’t available. It’s probably not the fastest way. Similarly, find first set and count trailing zeros can be implemented in terms of this operation.
Below is the complete code. You can also download it here.
{-# LANGUAGE BangPatterns #-}
import Data.Word
import Data.Bits
-- Returns 0-based bit index of most significant bit that is 1. Assumes input is non-zero.
-- That is, 2^indexOfMostSignificant1 x <= x < 2^(indexOfMostSignificant1 x + 1)
-- From Erik Demaine's presentation in Spring 2012 lectures of MIT 6.851, particularly "Lecture 12: Fusion Trees".
-- Takes 26 (source-level) straight-line word operations.
indexOfMostSignificant1 :: Word64 -> Word64
indexOfMostSignificant1 w = idxMsbyte .|. idxMsbit
where
-- top bits of each byte
!wtbs = w .&. 0x8080808080808080
-- all but top bits of each byte producing 8 7-bit chunks
!wbbs = w .&. 0x7F7F7F7F7F7F7F7F
-- parallel compare of each 7-bit chunk to 0, top bit set in result if 7-bit chunk was not 0
!pc = parallelCompare 0x8080808080808080 wbbs
-- top bit of each byte set if the byte has any bits set in w
!ne = wtbs .|. pc
-- a summary of which bytes (except the first) are non-zero as a 7-bit bitfield, i.e. top bits collected into bottom byte
!summary = sketch ne `unsafeShiftR` 1
-- parallel compare summary to powers of two
!cmpp2 = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * summary)
-- index of most significant non-zero byte * 8
!idxMsbyte = sumTopBits8 cmpp2
-- most significant 7-bits of most significant non-zero byte
!msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
-- parallel compare msbyte to powers of two
!cmpp2' = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * msbyte)
-- index of most significant non-zero bit in msbyte
!idxMsbit = sumTopBits cmpp2'
-- Maps top bits of each byte into lower byte assuming all other bits are 0.
-- 0x2040810204081 = sum [2^j | j <- map (\i -> 49 - 7*i) [0..7]]
-- In general if w = 2^(2*k+p) and p = 0 or 1 the formula is:
-- sum [2^j | j <- map (\i -> w-(2^k-1) - 2^(k+p) - (2^(k+p) - 1)*i) [0..2^k-1]]
-- Followed by shifting right by w - 2^k
sketch w = (w * 0x2040810204081) `unsafeShiftR` 56
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8080808080808080
sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56
sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x0808080808080808) `unsafeShiftR` 56
indexOfMostSignificant1_w32 :: Word32 -> Word32
indexOfMostSignificant1_w32 w = idxMsbyte .|. idxMsbit
where !wtbs = w .&. 0x80808080
!wbbs = w .&. 0x7F7F7F7F
!pc = parallelCompare 0x80808080 wbbs
!ne = wtbs .|. pc
!summary = sketch ne `unsafeShiftR` 1
!cmpp2 = parallelCompare 0xFF838180 (0x01010101 * summary)
!idxMsbyte = sumTopBits8 cmpp2
!msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
!cmpp2' = parallelCompare 0x87838180 (0x01010101 * msbyte)
-- extra step when w is not an even power of two
!cmpp2'' = parallelCompare 0xFFBF9F8F (0x01010101 * msbyte)
!idxMsbit = sumTopBits cmpp2' + sumTopBits cmpp2''
sketch w = (w * 0x204081) `unsafeShiftR` 28
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x80808080
sumTopBits w = ((w `unsafeShiftR` 7) * 0x01010101) `unsafeShiftR` 24
sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x08080808) `unsafeShiftR` 24
indexOfMostSignificant1_w16 :: Word16 -> Word16
indexOfMostSignificant1_w16 w = idxMsnibble .|. idxMsbit
where !wtbs = w .&. 0x8888
!wbbs = w .&. 0x7777
!pc = parallelCompare 0x8888 wbbs
!ne = wtbs .|. pc
!summary = sketch ne `unsafeShiftR` 1
!cmpp2 = parallelCompare 0xFB98 (0x1111 * summary)
!idxMsnibble = sumTopBits4 cmpp2
!msnibble = ((w `unsafeShiftR` (fromIntegral idxMsnibble)) .&. 0xF) `unsafeShiftR` 1
!cmpp2' = parallelCompare 0xFB98 (0x1111 * msnibble)
!idxMsbit = sumTopBits cmpp2'
sketch w = (w * 0x249) `unsafeShiftR` 12
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8888
sumTopBits w = ((w `unsafeShiftR` 3) * 0x1111) `unsafeShiftR` 12
sumTopBits4 w = ((w `unsafeShiftR` 3) * 0x4444) `unsafeShiftR` 12
Programmers in typed languages with higher order functions and algebraic data types are already comfortable with most of the basic constructions of set/type theory. In categorical terms, those programmers are familiar with finite products and coproducts and (monoidal/cartesian) closed structure. The main omissions are subset types (equalizers/pullbacks) and quotient types (coequalizers/pushouts) which would round out limits and colimits. Not having a good grasp on either of these constructions dramatically shrinks the world of mathematics that is understandable, but while subset types are fairly straightforward, quotient types are quite a bit less intuitive.
In my opinion, most programmers can more or less immediately understand the notion of a subset type at an intuitive level.
A subset type is just a type combined with a predicate on that type that specifies which values of the type we want. For example, we may have something like { n:Nat | n /= 0 }
meaning the type of naturals not equal to #0#. We may use this in the type of the division function for the denominator. Consuming a value of a subset type is easy, a natural not equal to #0# is still just a natural, and we can treat it as such. The difficult part is producing a value of a subset type. To do this, we must, of course, produce a value of the underlying type — Nat
in our example — but then we must further convince the type checker that the predicate holds (e.g. that the value does not equal #0#). Most languages provide no mechanism to prove potentially arbitrary facts about code, and this is why they do not support subset types. Dependently typed languages do provide such mechanisms and thus either have or can encode subset types. Outside of dependently typed languages the typical solution is to use an abstract data type and use a runtime check when values of that abstract data type are created.
The dual of subset types are quotient types. My impression is that this construction is the most difficult basic construction for people to understand. Further, programmers aren’t much better off, because they have little to which to connect the idea. Before I give a definition, I want to provide the example with which most people are familiar: modular (or clock) arithmetic. A typical way this is first presented is as a system where the numbers “wrap-around”. For example, in arithmetic mod #3#, we count #0#, #1#, #2#, and then wrap back around to #0#. Programmers are well aware that it’s not necessary to guarantee that an input to addition, subtraction, or multiplication mod #3# is either #0#, #1#, or #2#. Instead, the operation can be done and the mod
function can be applied at the end. This will give the same result as applying the mod
function to each argument at the beginning. For example, #4+7 = 11# and #11 mod 3 = 2#, and #4 mod 3 = 1# and #7 mod 3 = 1# and #1+1 = 2 = 11 mod 3#.
For mathematicians, the type of integers mod #n# is represented by the quotient type #ZZ//n ZZ#. The idea is that the values of #ZZ // n ZZ# are integers except that we agree that any two integers #a# and #b# are treated as equal if #a - b = kn# for some integer #k#. For #ZZ // 3 ZZ#, #… -6 = -3 = 0 = 3 = 6 = …# and #… = -5 = -2 = 1 = 4 = 7 = …# and #… = -4 = -1 = 2 = 5 = 8 = …#.
To start to formalize this, we need the notion of an equivalence relation. An equivalence relation is a binary relation #(~~)#
which is reflexive (#x ~~ x# for all #x#), symmetric (if #x ~~ y#
then #y ~~ x#
), and transitive (if #x ~~ y#
and #y ~~ z#
then #x ~~ z#
). We can check that “#a ~~ b# iff there exists an integer #k# such that #a-b = kn#” defines an equivalence relation on the integers for any given #n#. For reflexivity we have #a - a = 0n#. For symmetry we have if #a - b = kn# then #b - a = -kn#. Finally, for transitivity we have if #a - b = k_1 n# and #b - c = k_2 n# then #a - c = (k_1 + k_2)n# which we get by adding the preceding two equations.
Any relation can be extended to an equivalence relation. This is called the reflexive-, symmetric-, transitive-closure of the relation. For an arbitrary binary relation #R# we can define the equivalence relation #(~~_R)# via “#a ~~_R b# iff #a = b# or #R(a, b)# or #b ~~_R a# or #a ~~_R c and c ~~_R b# for some #c#“. To be precise, #~~_R# is the smallest relation satisfying those constraints. In Datalog syntax, this looks like:
eq_r(A, A).
eq_r(A, B) :- r(A, B).
eq_r(A, B) :- eq_r(B, A).
eq_r(A, B) :- eq_r(A, C), eq_r(C, B).
If #T# is a type, and #(~~)#
is an equivalence relation, we use #T // ~~# as the notation for the quotient type, which we read as “#T# quotiented by the equivalence relation #(~~)#
”. We call #T# the underlying type of the quotient type. We then say #a = b# at type #T // ~~# iff #a ~~ b#. Dual to subset types, to produce a value of a quotient type is easy. Any value of the underlying type is a value of the quotient type. (In type theory, this produces the perhaps surprising result that #ZZ# is a subtype of #ZZ // n ZZ#.) As expected, consuming a value of a quotient type is more complicated. To explain this, we need to explain what a function #f : T // ~~ -> X# is for some type #X#. A function #f : T // ~~ -> X# is a function #g : T -> X# which satisfies #g(a) = g(b)# for all #a# and #b# such that #a ~~ b#. We call #f# (or #g#, they are often conflated) well-defined if #g# satisfies this condition. In other words, any well-defined function that consumes a quotient type isn’t allowed to produce an output that distinguishes between equivalent inputs. A better way to understand this is that quotient types allow us to change what the notion of equality is for a type. From this perspective, a function being well-defined just means that it is a function. Taking equal inputs to equal outputs is one of the defining characteristics of a function.
Sometimes we can finesse needing to check the side condition. Any function #h : T -> B# gives rise to an equivalence relation on #T# via #a ~~ b# iff #h(a) = h(b)#. In this case, any function #g : B -> X# gives rise to a function #f : T // ~~ -> X# via #f = g @ h#. In particular, when #B = T# we are guaranteed to have a suitable #g# for any function #f : T // ~~ -> X#. In this case, we can implement quotient types in a manner quite similar subset types, namely we make an abstract type and we normalize with the #h# function as we either produce or consume values of the abstract type. A common example of this is rational numbers. We can reduce a rational number to lowest terms either when it’s produced or when the numerator or denominator get accessed, so that we don’t accidentally write functions which distinguish between #1/2# and #2/4#. For modular arithmetic, the mod by #n# function is a suitable #h#.
In set theory such an #h# function can always be made by mapping the elements of #T# to the equivalence classes that contain them, i.e. #a# gets mapped to #{b | a ~~ b}# which is called the equivalence class of #a#. In fact, in set theory, #T // ~~# is usually defined to be the set of equivalence classes of #(~~)#
. So, for the example of #ZZ // 3 ZZ#, in set theory, it is a set of exactly three elements: the elements are #{ 3n+k | n in ZZ}# for #k = 0, 1, 2#. Equivalence classes are also called partitions and are said to partition the underlying set. Elements of these equivalence classes are called representatives of the equivalence class. Often a notation like #[a]# is used for the equivalence class of #a#.
Here is a quick run-through of some significant applications of quotient types. I’ll give the underlying type and the equivalence relation and what the quotient type produces. I’ll leave it as an exercise to verify that the equivalence relations really are equivalence relations, i.e. reflexive, symmetric, and transitive. I’ll start with more basic examples. You should work through them to be sure you understand how they work.
Integers can be presented as pairs of naturals #(n, m)# with the idea being that the pair represents “#n - m#”. Of course, #1 - 2# should be the same as #2 - 3#. This is expressed as #(n_1, m_1) ~~ (n_2, m_2)# iff #n_1 + m_2 = n_2 + m_1#. Note how this definition only relies on operations on natural numbers. You can explore how to define addition, subtraction, multiplication, and other operations on this representation in a well-defined manner.
Rationals can be presented very similarly to integers, only with multiplication instead of addition. We also have pairs #(n, d)#, usually written #n/d#, in this case of an integer #n# and a non-zero natural #d#. The equivalence relation is #(n_1, d_1) ~~ (n_2, d_2)# iff #n_1 d_2 = n_2 d_1#.
We can extend the integers mod #n# to the continuous case. Consider the real numbers with the equivalence relation #r ~~ s# iff #r - s = k# for some integer #k#. You could call this the reals mod #1#. Topologically, this is a circle. If you walk along it far enough, you end up back at a point equivalent to where you started. Occasionally this is written as #RR//ZZ#.
Doing the previous example in 2D gives a torus. Specifically, we have pairs of real numbers and the equivalence relation #(x_1, y_1) ~~ (x_2, y_2)# iff #x_1 - x_2 = k# and #y_1 - y_2 = l# for some integers #k# and #l#. Quite a bit of topology relies on similar constructions as will be expanded upon on the section on gluing.
Here’s an example that’s a bit closer to programming. Consider the following equivalence relation on arbitrary pairs: #(a_1, b_1) ~~ (a_2, b_2)# iff #a_1 = a_2 and b_1 = b_2# or #a_1 = b_2 and b_1 = a_2#. This just says that a pair is equivalent to either itself, or a swapped version of itself. It’s interesting to consider what a well-defined function is on this type.^{1}
Returning to topology and doing a bit more involved construction, we arrive at gluing or pushouts. In topology, we often want to take two topological spaces and glue them together in some specified way. For example, we may want to take two discs and glue their boundaries together. This gives a sphere. We can combine two spaces into one with the disjoint sum (or coproduct, i.e. Haskell’s Either
type.) This produces a space that contains both the input spaces, but they don’t interact in any way. You can visualize them as sitting next to each other but not touching. We now want to say that certain pairs of points, one from each of the spaces, are really the same point. That is, we want to quotient by an equivalence relation that would identify those points. We need some mechanism to specify which points we want to identify. One way to accomplish this is to have a pair of functions, #f : C -> A# and #g : C -> B#, where #A# and #B# are the space we want to glue together. We can then define a relation #R# on the disjoint sum via #R(a, b)# iff there’s a #c : C# such that #a = tt "inl"(f(c)) and b = tt "inr"(g(c))#
. This is not an equivalence relation, but we can extend it to one. The quotient we get is then the gluing of #A# and #B# specified by #C# (or really by #f# and #g#). For our example of two discs, #f# and #g# are the same function, namely the inclusion of the boundary of the disc into the disc. We can also glue a space to itself. Just drop the disjoint sum part. Indeed, the circle and torus are examples.
We write #RR[X]# for the type of polynomials with one indeterminate #X# with real coefficients. For two indeterminates, we write #RR[X, Y]#. Values of these types are just polynomials such as #X^2 + 1# or #X^2 + Y^2#. We can consider quotienting these types by equivalence relations generated from identifications like #X^2 + 1 ~~ 0# or #X^2 - Y ~~ 0#, but we want more than just the reflexive-, symmetric-, transitive-closure. We want this equivalence relation to also respect the operations we have on polynomials, in particular, addition and multiplication. More precisely, we want if #a ~~ b# and #c ~~ d# then #ac ~~ bd# and similarly for addition. An equivalence relation that respects all operations is called a congruence. The standard notation for the quotient of #RR[X, Y]# by a congruence generated by both of the previous identifications is #RR[X, Y]//(X^2 + 1, X^2 - Y)#. Now if #X^2 + 1 = 0# in #RR[X, Y]//(X^2 + 1, X^2 - Y)#, then for any polynomial #P(X, Y)#, we have #P(X, Y)(X^2 + 1) = 0# because #0# times anything is #0#. Similarly, for any polynomial #Q(X, Y)#, #Q(X, Y)(X^2 - Y) = 0#. Of course, #0 + 0 = 0#, so it must be the case that #P(X, Y)(X^2 + 1) + Q(X, Y)(X^2 - Y) = 0# for all polynomials #P# and #Q#. In fact, we can show that all elements in the equivalence class of #0# are of this form. You’ve now motivated the concrete definition of a ring ideal and given it’s significance. An ideal is an equivalence class of #0# with respect to some congruence. Let’s work out what #RR[X, Y]//(X^2 + 1, X^2 - Y)# looks like concretely. First, since #X^2 - Y = 0#, we have #Y = X^2# and so we see that values of #RR[X, Y]//(X^2 + 1, X^2 - Y)# will be polynomials in only one indeterminate because we can replace all #Y#s with #X^2#s. Since #X^2 = -1#, we can see that all those polynomials will be linear (i.e. of degree 1) because we can just keep replacing #X^2#s with #-1#s, i.e. #X^(n+2) = X^n X^2 = -X^n#. The end result is that an arbitrary polynomial in #RR[X, Y]//(X^2 + 1, X^2 - Y)# looks like #a + bX# for real numbers #a# and #b# and we have #X^2 = -1#. In other words, #RR[X, Y]//(X^2 + 1, X^2 - Y)# is isomorphic to the complex numbers, #CC#.
As a reasonably simple exercise, given a polynomial #P(X) : RR[X]#, what does it get mapped to when embedded into #RR[X]//(X - 3)#, i.e. what is #[P(X)] : RR[X]//(X - 3)#?^{2}
Moving much closer to programming, we have a rather broad and important example that a mathematician might describe as free algebras modulo an equational theory. This example covers several of the preceding examples. In programmer-speak, a free algebra is just a type of abstract syntax trees for some language. We’ll call a specific absract syntax tree a term. An equational theory is just a collection of pairs of terms with the idea being that we’d like these terms to be considered equal. To be a bit more precise, we will actually allow terms to contain (meta)variables. An example equation for an expression language might be Add(
#x#,
#x#) = Mul(2,
#x#)
. We call a term with no variables a ground term. We say a ground term matches another term if there is a consistent substitution for the variables that makes the latter term syntactically equal to the ground term. E.g. Add(3, 3)
matches Add(
#x#,
#x#)
via the substitution #x |->#3
. Now, the equations of our equational theory gives rise to a relation on ground terms #R(t_1, t_2)# iff there exists an equation #l = r# such that #t_1# matches #l# and #t_2# matches #r#. This relation can be extended to an equivalence relation on ground terms, and we can then quotient by that equivalence relation.
Let’s consider a worked example. We can consider the theory of monoids. We have two operations (types of AST nodes): Mul(
#x#,
#y#)
and 1
. We have the following three equations: Mul(1,
#x#) =
#x#, Mul(
#x#, 1) =
#x#, and Mul(Mul(
#x#,
#y#),
#z#) = Mul(
#x#, Mul(
#y#,
#z#))
. We additionally have a bunch of constants subject to no equations. In this case, it turns out we can define a normalization function, what I called #h# far above, and that the quotient type is isomorphic to lists of constants. Now, we can extend this theory to the theory of groups by adding a new operation, Inv(
#x#)
, and new equations: Inv(Inv(
#x#)) =
#x#, Inv(Mul(
#x#,
#y#)) = Mul(Inv(
#y#), Inv(
#x#))
, and Mul(Inv(
#x#),
#x#) = 1
. If we ignore the last of these equations, you can show that we can normalize to a form that is isomorphic to a list of a disjoint sum of the constants, i.e. [Either Const Const]
in Haskell if Const
were the type of the constant terms. Quotienting this type by the equivalence relation extended with that final equality, corresponds to adding the rule that a Left c
cancels out Right c
in the list whenever they are adjacent.
This overall example is a fairly profound one. Almost all of abstract algebra can be viewed as an instance of this or a closely related variation. When you hear about things defined in terms of “generators and relators”, it is an example of this sort. Indeed, those “relators” are used to define a relation that will be extended to an equivalence relation. Being defined in this way is arguably what it means for something to be “algebraic”.
The Introduction to Type Theory section of the NuPRL book provides a more comprehensive and somewhat more formal presentation of these and related concepts. While the quotient type view of quotients is conceptually different from the standard set theoretic presentation, it is much more amenable to computation as the #ZZ // n ZZ# example begins to illustrate.
]]>I don’t believe classical logic is false; I just believe that it is not true.
]]>Knockout is a nice JavaScript library for making values that automatically update when any of their “dependencies” update. Those dependencies can form an arbitrary directed acyclic graph. Many people seem to think of it as “yet another” templating library, but the core idea which is useful far beyond “templating” is the notion of observable values. One nice aspect is that it is a library and not a framework so you can use it as little or as much as you want and you can integrate it with other libraries and frameworks.
At any rate, this article is more geared toward those who have already decided on using Knockout or a library (in any language) offering similar capabilities. I strongly suspect the issues and solutions I’ll discuss apply to all similar sorts of libraries. While I’ll focus on one particular example, the ideas behind it apply generally. This example, admittedly, is one that almost anyone will implement, and in my experience will do it incorrectly the first time and won’t realize the problem until later.
When doing any front-end work, before long there will be a requirement to support “multi-select” of something. Of course, you want the standard select/deselect all functionality and for it to work correctly, and of course you want to do something with the items you’ve selected. Here’s a very simple example:
Item | |
---|---|
Here, the number selected is an overly simple example of using the selected items. More realistically, the selected items will trigger other items to show up and/or trigger AJAX requests to update the data or populate other data. The HTML for this example is completely straightforward.
<div id="#badExample">
Number selected: <span data-bind="text: $data.numberSelected()"></span>
<table>
<tr><th><input type="checkbox" data-bind="checked: $data.allSelected"/></th><th>Item</th></tr>
<!-- ko foreach: { data: $data.items(), as: '$item' } -->
<tr><td><input type="checkbox" data-bind="checked: $data.selected"/></td><td data-bind="text: 'Item number: '+$data.body"></td></tr>
<!-- /ko -->
<tr><td><button data-bind="click: function() { $data.add(); }">Add</button></td></tr>
</table>
</div>
The way nearly everyone (including me) first thinks to implement this is by adding a selected
observable to each item and then having allSelected
depend on all of the selected
observables. Since we also want to write to allSelected
to change the state of the selected
observables we use a writable computed observable. This computed observable will loop through all the items and check to see if they are all set to determine it’s state. When it is updated, it will loop through all the selected
observables and set them to the appropriate state. Here’s the full code listing.
var badViewModel = {
counter: 0,
items: ko.observableArray()
};
badViewModel.allSelected = ko.computed({
read: function() {
var items = badViewModel.items();
var allSelected = true;
for(var i = 0; i < items.length; i++) { // Need to make sure we depend on each item, so don't break out of loop early
allSelected = allSelected && items[i].selected();
}
return allSelected;
},
write: function(newValue) {
var items = badViewModel.items();
for(var i = 0; i < items.length; i++) {
items[i].selected(newValue);
}
}
});
badViewModel.numberSelected = ko.computed(function() {
var count = 0;
var items = badViewModel.items();
for(var i = 0; i < items.length; i++) {
if(items[i].selected()) count++;
}
return count;
});
badViewModel.add = function() {
badViewModel.items.push({
body: badViewModel.counter++,
selected: ko.observable(false)
});
};
ko.applyBindings(badViewModel, document.getElementById('#badExample'));
This should be relatively straightforward, and it works, so what’s the problem? The problem can be seen in numberSelected
(and it also comes up with allSelected
which I’ll get to momentarily). numberSelected
depends on each selected
observable and so it will be fired each time each one updates. That means if you have 100 items, and you use the select all checkbox, numberSelected
will be called 100 times. For this example, that doesn’t really matter. For a more realistic example than numberSelected
, this may mean rendering one, then two, then three, … then 100 HTML fragments or making 100 AJAX requests. In fact, this same behavior is present in allSelected
. When it is written, as it’s writing to the selected
observables, it is also triggering itself.
So the problem is updating allSelected
or numberSelected
can’t be done all at once, or to use database terminology, it can’t be updated atomically. One possible solution in newer versions of Knockout is to use deferredUpdates
or, what I did back in the much earlier versions of Knockout, abuse the rate limiting features. The problem with this solution is that it makes updates asynchronous. If you’ve written your code to not care whether it was called synchronously or asynchronously, then this will work fine. If you haven’t, doing this throws you into a world of shared state concurrency and race conditions. In this case, this solution is far worse than the disease.
So, what’s the alternative? We want to update all selected items atomically; we can atomically update a single observable; so we’ll put all selected items into a single observable. Now an item determines if it is selected by checking whether it is in the collection of selected items. More abstractly, we make our observables more coarse-grained, and we have a bunch of small computed observables depend on a large observable instead of a large computed observable depending on a bunch of small observables as we had in the previous code. Here’s an example using the exact same HTML and presenting the same overt behavior.
Item | |
---|---|
And here’s the code behind this second example:
var goodViewModel = {
counter: 0,
selectedItems: ko.observableArray(),
items: ko.observableArray()
};
goodViewModel.allSelected = ko.computed({
read: function() {
return goodViewModel.items().length === goodViewModel.selectedItems().length;
},
write: function(newValue) {
if(newValue) {
goodViewModel.selectedItems(goodViewModel.items().slice(0)); // Need a copy!
} else {
goodViewModel.selectedItems.removeAll();
}
}
});
goodViewModel.numberSelected = ko.computed(function() {
return goodViewModel.selectedItems().length;
});
goodViewModel.add = function() {
var item = { body: goodViewModel.counter++ }
item.selected = ko.computed({
read: function() {
return goodViewModel.selectedItems.indexOf(item) > -1;
},
write: function(newValue) {
if(newValue) {
goodViewModel.selectedItems.push(item);
} else {
goodViewModel.selectedItems.remove(item);
}
}
});
goodViewModel.items.push(item);
};
ko.applyBindings(goodViewModel, document.getElementById('#goodExample'));
One thing to note is that setting allSelected
and numberSelected
are now both simple operations. A write to an observable on triggers a constant number of writes to other observables. In fact, there are only two (non-computed) observables. On the other hand, reading the selected
observable is more expensive. Toggling all items has quadratic complexity. In fact, it had quadratic complexity before due to the feedback. However, unlike the previous code, this also has quadratic complexity when any individual item is toggled. Unlike the previous code, though, this is simply due to a poor choice of data structure. Equipping each item with an “ID” field and using an object as a hash map would reduce the complexity to linear. In practice, for this sort of scenario, it tends not to make a big difference. Also, Knockout won’t trigger dependents if the value doesn’t change, so there’s no risk of the extra work propagating into still more extra work. Nevertheless, while I endorse this solution for this particular problem, in general making finer grained observables can help limit the scope of changes so unnecessary work isn’t done.
Still, the real concern and benefit of this latter approach isn’t the asymptotic complexity of the operations, but the atomicity of the operations. In the second solution, every update is atomic. There are no intermediate states on the way to a final state. This means that dependents, represented by numberSelected
but which are realistically much more complicated, don’t get triggered excessively and don’t need to “compensate” for unintended intermediate values.
We could take the coarse-graining to its logical conclusion and have the view model for an application be a single observable holding an object representing the entire view model (and containing no observables of its own). Taking this approach actually does have a lot of benefits, albeit there is little reason to use Knockout at that point. Instead this starts to lead to things like Facebook’s Flux pattern and the pattern perhaps most clearly articulated by Cycle JS.
]]>For many people interested in type systems and type theory, their first encounter with the literature presents them with this:
#frac(Gamma,x:tau_1 |--_Sigma e : tau_2)(Gamma |--_Sigma (lambda x:tau_1.e) : tau_1 -> tau_2) ->I#
#frac(Gamma |--_Sigma f : tau_1 -> tau_2 \qquad Gamma |--_Sigma x : tau_1)(Gamma |--_Sigma f x : tau_2) ->E#
Since this notation is ubiquitous, authors (reasonably) expect readers to already be familiar with it and thus provide no explanation. Because the notation is ubiquitous, the beginner looking for alternate resources will not escape it. All they will find is that the notation is everywhere but exists in myriad minor variations which may or may not indicate significant differences. At this point the options are: 1) to muddle on and hope understanding the notation isn’t too important, 2) look for introductory resources which typically take the form of $50+ 500+ page textbooks, or 3) give up.
The goal of this article is to explain the notation part-by-part in common realizations, and to cover the main idea behind the notation which is the idea of an inductively defined relation. To eliminate ambiguity and make hand-waving impossible, I’ll ground the explanations in code, in particular, in Agda. That means for each example of the informal notation, there will be how it would be realized in Agda.^{1} It will become clear that I’m am not (just) using Agda as a formal notation to talk about these concepts, but that Agda’s^{2} data type mechanism directly captures them^{3}. The significance of this is that programmers are already familiar with many of the ideas behind the informal notation, and the notation is just obscuring this familiarity. Admittedly, Agda is itself pretty intimidating. I hope most of this article is accessible to those with familiarity with algebraic data types as they appear in Haskell, ML, Rust, or Swift with little to no need to look up details about Agda. Nevertheless, Agda at least has the benefit, when compared to the informal notation, of having a clear place to go to learn more, an unambiguous meaning, and tools that allow playing around with the ideas.
To start, if you are not already familiar with it, get familiar with the Greek alphabet. It will be far easier to (mentally) read mathematical notation of any kind if you can say “Gamma x” rather than “right angle thingy x” or “upside-down L x”.
Using the example from the introduction, the whole thing is a rule. The “|->I|” part is just the name of the rule (in this case being short for “|->| Introduction”). This rule is only part of the definition of the judgment of the form:
#Gamma |--_Sigma e : tau#
The judgment can be viewed as a proposition and the rule is an “if-then” statement read from top to bottom. So the “|->I|” rule says, “if #Gamma, x : tau_1 |--_Sigma e : tau_2#
then #Gamma |--_Sigma lambda x : tau_1.e : tau_1 -> tau_2#
”. It is often profitable to read it bottom-up as “To prove #Gamma |--_Sigma lambda x : tau_1.e : tau_1 -> tau_2#
you need to show #Gamma, x : tau_1 |--_Sigma e : tau_2#
”.
So what is the judgment saying? First, the judgment is, in this case, a four argument relation. The arguments of this relation are #Gamma#, #Sigma#, #e#, and #tau#. We could say the name of this relation is the perspicuous #(_)|--_((_)) (_) : (_)#
. Note that it does not make sense to ask what “⊢” means or what “:” means anymore than it makes sense to ask what “->” means in Haskell’s \ x -> e
.^{4}
In the context of type systems, #Gamma# is called the context, #Sigma# is called the signature, #e# is the term or expression, and #tau# is the type. Given this, I would read #Gamma |--_Sigma e : tau#
as “the expression e has type tau in context gamma given signature sigma.” For the “#->E#” rule we have, additionally, multiple judgements above the line. These are joined together by conjunction, that is, we’d read “#->E#” as “if #Gamma |--_Sigma f : tau_1 -> tau_2#
and #Gamma |--_Sigma x : tau_1#
then #Gamma |--_Sigma f x : tau_2#
In most recent type system research multiple judgments are necessary to describe the type system, and so you may see things like #Gamma |-- e > tau#
or #Gamma |-- e_1 "~" e_2#
. The key thing to remember is that these are completely distinct relations that will have their own definitions (although the collection of them will often be mutually recursively defined).
Relations in set theory are boolean valued functions. Being programmers, and thus constructivists, we want evidence, so a relation |R : A xx B -> bb2| becomes a type constructor R : (A , B) -> Set
. |R(a,b)| holds if we have a value (proof/witness) w : R a b
. An inductively defined relation or judgment is then just a type constructor for an (inductive) data type. That means, if R
is an inductively defined relation, then its definition is data R : A -> B -> Set where ...
. A rule is a constructor of this data type. A derivation is a value of this data type, and will usually be a tree-like structure. As a bit of ambiguity in the terminology (arguably arising from a common ambiguity in mathematical notation), it’s a bit more natural to use the term “judgment” to refer to something that can be (at the meta level) true or false. For example, we’d say |R(a,b)| is a judgment. Nevertheless, when we say something like “the typing judgment” it’s clear that we’re referring to the whole relation, i.e. |R|.
Since a judgment is a relation, we need to describe what the arguments to the relation look like. Typically something like BNF is used. The BNF definitions provide the types used as parameters to the judgments. It is common to use a Fortran-esque style where a naming convention is used to avoid the need to explicitly declare the types of meta-variables. For example, the following says meta-variables like #n#, #m#, and #n_1# are all natural numbers.
n, m ::= Z | S n
BNF definitions translate readily to algebraic data types.
data Nat : Set where
Z : Nat
S : Nat -> Nat
Agda note:
Set
is what is called*
in Haskell. “Type” would be a better name. Also, these sidebars will cover details about Agda with the aim that readers unfamiliar with Agda don’t get tripped up by tangential details.
Sometimes it’s not possible to fully capture the constraints on well-formed syntax with BNF. In other words, only a subset of syntactically valid terms are well-formed. For example, Nat Nat
is syntactically valid but is not well-formed. We can pick out that subset with a predicate, i.e. a unary relation. This is, of course, nothing but another judgment. As an example, if we wired the Maybe
type into our type system, we’d likely have a judgment that looks like #tau\ tt"type"#
which would include the following rule:
#frac(tau\ tt"type")(("Maybe"\ tau)\ tt"type")#
In a scenario like this, we’d also have to make sure the rules of our typing judgment also required the types involved to be well-formed. Modifying the example from the introduction, we’d get:
#frac(Gamma,x:tau_1 |--_Sigma e : tau_2 \qquad tau_1\ tt"type" \qquad tau_2\ tt"type")(Gamma |--_Sigma (lambda x:tau_1.e) : tau_1 -> tau_2) ->I#
As a very simple example, let’s say we wanted to provide explicit evidence that one natural number was less than or equal to another in Agda. Scenarios like this are common in dependently typed programming, and so we’ll start with the Agda this time and then “informalize” it.
data _isLessThanOrEqualTo_ : Nat -> Nat -> Set where
Z<=n : {n : Nat} -> Z isLessThanOrEqualTo n
Sm<=Sn : {m : Nat} -> {n : Nat} -> m isLessThanOrEqualTo n -> (S m) isLessThanOrEqualTo (S n)
Agda notes: In Agda identifiers can contain almost any character so
Z<=n
is just an identifier. Agda allows any identifier to be used infix (or more generally mixfix). The underscores mark where the arguments go. So_isLessThanOrEqualTo_
is a binary infix operator. Finally, curly brackets indicate implicit arguments which can be omitted and Agda will “guess” their values. Usually, they’ll be obvious to Agda by unification.
In the informal notation, the types of the arguments are implied by the naming. n
is a natural number because it was used as the metavariable (non-terminal) in the BNF for naturals. We also implicitly quantify over all free variables. In the Agda code, this quantification was explicit.
#frac()(Z <= n) tt"Z<=n"#
#frac(m <= n)(S m <= S n) tt"Sm<=Sn"#
Again, I want to emphasize that these are defining isLessThanOrEqualTo
and |<=|. They can’t be wrong. They can only fail to coincide with our intuitions or to an alternate definition. A derivation that |2 <= 3| looks like:
In Agda:
twoIsLessThanThree : (S (S Z)) isLessThanOrEqualTo (S (S (S Z)))
twoIsLessThanThree = Sm<=Sn (Sm<=Sn Z<=n)
In the informal notation:
#frac(frac()(Z <= S Z))(frac(S Z <= S (S Z))(S (S Z) <= S (S (S Z)))#
Here’s a larger example that also illustrates that these judgments do not need to be typing judgments. Here we’re defining a big-step operational semantics for the untyped lambda calculus.
x variable
v ::= λx.e
e ::= v | e e | x
In informal presentations, binders like #lambda# are handled in a fairly relaxed manner. While the details of handling binders are tricky and error-prone, they are usually standard and so authors assume readers can fill in those details and are aware of the concerns (e.g. variable capture). In Agda, of course, we’ll need to spell out the details. There are many approaches for dealing with binders with different trade-offs. One of the newer and more convenient approaches is parametric higher-order abstract syntax (PHOAS). Higher-order abstract syntax (HOAS) approaches allow us to reuse the binding structure of the host language and thus eliminate much of the work. Below, this is realized by the Lambda
constructor taking a function as its argument. In a later section, I’ll use a different approach using deBruijn indices.
-- PHOAS approach to binding
mutual
data Expr (A : Set) : Set where
Val : Value A -> Expr A
App : Expr A -> Expr A -> Expr A
Var : A -> Expr A
data Value (A : Set) : Set where
Lambda : (A -> Expr A) -> Value A
-- A closed expression
CExpr : Set1
CExpr = {A : Set} -> Expr A
-- A closed expression that is a value
CValue : Set1
CValue = {A : Set} -> Value A
Agda note:
Set1
is needed for technical reasons that are unimportant. You can just pretend it saysSet
instead. More important is that the definitions ofExpr
andValue
are a bit different than the definition for_isLessThanOrEqualTo_
. In particular, the argument(A : Set)
occurs to the left of the colon. When an argument occurs to the left of the colon we say it parameterizes the data declaration and that it is a parameter. When it occurs to the right of the colon we say it indexes the data declaration and that it is an index. The difference is that parameters must occur uniformly in the return type of the data constructors while indexes can be different in each data constructor. The arguments of an inductively defined relation like_isLessThanOrEqualTo_
will always be indexes (though there could be additional parameters.)
#frac(e_1 darr lambda x.e \qquad e_2 darr v_2 \qquad e[x|->v_2] darr v)(e_1 e_2 darr v) tt"App"#
#frac()(v darr v) tt"Trivial"#
The #e darr v# judgment (read as “the expression #e# evaluates to the value #v#”) defines a call-by-value evaluation relation. #e[x|->v]# means “substitute #v# for #x# in the expression #e#”. This notation is not standardized; there are many variants. In more rigorous presentations this operation will be formally defined, but usually the authors assume you are familiar with it. In the #tt"Trivial"#
rule, the inclusion of values into expressions is implicitly used. Note that the rule is restricted to values only.
The #tt"App"#
rule specifies call-by-value because the #e_2# expression is evaluated and then the resulting value is substituted into #e#. For call-by-name, we’d omit the evaluation of #e_2# and directly substitute #e_2# for #x# in #e#. Whether #e_1# or #e_2# is evaluated first (or in parallel) is not specified in this example.
subst : {A : Set} -> Expr (Expr A) -> Expr A
subst (Var e) = e
subst (Val (Lambda b)) = Val (Lambda (λ a -> subst (b (Var a))))
subst (App e1 e2) = App (subst e1) (subst e2)
data _EvaluatesTo_ : CExpr -> CValue -> Set1 where
EvaluateTrivial : {v : CValue} -> (Val v) EvaluatesTo v
EvaluateApp : {e1 : CExpr} -> {e2 : CExpr}
-> {e : {A : Set} -> A -> Expr A}
-> {v2 : CValue} -> {v : CValue}
-> e1 EvaluatesTo (Lambda e)
-> e2 EvaluatesTo v2
-> (subst (e (Val v2))) EvaluatesTo v
-> (App e1 e2) EvaluatesTo v
The EvaluateTrivial
constructor explicitly uses the Val
injection of values into expressions. The EvaluateApp
constructor starts off with a series of implicit arguments that introduce and quantify over the variables used in the rule. After those, each judgement above the line in the #tt"App"#
rule, becomes an argument to the EvaluateApp
constructor.
In this case ↓ is defining a functional relation, meaning for every expression there’s at most one value that the expression evaluates to. So another natural way to interpret ↓ is as a definition, in logic programming style, of a (partial) recursive function. In other words we can use the concept of mode from logic programming and instead of treating the arguments to ↓ as inputs, we can treat the first as an input and the second as an output.
↓ gives rise to a partial function because not every expression has a normal form. For _EvaluatesTo_
this is realized by the fact that we simply won’t be able to construct a term of type e EvaluatesTo v
for any v
if e
doesn’t have a normal form. In fact, we can use the inductive structure of the relationship to help prove that statement. (Unfortunately, Agda doesn’t present a very good experience for data types indexed by functions, so the proof is not nearly as smooth as one would like.)
Next we’ll turn to type systems which will present an even larger example, and will introduce some concepts that are specific to type systems (though, of course, they overlap greatly with concepts in logic due to the Curry-Howard correspondence.)
Below is an informal presentation of the polymorphic lambda calculus with explicit type abstraction and type application. An interesting fact about the polymorphic lambda calculus is that we don’t need any base types. Via Church-encoding, we can define types like natural numbers and lists.
α type variable
τ ::= τ → τ | ∀α. τ | α
x variable
c constant
v ::= λx:τ.e | Λτ.e | c
e ::= v | e e | e[τ] | x
In this case I’ll be using deBruijn indices to handle the binding structure of the terms and types. This means instead of writing |lambda x.lambda y. x|
, you would write |lambda lambda 1|
where the |1| counts how many binders (lambdas) you need to traverse to reach the binding site.
data TType : Set where
TTVar : Nat -> TType -- α
_=>_ : TType -> TType -> TType -- τ → τ
Forall : TType -> TType -- ∀α. τ
mutual
data TExpr : Set where
TVal : TValue -> TExpr -- v
TApp : TExpr -> TExpr -> TExpr -- f x
TTyApp : TExpr -> TType -> TExpr -- e[τ]
TVar : Nat -> TExpr -- x
data TValue : Set where
TLambda : TType -> TExpr -> TValue -- λx:τ.e
TTyLambda : TExpr -> TValue -- Λτ.e
TConst : Nat -> TValue -- c
In formulating the typing rules we need to deal with open terms, that is terms which refer to variables that they don’t bind. This should only happen if some enclosing terms did bind those variables, so we need to keep track of the variables that have been bound by enclosing terms. For example, when type checking |lambda x:tau.x|
, we’ll need to type check the subterm |x| which does not contain enough information in itself for us to know what the type should be. So, we keep track of what variables have been bound (and to what type) in a context and then we can just look up the expected type. When authors bother formally spelling out the context, it will look something like the following:
Γ ::= . | Γ, x:τ
Δ ::= . | Δ, α
We see that this is just a (snoc) list. In the first case, |Gamma|, it is a list of pairs of variables and types, i.e. an association list mapping variables to types. Often it will be treated as a finite mapping. In the second case, |Delta|, it is a list of type variables. Since I’m using deBruijn notation, there are no variables so we end up with a list of types in the first case. In the second case, we would end up with a list of nothing in particular, i.e. a list of unit, but that is isomorphic to a natural number. In other words, the only purpose of the type context, |Delta|, is to make sure we don’t use unbound variables, which in deBruijn notation just means we don’t have deBruijn indexes that try to traverse more lambdas than enclose them. The Agda code for the above is completely straight-forward.
data List (A : Set) : Set where
Nil : List A
_,_ : List A -> A -> List A
Context : Set
Context = List TType
TypeContext : Set
TypeContext = Nat
Signatures keep track of what primitive, “user-defined” constants might exist. Often the signature is omitted since nothing particularly interesting happens with it. Indeed, that will be the case for us. Nevertheless, we see that the signature is just another association list mapping constants to types.
Σ ::= . | Σ, c:τ
Signature : Set
Signature = List TType
The main reason I included the signature, beyond just covering it for the cases when it is included, is that sometimes certain rules can be better understood as manipulations of the signature. For example, in logic, universal quantification is often described by a rule like:
#frac(Gamma |-- P[x|->c] \qquad c\ "fresh")(Gamma |-- forall x.P)#
What’s happening and what “freshness” is is made a bit clearer by employing a signature (which for logic is usually just a list of constants similar to our TypeContext
):
#frac(Gamma |--_(Sigma, c) P[x|->c] \qquad c notin Sigma)(Gamma |--_Sigma forall x.P)#
To define the typing rules we need two judgements. The first, #Delta |-- tau#
, will be a simple judgement that says |tau| is a well formed type in |Delta|. This basically just requires that all variables are bound.
#frac(alpha in Delta)(Delta |-- alpha)#
#frac(Delta, alpha |-- tau)(Delta |-- forall alpha. tau)#
#frac(Delta |-- tau_1 \qquad Delta |-- tau_2)(Delta |-- tau_1 -> tau_2)#
The Agda is
data _<_ : Nat -> Nat -> Set where
Z<Sn : {n : Nat} -> Z < S n
Sn<SSm : {n m : Nat} -> n < S m -> S n < S (S m)
data _isValidIn_ : TType -> TypeContext -> Set where
TyVarJ : {n : Nat} -> {ctx : TypeContext} -> n < ctx -> (TTVar n) isValidIn ctx
TyArrJ : {t1 t2 : TType} -> {ctx : TypeContext} -> t1 isValidIn ctx -> t2 isValidIn ctx -> (t1 => t2) isValidIn ctx
TyForallJ : {t : TType} -> {ctx : TypeContext} -> t isValidIn (S ctx) -> (Forall t) isValidIn ctx
The meat is the following typing judgement, depending on the judgement defining well-formed types. I’m not really going to explain these rules because, in some sense, there is nothing to explain. Beyond explaining the notation itself, which was the point of the article, the below is “self-explanatory” in the sense that it is a definition, and whether it is a good definition or “meaningful” depends on whether we can prove the theorems we want about it.
#frac(c:tau in Sigma \qquad Delta |-- tau)(Delta;Gamma |--_Sigma c : tau) tt"Const"#
#frac(x:tau in Gamma \qquad Delta |-- tau)(Delta;Gamma |--_Sigma x : tau) tt"Var"#
#frac(Delta;Gamma |--_Sigma e_1 : tau_1 -> tau_2 \qquad Delta;Gamma |--_Sigma e_2 : tau_1)(Delta;Gamma |--_Sigma e_1 e_2 : tau_2) tt"App"#
#frac(Delta;Gamma |--_Sigma e : forall alpha. tau_1 \qquad Delta |-- tau_2)(Delta;Gamma |--_Sigma e[tau_2] : tau_1[alpha|->tau_2]) tt"TyApp"#
#frac(Delta;Gamma, x:tau_1 |--_Sigma e : tau_2 \qquad Delta |-- tau_1)(Delta;Gamma |--_Sigma (lambda x:tau_1.e) : tau_1 -> tau_2) tt"Abs"#
#frac(Delta, alpha;Gamma |--_Sigma e : tau)(Delta;Gamma |--_Sigma (Lambda alpha.e) : forall alpha. tau) tt"TyAbs"#
Here’s the corresponding Agda code. Note, all Agda is doing for us here is making sure we haven’t written self-contradictory nonsense. In no way is Agda ensuring that this is the “right” definition. For example, it could be the case (but isn’t) that there are no values of this type. Agda would be perfectly content to let us define a type that had no values.
tySubst : TType -> TType -> TType
tySubst t1 t2 = tySubst' t1 t2 Z
where tySubst' : TType -> TType -> Nat -> TType
tySubst' (TTVar Z) t2 Z = t2
tySubst' (TTVar Z) t2 (S _) = TTVar Z
tySubst' (TTVar (S n)) t2 Z = TTVar (S n)
tySubst' (TTVar (S n)) t2 (S d) = tySubst' (TTVar n) t2 d
tySubst' (t1 => t2) t3 d = tySubst' t1 t3 d => tySubst' t2 t3 d
tySubst' (Forall t1) t2 d = tySubst' t1 t2 (S d)
data _isIn_at_ {A : Set} : A -> List A -> Nat -> Set where
Found : {a : A} -> {l : List A} -> a isIn (l , a) at Z
Next : {a : A} -> {b : A} -> {l : List A} -> {n : Nat} -> a isIn l at n -> a isIn (l , b) at (S n)
data _hasType_inContext_and_given_ : TExpr -> TType -> Context -> TypeContext -> Signature -> Set where
ConstJ : {t : TType} -> {c : Nat}
-> {Sigma : Signature} -> {Gamma : Context} -> {Delta : TypeContext}
-> t isIn Sigma at c
-> t isValidIn Delta
-> (TVal (TConst c)) hasType t inContext Gamma and Delta given Sigma
VarJ : {t : TType} -> {x : Nat}
-> {Sigma : Signature} -> {Gamma : Context} -> {Delta : TypeContext}
-> t isIn Gamma at x
-> t isValidIn Delta
-> (TVar x) hasType t inContext Gamma and Delta given Sigma
AppJ : {t1 : TType} -> {t2 : TType} -> {e1 : TExpr} -> {e2 : TExpr}
-> {Sigma : Signature} -> {Gamma : Context} -> {Delta : TypeContext}
-> e1 hasType (t1 => t2) inContext Gamma and Delta given Sigma
-> e2 hasType t1 inContext Gamma and Delta given Sigma
-> (TApp e1 e2) hasType t2 inContext Gamma and Delta given Sigma
TyAppJ : {t1 : TType} -> {t2 : TType} -> {e : TExpr}
-> {Sigma : Signature} -> {Gamma : Context} -> {Delta : TypeContext}
-> e hasType (Forall t1) inContext Gamma and Delta given Sigma
-> t2 isValidIn Delta
-> (TTyApp e t2) hasType (tySubst t1 t2) inContext Gamma and Delta given Sigma
AbsJ : {t1 : TType} -> {t2 : TType} -> {e : TExpr}
-> {Sigma : Signature} -> {Gamma : Context} -> {Delta : TypeContext}
-> e hasType t2 inContext (Gamma , t1) and Delta given Sigma
-> (TVal (TLambda t1 e)) hasType (t1 => t2) inContext Gamma and Delta given Sigma
TyAbsJ : {t : TType} -> {e : TExpr}
-> {Sigma : Signature} -> {Gamma : Context} -> {Delta : TypeContext}
-> e hasType t inContext Gamma and (S Delta) given Sigma
-> (TVal (TTyLambda e)) hasType (Forall t) inContext Gamma and Delta given Sigma
Here’s a typing derivation for the polymorphic constant function:
tyLam : TExpr -> TExpr
tyLam e = TVal (TTyLambda e)
lam : TType -> TExpr -> TExpr
lam t e = TVal (TLambda t e)
polyConst
: tyLam (tyLam (lam (TTVar Z) (lam (TTVar (S Z)) (TVar (S Z))))) -- Λs.Λt.λx:t.λy:s.x
hasType (Forall (Forall (TTVar Z => (TTVar (S Z) => TTVar Z)))) -- ∀s.∀t.t→s→t
inContext Nil and Z
given Nil
polyConst = TyAbsJ (TyAbsJ (AbsJ (AbsJ (VarJ (Next Found) (TyVarJ Z<Sn))))) -- written by Agda
data False : Set where
Not : Set -> Set
Not A = A -> False
wrongType
: Not (tyLam (lam (TTVar Z) (TVar Z)) -- Λt.λx:t.x
hasType (Forall (TTVar Z)) -- ∀t.t
inContext Nil and Z
given Nil)
wrongType (TyAbsJ ())
Having written all this, we have not defined a type checking algorithm (though Agda’s auto
tactic does a pretty good job); we’ve merely specified what evidence that a program is well-typed is. Explicitly, a type checking algorithm would be a function with the following type:
data Maybe (A : Set) : Set where
Nothing : Maybe A
Just : A -> Maybe A
typeCheck : (e : TExpr) -> (t : TType) -> (sig : Signature) -> Maybe (e hasType t inContext Nil and Z given sig)
typeCheck = ?
In fact, we’d want to additionally prove that this function never returns Nothing
if there does exist a typing derivation that would give e
the type t
in signature sig
. We could formalize this in Agda by instead giving typeCheck
the following type:
data Decidable (A : Set) : Set where
IsTrue : A -> Decidable A
IsFalse : Not A -> Decidable A
typeCheckDec : (e : TExpr) -> (t : TType) -> (sig : Signature) -> Decidable (e hasType t inContext Nil and Z given sig)
typeCheckDec = ?
This type says that either typeCheckDec
will return a typing derivation, or it will return a proof that there is no typing derivation. As the name Decidable
suggests, this may not always be possible. Which is to say, type checking may not always be decidable. Note, we can always check that a typing derivation is valid — we just need to verify that we applied the rules correctly — what we can’t necessarily do is find such a derivation given only the expression and the type or prove that no such derivation exists. Similar concerns apply to type inference which could have one of the following types:
record Σ (T : Set) (F : T -> Set) : Set where
field
fst : T
snd : F fst
inferType : (e : TExpr) -> (sig : Signature) -> Maybe (Σ TType (λ t → e hasType t inContext Nil and Z given sig))
inferType = ?
inferTypeDec : (e : TExpr) -> (sig : Signature) -> Decidable (Σ TType (λ t → e hasType t inContext Nil and Z given sig))
inferTypeDec = ?
where Σ indicates a dependent sum, i.e. a pair where the second component (here of type e hasType t inContext Nil and Z given sig
) depends on the first component (here of type TType
). With type inference we have the additional concern that there may be multiple possible types an expression could have, and we may want to ensure it returns the “most general” type in some sense. There may not always be a good sense of “most general” type and user-input is required to pick out of the possible types.
Sometimes the rules themselves can be viewed as the defining rules of a logic program and thus directly provide an algorithm. For example, if we eliminate the rules, types, and terms related to polymorphism, we’d get the simply typed lambda calculus. A Prolog program to do type checking can be written in a few lines with a one-to-one correspondence to the type checking rules (and, for simplicitly, also omitting the signature):
lookup(z, [T|_], T).
lookup(s(N), [_|Ctx],T) :- lookup(N, Ctx, T).
typeCheck(var(N), Ctx, T) :- lookup(N, Ctx, T).
typeCheck(app(F,X), Ctx, T2) :- typeCheck(F, Ctx, tarr(T1, T2)), typeCheck(X, Ctx, T1).
typeCheck(lam(B), Ctx, tarr(T1, T2)) :- typeCheck(B, [T1|Ctx], T2).
This Agda file contains all the code from this article.↩
Most dependently typed languages, such as Coq or Epigram would also be adequate.↩
See this StackExchange answer for more discussion of this.↩
#r = sqrt(1/2^2 + 1/2^2) - 1/2 = sqrt(2)/2 - 1/2 ~~ 0.207106781#
Now if we go to three dimensions we’ll have eight circles instead of four, but everything else is the same
except the distances will now be #sqrt(1/2^2 + 1/2^2 + 1/2^2)#
. It’s clear that the only
difference for varying dimensions is that in dimension #n# we’ll have #n# #1/2^2# terms under the square
root sign. So the general solution is easily shown to be:
#r = sqrt(n)/2 - 1/2#
You should be weirded out now. If you aren’t, here’s a hint: what happens when #n = 10#? Here’s another
hint: what happens as #n# approaches #oo#?
]]>The ultimate goal of behavioral reflection (aka procedural reflection and no doubt other things) is to make a language where programs within the language are able to completely redefine the language as it executes. This is arguably the pinnacle of expressive power. This also means, though, local reasoning about code is utterly impossible, essentially by design.
Smalltalk is probably the language closest to this that is widely used. The Common Lisp MOP (Meta-Object Protocol) is also inspired by research in this vein. The ability to mutate classes and handle calls to missing methods as present in, e.g. Ruby, are also examples of very limited forms of behavioral reflection. (Though, for the latter, often the term “structural reflection” is used instead, reserving “behavioral reflection” for things beyond mere structural reflection.)
The seminal reference for behavioral reflection is Brian Smith’s 1982 thesis on 3-LISP. This was followed by the languages Brown, Blond, Refci, and Black in chronological order. This is not a comprehensive list, and the last time I was in to this was about a decade ago. (Sure enough, there is some new work on Black and some very new citations of Black.)
The core idea is simply to expose the state of the (potentially conceptual) interpreter and allow it to be manipulated.
From this perspective Scheme’s call/cc
is a basic, limited example of behavioral reflection. It exposes one part of the state of the interpreter and allows it to be replaced. Delimited continuations (i.e. shift
and reset
) are a bit more powerful. They expose the same part of the state as call/cc
, but they expose it in a more structured manner that allows more manipulations beyond merely replacing it. We could imagine representing the continuation with a less opaque object than a function which would lead to Smalltalk’s MethodContext
.
Early Lisps’ fexpr was another example of exposing a different part of the interpreter state, namely the expression under evaluation. The Kernel language explores this in more depth (among other things which can likely also be classified as forms of behavior reflection.)
For a language with mutation the heap would also be exposed. For a language without mutation, mutation can be added using the techniques from Representing Monads since something at least as powerful as delimited continuations will almost certainly be available. At that point, the heap would be first-class.
As an aside, I like to use the word “introspection” for purely read-only access to interpreter state, and reserve the word “reflection” for when manipulations are possible. Terminologically, “reflection” is also often broken down into “structural reflection” for the widely available ability to add and remove methods to classes and similar such operations; and “behavioral” or “procedural” reflection, the ability to manipulate the language itself. To control introspection and, at least, structural reflection, mirrors can be used.
Making a simple behaviorally reflective language is actually pretty straightforward. If you have an abstract machine for your language, all you need to do is provide one additional primitive operation which gets passed the interpreter state and returns a new interpreter state. It may be clearer and cleaner, perhaps, to split this into two operations: one, often called reify
, that provides the interpreter state as a value, and another, often called reflect
, that sets the interpreter state to the state represented by the given value. Note that both reify
and reflect
are (very) impure operations. The more tedious part is marshalling the state of the interpreter into a language level object and vice versa. In the special case of a meta-circular interpreter, this part turns out to be trivial. The following interpreter is NOT meta-circular though.
The approach taken in this example, while simple, is very unstructured. For example, it is possible to write a procedure that when evaluated transforms the language from call-by-value (the CEK machine is an implementation of the CbV lambda calculus), to call-by name. However, to do this requires walking all the code in the environment and continuation and rewriting applications to force their first argument and delay their second argument. Primitives also need to be dealt with. It would be far nicer and cleaner to simply be able to say, “when you do an application, do this instead.” The newer behaviorally reflective languages work more like this.
Note, despite the fact that we do a global transformation, this is not an example of lack of expressiveness. We can define this transformation locally and execute it at run-time without coordination with the rest of the code. In this sense, everything is macro expressible (a la Felleisen) because arbitrary global transformations are macro expressible.
You can get a copy of the full version of the code from here.
I arbitrarily decided to start from the CEK machine: an abstract machine for the untyped call-by-value lambda calculus. (CEK stands for Control-Environment-Kontinuation, these being the three pieces of interpreter state with control being the expression driving evaluation.) While a call-by-value language is probably the way to go because both reify
and reflect
are very impure operations (particularly reflect
), the main motivation for choosing this machine was that the state components correspond in a direct way to concepts that are natural to the programmer. Compare this to the SECD machine which stands for Stack-Environment-Control-Dump.
The AST uses deBruijn indices.
data Expr
= EVar !Int
| ELam Expr
| Expr :@: Expr
The only types of values we have are closures. The two additional pieces of interpreter state are the environment and the continuation (the call stack).
data Value = Closure Expr Env
type Env = [Value]
data Kont = Done | Arg Expr Env Kont | Fun Value Kont
Finally the evaluator is fairly straightforward, and is a straightforward transcription of the operational semantics. Evaluation starts off with the final contination and an empty environment. With a bit of inspection you can see that evaluation is call-by-value and proceeds left-to-right. Derive Show
for Expr
and Value
and these 15 lines of code are a complete interpreter.
evaluate :: Expr -> Value
evaluate e = cek e [] Done
cek :: Expr -> Env -> Kont -> Value
cek (EVar i) env k = cek e env' k where Closure e env' = env !! i
cek (f :@: x) env k = cek f env (Arg x env k)
cek (ELam b) env Done = Closure b env
cek (ELam b) env (Arg x env' k) = cek x env' (Fun (Closure b env) k)
cek (ELam b) env (Fun (Closure b' env') k) = cek b' (Closure b env:env') k
The first stab at adding reflective features looks like this. We add the primitive operation to reify and reflect. We’ll require them to be wrapped in lambdas so the interpreter doesn’t have to deal with unevaluated arguments when interpreting them. Note that reify
doesn’t pass the Expr
part to its argument. This is because the Expr
part would just be EReify
. The arguments of this particular application are stored, unevaluated, in the continuation as arguments needing to be evaluated. So, if we want to define quote
which simply returns the expression representing it’s argument, we’ll have to dig into the continuation to get that argument.
data Expr
= ...
| EReify
| EReflect
-- reify f = f e k
reify = ELam EReify
-- reflect c e k
reflect = ELam (ELam (ELam EReflect))
And here’s what we’d like to write in the interpreter (and is very close to what we ultimately will write.)
cek :: Expr -> Env -> Kont -> Value
...
cek EReify (Closure b env':env) k = cek b (k:env:env') k
cek EReflect (k:e:c:_) _ = cek c e k
There are two problems with this code: one minor and one major. The minor problem is that the argument to reify
takes two arguments but we can’t just pass them directly to it. We need to follow our calling convention which expects to evaluate arguments one at a time. This problem is easily fixed by pushing an Arg
call stack frame onto the continuation to (trivially) evaluate the continuation.
The major problem is that this doesn’t type check. c
, e
, env
, and k
can’t simultaneously be Value
s and Expr
s, Env
s, and Kont
s. We need a way to embed and project Expr
, Env
, and Kont
value into and out of Value
. The embedding is easy; you just fold over the data structure and build up a lambda term representing the Church encoding. The projection from Value
s is… non-obvious, to say the least.
Instead of figuring that out, we can simply add Expr
, Env
, and Kont
to our language as primitive types. This is also, almost certainly, dramatically more efficient.
We extend our AST and Value
.
data Expr
= ...
| EInject Value
-- Int so we can manipulate the EVar case.
data Value = Closure Expr Env | Int !Int | Expr Expr | Env Env | Kont Kont
The changes to the interpreter are minimal. We need to change the EVar
case to handle the new kinds of values that can be returned and add a trivial EInject
case. Some cases can be omitted because they would only come up in programs that would “get stuck” anyway. (In our case, “getting stuck” means pattern match failure or index out of bounds.)
inject :: Value -> (Expr, Env)
inject (Closure b env) = (ELam b, env)
inject v = (EInject v, [])
cek :: Expr -> Env -> Kont -> Value
cek (EVar i) env k = cek e env' k where (e, env') = inject (env !! i)
...
cek EReify (Closure b env':env) k = cek b (Env env:env') (Arg (EInject (Kont k)) [] k)
cek EReflect (Kont k:Env e:Expr c:_) _ = cek c e k
cek (EInject v) _ Done = v
cek (EInject v) _ (Fun (Closure b' env') k) = cek b' (v:env') k
While this interpreter achieves the goal, it is somewhat limited. We don’t have any means to manipulate the values of these new primitive types, so our manipulation of the interpreter state is limited to replacing a component, e.g. the environment, with some version of it that we got before via reify
. Though it may be limited, it is not trivial. You can implement something close to call/cc if not call/cc itself.
Still, the scenario above of turning the language into a call-by-name language doesn’t seem possible. Modifying the interpreter to support primitive operations defined in Haskell is a simple matter of programming: you add a constructor for primitive operations to the AST, you make a very slight variant of the EInject
case in cek
, and then you tediously make primitives corresponding to each constructor for each type and a fold for each type. See the linked source file for the details.
The file additionally defines a pretty printer and a layer using parametric higher-order abstract syntax because humans are terrible with deBruijn indices. The end result is code that looks like this:
one = _Suc :@ _Zero
identity = lam (\x -> x)
loop = lam (\x -> x :@ x) :@ lam (\x -> x :@ x)
tailEnv = lam (\e -> paraEnv :@ e :@ _Nil
:@ lam (\_ -> lam (\x -> lam (\_ -> x))))
tailKont = lam (\k -> paraKont :@ k :@ _Done
:@ lam (\_ -> lam (\_ -> lam (\x -> lam (\_ -> x))))
:@ lam (\_ -> lam (\x -> lam (\_ -> x))))
eval = lam (\c -> reify :@ lam (\e -> lam (\k -> reflect :@ c :@ (tailEnv :@ e) :@ k)))
quote = reify :@ lam (\e -> lam (\k -> reflect :@ (_Inject :@ c k) :@ e :@ (tailKont :@ k)))
where c k = paraKont :@ k :@ garbage
:@ lam (\x -> (lam (\_ -> lam (\_ -> lam (\_ -> x)))))
:@ lam (\_ -> lam (\_ -> lam (\_ -> garbage)))
garbage = _Inject :@ _Zero
callCC = lam (\f -> reify :@ lam (\e -> lam (\k ->
f :@ lam (\a -> reflect :@ (_Inject :@ a) :@ (tailEnv :@ e) :@ k))))
example1 = evaluate $ quote :@ loop -- output looks like: Expr ((\a -> a a) (\a -> a a))
example2 = evaluate $ eval :@ (quote :@ loop) -- loops forever
example3 = evaluate $ callCC :@ lam (\k -> k :@ one :@ loop) -- escape before evaluating the loop
example4 = evaluate $ callCC :@ lam (\k -> lam (\_ -> loop) :@ (k :@ one)) -- also escape before the loop
Here’s a small example of why people find category theory interesting. Consider the category, call it |ccC|, consisting of two objects, which we’ll imaginatively name |0| and |1|, and two non-identity arrows |s,t : 0 -> 1|. The category of graphs (specifically directed multigraphs with loops) is the category of presheaves over |ccC|, which is to say, it’s the category of contravariant functors from |ccC| to |cc"Set"|
. I’ll spell out what that means momentarily, but first think about what has been done. We’ve provided a complete definition of not only graphs but the entire category of graphs and graph homomorphisms given you’re familiar with very basic category theory^{1}. This is somewhat magical.
Spelling things out, |G| is a contravariant functor from |ccC| to |cc"Set"|
. This means that |G| takes each object of |ccC| to a set, i.e. |G(0) = V| and |G(1) = E| for arbitrary sets |V| and |E|, but which will represent the vertices and edges of our graph. In addition, |G| takes the arrows in |ccC| to a (set) function, but, because it’s contravariant, it flips the arrows around. So we have, for example, |G(s) : G(1) -> G(0)| or |G(s) : E -> V|, and similarly for |t|. |G(s)| is a function that takes an edge and gives its source vertex, and |G(t)| gives the target. The arrows in the category of graphs are just natural transformations between the functors. In this case, that means if we have two graphs |G| and |H|, a graph homomorphism |f : G -> H| is a pair of functions |f_0 : G(0) -> H(0)| and |f_1 : G(1) -> H(1)| that satisfy |H(s) @ f_1 = f_0 @ G(s)| and |H(t) @ f_1 = f_0 @ G(t)|. With a bit of thought you can see that all this says is that we must map the source and target of an edge |e| to the source and target of the edge |f_1(e)|. You may also have noticed that any pair of sets and pair of functions between them is a graph. In other words, graphs are very simple things. This takes out a lot of the magic, though it is nice that we get the right notion of graph homomorphism for free. Let me turn the magic up to 11.
Presheaves are extremely important in category theory, so categorists know a lot about them. It turns out that they have a lot of structure (mostly because |cc"Set"|
has a lot of structure). In particular, in the categorical jargon, if |ccC| is small (and finite is definitely small), then the category of presheaves over |ccC| is a complete and cocomplete elementary topos with a natural number object. For those of you learning category theory who’ve gotten beyond the basics, proving this is a very good and important exercise. In programmer-speak, that says we can do dependently typed lambda calculus with algebraic data and codata types in that category. To be clear, this is not saying we have a lambda calculus with a graph type. It’s saying we have a lambda calculus where all our types have graph structure.
But set that aside for now. This isn’t an article about graphs so let’s start generalizing away from them. We’ll start with a baby step. You can probably guess how we’d go about making edge labeled graphs. We’d add another object to |ccC|, say |2|, and an arrow from |2| to |1|, call it |l_e| — remember that the arrows are “backwards” because presheaves are contravariant. You may start seeing a pattern already. One way to start formalizing that pattern is to say for every object of |ccC| we have an abstract data type with a field for each arrow into that object. In our example, |1| has three arrows into it, namely |s|, |t|, and |l_e|. We can view an element |e in F(1)| for a presheaf |F| as having fields e.s
, e.t
and e.l_e
. Since |cc"Set"|
has (non-constructive, non-computable) decidable equality for everything, we can tell |e| from |e’| even if e.s = e'.s
and e.t = e'.t
and e.l_e = e'.l_e
. This is different from a normal abstract data type (in a purely functional language) where an abstract data type with three fields would be isomorphic to a 3-tuple. The extra ability to differentiate them could be modeled by having a fourth field that returned something that could only be compared for equality, kind of like Data.Unique (ignoring the Ord
instance…) It may look like:
data Vertex -- abstract
vertexId :: Vertex -> Unique
data Label -- abstract
labelId :: Label -> Unique
data Edge -- abstract
edgeId :: Edge -> Unique
src :: Edge -> Vertex
tgt :: Edge -> Vertex
label :: Edge -> Label
Of course, if these are literally all the functions we have for these types (plus some constants otherwise we can’t make anything), then these abstract types might as well be records. Anything else they have is unobservable and thus superfluous.
data Vertex = Vertex { vertexId :: Unique }
data EdgeLabel = EdgeLabel { edgeLabelId :: Unique }
data Edge = Edge {
edgeId :: Unique,
src :: Vertex,
tgt :: Vertex,
edgeLabel :: EdgeLabel
}
This starts to suggest that we can just turn each object in our category |ccC| into a record with an ID field. Each arrow in to that object becomes an additional field. This almost works. We’ll come back to this, but let’s take another baby step.
Something a bit more interesting happens if we want to label the vertices. We proceed as before: add another object, call it |3|, and another arrow |l_v : 3 -> 0|. This time, though, we’re not finished. |ccC| is supposed to be a category, so we can compose arrows and thus we have two composites but no arrow for the composites to be, namely: |s @ l_v| and |t @ l_v|. The abstract data type intuition suggests that whenever we run into this situation, we should add a distinct arrow for each composite. For the purpose of labeling vertices, this is the right way to go: we want to think of each vertex as having a vertex label field with no constraints. There is, however, another choice. We could add one new arrow and have both composites be equal to it. What that would say is that for every edge, the source and the target vertices must have the same label. It’s easy to see that that would lead to every vertex in a connected component having the same label. In code, the former choice would look like:
data VertexLabel = VertexLabel { vertexLabelId :: Unique }
data Vertex = Vertex {
vertexId :: Unique,
vertexLabel :: VertexLabel
}
data EdgeLabel = EdgeLabel { edgeLabelId :: Unique }
data Edge = Edge {
edgeId :: Unique,
src :: Vertex,
tgt :: Vertex,
edgeLabel :: EdgeLabel
-- With our earlier "rule" to add a field for each arrow
-- we should also have:
-- srcVertexLabel :: VertexLabel
-- tgtVertexLabel :: VertexLabel
-- but, by definition, these are:
-- vertexLabel . src and
-- vertexLabel . tgt respectively.
-- So we don't explicitly add a field for composite arrows.
}
The latter choice would look the same, except there would be an extra constraint that we can’t capture in Haskell, namely vertexLabel . src == vertexLabel . tgt
.
If we stick to the abstract data type intuition and we have a cycle in the arrows in |ccC|, then the requirement to add a distinct arrow for each composite means we need to add a countably infinite number of arrows, so |ccC| is no longer finite. It is still “small” though, so that’s no problem. We could say we have a finite graph of the non-composite arrows and possibly some equations like |s @ l_v = t @ l_v| and we generate a category from that graph subject to those equations by adding in all composites as necessary. We will use the arrows in this graph to decide on the fields for our abstract data types, rather than the arrows in the category so we don’t end up with an infinity of fields in our data types. Even when there aren’t an infinite number of arrows in our category, this is still useful since we aren’t going to add a field to our edges to hold each vertex’s label, since we can just get the vertex and then get the label.
Some of you have probably thought of another more appropriate intuition. A presheaf is a database. The category which our presheaf is over, what we’ve been calling |ccC| ^{2} is sort of like a schema. You don’t specify types, but you do specify foreign key relationships plus some additional integrity constraints that don’t fit in to the typical ones supported by databases.
For our earlier example:
CREATE VertexLabel (Id uniqueidentifier PRIMARY KEY);
CREATE Vertex (
Id uniqueidentifier PRIMARY KEY,
Label uniqueidentifier REFERENCES VertexLabel(Id)
);
CREATE EdgeLabel (Id uniqueidentifier PRIMARY KEY);
CREATE Edge (
Id uniqueidentifier PRIMARY KEY,
Src uniqueidentifier REFERENCES VertexLabel(Id),
Tgt uniqueidentifier REFERENCES VertexLabel(Id),
Label uniqueidentifier REFERENCES EdgeLabel(Id)
-- Nothing stops us from having additional columns here and elsewhere
-- corresponding to the fact that our types were really abstract data
-- types in the Haskell, and to the fact that we can choose an arbitrary
-- set as long as it has at least this much structure. They won't be
-- preserved by "homomorphisms" though.
);
To be clear, each presheaf corresponds to a database with data. Inserting a row into a table is a homomorphism of presheaves, but so is adding a (not preserved by homomorphism) column. The schema above corresponds to the |ccC| part of the presheaf.
If we had made that second choice before where we had only one arrow back that would lead to the following integrity constraint:
NOT EXISTS(SELECT *
FROM Edge e
JOIN Vertex src ON e.Src = src.Id
JOIN Vertex tgt ON e.Tgt = tgt.Id
WHERE src.Label <> tgt.Label)
It turns out this intuition is completely general. You can actually think of all of presheaf theory as a (slightly unusual) database theory. More objects just means more tables. More arrows means more foreign keys and potentially additional integrity constraints. It’s not exactly relational though. In fact, in some ways it’s a bit closer to SQL^{3} or object databases. You can turn this around too going back to the point at the beginning. Whether or not we can think of all presheaves like databases or all databases like presheaves, we can certainly think of this example, and many like it, as presheaves. This means for many “databases”, we can talk about doing dependently typed programming where our types are databases of the given shape. This also dramatically shifts the focus in database theory from databases to database migrations, i.e. homomorphisms of databases.
David Spivak is the one who has done the most on this, so I’ll refer you to his work. He also has a better story for schemas that are much closer to normal schemas. I’ll end as I began:
Here’s a small example of why people find category theory interesting.
]]>Arguably (1-)category theory is the study of universal properties. There are three standard formulations of the notion of universal property which are all equivalent. Most introductions to category theory will cover all three and how they relate. By systematically preferring one formulation above the others, three styles of doing category theory arise with distinctive constructions and proof styles.
I noticed this trichotomy many years ago, but I haven’t heard anyone explicitly identify and compare these styles. Furthermore, in my opinion, one of these styles, the one I call Set-category theory, is significantly easier to use than the others, but seems poorly represented in introductions to category theory.
For myself, it wasn’t until I read Basic Concepts of Enriched Category Theory by G. M. Kelly and was introduced to indexed (co)limits (aka weighted (co)limits), that I began to recognize and appreciate Set-category theory. Indexed (co)limits are virtually absent from category theory introductions, and the closely related notion of (co)ends are also very weakly represented despite being far more useful than (conical) (co)limits. I don’t know why this is. Below I’ll be more focused on comparing the styles than illustrating the usefulness of indexed (co)limits. I may write an article about that in the future; in the mean time you can read “Basic Concepts of Enriched Category Theory” and just ignore the enriched aspect. Admittedly, it is definitely not an introduction to category theory.
What follows are three equivalent formulations of the notion of universal property including proofs of equivalence.
Initiality is by far the simplest formulation of universal properties to understand because it is a very special case, albeit a special case that can be finagled to cover the general case.
An object, typically denoted |0|, is initial iff for every object in the category there exists a unique arrow from |0|. If |A| is an object in the category, we can write |(:A:) : 0 -> A| for the unique arrow.
Representability isn’t as simple or intuitive as initiality, but it is still pretty simple.
A functor |F : C^(op) -> Set| is representable iff |F ~= Hom(-,Y)| for some |Y|. Note that this is an isomorphism between functors, i.e. a natural isomorphism. We say |Y| represents the functor |F|. A functor |G : C -> Set| is co-representable iff |G ~= Hom(X,-)| for some |X|. Similarly we say |X| (co-)represents the functor |G|. Representable and co-representable aren’t actually different; if we set |D = C^(op)| then we could say |F| is co-representable as a functor from |D| and it would mean the same thing.
Now we need to prove initiality and (co-)representability equivalent. One direction is easy, |0| co-represents the functor |lambda C . {**}|.^{1} It’s easy to show that this says exactly that |0| is initial.
I’ll hold off on the other direction and instead prove the circular implication: Representability |=>| Initiality |=>| Universal Arrow |=>| Representability. Right now we’ve done the first step; we can recast initiality as representability. Next we’ll show that we can recast universal arrows as initiality and representability as a universal arrow and we’ll have shown them all equivalent.
Let |U : C -> D| be a functor. |eta : A -> UX| is a universal arrow iff for all |f : A -> UY| there exists a unique arrow |(:f:) : X -> Y| such that |f = U(:f:) @ eta|. Dually, let |F : C -> D| be a functor, then |epsilon : FX -> A| is a co-universal arrow iff for all |g : FY -> A| there exists a unique arrow |(:g:) : Y -> X| such that |g = epsilon @ F(:g:)|. Again, these are equivalent notions. A co-universal arrow is just a universal arrow in the opposite category.
Now to prove Initiality |=>| Universal Arrow. Consider the (comma) category whose objects are pairs |(f, Y)| where |f : A -> UY| and whose arrows |k : (f, Y) -> (g, Z)| are arrows |k : Y -> Z| such that |g = Uk @ f|. |eta| is a universal arrow iff |(eta, X)| is initial in this category. This is easy to check.
Finally, Universal Arrow |=>| Representability. Let |G : C -> Set|. The claim is there exists a universal arrow |eta : {**} -> GX|
iff |X| co-represents |G|. Doing the backwards direction first, if |X| co-represents |G| then there is an element of |GX| that corresponds to |id : X -> X|. Have |eta| pick out that element. Given another element |y : {**} -> GY| we need to find a unique arrow |(:y:) : X -> Y| such that |G(:y:)(eta) = y|. Naturality of the isomorphism making |X| co-represent |G| says: |f @ phi_A(a) = phi_B(Gf(a))| where |phi : G ~= Hom(X,-)|, |f : A -> B|, and |a in GA|. Applying |phi_Y| to both sides of the earlier equation and using |phi_X(eta) = id| we get: |(:y:) = phi_Y(y)| showing that |(:y:)| exists and is unique and thus |eta| is a universal arrow.
Next we need to show that |eta : {**} -> GX| being a universal arrow implies that |X| co-represents |G|. We immediately have a parameterized bijection namely |(: - :)_Y : GY ~= Hom(X,Y)|. All we need to do is show that it is natural in |Y| which is to say |f @ (:a:)_A = (:Gf(a):)_B| where |f : A -> B|. We have |Gf(a) = Gf(G(:a:)(eta)) = (Gf @ G(:a:))(eta) = G(f @ (:a:))(eta)|. Using |f = (:Gf(eta):)_B| which is a consequence of |eta| being universal, we get |(:Gf(a):)_B = f @ (:a:)_A| as desired.
In the initiality style of doing category theory, the main exercise is defining an appropriate category. This is pretty difficult from both a creativity perspective and a notational perspective. Once an appropriate category is defined, though, the proofs are typically not too hard. Perhaps the main tool is the comma category as that allows the compact and flexible specification of many categories, though it can often take a touch of creativity to recognize a category as a comma category. Technically, the proof above implies that for any universal property there’s an appropriate comma category in which the universal object is the initial object.
This style is often seen in introductions to category theory. If you’ve ever read about categories of (co)cones or wedges, then you’ve seen an example of this style with an ad-hoc definition of the relevant categories. In practice, the categories needed to characterize a specific universal property are of little interest on their own so the effort spent defining them isn’t repaid. The main exception is the category of |F|-algebras for some functor |F|. (Incidentally, the category of |F|-algebras is a full subcategory of a comma category.)
The style of proof is to leverage the uniqueness of the mediating morphism to equate two seemingly different things. Lambek’s lemma is an excellent illustration of this proof style.
The core of this style of category theory is representability and its parameterized variant. Representability alone is already very powerful and ergonomic. Every limit or colimit in a category represents a limit of homsets. This means all the tools and intuitions of set theory are available. This is the basis for my naming of this style: we elucidate the structure of a category by relating it to the structure of the category of sets.
The main tools of this style are (co)ends, indexed (co)limits, and adjunctions in the form |Hom(FA,B) ~= Hom(A,UB)|. In particular, the intimate connection between ends/indexed limits and sets of natural transformations significantly eases moving up and down functor categories.
By far the best part of this approach is the style of proof. By leveraging continuity properties, characterizations in terms of (parameterized) representability, some general isomorphisms, and occasionally some hand-crafted isomorphisms, nearly any categorical theorem can be proven in a dozen or less isomorphisms. The presentation looks and feels very much like solving an algebraic equation. The proofs are easy to read and easy to write.
The main drawback of this style is that most of the properties above weaken and some disappear when working in enriched or higher categories.
Arguably, this style should be based on universal arrows. While it technically is, in practice, it’s rare that one talks about universal arrows and instead the parameterized version, namely adjunctions in their unit/counit formulation are the main tool. After adjunctions would be Kan extensions. All of these are concepts that live naturally at the level of 2-categories which makes them much easier to generalize.
The main style of proof is diagram chasing, i.e. proving typically several equalities. On the one hand, proving an equation is straight-forward equational reasoning; on the other hand, there tend to be several equations that need to be proven at a time, they tend to be fine-grained and thus detailed and tedious, and often general results don’t obviously apply. The solution to most of these problems is to use string diagrams instead. Many of the bureaucratic details completely disappear and the proofs become crystal clear. A wonderful example of this style of proof is proving that an adjunction gives rise to a monad. With normal diagram chasing it’s mildly tedious. With string diagrams it proves itself.
One issue that comes up in this style is the elegant definitions tend to be wholesale. For example, there’s a nice adjunction that characterizes all limits of any given shape. Unfortunately, the adjunction only exists if all the limits of that shape exists. If they don’t all exist, then you need to use some other method to talk about the limits that do exist. Technically, universal arrows could handle this, but usually representability is used instead.
Ultimately, one should become familiar with all these approaches. There are definitely problems that fit nicer in some than others. However, when it comes to universal properties the Set-category theory wins hands down. Cat-category theory tends to be more useful when working with structures that aren’t characterized by a universal property such as monoidal structure. Initiality has its niche with initial algebras. I don’t have any ideas on where it might also be a profitable perspective. In fact, I’m not completely sure it’s a good thing that we think about initial algebras as initial algebras. There are interesting interactions between initial algebras and adjunctions for example and it’s not clear to me that they wouldn’t be more accessible with a different presentation of initial algebras.
|{**}| is just some singleton set.↩