Learning for the fun of it

Andrej Bauer has a paper titled The pullback lemma in gory detail that goes over the proof of the pullback lemma in full detail. This is a basic result of category theory and most introductions leave it as an exercise. It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.

Andrej Bauer’s proof is what most introductions are expecting you to produce. I very much like the representability perspective on category theory and like to see what proofs look like using this perspective.

So this is a proof of the pullback lemma from the perspective of representability.

The key thing we need here is a characterization of pullbacks in terms of representability. To
just jump to the end, we have for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is **the
pullback of |f| and |g|** if and only if it represents the functor
\[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]

That is to say we have the natural isomorphism \[ \mathrm{Hom}({-}, A \times_{f,g} B) \cong \{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \} \]

We’ll write the left to right direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B|
where |u : U \to A| and |v : U \to B| and they satisfy |f \circ u = g \circ v|. Applying
the isomorphism right to left on the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A|
and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and
|p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a *natural* isomorphism.)

One nice thing about representability is that it reduces categorical reasoning to set-theoretic reasoning that you are probably already used to, as we’ll see. You can connect this definition to a typical universal property based definition used in Andrej Bauer’s article. Here we’re taking it as the definition of the pullback.

The claim to be proven is if the right square in the below diagram is a pullback square, then the left square is a pullback square if and only if the whole rectangle is a pullback square. \[ \xymatrix { A \ar[d]_{q_1} \ar[r]^{q_2} & B \ar[d]_{p_1} \ar[r]^{p_2} & C \ar[d]^{h} \\ X \ar[r]_{f} & Y \ar[r]_{g} & Z }\]

Rewriting the diagram as equations, we have:

**Theorem**: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a
pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if
|(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.

**Proof**: If |(A, q_1, q_2)| was a pullback of |f| and |p_1| then we’d have the following.

\[\begin{align} \mathrm{Hom}({-}, A) & \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\ & \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \} \end{align}\]

The second isomorphism is |B| being a pullback and |u_2| is an arrow into |B| so it’s necessarily of the form |\langle v_1, v_2\rangle|. The first equality is just |p_1 \circ \langle v_1, v_2\rangle = v_1| mentioned earlier. The second equality merely eliminates the use of |v_1| using the equation |f \circ u_1 = v_1|.

This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism. We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.

We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|

The simplicity and compactness of this proof demonstrates why I like representability.

It is not uncommon for universal quantification to be described as
(potentially) infinite conjunction^{1}.
Quoting Wikipedia’s Quantifier_(logic)
page (my emphasis):

For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).

The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before.

For infinite domains of discourse, the equivalences are similar.

While there’s a small grain of truth to this, I think it is wrong and/or misleading far more often than it’s useful or correct. Indeed, it takes a bit of effort to even get a statement that makes sense at all. There’s a bit of conflation between syntax and semantics that’s required to have it naively make sense, unless you’re working (quite unusually) in an infinitary logic where it is typically outright false.

What harm does this confusion do? The most obvious harm is that this view does not generalize to non-classical logics. I’ll focus on constructive logics, in particular. Besides causing problems in these contexts, which maybe you think you don’t care about, it betrays a significant gap in understanding of what universal quantification actually is. Even in purely classical contexts, this confusion often manifests, e.g., in confusion about |\omega|-inconsistency.

So what is the difference between universal quantification and
infinite conjunction? Well, the most obvious difference is that
infinite conjunction is indexed by some (meta-theoretic) set that
doesn’t have anything to do with the domain the universal quantifier
quantifies over. However, even if these sets happened to coincide^{2} there are
still differences between universal quantification and infinite
conjunction. The key is that universal quantification
requires the predicate being quantified over to hold *uniformly*,
while infinite conjunction does not. It just so happens that for
the standard set-theoretic semantics of classical first-order logic
this “uniformity” constraint is degenerate. However, even for
classical first-order logic, this notion of uniformity will be
relevant.

The purpose of this article is to answer the question: what is the coproduct of two groups? The approach, however, will be somewhat absurd. Instead of simply presenting a construction and proving that it satisfies the appropriate universal property, I want to find the general answer and simply instantiate it for the case of groups.

Specifically, this will be a path through the theory of Lawvere theories and their models with the goal of motivating some of the theory around it in pursuit of the answer to this relatively simple question.

If you really just want to know the answer to the title question, then the construction is usually called the free product and is described on the linked Wikipedia page.

This is a brief article about the notions of preserving, reflecting, and creating limits and, by duality, colimits. Preservation is relatively intuitive, but the distinction between reflection and creation is subtle.

A functor, |F|, **preserves limits** when it takes limiting cones to limiting cones. As
often happens in category theory texts, the notation focuses on the objects. You’ll often
see things like |F(X \times Y) \cong FX \times FY|, but implied is that one direction of
this isomorphism is the canonical morphism |\langle F\pi_1, F\pi_2\rangle|. To put it yet
another way, in this example we require |F(X \times Y)| to satisfy the universal property
of a product with the projections |F\pi_1| and |F\pi_2|.

Other than that subtlety, preservation is fairly intuitive.

A functor, |F|, **reflects limits** when whenever the image of a *cone* is a limiting cone,
then the original cone was a limiting cone. For products this would mean that if we
had a wedge |A \stackrel{p}{\leftarrow} Z \stackrel{q}{\to} B|, and |FZ| was the product
of |FA| and |FB| with projections |Fp| and |Fq|, then |Z| was the product of |A| and |B|
with projections |p| and |q|.

A functor, |F|, **creates limits** when whenever the image of a * diagram has a limit*,
then the diagram itself has a limit and |F| preserves the limiting cones. For products
this would mean if |FX| and |FY| had a product, |FX \times FY|, then |X| and |Y| have
a product and |F(X \times Y) \cong FX \times FY| via the canonical morphism.

Creation of limits implies reflection of limits since we can just ignore the apex of the cone. While creation is more powerful, often reflection is enough in practice as we usually have a candidate limit, i.e. a cone. Again, this is often not made too explicit.

Consider the posets:

$$\xymatrix{ & & & c \\ X\ar@{}[r]|{\Large{=}} & a \ar[r] & b \ar[ur] \ar[dr] & \\ & & & d \save "1,2"."3,4"*+[F]\frm{} \restore } \qquad \xymatrix{ & & c \\ Y\ar@{}[r]|{\Large{=}} & b \ar[ur] \ar[dr] & \\ & & d \save "1,2"."3,3"*+[F]\frm{} \restore } \qquad \xymatrix{ & c \\ Z\ar@{}[r]|{\Large{=}} & \\ & d \save "1,2"."3,2"*+[F]\frm{} \restore }$$

Let |X=\{a, b, c, d\}| with |a \leq b \leq c| and |b \leq d| mapping to |Y=\{b, c, d\}| where |a \mapsto b|. Reflection fails because |a| maps to a meet but is not itself a meet.

If we change the source to just |Z=\{c, d\}|, then creation fails because |c| and |d| have a meet in the image but not in the source. Reflection succeeds, though, because there are no non-trivial cones in the source, so every cone (trivially) gets mapped to a limit cone. It’s just that we don’t have any cones with both |c| and |d| in them.

In general, recasting reflection and creation of limits for posets gives us: Let |F: X \to Y| be a monotonic function. |F| reflects limits if every lower bound that |F| maps to a meet is already a meet. |F| creates limits if whenever |F[U]| has a meet for |U \subseteq X|, then |U| already had a meet and |F| sends the meet of |U| to the meet of |F[U]|.

**tl;dr** The notion of two sets overlapping is very common. Often it is expressed
via |A \cap B \neq \varnothing|. Constructively, this is not the best definition
as it does not imply |\exists x. x \in A \land x \in B|. Even classically, this
second-class treatment of overlapping obscures important and useful connections.
In particular, writing |U \between A| for “|U| overlaps |A|”, we have a De Morgan-like
duality situation with |\between| being dual to |\subseteq|. Recognizing and
exploiting this duality, in part by using more appropriate notation for “overlaps”,
can lead to new concepts and connections.

The most common way I’ve seen the statement “|A| overlaps |B|” formalized is |A \cap B \neq \varnothing|. To a constructivist, this definition isn’t very satisfying. In particular, this definition of overlaps does not allow us to constructively conclude that there exists an element contained in both |A| and |B|. That is, |A \cap B \neq \varnothing| does not imply |\exists x. x \in A \land x \in B| constructively.

As is usually the case, even if you are not philosophically a constructivist, taking a constructivist perspective can often lead to better definitions and easier to see connections. In this case, constructivism suggests the more positive statement |\exists x. x \in A \land x \in B| be the definition of “overlaps”. However, given that we now have two (constructively) non-equivalent definitions, it is better to introduce notation to abstract from the particular definition. In many cases, it makes sense to have a primitive notion of “overlaps”. Here I will use the notation |A \between B| which is the most common option I’ve seen.

We can more compactly write the quantifier-based definition as |\exists x \in A.x \in B| using a common set-theoretic abbreviation. This presentation suggests a perhaps surprising connection. If we swap the quantifier, we get |\forall x\in A.x \in B| which is commonly abbreviated |A \subseteq B|. This leads to a duality between |\subseteq| and |\between|, particularly in topological contexts. In particular, if we pick a containing set |X|, then |\neg(U \between A) \iff U \subseteq A^c| where the complement is relative to |X|, and |A| is assumed to be a subset of |X|. This is a De Morgan-like duality.

If we want to characterize these operations via an adjunction, or, more precisely, a Galois connection, we have a slight awkwardness arising from |\subseteq| and |\between| being binary predicates on sets. So, as a first step we’ll identify sets with predicates via, for a set |A|, |\underline A(x) \equiv x \in A|. In terms of predicates, the adjunctions we want are just a special case of the adjunctions characterizing the quantifiers.

\[\underline U(x) \land P \to \underline A(x) \iff P \to U \subseteq A\]

\[U \between B \to Q \iff \underline B(x) \to (\underline U(x) \to Q)\]

What we actually want is a formula of the form |U \between B \to Q \iff B \subseteq (\dots)|. To do this, we need an operation that will allow us to produce a set from a predicate. This is exactly what set comprehension does. For reasons that will become increasingly clear, we’ll assume that |A| and |B| are subsets of a set |X|. We will then consider quantification relative to |X|. The result we get is:

\[\{x \in U \mid P\} \subseteq A \iff \{x \in X \mid x \in U \land P\} \subseteq A \iff P \to U \subseteq A\]

\[U \between B \to Q \iff B \subseteq \{x \in X \mid x \in U \to Q\} \iff B \subseteq \{x \in U \mid \neg Q\}^c\]

The first and last equivalences require additionally assuming |U \subseteq X|.
The last equivalence requires classical reasoning. You can already see motivation to
limit to subsets of |X| here. First, set complementation, the |(-)^c|, only makes sense relative to
some containing set. Next, if we choose |Q \equiv \top|, then the latter formulas
state that *no matter what |B| is* it should be a subset of the expression that
follows it. Without constraining to subsets of |X|, this would require a universal
set which doesn’t exist in typical set theories.

Choosing |P| as |\top|, |Q| as |\bot|, and |B| as |A^c| leads to the familiar |\neg (U \between A^c) \iff U \subseteq A|, i.e. |U| is a subset of |A| if and only if it doesn’t overlap |A|’s complement.

Incidentally, characterizing |\subseteq| and |\between| in terms of Galois connections, i.e. adjunctions, immediately gives us some properties for free via continuity. We have |U \subseteq \bigcap_{i \in I}A_i \iff \forall i\in I.U \subseteq A_i| and |U \between \bigcup_{i \in I}A_i \iff \exists i \in I.U \between A_i|. This is relative to a containing set |X|, so |\bigcap_{i \in \varnothing}A_i = X|, and |U| and each |A_i| are assumed to be subsets of |X|.

Below I’ll perform a categorical analysis of the situation. I’ll mostly be using categorical notation and perspectives to manipulate normal sets. That said, almost all of what I say will be able to be generalized immediately just by reinterpreting the symbols.

To make things a bit cleaner in the future, and to make it easier to apply these ideas beyond sets, I’ll introduce the concept of a Heyting algebra. A Heyting algebra is a partially ordered set |H| satisfying the following:

- |H| has two elements called |\top| and |\bot| satisfying for all |x| in |H|, |\bot \leq x \leq \top|.
- We have operations |\land| and |\lor| satisfying for all |x|, |y|, |z| in |H|, |x \leq y \land z| if and only |x \leq y| and |x \leq z|, and similarly for |\lor|, |x \lor y \leq z| if and only |x \leq z| and |y \leq z|.
- We have an operation |\to| satisfying for all |x|, |y|, and |z| in |H|, |x \land y \leq z| if and only if |x \leq y \to z|.

For those familiar with category theory, you might recognize this as simply the decategorification
of the notion of a bicartesian closed category.
We can define the **pseudo-complement**, |\neg x \equiv x \to \bot|.

Any Boolean algebra is an example of a Heyting algebra where we can define |x \to y| via |\neg x \lor y| where here |\neg| is taken as primitive. In particular, subsets of a given set ordered by inclusion form a Boolean algebra, and thus a Heyting algebra. The |\to| operation can also be characterized by |x \leq y \iff (x \to y) = \top|. This lets us immediately see that for subsets of |X|, |(A \to B) = \{x \in X \mid x \in A \to x \in B\}|. All this can be generalized to the subobjects in any Heyting category.

As the notation suggests, intuitionistic logic (and thus classical logic) is another example of a Heyting algebra.

We’ll write |\mathsf{Sub}(X)| for the partially ordered set of subsets of |X| ordered by inclusion. As mentioned above, this is (classically) a Boolean algebra and thus a Heyting algebra. Any function |f : X \to Y| gives a monotonic function |f^* : \mathsf{Sub}(Y) \to \mathsf{Sub}(X)|. Note the swap. |f^*(U) \equiv f^{-1}(U)|. (Alternatively, if we think of subsets in terms of characteristic functions, |f^*(U) \equiv U \circ f|.) Earlier, we needed a way to turn predicates into sets. In this case, we’ll go the other way and identify truth values with subsets of |1| where |1| stands for an arbitrary singleton set. That is, |\mathsf{Sub}(1)| is the poset of truth values. |1| being the terminal object of |\mathbf{Set}| induces the (unique) function |!_U : U \to 1| for any set |U|. This leads to the important monotonic function |!_U^* : \mathsf{Sub}(1) \to \mathsf{Sub}(U)|. This can be described as |!_U^*(P) = \{x \in U \mid P\}|. Note, |P| cannot contain |x| as a free variable. In particular |!_U^*(\bot) = \varnothing| and |!_U^*(\top) = U|. This monotonic function has left and right adjoints:

\[\exists_U \dashv {!_U^*} \dashv \forall_U : \mathsf{Sub}(U) \to \mathsf{Sub}(1)\]

|F \dashv G| for monotonic functions |F : X \to Y| and |G : Y \to X| means |\forall x \in X. \forall y \in Y.F(x) \leq_Y y \iff x \leq_X G(y)|.

|\exists_U(A) \equiv \exists x \in U. x \in A| and |\forall_U(A) \equiv \forall x \in U. x \in A|.
It’s easily verified that each of these functions are monotonic.^{1}

It seems like we should be done. These formulas are the formulas I originally gave for |\between| and |\subseteq| in terms of quantifiers. The problem here is that these functions are only defined for subsets of |U|. This is especially bad for interpreting |U \between A| as |\exists_U(A)| as it excludes most of the interesting cases where |U| partially overlaps |A|. What we need is a way to extend |\exists_U| / |\forall_U| beyond subsets of |U|. That is, we need a suitable monotonic function |\mathsf{Sub}(X) \to \mathsf{Sub}(U)|.

Assume |U \subseteq X| and that we have an inclusion |\iota_U : U \hookrightarrow X|. Then |\iota_U^* : \mathsf{Sub}(X) \to \mathsf{Sub}(U)| and |\iota_U^*(A) = U \cap A|. This will indeed allow us to define |\subseteq| and |\between| as |U \subseteq A \equiv \forall_U(\iota_U^*(A))| and |U \between A \equiv \exists_U(\iota_U^*(A))|. We have:

\[\iota_U[-] \dashv \iota_U^* \dashv U \to \iota_U[-] : \mathsf{Sub}(U) \to \mathsf{Sub}(X)\]

Here, |\iota_U[-]| is the direct image of |\iota_U|. This doesn’t really do anything in this case
except witness that if |A \subseteq U| then |A \subseteq X| because |U \subseteq X|.^{2}

We can recover the earlier adjunctions by simply using these two pairs of adjunctions. \[\begin{align} U \between B \to Q & \iff \exists_U(\iota_U^*(B)) \to Q \\ & \iff \iota_U^*(B) \subseteq {!}_U^*(Q) \\ & \iff B \subseteq U \to \iota_U[{!}_U^*(Q)] \\ & \iff B \subseteq \{x \in X \mid x \in U \to Q\} \end{align}\]

Here the |\iota_U[-]| is crucial so that we use the |\to| of |\mathsf{Sub}(X)| and not |\mathsf{Sub}(U)|.

\[\begin{align} P \to U \subseteq A & \iff P \to \forall_U(\iota_U^*(A)) \\ & \iff {!}_U^*(P) \subseteq \iota_U^*(A) \\ & \iff \iota_U[{!}_U^*(P)] \subseteq A \\ & \iff \{x \in X \mid x \in U \land P\} \subseteq A \end{align}\]

In this case, the |\iota_U[-]| is truly doing nothing because |\{x \in X \mid x \in U \land P\}| is the same as |\{x \in U \mid P\}|.

While we have |{!}_U^* \circ \exists_U \dashv {!}_U^* \circ \forall_U|, we see that the inclusion of |\iota_U^*| is what breaks the direct connection between |U \between A| and |U \subseteq A|.

As a first example, write |\mathsf{Int}A| for the **interior** of |A| and |\bar A| for the **closure** of |A|
each with respect to some topology
on a containing set |X|.
One way to define |\mathsf{Int}A| is |x \in \mathsf{Int}A| if and only if there exists an open set
containing |x| that’s a subset of |A|. Writing |\mathcal O(X)| for the set of open sets, we
can express this definition in symbols:
\[x \in \mathsf{Int}A \iff \exists U \in \mathcal O(X). x \in U \land U \subseteq A\]
We have a “dual” notion:
\[x \in \bar A \iff \forall U \in \mathcal O(X). x \in U \to U \between A\]
That is, |x| is in the closure of |A| if and only if every open set containing |x| overlaps |A|.

As another example, here is a fairly unusual way of characterizing a compact subset |Q|.
|Q| is **compact** if and only if |\{U \in \mathcal O(X) \mid Q \subseteq U\}| is open
in |\mathcal O(X)| equipped with the Scott topology^{3}.
As before, this suggests a “dual” notion characterized by |\{U \in \mathcal O(X) \mid O \between U\}|
being an open subset. A set |O| satisfying this is called **overt**.
This concept is never mentioned in traditional presentations of point-set topology because *every*
subset is overt. However, if we don’t require that *arbitrary* unions of open sets are open (and only require
finite unions to be open) as happens in synthetic topology
or if we aren’t working in a classical context then overtness becomes a meaningful concept.

One benefit of the intersection-based definition of overlaps is that it is straightforward to generalize to many sets overlapping, namely |\bigcap_{i\in I} A_i \neq \varnothing|. This is also readily expressible using quantifiers as: |\exists x.\forall i \in I. x \in A_i|. As before, having an explicit “universe” set also clarifies this. So, |\exists x \in X.\forall i \in I. x \in A_i| with |\forall i \in I. A_i \subseteq X| would be better. The connection of |\between| to |\subseteq| suggests instead of this fully symmetric presentation, it may still be worthwhile to single out a set producing |\exists x \in U.\forall i \in I. x \in A_i| where |U \subseteq X|. This can be read as “there is a point in |U| that touches/meets/overlaps every |A_i|”. If desired we could notate this as |U \between \bigcap_{i \in I}A_i|. Negating and complementing the |A_i| leads to the dual notion |\forall x \in U.\exists i \in I.x \in A_i| which is equivalent to |U \subseteq \bigcup_{i \in I}A_i|. This dual notion could be read as “the |A_i| (jointly) cover |U|” which is another common and important concept in mathematics.

Ultimately, the concept of two (or more) sets overlapping comes up quite often. The usual circumlocution, |A \cap B \neq \varnothing|, is both notationally and conceptually clumsy. Treating overlapping as a first-class notion via notation and formulating definitions in terms of it can reveal some common and important patterns.

If one wanted to be super pedantic, I should technically write something like |\{\star \mid \exists x \in U. x \in A\}| where |1 = \{\star\}| because elements of |\mathsf{Sub}(1)| are subsets of |1|. Instead, we’ll conflate subsets of |1| and truth values.↩︎

If we think of subobjects as (equivalence classes of) monomorphisms as is typical in category theory, then because |\iota_U| is itself a monomorphism, the direct image, |\iota_U[-]|, is simply post-composition by |\iota_U|, i.e. |\iota_U \circ {-}|.↩︎

The Scott topology is the natural topology on the space of continuous functions |X \to \Sigma| where |\Sigma| is the Sierpinski space.↩︎

Complex-step differentiation is a simple and effective technique for numerically differentiating a(n analytic) function.
Discussing it is a neat combination of complex analysis, numerical analysis, and ring theory. We’ll see that it is very
closely connected to forward-mode automatic differentiation (FAD). For better or worse, while widely applicable, the scenarios
where complex-step differentiation is the *best* solution are a bit rare. To apply complex-step differentiation, you need
a version of your desired function that operates on complex numbers. If you have that, then you can apply complex-step
differentiation immediately. Otherwise, you need to adapt the function to complex arguments. This can be done essentially
automatically using the same techniques as automatic differentiation, but at that point you might as well use automatic
differentiation. Adapting the code to complex numbers or AD takes about the same amount of effort, however, the AD version
will be more efficient, more accurate, and easier to use.

Nevertheless, this serves as a simple example to illustrate several theoretical and practical ideas.

This is part 3 in a series. See the previous part about internal languages for indexed monoidal categories upon which this part heavily depends.

In category theory, the hom-sets between two objects can often be equipped with some extra structure which is respected by identities and composition. For example, the set of group homomorphisms between two abelian groups is itself an abelian group by defining the operations pointwise. Similarly, the set of monotonic functions between two partially ordered sets (posets) is a poset again by defining the ordering pointwise. Linear functions between vector spaces form a vector space. The set of functors between small categories is a small category. Of course, the structure on the hom-sets can be different than the objects. Trivially, with the earlier examples a vector space is an abelian group, so we could say that linear functions form an abelian group instead of a vector space. Likewise groups are monoids. Less trivially, the set of relations between two sets is a partially ordered set via inclusion. There are many cases where instead of hom-sets we have hom-objects that aren’t naturally thought of as sets. For example, we can have hom-objects be non-negative (extended) real numbers from which the category laws become the laws of a generalized metric space. We can identify posets with categories who hom-objects are elements of a two element set or, even better, a two element poset with one element less than or equal to the other.

This general process is called enriching a category in some other
category which is almost always called |\V| in the generic
case. We then talk about having |\V|-categories and |\V|-functors, etc. In a specific case, it will be something
like |\mathbf{Ab}|-categories for an |\mathbf{Ab}|-enriched category, where |\mathbf{Ab}| is the category of abelian groups.
Unsurprisingly, not just *any* category will do for |\V|. However, it turns out very little structure is needed to define
a notion of |\V|-category, |\V|-functor, |\V|-natural transformation, and |\V|-profunctor. The
usual “baseline” is that |\V| is a monoidal category. As mentioned in the previous
post, paraphrasing Bénabou, notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. It is useful for
our purposes to make this structure explicit. For very little cost, this will also provide a vastly more general notion that will readily
capture enriched categories, indexed categories, and categories that are simultaneously
indexed and enriched, of which internal categories are an example. The tool for this is
a (Grothendieck) fibration aka a fibered category or the mostly equivalent concept
of an indexed category.^{1}

To that end, instead of just a monoidal category, we’ll be using indexed monoidal categories. Typically, to get an experience as much like ordinary category theory as possible, additional structure is assumed on |\V|. In particular, it is assumed to be an (indexed) cosmos which means that it is an indexed symmetric monoidally closed category with indexed coproducts preserved by |\otimes| and indexed products and fiberwise finite limits and colimits (preserved by the indexed structure). This is quite a lot more structure which I’ll introduce in later parts. In this part, I’ll make no assumptions beyond having an indexed monoidal category.

This is part 2 in a series. See the previous part about internal languages for (non-indexed) monoidal categories. The main application I have in mind – enriching in indexed monoidal categories – is covered in the next post.

As Jean Bénabou pointed out in Fibered Categories and the Foundations of Naive Category Theory (PDF)
notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. One of the more noticeable places early
on is in the definition of a natural transformation as a family of arrows. However, even in the definition of category, identities and
compositions are families of functions, or, in the enriched case, arrows of |\mathbf V|. From a foundational perspective, one place
where this gets really in-your-face is when trying to formalize the notion of (co)completeness. It is straightforward to make a first-order
theory of a finitely complete category, e.g. this one.
For arbitrary products and thus limits, we need to talk about families of objects. To formalize the usual meaning of this in a
first-order theory would require attaching an entire first-order theory of sets, e.g. **ZFC**, to our notion of complete category.
If your goals are of a foundational nature like Bénabou’s were, then this is unsatisfactory. Instead, we can abstract out what we need
of the notion of “family”. The result turns out to be equivalent to the notion of a fibration.

My motivations here are not foundational but leaving the notion of “family” entirely meta-theoretical means not being able to talk
about it except in the semantics. Bénabou’s comment suggests that at the semantic level we want not just a monoidal category, but a
fibration of monoidal categories^{1}.
At the syntactic level, it suggests that there should be a built-in notion of “family” in our
language. We accomplish both of these goals by formulating the internal language of an indexed monoidal category.

As a benefit, we can generalize to other notions of “family” than set-indexed families. We’ll clearly be able to formulate the notion of an enriched category. It’s also clear that we’ll be able to formulate the notion of an indexed category. Of course, we’ll also be able to formulate the notion of a category that is both enriched and indexed which includes the important special case of an internal category. We can also consider cases with trivial indexing which, in the unenriched case, will give us monoids, and in the |\mathbf{Ab}|-enriched case will give us rings.

This is the first post in a series of posts on doing enriched indexed category theory and using the notion of an internal language to make this look relatively mundane. The internal language aspects are useful for other purposes too, as will be illustrated in this post, for example. This is related to the post Category Theory, Syntactically. In particular, it can be considered half-way between the unary theories and the finite product theories described there.

First in this series – this post – covers the internal language of a monoidal category. This is fairly straightforward, but it already provides some use. For example, the category of endofunctors on a category is a strict monoidal category, and so we can take a different perspective on natural transformations. This will also motivate the notions of a (virtual) bicategory and an actegory. Throughout this post, I’ll give a fairly worked example of turning some categorical content into rules of a type-/proof-theory.

The second post will add indexing to the notion of monoidal category and introduce the very powerful and useful notion of an indexed monoidal category.

The third post will formulate the notion of categories enriched in an indexed monoidal category and give the definitions which don’t require any additional assumptions.

The fourth post will introduce the notion and internal language for an indexed cosmos. Normally, when we do enriched category theory, we want the category into which we’re enriching to not be just a monoidal category but a cosmos. This provides many additional properties. An indexed cosmos is just the analogue of that for indexed monoidal categories.

The fifth post will then formulate categorical concepts for our enriched indexed categories that require some or all of these additional properties provided by an indexed cosmos.

At some point, there will be a post on virtual double categories as they (or, even better, augmented virtual double categories) are what will really be behind the notion of enriched indexed categories we’ll define. Basically, we’ll secretly be spelling out a specific instance of the |\mathsf{Mod}| construction.

When I was young and dumb and first learning category theory, I got it into my head that arguments involving sets were not “categorical”. This is not completely crazy as the idea of category theory being an alternate “foundation” and categorical critiques of set theoretic reasoning are easy to find. As such, I tended to neglect techniques that significantly leveraged |\mathbf{Set}|, and, in particular, representability. Instead, I’d prefer arguments using universal arrows as those translate naturally and directly to 2-categories.

This was a mistake. I have long since completely reversed my position on this for
both practical and theoretical reasons. Practically, representability and related
techniques provide very concise definitions which lead to concise proofs which I
find relatively easy to formulate and easy to verify. This is especially true when
combined with the (co)end calculus. It’s also the case that for a lot of math you
simply don’t need any potential generality you might gain by, e.g. being able to
use an arbitrary 2-category. Theoretically, I’ve gained a better understanding
of where and how category theory is (or is not) “foundational”, and a better understanding
of what about set theory categorists were critiquing. Category theory as a whole
does *not* provide an alternate foundation for mathematics as that term is usually
understood by mathematicians. A branch of category theory, topos theory, does, but
a topos is fairly intentionally designed to give a somewhat |\mathbf{Set}|-like
experience. Similarly, many approaches to higher category theory still include
a |\mathbf{Set}|-like level.

This is, of course, not to suggest ideas like universal arrows *aren’t* important
or can’t lead to elegant proofs.

Below is a particular example of attacking a problem from the perspective of representability. I use this example more because it is a neat proof that I hadn’t seen before. There are plenty of simpler compelling examples, such as proving that right(/left) adjoints are (co)continuous, and I regularly use representability in proofs I presented on, e.g. the Math StackExchange.

An elementary topos, |\mathcal E|, can be described as a category with finite limits and power objects. Having power objects means having a functor |\mathsf P : \mathcal E^{op} \to \mathcal E| such that |\mathcal E(A,\mathsf PB) \cong \mathsf{Sub}(A \times B)| natural in |A| and |B| where |\mathsf{Sub}| is the (contravariant) functor that takes an object to its set of subobjects. The action of |\mathsf{Sub}(f)| for an arrow |f : A \to B| is a function |m \mapsto f^*(m)| where |m| is a (representative) monomorphism and |f^*(m)| is the pullback of |f| along |m| which is a monomorphism by basic facts about pullbacks. In diagrammatic form: $$\require{amscd} \begin{CD} f^{-1}(B') @>f^\ast(m)>> A \\ @VVV @VVfV \\ B' @>>m> B \end{CD}$$

This is a characterization of |\mathsf P| via representability. We are saying that |\mathsf PB| represents the functor |\mathsf{Sub}(- \times B)| parameterized in |B|.

A well-known and basic fact about elementary toposes is that they are cartesian closed. (Indeed, finite limits + cartesian closure + a subobject classifier is a common alternative definition.) Cartesian closure can be characterized as |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)| which characterizes the exponent, |B^A|, via representability. Namely, that |B^A| represents the functor |\mathcal E(- \times A, B)| parameterized in |A|. Proving that elementary toposes are cartesian closed is not too difficult, but it is a bit fiddly. This is the example that I’m going to use.

All the proofs I reference rely on the following basic facts about an elementary topos.

We have the monomorphism |\top : 1 \to \mathsf P1| induced by the identity arrow |\mathsf P1 \to \mathsf P1|.

We need the lemma that |\mathcal E(A \times B,PC) \cong \mathcal E(A,\mathsf P(B \times C))|. **Proof**:
$$\begin{align}
\mathcal E(A \times B,\mathsf PC) \cong \mathsf{Sub}((A \times B) \times C) \cong \mathsf{Sub}(A \times (B \times C)) \cong \mathcal E(A,\mathsf P(B \times C))\ \square
\end{align}$$

Since the arrow |\langle id_A, f\rangle : A \to A \times B| is a monomorphism for any arrow |f : A \to B|, the map |f \mapsto \langle id, f \rangle| is a map from |\mathcal E(-, B)| to |\mathsf{Sub}(- \times B)|. Using |\mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|, we get a map |\mathcal E(-, B) \to \mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|. By Yoneda, i.e. by evaluating it at |id|, we get the singleton map: |\{\}_B : B \to \mathsf PB|. If we can show that |\{\}| is a monomorphism, then, since |\mathsf PA \cong \mathsf P(A \times 1)|, we’ll get an arrow |\sigma : \mathsf PA \to \mathsf P1| such that

$$\begin{CD} A @>\{\}_A>> \mathsf PA \\ @VVV @VV\sigma_AV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

is a pullback.

|\{\}_A| is a monomorphism because any |f, g : X \to A| gets mapped by the above to |\langle id_X, f\rangle| and |\langle id_X, g\rangle| which represent the same subobject when |\{\} \circ f = \{\} \circ g|. Therefore, there’s an isomorphism |j : X \cong X| such that |\langle id_X, f\rangle \circ j = \langle j, f \circ j\rangle = \langle id_X, g\rangle| but this means |j = id_X| and thus |f = g|.

To restate the problem: given the above setup, we want to show that the elementary topos |\mathcal E| is cartesian closed.

Toposes, Triples, and Theories
by Barr and Wells actually provides *two* proofs of this statement: Theorem 4.1 of
Chapter 5. The first proof is in exactly the representability-style approach
I’m advocating, but it relies on earlier results about how a topos relates to
its slice categories. The second proof is more concrete and direct, but it also
involves |\mathsf P\mathsf P\mathsf P\mathsf P B|…

Sheaves in Geometry and Logic by Mac Lane and Moerdijk also has this result as Theorem 1 of section IV.2 “The Construction of Exponentials”. The proof starts on page 167 and finishes on 169. The idea is to take the set theoretic construction of functions via their graphs and interpret that into topos concepts. This proof involves a decent amount of equational reasoning (either via diagrams or via generalized elements).

Todd Trimble’s argument is very similar to the following argument.

Contrast these to Dan Doel’s proof^{1}
using representability, which proceeds as follows. (Any mistakes are mine.)

Start with the pullback induced by the singleton map.

$$\begin{CD} B @>\{\}_B>> \mathsf PB \\ @VVV @VV\sigma_BV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

Apply the functor |\mathcal E(= \times A,-)| which preserves the fact that it is a pullback via continuity. $$\begin{CD} \mathcal{E}(-\times A,B) @>>> \mathcal{E}(- \times A,\mathsf PB) \\ @VVV @VVV \\ \mathcal{E}(- \times A,1) @>>> \mathcal{E}(- \times A,\mathsf P1) \end{CD}$$

Note:

- |\mathcal E(- \times A,1) \cong 1 \cong \mathcal E(-,1)| (by continuity)
- |\mathcal E(- \times A,\mathsf PB) \cong \mathcal E(-,\mathsf P(A \times B))| (by definition of power objects)
- |\mathcal E(- \times A,\mathsf P1) \cong \mathcal E(-,\mathsf PA)| (because |A \times 1 \cong A|)

This means that the above pullback is also the pullback of $$\begin{CD} \mathcal E(-\times A,B) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

Since |\mathcal E| has all finite limits, it has the following pullback

$$\begin{CD} X @>>> \mathsf P(A \times B) \\ @VVV @VVV \\ 1 @>>> \mathsf PA \end{CD}$$

where the bottom and right arrows are induced by the corresponding arrows of the previous diagram by Yoneda. Applying |\mathcal E(=,-)| to this diagram gives another pullback diagram by continuity

$$\begin{CD} \mathcal E(-,X) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

which is to say |\mathcal E(- \times A, B) \cong \mathcal E(-, X)| because pullbacks are unique up to isomorphism, so |X| satisfies the universal property of |B^A|, namely |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)|.

Sent to me almost exactly three years ago.↩︎