Hedonistic Learning

The Pullback Lemma in Gory Detail (Redux)

2024-01-14 17:33:54-08:00

Introduction

Andrej Bauer has a paper titled The pullback lemma in gory detail that goes over the proof of the pullback lemma in full detail. This is a basic result of category theory and most introductions leave it as an exercise. It is a good exercise, and you should prove it yourself before reading this article or Andrej Bauer’s.

Andrej Bauer’s proof is what most introductions are expecting you to produce. I very much like the representability perspective on category theory and like to see what proofs look like using this perspective.

So this is a proof of the pullback lemma from the perspective of representability.

Preliminaries

The key thing we need here is a characterization of pullbacks in terms of representability. To just jump to the end, we have for |f : A \to C| and |g : B \to C|, |A \times_{f,g} B| is the pullback of |f| and |g| if and only if it represents the functor \[\{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \}\]

That is to say we have the natural isomorphism \[ \mathrm{Hom}({-}, A \times_{f,g} B) \cong \{(h, k) \in \mathrm{Hom}({-}, A) \times \mathrm{Hom}({-}, B) \mid f \circ h = g \circ k \} \]

We’ll write the left to right direction of the isomorphism as |\langle u,v\rangle : U \to A \times_{f,g} B| where |u : U \to A| and |v : U \to B| and they satisfy |f \circ u = g \circ v|. Applying the isomorphism right to left on the identity arrow gives us two arrows |p_1 : A \times_{f,g} B \to A| and |p_2 : A \times_{f,g} B \to B| satisfying |p_1 \circ \langle u, v\rangle = u| and |p_2 \circ \langle u,v \rangle = v|. (Exercise: Show that this follows from being a natural isomorphism.)

One nice thing about representability is that it reduces categorical reasoning to set-theoretic reasoning that you are probably already used to, as we’ll see. You can connect this definition to a typical universal property based definition used in Andrej Bauer’s article. Here we’re taking it as the definition of the pullback.

Proof

The claim to be proven is if the right square in the below diagram is a pullback square, then the left square is a pullback square if and only if the whole rectangle is a pullback square. \[ \xymatrix { A \ar[d]_{q_1} \ar[r]^{q_2} & B \ar[d]_{p_1} \ar[r]^{p_2} & C \ar[d]^{h} \\ X \ar[r]_{f} & Y \ar[r]_{g} & Z }\]

Rewriting the diagram as equations, we have:

Theorem: If |f \circ q_1 = p_1 \circ q_2|, |g \circ p_1 = h \circ p_2|, and |(B, p_1, p_2)| is a pullback of |g| and |h|, then |(A, q_1, q_2)| is a pullback of |f| and |p_1| if and only if |(A, q_1, p_2 \circ q_2)| is a pullback of |g \circ f| and |h|.

Proof: If |(A, q_1, q_2)| was a pullback of |f| and |p_1| then we’d have the following.

\[\begin{align} \mathrm{Hom}({-}, A) & \cong \{(u_1, u_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, B) \mid f \circ u_1 = p_1 \circ u_2 \} \\ & \cong \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = p_1 \circ \langle v_1, v_2\rangle \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, (v_1, v_2)) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, Y)\times\mathrm{Hom}({-}, C) \mid f \circ u_1 = v_1 \land g \circ v_1 = h \circ v_2 \} \\ & = \{(u_1, v_2) \in \mathrm{Hom}({-}, X)\times\mathrm{Hom}({-}, C) \mid g \circ f \circ u_1 = h \circ v_2 \} \end{align}\]

This overall natural isomorphism, however, is exactly what it means for |A| to be a pullback of |g \circ f| and |h|. We verify the projections are what we expect by pushing |id_A| through the isomorphism. By assumption, |u_1| and |u_2| will be |q_1| and |q_2| respectively in the first isomorphism. We see that |v_2 = p_2 \circ \langle v_1, v_2\rangle = p_2 \circ q_2|.

We simply run the isomorphism backwards to get the other direction of the if and only if. |\square|

The simplicity and compactness of this proof demonstrates why I like representability.

Universal Quantification and Infinite Conjunction

2024-01-02 22:00:41-08:00

Introduction

It is not uncommon for universal quantification to be described as (potentially) infinite conjunction¹. Quoting Wikipedia’s Quantifier_(logic) page (my emphasis):

For a finite domain of discourse |D = \{a_1,\dots,a_n\}|, the universal quantifier is equivalent to a logical conjunction of propositions with singular terms |a_i| (having the form |Pa_i| for monadic predicates).

The existential quantifier is equivalent to a logical disjunction of propositions having the same structure as before. For infinite domains of discourse, the equivalences are similar.

While there’s a small grain of truth to this, I think it is wrong and/or misleading far more often than it’s useful or correct. Indeed, it takes a bit of effort to even get a statement that makes sense at all. There’s a bit of conflation between syntax and semantics that’s required to have it naively make sense, unless you’re working (quite unusually) in an infinitary logic where it is typically outright false.

What harm does this confusion do? The most obvious harm is that this view does not generalize to non-classical logics. I’ll focus on constructive logics, in particular. Besides causing problems in these contexts, which maybe you think you don’t care about, it betrays a significant gap in understanding of what universal quantification actually is. Even in purely classical contexts, this confusion often manifests, e.g., in confusion about |\omega|-inconsistency.

So what is the difference between universal quantification and infinite conjunction? Well, the most obvious difference is that infinite conjunction is indexed by some (meta-theoretic) set that doesn’t have anything to do with the domain the universal quantifier quantifies over. However, even if these sets happened to coincide² there are still differences between universal quantification and infinite conjunction. The key is that universal quantification requires the predicate being quantified over to hold uniformly, while infinite conjunction does not. It just so happens that for the standard set-theoretic semantics of classical first-order logic this “uniformity” constraint is degenerate. However, even for classical first-order logic, this notion of uniformity will be relevant.

Classical Semantic View

I want to start in the context where this identification is closest to being true, so I can show where the idea comes from. The summary of this section is that the standard, classical, set-theoretic semantics of universal quantification is equivalent to an infinitary generalization of the semantics of conjunction. The issue is “infinitary generalization of the semantics of conjunction” isn’t the same as “semantics of infinitary conjunction”.

The standard set-theoretic semantics of classical first-order logic interprets each formula, |\varphi|, as a subset of |D^{\mathsf{fv}(\varphi)}| where |D| is a given domain set and |\mathsf{fv}| computes the (necessarily finite) set of free variables of |\varphi|. Traditionally, |D^{\mathsf{fv}(\varphi)}| would be identified with |D^n| where |n| is the cardinality of |\mathsf{fv}(\varphi)|. This involves an arbitrary mapping of the free variables of |\varphi| to the numbers |1| to |n|. The semantics of a formula then becomes an |n|-ary set-theoretic relation.

The interpretation of binary conjunction is straightforward:

\[\den{\varphi \land \psi} = \den{\varphi} \cap \den{\psi}\]

where |\den{\varphi}| stands for the interpretation of the formula |\varphi|. To be even more explicit, I should index this notation by a structure which specifies the domain, |D|, as well as the interpretations of any predicate or function symbols, but we’ll just consider this fixed but unspecified.

The interpretation of universal quantification is more complicated but still fairly straightforward:

\[\den{\forall x.\varphi} = \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\}\]

Set-theoretically, we have:

\[\begin{align} \bar z \in \bigcap_{d \in D}\left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \iff & \forall d \in D. \bar z \in \left\{\bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \mid \bar y \in \den{\varphi} \land \bar y(x) = d\right\} \\ \iff & \forall d \in D. \exists \bar y \in \den{\varphi}. \bar z = \bar y|_{\mathsf{fv}(\varphi) \setminus \{x\}} \land \bar y(x) = d \\ \iff & \forall d \in D. \bar z[x \mapsto d] \in \den{\varphi} \end{align}\]

where |f[x \mapsto c]| extends a function |f \in D^{S}| to a function in |D^{S \cup \{x\}}| via |f[x \mapsto c](v) = \begin{cases}c, &\textrm{ if }v = x \\ f(v), &\textrm{ if }v \neq x\end{cases}|. The final |\iff| arises because |\bar z[x \mapsto d]| is the unique function which extends |\bar z| to the desired domain such that |x| is mapped to |d|. Altogether, this illustrates our desired semantics of the interpretation of |\forall x.\varphi| being the interpretations of |\varphi| which hold when |x| is interpreted as any element of the domain.

This demonstrates the summary that the semantics of quantification is an infinitary version of the semantics of conjunction, as |\bigcap| is an infinitary version of |\cap|. But even here there are substantial cracks in this perspective.

Infinitary Logic

The first problem is that we don’t have an infinitary conjunction so saying universal quantification is essentially infinitary conjunction doesn’t make sense. However, it’s easy enough to formulate the syntax and semantics of infinitary conjunction (assuming we have a meta-theoretic notion of sets).

Syntactically, for a (meta-theoretic) set |I| and an |I|-indexed family of formulas |\{\varphi_i\}_{i \in I}|, we have the infinitary conjunction |\bigwedge_{i \in I} \varphi_i|.

The set-theoretic semantics of this connective is a direct generalization of the binary conjunction case:

\[\bigden{\bigwedge_{i \in I}\varphi_i} = \bigcap_{i \in I}\den{\varphi_i}\]

If |I = \{1,2\}|, we recover exactly the binary conjunction case.

Equipped with a semantics of actual infinite conjunction, we can compare to the semantics of universal quantification case and see where things go wrong.

The first problem is that it makes no sense to choose |I| to be |D|. The formula |\bigwedge_{i \in I} \varphi_i| can be interpreted with respect to many different domains. So any particular choice of |D| would be wrong for most semantics. This is assuming that our syntax’s meta-theoretic sets were the same as our semantics’ meta-theoretic sets, which need not be the case at all³.

An even bigger problem is that infinitary conjunction expects a family of formulas while with universal quantification has just one. This is one facet of the uniformity I mentioned. Universal quantification has one formula that is interpreted a single way (with respect to the given structure). The infinitary intersection expression is computing a set out of this singular interpretation. Infinitary conjunction, on the other hand, has a family of formulas which need have no relation to each other. Each of these formulas is independently interpreted and then all those separate interpretations are combined with an infinitary intersection. The problem we have is that there’s generally no way to take a formula |\varphi| with free variable |x| and an element |d \in D| and make a formula |\varphi_d| with |x| not free such that |\bar y[x \mapsto d] \in \den{\varphi} \iff \bar y \in \den{\varphi_d}|. A simple cardinality argument shows that: there are only countably many (finitary) formulas, but there are plenty of uncountable domains. This is why |\omega|-inconsistency is possible. We can easily have elements in the domain which cannot be captured by any formula.

Syntactic View

Instead of taking a semantic view, let’s take a syntactic view of universal quantification and infinitary conjunction, i.e. let’s compare the rules that characterize them. As before, the first problem we have is that traditional first-order logic does not have infinitary conjunction, but we can easily formulate what the rules would be.

The elimination rules are superficially similar but have subtle but important distinctions:

\[\frac{\Gamma \vdash \forall x.\varphi}{\Gamma \vdash \varphi[x \mapsto t]}\forall E,t \qquad \frac{\Gamma \vdash \bigwedge_{i \in I} \varphi_i}{\Gamma \vdash \varphi_j}{\wedge}E,j\] where |t| is a term, |j| is an element of |I|, and |\varphi[x \mapsto t]| corresponds to syntactically substituting |t| for |x| in |\varphi| in a capture-avoiding way. A first, not-so-subtle distinction is if |I| is an infinite set, then |\bigwedge_{i \in I}\varphi_i| is an infinitely large formula. Another pretty obvious issue is universal quantification is restricted to instantiating terms while |I| stands for either an arbitrary (meta-theoretic) set or it may stand for some particular (meta-theoretic) set, e.g. |\mathbb N|. Either way, it is typically not the set of terms of the logic.

Arguably, this isn’t an issue since the claim isn’t that every infinite conjunction corresponds to a universal quantification, but only that universal quantification corresponds to some infinite conjunction. The set of terms is a possible choice for |I|, so that shouldn’t be a problem. Well, whether it’s a problem or not depends on how you set up the syntax of the language. In my preferred way of handling the syntax of logical formulas, I index each formula by the set of free variables that may occur in that formula. This means the set of terms varies with the set of possible free variables. Writing |\vdash_V \varphi| to mean |\varphi| is well-formed and provable in a context with free variables |V|, then we would want the following rule:

\[\frac{\vdash_V \varphi}{\vdash_U \varphi}\] where |V \subseteq U|. This simply states that if a formula is provable, it should remain provable even if we add more (unused) free variables. This causes a problem with having an infinitary conjunction indexed by terms. Writing |\mathsf{Term}(V)| for the set of terms with (potential) free variables in |V|, then while |\vdash_V \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t| might be okay, this would also lead to |\vdash_U \bigwedge_{t \in \mathsf{Term}(V)}\varphi_t| which would also hold but would no longer correspond to universal quantification in a context with free variables in |U|. This really makes a difference. For example, for many theories, such as the usual presentation of ZFC, |\mathsf{Term}(\varnothing) = \varnothing|, i.e. there are no closed terms. As such, |\vdash_\varnothing \forall x.\bot| is neither provable (which we wouldn’t expect it to be) nor refutable without additional axioms. On the other hand, |\bigwedge_{i \in \varnothing}\bot| is |\top| and thus trivially provable. If we consider |\vdash_{\{y\}} \forall x.\bot| next, it becomes refutable. This doesn’t contradict our earlier rule about adding free variables because |\vdash_\varnothing \forall x.\bot| wasn’t provable and so the rule says nothing. On the other hand, that rule does require |\vdash_{\{y\}} \bigwedge_{i \in \varnothing}\bot| to be provable, and it is. Of course, it no longer corresponds to |\forall x.\bot| with this set of free variables. The putative corresponding formula would be |\bigwedge_{i \in \{y\}}\bot| which is indeed refutable.

With the setup above, we can’t get the elimination rule for |\bigwedge| to correspond to the elimination rule for |\forall|, because there isn’t a singular set of terms. However, a more common if less clean approach is to allow all free variables all the time, i.e. to fix a single countably infinite set of variables once and for all. This would “resolve” this problem.

The differences in the introduction rules are more stark. The rules are:

\[\frac{\Gamma \vdash \varphi \quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\forall I \qquad \frac{\left\{\Gamma \vdash \varphi_i \right\}_{i \in I}}{\Gamma \vdash \bigwedge_{i \in I}\varphi_i}{\wedge}I\]

Again, the most blatant difference is that (when |I| is infinite) |{\wedge}I| corresponds to an infinitely large derivation. Again, the uniformity aspects show through. |\forall I| requires a single derivation that will handle all terms, whereas |{\wedge}I| allows a different derivation for each |i \in I|.

We don’t run into the same issue as in the semantic view with needing to turn elements of the domain into terms/formulas. Given a formula |\varphi| with free variable |x|, we can easily make a formula |\varphi_t| for every term |t|, namely |\varphi_t = \varphi[x \mapsto t]|. We won’t have the issue that leads to |\omega|-inconsistency because |\forall x.\varphi| is derivable from |\bigwedge_{t \in \mathsf{Term}(V)}\varphi[x \mapsto t]|. Of course, the reason this is true is because one of the terms in |\mathsf{Term}(V)| will be a variable not occurring in |\Gamma| allowing us to derive the premise of |\forall I|. On the other hand, if we choose |I = \mathsf{Term}(\varnothing)|, i.e. only consider closed terms, which is what the |\omega| rule in arithmetic is doing, then we definitely can get |\omega|-inconsistency-like situations. Most notably, in the case of theories, like ZFC, which have no closed terms.

Constructive View

A constructive perspective allows us to accentuate the contrast between universal quantification and infinitary conjunction even more as well as bring more clarity to the notion of uniformity.

We’ll start with the BHK interpretation of Intuitionistic logic and specifically a realizabilty interpretation. For this, we’ll allow infinitary conjunction only for |I = \mathbb N|.

I’ll write |n\textbf{ realizes }\varphi| for the statement that the natural number |n| realizes the formula |\varphi|. As in the linked articles, we’ll need a computable pairing function which computably encodes a pair of natural numbers as a natural number. I’ll just write this using normal pairing notation, i.e. |(n,m)|. We’ll also need Gödel numbering to computably map a natural number |n| to a computable function |f_n|.

\[\begin{align} (n_0, n_1)\textbf{ realizes }\varphi_1 \land \varphi_2 \quad & \textrm{if and only if} \quad n_0\textbf{ realizes }\varphi_0\textrm{ and } n_1\textbf{ realizes }\varphi_1 \\ n\textbf{ realizes }\forall x.\varphi \quad & \textrm{if and only if}\quad \textrm{for all }m, f_n(m)\textbf{ realizes }\varphi[x \mapsto m] \\ (k, n_k)\textbf{ realizes }\varphi_1 \lor \varphi_2 \quad & \textrm{if and only if} \quad k \in \{0, 1\}\textrm{ and }n_k\textbf{ realizes }\varphi_k \\ n\textbf{ realizes }\neg\varphi \quad & \textrm{if and only if} \quad\textrm{there is no }m\textrm{ such that }m\textbf{ realizes }\varphi \end{align}\]

This example illustrates the uniformity constraint. Assuming a traditional, classical meta-language, e.g. ZFC, then it is the case that |(\varphi\lor\neg\varphi)[x \mapsto m]| is realized for each |m| in the case where |\varphi| is asserting the halting of the |x|-th Turing machine⁴. But this interpretation of universal quantification requires not only that the quantified formula holds for all naturals, but also that we can computably find this out.

It’s clear that trying to formulate a notion of infinitary conjunction with regards to realizability would require using something other than natural numbers as realizers if we just directly generalize the finite conjunction case. For example, we might use potentially infinite sequences of natural numbers as realizers. Regardless, the discussion of the previous example makes it clear an interpretation of infinitary conjunction can’t be done in standard computability⁵, while, obviously, universal quantification can.

Categorical View

The categorical semantics of universal quantification and conjunction are quite different which also suggests that they are not related, at least not in some straightforward way.

One way to get to categorical semantics is to restate traditional, set-theoretic semantics in categorical terms. Traditionally, the semantics of a formula is a subset of some product of the domain set, one for each free variable. Categorically, that suggests we want finite products and the categorical semantics of a formula should be a subobject of a product of some object representing the domain.

Conjunction is traditionally represented via intersection of subsets, and categorically we form the intersection of subobjects via pulling back. So to support finite conjunctions, we need our category to additionally have finite pullbacks of monomorphisms. Infinitary conjunctions simply require infinitely wide pullbacks of monomorphisms. However, we can start to see some cracks here. What does it mean for a pullback to be infinitely wide? It means the obvious thing; namely, that we have an infinite set of monomorphisms sharing a codomain, and we’ll take the limit of this diagram. The key here, though, is “set”. Regardless of whatever the objects of our semantic category are, the infinitary conjunctions are indexed by a set.

To talk about the categorical semantics of universal quantification, we need to bring to the foreground some structure that we have been leaving – and traditionally accounts do leave – in the background. Before, I said the semantics of a formula, |\varphi|, depends on the free variables in that formula, e.g. if |D| is our domain object, then the semantics of a formula with three free variables would be a subobject of |\prod_{v \in \mathsf{fv}(\varphi)}D \cong D\times D \times D| which I’ll continue to write as |D^{\mathsf{fv}(\varphi)}| though now it will be interpreted as a product rather than a function space. For |\mathbf{Set}|, this makes no difference. It would be more accurate to say that a formula can be given semantics in any product of the domain object indexed by any superset of the free variables. This is just to say that a formula doesn’t need to use every free variable that is available. Nevertheless, even if it is induced by the same formula, a subobject of |D^{\mathsf{fv}(\varphi)}| is a different subobject than a subobject of |D^{\mathsf{fv}(\varphi) \cup \{u\}}| where |u| is a variable not free in |\varphi|, so we need a way of relating the semantics of formulas considered with respect to different sets of free variables.

To do this, we will formulate a category of contexts and index our semantics by it. Fix a category |\mathcal C| and an object |D| of |\mathcal C|. Our category of contexts, |\mathsf{Ctx}|, will be the full subcategory of |\mathcal C| with objects of the form |D^S| where |S| is a finite subset of |V|, a fixed set of variables. We’ll assume these products exist, though typically we’ll just assume that |\mathcal C| has all finite products. From here, we use the |\mathsf{Sub}| functor. |\mathsf{Sub} : \mathsf{Ctx}^{op} \to \mathbf{Pos}| maps an object of |\mathsf{Ctx}| to the poset of its subobjects as objects of |\mathcal C|⁶. Now an arrow |f : D^{\{x,y,z,w\}} \to D^{\{x,y,z\}}| would induce a monotonic function |\mathsf{Sub}(f) : \mathsf{Sub}(D^{\{x,y,z\}}) \to \mathsf{Sub}(D^{\{x,y,z,w\}})|. This is defined for each subobject by pulling back a representative monomorphism of that subobject along |f|. Arrows of |\mathsf{Ctx}| are the semantic analogues of substitutions, and |\mathsf{Sub}(f)| applies these “substitutions” to the semantics of formulas.

Universal quantification is then characterized as the (indexed) right adjoint (Galois connection in this context) of |\mathsf{Sub}(\pi^x)| where |\pi^x : D^S \to D^{S \setminus \{x\}}| is just projection. The indexed nature of this adjoint leads to Beck-Chevalley conditions reflecting the fact universal quantification should respect substitution. |\mathsf{Sub}(\pi^x)| corresponds to adding |x| as a new, unused free variable to a formula. Let |U| be a subobject of |D^{S \setminus \{x\}}| and |V| a subobject of |D^S|. Furthermore, write |U \sqsubseteq U’| to indicate that |U| is a subobject of the subobject |U’|, i.e. that the monos that represent |U| factor through the monos that represent |U’|. The adjunction then states: \[\mathsf{Sub}(\pi^x)(U) \sqsubseteq V\quad \textrm{if and only if}\quad U \sqsubseteq \forall_x(V)\] The |\implies| direction is a fairly direct semantic analogue of the |\forall I| rule: \[\frac{\Gamma \vdash \varphi\quad x\textrm{ not free in }\Gamma}{\Gamma \vdash \forall x.\varphi}\] Indeed, it is easy to show that the converse of this rule is derivable with |\forall E| validating the semantic “if and only if”. To be clear, the full adjunction is natural in |U| and |V| and indexed, effectively, in |S|.

Incidentally, we’d also want the semantics of infinite conjunctions to respect substitution, so they too have a Beck-Chevalley condition they satisfy and give rise to an indexed right adjoint.

It’s hard to even compare the categorical semantics of infinitary conjunction and universal quantification, let alone conflate them, even when |\mathcal C = \mathbf{Set}|. This isn’t too surprising as these semantics work just fine for constructive logics where, as illustrated earlier, these can be semantically distinct. As mentioned, both of these constructs can be described by indexed right adjoints. However, they are adjoints between very different indexed categories. If |\mathcal M| is our indexed category (above it was |\mathsf{Sub}|), then we’ll have |I|-indexed products if |\Delta_{\mathcal M} : \mathcal M \to [DI, -] \circ \mathcal M| has an indexed right adjoint where |D : \mathbf{Set} \to \mathbf{cat}| is the discrete (small) category functor. For |\mathcal M| to have universal quantification, we need an indexed right adjoint to an indexed functor |\mathcal M \circ \mathsf{cod} \circ \iota \to \mathcal M \circ \mathsf{dom} \circ \iota| where |\iota : s(\mathsf{Ctx}) \hookrightarrow \mathsf{Ctx}^{\to}| is the full subcategory of the arrow category |\mathsf{Ctx}^{\to}| consisting of just the projections.

Conclusion

My hope is that the preceding makes it abundantly clear that viewing universal quantification as some kind of special “infinite conjunction” is not sensible even approximately. To do so is to seriously misunderstand universal quantification. Most discussions “equating” them involve significant conflations of syntax and semantics where a specific choice of domain is fixed and elements of that specific domain are used as terms.

A secondary goal was to illustrate an aspect of logic from a variety of perspectives and illustrate some of the concerns in meta-logical reasoning. For example, quantifiers and connectives are syntactical concepts and thus can’t depend on the details of the semantic domain. As another example, better perspectives on quantifiers and connectives are more robust to weakening the logic. I’d say this is especially true when going from classical to constructive logic. Structural proof theory and categorical semantics are good at formulating logical concepts modularly so that they still make sense in very weak logics.

Unfortunately, the traditional trend towards minimalism strongly pushes in the other direction leading to the exploiting of every symmetry and coincidence a stronger logic (namely classical logic) provides producing definitions that don’t survive even mild weakening of the logic⁷. The attempt to identify universal quantification with infinite conjunction here takes that impulse too far and doesn’t even work in classical logic as demonstrated. While there’s certainly value in recognizing redundancy, I personally find minimizing logical assumptions far more important and valuable than minimizing (primitive) logical connectives.

“Universal statements are true if they are true for every individual in the world. They can be thought of as an infinite conjunction,” from some random AI lecture notes. You can find many others.↩︎
The domain doesn’t even need to be a set.↩︎
For example, we may formulate our syntax in a second-order arithmetic identifying our syntax’s meta-theoretic sets with unary predicates, while our semantics is in ZFC. Just from cardinality concerns, we know that there’s no way of injectively mapping every ZFC set to a set of natural numbers.↩︎
It’s probably worth pointing out that not only will this classical meta-language not tell us whether it’s |\varphi[x \mapsto m]| or |\neg\varphi[x \mapsto m]| that holds for every specific |m|, but it’s easy to show (assuming consistency of ZFC) that |\varphi[x \mapsto m]| is independent of ZFC for specific values of |m|. For example, it’s easy to make a Turing machine that halts if and only if it finds a contradiction in the theory of ZFC.↩︎
Interestingly, for some models of computation, e.g. ones based on Turing machines, infinitary disjunction, or, specifically, |\mathbb N|-ary disjunction is not problematic. Given an infinite sequence of halting Turing machines, we can interleave their execution such that every Turing machine in the sequence will halt at some finite time. Accordingly, extending the definition of disjunction in realizability to the |\mathbb N|-ary case does not run into any of the issues that |\mathbb N|-ary conjunction has and is completely unproblematic. We just let |k| be an arbitrary natural instead of just |\{0, 1\}|.↩︎
This is a place we could generalize the categorical semantics further. There’s no reason we need to consider this particular functor. We could consider other functors from |\mathsf{Ctx}^{op} \to \mathbf{Pos}|, i.e. other indexed |(0,1)|-categories. This setup is called a hyperdoctrine ↩︎
The most obvious example of this is defining quantifiers and connectives in terms of other connectives particularly when negation is involved. A less obvious example is the overwhelming focus on |\mathbf 2|-valued semantics when classical logic naturally allows arbitrary Boolean-algebra-valued semantics.↩︎

What is the coproduct of two groups?

2023-12-21 18:47:57-08:00

Introduction

The purpose of this article is to answer the question: what is the coproduct of two groups? The approach, however, will be somewhat absurd. Instead of simply presenting a construction and proving that it satisfies the appropriate universal property, I want to find the general answer and simply instantiate it for the case of groups.

Specifically, this will be a path through the theory of Lawvere theories and their models with the goal of motivating some of the theory around it in pursuit of the answer to this relatively simple question.

If you really just want to know the answer to the title question, then the construction is usually called the free product and is described on the linked Wikipedia page.

Groups as Models of a Lawvere Theory

A group is a model of an equational theory. This means a group is described by a set equipped with a collection of operations that must satisfy some equations. So we’d have a set, |G|, and operations |\mathtt{e} : () \to G|, |\mathtt{i} : G \to G|, and |\mathtt{m} : G \times G \to G|. These operations satisfy the equations, \[ \begin{align} \mathtt{m}(\mathtt{m}(x, y), z) = \mathtt{m}(x, \mathtt{m}(y, z)) \\ \mathtt{m}(\mathtt{e}(), x) = x = \mathtt{m}(x, \mathtt{e}()) \\ \mathtt{m}(\mathtt{i}(x), x) = \mathtt{e}() = \mathtt{m}(x, \mathtt{i}(x)) \end{align} \] universally quantified over |x|, |y|, and |z|.

These equations can easily be represented by commutative diagrams, i.e. equations of compositions of arrows, in any category with finite products of an object, |G|, with itself. For example, the left inverse law becomes: \[ \mathtt{m} \circ (\mathtt{i} \times id_G) = \mathtt{e} \circ {!}_G \] where |{!}_G : G \to 1| is the unique arrow into the terminal object corresponding to the |0|-ary product of copies of |G|.

One nice thing about this categorical description is that we can now talk about a group object in any category with finite products. Even better, we can make this pattern describing what a group is first-class. The (Lawvere) theory of a group is a (small) category, |\mathcal{T}_{\mathbf{Grp}}| whose objects are an object |\mathsf{G}| and all its powers, |\mathsf{G}^n|, where |\mathsf{G}^0 = 1| and |\mathsf{G}^{n+1} = \mathsf{G} \times \mathsf{G}^n|. The arrows consist of the relevant projection and tupling operations, the three arrows above, |\mathsf{m} : \mathsf{G}^2 \to \mathsf{G}^1|, |\mathsf{i} : \mathsf{G}^1 \to \mathsf{G}^1|, |\mathsf{e} : \mathsf{G}^0 \to \mathsf{G}^1|, and all composites that could be made with these arrows. See my previous article for a more explicit description of this, but it should be fairly intuitive.

An actual group is then, simply, a finite-product-preserving functor |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}|. It must be finite-product-preserving so the image of |\mathsf{m}| actually gets sent to a binary function and not some function with some arbitrary domain. The category, |\mathbf{Grp}|, of groups and group homomorphisms is equivalent to the category |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Grp}}}| which is defined to be the full subcategory of the category of functors from |\mathcal{T}_{\mathbf{Grp}} \to \mathbf{Set}| consisting of the functors which preserve finite products. While we’ll not explore it more here, we could use any category with finite products as the target, not just |\mathbf{Set}|. For example, we’ll show that |\mathbf{Grp}| has finite products, and in fact all limits and colimits, so we can talk about the models of the theory of groups in the category of groups. This turns out to be equivalent to the category of Abelian groups via the well-known Eckmann-Hilton argument.

A Bit of Organization

First, a construction that will become even more useful later. Given any category, |\mathcal{C}|, we define |\mathcal{C}^{\times}|, or, more precisely, an inclusion |\sigma : \mathcal{C} \hookrightarrow \mathcal{C}^{\times}| to be the free category-with-finite-products generated from |\mathcal{C}|. Its universal property is: given any functor |F : \mathcal{C} \to \mathcal{E}| into a category-with-finite-products |\mathcal E|, there exists a unique finite-product-preserving functor |\bar{F} : \mathcal{C}^{\times} \to \mathcal E| such that |F = \bar{F} \circ \sigma|.

An explicit construction of |\mathcal{C}^{\times}| is the following. Its objects consist of (finite) lists of objects of |\mathcal{C}| with concatenation as the categorical product and the empty list as the terminal object. The arrows are tuples with a component for each object in the codomain list. Each component is a pair of an index into the domain list and an arrow from the corresponding object in the domain list to the object in the codomain list for this component. For example, the arrow |[A, B] \to [B, A]| would be |((1, id_B), (0, id_A))|. Identity and composition is straightforward. |\sigma| then maps each object to a singleton list and each arrow |f| to |((0, f))|.

Like most free constructions, this construction completely ignores any finite products the original category may have had. In particular, we want the category |\mathcal{T}_{\mathbf{Set}} = \mathbf{1}^{\times}|, called the theory of a set. The fact that the one object of the category |\mathbf{1}| is terminal has nothing to do with its image via |\sigma| which is not the terminal object.

We now define the general notion of a (Lawvere) theory as a small category with finite products, |\mathcal{T}|, equipped with a finite-product-preserving, identity-on-objects functor |\mathcal{T}_{\mathbf{Set}} \to \mathcal{T}|. A morphism of (Lawvere) theories is a finite-product-preserving functor that preserves these inclusions a la: \[ \xymatrix { & \mathcal{T}_{\mathbf{Set}} \ar[dl] \ar[dr] & \\ \mathcal{T}_1 \ar[rr] & & \mathcal{T}_2 } \]

The identity-on-objects aspect of the inclusion of |\mathcal{T}_{\mathbf{Set}}| along with finite-product-preservation ensures that the only objects in |\mathcal{T}| are powers of a single object which we’ll generically call |\mathsf{G}|. This is sometimes called the “generic object”, though the term “generic object” has other meanings in category theory.

A model of a theory (in |\mathbf{Set}|) is then simply a finite-product-preserving functor into |\mathbf{Set}|. |\mathbf{Mod}_{\mathcal{T}}| is the full subcategory of functors from |\mathcal{T} \to \mathbf{Set}| which preserve finite products. The morphisms of models are simply the natural transformations. As an exercise, you should show that for a natural transformation |\tau : M \to N| where |M| and |N| are two models of the same theory, |\tau_{\mathsf{G}^n} = \tau_{\mathsf{G}}^n|.

The Easy Categorical Constructions

This relatively simple definition of model already gives us a large swathe of results. An easy result in basic category theory is that (co)limits in functor categories are computed pointwise whenever the corresponding (co)limits exist in the codomain category. In our case, |\mathbf{Set}| has all (co)limits, so all categories of |\mathbf{Set}|-valued functors have all (co)limits and they are computed pointwise.

However, the (co)limit of finite-product-preserving functors into |\mathbf{Set}| may not be finite-product-preserving, so we don’t immediately get that |\mathbf{Mod}_{\mathcal{T}}| has all (co)limits (and they are computed pointwise). That said, finite products are limits and limits commute with each other, so we do get that |\mathbf{Mod}_{\mathcal{T}}| has all limits and they are computed pointwise. Similarly, sifted colimits, which are colimits that commute with finite products in |\mathbf{Set}| also exist and are computed pointwise in |\mathbf{Mod}_{\mathcal{T}}|. Sifted colimits include the better known filtered colimits which commute with all finite limits.

I’ll not elaborate on sifted colimits. We’re here for (finite) coproducts, and, as you’ve probably already guessed, coproducts are not sifted colimits.

When the Coproduct of Groups is Easy

There is one class of groups whose coproduct is easy to compute for general reasons: the free groups. The free group construction, like most “free constructions”, is a left adjoint and left adjoints preserve colimits, so the coproduct of two free groups is just the free group on the coproduct, i.e. disjoint union, of their generating sets. We haven’t defined the free group yet, though.

Normally, the free group construction would be defined as left adjoint to the underlying set functor. We have a very straightforward way to define the underlying set functor. Define |U : \mathbf{Mod}_{\mathcal T} \to \mathbf{Set}| as |U(M) = M(\mathsf{G}^1)| and |U(\tau) = \tau_{\mathsf{G}^1}|. Identifying |\mathsf{G}^1| with the functor |\mathsf G : \mathbf{1} \to \mathcal{T}| we have |U(M) = M \circ \mathsf{G}| giving a functor |\mathbf{1} \to \mathbf{Set}| which we identify with a set. The left adjoint to precomposition by |\mathsf{G}| is the left Kan extension along |\mathsf{G}|.

We then compute |F(S) = \mathrm{Lan}_{\mathsf{G}}(S) \cong \int^{{*} : \mathbf{1}} \mathcal{T}(\mathsf{G}({*}), {-}) \times S({*}) \cong \mathcal{T}(\mathsf{G}^1, {-}) \times S|. This is the left Kan extension and does form an adjunction but not with the category of models because the functor produced by |F(S)| does not preserve finite products. We should have |F(S)(\mathsf{G}^n) \cong F(S)(\mathsf{G})^n|, but substituting in the definition of |F(S)| clearly does not satisfy this. For example, consider |F(\varnothing)(\mathsf{G}^0)|.

We can and will show that the left Kan extension of a functor into |\mathbf{Set}| preserves finite products when the original functor did. Once we have that result we can correct our definition of the free construction. We simply replace |S : \mathbf{1} \to \mathbf{Set}| with a functor that does preserve finite products, namely |\bar{S} : \mathbf{1}^{\times} \to \mathbf{Set}|. Of course, |\mathbf{1}^{\times}| is exactly our definition of |\mathcal{T}_{\mathbf{Set}}|. We see now that a model of |\mathcal{T}_{\mathbf{Set}}| is the same thing as having a set, hence the name. Indeed, we have an equivalence of categories between |\mathbf{Set}| and |\mathbf{Mod}_{\mathcal{T}_{\mathbf{Set}}}|. (More generally, this theory is called “the theory of an object” as we may consider models in categories other than |\mathbf{Set}|, and we’ll still have this relation.)

The correct definition of |F| is |F(S) = \mathrm{Lan}_{\iota}(\bar S) \cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times \bar{S}(\mathsf{G}^n) \cong \int^{\mathsf{G}^n:\mathcal{T}_{\mathbf{Set}}} \mathcal{T}(\iota(\mathsf{G}^n), {-}) \times S^n| where |\iota : \mathcal{T}_{\mathbf{Set}} \to \mathcal{T}| is the inclusion we give as part of the definition of a theory. We can also see |\iota| as |\bar{\mathsf{G}}|.

We can start to see the term algebra in this definition. An element of |F(S)| is a choice of |n|, an |n|-tuple of elements of |S|, and a (potentially compound) |n|-ary operation. We can think of an element of |\mathcal{T}(\mathsf{G}^n, {-})| as a term with |n| free variables which we’ll label with the elements of |S^n| in |F(S)|. The equivalence relation in the explicit construction of the coend allows us to swap projections and tupling morphisms from the term to the tuple of labels. For example, it equates a unary term paired with one label with a binary term paired with two labels but where the binary term immediately discards one of its inputs. Essentially, if you are given a unary term and two labels, you can either discard one of the labels or you can make the unary term binary by precomposing with a projection. Similarly for tupling.

It’s still not obvious this definition produces a functor which preserves finite products. As a lemma to help in the proof of that fact, we have a bit of coend calculus.

Lemma 1: Let |F \dashv U : \mathcal{D} \to \mathcal{C}| and |H : \mathcal D^{op} \times \mathcal{C} \to \mathcal{E}|. Then, |\int^C H(FC, C) \cong \int^D H(D, UD)| when one, and thus both, exist. Proof: \[ \begin{align} \mathcal{E}\left(\int^C H(FC, C), {-}\right) & \cong \int_C \mathcal{E}(H(FC, C), {-}) \tag{continuity} \\ & \cong \int_C \int_D [\mathcal{D}(FC, D), \mathcal{E}(H(D, C), {-})] \tag{Yoneda} \\ & \cong \int_C \int_D [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{adjunction} \\ & \cong \int_D \int_C [\mathcal{C}(C, UD), \mathcal{E}(H(D, C), {-})] \tag{Fubini} \\ & \cong \int_D \mathcal{E}(H(D, UD), {-}) \tag{Yoneda} \\ & \cong \mathcal{E}\left(\int^D H(D, UD), {-}\right) \tag{continuity} \\ & \square \end{align} \]

Using the adjunction |\Delta \dashv \times : \mathcal{C} \times \mathcal{C}\to \mathcal{C}| gives the following corollary.

Corollary 2: For any |H : \mathcal{C}^{op} \times \mathcal{C}^{op} \times \mathcal{C} \to \mathcal{E}|, \[\int^{C} H(C, C, C) \cong \int^{C_1}\int^{C_2} H(C_1, C_2, C_1 \times C_2)\] when both exists. This allows us to combine two (co)ends into one.

Now our theorem.

Theorem 3: Let |F : \mathcal{T}_1 \to \mathbf{Set}| and |J : \mathcal{T}_1 \to \mathcal{T}_2| where |\mathcal{T}_1| and |\mathcal{T}_2| have finite products. Then |\mathrm{Lan}_J(F)| preserves finite products if |F| does.

Proof: \[ \begin{flalign} \mathrm{Lan}_J(F)(X \times Y) & \cong \int^A \mathcal{T}_2(J(A), X \times Y) \times F(A) \tag{coend formula for left Kan extension} \\ & \cong \int^A \mathcal{T}_2(J(A), X) \times \mathcal{T}_2(J(A), Y) \times F(A) \tag{continuity} \\ & \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1 \times A_2) \tag{Corollary 2} \\ & \cong \int^{A_1}\int^{A_2}\mathcal{T}_2(J(A_1), X) \times \mathcal{T}_2(J(A_2), Y) \times F(A_1) \times F(A_2) \tag{finite product preservation} \\ & \cong \left(\int^{A_1}\mathcal{T}_2(J(A_1), X) \times F(A_1) \right) \times \left(\int^{A_2}\mathcal{T}_2(J(A_2), Y) \times F(A_2)\right) \tag{commutativity and cocontinuity of $\times$} \\ & \cong \mathrm{Lan}_J(F)(X) \times \mathrm{Lan}_J(F)(Y) \tag{coend formula for left Kan extension} \\ & \square \end{flalign} \]

The Coproduct of Groups

To get general coproducts (and all colimits), we’ll show that |\mathbf{Mod}_{\mathcal{T}}| is a reflective subcategory of |[\mathcal{T}, \mathbf{Set}]|. Write |\iota : \mathbf{Mod}_{\mathcal{T}} \hookrightarrow [\mathcal{T}, \mathbf{Set}]|. If we had a functor |R| such that |R \dashv \iota|, then we have |R \circ \iota = Id| which allows us to quickly produce colimits in the subcategory via |\int^I D(I) \cong R\int^I \iota D(I)|. It’s easy to verify that |R\int^I \iota D(I)| has the appropriate universal property to be |\int^I D(I)|.

We’ll compute |R| by composing two adjunctions. First, we have |\bar{({-})} \dashv \iota({-}) \circ \sigma : \mathbf{Mod}_{\mathcal{T}^{\times}} \to [\mathcal T, \mathbf{Set}]|. This is essentially the universal property of |\mathcal{T}^{\times}|. When |\mathcal{T}| has finite products, which, of course, we’re assuming, then we can use the universal property of |\mathcal{T}^{\times}| to factor |Id_{\mathcal{T}}| into |Id = \bar{Id} \circ \sigma|. The second adjunction is then |\mathrm{Lan}_{\bar{Id}} \dashv {-} \circ \bar{Id} : \mathbf{Mod}_{\mathcal{T}} \to \mathbf{Mod}_{\mathcal{T}^{\times}}|. The left adjoint sends finite-product-preserving functors to finite-product-preserving functors via Theorem 3. The right adjoint is the composition of finite-product-preserving functors.

The composite of the left adjoints is |\iota({-} \circ \bar{Id}) \circ \sigma = \iota({-}) \circ \bar{Id} \circ \sigma = \iota({-})|. The composite of the right adjoint is \[ \begin{align} R(F) & = \mathrm{Lan}_{\bar{Id}}(\bar{F}) \\ & \cong \int^X \mathcal{T}(\bar{Id}(X), {-}) \times \bar{F}(X) \\ & \cong \int^X \mathcal{T}\left(\prod_{i=1}^{\lvert X\rvert} X_i, {-}\right) \times \prod_{i=1}^{\lvert X \rvert} F(X_i) \end{align} \] where we view the list |X : \mathcal{T}^{\times}| as a |\lvert X\rvert|-tuple with components |X_i|.

This construction of the reflector, |R|, is quite similar to the free construction. The main difference is that here we factor |Id| via |\mathcal{T}^{\times}| where there we factored |\mathsf{G} : \mathbf{1} \to \mathcal{T}| via |\mathbf{1}^{\times} = \mathcal{T}_{\mathbf{Set}}|.

Let’s now explicitly describe the coproducts via |R|. As a warm-up, we’ll consider the initial object, i.e. nullary coproducts. We consider |R(\Delta 0)|. Because |0 \times S = 0|, the only case in the coend that isn’t |0| is when |\lvert X \rvert = 0| so the underlying set of the coend reduces to |\mathcal{T}(\mathsf{G}^0, \mathsf{G}^1)|, i.e. the nullary terms. For groups, this is just the unit element. For bounded lattices, it would be the two element set consisting of the top and bottom elements. For lattices without bounds, it would be the empty set. Of course, |R(\Delta 0)| matches |F(0)|, i.e. the free model on |0|.

Next, we consider two models |G| and |H|. First, we compute to the coproduct of |G| and |H| as (plain) functors which is just computed pointwise, i.e. |(G+H)(\mathsf{G}^n) = G(\mathsf{G}^n)+H(\mathsf{G}^n) \cong G(\mathsf{G^1})^n + H(\mathsf{G^1})^n|. Considering the case where |X_i = \mathsf{G}^1| for all |i| and where |\lvert X \rvert = n|, which subsumes all the other cases, we see we have a term with |n| free variables each labelled by either an element of |G| or an element of |H|. If we normalized the term into a list of variables representing a product of variables, then we’d have a essentially a word as described on the Wikipedia page for the free product. If we then only considered quotienting by the equivalences induced by projection and tupling, we’d have the free group on the disjoint union of the underlying sets of the |G| and |H|. However, for |R|, we quotient also by the action of the other operations. The lists of objects with |X_i \neq \mathsf{G}^1| come in here to support equating non-unary ops. For example, a pair of the binary term |\mathsf{m}| and the 2-tuple of elements |(g_1, g_2)| for |g_1, g_2 \in U(G)|, will be equated with the pair of the unary term |id| and the 1-tuple of elements |(g)| where |g = g_1 g_2| in |G|. Similarly for |H| and the other operations (and terms generally). Ultimately, the quotient identifies every element with an element that consists of a pair of a term that is a fully right associated set of multiplications ending in a unit where each variable is labelled with an element from |U(G)| or |U(H)| in an alternating fashion. These are the reduced words in the Wikipedia article.

This, perhaps combined with a more explicit spelling out of the equivalence relation, should make it clear that this construction does actually correspond to the usual free product construction. The name “free product” is also made a bit clearer, as we are essentially building the free group on the disjoint union of the underlying sets of the inputs, and then quotienting that to get the result. While there are some categorical treatments of normalization, the normalization arguments used above were not guided by the category theory. The (underlying sets of the) models produced by the above |F| and |R| functors big equivalence classes of “terms”. The above constructions provide no guidance for finding “good” representatives of those equivalence classes.

Conclusions

This was, of course, a very complex and round-about way of answering the title question. Obviously the real goal was illustrating these ideas and illustrating how “abstract” categorical reasoning can lead to relatively “concrete” results. Of course, these concrete constructions are derived from other concrete constructions, usually concrete constructions of limits and colimits in |\mathbf{Set}|. That said, category theory allows you to get a lot from a small collection of relatively simple concrete constructions. Essentially, category theory is like a programming language with a small set of primitives. You can write “abstract” programs in terms of that language, but once you provide an “implementation” for those primitives, all those “abstract” programs can be made concrete.

I picked (finite) coproducts, in particular, as they are where a bunch of complexity suddenly arises when studying algebraic objects categorically, but (finite) coproducts are still fairly simple.

For Lawvere theories, one thing to note is that the Lawvere theory is independent of the presentation. Any presentation of the axioms of a group would give rise to the same Lawvere theory. Of course, to explicitly describe the category would end up requiring a presentation of the category anyway. Beyond Lawvere theories are algebraic theories and algebraic categories, and further into essentially algebraic theories and categories. These extend to the multi-sorted case and then into the finite limit preserving case. The theory of categories, for example, cannot be presented as a Lawvere theory but is an essentially algebraic theory. There’s much more that can be said even about specifically Lawvere theories, both from a theoretical perspective, starting with monadicity, and from practical perspectives like algebraic effects.

Familiarity with the properties of functor categories, and especially categories of (co)presheaves was behind many of these results, and many that I only mentioned in passing. It is always useful to learn more about categories of presheaves. That said, most of the theory works in an enriched context and often without too many assumptions. The fact that all we need to talk about models is for the codomains of the functors to have finite products allows quite broad application. We can talk about algebraic objects almost anywhere. For example, sheaves of rings, groups, etc. can equivalently be described as models of the theories of rings, groups, etc. in sheaves of sets.

Kan extensions unsurprisingly played a large role, as they almost always do when you’re talking about (co)presheaves. One of the motivations for me to make this article was a happy confluence of things I was reading leading to a nice, coend calculus way of describing and proving finite-product-preservation for free models.

Thinking about what exactly was going on around finite-product-preservation was fairly interesting. The incorrect definition of the free model functor could be corrected in a different (though, of course, ultimately equivalent) way. The key is to remember that the coend formula for the left Kan extension is generally a copower and not a cartesian product. The copower for |\mathbf{Set}|-valued functors is different from the copower for finite-product-preserving |\mathbf{Set}|-valued functors. For a category with (arbitrary) coproducts, the copower corresponds to the coproduct of a constant family. We get, |F(S) \cong \coprod_{S} \mathcal T(\mathsf{G}^1, {-})| as is immediately evident from |F| being a left adjoint and a set |S| being the coproduct of |1| |S|-many times. For the purposes of this article, this would have been less than satisfying as figuring out what coproducts were was the nominal point.

That said, it isn’t completely unsatisfying as this defines the free model in terms of a coproduct of, specifically, representables and those are more tractable. In particular, an easy and neat exercise is to work out what |\mathcal{T}(\mathsf{G}^n, {-}) + \mathcal{T}(\mathsf{G}^m, {-})| is. Just use Yoneda and work out what must be true of the mapping out property, and remember that the object you’re mapping into preserves finite products. Once you have finite coproducts described, you can get all the rest via filtered colimits, and since those commute with finite products that gives us arbitrary coproducts.

Preserving, Reflecting, and Creating Limits

2023-03-20 22:39:28-07:00

Introduction

This is a brief article about the notions of preserving, reflecting, and creating limits and, by duality, colimits. Preservation is relatively intuitive, but the distinction between reflection and creation is subtle.

Preservation of Limits

Other than that subtlety, preservation is fairly intuitive.

Reflection of Limits versus Creation of Limits

A functor, |F|, reflects limits when whenever the image of a cone is a limiting cone, then the original cone was a limiting cone. For products this would mean that if we had a wedge |A \stackrel{p}{\leftarrow} Z \stackrel{q}{\to} B|, and |FZ| was the product of |FA| and |FB| with projections |Fp| and |Fq|, then |Z| was the product of |A| and |B| with projections |p| and |q|.

A functor, |F|, creates limits when whenever the image of a diagram has a limit, then the diagram itself has a limit and |F| preserves the limiting cones. For products this would mean if |FX| and |FY| had a product, |FX \times FY|, then |X| and |Y| have a product and |F(X \times Y) \cong FX \times FY| via the canonical morphism.

Creation of limits implies reflection of limits since we can just ignore the apex of the cone. While creation is more powerful, often reflection is enough in practice as we usually have a candidate limit, i.e. a cone. Again, this is often not made too explicit.

Example

Consider the posets:

$$\xymatrix{ & & & c \\ X\ar@{}[r]|{\Large{=}} & a \ar[r] & b \ar[ur] \ar[dr] & \\ & & & d \save "1,2"."3,4"*+[F]\frm{} \restore } \qquad \xymatrix{ & & c \\ Y\ar@{}[r]|{\Large{=}} & b \ar[ur] \ar[dr] & \\ & & d \save "1,2"."3,3"*+[F]\frm{} \restore } \qquad \xymatrix{ & c \\ Z\ar@{}[r]|{\Large{=}} & \\ & d \save "1,2"."3,2"*+[F]\frm{} \restore }$$

Failure of reflection

Let |X=\{a, b, c, d\}| with |a \leq b \leq c| and |b \leq d| mapping to |Y=\{b, c, d\}| where |a \mapsto b|. Reflection fails because |a| maps to a meet but is not itself a meet.

Failure of creation

If we change the source to just |Z=\{c, d\}|, then creation fails because |c| and |d| have a meet in the image but not in the source. Reflection succeeds, though, because there are no non-trivial cones in the source, so every cone (trivially) gets mapped to a limit cone. It’s just that we don’t have any cones with both |c| and |d| in them.

In general, recasting reflection and creation of limits for posets gives us: Let |F: X \to Y| be a monotonic function. |F| reflects limits if every lower bound that |F| maps to a meet is already a meet. |F| creates limits if whenever |F[U]| has a meet for |U \subseteq X|, then |U| already had a meet and |F| sends the meet of |U| to the meet of |F[U]|.

Overlaps

2021-01-05 19:46:59-08:00

Introduction

As is usually the case, even if you are not philosophically a constructivist, taking a constructivist perspective can often lead to better definitions and easier to see connections. In this case, constructivism suggests the more positive statement |\exists x. x \in A \land x \in B| be the definition of “overlaps”. However, given that we now have two (constructively) non-equivalent definitions, it is better to introduce notation to abstract from the particular definition. In many cases, it makes sense to have a primitive notion of “overlaps”. Here I will use the notation |A \between B| which is the most common option I’ve seen.

Properties

If we want to characterize these operations via an adjunction, or, more precisely, a Galois connection, we have a slight awkwardness arising from |\subseteq| and |\between| being binary predicates on sets. So, as a first step we’ll identify sets with predicates via, for a set |A|, |\underline A(x) \equiv x \in A|. In terms of predicates, the adjunctions we want are just a special case of the adjunctions characterizing the quantifiers.

\[\underline U(x) \land P \to \underline A(x) \iff P \to U \subseteq A\]

\[U \between B \to Q \iff \underline B(x) \to (\underline U(x) \to Q)\]

What we actually want is a formula of the form |U \between B \to Q \iff B \subseteq (\dots)|. To do this, we need an operation that will allow us to produce a set from a predicate. This is exactly what set comprehension does. For reasons that will become increasingly clear, we’ll assume that |A| and |B| are subsets of a set |X|. We will then consider quantification relative to |X|. The result we get is:

\[\{x \in U \mid P\} \subseteq A \iff \{x \in X \mid x \in U \land P\} \subseteq A \iff P \to U \subseteq A\]

\[U \between B \to Q \iff B \subseteq \{x \in X \mid x \in U \to Q\} \iff B \subseteq \{x \in U \mid \neg Q\}^c\]

The first and last equivalences require additionally assuming |U \subseteq X|. The last equivalence requires classical reasoning. You can already see motivation to limit to subsets of |X| here. First, set complementation, the |(-)^c|, only makes sense relative to some containing set. Next, if we choose |Q \equiv \top|, then the latter formulas state that no matter what |B| is it should be a subset of the expression that follows it. Without constraining to subsets of |X|, this would require a universal set which doesn’t exist in typical set theories.

Choosing |P| as |\top|, |Q| as |\bot|, and |B| as |A^c| leads to the familiar |\neg (U \between A^c) \iff U \subseteq A|, i.e. |U| is a subset of |A| if and only if it doesn’t overlap |A|’s complement.

Incidentally, characterizing |\subseteq| and |\between| in terms of Galois connections, i.e. adjunctions, immediately gives us some properties for free via continuity. We have |U \subseteq \bigcap_{i \in I}A_i \iff \forall i\in I.U \subseteq A_i| and |U \between \bigcup_{i \in I}A_i \iff \exists i \in I.U \between A_i|. This is relative to a containing set |X|, so |\bigcap_{i \in \varnothing}A_i = X|, and |U| and each |A_i| are assumed to be subsets of |X|.

Categorical Perspective

Below I’ll perform a categorical analysis of the situation. I’ll mostly be using categorical notation and perspectives to manipulate normal sets. That said, almost all of what I say will be able to be generalized immediately just by reinterpreting the symbols.

To make things a bit cleaner in the future, and to make it easier to apply these ideas beyond sets, I’ll introduce the concept of a Heyting algebra. A Heyting algebra is a partially ordered set |H| satisfying the following:

|H| has two elements called |\top| and |\bot| satisfying for all |x| in |H|, |\bot \leq x \leq \top|.
We have operations |\land| and |\lor| satisfying for all |x|, |y|, |z| in |H|, |x \leq y \land z| if and only |x \leq y| and |x \leq z|, and similarly for |\lor|, |x \lor y \leq z| if and only |x \leq z| and |y \leq z|.
We have an operation |\to| satisfying for all |x|, |y|, and |z| in |H|, |x \land y \leq z| if and only if |x \leq y \to z|.

For those familiar with category theory, you might recognize this as simply the decategorification of the notion of a bicartesian closed category. We can define the pseudo-complement, |\neg x \equiv x \to \bot|.

Any Boolean algebra is an example of a Heyting algebra where we can define |x \to y| via |\neg x \lor y| where here |\neg| is taken as primitive. In particular, subsets of a given set ordered by inclusion form a Boolean algebra, and thus a Heyting algebra. The |\to| operation can also be characterized by |x \leq y \iff (x \to y) = \top|. This lets us immediately see that for subsets of |X|, |(A \to B) = \{x \in X \mid x \in A \to x \in B\}|. All this can be generalized to the subobjects in any Heyting category.

As the notation suggests, intuitionistic logic (and thus classical logic) is another example of a Heyting algebra.

We’ll write |\mathsf{Sub}(X)| for the partially ordered set of subsets of |X| ordered by inclusion. As mentioned above, this is (classically) a Boolean algebra and thus a Heyting algebra. Any function |f : X \to Y| gives a monotonic function |f^* : \mathsf{Sub}(Y) \to \mathsf{Sub}(X)|. Note the swap. |f^*(U) \equiv f^{-1}(U)|. (Alternatively, if we think of subsets in terms of characteristic functions, |f^*(U) \equiv U \circ f|.) Earlier, we needed a way to turn predicates into sets. In this case, we’ll go the other way and identify truth values with subsets of |1| where |1| stands for an arbitrary singleton set. That is, |\mathsf{Sub}(1)| is the poset of truth values. |1| being the terminal object of |\mathbf{Set}| induces the (unique) function |!_U : U \to 1| for any set |U|. This leads to the important monotonic function |!_U^* : \mathsf{Sub}(1) \to \mathsf{Sub}(U)|. This can be described as |!_U^*(P) = \{x \in U \mid P\}|. Note, |P| cannot contain |x| as a free variable. In particular |!_U^*(\bot) = \varnothing| and |!_U^*(\top) = U|. This monotonic function has left and right adjoints:

\[\exists_U \dashv {!_U^*} \dashv \forall_U : \mathsf{Sub}(U) \to \mathsf{Sub}(1)\]

|\exists_U(A) \equiv \exists x \in U. x \in A| and |\forall_U(A) \equiv \forall x \in U. x \in A|. It’s easily verified that each of these functions are monotonic.¹

It seems like we should be done. These formulas are the formulas I originally gave for |\between| and |\subseteq| in terms of quantifiers. The problem here is that these functions are only defined for subsets of |U|. This is especially bad for interpreting |U \between A| as |\exists_U(A)| as it excludes most of the interesting cases where |U| partially overlaps |A|. What we need is a way to extend |\exists_U| / |\forall_U| beyond subsets of |U|. That is, we need a suitable monotonic function |\mathsf{Sub}(X) \to \mathsf{Sub}(U)|.

Assume |U \subseteq X| and that we have an inclusion |\iota_U : U \hookrightarrow X|. Then |\iota_U^* : \mathsf{Sub}(X) \to \mathsf{Sub}(U)| and |\iota_U^*(A) = U \cap A|. This will indeed allow us to define |\subseteq| and |\between| as |U \subseteq A \equiv \forall_U(\iota_U^*(A))| and |U \between A \equiv \exists_U(\iota_U^*(A))|. We have:

\[\iota_U[-] \dashv \iota_U^* \dashv U \to \iota_U[-] : \mathsf{Sub}(U) \to \mathsf{Sub}(X)\]

We can recover the earlier adjunctions by simply using these two pairs of adjunctions. \[\begin{align} U \between B \to Q & \iff \exists_U(\iota_U^*(B)) \to Q \\ & \iff \iota_U^*(B) \subseteq {!}_U^*(Q) \\ & \iff B \subseteq U \to \iota_U[{!}_U^*(Q)] \\ & \iff B \subseteq \{x \in X \mid x \in U \to Q\} \end{align}\]

Here the |\iota_U[-]| is crucial so that we use the |\to| of |\mathsf{Sub}(X)| and not |\mathsf{Sub}(U)|.

\[\begin{align} P \to U \subseteq A & \iff P \to \forall_U(\iota_U^*(A)) \\ & \iff {!}_U^*(P) \subseteq \iota_U^*(A) \\ & \iff \iota_U[{!}_U^*(P)] \subseteq A \\ & \iff \{x \in X \mid x \in U \land P\} \subseteq A \end{align}\]

In this case, the |\iota_U[-]| is truly doing nothing because |\{x \in X \mid x \in U \land P\}| is the same as |\{x \in U \mid P\}|.

While we have |{!}_U^* \circ \exists_U \dashv {!}_U^* \circ \forall_U|, we see that the inclusion of |\iota_U^*| is what breaks the direct connection between |U \between A| and |U \subseteq A|.

Examples

As a first example, write |\mathsf{Int}A| for the interior of |A| and |\bar A| for the closure of |A| each with respect to some topology on a containing set |X|. One way to define |\mathsf{Int}A| is |x \in \mathsf{Int}A| if and only if there exists an open set containing |x| that’s a subset of |A|. Writing |\mathcal O(X)| for the set of open sets, we can express this definition in symbols: \[x \in \mathsf{Int}A \iff \exists U \in \mathcal O(X). x \in U \land U \subseteq A\] We have a “dual” notion: \[x \in \bar A \iff \forall U \in \mathcal O(X). x \in U \to U \between A\] That is, |x| is in the closure of |A| if and only if every open set containing |x| overlaps |A|.

As another example, here is a fairly unusual way of characterizing a compact subset |Q|. |Q| is compact if and only if |\{U \in \mathcal O(X) \mid Q \subseteq U\}| is open in |\mathcal O(X)| equipped with the Scott topology ³. As before, this suggests a “dual” notion characterized by |\{U \in \mathcal O(X) \mid O \between U\}| being an open subset. A set |O| satisfying this is called overt. This concept is never mentioned in traditional presentations of point-set topology because every subset is overt. However, if we don’t require that arbitrary unions of open sets are open (and only require finite unions to be open) as happens in synthetic topology or if we aren’t working in a classical context then overtness becomes a meaningful concept.

One benefit of the intersection-based definition of overlaps is that it is straightforward to generalize to many sets overlapping, namely |\bigcap_{i\in I} A_i \neq \varnothing|. This is also readily expressible using quantifiers as: |\exists x.\forall i \in I. x \in A_i|. As before, having an explicit “universe” set also clarifies this. So, |\exists x \in X.\forall i \in I. x \in A_i| with |\forall i \in I. A_i \subseteq X| would be better. The connection of |\between| to |\subseteq| suggests instead of this fully symmetric presentation, it may still be worthwhile to single out a set producing |\exists x \in U.\forall i \in I. x \in A_i| where |U \subseteq X|. This can be read as “there is a point in |U| that touches/meets/overlaps every |A_i|”. If desired we could notate this as |U \between \bigcap_{i \in I}A_i|. Negating and complementing the |A_i| leads to the dual notion |\forall x \in U.\exists i \in I.x \in A_i| which is equivalent to |U \subseteq \bigcup_{i \in I}A_i|. This dual notion could be read as “the |A_i| (jointly) cover |U|” which is another common and important concept in mathematics.

Conclusion

Ultimately, the concept of two (or more) sets overlapping comes up quite often. The usual circumlocution, |A \cap B \neq \varnothing|, is both notationally and conceptually clumsy. Treating overlapping as a first-class notion via notation and formulating definitions in terms of it can reveal some common and important patterns.

If one wanted to be super pedantic, I should technically write something like |\{\star \mid \exists x \in U. x \in A\}| where |1 = \{\star\}| because elements of |\mathsf{Sub}(1)| are subsets of |1|. Instead, we’ll conflate subsets of |1| and truth values.↩︎
If we think of subobjects as (equivalence classes of) monomorphisms as is typical in category theory, then because |\iota_U| is itself a monomorphism, the direct image, |\iota_U[-]|, is simply post-composition by |\iota_U|, i.e. |\iota_U \circ {-}|.↩︎
The Scott topology is the natural topology on the space of continuous functions |X \to \Sigma| where |\Sigma| is the Sierpinski space.↩︎

Complex-Step Differentiation

2020-08-08 22:28:55-07:00

Introduction

Complex-step differentiation is a simple and effective technique for numerically differentiating a(n analytic) function. Discussing it is a neat combination of complex analysis, numerical analysis, and ring theory. We’ll see that it is very closely connected to forward-mode automatic differentiation (FAD). For better or worse, while widely applicable, the scenarios where complex-step differentiation is the best solution are a bit rare. To apply complex-step differentiation, you need a version of your desired function that operates on complex numbers. If you have that, then you can apply complex-step differentiation immediately. Otherwise, you need to adapt the function to complex arguments. This can be done essentially automatically using the same techniques as automatic differentiation, but at that point you might as well use automatic differentiation. Adapting the code to complex numbers or AD takes about the same amount of effort, however, the AD version will be more efficient, more accurate, and easier to use.

Nevertheless, this serves as a simple example to illustrate several theoretical and practical ideas.

Numerical Differentiation

The problem we’re solving is given a function |f : \mathbb R \to \mathbb R| which is differentiable around a point |x_0|, we’d like to compute its derivative |f’| at |x_0|. In many cases, |f| is real analytic at the point |x_0| meaning |f| has a Taylor series which converges to |f| in some open interval containing |x_0|.

The most obvious way of numerically differentiating |f| is to approximate the limit in the definition of the derivative, \[f’(x) = \lim_{h\to 0} [f(x + h) - f(x)] / h\] by simply choosing a small value for |h| rather than taking the limit. When |f| is real analytic at |x|, we can analyze the quality of this approximation by expanding |f(x + h)| in a Taylor series at |x|. This produces \[[f(x + h) - f(x)]/h = f’(x) + O(h)\] A slight tweak produces a better result with the same number of evaluations of |f|. Specifically, the Taylor series of |f(x + h) - f(x - h)| at |x| is equal to the odd part the Taylor series of |f(x + h)| at |x|. This leads to the Central Differences formula:

$$f'(x) + O(h^2) = \frac{f(x + h) - f(x - h)}{2h}$$

The following interactively illustrates this using the function |f(x) = x^{9/2}||f(x) = \sin(x)||f(x) = e^x||f(x) = e^x/(\sin(x)^3 + \cos(x)^3)| evaluated at |x_0 =| . The correct answer to |17| digits is |f’(||) {}={}|. The slider ranges from |h=10^{-2}| to |h=10^{-20}|.

|h|:
|f’(||)|:
error:

If you play with the slider using the first example, you’ll see that the error decreases until around |10^{-5}| after which it starts increasing until |10^{-15}| where it is off by more than |1|. At |10^{-16}| the estimated derivative is |0| which is, of course, completely incorrect. Even at |10^{-5}| the error is on the order of |10^{-9}| which is much higher than the double precision floating point machine epsilon of approximately |10^{-16}|.

There are two issues here. First, we have the issue that if |x_0 \neq 0|, then |x_0 + h = x_0| for sufficiently small |h|. This happens when |x_0/h| has a magnitude of around |10^{16}| or more.

The second issue here is known as catastrophic cancellation. For simplicity, let’s say |f(x)=1|. (It’s actually about |6.2| for the first example.) Let’s further say for some small value of |h|, |f(x+h) = 1.00000000000020404346|. The value we care about is the |0.00000000000020404346|, but given limited precision, we might have |f(x + h) = 1.000000000000204|, meaning we only have three digits of precision for the value we care about. Indeed, as |h| becomes smaller we’ll lose more and more precision in our desired value until we lose all precision which happens when |f(x + h) = f(x)|. It is generally a bad idea numerically to calculate a small value by subtracting two larger values for this reason.

We have a dilemma. For the theory, we want as small a value of |h| as possible without being zero. In practice, we start losing precision as |h| gets smaller, and generally larger values of |h| are going to be less impacted by this.

Let’s set this aside for now and look at other ways of numerically computing the derivative in the hopes that we can avoid this problem.

Cauchy’s Residue Theorem

If we talk about functions |f : \mathbb C \to \mathbb C|, the analogue of real analyticity is holomorphicity or complex analyticity. A complex function is holomorphic if it satisfies the Cauchy-Riemann equations. (See the Appendix for more details about where the Cauchy-Riemann equations come from.) A complex function is complex analytic if it has a Taylor series which converges to the function. It can be proven that these two descriptions are equivalent, though this isn’t a trivial fact. We can also talk about functions that are holomorphic or complex analytic on an open subset of |\mathbb C| and at a point by considering an open subset around that point. The typical choice of open subset is some suitably small open disk in the complex plane about the point. (Other common domains are ellipses, infinite strips, and areas bounded by Hankel contours and variations such as sideways opening parabolas.)

A major fact about holomorphic functions is the Cauchy integral theorem. If |f| is a holomorphic function inside a (suitably nice) closed curve |\Gamma| in the complex plane, then |\oint_\Gamma f(z)\mathrm dz = 0|. Again, |\Gamma| will typically be chosen to be some circle. (Integrals like this in the complex plane are often called contour integrals and the curves we’re integrating along are called contours.)

Things get really interesting when we generalize to meromorphic functions which are complex functions that are holomorphic except at an isolated set of points. These take the form of poles which are points |z_0| such that |1/f(z_0) = 0|, i.e. poles are where a function goes to infinity as, e.g., |1/z| does at |0|. The generalization of Cauchy’s integral theorem is Cauchy’s Residue Theorem. This theorem is surprising and is one of the most useful theorems in all of mathematics both theoretically and practically.

We’ll only need a common special case of it. Let |f| be a holomorphic function, then |f(z)/(z - z_0)^n| is a meromorphic function with a single pole of order |n| at |z_0|. If |\Gamma| is a positively oriented, simple closed curve containing |z_0|, then $$f^{(n-1)}(z_0) = \frac{(n-1)!}{2 \pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{(z - z_0)^n}$$ In this case, |f^{(n-1)}(z_0)/(n-1)!| is the residue of |f(z)/(z - z_0)^n| at |z_0|. More generally, if there are multiple poles in the area bounded by |\Gamma|, then we will sum up their residues.

This formula provides us a means of calculating the |(n-1)|-st Taylor coefficient of a complex analytic function at any point. For our particular purposes, we’ll only need the |n=2| case, \[f’(z_0) = \frac{1}{2 \pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{(z - z_0)^2}\]

For the remainder of this section I want to give some examples of how Cauchy’s Residue Theorem is used both theoretically and practically. This whole article will itself be another practical example of Cauchy’s Residue Theorem. This is not exhaustive by any means.

To start illustrating some of the surprising properties of this theorem, we can take the |n=1| case which states that we can evaluate a holomorphic function at any point via |f(z_0) = \frac{1}{2 \pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{z - z_0}| where |\Gamma| is any contour which bounds an area containing |z_0|. This leads to an interesting discreteness. Not only can we evaluate a (holomorphic) function (or any of its derivatives) at a point via the values of the function on a contour, the only significant constraint on that contour is that it bound an area containing the desired point. In other words, no matter how we deform the contour the integral is constant except when we deform the contour so as not to bound an area containing the point being evaluated, at which point the integral’s value is |0|¹. It may seem odd to use an integral to evaluate a function at a point, but it can be useful when there are numerical issues with evaluating the function near the desired point². In fact, these results show that if we know the values of a holomorphic function on the boundary of a given open subset of the complex plane, then we know the value of the function everywhere. In this sense, holomorphic functions (and analytic functions in general) are extremely rigid.

This leads to the notion of analytic continuation where we try to compute an analytic function beyond its overt domain of definition. This is the basis of most “sums of divergent series”. For example, there is the first-year calculus fact that the sum of the infinite series |\sum_{n=0}^\infty x^n| is |1/(1-x)| converging on the interval |x \in (-1,1)|. In fact, the proof of convergence only needs |\|x\| < 1| so we can readily generalize to complex |z| with |\|z\| < 1|, i.e. |z| contained in the open unit disk. However, |1/(1-z)| is a meromorphic function that is holomorphic everywhere except for |z=1|, therefore there is a unique analytic function defined everywhere except |z=1| that agrees with the infinite sum on the unit disk, namely |1/(1-z)| itself. Choosing |z=-1| leads to the common example of “summing a divergent series” with “|\sum_{n=0}^\infty (-1)^n = 1/2|” which really means “the value at |-1| of the unique complex analytic function which agrees with this infinite series when it converges”.

Sticking with just evaluation, applying the Cauchy Residue theorem to quadrature, i.e. numerical integration, leads to an interesting connection to a rational approximation problem. Say we want to compute |\int_{-1}^1 f(x) \mathrm dx|, we can use the Cauchy integral to evaluate |f(x)| leading to $$\int_{-1}^1 f(x) \mathrm dx = \int_{-1}^1 \frac{1}{2\pi i}\oint \frac{f(z)\mathrm dz}{z - x}\mathrm dx = \frac{1}{2\pi i}\oint f(z)\int_{-1}^1 \frac{\mathrm dx}{z - x}\mathrm dz = \frac{1}{2\pi i}\oint f(z)\log\left(\frac{z+1}{z-1}\right)\mathrm dz$$ A quadrature formula looks like |\int_{-1}^1 f(x) \mathrm dx \approx \sum_{k=1}^N w_k f(x_k)|. The sum can be written as a Cauchy integral of |\oint f(z)\sum_{k=1}^N \frac{1}{2\pi i}\frac{w_k\mathrm dz}{z - x_k}|. We thus have $$\left|\frac{1}{2\pi i}\oint f(z)\left[\log\left(\frac{z+1}{z-1}\right) - \sum_{k=1}^N \frac{w_k}{z - x_k}\right]\mathrm dz\right|$$ as the error of the approximation. The sum is a rational function (in partial fraction form) and thus the error is minimized by points (|x_k|) and weights (|w_k|) that lead to better rational approximations of |\log((z+1)/(z-1))|³.

The ability to calculate coefficients of the Taylor series of a holomorphic function is, by itself, already a valuable tool in both theory and practice. In particular, the coefficients of a generating function or a Z-transform can be computed with Cauchy integrals. This has applications in probability theory, statistics, finance, combinatorics, recurrences, differential equations, and signal processing. Indeed, when |z_0 = 0| and |\Gamma| is the unit circle, then the Cauchy integral is a component of the Fourier series of |f|. Approximating these integrals with the Trapezoid Rule (which we’ll discuss in a bit) produces the Discrete Fourier Transform.

Let |p| be a polynomial and, for simplicity, assume all its zeroes are of multiplicity one. Then |1/p(z)| is a meromorphic function that’s holomorphic everywhere except for the roots of |p|. The Cauchy integral |\frac{1}{2\pi i}\oint_{\Gamma} \frac{p’(z)\mathrm dz}{p(z)}| counts the number of roots of |p| contained in the area bounded by |\Gamma|. If we know there is only one root of |p| within the area bounded by |\Gamma|, then we can compute that root with |\frac{1}{2\pi i}\oint_{\Gamma} \frac{z p’(z)\mathrm dz}{p(z)}|. A better approach is to use the formulas |\left(\oint \frac{z\mathrm dz}{p(z)}\right)/\left(\oint \frac{\mathrm dz}{p(z)}\right)|. Similar ideas can be used to adapt this to counting and finding multiple roots. See Numerical Algorithms based on Analytic Function Values at Roots of Unity by Austin, Kravanja, and Trefethen (2014) which is a good survey in general.

Another very common use of Cauchy’s Residue Theorem is to sum (convergent) infinite series. |\tan(\pi z)/\pi| has a zero at |z = k| for each integer |k| and a non-zero derivative at those points. In fact, the derivative is |1|. Alternatively, we could use |\sin(\pi z)/\pi| which has a zero at |z = k| for each integer |k| but has derivative |(-1)^k| at those points. Therefore, |\pi\cot(\pi z) = \pi/\tan(\pi z)| has a (first-order) pole at |z = k| for each integer |k| with residue |1|. In particular, if |f| is a holomorphic function (at least near the real axis), then the value of the Cauchy integral of |f(z)\pi\cot(\pi z)| along a Hankel contour will be |2\pi i \sum_{k=0}^\infty f(k)|. Along an infinite strip around the real axis we’d get |2 \pi i \sum_{k=-\infty}^\infty f(k)|. As an example, we can consider the famous sum, |\sum_{k=1}^\infty 1/k^2|. It can be shown that if |f| is a meromorphic function whose poles are not located at integers and |\vert zf(z)\vert| is bounded for sufficiently large |\vert z\vert|, then |\oint f(z)\pi \cot(\pi z)\mathrm dz = 0|. We thus have that \[\sum_{k=-\infty}^{\infty} f(k) = -\sum_j \mathrm{Res}(f(z)\pi\cot(\pi z); z_j)\] where |z_j| are the poles of |f|. In particular, |f(z) = \frac{1}{z^2 + a^2}| has (first-order) poles at |\pm ai|. This gives us simply $$\sum_{k=-\infty}^{\infty} \frac{1}{k^2 + a^2} = -\pi\frac{\cot(\pi a i)-\cot(-\pi a i)}{2ai} = \frac{\pi}{a}\coth(\pi a)$$ where I’ve used |\coth(x) = i\cot(xi)| and the fact that |\coth| is an odd function. Exploiting the symmetry of the sum gives us $$\sum_{k=1}^{\infty} \frac{1}{k^2 + a^2} = \frac{\pi}{2a}\coth(\pi a) - \frac{1}{2a^2}$$ By expanding |\coth| in a Laurent series, we see that the limit of the right-hand side as |a| approaches |0| is |\frac{\pi^2}{6}|. While contour integration is quite effective for coming up with analytic solutions to infinite sums, numerically integrating the contour integrals is also highly effective as illustrated in Talbot quadratures and rational approximations by Trefethen, Weideman, and Schmelzer (2006), for example.

Computing the Integrals

We’ve seen in the previous section that |f’(z_0) = \frac{1}{2\pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{(z-z_0)^2}|. This doesn’t much help us if we don’t have a way to compute the integrals. From this point forward, fix |\Gamma| as a circle of radius |h| centered on |z_0|.

Before that, let’s consider numerical integration in general. Say we want to integrate the real function |f| from |0| to |b|, i.e. we want to calculate |\int_0^b f(x)\mathrm dx|. The most obvious way to go about it is to approximate the Riemann sums that define the (Riemann) integral. This would produce a formula like |\int_0^b f(x)\mathrm dx \approx \frac{b}{N}\sum_{k=0}^{N-1} f(bk/N)| corresponding to summing the areas of rectangles whose left points are the values of |f|. As before with central differences, relatively minor tweaks will give better approximations. In particular, we get the two roughly equivalent approximations of the Midpoint Rule \[\int_0^b f(x)\mathrm dx \approx \frac{b}{N}\sum_{k=0}^{N-1} f\left(\frac{b(k+(1/2))}{N}\right)\] where we take the midpoint rather than the left or right point, and the Trapezoid Rule \[\int_0^b f(x)\mathrm dx \approx \frac{b}{2N}\sum_{k=0}^{N-1}[f(b(k+1)/N) + f(bk/N)]\] where we average the left and right Riemann sums. While both of these perform substantially better than the left/right Riemann sums, they are still rather basic quadrature rules; the error decreases as |O(1/N^2)|.

Something special happens when |f| is a periodic function. First, the Trapezoid rule reduces to |\frac{b}{N}\sum_{k=0}^{N-1} f(bk/N)|. More importantly, the Midpoint rule and the Trapezoid rule both start converging geometrically rather than quadratically. Furthermore, for the particular case we’re interested in, namely integrating analytic functions along a circle in the complex plane, these quadrature rules are optimal. Let |\zeta| be the |2N|-th root of unity. The Trapezoid rule corresponds to sum the values of |f| at the even powers of |\zeta| scaled by the radius |h| and translated by |z_0|, and the Midpoint rule corresponds to the sum of the odd powers.

We now have two parameters for approximating a Cauchy integral via the Trapezoid or Midpoint rules: the radius |h| and the number of points |N|.

Complex-Step Differentiation corresponds to approximating the Cauchy integral for the derivative using the extreme case of the Midpoint rule with |N=2| and very small radii (i.e. values of |h|). Meanwhile, Central Differences corresponds to the extreme case of using the Trapezoid rule with |N=2| and very small radii. To spell this out a bit more, we perform the substitution |z - z_0 = he^{\theta i}| which leads to |\mathrm dz = hie^{\theta i}\mathrm d\theta| and $$\frac{1}{2\pi i}\oint_{|z - z_0| = h} \frac{f(z)\mathrm dz}{(z - z_0)^2} = \frac{1}{2 \pi h}\int_0^{2\pi} f(z_0 + he^{\theta i})e^{-\theta i}\mathrm d\theta$$

Applying the Trapezoid rule to the right hand side of this corresponds to picking |\theta = 0, \pi|, while applying the Midpoint rule corresponds to picking |\theta = \pm \pi/2|. |e^{\theta i} = \pm 1| for |\theta = 0, \pi|, and |e^{\theta i} = \pm i| for |\theta = \pm \pi/2|. For the Trapezoid rule, this leads to \[f’(z_0) \approx \frac{1}{2h}[f(z_0 + h) - f(z_0 - h)]\] which is Central Differences. For the Midpoint rule, this leads to \[f’(z_0) \approx \frac{1}{2hi}[f(z_0 + hi) - f(z_0 - hi)]\] This is Complex-Step Differentiation when |z_0| is real.

Complex-Step Differentiation

As just calculated, Complex-Step Differentiation computes the derivative at the real number |x_0| via the formula: $$f'(x_0) \approx \frac{1}{2hi} [f(x_0 + hi) - f(x_0 - hi)]$$ Another perspective on this formula is that it is just the Central Differences formula along the imaginary axis instead of the real axis.

When |f| is complex analytic and real-valued on real arguments, then we have |f(\overline z) = \overline{f(z)}| where |\overline z| is the complex conjugate of |z|, i.e. it maps |a + bi| to |a - bi| or |re^{\theta i}| to |re^{-\theta i}|. This leads to |f(x_0 + hi) - f(\overline{x_0 + hi}) = f(x_0 + hi) - \overline{f(x_0 + hi)} = 2i\operatorname{Im}(f(x_0 + hi))|. This lets us simplify Complex-Step Differentiation to |f’(x_0) \approx \operatorname{Im}(f(x_0 + h))/h|.

Here is the earlier interactive example but now using Complex-Step Differentiation. As |h| decreases in magnitude, the error steadily decreases until there is no error at all.

|h|:
|f’(||)|:
error:

This formula using |\operatorname{Im}| avoids catastrophic cancellation simply by not doing a subtraction. However, it turns out for real |x_0| (which is necessary to derive the simplified formula), there isn’t a problem either way. Using the first form of the Complex-Step Differentiation formula is also numerically stable. The key here is that the imaginary part of |x_0| and |f(x_0)| are both |0| and so we don’t get catastrophic cancellation for the same reason we wouldn’t get it with Central Differences if |f(x_0) = 0|. This suggests that if we wanted to evaluate |f’| at some non-zero point on the imaginary axis, Complex-Step Differentiation would perform poorly while Central Differences would perform well. Further, if we wanted to evaluate |f’| at some point not on either the real or imaginary axes, neither approach would perform well. In this case, choosing different values for |N| and the radius would be necessary⁴.

A third perspective on Complex-Step Differentiation comes when we think about which value of |h| should we use. The smaller |f’(x_0)| is, the smaller we’d want |h| to be. Unlike Central Differences, there is little stopping us from having |h| be very small and values like |h=10^{-100}| are typical. In fact, around |h=10^{-155}| in double precision floating point arithmetic, |h| gets the theoretically useful property that |h^2 = 0| due to underflow. In this case, |x_0 + hi| behaves like |x_0 + \varepsilon| where |\varepsilon^2 = 0|. This is the defining property of the ring of dual numbers. Dual numbers are exactly what are used in forward-mode automatic differentiation.

Forward-Mode Automatic Differentiation

In this example there is no interactivity as we are not estimating the derivative in the AD case but instead calculating it in parallel. There is no |h| parameter.

|f’(||)|:
error:

As the end of the previous section indicated, Complex-Step Differentiation approximates this (often exactly) by using |hi| as |\varepsilon|. Nevertheless, this is not ideal. Often the complex versions of a function will be more costly than their dual number counterparts. For example, |(a + bi)(c + di) = (ac - bd) + (ad + bc)i| involves four real multiplications and two additions. |(a + b\varepsilon)(c + d\varepsilon) = ac + (ad + bc)\varepsilon| involves three real multiplications and one addition on the other hand.

References

Using Complex Variables to Estimate Derivatives of Real Functions by Squire and Trapp (1998) is the first(?) published paper specifically about the idea of complex-step differentiation. It’s a three page paper and the authors are not claiming any originality but just demonstrating the effectiveness of ideas from the ’60s that the authors found to be underappreciated.

The Complex-Step Derivative Approximation by Martins, Sturdza, and Alonso (2003) does a much deeper dive into the theory behind complex-step differentiation and its connections to automatic differentiation.

You may have noticed the name “Trefethen” in many of the papers cited. Nick Trefethen and his collaborators have been doing amazing work for the past couple of decades, most notably in the Chebfun project. Looking at Trefethen’s book Approximation Theory and Approximation Practice (and lectures) recently reintroduced me to Trefethen’s work. This particular article was prompted by a footnote in the paper The Exponentially Convergent Trapezoidal Rule which I highly recommend. In fact, I highly recommend Chebfun as well as nearly all of Trefethen’s work. It is routinely compelling, interesting, and well presented.

Appendix

Using the language of Geometric Calculus, we can write a very general form of the Fundamental Theorem of Calculus. Namely, \[\int_{\mathcal M} \mathrm d^m\mathbf x \cdot \nabla f(\mathbf x) = \oint_{\partial \mathcal M}\mathrm d^{m-1}\mathbf x f(\mathbf x)\] where |\mathcal M| is an |m|-dimensional manifold. Here |f| is a multivector-valued vector function. If |m=2| and |\nabla f = 0|, then this would produce a formula very similar to the Cauchy integral formula.

Writing |f(x + yi) = u(x, y) + v(x, y)i|, the Cauchy-Riemann equations are |\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}| and |\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}|. However, |\nabla f = 0| leads to the slightly different equations |\frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y}| and |\frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}|.

We can generalize the vector derivative, |\nabla|, to a multivector derivative |\nabla_X| where |X| is a multivector variable by using the generic formula for the directional derivative in a linear space and then defining |\nabla_X| to be a linear combination of directional derivatives. Given any |\mathbb R|-linear space |V| and an element |v \in V|, we can define the directional derivative of |f : V \to V| in the direction |v| via |\frac{\partial f}{\partial v}(x) \equiv \frac{\mathrm d f(x + \tau v)}{\mathrm d\tau}|. In our case, we have the basis vectors |\{1, \mathbf e_1, \mathbf e_2, I\}| though we only care about the even subalgebra corresponding to the basis vectors |\{1, I\}|. Define |\partial_1 f(x) \equiv \frac{\mathrm d f(x + \tau)}{\mathrm d \tau}| and |\partial_I f(x) \equiv \frac{\mathrm d f(x + \tau I)}{\mathrm d \tau}| assuming |f| is a spinor-valued function⁵. We can then define |\nabla_{\mathbf z} \equiv \partial_1 + I\partial_I|. We now have |\nabla_{\mathbf z} f = 0| is the equivalent to the Cauchy-Riemann equations where |f| is now a spinor-valued spinor function, i.e. a function of |\mathbf z|.

See Multivector Functions by Hestenes for more about this.

In terms of the general theory of partial differential equations, we are saying that |z^{-1}| is a Green’s function for |\nabla|. We can then understand everything that is happening here in terms of general results. In particular, it is the two-dimensional case of the results described in Multivector Functions by Hestenes.↩︎
See Numerical Algorithms based on Analytic Function Values at Roots of Unity by Austin, Kravanja, and Trefethen (2014) for an example. Also, with some minor tweaks, we can have that “point” be a matrix and these integrals can be used to calculate functions of matrices, e.g. the square root, exponent, inverse, and log of a matrix. See Computing |A^\alpha|, |\log(A)|, and Related Matrix Functions by Contour Integrals by Hale, Higham, and Trefethen (2009) for details.↩︎
See Is Gauss Quadrature Better than Clenshaw-Curtis? by Trefethen (2008) for more details.↩︎
While focused on issues that mostly come up with very high-order derivatives, e.g. |100|-th derivatives and higher, Accuracy and Stability of Computing High-Order Derivatives of Analytic Functions by Cauchy Integrals by Bornemann (2009) nevertheless has a good discussions of the concerns here.↩︎
If we allowed arbitrary multivector-valued functions, then we’d need to add a projection producing the tangential derivative.↩︎

Enriched Indexed Categories, Syntactically

2020-07-05 22:03:27-07:00

Introduction

This is part 3 in a series. See the previous part about internal languages for indexed monoidal categories upon which this part heavily depends.

In category theory, the hom-sets between two objects can often be equipped with some extra structure which is respected by identities and composition. For example, the set of group homomorphisms between two abelian groups is itself an abelian group by defining the operations pointwise. Similarly, the set of monotonic functions between two partially ordered sets (posets) is a poset again by defining the ordering pointwise. Linear functions between vector spaces form a vector space. The set of functors between small categories is a small category. Of course, the structure on the hom-sets can be different than the objects. Trivially, with the earlier examples a vector space is an abelian group, so we could say that linear functions form an abelian group instead of a vector space. Likewise groups are monoids. Less trivially, the set of relations between two sets is a partially ordered set via inclusion. There are many cases where instead of hom-sets we have hom-objects that aren’t naturally thought of as sets. For example, we can have hom-objects be non-negative (extended) real numbers from which the category laws become the laws of a generalized metric space. We can identify posets with categories who hom-objects are elements of a two element set or, even better, a two element poset with one element less than or equal to the other.

This general process is called enriching a category in some other category which is almost always called |\V| in the generic case. We then talk about having |\V|-categories and |\V|-functors, etc. In a specific case, it will be something like |\mathbf{Ab}|-categories for an |\mathbf{Ab}|-enriched category, where |\mathbf{Ab}| is the category of abelian groups. Unsurprisingly, not just any category will do for |\V|. However, it turns out very little structure is needed to define a notion of |\V|-category, |\V|-functor, |\V|-natural transformation, and |\V|-profunctor. The usual “baseline” is that |\V| is a monoidal category. As mentioned in the previous post, paraphrasing Bénabou, notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. It is useful for our purposes to make this structure explicit. For very little cost, this will also provide a vastly more general notion that will readily capture enriched categories, indexed categories, and categories that are simultaneously indexed and enriched, of which internal categories are an example. The tool for this is a (Grothendieck) fibration aka a fibered category or the mostly equivalent concept of an indexed category.¹

To that end, instead of just a monoidal category, we’ll be using indexed monoidal categories. Typically, to get an experience as much like ordinary category theory as possible, additional structure is assumed on |\V|. In particular, it is assumed to be an (indexed) cosmos which means that it is an indexed symmetric monoidally closed category with indexed coproducts preserved by |\otimes| and indexed products and fiberwise finite limits and colimits (preserved by the indexed structure). This is quite a lot more structure which I’ll introduce in later parts. In this part, I’ll make no assumptions beyond having an indexed monoidal category.

Basic Category Theory in Indexed Monoidal Categories

The purpose of the machinery of the previous posts is to make this section seem boring and pedestrian. Other than being a little more explicit and formal, for most of the following concepts it will look like we’re restating the usual definitions of categories, functors, and natural transformations. The main exception is profunctors which will be presented in a quite different manner, though still in a manner that is easy to connect to the usual presentation. (We will see how to recover the usual presentation in later parts.)

While I’ll start by being rather explicit about indexes and such, I will start to suppress that detail over time as most of it is inferrable. One big exception is that right from the start I’ll omit the explicit dependence of primitive terms on indexes. For example, while I’ll write |\mathsf F(\mathsf{id}) = \mathsf{id}| for the first functor law, what the syntax of the previous posts says I should be writing is |\mathsf F(s, s; \mathsf{id}(s)) = \mathsf{id}(s)|.

To start, I want to introduce two different notions of |\V|-category, small |\V|-categories and large |\V|-categories, and talk about what this distinction actually means. I will proceed afterwards with the “large” notions, e.g. |\V|-functors between large |\V|-categories, as the small case will be an easy special case.

Small |\V|-categories

The theory of a small |\V|-category consists of:

an index type |\O|,
a linear type |s, t : \O \vdash \A(t, s)|,
a linear term |s : \O; \vdash \mathsf{id} : \A(s, s)|, and
a linear term |s, u, t : \O; g : \A(t, u), f : \A(u, s) \vdash g \circ f : \A(t, s)|

satisfying $$\begin{gather} s, t : \O; f : \A(t, s) \vdash \mathsf{id} \circ f = f = f \circ \mathsf{id} : \A(t, s)\qquad \text{and} \\ \\ s, u, v, t : \O; h : \A(t, v), g : \A(v, u), f : \A(u, s) \vdash h \circ (g \circ f) = (h \circ g) \circ f : \A(t, s) \end{gather}$$

In the notation of the previous posts, I’m saying |\O : \mathsf{IxType}|, |\A : (\O, \O) \to \mathsf{Type}|, |\mathsf{id} : (s : \O;) \to \A(s, s)|, and |\circ : (s, u, t : \O; \A(t, u), \A(u, s)) \to \A(t, s)| are primitives added to the signature of the theory. I’ll continue to use the earlier, more pointwise presentation above to describe the signature.

A small |\V|-category for an |\mathbf S|-indexed monoidal category |\V| is then an interpretation of this theory. That is, an object |O| of |\mathbf S| as the interpretation of |\O|, and an object |A| of |\V^{O\times O}| as the interpretation of |\A|. The interpretation of |\mathsf{id}| is an arrow |I_O \to \Delta_O^* A| of |\V^O|, where |\Delta_O : O \to O\times O| is the diagonal arrow |\langle id, id\rangle| in |\mathbf S|. The interpretation of |\circ| is an arrow |\pi_{23}^* A \otimes \pi_{12}^* A \to \pi_{13}^* A| of |\V^{O\times O \times O}| where |\pi_{ij} : X_1 \times X_2 \times X_3 \to X_i \times X_j| are the appropriate projections.

Since we can prove in the internal language that the choice of |()| for |\O|, |s, t : () \vdash I| for |\A|, |s, t : (); x : I \vdash x : I| for |\mathsf{id}|, and |s, u, t : (); f : I, g: I \vdash \mathsf{match}\ f\ \mathsf{as}\ *\ \mathsf{in}\ g : I| for |\circ| satisfies the laws of the theory of a small |\V|-category, we know we have a |\V|-category which I’ll call |\mathbb I| for any |\V|².

For |\V = \mathcal Fam(\mathbf V)|, |O| is a set of objects. |A| is an |(O\times O)|-indexed family of objects of |\mathbf V| which we can write |\{A(t,s)\}_{s,t\in O}|. The interpretation of |\mathsf{id}| is an |O|-indexed family of arrows of |\mathbf V|, |\{ id_s : I_s \to A(s, s) \}_{s\in O}|. Finally, the interpretation of |\circ| is a family of arrows of |\mathbf V|, |\{ \circ_{s,u,t} : A(t, u)\otimes A(u, s) \to A(t, s) \}_{s,u,t \in O}|. This is exactly the data of a (small) |\mathbf V|-enriched category. One example is when |\mathbf V = \mathbf{Cat}| which produces (strict) |2|-categories.

For |\V = \mathcal Self(\mathbf S)|, |O| is an object of |\mathbf S|. |A| is an arrow of |\mathbf S| into |O\times O|, i.e. an object of |\mathbf S/O\times O|. I’ll write the object part of this as |A| as well, i.e. |A : A \to O\times O|. The idea is that the two projections are the target and source of the arrow. The interpretation of |\mathsf{id}| is an arrow |ids : O \to A| such that |A \circ ids = \Delta_O|, i.e. the arrow produced by |ids| should have the same target and source. Finally, the interpretation of |\circ| is an arrow |c| from the pullback of |\pi_2 \circ A| and |\pi_1 \circ A| to |A|. The source of this arrow is the object of composable pairs of arrows. We further require |c| to produce an arrow with the appropriate target and source of the composable pair. This is exactly the data for a category internal to |\mathbf S|. An interesting case for contrast with the previous paragraph is that a category internal to |\mathbf{Cat}| is a double category.

For |\V = \mathcal Const(\mathbf V)|, the above data is exactly the data of a monoid object in |\mathbf V|. This is a formal illustration that a (|\mathbf V|-enriched) category is just an “indexed monoid”. Indeed, |\mathcal Const(\mathbf V)|-functors will be monoid homomorphisms and |\mathcal Const(\mathbf V)|-profunctors will be double-sided monoid actions. In particular, when |\mathbf V = \mathbf{Ab}|, we get rings, ring homomorphisms, and bimodules of rings. The intuitions here are the guiding ones for the construction we’re realizing.

Large |\V|-categories

A large |\V|-category is a model of a theory of the following form. There is

a collection of index types |\O_x|,
for each pair of index types |\O_x| and |\O_y|, a linear type |s : \O_x, t : \O_y \vdash \A_{yx}(t, s)|,
for each index type |\O_x|, a linear term |s : \O_x; \vdash \mathsf{id}_x : \A_{xx}(s, s)|, and
for each triple of index types, |\O_x|, |\O_y|, and |\O_z|, a linear term |s: \O_x, u : \O_y, t : \O_z; g : \A_{zy}(t, u), f : \A_{yx}(u, s) \vdash g \circ_{xyz} f : \A_{zx}(t, s)|

satisfying the same laws as small |\V|-categories, just with some extra subscripts. Clearly, a small |\V|-category is just a large |\V|-category where the collection of index types consists of just a single index type.

Small versus Large

The typical way of describing the difference between small and large (|\V|-)categories would be to say something like: “By having a collection of index types in a large |\V|-category, we can have a proper class of them. In a small |\V|-category, the index type of objects is interpreted as an object in a category, and a proper class can’t be an object of a category³.” However, for us, there’s a more directly relevant distinction. Namely, while we had a single theory of small |\V|-categories, there is no single theory of large |\V|-categories. Different large |\V|-categories correspond to models of (potentially) different theories. In other words, the notion of a small |\V|-category is able to be captured by our notion of theory but not the concept of a large |\V|-category. This extends to |\V|-functors, |\V|-natural transformations, and |\V|-profunctors. In the small case, we can define a single theory which captures each of these concepts, but that isn’t possible in the large case. In general, notions of “large” and “small” are about what we can internalize within the relevant object language, usually a set theory. Arguably, the only reason we speak of “size” and of proper classes being “large” is that the Axiom of Specification outright states that any subclass of a set is a set, so proper classes in ZFC can’t be subsets of any set. As I’ve mentioned elsewhere, you can definitely have set theories with proper classes that are contained in even finite sets, so the issue isn’t one of “bigness”.

The above discussion also explains the hand-wavy word “collection”. The collection is a collection in the meta-language in which we’re discussing/formalizing the notion of theory. When working within the theory of a particular large |\V|-category, all the various types and terms are just available ab initio and are independent. There is no notion of “collection of types” within the theory and nothing indicating that some types are part of a “collection” with others.

Another perspective on this distinction between large and small |\V|-categories is that small |\V|-categories have a family of arrows, identities, and compositions with respect to the notion of “family” represented by our internal language. If we hadn’t wanted to bother with formulating the internal language of an indexed monoidal category, we could have still defined the notion of |\V|-category with respect to the internal language of a (non-indexed) monoidal category. It’s just that all such |\V|-categories (except for monoid objects) would have to be large |\V|-categories. That is, the indexing and notion of “family” would be at a meta-level. Since most of the |\V|-categories of interest will be large (though, generally a special case called a |\V|-fibration which reins in the size a bit), it may seem that there was no real benefit to the indexing stuff. Where it comes in, or rather where small |\V|-categories come in, is that our notion of (co)complete means “has all (co)limits of small diagrams” and small diagrams are |\V|-functors from small |\V|-categories.⁴ There are several other places, e.g. the notion of presheaf, where we implicitly depend on what we mean by “small |\V|-category”. So while we won’t usually be focused on small |\V|-categories, which |\V|-categories are small impacts the structure of the whole theory.

|\V|-functors

The formulation of |\V|-functors is straightforward. As mentioned before, I’ll only present the “large” version.

Formally, we can’t formulate a theory of just a |\V|-functor, but rather we need to formulate a theory of “a pair of |\V|-categories and a |\V|-functor between them”.

for each index type |\O_x|, an index type |\O’_{F_x}| and an index term, |s : \O_x \vdash \mathsf F_x(s) : \O’_{F_x}|, and
for each pair of index types, |\O_x| and |\O_y|, a linear term |s : \O_x, t : \O_y; f : \A_{yx}(t, s) \vdash \mathsf F_{yx}(f) : \A’_{F_yF_x}(\mathsf F_y(t), \mathsf F_x(s))|

satisfying $$\begin{gather} s : \O_x; \vdash \mathsf F_{xx}(\mathsf{id}_x) = \mathsf{id}'_{F_x}: \A'_{F_xF_x}(F_x(s), F_x(s))\qquad\text{and} \\ s : \O_x, u : \O_y, t : \O_z; g : \A_{zy}(t, u), f : \A_{yx}(u, s) \vdash \mathsf F_{zx}(g \circ_{xyz} f) = F_{zy}(g) \circ'_{F_xF_yF_z} F_{yx}(f) : \A'_{F_zF_x}(F_z(t), F_x(s)) \end{gather}$$

The assignment of |\O’_{F_x}| for |\O_x| is, again, purely metatheoretical. From within the theory, all we know is that we happen to have some index types named |\O_x| and |\O’_{F_x}| and some data relating them. The fact that there is some kind of mapping of one to the other is not part of the data.

Next, I’ll define |\V|-natural transformations . As before, what we’re really doing is defining |\V|-natural transformations as a model of a theory of “a pair of (large) |\V|-categories with a pair of |\V|-functors between them and a |\V|-natural transformation between those”. As before, I’ll use primes to indicate the types and terms of the second of each pair of subtheories. Unlike before, I’ll only mention what is added which is:

for each index type |\O_x|, a linear term |s : \O_x; \vdash \tau_x : \A’_{F’_xF_x}(\mathsf F’_x(s), \mathsf F_x(s))|

satisfying $$\begin{gather} s : \O_x, t : \O_y; f : \A_{yx}(t, s) \vdash \mathsf F'_{yx}(f) \circ'_{F_xF'_xF'_y} \tau_x = \tau_y \circ'_{F_xF_yF'_y} \mathsf F_{yx}(f) : \A'_{F'_yF_x}(\mathsf F'_y(t), \mathsf F_x(s)) \end{gather}$$

In practice, I’ll suppress the subscripts on all but index types as the rest are inferrable. This makes the above equation the much more readable $$\begin{gather} s : \O_x, t : \O_y; f : \A(t, s) \vdash \mathsf F'(f) \circ' \tau = \tau \circ' \mathsf F(f) : \A'(\mathsf F'(t), \mathsf F(s)) \end{gather}$$

|\V|-profunctors

Here’s where we need to depart from the usual story. In the usual story, a |\mathbf V|-enriched profunctor |P : \mathcal C \proarrow \mathcal D| is a |\mathbf V|-enriched functor |P : \mathcal C\otimes\mathcal D^{op}\to\mathbf V| (or, often, the opposite convention is used |P : \mathcal C^{op}\otimes\mathcal D \to \mathbf V|). There are many problems with this definition in our context.

Without symmetry, we have no definition of opposite category.
Without symmetry, the tensor product of |\mathbf V|-enriched categories doesn’t make sense.
|\mathbf V| is not itself a |\mathbf V|-enriched category, so it doesn’t make sense to talk about |\mathbf V|-enriched functors into it.
Even if it was, we’d need some way of converting between arrows of |\mathbf V| as a category and arrows of |\mathbf V| as a |\mathbf V|-enriched category.
The equation |P(g \circ f, h \circ k) = P(g, k) \circ P(f, h)| requires symmetry. (This is arguably 2 again.)

All of these problems are solved when |\mathbf V| is a symmetric monoidally closed category.

Alternatively, we can reformulate the notion of a |\V|-profunctor so that it works in our context and is equivalent to the usual one when it makes sense.⁵ To this end, at a low level a |\mathbf V|-enriched profunctor is a family of arrows $$\begin{gather}P : \mathcal C(t, s)\otimes\mathcal D(s', t') \to [P(s, s'), P(t, t')]\end{gather}$$ which satisfies $$\begin{gather}P(g \circ f, h \circ k)(p) = P(g, k)(P(f, h)(p))\end{gather}$$ in the internal language of a symmetric monoidally closed category among other laws. We can uncurry |P| to eliminate the need for closure, getting $$\begin{gather}P : \mathcal C(t, s)\otimes \mathcal D(s', t')\otimes P(s, s') \to P(t, t')\end{gather}$$ satisfying $$\begin{gather}P(g \circ f, h \circ k, p) = P(g, k, P(f, h, p))\end{gather}$$ We see that we’re always going to need to permute |f| and |h| past |k| unless we move the third argument to the second producing the nice $$\begin{gather}P : \mathcal C(t, s)\otimes P(s, s') \otimes \mathcal D(s', t') \to P(t, t')\end{gather}$$ and the law $$\begin{gather}P(g \circ f, p, h \circ k) = P(g, P(f, p, h), k)\end{gather}$$ which no longer requires symmetry. This is also where the order of the arguments of |\circ| drives the order of the arguments of |\V|-profunctors.

for each pair of index types |\O_x| and |\O’_{x’}|, a linear type |s : \O_x, t : \O’_{x’} \vdash \mathsf P_{x’x}(t, s)|, and
for each quadruple of index types |\O_x|, |\O_y|, |\O’_{x’}|, and |\O’_{y’}|, a linear term |s : \O_x, s’ : \O’_{x’}, t : \O_y, t’ : \O’_{y’}; f : \A_{yx}(t, s), p : \mathsf P_{xx’}(s, s’), h : \A’_{x’y’}(s’, t’) \vdash \mathsf P_{yxx’y’}(f, p, h) : \mathsf P_{yy’}(t, t’)|

satisfying $$\begin{align} & s : \O_x, s' : \O'_{x'}; p : \mathsf P(s, s') \vdash \mathsf P(\mathsf{id}, p, \mathsf{id}') = p : \mathsf P(s, s') \\ \\ & s : \O_x, s' : \O'_{x'}, u : \O_y, u' : \O'_{y'}, t : \O_z, t' : \O'_{z'}; \\ & g : \A(t, u), f : \A(u, s), p : \mathsf P(s, s'), h : \A'(s', u'), k : \A'(u', t') \\ \vdash\ & \mathsf P(g \circ f, p, h \circ' k) = \mathsf P(g, \mathsf P(f, p, h), k) : \mathsf P(t, t') \end{align}$$

This can also be equivalently presented as a pair of a left and a right action satisfying bimodule laws. We’ll make the following definitions |\mathsf P_l(f, p) = \mathsf P(f, p, \mathsf {id})| and |\mathsf P_r(p, h) = \mathsf P(\mathsf{id}, p ,h)|.

A |\V|-presheaf on |\mathcal C| is a |\V|-profunctor |P : \mathbb I \proarrow \mathcal C|. Similarly, a |\V|-copresheaf on |\mathcal C| is a |\V|-profunctor |P : \mathcal C \proarrow \mathbb I|.

Of course, we have the fact that the term $$\begin{gather} s : \O_x, t : \O_y, s' : \O_z, t' : \O_w; h : \A(t, s), g : \A(s, s'), f : \A(s', t') \vdash h \circ g \circ f : \A(t, t') \end{gather}$$ witnesses the interpretation of |\A| as a |\V|-profunctor |\mathcal C \proarrow \mathcal C| for any |\V|-category, |\mathcal C|, which we’ll call the hom |\V|-profunctor. More generally, given a |\V|-profunctor |P : \mathcal C \proarrow \mathcal D|, and |\V|-functors |F : \mathcal C’ \to \mathcal C| and |F’ : \mathcal D’ \to \mathcal D|, we have the |\V|-profunctor |P(F, F’) : \mathcal C’ \proarrow \mathcal D’| defined as $$\begin{gather} s : \O_x, s' : \O'_{x'}, t : \O_y, t' : \O'_{y'}; f : \A(t, s), p : \mathsf P(\mathsf F(s), \mathsf F'(s')), f' : \A'(s', t') \vdash \mathsf P(\mathsf F(f), p, \mathsf F'(f')) : \mathsf P(\mathsf F(t), \mathsf F'(t')) \end{gather}$$ In particular, we have the representable |\V|-profunctors when |P| is the hom |\V|-profunctor and either |F| or |F’| is the identity |\V|-functor, e.g. |\mathcal C(Id, F)| or |\mathcal C(F, Id)|.

Multimorphisms

There’s a natural notion of morphism of |\V|-profunctors which we could derive either via passing the notion of natural transformation of the bifunctorial view through the same reformulations as above, or by generalizing the notion of a bimodule homomorphism. This would produce a notion like: a |\V|-natural transformation from |\alpha : P \to Q| is a |\alpha : P(t, s) \to Q(t, s)| satisfying |\alpha(P(f, p, h)) = Q(f, \alpha(p), h)|. While there’s nothing wrong with this definition, it doesn’t quite meet our needs. One way to see this is that it would be nice to have a bicategory whose |0|-cells were |\V|-categories, |1|-cells |\V|-profunctors, and |2|-cells |\V|-natural transformations as above. The problem there isn’t the |\V|-natural transformations but the |1|-cells. In particular, we don’t have composition of |\V|-profunctors. In the analogy with bimodules, we don’t have tensor products so we can’t reduce multilinear maps to linear maps; therefore, linear maps don’t suffice, and we really want a notion of multilinear maps.

So, instead of a bicategory what we’ll have is a virtual bicategory (or, more generally, a virtual double category). A virtual bicategory is to a bicategory what a multicategory is to a monoidal category, i.e. multicategories are “virtual monoidal categories”. The only difference between a virtual bicategory and a multicategory is that instead of our multimorphisms having arbitrary lists of objects as their sources, our “objects” (|1|-cells) themselves have sources and targets (|0|-cells) and our multimorphisms (|2|-cells) have composable sequences of |1|-cells as their sources.

A |\V|-multimorphism from a composable sequence of |\V|-profunctors |P_1, \dots, P_n| to the |\V|-profunctor |Q| is a model of the theory consisting of the various necessary subtheories and:

a linear term, |s_0 : \O_{x_0}^0, \dots, s_n : \O_{x_n}^n; p_1 : \mathsf P_{x_0x_1}^1(s_0, s_1), \dots, p_n : \mathsf P_{x_{n-1}x_n}^n(s_{n-1}, s_n) \vdash \tau_{x_0\cdots x_n}(p_1, \dots, p_n) : \mathsf Q_{x_0x_n}(s_0, s_n)|

satisfying $$\begin{align} & t, s_0 : \O^0, \dots, s_n : \O^n; f : \A^0(t, s_0), p_1 : \mathsf P^1(s_0, s_1), \dots, p_n : \mathsf P^n(s_{n-1}, s_n) \\ \vdash\ & \tau(\mathsf P_l^0(f, p_1), \dots, p_n) = \mathsf Q_l(f, \tau(p_1, \dots, p_n)) : \mathsf Q(t, s_n) \\ \\ & s_0 : \O^0, \dots, s_n, s : \O^n; p_1 : \mathsf P^1(s_0, s_1), \dots, p_n : \mathsf P^n(s_{n-1}, s_n), f : \A^n(s_n, s) \\ \vdash\ & \tau(p_1, \dots, \mathsf P_r^n(p_n, f)) = \mathsf Q_r(\tau(p_1, \dots, p_n), f) : \mathsf Q(s_0, s) \\ \\ & s_0 : \O^0, \dots, s_n : \O^n; \\ & p_1 : \mathsf P^1(s_0, s_1), \dots, p_i : \mathsf P^i(s_{i-1}, s_i), f : \A^i(s_i, s_{i+1}), p_{i+1} : \mathsf P^{i+1}(s_i, s_{i+1}), \dots, p_n : \mathsf P^n(s_{n-1}, s_n) \\ \vdash\ & \tau(p_1, \dots, \mathsf P_r^i(p_i, f), p_{i+1}, \dots, p_n) = \tau(p_1, \dots, p_i, \mathsf P_l^{i+1}(f, p_{i+1}) \dots, p_n) : \mathsf Q(s_0, s_n) \end{align}$$ except for the |n=0| case in which case the only law is $$\begin{gather} t, s : \O^0; f : \A^0(t, s) \vdash \mathsf Q_l(f, \tau()) = \mathsf Q_r(\tau(), f) : \mathsf Q(t, s) \end{gather}$$

The laws involving the action of |\mathsf Q| are called external equivariance, while the remaining law is called internal equivariance. We’ll write |\V\mathbf{Prof}(P_1, \dots, P_n; Q)| for the set of |\V|-multimorphisms from the composable sequence of |\V|-profunctors |P_1, \dots, P_n| to the |\V|-profunctor |Q|.

As with multilinear maps, we can characterize composition via a universal property. Write |Q_1\diamond\cdots\diamond Q_n| for the composite |\V|-profunctor (when it exists) of the composable sequence |Q_1, \dots, Q_n|. We then have for any pair of composable sequences |R_1, \dots, R_m| and |S_1, \dots, S_k| which compose with |Q_1, \dots, Q_n|, $$\begin{gather} \V\mathbf{Prof}(R_1,\dots, R_m, Q_1 \diamond \cdots \diamond Q_n, S_1, \dots, S_k; -) \cong \V\mathbf{Prof}(R_1,\dots, R_m, Q_1, \dots, Q_n, S_1, \dots, S_k; -) \end{gather}$$ where the forward direction is induced by precomposition with a |\V|-multimorphism |Q_1, \dots, Q_n \to Q_1 \diamond \cdots \diamond Q_n|. A |\V|-multimorphism with this property is called opcartesian. The |n=0| case is particularly important and, for a |\V|-category |\mathcal C|, produces the unit |\V|-profunctor, |U_\mathcal C : \mathcal C \proarrow \mathcal C| as the composite of the empty sequence. When we have all composites, |\V\mathbf{Prof}| becomes an actual bicategory rather than a virtual bicategory. |\V\mathbf{Prof}| always has all units, namely the hom |\V|-profunctors. Much like we can define the tensor product of modules by quotienting the tensor product of their underlying abelian groups by internal equivariance, we will find that we can make composites when we have enough (well-behaved) colimits⁶.

Related to composites, we can talk about left/right closure of |\V\mathbf{Prof}|. In this case we have the natural isomorphisms: $$\begin{gather} \V\mathbf{Prof}(Q_1,\dots, Q_n, R; S) \cong \V\mathbf{Prof}(Q_1, \dots, Q_n; R \triangleright S) \\ \V\mathbf{Prof}(R, Q_1, \dots, Q_n; S) \cong \V\mathbf{Prof}(Q_1, \dots, Q_n;S \triangleleft R) \end{gather}$$ Like composites, this merely characterizes these constructs; they need not exist in general. These will be important when we talk about Yoneda and (co)limits in |\V|-categories.

A |\V|-natural transformation |\alpha : F \to G : \mathcal C \to \mathcal D| is the same as |\alpha\in\V\mathbf{Prof}(;\mathcal D(G, F))|.

Example Proof

Just as an example, let’s prove a basic fact about categories for arbitrary |\V|-categories. This will use an informal style.

The fact will be that full and faithful functors reflect isomorphisms. Let’s go through the typical proof for the ordinary category case.

Suppose we have an natural transformation |\varphi : \mathcal D(FA, FB) \to \mathcal C(A, B)| natural in |A| and |B| such that |\varphi| is an inverse to |F|, i.e. the action of the functor |F| on arrows. If |Ff \circ Fg = id| and |Fg \circ Ff = id|, then by the naturality of |\varphi|, |\varphi(id) = \varphi(Ff \circ id \circ Fg) = f \circ \varphi(id) \circ g| and similarly with |f| and |g| switched. We now just need to show that |\varphi(id) = id| but |id = F(id)|, so |\varphi(id) = \varphi(F(id)) = id|. |\square|

Now in the internal language. We’ll start with the theory of a |\V|-functor, so we have |\O|, |\O’|, |\A|, |\A’|, and |\mathsf F|. While the previous paragraph talks about a natural transformation, we can readily see that it’s really a multimorphism. In our case, it is a |\V|-multimorphism |\varphi| from |\A’(\mathsf F, \mathsf F)| to |\A|. Before we do that though, we need to show that |\mathsf F| itself is a |\V|-multimorphism. This corresponds to the naturality of the action on arrows of |F| which we took for granted in the previous paragraph. This is quickly verified: the external equivariance equations are just the functor law for composites. The additional data we have is two linear terms |\mathsf f| and |\mathsf g| such that |\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g) = \mathsf{id}| and |\mathsf F(\mathsf g) \circ \mathsf F(\mathsf f) = \mathsf{id}|. Also, |\varphi(\mathsf F(h)) = h|. The result follows through almost identically to the previous paragraph. |\varphi(\mathsf{id}) = \varphi(\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g)) = \varphi(\mathsf F(\mathsf f) \circ \mathsf{id} \circ \mathsf F(\mathsf g))|, we apply external equivariance twice to get |\mathsf f \circ \varphi(\mathsf{id}) \circ \mathsf g|. The functor law for |\mathsf{id}| gives |\varphi(\mathsf{id}) = \varphi(\mathsf F(\mathsf{id})) = \mathsf{id}|. A quick glance verifies that all these equations use their free variables linearly as required. |\square|

As a warning, in the above |\mathsf f| and |\mathsf g| are not free variables but constants, i.e. primitive linear terms. Thus there is no issue with an equation like |\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g) = \mathsf{id}| as both sides have no free variables.

This is a very basic result but, again, the payoff here is how boring and similar to the usual case this is. For contrast, the definition of an internal profunctor is given here. This definition is easier to connect to our notion of |\V|-presheaf, specifically a |\mathcal Self(\mathbf S)|-presheaf, than it is to the usual |\mathbf{Set}|-valued functor definition. While not hard, it would take me a bit of time to even formulate the above proposition, and a proof in terms of the explicit definitions would be hard to recognize as just the ordinary proof.

For fun, let’s figure out what the |\mathcal Const(\mathbf{Ab})| case of this result says explicitly. A |\mathcal Const(\mathbf{Ab})|-category is a ring, a |\mathcal Const(\mathbf{Ab})|-functor is a ring homomorphism, and a |\mathcal Const(\mathbf{Ab})|-profunctor is a bimodule. Let |R| and |S| be rings and |f : R \to S| be a ring homomorphism. An isomorphism in |R| viewed as a |\mathcal Const(\mathbf{Ab})|-category is just an invertible element. Every ring, |R|, is an |R|-|R|-bimodule. Given any |S|-|S|-bimodule |P|, we have an |R|-|R|-bimodule |f^*(P)| via restriction of scalars, i.e. |f^*(P)| has the same elements as |P| and for |p \in f^*(P)|, |rpr’ = f(r)pf(r’)|. In particular, |f| gives rise to a bimodule homomorphism, i.e. a linear function, |f : R \to f^*(S)| which corresponds to its action on arrows from the perspective of |f| as a |\mathcal Const(\mathbf{Ab})|-functor. If this linear transformation has an inverse, then the above result states that when |f(r)| is invertible so is |r|. So to restate this all in purely ring theoretic terms, given a ring homomorphism |f : R \to S| and an abelian group homomorphism |\varphi : S \to R| satisfying |\varphi(f(rst)) = r\varphi(f(s))t| and |\varphi(f(r)) = r|, then if |f(r)| is invertible so is |r|.

Indexed categories are equivalent to cloven fibrations and, if you have the Axiom of Choice, all fibrations can be cloven. Indexed categories can be viewed as presentations of fibrations.↩︎
This suggests that we could define a small |\V|-category |\mathcal C \otimes \mathcal D| where |\mathcal C| and |\mathcal D| are small |\V|-categories. Start formulating a definition of such a |\V|-category. You will get stuck. Where? Why? This implies that the (ordinary, or better, 2-)category of small |\V|-categories does not have a monoidal product with |\mathbb I| as unit in general.↩︎
With a good understanding of what a class is, it’s clear that it doesn’t even make sense to have a proper class be an object. In frameworks with an explicit notion of ”class”, this is often manifested by saying that a class that is an element of another class is a set (and thus not a proper class).↩︎
This suggests that it might be interesting to consider categories that are (co)complete with respect to this monoid notion of “small”. I don’t think I’ve ever seen a study of such categories. (Co)limits of monoids are not trivial.↩︎
This is one of the main things I like about working in weak foundations. It forces you to come up with better definitions that make it clear what is and is not important and eliminates coincidences. Of course, it also produces definitions and theorems that are inherently more general too.↩︎
This connection isn’t much of a surprise as the tensor product of modules is exactly the (small) |\mathcal Const(\mathbf{Ab})| case of this.↩︎

Internal Language of Indexed Monoidal Categories

2020-07-05 19:00:56-07:00

Introduction

This is part 2 in a series. See the previous part about internal languages for (non-indexed) monoidal categories. The main application I have in mind – enriching in indexed monoidal categories – is covered in the next post.

As Jean Bénabou pointed out in Fibered Categories and the Foundations of Naive Category Theory (PDF) notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. One of the more noticeable places early on is in the definition of a natural transformation as a family of arrows. However, even in the definition of category, identities and compositions are families of functions, or, in the enriched case, arrows of |\mathbf V|. From a foundational perspective, one place where this gets really in-your-face is when trying to formalize the notion of (co)completeness. It is straightforward to make a first-order theory of a finitely complete category, e.g. this one. For arbitrary products and thus limits, we need to talk about families of objects. To formalize the usual meaning of this in a first-order theory would require attaching an entire first-order theory of sets, e.g. ZFC, to our notion of complete category. If your goals are of a foundational nature like Bénabou’s were, then this is unsatisfactory. Instead, we can abstract out what we need of the notion of “family”. The result turns out to be equivalent to the notion of a fibration.

My motivations here are not foundational but leaving the notion of “family” entirely meta-theoretical means not being able to talk about it except in the semantics. Bénabou’s comment suggests that at the semantic level we want not just a monoidal category, but a fibration of monoidal categories¹. At the syntactic level, it suggests that there should be a built-in notion of “family” in our language. We accomplish both of these goals by formulating the internal language of an indexed monoidal category.

As a benefit, we can generalize to other notions of “family” than set-indexed families. We’ll clearly be able to formulate the notion of an enriched category. It’s also clear that we’ll be able to formulate the notion of an indexed category. Of course, we’ll also be able to formulate the notion of a category that is both enriched and indexed which includes the important special case of an internal category. We can also consider cases with trivial indexing which, in the unenriched case, will give us monoids, and in the |\mathbf{Ab}|-enriched case will give us rings.

Indexed Monoidal Categories

Following Shulman’s Enriched indexed categories, let |\mathbf{S}| be a category with a cartesian monoidal structure, i.e. finite products. Then an |\mathbf{S}|-indexed monoidal category is simply a pseudofunctor |\V : \mathbf{S}^{op} \to \mathbf{MonCat}|. A pseudofunctor is like a functor except that the functor laws only hold up to isomorphism, e.g. |\V(id)\cong id|. |\mathbf{MonCat}| is the |2|-category of monoidal categories, strong monoidal functors ², and monoidal natural transformations. We’ll write |\V(X)| as |\V^X| and |\V(f)| as |f^*|. We’ll never have multiple relevant indexed monoidal categories so this notation will never be ambiguous. We’ll call the categories |\V^X| fiber categories and the functors |f^*| reindexing functors. The cartesian monoidal structure on |\mathbf S| becomes relevant when we want to equip the total category, |\int\V|, (computed via the Grothendieck construction in the usual way) with a monoidal structure. In particular, the tensor product of |A \in \V^X| and |B \in \V^Y| is an object |A\otimes B \in \V^{X\times Y}| calculated as |\pi_1^*(A) \otimes_{X\times Y} \pi_2^*(B)| where |\otimes_{X\times Y}| is the monoidal tensor in |\V^{X\times Y}|. The unit, |I|, is the unit |I_1 \in \V^1|.

The two main examples are: |\mathcal Fam(\mathbf V)| where |\mathbf V| is a (non-indexed) monoidal category and |\mathcal Self(\mathbf S)| where |\mathbf S| is a category with finite limits. |\mathcal Fam(\mathbf V)| is a |\mathbf{Set}|-indexed monoidal category with |\mathcal Fam(\mathbf V)^X| defined as the category of |X|-indexed families of objects of |\mathbf V|, families of arrows between them, and an index-wise monoidal product. We can identify |\mathcal Fam(\mathbf V)^X| with the functor category |[DX, \mathbf V]| where |D : \mathbf{Set} \to \mathbf{cat}| takes a set |X| to a small discrete category. Enriching in indexed monoidal category |\mathcal Fam(\mathbf V)| will be equivalent to enriching in the non-indexed monoidal category |\mathbf V|, i.e. the usual notion of enrichment in a monoidal category. |\mathcal Self(\mathbf S)| is an |\mathbf S|-indexed monoidal category and |\mathcal Self(\mathbf S)^X| is the slice category |\mathbf S/X| with its cartesian monoidal structure. |f^*| is the pullback functor. |\mathcal Self(\mathbf S)|-enriched categories are categories internal to |\mathbf S|. A third example we’ll find interesting is |\mathcal Const(\mathbf V)| for a (non-indexed) monoidal category, |\mathbf V|, which is a |\mathbf 1|-indexed monoidal category, which corresponds to an object of |\mathbf{MonCat}|, namely |\mathbf V|.

The Internal Language of Indexed Monoidal Categories

This builds on the internal language of a monoidal category described in the previous post. We’ll again have linear types and linear terms which will be interpreted into objects and arrows in the fiber categories. To indicate the dependence on the indexing, we’ll use two contexts: |\Gamma| will be an index context containing index types and index variables, which will be interpreted into objects and arrows of |\mathbf S|, while |\Delta|, the linear context, will contain linear types and linear variables as before except now linear types will be able to depend on index terms. So we’ll have judgements that look like: $$\begin{gather} \Gamma \vdash A \quad \text{and} \quad \Gamma; \Delta \vdash E : B \end{gather}$$ The former indicates that |A| is a linear type indexed by the index variables of |\Gamma|. The latter states that |E| is a linear term of linear type |B| in the linear context |\Delta| indexed by the index variables of |\Gamma|. We’ll also have judgements for index types and index terms: $$\begin{gather} \vdash X : \square \quad \text{and} \quad \Gamma \vdash E : Y \end{gather}$$ The former asserts that |X| is an index type. The latter asserts that |E| is an index term of index type |Y| in the index context |\Gamma|.

Since each fiber category is monoidal, we’ll have all the rules from before just with an extra |\Gamma| hanging around. Since our indexing category, |\mathbf S|, is also monoidal, we’ll also have copies of these rules at the level of indexes. However, since |\mathbf S| is cartesian monoidal, we’ll also have the structural rules of weakening, exchange, and contraction for index terms and types. To emphasize the cartesian monoidal structure of indexes, I’ll use the more traditional Cartesian product and tuple notation: |\times| and |(E_1, \dots, E_n)|. This notation allows a bit more uniformity as the |n=0| case can be notated by |()|.

The only really new rule is the rule that allows us to move linear types and terms from one index context to another, i.e. the rule that would correspond to applying a reindexing functor. I call this rule Reindex and, like Cut, it will be witnessed by substitution. Like Cut, it will also be a rule which we can eliminate. At the semantic level, this elimination corresponds to the fact that to understand the interpretation of any particular (linear) term, we can first reindex everything, i.e. all the interpretations of all subterms, into the same fiber category and then we can work entirely within that one fiber category. The Reindex rule is: $$\begin{gather} \dfrac{\Gamma \vdash E : X \quad \Gamma', x : X; a_1 : A_1, \dots, a_n : A_n \vdash E' : B}{\Gamma',\Gamma; a_1 : A_1[E/x], \dots, a_n : A_n[E/x] \vdash E'[E/x] : B[E/x]}\text{Reindex} \end{gather}$$

By representing reindexing by syntactic substitution, we’re requiring the semantics of (linear) type and term formation operations to be respected by reindexing functors. This is exactly the right thing to do as the appropriate notion of, say, indexed coproducts, which would correspond to sum types, is coproducts in each fiber category which are preserved by reindexing functors.

Below I provide a listing of rules and equations.

Relation to Parameterized and Dependent Types

None of this section is necessary for anything else.

This notion of (linear) types and terms being indexed by other types and terms is reminiscent of parametric types or dependent types. The machinery of indexed/fibered categories is also commonly used in the categorical semantics of parameterized and dependent types. However, there are important differences between those cases and our case.

In the case of parameterized types, we have types and terms that depend on other types. In this case, we have kinds, which are “types of types”, which classify types which in turn classify terms. If we try to set up an analogy to our situation, index types would correspond to kinds and index terms would correspond to types. The most natural thing to continue would be to have linear terms correspond to terms, but we start to see the problem. Linear terms are classified by linear types, but linear types are not index terms. They don’t even induce index terms. In the categorical semantics of parameterized types, this identification of types with (certain) expressions classified by kinds is handled by the notion of a generic object. A generic object corresponds to the kind |\mathsf{Type}| (what Haskell calls *). The assumption of a generic object is a rather strong assumption and one that none of our example indexed monoidal categories support in general.

A similar issue occurs when we try to make an analogy to dependent types. The defining feature of a dependent type system is that types can depend on terms. The problem with such a potential analogy is that linear types and terms do not induce index types and terms. A nice way to model the semantics of dependent types is the notion of a comprehension category. This, however, is additional structure beyond what we are given by an indexed monoidal category. However, comprehension categories will implicitly come up later when we talk about adding |\mathbf S|-indexed (co)products. These comprehension categories will share the same index category as our indexed monoidal categories, namely |\mathbf S|, but will have different total categories. Essentially, a comprehension category shows how objects (and arrows) of a total category can be represented in the index category. We can then talk about having (co)products in a different total category with same index category with respect to those objects picked out by the comprehension category. We get dependent types in the case where the total categories are the same. (More precisely, the fibrations are the same.) Sure enough, we will see that when |\mathcal Self(\mathbf S)| has |\mathbf S|-indexed products, then |\mathbf S| is, indeed, a model of a dependent type theory. In particular, it is locally cartesian closed.

Rules for an Indexed Monoidal Category

$$\begin{gather} \dfrac{\vdash X : \square}{x : X \vdash x : X}\text{IxAx} \qquad \dfrac{\Gamma\vdash E : X \quad \Gamma', x : X \vdash E': Y}{\Gamma',\Gamma \vdash E'[E/x] : Y}\text{IxCut} \\ \\ \dfrac{\vdash Y : \square \quad \Gamma\vdash E : X}{\Gamma, y : Y \vdash E : X}\text{Weakening},\ y\text{ fresh} \qquad \dfrac{\Gamma, x : X, y : Y, \Gamma' \vdash E : Z}{\Gamma, y : Y, x : X, \Gamma' \vdash E : Z}\text{Exchange} \qquad \dfrac{\Gamma, x : X, y : Y \vdash E : Z}{\Gamma, x : X \vdash E[x/y] : Z}\text{Contraction} \\ \\ \dfrac{\mathsf X : \mathsf{IxType}}{\vdash \mathsf X : \square}\text{PrimIxType} \qquad \dfrac{\vdash X_1 : \square \quad \cdots \quad \vdash X_n : \square}{\vdash (X_1, \dots, X_n) : \square}{\times_n}\text{F} \\ \\ \dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \mathsf F : (X_1, \dots, X_n) \to Y}{\Gamma \vdash \mathsf F(E_1, \dots, E_n) : Y}\text{PrimIxTerm} \\ \\ \dfrac{\Gamma_1 \vdash E_1 : X_1 \quad \cdots \quad \Gamma_n \vdash E_n : X_n}{\Gamma_1,\dots,\Gamma_n \vdash (E_1, \dots, E_n) : (X_1, \dots, X_n)}{\times_n}\text{I} \qquad \dfrac{\Gamma \vdash E : (X_1, \dots, X_n) \quad x_1 : X_1, \dots, x_n : X_n, \Gamma' \vdash E' : Y}{\Gamma, \Gamma' \vdash \mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E' : Y}{\times_n}\text{E} \\ \\ \dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \mathsf A : (X_1, \dots, X_n) \to \mathsf{Type}}{\Gamma \vdash \mathsf A(E_1, \dots, E_n)}\text{PrimType} \\ \\ \dfrac{\Gamma \vdash A}{\Gamma; a : A \vdash a : A}\text{Ax} \qquad \dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n \quad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E: B}{\Gamma; \Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash E[E_1/a_1, \dots, E_n/a_n] : B}\text{Cut} \\ \\ \dfrac{\Gamma \vdash E : X \quad \Gamma', x : X; a_1 : A_1, \dots, a_n : A_n \vdash E' : B}{\Gamma',\Gamma; a_1 : A_1[E/x], \dots, a_n : A_n[E/x] \vdash E'[E/x] : B[E/x]}\text{Reindex} \\ \\ \dfrac{}{\Gamma\vdash I}I\text{F} \qquad \dfrac{\Gamma\vdash A_1 \quad \cdots \quad \Gamma \vdash A_n}{\Gamma \vdash A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{F}, n \geq 1 \\ \\ \dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \Gamma; \Delta_1 \vdash E_1' : A_1 \quad \cdots \quad \Gamma; \Delta_m \vdash E_m' : A_m \quad \mathsf f : (x_1 : X_1, \dots, x_n : X_n; A_1, \dots, A_m) \to B}{\Gamma; \Delta_1, \dots, \Delta_m \vdash \mathsf f(E_1, \dots, E_n; E_1', \dots, E_m') : B}\text{PrimTerm} \\ \\ \dfrac{}{\Gamma; \vdash * : I}I\text{I} \qquad \dfrac{\Gamma; \Delta \vdash E : I \quad \Gamma; \Delta_l, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ *\ \mathsf{in}\ E' : B}I\text{E} \\ \\ \dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n}{\Gamma; \Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I} \\ \\ \dfrac{\Gamma; \Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ (a_1 \otimes \cdots \otimes a_n)\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1 \end{gather}$$

Equations

$$\begin{gather} \dfrac{\Gamma_1 \vdash E_1 : X_1 \quad \cdots \quad \Gamma_n \vdash E_n : X_n \qquad x_1 : X_1, \dots, x_n : X_n, \Gamma \vdash E : Y}{\Gamma_1, \dots, \Gamma_n, \Gamma \vdash (\mathsf{match}\ (E_1, \dots, E_n)\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E) = E[E_1/x_1, \dots, E_n/x_n] : Y}{\times_n}\beta \\ \\ \dfrac{\Gamma \vdash E : (X_1, \dots, X_n) \qquad \Gamma, x : (X_1, \dots, X_n) \vdash E' : B}{\Gamma \vdash E'[E/x] = \mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E'[(x_1, \dots, x_n)/x] : B}{\times_n}\eta \\ \\ \dfrac{\Gamma \vdash E_1 : (X_1, \dots, X_n) \qquad x_1 : X_1, \dots, x_n : X_n \vdash E_2 : Y \quad y : Y \vdash E_3 : Z}{\Gamma \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E_3[E_2/y]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E_2)/y] : Z}{\times_n}\text{CC} \\ \\ \dfrac{\Gamma;\vdash E : B}{\Gamma;\vdash (\mathsf{match}\ *\ \mathsf{as}\ *\ \mathsf{in}\ E) = E : B}{*}\beta \qquad \dfrac{\Gamma; \Delta \vdash E : I \qquad \Gamma; \Delta_l, a : I, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ *\ \mathsf{in}\ E'[{*}/a]) : B}{*}\eta \\ \\ \dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n \qquad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n, \Delta_r : A_n \vdash E : B}{\Gamma; \Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes_n}\beta \\ \\ \dfrac{\Gamma; \Delta \vdash E : A_1 \otimes \cdots \otimes A_n \qquad \Gamma; \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash E'[E/a] = \mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a] : B}{\otimes_n}\eta \\ \\ \dfrac{\Gamma; \Delta \vdash E_1 : I \qquad \Gamma; \Delta_l, \Delta_r \vdash E_2 : B \qquad \Gamma; b : B \vdash E_3 : C}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ *\ \mathsf{in}\ E_3[E_2/b]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ *\ \mathsf{in}\ E_2)/b] : C}{*}\text{CC} \\ \\ \dfrac{\Gamma; \Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \qquad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B \qquad \Gamma; b : B \vdash E_3 : C}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_3[E_2/b]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \dots \otimes a_n\ \mathsf{in}\ E_2)/b] : C}{\otimes_n}\text{CC} \end{gather}$$

|\mathsf X : \mathsf{IxType}| means |\mathsf X| is a primitive index type in the signature. |\mathsf A : (X_1, \dots, X_n) \to \mathsf{Type}| means that |\mathsf A| is a primitive linear type in the signature. |\mathsf F : (X_1, \dots, X_n) \to Y| and |\mathsf f : (x_1 : X_1, \dots, x_n : X_n; A_1, \dots, A_m) \to B| mean that |\mathsf F| and |\mathsf f| are assigned these types in the signature. In the latter case, it is assumed that |x_1 : X_1, \dots, x_n : X_n \vdash A_i| for |i = 1, \dots, m| and |x_1 : X_1, \dots, x_n : X_n \vdash B|. Alternatively, these assumptions could be added as additional hypotheses to the PrimTerm rule. Generally, every |x_i| will be used in some |A_j| or in |B|, though this isn’t technically required.

As before, I did not write the usual laws for equality (reflexivity and indiscernability of identicals) but they also should be included.

See the discussion in the previous part about the commuting conversion (|\text{CC}|) rules.

A theory in this language is free to introduce additional index types, operations on indexes, linear types, and linear operations.

Interpretation into an |\mathbf S|-indexed monoidal category

Fix an |\mathbf S|-indexed monoidal category |\V|. Write |\den{-}| for the (overloaded) interpretation function. Its value on primitive operations is left as a parameter.

Associators for the semantic |\times| and |\otimes| will be omitted below.

Interpretation of Index Types

$$\begin{align} \vdash X : \square \implies & \den{X} \in \mathsf{Ob}(\mathbf S) \\ \\ \den{\Gamma} = & \prod_{i=1}^n \den{X_i}\text{ where } \Gamma = x_1 : X_1, \dots, x_n : X_n \\ \den{(X_1, \dots, X_n)} = & \prod_{i=1}^n \den{X_i} \end{align}$$

Interpretation of Index Terms

$$\begin{align} \Gamma \vdash E : X \implies & \den{E} \in \mathbf{S}(\den{\Gamma}, \den{X}) \\ \\ \den{x_i} =\, & \pi_i \text{ where } x_1 : X_1, \dots, x_n : X_n \vdash x_i : X_i \\ \den{(E_1, \dots, E_n)} =\, & \den{E_1} \times \cdots \times \den{E_n} \\ \den{\mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E'} =\, & \den{E'} \circ (\den{E} \times id_{\den{\Gamma'}}) \text{ where } \Gamma' \vdash E' : Y \\ \den{\mathsf F(E_1, \dots, E_n)} =\, & \den{\mathsf F} \circ (\den{E_1} \times \cdots \times \den{E_n}) \\ & \quad \text{ where }\mathsf F\text{ is an appropriately typed index operation} \end{align}$$

Witnesses of Index Derivations

IxAx is witnessed by identity, and IxCut by composition in |\mathbf S|. Weakening is witnessed by projection. Exchange and Contraction are witnessed by expressions that can be built from projections and tupling. This is very standard.

Interpretation of Linear Types

$$\begin{align} \Gamma \vdash A \implies & \den{A} \in \mathsf{Ob}(\V^{\den{\Gamma}}) \\ \\ \den{\Delta} =\, & \den{A_1}\otimes_{\den{\Gamma}}\cdots\otimes_{\den{\Gamma}}\den{A_n} \text{ where } \Delta = a_1 : A_1, \dots, a_n : A_n \\ \den{I} =\, & I_{\den{\Gamma}}\text{ where } \Gamma \vdash I \\ \den{A_1 \otimes \cdots \otimes A_n} =\, & \den{A_1}\otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{A_n}\text{ where } \Gamma \vdash A_i \\ \den{\mathsf A(E_1, \dots, E_n)} =\, & \langle \den{E_1}, \dots, \den{E_n}\rangle^*(\den{\mathsf A}) \\ & \quad \text{ where }\mathsf A\text{ is an appropriately typed linear type operation} \end{align}$$

Interpretation of Linear Terms

$$\begin{align} \Gamma; \Delta \vdash E : A \implies & \den{E} \in \V^{\den{\Gamma}}(\den{\Delta}, \den{A}) \\ \\ \den{a} =\, & id_{\den{A}} \text{ where } a : A \\ \den{*} =\, & id_{I_{\den{\Gamma}}} \text{ where } \Gamma;\vdash * : I \\ \den{E_1 \otimes \cdots \otimes E_n} =\, & \den{E_1} \otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{E_n} \text{ where } \Gamma; \Delta_i \vdash E_i : A_i \\ \den{\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'} =\, & \den{E'} \circ (id_{\den{\Delta_l}} \otimes_{\den{\Gamma}} (\lambda_{\den{\Delta_r}} \circ (\den{E} \otimes_{\den{\Gamma}} id_{\den{\Delta_r}}))) \\ \den{\mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E'} =\, & \den{E'} \circ (id_{\den{\Delta_l}} \otimes_{\den{\Gamma}} \den{E} \otimes_{\den{\Gamma}} id_{\den{\Delta_r}}) \\ \den{\mathsf f(E_1, \dots, E_n; E_1', \dots, E_n')} =\, & \langle \den{E_1}, \dots, \den{E_n}\rangle^*(\den{\mathsf f}) \circ (\den{E_1'} \otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{E_n'}) \\ & \quad \text{ where }\mathsf f\text{ is an appropriately typed linear operation} \end{align}$$

Witnesses of Linear Derivations

As with the index derivations, Ax is witnessed by the identity, in this case in |\V^{\den{\Gamma}}|.

|\den{E[E_1/a_1,,E_n/a_n]} = \den{E} \circ (\den{E_1}\otimes\cdots\otimes\den{E_n})| witnesses Cut.

Roughly speaking, Reindex is witnessed by |\den{E}^*(\den{E’})|. If we were content to restrict ourselves to semantics in |\mathbf S|-indexed monoidal categories witnessed by functors, as opposed to pseudofunctors, into strict monoidal categories, then this would suffice. For an arbitrary |\mathbf S|-indexed monoidal category, we can’t be sure that the naive interpretation of |A[E/x][E’/y]|, i.e. |\den{E’}^*(\den{E}^*(\den{A}))|, which we’d get from two applications of the Reindex rule, is the same as the interpretation of |A[E[E’/y]/x]|, i.e. |\den{E \circ E’}^*(\den{A})|, which we’d get from IxCut followed by Reindex. On the other hand, |A[E/x][E’/y] = A[E[E’/y]/x]| is simply true syntactically by the definition of substitution (which I have not provided but is the obvious, usual thing). There are similar issues for (meta-)equations like |I[E/x] = I| and |(A_1 \otimes A_2)[E/x] = A_1[E/x] \otimes A_2[E/x]|.

The solution is that we essentially use a normal form where we eliminate the uses of Reindex. These normal form derivations will be reached by rewrites such as: $$\begin{gather} \dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \qquad \dfrac{\dfrac{\mathcal D_1}{\Gamma, x : X; \Delta_1 \vdash E_1 : A_1} \quad \cdots \quad \dfrac{\mathcal D_n}{\Gamma, x : X; \Delta_n \vdash E_n : A_n}} {\Gamma, x : X; \Delta_1, \dots, \Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}} {\Gamma, \Gamma'; \Delta_1[E/x], \dots, \Delta_n[E/x] \vdash E_1[E/x] \otimes \cdots \otimes E_n[E/x] : A_1[E/x] \otimes \cdots \otimes A_n[E/x]} \\ \Downarrow \\ \dfrac{\dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \quad \dfrac{\mathcal D_1}{\Gamma, x : X; \Delta_1 \vdash E_1 : A_1}} {\Gamma, \Gamma'; \Delta_1[E/x] \vdash E_1[E/x] : A_1[E/x]} \quad \cdots \quad \dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \quad \dfrac{\mathcal D_n}{\Gamma, x : X; \Delta_n \vdash E_n : A_n}} {\Gamma, \Gamma'; \Delta_n[E/x] \vdash E_n[E/x] : A_n[E/x]}} {\Gamma, \Gamma'; \Delta_1[E/x], \dots, \Delta_n[E/x] \vdash E_1[E/x] \otimes \cdots \otimes E_n[E/x] : A_1[E/x] \otimes \cdots \otimes A_n[E/x]} \end{gather}$$

Semantically, this is witnessed by the strong monoidal structure, i.e. |\den{E}^*(\den{E_1} \otimes \cdots \otimes \den{E_n}) \cong \den{E}^*(\den{E_1}) \otimes \cdots \otimes \den{E}^*(\den{E_n})|. We need such rewrites for all (linear) rules that can immediately precede Reindex in a derivation. For |I\text{I}|, |I\text{E}|, |\otimes_n\text{E}|, and, as we’ve just seen, |\otimes_n\text{I}|, these rewrites are witnessed by |\den{E}^*| being a strong monoidal functor. The rewrites for |\text{Ax}| and |\text{Cut}| are witnessed by functorality of |\den{E}^*| and also strong monoidality for Cut. Finally, two adjacent uses of Reindex become an IxCut and a Reindex and are witnessed by the pseudofunctoriality of |(\_)^*|. (While we’re normalizing, we may as well eliminate Cut and IxCut as well.)

As the previous post alludes, monoidal structure is more than we need. If we pursue the generalizations described there in this indexed context, we eventually end up at augmented virtual double categories or virtual equipment.↩︎
The terminology here is a mess. Leinster calls strong monoidal functors “weak”. “Strong” also refers to tensorial strength, and it’s quite possible to have a “strong lax monoidal functor”. (In fact, this is what applicative functors are usually described as, though a strong lax closed functor would be a more direct connection.) Or the functors we’re talking about which are not-strong strong monoidal functors…↩︎

Internal Language of a Monoidal Category

2020-05-22 01:46:31-07:00

Introduction

This is the first post in a series of posts on doing enriched indexed category theory and using the notion of an internal language to make this look relatively mundane. The internal language aspects are useful for other purposes too, as will be illustrated in this post, for example. This is related to the post Category Theory, Syntactically. In particular, it can be considered half-way between the unary theories and the finite product theories described there.

First in this series – this post – covers the internal language of a monoidal category. This is fairly straightforward, but it already provides some use. For example, the category of endofunctors on a category is a strict monoidal category, and so we can take a different perspective on natural transformations. This will also motivate the notions of a (virtual) bicategory and an actegory. Throughout this post, I’ll give a fairly worked example of turning some categorical content into rules of a type-/proof-theory.

The second post will add indexing to the notion of monoidal category and introduce the very powerful and useful notion of an indexed monoidal category.

The third post will formulate the notion of categories enriched in an indexed monoidal category and give the definitions which don’t require any additional assumptions.

The fourth post will introduce the notion and internal language for an indexed cosmos. Normally, when we do enriched category theory, we want the category into which we’re enriching to not be just a monoidal category but a cosmos. This provides many additional properties. An indexed cosmos is just the analogue of that for indexed monoidal categories.

The fifth post will then formulate categorical concepts for our enriched indexed categories that require some or all of these additional properties provided by an indexed cosmos.

At some point, there will be a post on virtual double categories as they (or, even better, augmented virtual double categories) are what will really be behind the notion of enriched indexed categories we’ll define. Basically, we’ll secretly be spelling out a specific instance of the |\mathsf{Mod}| construction.

The Internal Language of Monoidal Categories

Fix a monoidal category called |\V|.

The internal language of a monoidal category is quite simple to describe¹. We’ll write terms in context. The term $$\begin{align} a_1 : A_1, \dots, a_n : A_n \vdash E : B\end{align}$$ will represent an arrow |A_1 \otimes \cdots \otimes A_n \to B| in |\V|. The |n = 0| case will be represented by omitting the context and will correspond to an arrow from the unit, |I|. However, there’s a catch. The term |E| must use all the variables |a_1, \dots, a_n| exactly once and in the order that they are listed in the context. (The |\mathsf{match}| construct will make this more complicated. Ultimately, a one-dimensional syntax isn’t that well suited to this situation.) For example, $$\begin{align}a_1 : A_1, a_2: A_2 \vdash \mathsf f(a_2, a_1) : B, \quad a : A_1 \vdash \mathsf g(a, a) : B, \quad \text{and} \quad a : A \vdash \mathsf b : B\end{align}$$ are all invalid. We’ll call the context the linear context, consisting of linear variables with linear types, which we’ll usually represent with the metavariable |\Delta|. The naming comes from the connections to ordered linear logic.

Substituting for the linear variables, written |E[E_1/a_1,\dots,E_n/a_n]|, corresponds to the composition |E \circ (E_1 \otimes \cdots \otimes E_n)|. You can work out what associativity and unit laws of composition would look like. It should be noted, though, that this is a meta-theorem. Substitution is defined in the typical, syntactic way, and we’d need to prove that for every term the interpretation of the result of substituting into that term is equal to the composition of the interpretations.

If we replaced arrows |A_1 \otimes \cdots \otimes A_n \to B| in a monoidal category with multiarrows |(A_1, \dots, A_n) \to B| in a multicategory², then we’d be done. However, monoidal categories correspond to representable multicategories. If we write, |\mathcal C(A_1, \dots, A_n; B)| for the set of multiarrows |(A_1, \dots, A_n) \to B| in the multicatgory |\mathcal C|, then |\mathcal C| being representable means we have $$\begin{align} \mathcal C(A_1, \dots, A_m, B_1, \dots, B_n, C_1, \dots, C_p; D) \cong \mathcal C(A_1, \dots, A_m, B_1 \otimes \cdots \otimes B_n, C_1, \dots, C_p; D) \end{align}$$ natural in |D| and multinatural in the |A_i| and |C_i|. This implies that every |n|-ary arrow is equivalent to a unary arrow.

Since it’s useful to know, I’ll go into some detail on how we derive natural-deduction-style rules from this categorical data. First, we note that the above natural isomorphism (at least when |m=0| and |p=0|) has the form of a universal property using representability and, specifically, is a mapping-out property. That is, we are saying that the functor |\mathcal C(B_1, \dots, B_n; \_)| is represented by |B_1 \otimes \cdots \otimes B_n|. Let’s call the left to right direction of the above natural isomorphism |\varphi|. While this universal property can, of course, be represented by the natural isomorphism as above, it can also be equivalently represented by a universal element. Namely, one particularly notable choice for |D| is |B_1 \otimes \cdots \otimes B_n| itself, at which point we can consider |\eta = \varphi^{-1}(id_{B_1 \otimes \cdots \otimes B_n}) : (B_1, \dots, B_n) \to B_1 \otimes \cdots \otimes B_n|. By using naturality of |\varphi^{-1}|, we can easily show that |\varphi^{-1}(f) = f \circ \eta|. To witness the fact that |\varphi^{-1}(\varphi(f)) = f|, we need |\varphi(f) \circ \eta = f|. We’ve now shown that a natural transformation |\varphi| as above and an element (in this case a multiarrow) |\eta| which satisfy |\varphi(\eta) = id| and |\varphi(f) \circ \eta = f| is equivalent to the natural isomorphism above.

To start translating this to rules, we look at |\eta| first. A direct translation would be to say we have the rule: $$\begin{align} \dfrac{}{a_1 : A_1, \dots, a_n : A_n \vdash \eta : A_1 \otimes \cdots \otimes A_n} \end{align}$$ This is unnatural because it treats |\eta| like a primitive open term. This also means that to use |\eta|, we’d need to use the Cut rule (which corresponds to substitution/composition) which would stymie Cut elimination. The solution is to mix a use of Cut into the rule itself producing the term |\eta[E_1/a_1, \dots, E_n/a_n]| which I’ll write more perspicuously as |E_1 \otimes \cdots \otimes E_n|. This gives rise to the rule: $$\begin{align} \dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n}{\Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I} \end{align}$$ Of course, we could restrict to just the |n=0| and |n=2| cases if we wanted. I’ll write the |n=0| case of |a_1\otimes\cdots\otimes a_n| as |*| in both the term and pattern cases. As the label for the rule, |\otimes_n I|, suggests, this is an introduction rule for |\otimes|.

The rule corresponding to |\varphi| takes less massaging. We’ll do the same trick of incorporating a Cut (Where?), but this makes a fairly minor difference in this case. The rule we get is: $$\begin{align} \dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1 \end{align}$$ Again, as the label suggests, this is an elimination rule for |\otimes|.

We then need equalities for the two equations and a third equality for naturality of |\varphi|. $$\begin{gather} \dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E : B}{\Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes}\beta \\ \\ \dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a]) : B}{\otimes}\eta \end{gather}$$ The first equation corresponds to an introduction immediately followed by an elimination which is the form of a |\beta|-rule. The second is an elimination followed by an introduction and gives rise to an |\eta|-rule.

Since we are considering a mapping-out property, the naturality equations gives rise to what is called a commuting conversion: $$\begin{align} \dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B \quad b : B \vdash E_3 : C}{\Delta_l, \Delta, \Delta_r \vdash E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \dots \otimes a_n\ \mathsf{in}\ E_2)/b] = (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_3[E_2/b] : C}{\otimes}\text{CC} \end{align}$$

This rule is pretty bad from the perspective of structural proof theory. If I give you terms of the forms of the left and right hand sides of the equation, it can be quite difficult and potentially ambiguous to figure out what terms |E_2| and |E_3| should be. While this particular example can’t happen in our case, imagine if |E_3| did not use the variable |b|. This technically isn’t a problem for building a derivation or verifying it as you can just require that when someone invokes this rule they must specify what all the meta-variables are. Still, this causes difficulties for proof search and normalization and proofs of meta-theorems.

The solution is straightforward enough. We simply instantiate |E_3| with a concrete term. The not-so-straightforward part is knowing which terms we need. In this case, the main rule we need (and only one if we interpret the above as covering the |n = 0| case) is the following: $$\begin{align} \dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_m \quad \Delta_l, a_1 : A_1, \dots, a_m : A_m, \Delta_r \vdash E_2 : B_1 \otimes \cdots \otimes B_n \quad \Delta_l', b_1 : B_1, \dots, b_n : B_n, \Delta_r' \vdash E_3 : C}{\begin{align}\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash &\ (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_m\ \mathsf{in}\ E_2)\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_n\ \mathsf{in}\ E_3) \\ = &\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) : C\end{align}}{\otimes}{\otimes}\text{CC} \end{align}$$

You can see that this rule presents no difficulty in picking out the subterms that should correspond to the meta-variables. Fortunately, we don’t need a rule for every possible top-level term the |E_3| from the first rule could be, let alone the infinite number of possible instantiations of |E_3|. Unfortunately, we do need one for each elimination rule, and so the number of commuting conversions grows roughly quadratically with the number of connectives.

The motivation for all these rules – the |\beta| and |\eta| rules as well as the commuting conversions – is ideally to have a well-defined normal form for terms that we can systematically reach. In particular, we want two terms to have the same normal form if and only if they are semantically equivalent. If we fail to have normal forms, we’d still at least want two terms to be in the same equivalence class induced by the equations if and only if they are semantically equivalent. Often, we only consider normal forms modulo the equivalence induced by commuting conversions and then endeavor to ensure that this equivalence is easily decidable.

If we were to consider a mapping-in property, e.g. |\mathcal C(\_, B) \times \mathcal C(\_, C) \cong \mathcal C(\_, B \times C)| for the categorical product, the story would be very similar except with introduction and elimination switched. We’d also find that we don’t need a commuting conversion rule, though we would need to expand any existing commuting conversions with the new eliminator. You can work through it and try to find where the naturality equation went.

See the appendix for a full and compact listing of the rules and an explicit formulation of what it means to interpret this language into a monoidal category.

Examples

Before we go on to examples of monoidal categories, let’s consider an example of a theory that we can formulate in our internal language. The main example is the theory of monoids. We assume a type |\mathsf M| and operations |\mathsf e : () \to \mathsf M| and |\mathsf m : (\mathsf M, \mathsf M) \to \mathsf M|. To this we add the equations, |\mathsf m(\mathsf e(), a) = a = \mathsf m(a, \mathsf e())| and |\mathsf m(\mathsf m(a_1, a_2), a_3) = \mathsf m(a_1, \mathsf m(a_2, a_3))|. As we can readily verify, all these equations use the free variables in order and exactly once on each side of the equations. You can contrast this to the axioms of a group and see that those axioms aren’t valid in our internal language. A model of the theory of a monoid in a monoidal category is known as a monoid object. Another theory we could formulate in our internal language is the theory of a monoid action.

For mathematicians, the archetypal example of a monoidal category is the category of vector spaces over a field |k|, e.g. the real numbers. The monoidal product is the tensor product of vector spaces with unit |k|. The multiarrows of the associated multicategory are multilinear maps. The tensor product is symmetric which corresponds to adding the structural rule of Exchange to our list of rules. In practice, this means while we’re still required to use each variable in the context of a term exactly once, we are no longer required to use them in order. A model of the theory of monoids in this monoidal category, i.e. a monoid object in this monoidal category, is exactly an associative, unital |k|-algebra. Closely related, a ring is exactly a monoid object in the monoidal category of abelian groups.

Another major class of monoidal categories is cartesian monoidal categories where the monoidal product is the categorical product. The category of vector spaces has categorical products so it forms a monoidal category with that as well. Therefore part of the data of a monoidal category is a specific choice of monoidal product as there can easily be many inequivalent monoidal products. In terms of our internal language, a cartesian monoidal product corresponds to adding all the structural rules: Exchange, Weakening, and Contraction. This means that we are free to use variables however we like, i.e. we can use them in any order, ignore them, or use them multiple times. Usually, categorical products are presented in terms of a mapping-in property as I mentioned in the previous section. For a type theory, this leads to having tupling and projections. A good exercise would be to look up (or formulate) the rules for product types and show how the rules we’ve provided, in addition to the structural rules, allows us to define projections and show that the relevant equations are satisfied. A monoid object in |\mathbf{Set}| with respect to its cartesian monoidal product is exactly what we typically mean by a monoid.

A final example is the category of endofunctors on a given category which becomes a strict, non-symmetric monoidal category with composition as the monoidal product and the identity functor as the unit. Our internal language then provides a rather different perspective on natural transformations. A natural transformation |\tau : F \circ G \to H| is now viewed as a binary operation |\tau : (F, G) \to H|. To be clear, this is still interpreted as a family of (unary) arrows |\tau_A : F(G(A)) \to H(A)|. The action on arrows (which are natural transformations in this case) of the monoidal product is horizontal composition of natural transformations. The famous example is, of course, the natural transformations |\mu : T \circ T \to T| and |\eta : Id \to T| become the operations |\mu : (T, T) \to T| and |\eta : () \to T| and the monad laws are exactly the monoid laws. That is, a monad is a monoid object in the category of endofunctors equipped with this monoidal product.

Generalizations - (Virtual) Bicategories and Actegories

The category of endofunctors example is pretty nice, but it is weird to limit to just endofunctors. We can consider natural transformations from an arbitrary composable sequence of functors whose composite has the same source and target objects as the target functor. The source of this restriction is that our objects are (endo-)functors and the monoidal product needs to work on any pair of objects in any order. The solution to this is to use the fact that a monoidal category is exactly a one-object bicategory.

We can thus readily generalize to the internal language of a bicategory. As before, it was helpful to use the notion of a multicategory, at least in passing. The analogue of a multicategory in this context is a virtual bicategory. That is, a multicategory is to a monoidal category as a virtual bicategory is to a bicategory. Basically, a virtual bicategory is like a multicategory except that each object now has a specified source and target and instead of allowing arbitrary sequences as sources for multiarrows, we only allow composable sequences. Here, a composable sequence is a sequence of objects such that the target of one object in the sequence is the source of the next. The analogue of a representable multicategory is the existence of composites in our virtual bicategory. We can say a bicategory is a virtual bicategory that has all composites.

For our internal language, the only real change we need to make is to keep track of and enforce the composability constraint. One way of doing this is to modify our type formation judgement to |\vdash_S^T A| asserting that |A| is a linear type with source |S| and target |T|. Our typing judgement is similarly decorated producing |\Delta \vdash_S^T E : B|. |\Delta| is again of the form |a_1 : A_1, \dots, a_n : A_n|, but now there is the constraint that the source of |A_n| is |S|, the target of |A_1| is |T|, and the target of |A_i| is the source of |A_{i+1}| for |i < n|. This leads to rules like $$\begin{align} \dfrac{\Delta_1 \vdash_{T_1}^{T_0} E_1 : A_1 \quad \cdots \quad \Delta_n \vdash_{T_n}^{T_{n-1}} E_n : A_n}{\Delta_1,\dots,\Delta_n \vdash_{T_n}^{T_0} E_1 * \cdots * E_n : A_1 * \cdots * A_n} \end{align}$$ but nothing needs to change at the term level (though I did rename |\otimes| to |*| as that’s less misleading). The main (strict) bicategory would be |\mathbf{Cat}| where we’d interpret the linear types as functors and the linear terms as natural transformations.

An actegory is a monoidal category, |\mathcal C|, that acts on another category, |\mathcal D|. The quickest way of describing this is to say it is a strong monoidal functor from |\mathcal C \to [\mathcal D, \mathcal D]| where the (endo-)functor category |[\mathcal D, \mathcal D]| is equipped with composition as its monoidal product. We can uncurry this functor into a bifunctor |({-})\cdot({=}) : \mathcal C \times \mathcal D \to \mathcal D| satisfying |I\cdot D \cong D| and |(C \otimes C’)\cdot D \cong C \cdot (C’ \cdot D)|. For our |T|-algebras, |\mathcal C| would be |[\mathcal D, \mathcal D]|, and the monoidal functor would just be the identity.

We’d have the rules (among others): $$\begin{gather} \dfrac{}{/ d : D \vdash d : D} \\ \\ \dfrac{\Delta \vdash E : C \quad \Delta' / d : D \vdash E' : D'}{\Delta, \Delta' / d : D \vdash E \cdot E' : C \cdot D'} \\ \\ \dfrac{\Delta / d : D \vdash E : C \cdot D' \quad c : C / d' : D' \vdash E' : D''}{\Delta / d : D \vdash \mathsf{match}\ E\ \mathsf{as}\ c \cdot d'\ \mathsf{in}\ E' : D''} \end{gather}$$

Presumably, you could formulate a notion of “virtual actegory” where the arrows consist of a list of objects from a multicategory |\mathcal C| and a final object from a category |\mathcal D| as their source and an object of |\mathcal D| as their target. You could imagine going further (or alternately) for an analogue of a (virtual) bicategory which would, again, amount to using composable sequences. (The name “biactegory” is already taken.)

Regardless, the above framework allows us to have |t : T / a : A \vdash \alpha(t, a) : A|, and we can then express our desired equations for a |T|-algebra in the form of the laws of a monoid action. One place where this notation comes in handy is in the connections between |T|-algebras and absolute colimits.

Appendix

Rules for a Monoidal Category

$$\begin{gather} \dfrac{\vdash A}{a : A \vdash a : A}\text{Ax} \qquad \dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E: B}{\Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash E[E_1/a_1,\dots,E_n/a_n] : B}\text{Cut} \\ \\ \dfrac{}{\vdash I}I\text{F} \qquad \dfrac{\vdash A_1 \quad \cdots \quad \vdash A_n}{\vdash A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{F}, n \geq 1 \qquad \dfrac{\mathsf A : \mathsf{Type}}{\vdash \mathsf A}\text{PrimType} \\ \\ \dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \mathsf f : (A_1, \dots, A_n) \to B}{\Delta_1, \dots, \Delta_n \vdash \mathsf f(E_1, \dots, E_n) : B}\text{PrimTerm} \\ \\ \dfrac{}{\vdash * : I}I\text{I} \qquad \dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n}{\Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I} \\ \\ \dfrac{\Delta \vdash E : I \quad \Delta_l, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E' : B}I\text{E} \qquad \dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1 \end{gather}$$

Equations

$$\begin{gather} \dfrac{\Delta \vdash E : B}{\Delta \vdash (\mathsf{match}\ {*}\ \mathsf{as}\ {*}\ \mathsf{in}\ E) = E : B}{*}\beta \qquad \dfrac{\Delta \vdash E : I \quad \Delta_l, a : I, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'[*/a]) : B}{*}\eta \\ \\ \dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E : B}{\Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes_n}\beta \\ \\ \dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a]) : B}{\otimes_n}\eta \\ \\ \dfrac{\Delta \vdash E_1 : I \quad \Delta_l, \Delta_r \vdash E_2 : I \quad \Delta_l', \Delta_r' \vdash E_3 : C}{\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ E_2)\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) = (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) : C}{*}{*}\text{CC} \\ \\ \dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : I\quad \Delta_l', \Delta_r' \vdash E_3 : C}{\begin{align}\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash &\ (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_2)\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) \\ = &\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) : C\end{align}}{\otimes_n}{*}\text{CC} \\ \\ \dfrac{\Delta \vdash E_1 : I \quad \Delta_l, \Delta_r \vdash E_2 : B_1 \otimes \cdots \otimes B_m \quad \Delta_l', b_1 : B_1, \dots, b_m : B_m, \Delta_r' \vdash E_3 : C}{\begin{align}\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash &\ (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ E_2)\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) \\ = &\ (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) : C\end{align}}{*}{\otimes_m}\text{CC} \\ \\ \dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B_1 \otimes \cdots \otimes B_m \quad \Delta_l', b_1 : B_1, \dots, b_m : B_m, \Delta_r' \vdash E_3 : C}{\begin{align}\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash &\ (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_2)\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) \\ = &\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) : C\end{align}}{\otimes_n}{\otimes_m}\text{CC} \end{gather}$$

|\mathsf A : \mathsf{Type}| means that |\mathsf A| is a primitive type in the signature. |\mathsf f : (A_1, \dots, A_n) \to B| means that |\mathsf f| is assigned this type in the signature.

I did not write them, but the usual laws for equality (reflexivity and indiscernability of identicals) should be included.

A theory in this language is free to introduce additional linear types and linear operations.

Interpretation into a monoidal category

Write |\den{-}| for the (overloaded) interpretation function. Its value on primitive operations is left as a parameter.

Associators for the semantic |\otimes| will be omitted below. We can arbitrarily assume a particular association of monoidal products and then the relevant associators are completely determined by the input and output types. There are multiple possible expressions for those associators, but the coherence conditions of monoidal categories guarantee that they are equal.

Interpretation of Linear Types

$$\begin{align} \vdash A \implies & \den{A} \in \mathsf{Ob}(\V) \\ \\ \den{\Delta} =\, & \den{A_1}\otimes \cdots \otimes \den{A_n} \text{ where } \Delta = a_1 : A_1, \dots, a_n : A_n \\ \den{I} =\, & I \\ \den{A_1 \otimes \cdots \otimes A_n} =\, & \den{A_1}\otimes \cdots \otimes \den{A_n} \\ \end{align}$$

Interpretation of Linear Terms

$$\begin{align} \Delta \vdash E : A \implies & \den{E} \in \V(\den{\Delta}, \den{A}) \\ \\ \den{a} =\, & id_{\den{A}} \text{ where } a : A \\ \den{*} =\, & id_I \\ \den{E_1 \otimes \cdots \otimes E_n} =\, & \den{E_1} \otimes \cdots \otimes \den{E_n} \text{ where }\Delta_i \vdash E_i : A_i \\ \den{\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'} =\, & \den{E'} \circ (id_{\den{\Delta_l}} \otimes (\lambda_{\den{\Delta_r}} \circ (\den{E} \otimes id_{\den{\Delta_r}}))) \\ \den{\mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E'} =\, & \den{E'} \circ (id_{\den{\Delta_l}} \otimes \den{E} \otimes id_{\den{\Delta_r}}) \\ \den{\mathsf f(E_1, \dots, E_n)} =\, & \den{\mathsf f} \circ (\den{E_1} \otimes \cdots \otimes \den{E_n}) \text{ where }\mathsf f\text{ is an appropriately typed linear operation} \end{align}$$

where |\lambda_B : I \otimes B \cong B| is the left unitor.

It would be even simpler if we talked about the internal language of a multicategory…↩︎
Every monoidal category gives rise to a multicategory in this way.↩︎

Example Representability Argument

2020-05-04 10:21:49-07:00

Introduction

When I was young and dumb and first learning category theory, I got it into my head that arguments involving sets were not “categorical”. This is not completely crazy as the idea of category theory being an alternate “foundation” and categorical critiques of set theoretic reasoning are easy to find. As such, I tended to neglect techniques that significantly leveraged |\mathbf{Set}|, and, in particular, representability. Instead, I’d prefer arguments using universal arrows as those translate naturally and directly to 2-categories.

This was a mistake. I have long since completely reversed my position on this for both practical and theoretical reasons. Practically, representability and related techniques provide very concise definitions which lead to concise proofs which I find relatively easy to formulate and easy to verify. This is especially true when combined with the (co)end calculus. It’s also the case that for a lot of math you simply don’t need any potential generality you might gain by, e.g. being able to use an arbitrary 2-category. Theoretically, I’ve gained a better understanding of where and how category theory is (or is not) “foundational”, and a better understanding of what about set theory categorists were critiquing. Category theory as a whole does not provide an alternate foundation for mathematics as that term is usually understood by mathematicians. A branch of category theory, topos theory, does, but a topos is fairly intentionally designed to give a somewhat |\mathbf{Set}|-like experience. Similarly, many approaches to higher category theory still include a |\mathbf{Set}|-like level.

This is, of course, not to suggest ideas like universal arrows aren’t important or can’t lead to elegant proofs.

Below is a particular example of attacking a problem from the perspective of representability. I use this example more because it is a neat proof that I hadn’t seen before. There are plenty of simpler compelling examples, such as proving that right(/left) adjoints are (co)continuous, and I regularly use representability in proofs I presented on, e.g. the Math StackExchange.

The Problem

An elementary topos, |\mathcal E|, can be described as a category with finite limits and power objects. Having power objects means having a functor |\mathsf P : \mathcal E^{op} \to \mathcal E| such that |\mathcal E(A,\mathsf PB) \cong \mathsf{Sub}(A \times B)| natural in |A| and |B| where |\mathsf{Sub}| is the (contravariant) functor that takes an object to its set of subobjects. The action of |\mathsf{Sub}(f)| for an arrow |f : A \to B| is a function |m \mapsto f^*(m)| where |m| is a (representative) monomorphism and |f^*(m)| is the pullback of |f| along |m| which is a monomorphism by basic facts about pullbacks. In diagrammatic form: $$\require{amscd} \begin{CD} f^{-1}(B') @>f^\ast(m)>> A \\ @VVV @VVfV \\ B' @>>m> B \end{CD}$$

This is a characterization of |\mathsf P| via representability. We are saying that |\mathsf PB| represents the functor |\mathsf{Sub}(- \times B)| parameterized in |B|.

A well-known and basic fact about elementary toposes is that they are cartesian closed. (Indeed, finite limits + cartesian closure + a subobject classifier is a common alternative definition.) Cartesian closure can be characterized as |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)| which characterizes the exponent, |B^A|, via representability. Namely, that |B^A| represents the functor |\mathcal E(- \times A, B)| parameterized in |A|. Proving that elementary toposes are cartesian closed is not too difficult, but it is a bit fiddly. This is the example that I’m going to use.

Common Setup

All the proofs I reference rely on the following basic facts about an elementary topos.

We have the monomorphism |\top : 1 \to \mathsf P1| induced by the identity arrow |\mathsf P1 \to \mathsf P1|.

We need the lemma that |\mathcal E(A \times B,PC) \cong \mathcal E(A,\mathsf P(B \times C))|. Proof: $$\begin{align} \mathcal E(A \times B,\mathsf PC) \cong \mathsf{Sub}((A \times B) \times C) \cong \mathsf{Sub}(A \times (B \times C)) \cong \mathcal E(A,\mathsf P(B \times C))\ \square \end{align}$$

Since the arrow |\langle id_A, f\rangle : A \to A \times B| is a monomorphism for any arrow |f : A \to B|, the map |f \mapsto \langle id, f \rangle| is a map from |\mathcal E(-, B)| to |\mathsf{Sub}(- \times B)|. Using |\mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|, we get a map |\mathcal E(-, B) \to \mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|. By Yoneda, i.e. by evaluating it at |id|, we get the singleton map: |\{\}_B : B \to \mathsf PB|. If we can show that |\{\}| is a monomorphism, then, since |\mathsf PA \cong \mathsf P(A \times 1)|, we’ll get an arrow |\sigma : \mathsf PA \to \mathsf P1| such that

$$\begin{CD} A @>\{\}_A>> \mathsf PA \\ @VVV @VV\sigma_AV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

is a pullback.

|\{\}_A| is a monomorphism because any |f, g : X \to A| gets mapped by the above to |\langle id_X, f\rangle| and |\langle id_X, g\rangle| which represent the same subobject when |\{\} \circ f = \{\} \circ g|. Therefore, there’s an isomorphism |j : X \cong X| such that |\langle id_X, f\rangle \circ j = \langle j, f \circ j\rangle = \langle id_X, g\rangle| but this means |j = id_X| and thus |f = g|.

Other Proofs

To restate the problem: given the above setup, we want to show that the elementary topos |\mathcal E| is cartesian closed.

Toposes, Triples, and Theories by Barr and Wells actually provides two proofs of this statement: Theorem 4.1 of Chapter 5. The first proof is in exactly the representability-style approach I’m advocating, but it relies on earlier results about how a topos relates to its slice categories. The second proof is more concrete and direct, but it also involves |\mathsf P\mathsf P\mathsf P\mathsf P B|…

Sheaves in Geometry and Logic by Mac Lane and Moerdijk also has this result as Theorem 1 of section IV.2 “The Construction of Exponentials”. The proof starts on page 167 and finishes on 169. The idea is to take the set theoretic construction of functions via their graphs and interpret that into topos concepts. This proof involves a decent amount of equational reasoning (either via diagrams or via generalized elements).

Todd Trimble’s argument is very similar to the following argument.

Dan’s Proof via Representability

Contrast these to Dan Doel’s proof¹ using representability, which proceeds as follows. (Any mistakes are mine.)

Start with the pullback induced by the singleton map.

$$\begin{CD} B @>\{\}_B>> \mathsf PB \\ @VVV @VV\sigma_BV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

Apply the functor |\mathcal E(= \times A,-)| which preserves the fact that it is a pullback via continuity. $$\begin{CD} \mathcal{E}(-\times A,B) @>>> \mathcal{E}(- \times A,\mathsf PB) \\ @VVV @VVV \\ \mathcal{E}(- \times A,1) @>>> \mathcal{E}(- \times A,\mathsf P1) \end{CD}$$

Note:

|\mathcal E(- \times A,1) \cong 1 \cong \mathcal E(-,1)| (by continuity)
|\mathcal E(- \times A,\mathsf PB) \cong \mathcal E(-,\mathsf P(A \times B))| (by definition of power objects)
|\mathcal E(- \times A,\mathsf P1) \cong \mathcal E(-,\mathsf PA)| (because |A \times 1 \cong A|)

This means that the above pullback is also the pullback of $$\begin{CD} \mathcal E(-\times A,B) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

Since |\mathcal E| has all finite limits, it has the following pullback

$$\begin{CD} X @>>> \mathsf P(A \times B) \\ @VVV @VVV \\ 1 @>>> \mathsf PA \end{CD}$$

where the bottom and right arrows are induced by the corresponding arrows of the previous diagram by Yoneda. Applying |\mathcal E(=,-)| to this diagram gives another pullback diagram by continuity

$$\begin{CD} \mathcal E(-,X) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

Sent to me almost exactly three years ago.↩︎