The most common way I’ve seen the statement “|A| overlaps |B|” formalized is |A \cap B \neq \varnothing|. To a constructivist, this definition isn’t very satisfying. In particular, this definition of overlaps does not allow us to constructively conclude that there exists an element contained in both |A| and |B|. That is, |A \cap B \neq \varnothing| does not imply |\exists x. x \in A \land x \in B| constructively.

As is usually the case, even if you are not philosophically a constructivist, taking a constructivist perspective can often lead to better definitions and easier to see connections. In this case, constructivism suggests the more positive statement |\exists x. x \in A \land x \in B| be the definition of “overlaps”. However, given that we now have two (constructively) non-equivalent definitions, it is better to introduce notation to abstract from the particular definition. In many cases, it makes sense to have a primitive notion of “overlaps”. Here I will use the notation |A \between B| which is the most common option I’ve seen.

We can more compactly write the quantifier-based definition as |\exists x \in A.x \in B| using a common set-theoretic abbreviation. This presentation suggests a perhaps surprising connection. If we swap the quantifier, we get |\forall x\in A.x \in B| which is commonly abbreviated |A \subseteq B|. This leads to a duality between |\subseteq| and |\between|, particularly in topological contexts. In particular, if we pick a containing set |X|, then |\neg(U \between A) \iff U \subseteq A^c| where the complement is relative to |X|, and |A| is assumed to be a subset of |X|. This is a De Morgan-like duality.

If we want to characterize these operations via an adjunction, or, more precisely, a Galois connection, we have a slight awkwardness arising from |\subseteq| and |\between| being binary predicates on sets. So, as a first step we’ll identify sets with predicates via, for a set |A|, |\underline A(x) \equiv x \in A|. In terms of predicates, the adjunctions we want are just a special case of the adjunctions characterizing the quantifiers.

\[\underline U(x) \land P \to \underline A(x) \iff P \to U \subseteq A\]

\[U \between B \to Q \iff \underline B(x) \to (\underline U(x) \to Q)\]

What we actually want is a formula of the form |U \between B \to Q \iff B \subseteq (\dots)|. To do this, we need an operation that will allow us to produce a set from a predicate. This is exactly what set comprehension does. For reasons that will become increasingly clear, we’ll assume that |A| and |B| are subsets of a set |X|. We will then consider quantification relative to |X|. The result we get is:

\[\{x \in U \mid P\} \subseteq A \iff \{x \in X \mid x \in U \land P\} \subseteq A \iff P \to U \subseteq A\]

\[U \between B \to Q \iff B \subseteq \{x \in X \mid x \in U \to Q\} \iff B \subseteq \{x \in U \mid \neg Q\}^c\]

The first and last equivalences require additionally assuming |U \subseteq X|. The last equivalence requires classical reasoning. You can already see motivation to limit to subsets of |X| here. First, set complementation, the |(-)^c|, only makes sense relative to some containing set. Next, if we choose |Q \equiv \top|, then the latter formulas state that *no matter what |B| is* it should be a subset of the expression that follows it. Without constraining to subsets of |X|, this would require a universal set which doesn’t exist in typical set theories.

Choosing |P| as |\top|, |Q| as |\bot|, and |B| as |A^c| leads to the familiar |\neg (U \between A^c) \iff U \subseteq A|, i.e. |U| is a subset of |A| if and only if it doesn’t overlap |A|’s complement.

Incidentally, characterizing |\subseteq| and |\between| in terms of Galois connections, i.e. adjunctions, immediately gives us some properties for free via continuity. We have |U \subseteq \bigcap_{i \in I}A_i \iff \forall i\in I.U \subseteq A_i| and |U \between \bigcup_{i \in I}A_i \iff \exists i \in I.U \between A_i|. This is relative to a containing set |X|, so |\bigcap_{i \in \varnothing}A_i = X|, and |U| and each |A_i| are assumed to be subsets of |X|.

Below I’ll perform a categorical analysis of the situation. I’ll mostly be using categorical notation and perspectives to manipulate normal sets. That said, almost all of what I say will be able to be generalized immediately just by reinterpreting the symbols.

To make things a bit cleaner in the future, and to make it easier to apply these ideas beyond sets, I’ll introduce the concept of a Heyting algebra. A Heyting algebra is a partially ordered set |H| satisfying the following:

- |H| has two elements called |\top| and |\bot| satisfying for all |x| in |H|, |\bot \leq x \leq \top|.
- We have operations |\land| and |\lor| satisfying for all |x|, |y|, |z| in |H|, |x \leq y \land z| if and only |x \leq y| and |x \leq z|, and similarly for |\lor|, |x \lor y \leq z| if and only |x \leq z| and |y \leq z|.
- We have an operation |\to| satisfying for all |x|, |y|, and |z| in |H|, |x \land y \leq z| if and only if |x \leq y \to z|.

For those familiar with category theory, you might recognize this as simply the decategorification of the notion of a bicartesian closed category. We can define the **pseudo-complement**, |\neg x \equiv x \to \bot|.

Any Boolean algebra is an example of a Heyting algebra where we can define |x \to y| via |\neg x \lor y| where here |\neg| is taken as primitive. In particular, subsets of a given set ordered by inclusion form a Boolean algebra, and thus a Heyting algebra. The |\to| operation can also be characterized by |x \leq y \iff (x \to y) = \top|. This lets us immediately see that for subsets of |X|, |(A \to B) = \{x \in X \mid x \in A \to x \in B\}|. All this can be generalized to the subobjects in any Heyting category.

As the notation suggests, intuitionistic logic (and thus classical logic) is another example of a Heyting algebra.

We’ll write |\mathsf{Sub}(X)| for the partially ordered set of subsets of |X| ordered by inclusion. As mentioned above, this is (classically) a Boolean algebra and thus a Heyting algebra. Any function |f : X \to Y| gives a monotonic function |f^* : \mathsf{Sub}(Y) \to \mathsf{Sub}(X)|. Note the swap. |f^*(U) \equiv f^{-1}(U)|. (Alternatively, if we think of subsets in terms of characteristic functions, |f^*(U) \equiv U \circ f|.) Earlier, we needed a way to turn predicates into sets. In this case, we’ll go the other way and identify truth values with subsets of |1| where |1| stands for an arbitrary singleton set. That is, |\mathsf{Sub}(1)| is the poset of truth values. |1| being the terminal object of |\mathbf{Set}| induces the (unique) function |!_U : U \to 1| for any set |U|. This leads to the important monotonic function |!_U^* : \mathsf{Sub}(1) \to \mathsf{Sub}(U)|. This can be described as |!_U^*(P) = \{x \in U \mid P\}|. Note, |P| cannot contain |x| as a free variable. In particular |!_U^*(\bot) = \varnothing| and |!_U^*(\top) = U|. This monotonic function has left and right adjoints:

\[\exists_U \dashv {!_U^*} \dashv \forall_U : \mathsf{Sub}(U) \to \mathsf{Sub}(1)\]

|F \dashv G| for monotonic functions |F : X \to Y| and |G : Y \to X| means |\forall x \in X. \forall y \in Y.F(x) \leq_Y y \iff x \leq_X G(y)|.

|\exists_U(A) \equiv \exists x \in U. x \in A| and |\forall_U(A) \equiv \forall x \in U. x \in A|. It’s easily verified that each of these functions are monotonic.^{1}

It seems like we should be done. These formulas are the formulas I originally gave for |\between| and |\subseteq| in terms of quantifiers. The problem here is that these functions are only defined for subsets of |U|. This is especially bad for interpreting |U \between A| as |\exists_U(A)| as it excludes most of the interesting cases where |U| partially overlaps |A|. What we need is a way to extend |\exists_U| / |\forall_U| beyond subsets of |U|. That is, we need a suitable monotonic function |\mathsf{Sub}(X) \to \mathsf{Sub}(U)|.

Assume |U \subseteq X| and that we have an inclusion |\iota_U : U \hookrightarrow X|. Then |\iota_U^* : \mathsf{Sub}(X) \to \mathsf{Sub}(U)| and |\iota_U^*(A) = U \cap A|. This will indeed allow us to define |\subseteq| and |\between| as |U \subseteq A \equiv \forall_U(\iota_U^*(A))| and |U \between A \equiv \exists_U(\iota_U^*(A))|. We have:

\[\iota_U[-] \dashv \iota_U^* \dashv U \to \iota_U[-] : \mathsf{Sub}(U) \to \mathsf{Sub}(X)\]

Here, |\iota_U[-]| is the direct image of |\iota_U|. This doesn’t really do anything in this case except witness that if |A \subseteq U| then |A \subseteq X| because |U \subseteq X|.^{2}

We can recover the earlier adjunctions by simply using these two pairs of adjunctions. \[\begin{align} U \between B \to Q & \iff \exists_U(\iota_U^*(B)) \to Q \\ & \iff \iota_U^*(B) \subseteq {!}_U^*(Q) \\ & \iff B \subseteq U \to \iota_U[{!}_U^*(Q)] \\ & \iff B \subseteq \{x \in X \mid x \in U \to Q\} \end{align}\]

Here the |\iota_U[-]| is crucial so that we use the |\to| of |\mathsf{Sub}(X)| and not |\mathsf{Sub}(U)|.

\[\begin{align} P \to U \subseteq A & \iff P \to \forall_U(\iota_U^*(A)) \\ & \iff {!}_U^*(P) \subseteq \iota_U^*(A) \\ & \iff \iota_U[{!}_U^*(P)] \subseteq A \\ & \iff \{x \in X \mid x \in U \land P\} \subseteq A \end{align}\]

In this case, the |\iota_U[-]| is truly doing nothing because |\{x \in X \mid x \in U \land P\}| is the same as |\{x \in U \mid P\}|.

While we have |{!}_U^* \circ \exists_U \dashv {!}_U^* \circ \forall_U|, we see that the inclusion of |\iota_U^*| is what breaks the direct connection between |U \between A| and |U \subseteq A|.

As a first example, write |\mathsf{Int}A| for the **interior** of |A| and |\bar A| for the **closure** of |A| each with respect to some topology on a containing set |X|. One way to define |\mathsf{Int}A| is |x \in \mathsf{Int}A| if and only if there exists an open set containing |x| that’s a subset of |A|. Writing |\mathcal O(X)| for the set of open sets, we can express this definition in symbols: \[x \in \mathsf{Int}A \iff \exists U \in \mathcal O(X). x \in U \land U \subseteq A\] We have a “dual” notion: \[x \in \bar A \iff \forall U \in \mathcal O(X). x \in U \to U \between A\] That is, |x| is in the closure of |A| if and only if every open set containing |x| overlaps |A|.

As another example, here is a fairly unusual way of characterizing a compact subset |Q|. |Q| is **compact** if and only if |\{U \in \mathcal O(X) \mid Q \subseteq U\}| is open in |\mathcal O(X)| equipped with the Scott topology^{3}. As before, this suggests a “dual” notion characterized by |\{U \in \mathcal O(X) \mid O \between U\}| being an open subset. A set |O| satisfying this is called **overt**. This concept is never mentioned in traditional presentations of point-set topology because *every* subset is overt. However, if we don’t require that *arbitrary* unions of open sets are open (and only require finite unions to be open) as happens in synthetic topology or if we aren’t working in a classical context then overtness becomes a meaningful concept.

One benefit of the intersection-based definition of overlaps is that it is straightforward to generalize to many sets overlapping, namely |\bigcap_{i\in I} A_i \neq \varnothing|. This is also readily expressible using quantifiers as: |\exists x.\forall i \in I. x \in A_i|. As before, having an explicit “universe” set also clarifies this. So, |\exists x \in X.\forall i \in I. x \in A_i| with |\forall i \in I. A_i \subseteq X| would be better. The connection of |\between| to |\subseteq| suggests instead of this fully symmetric presentation, it may still be worthwhile to single out a set producing |\exists x \in U.\forall i \in I. x \in A_i| where |U \subseteq X|. This can be read as “there is a point in |U| that touches/meets/overlaps every |A_i|”. If desired we could notate this as |U \between \bigcap_{i \in I}A_i|. Negating and complementing the |A_i| leads to the dual notion |\forall x \in U.\exists i \in I.x \in A_i| which is equivalent to |U \subseteq \bigcup_{i \in I}A_i|. This dual notion could be read as “the |A_i| (jointly) cover |U|” which is another common and important concept in mathematics.

Ultimately, the concept of two (or more) sets overlapping comes up quite often. The usual circumlocution, |A \cap B \neq \varnothing|, is both notationally and conceptually clumsy. Treating overlapping as a first-class notion via notation and formulating definitions in terms of it can reveal some common and important patterns.

If one wanted to be super pedantic, I should technically write something like |\{\star \mid \exists x \in U. x \in A\}| where |1 = \{\star\}| because elements of |\mathsf{Sub}(1)| are subsets of |1|. Instead, we’ll conflate subsets of |1| and truth values.↩︎

If we think of subobjects as (equivalence classes of) monomorphisms as is typical in category theory, then because |\iota_U| is itself a monomorphism, the direct image, |\iota_U[-]|, is simply post-composition by |\iota_U|, i.e. |\iota_U \circ {-}|.↩︎

The Scott topology is the natural topology on the space of continuous functions |X \to \Sigma| where |\Sigma| is the Sierpinski space.↩︎

Complex-step differentiation is a simple and effective technique for numerically differentiating a(n analytic) function. Discussing it is a neat combination of complex analysis, numerical analysis, and ring theory. We’ll see that it is very closely connected to forward-mode automatic differentiation (FAD). For better or worse, while widely applicable, the scenarios where complex-step differentiation is the *best* solution are a bit rare. To apply complex-step differentiation, you need a version of your desired function that operates on complex numbers. If you have that, then you can apply complex-step differentiation immediately. Otherwise, you need to adapt the function to complex arguments. This can be done essentially automatically using the same techniques as automatic differentiation, but at that point you might as well use automatic differentiation. Adapting the code to complex numbers or AD takes about the same amount of effort, however, the AD version will be more efficient, more accurate, and easier to use.

Nevertheless, this serves as a simple example to illustrate several theoretical and practical ideas.

The problem we’re solving is given a function |f : \mathbb R \to \mathbb R| which is differentiable around a point |x_0|, we’d like to compute its derivative |f’| at |x_0|. In many cases, |f| is real analytic at the point |x_0| meaning |f| has a Taylor series which converges to |f| in some open interval containing |x_0|.

The most obvious way of numerically differentiating |f| is to approximate the limit in the definition of the derivative, |f’(x) = \lim_{h\to 0} [f(x + h) - f(x)] / h|, by simply choosing a small value for |h| rather than taking the limit. When |f| is real analytic at |x|, we can analyze the quality of this approximation by expanding |f(x + h)| in a Taylor series at |x|. This produces |[f(x + h) - f(x)]/h = f’(x) + O(h)|. A slight tweak produces a better result with the same number of evaluations of |f|. Specifically, the Taylor series of |f(x + h) - f(x - h)| at |x| is equal to the odd part the Taylor series of |f(x + h)| at |x|. This leads to the **Central Differences** formula:

$$f'(x) + O(h^2) = \frac{f(x + h) - f(x - h)}{2h}$$

The following interactively illustrates this using the function |f(x) = x^{9/2}| evaluated at |x_0 =| . The correct answer to |17| digits is |f’(||) {}={}|. The slider ranges from |h=10^{-2}| to |h=10^{-20}|.

|h|:

|f’(||)|:

error:

If you play with the slider using the first example, you’ll see that the error decreases until around |10^{-5}| after which it starts increasing until |10^{-15}| where it is off by more than |1|. At |10^{-16}| the estimated derivative is |0| which is, of course, completely incorrect. Even at |10^{-5}| the error is on the order of |10^{-9}| which is much higher than the double precision floating point machine epsilon of approximately |10^{-16}|.

There are two issues here. First, we have the issue that if |x_0 \neq 0|, then |x_0 + h = x_0| for sufficiently small |h|. This happens when |x_0/h| has a magnitude of around |10^{16}| or more.

The second issue here is known as catastrophic cancellation. For simplicity, let’s say |f(x)=1|. (It’s actually about |6.2| for the first example.) Let’s further say for some small value of |h|, |f(x+h) = 1.00000000000020404346|. The value we care about is the |0.00000000000020404346|, but given limited precision, we might have |f(x + h) = 1.000000000000204|, meaning we only have three digits of precision for the value we care about. Indeed, as |h| becomes smaller we’ll lose more and more precision in our desired value until we lose all precision which happens when |f(x + h) = f(x)|. It is generally a bad idea numerically to calculate a small value by subtracting two larger values for this reason.

We would not have the first issue if |x_0 = 0| as in the second and fourth examples (|f(x) = e^x|). We would not have the second issue if |f(x) = 0| as in the second and third examples (|f(x) = \sin(x)| with |x_0 = \pi|). We have neither issue in the second example of |f(x) = \sin(x)| with |x_0 = 0|. This will become important later.

We have a dilemma. For the theory, we want as small a value of |h| as possible without being zero. In practice, we start losing precision as |h| gets smaller, and generally larger values of |h| are going to be less impacted by this.

Let’s set this aside for now and look at other ways of numerically computing the derivative in the hopes that we can avoid this problem.

If we talk about functions |f : \mathbb C \to \mathbb C|, the analogue of real analyticity is holomorphicity or complex analyticity. A complex function is **holomorphic** if it satisfies the Cauchy-Riemann equations. (See the Appendix for more details about where the Cauchy-Riemann equations come from.) A complex function is **complex analytic** if it has a Taylor series which converges to the function. It can be proven that these two descriptions are equivalent, though this isn’t a trivial fact. We can also talk about functions that are holomorphic or complex analytic on an open subset of |\mathbb C| and at a point by considering an open subset around that point. The typical choice of open subset is some suitably small open disk in the complex plane about the point. (Other common domains are ellipses, infinite strips, and areas bounded by Hankel contours and variations such as sideways opening parabolas.)

A major fact about holomorphic functions is the Cauchy integral theorem. If |f| is a holomorphic function inside a (suitably nice) closed curve |\Gamma| in the complex plane, then |\oint_\Gamma f(z)\mathrm dz = 0|. Again, |\Gamma| will typically be chosen to be some circle. (Integrals like this in the complex plane are often called **contour integrals** and the curves we’re integrating along are called **contours**.)

Things get really interesting when we generalize to meromorphic functions which are complex functions that are holomorphic except at an isolated set of points. These take the form of **poles** which are points |z_0| such that |1/f(z_0) = 0|, i.e. poles are where a function goes to infinity as, e.g., |1/z| does at |0|. The generalization of Cauchy’s integral theorem is Cauchy’s Residue Theorem. *This theorem is surprising and is one of the most useful theorems in all of mathematics both theoretically and practically*.

We’ll only need a common special case of it. Let |f| be a holomorphic function, then |f(z)/(z - z_0)^n| is a meromorphic function with a single pole of order |n| at |z_0|. If |\Gamma| is a positively oriented, simple closed curve containing |z_0|, then

$$f^{(n-1)}(z_0) = \frac{(n-1)!}{2 \pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{(z - z_0)^n}$$

In this case, |f^{(n-1)}(z_0)/(n-1)!| is the **residue** of |f(z)/(z - z_0)^n| at |z_0|. More generally, if there are multiple poles in the area bounded by |\Gamma|, then we will sum up their residues.

This formula provides us a means of calculating the |(n-1)|-st Taylor coefficient of a complex analytic function at any point. For our particular purposes, we’ll only need the |n=2| case, |f’(z_0) = \frac{1}{2 \pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{(z - z_0)^2}|.

For the remainder of this section I want to give some examples of how Cauchy’s Residue Theorem is used both theoretically and practically. This whole article will itself be another practical example of Cauchy’s Residue Theorem. This is not exhaustive by any means.

To start illustrating some of the surprising properties of this theorem, we can take the |n=1| case which states that we can evaluate a holomorphic function at any point via |f(z_0) = \frac{1}{2 \pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{z - z_0}| where |\Gamma| is any contour which bounds an area containing |z_0|. This leads to an interesting discreteness. Not only can we evaluate a (holomorphic) function (or any of its derivatives) at a point via the values of the function on a contour, the only significant constraint on that contour is that it bound an area containing the desired point. In other words, no matter how we deform the contour the integral is constant except when we deform the contour so as not to bound an area containing the point being evaluated, at which point the integral’s value is |0|^{1}. It may seem odd to use an integral to evaluate a function at a point, but it can be useful when there are numerical issues with evaluating the function near the desired point^{2}. In fact, these results show that if we know the values of a holomorphic function on the boundary of a given open subset of the complex plane, then we know the value of the function *everywhere*. In this sense, holomorphic functions (and analytic functions in general) are extremely rigid.

This leads to the notion of analytic continuation where we try to compute an analytic function beyond its overt domain of definition. This is the basis of most “sums of divergent series”. For example, there is the first-year calculus fact that the sum of the infinite series |\sum_{n=0}^\infty x^n| is |1/(1-x)| converging on the interval |x \in (-1,1)|. In fact, the proof of convergence only needs |\|x\| < 1| so we can readily generalize to complex |z| with |\|z\| < 1|, i.e. |z| contained in the open unit disk. However, |1/(1-z)| is a meromorphic function that is holomorphic everywhere except for |z=1|, therefore there is a *unique* analytic function defined everywhere except |z=1| that agrees with the infinite sum on the unit disk, namely |1/(1-z)| itself. Choosing |z=-1| leads to the common example of “summing a divergent series” with “|\sum_{n=0}^\infty (-1)^n = 1/2|” which really means “the value at |-1| of the unique complex analytic function which agrees with this infinite series when it converges”.

Sticking with just evaluation, applying the Cauchy Residue theorem to quadrature, i.e. numerical integration, leads to an interesting connection to a rational approximation problem. Say we want to compute |\int_{-1}^1 f(x) \mathrm dx|, we can use the Cauchy integral to evaluate |f(x)| leading to

$$\int_{-1}^1 f(x) \mathrm dx
= \int_{-1}^1 \frac{1}{2\pi i}\oint \frac{f(z)\mathrm dz}{z - x}\mathrm dx
= \frac{1}{2\pi i}\oint f(z)\int_{-1}^1 \frac{\mathrm dx}{z - x}\mathrm dz
= \frac{1}{2\pi i}\oint f(z)\log\left(\frac{z+1}{z-1}\right)\mathrm dz$$

A quadrature formula looks like |\int_{-1}^1 f(x) \mathrm dx \approx \sum_{k=1}^N w_k f(x_k)|. The sum can be written as a Cauchy integral of |\oint f(z)\sum_{k=1}^N \frac{1}{2\pi i}\frac{w_k\mathrm dz}{z - x_k}|. We thus have

$$\left|\frac{1}{2\pi i}\oint f(z)\left[\log\left(\frac{z+1}{z-1}\right) - \sum_{k=1}^N \frac{w_k}{z - x_k}\right]\mathrm dz\right|$$

as the error of the approximation. The sum is a rational function (in partial fraction form) and thus the error is minimized by points (|x_k|) and weights (|w_k|) that lead to better rational approximations of |\log((z+1)/(z-1))|^{3}.

The ability to calculate coefficients of the Taylor series of a holomorphic function is, by itself, already a valuable tool in both theory and practice. In particular, the coefficients of a generating function or a Z-transform can be computed with Cauchy integrals. This has applications in probability theory, statistics, finance, combinatorics, recurrences, differential equations, and signal processing. Indeed, when |z_0 = 0| and |\Gamma| is the unit circle, then the Cauchy integral is a component of the Fourier series of |f|. Approximating these integrals with the Trapezoid Rule (which we’ll discuss in a bit) produces the Discrete Fourier Transform.

Let |p| be a polynomial and, for simplicity, assume all its zeroes are of multiplicity one. Then |1/p(z)| is a meromorphic function that’s holomorphic everywhere except for the roots of |p|. The Cauchy integral |\frac{1}{2\pi i}\oint_{\Gamma} \frac{p’(z)\mathrm dz}{p(z)}| counts the number of roots of |p| are contained in the area bounded by |\Gamma|. If we know there is only one root of |p| within the area bounded by |\Gamma|, then we can compute that root with |\frac{1}{2\pi i}\oint_{\Gamma} \frac{z p’(z)\mathrm dz}{p(z)}|. A better approach is to use the formulas |\left(\oint \frac{z\mathrm dz}{p(z)}\right)/\left(\oint \frac{\mathrm dz}{p(z)}\right)|. Similar ideas can be used to adapt this to counting and finding multiple roots. See Numerical Algorithms based on Analytic Function Values at Roots of Unity by Austin, Kravanja, and Trefethen (2014) which is a good survey in general.

Another very common use of Cauchy’s Residue Theorem is to sum (convergent) infinite series. |\tan(\pi z)/\pi| has a zero at |z = k| for each integer |k| and a non-zero derivative at those points. In fact, the derivative is |1|. Alternatively, we could use |\sin(\pi z)/\pi| which has a zero at |z = k| for each integer |k| but has derivative |(-1)^k| at those points. Therefore, |\pi\cot(\pi z) = \pi/\tan(\pi z)| has a (first-order) pole at |z = k| for each integer |k| with residue |1|. In particular, if |f| is a holomorphic function (at least near the real axis), then the value of the Cauchy integral of |f(z)\pi\cot(\pi z)| along a Hankel contour will be |2\pi i \sum_{k=0}^\infty f(k)|. Along an infinite strip around the real axis we’d get |2 \pi i \sum_{k=-\infty}^\infty f(k)|. As an example, we can consider the famous sum, |\sum_{k=1}^\infty 1/k^2|. It can be shown that if |f| is a meromorphic function whose poles are not located at integers and |\vert zf(z)\vert| is bounded for sufficiently large |\vert z\vert|, then |\oint f(z)\pi \cot(\pi z)\mathrm dz = 0|. We thus have that |\sum_{k=-\infty}^{\infty} f(k) = -\sum_j \mathrm{Res}(f(z)\pi\cot(\pi z); z_j)| where |z_j| are the poles of |f|. In particular, |f(z) = \frac{1}{z^2 + a^2}| has (first-order) poles at |\pm ai|. This gives us simply

$$\sum_{k=-\infty}^{\infty} \frac{1}{k^2 + a^2} = -\pi\frac{\cot(\pi a i)-\cot(-\pi a i)}{2ai} = \frac{\pi}{a}\coth(\pi a)$$

where I’ve used |\coth(x) = i\cot(xi)| and the fact that |\coth| is an odd function. Exploiting the symmetry of the sum gives us

$$\sum_{k=1}^{\infty} \frac{1}{k^2 + a^2} = \frac{\pi}{2a}\coth(\pi a) - \frac{1}{2a^2}$$

By expanding |\coth| in a Laurent series, we see that the limit of the right-hand side as |a| approaches |0| is |\frac{\pi^2}{6}|. While contour integration is quite effective for coming up with analytic solutions to infinite sums, numerically integrating the contour integrals is also highly effective as illustrated in Talbot quadratures and rational approximations by Trefethen, Weideman, and Schmelzer (2006), for example.

We’ve seen in the previous section that |f’(z_0) = \frac{1}{2\pi i}\oint_{\Gamma} \frac{f(z)\mathrm dz}{(z-z_0)^2}|. This doesn’t much help us if we don’t have a way to compute the integrals. From this point forward, fix |\Gamma| as a circle of radius |h| centered on |z_0|.

Before that, let’s consider numerical integration in general. Say we want to integrate the real function |f| from |0| to |b|, i.e. we want to calculate |\int_0^b f(x)\mathrm dx|. The most obvious way to go about it is to approximate the Riemann sums that define the (Riemann) integral. This would produce a formula like |\int_0^b f(x)\mathrm dx \approx \frac{b}{N}\sum_{k=0}^{N-1} f(bk/N)| corresponding to summing the areas of rectangles whose left points are the values of |f|. As before with central differences, relatively minor tweaks will give better approximations. In particular, we get the two roughly equivalent approximations of the **Midpoint Rule** |\int_0^b f(x)\mathrm dx \approx \frac{b}{N}\sum_{k=0}^{N-1} f\left(\frac{b(k+(1/2))}{N}\right)| where we take the midpoint rather than the left or right point, and the **Trapezoid Rule** |\int_0^b f(x)\mathrm dx \approx \frac{b}{2N}\sum_{k=0}^{N-1}[f(b(k+1)/N) + f(bk/N)]| where we average the left and right Riemann sums. While both of these perform substantially better than the left/right Riemann sums, they are still rather basic quadrature rules; the error decreases as |O(1/N^2)|.

Something special happens when |f| is a periodic function. First, the Trapezoid rule reduces to |\frac{b}{N}\sum_{k=0}^{N-1} f(bk/N)|. More importantly, the Midpoint rule and the Trapezoid rule both start converging geometrically rather than quadratically. Furthermore, for the particular case we’re interested in, namely integrating analytic functions along a circle in the complex plane, these quadrature rules are optimal. Let |\zeta| be the |2N|-th root of unity. The Trapezoid rule corresponds to sum the values of |f| at the even powers of |\zeta| scaled by the radius |h| and translated by |z_0|, and the Midpoint rule corresponds to the sum of the odd powers.

We now have two parameters for approximating a Cauchy integral via the Trapezoid or Midpoint rules: the radius |h| and the number of points |N|.

Complex-Step Differentiation corresponds to approximating the Cauchy integral for the derivative using the extreme case of the Midpoint rule with |N=2| and very small radii (i.e. values of |h|). Meanwhile, Central Differences corresponds to the extreme case of using the Trapezoid rule with |N=2| and very small radii. To spell this out a bit more, we perform the substitution |z - z_0 = he^{\theta i}| which leads to |\mathrm dz = hie^{\theta i}\mathrm d\theta| and

$$\frac{1}{2\pi i}\oint_{|z - z_0| = h} \frac{f(z)\mathrm dz}{(z - z_0)^2} = \frac{1}{2 \pi h}\int_0^{2\pi} f(z_0 + he^{\theta i})e^{-\theta i}\mathrm d\theta$$

Applying the Trapezoid rule to the right hand side of this corresponds to picking |\theta = 0, \pi|, while applying the Midpoint rule corresponds to picking |\theta = \pm \pi/2|. |e^{\theta i} = \pm 1| for |\theta = 0, \pi|, and |e^{\theta i} = \pm i| for |\theta = \pm \pi/2|. For the Trapezoid rule, this leads to |f’(z_0) \approx \frac{1}{2h}[f(z_0 + h) - f(z_0 - h)]| which is Central Differences. For the Midpoint rule, this leads to |f’(z_0) \approx \frac{1}{2hi}[f(z_0 + hi) - f(z_0 - hi)]|. This is Complex-Step Differentiation when |z_0| is real.

As just calculated, **Complex-Step Differentiation** computes the derivative at the *real* number |x_0| via the formula:

$$f'(x_0) \approx \frac{1}{2hi} [f(x_0 + hi) - f(x_0 - hi)]$$

Another perspective on this formula is that it is just the Central Differences formula along the imaginary axis instead of the real axis.

When |f| is complex analytic and real-valued on real arguments, then we have |f(\overline z) = \overline{f(z)}| where |\overline z| is the complex conjugate of |z|, i.e. it maps |a + bi| to |a - bi| or |re^{\theta i}| to |re^{-\theta i}|. This leads to |f(x_0 + hi) - f(\overline{x_0 + hi}) = f(x_0 + hi) - \overline{f(x_0 + hi)} = 2i\operatorname{Im}(f(x_0 + hi))|. This lets us simplify Complex-Step Differentiation to |f’(x_0) \approx \operatorname{Im}(f(x_0 + h))/h|.

Here is the earlier interactive example but now using Complex-Step Differentiation. As |h| decreases in magnitude, the error steadily decreases until there is no error at all.

|h|:

|f’(||)|:

error:

This formula using |\operatorname{Im}| avoids catastrophic cancellation simply by not doing a subtraction. However, it turns out for real |x_0| (which is necessary to derive the simplified formula), there isn’t a problem either way. Using the first form of the Complex-Step Differentiation formula is also numerically stable. The key here is that the imaginary part of |x_0| and |f(x_0)| are both |0| and so we don’t get catastrophic cancellation for the same reason we wouldn’t get it with Central Differences if |f(x_0) = 0|. This suggests that if we wanted to evaluate |f’| at some non-zero point on the imaginary axis, Complex-Step Differentiation would perform poorly while Central Differences would perform well. Further, if we wanted to evaluate |f’| at some point not on either the real or imaginary axes, neither approach would perform well. In this case, choosing different values for |N| and the radius would be necessary^{4}.

A third perspective on Complex-Step Differentiation comes when we think about which value of |h| should we use. The smaller |f’(x_0)| is, the smaller we’d want |h| to be. Unlike Central Differences, there is little stopping us from having |h| be *very* small and values like |h=10^{-100}| are typical. In fact, around |h=10^{-155}| in double precision floating point arithmetic, |h| gets the theoretically useful property that |h^2 = 0| due to underflow. In this case, |x_0 + hi| behaves like |x_0 + \varepsilon| where |\varepsilon^2 = 0|. This is the defining property of the ring of dual numbers. Dual numbers are exactly what are used in forward-mode automatic differentiation.

The ring of dual numbers has numbers of the form |a + b\varepsilon| where |a, b \in \mathbb R|. This behaves just like the complex numbers except that instead of |i^2 = -1| we have |\varepsilon^2 = 0|. The utility of dual numbers for our purposes can be seen by expanding |f(x_0 + \varepsilon)| in a Taylor series about |x_0|. We get |f(x_0 + \varepsilon) = f(x_0) + f’(x_0)\varepsilon|. All higher power terms of the Taylor series are zero because |\varepsilon^2 = 0|. We can thus get the derivative of |f| simply by computing |f(x + \varepsilon)| and then looking at the coefficient of |\varepsilon| in the result.

In this example there is no interactivity as we are not estimating the derivative in the AD case but instead calculating it in parallel. There is no |h| parameter.

|f’(||)|:

error:

As the end of the previous section indicated, Complex-Step Differentiation approximates this (often exactly) by using |hi| as |\varepsilon|. Nevertheless, this is not ideal. Often the complex versions of a function will be more costly than their dual number counterparts. For example, |(a + bi)(c + di) = (ac - bd) + (ad + bc)i| involves four real multiplications and two additions. |(a + b\varepsilon)(c + d\varepsilon) = ac + (ad + bc)\varepsilon| involves three real multiplications and one addition on the other hand.

Using Complex Variables to Estimate Derivatives of Real Functions by Squire and Trapp (1998) is the first(?) published paper *specifically* about the idea of complex-step differentiation. It’s a three page paper and the authors are not claiming any originality but just demonstrating the effectiveness of ideas from the ’60s that the authors found to be underappreciated.

The Complex-Step Derivative Approximation by Martins, Sturdza, and Alonso (2003) does a much deeper dive into the theory behind complex-step differentiation and its connections to automatic differentiation.

You may have noticed the name “Trefethen” in many of the papers cited. Nick Trefethen and his collaborators have been doing amazing work for the past couple of decades, most notably in the Chebfun project. Looking at Trefethen’s book Approximation Theory and Approximation Practice (and lectures) recently reintroduced me to Trefethen’s work. This particular article was prompted by a footnote in the paper The Exponentially Convergent Trapezoidal Rule which I highly recommend. In fact, I highly recommend Chebfun as well as nearly all of Trefethen’s work. It is routinely compelling, interesting, and well presented.

Using the language of Geometric Calculus, we can write a very general form of the Fundamental Theorem of Calculus. Namely, \[\int_{\mathcal M} \mathrm d^m\mathbf x \nabla f(\mathbf x) = \oint_{\partial \mathcal M}\mathrm d^{m-1}\mathbf x f(\mathbf x)\] where |\mathcal M| is an |m|-dimensional manifold. Here |f| is a multivector-valued vector function. If |m=2| and |\nabla f = 0|, then this would produce a formula very similar to the Cauchy integral formula.

Writing |f(x + yi) = u(x, y) + v(x, y)i|, the Cauchy-Riemann equations are |\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}| and |\frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}|. However, |\nabla f = 0| leads to the slightly different equations |\frac{\partial u}{\partial x} = -\frac{\partial v}{\partial y}| and |\frac{\partial u}{\partial y} = \frac{\partial v}{\partial x}|.

The resolution of this discrepancy is found by recognizing that we don’t want |f| to be a vector-valued vector function but rather a spinor-valued spinor function. It is most natural to identify complex numbers with the even subalgebra of the 2D geometric algebra. If |\mathbf e_1| and |\mathbf e_2| are the two orthonormal basis vectors of the 2D space, then the pseudoscalar |I \equiv \mathbf e_1\wedge \mathbf e_2 = \mathbf e_1 \mathbf e_2| satisfies |I^2 = -1|. For the 2D case, a spinor is a multivector of the form |a + bI|.

We can generalize the vector derivative, |\nabla|, to a multivector derivative |\nabla_X| where |X| is a multivector variable by using the generic formula for the directional derivative in a linear space and then defining |\nabla_X| to be a linear combination of directional derivatives. Given any |\mathbb R|-linear space |V| and an element |v \in V|, we can define the directional derivative of |f : V \to V| in the direction |v| via |\frac{\partial f}{\partial v}(x) \equiv \frac{\mathrm d f(x + \tau v)}{\mathrm d\tau}|. In our case, we have the basis vectors |\{1, \mathbf e_1, \mathbf e_2, I\}| though we only care about the even subalgebra corresponding to the basis vectors |\{1, I\}|. Define |\partial_1 f(x) \equiv \frac{\mathrm d f(x + \tau)}{\mathrm d \tau}| and |\partial_I f(x) \equiv \frac{\mathrm d f(x + \tau I)}{\mathrm d \tau}| assuming |f| is a spinor-valued function^{5}. We can then define |\nabla_{\mathbf z} \equiv \partial_1 + I\partial_I|. We now have |\nabla_{\mathbf z} f = 0| is the equivalent to the Cauchy-Riemann equations where |f| is now a spinor-valued spinor function, i.e. a function of |\mathbf z|.

In terms of the general theory of partial differential equations, we are saying that |z^{-1}| is a Green’s function for |\nabla|. We can then understand everything that is happening here in terms of general results. In particular, it is the two-dimensional case of the results described in Multivector Functions by Hestenes.↩︎

See Numerical Algorithms based on Analytic Function Values at Roots of Unity by Austin, Kravanja, and Trefethen (2014) for an example. Also, with some minor tweaks, we can have that “point” be a matrix and these integrals can be used to calculate functions of matrices, e.g. the square root, exponent, inverse, and log of a matrix. See Computing |A^\alpha|, |\log(A)|, and Related Matrix Functions by Contour Integrals by Hale, Higham, and Trefethen (2009) for details.↩︎

See Is Gauss Quadrature Better than Clenshaw-Curtis? by Trefethen (2008) for more details.↩︎

While focused on issues that mostly come up with very high-order derivatives, e.g. |100|-th derivatives and higher, Accuracy and Stability of Computing High-Order Derivatives of Analytic Functions by Cauchy Integrals by Bornemann (2009) nevertheless has a good discussions of the concerns here.↩︎

If we allowed arbitrary multivector-valued functions, then we’d need to add a projection producing the tangential derivative.↩︎

This is part 3 in a series. See the previous part about internal languages for indexed monoidal categories upon which this part heavily depends.

In category theory, the hom-sets between two objects can often be equipped with some extra structure which is respected by identities and composition. For example, the set of group homomorphisms between two abelian groups is itself an abelian group by defining the operations pointwise. Similarly, the set of monotonic functions between two partially ordered sets (posets) is a poset again by defining the ordering pointwise. Linear functions between vector spaces form a vector space. The set of functors between small categories is a small category. Of course, the structure on the hom-sets can be different than the objects. Trivially, with the earlier examples a vector space is an abelian group, so we could say that linear functions form an abelian group instead of a vector space. Likewise groups are monoids. Less trivially, the set of relations between two sets is a partially ordered set via inclusion. There are many cases where instead of hom-sets we have hom-objects that aren’t naturally thought of as sets. For example, we can have hom-objects be non-negative (extended) real numbers from which the category laws become the laws of a generalized metric space. We can identify posets with categories who hom-objects are elements of a two element set or, even better, a two element poset with one element less than or equal to the other.

This general process is called enriching a category in some other category which is almost always called |\newcommand{\V}{\mathcal V}\V| in the generic case. We then talk about having |\V|-categories and |\V|-functors, etc. In a specific case, it will be something like |\mathbf{Ab}|-categories for an |\mathbf{Ab}|-enriched category, where |\mathbf{Ab}| is the category of abelian groups. Unsurprisingly, not just *any* category will do for |\V|. However, it turns out very little structure is needed to define a notion of |\V|-category, |\V|-functor, |\V|-natural transformation, and |\V|-profunctor. The usual “baseline” is that |\V| is a monoidal category. As mentioned in the previous post, paraphrasing Bénabou, notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. It is useful for our purposes to make this structure explicit. For very little cost, this will also provide a vastly more general notion that will readily capture enriched categories, indexed categories, and categories that are simultaneously indexed and enriched, of which internal categories are an example. The tool for this is a (Grothendieck) fibration aka a fibered category or the mostly equivalent concept of an indexed category.^{1}

To that end, instead of just a monoidal category, we’ll be using indexed monoidal categories. Typically, to get an experience as much like ordinary category theory as possible, additional structure is assumed on |\V|. In particular, it is assumed to be an (indexed) cosmos which means that it is an indexed symmetric monoidally closed category with indexed coproducts preserved by |\otimes| and indexed products and fiberwise finite limits and colimits (preserved by the indexed structure). This is quite a lot more structure which I’ll introduce in later parts. In this part, I’ll make no assumptions beyond having an indexed monoidal category.

The purpose of the machinery of the previous posts is to make this section seem boring and pedestrian. Other than being a little more explicit and formal, for most of the following concepts it will look like we’re restating the usual definitions of categories, functors, and natural transformations. The main exception is profunctors which will be presented in a quite different manner, though still in a manner that is easy to connect to the usual presentation. (We will see how to recover the usual presentation in later parts.)

While I’ll start by being rather explicit about indexes and such, I will start to suppress that detail over time as most of it is inferrable. One big exception is that right from the start I’ll omit the explicit dependence of primitive terms on indexes. For example, while I’ll write |\mathsf F(\mathsf{id}) = \mathsf{id}| for the first functor law, what the syntax of the previous posts says I should be writing is |\mathsf F(s, s; \mathsf{id}(s)) = \mathsf{id}(s)|.

To start, I want to introduce two different notions of |\V|-category, small |\V|-categories and large |\V|-categories, and talk about what this distinction actually means. I will proceed afterwards with the “large” notions, e.g. |\V|-functors between large |\V|-categories, as the small case will be an easy special case.

The **theory of a small |\V|-category** consists of:

- an index type |\newcommand{\O}{\mathsf O}\O|,
- a linear type |\newcommand{\A}{\mathsf A}s, t : \O \vdash \A(t, s)|,
- a linear term |s : \O; \vdash \mathsf{id} : \A(s, s)|, and
- a linear term |s, u, t : \O; g : \A(t, u), f : \A(u, s) \vdash g \circ f : \A(t, s)|

satisfying

$$\begin{gather}
s, t : \O; f : \A(t, s) \vdash \mathsf{id} \circ f = f = f \circ \mathsf{id} : \A(t, s)\qquad \text{and} \\ \\
s, u, v, t : \O; h : \A(t, v), g : \A(v, u), f : \A(u, s) \vdash h \circ (g \circ f) = (h \circ g) \circ f : \A(t, s)
\end{gather}$$

In the notation of the previous posts, I’m saying |\O : \mathsf{IxType}|, |\A : (\O, \O) \to \mathsf{Type}|, |\mathsf{id} : (s : \O;) \to \A(s, s)|, and |\circ : (s, u, t : \O; \A(t, u), \A(u, s)) \to \A(t, s)| are primitives added to the signature of the theory. I’ll continue to use the earlier, more pointwise presentation above to describe the signature.

A **small |\V|-category** for an |\mathbf S|-indexed monoidal category |\V| is then an interpretation of this theory. That is, an object |O| of |\mathbf S| as the interpretation of |\O|, and an object |A| of |\V^{O\times O}| as the interpretation of |\A|. The interpretation of |\mathsf{id}| is an arrow |I_O \to \Delta_O^* A| of |\V^O|, where |\Delta_O : O \to O\times O| is the diagonal arrow |\langle id, id\rangle| in |\mathbf S|. The interpretation of |\circ| is an arrow |\pi_{23}^* A \otimes \pi_{12}^* A \to \pi_{13}^* A| of |\V^{O\times O \times O}| where |\pi_{ij} : X_1 \times X_2 \times X_3 \to X_i \times X_j| are the appropriate projections.

Since we can prove in the internal language that the choice of |()| for |\O|, |s, t : () \vdash I| for |\A|, |s, t : (); x : I \vdash x : I| for |\mathsf{id}|, and |s, u, t : (); f : I, g: I \vdash \mathsf{match}\ f\ \mathsf{as}\ *\ \mathsf{in}\ g : I| for |\circ| satisfies the laws of the theory of a small |\V|-category, we know we have a |\V|-category which I’ll call |\mathbb I| for any |\V|^{2}.

For |\V = \mathcal Fam(\mathbf V)|, |O| is a set of objects. |A| is an |(O\times O)|-indexed family of objects of |\mathbf V| which we can write |\{A(t,s)\}_{s,t\in O}|. The interpretation of |\mathsf{id}| is an |O|-indexed family of arrows of |\mathbf V|, |\{ id_s : I_s \to A(s, s) \}_{s\in O}|. Finally, the interpretation of |\circ| is a family of arrows of |\mathbf V|, |\{ \circ_{s,u,t} : A(t, u)\otimes A(u, s) \to A(t, s) \}_{s,u,t \in O}|. This is exactly the data of a (small) |\mathbf V|-enriched category. One example is when |\mathbf V = \mathbf{Cat}| which produces (strict) |2|-categories.

For |\V = \mathcal Self(\mathbf S)|, |O| is an object of |\mathbf S|. |A| is an arrow of |\mathbf S| into |O\times O|, i.e. an object of |\mathbf S/O\times O|. I’ll write the object part of this as |A| as well, i.e. |A : A \to O\times O|. The idea is that the two projections are the target and source of the arrow. The interpretation of |\mathsf{id}| is an arrow |ids : O \to A| such that |A \circ ids = \Delta_O|, i.e. the arrow produced by |ids| should have the same target and source. Finally, the interpretation of |\circ| is an arrow |c| from the pullback of |\pi_2 \circ A| and |\pi_1 \circ A| to |A|. The source of this arrow is the object of composable pairs of arrows. We further require |c| to produce an arrow with the appropriate target and source of the composable pair. This is exactly the data for a category internal to |\mathbf S|. An interesting case for contrast with the previous paragraph is that a category internal to |\mathbf{Cat}| is a double category.

For |\V = \mathcal Const(\mathbf V)|, the above data is exactly the data of a monoid object in |\mathbf V|. This is a formal illustration that a (|\mathbf V|-enriched) category is just an “indexed monoid”. Indeed, |\mathcal Const(\mathbf V)|-functors will be monoid homomorphisms and |\mathcal Const(\mathbf V)|-profunctors will be double-sided monoid actions. In particular, when |\mathbf V = \mathbf{Ab}|, we get rings, ring homomorphisms, and bimodules of rings. The intuitions here are the guiding ones for the construction we’re realizing.

One aspect of working in a not-necessarily-symmetric (indexed) monoidal category is the choice of the standard order of composition or diagrammatic order is not so trivial since it is not possible to even state what it means for them to be the equivalent. To be clear, this definition isn’t really taking a stance on the issue. We can interpret |\A(t, s)| as the type of arrows |s \to t| and then |\circ| will be the standard order of composition, or as the type of arrows |t \to s| and then |\circ| will be the diagrammatic order. In fact, there’s nothing in this definition that stops us from having |\A(t, s)| being the type of arrows |t \to s| while still having |\circ| be standard order composition as usual. The issue comes up only once we consider |\V|-profunctors as we will see.

A **large |\V|-category** is a model of a theory of the following form. There is

- a collection of index types |\O_x|,
- for each pair of index types |\O_x| and |\O_y|, a linear type |s : \O_x, t : \O_y \vdash \A_{yx}(t, s)|,
- for each index type |\O_x|, a linear term |s : \O_x; \vdash \mathsf{id}_x : \A_{xx}(s, s)|, and
- for each triple of index types, |\O_x|, |\O_y|, and |\O_z|, a linear term |s: \O_x, u : \O_y, t : \O_z; g : \A_{zy}(t, u), f : \A_{yx}(u, s) \vdash g \circ_{xyz} f : \A_{zx}(t, s)|

satisfying the same laws as small |\V|-categories, just with some extra subscripts. Clearly, a small |\V|-category is just a large |\V|-category where the collection of index types consists of just a single index type.

The typical way of describing the difference between small and large (|\V|-)categories would be to say something like: “By having a collection of index types in a large |\V|-category, we can have a proper class of them. In a small |\V|-category, the index type of objects is interpreted as an object in a category, and a proper class can’t be an object of a category^{3}.” However, for us, there’s a more directly relevant distinction. Namely, while we had a single theory of small |\V|-categories, there is no single theory of large |\V|-categories. Different large |\V|-categories correspond to models of (potentially) different theories. In other words, the notion of a small |\V|-category is able to be captured by our notion of theory but not the concept of a large |\V|-category. This extends to |\V|-functors, |\V|-natural transformations, and |\V|-profunctors. In the small case, we can define a single theory which captures each of these concepts, but that isn’t possible in the large case. In general, notions of “large” and “small” are about what we can internalize within the relevant object language, usually a set theory. Arguably, the only reason we speak of “size” and of proper classes being “large” is that the Axiom of Specification outright states that any subclass of a set is a set, so proper classes in **ZFC** can’t be subsets of any set. As I’ve mentioned elsewhere, you can definitely have set theories with proper classes that are contained in even finite sets, so the issue isn’t one of “bigness”.

The above discussion also explains the hand-wavy word “collection”. The collection is a collection in the meta-language in which we’re discussing/formalizing the notion of theory. When working *within* the theory of a particular large |\V|-category, all the various types and terms are just available ab initio and are independent. There is no notion of “collection of types” within the theory and nothing indicating that some types are part of a “collection” with others.

Another perspective on this distinction between large and small |\V|-categories is that small |\V|-categories have a *family* of arrows, identities, and compositions with respect to the notion of “family” represented by our internal language. If we hadn’t wanted to bother with formulating the internal language of an *indexed* monoidal category, we could have still defined the notion of |\V|-category with respect to the internal language of a (non-indexed) monoidal category. It’s just that all such |\V|-categories (except for monoid objects) would have to be large |\V|-categories. That is, the indexing and notion of “family” would be at a meta-level. Since most of the |\V|-categories of interest will be large (though, generally a special case called a |\V|-fibration which reins in the size a bit), it may seem that there was no real benefit to the indexing stuff. Where it comes in, or rather where small |\V|-categories come in, is that our notion of (co)complete means “has all (co)limits of *small* diagrams” and small diagrams are |\V|-functors from small |\V|-categories.^{4} There are several other places, e.g. the notion of presheaf, where we implicitly depend on what we mean by “small |\V|-category”. So while we won’t usually be focused on small |\V|-categories, which |\V|-categories are small impacts the structure of the whole theory.

The formulation of |\V|-functors is straightforward. As mentioned before, I’ll only present the “large” version.

Formally, we can’t formulate a theory of just a |\V|-functor, but rather we need to formulate a theory of “a pair of |\V|-categories and a |\V|-functor between them”.

A **|\V|-functor between (large) |\V|-categories |\mathcal C| and |\mathcal D|** is a model of a theory consisting of a theory of a large |\V|-category, of which |\mathcal C| is a model, and a theory of a large |\V|-category which I’ll write with primes, of which |\mathcal D| is a model, and model of the following additional data:

- for each index type |\O_x|, an index type |\O’_{F_x}| and an index term, |s : \O_x \vdash \mathsf F_x(s) : \O’_{F_x}|, and
- for each pair of index types, |\O_x| and |\O_y|, a linear term |s : \O_x, t : \O_y; f : \A_{yx}(t, s) \vdash \mathsf F_{yx}(f) : \A’_{F_yF_x}(\mathsf F_y(t), \mathsf F_x(s))|

satisfying

$$\begin{gather}
s : \O_x; \vdash \mathsf F_{xx}(\mathsf{id}_x) = \mathsf{id}'_{F_x}: \A'_{F_xF_x}(F_x(s), F_x(s))\qquad\text{and} \\
s : \O_x, u : \O_y, t : \O_z; g : \A_{zy}(t, u), f : \A_{yx}(u, s)
\vdash \mathsf F_{zx}(g \circ_{xyz} f) = F_{zy}(g) \circ'_{F_xF_yF_z} F_{yx}(f) : \A'_{F_zF_x}(F_z(t), F_x(s))
\end{gather}$$

The assignment of |\O’_{F_x}| for |\O_x| is, again, purely metatheoretical. From within the theory, all we know is that we happen to have some index types named |\O_x| and |\O’_{F_x}| and some data relating them. The fact that there is some kind of mapping of one to the other is not part of the data.

Next, I’ll define **|\V|-natural transformations** . As before, what we’re really doing is defining |\V|-natural transformations as a model of a theory of “a pair of (large) |\V|-categories with a pair of |\V|-functors between them and a |\V|-natural transformation between those”. As before, I’ll use primes to indicate the types and terms of the second of each pair of subtheories. Unlike before, I’ll only mention what is added which is:

- for each index type |\O_x|, a linear term |s : \O_x; \vdash \tau_x : \A’_{F’_xF_x}(\mathsf F’_x(s), \mathsf F_x(s))|

satisfying

$$\begin{gather}
s : \O_x, t : \O_y; f : \A_{yx}(t, s)
\vdash \mathsf F'_{yx}(f) \circ'_{F_xF'_xF'_y} \tau_x = \tau_y \circ'_{F_xF_yF'_y} \mathsf F_{yx}(f) : \A'_{F'_yF_x}(\mathsf F'_y(t), \mathsf F_x(s))
\end{gather}$$

In practice, I’ll suppress the subscripts on all but index types as the rest are inferrable. This makes the above equation the much more readable

$$\begin{gather}
s : \O_x, t : \O_y; f : \A(t, s)
\vdash \mathsf F'(f) \circ' \tau = \tau \circ' \mathsf F(f) : \A'(\mathsf F'(t), \mathsf F(s))
\end{gather}$$

Here’s where we need to depart from the usual story. In the usual story, a |\mathbf V|-enriched profunctor |\newcommand{\proarrow}{\mathrel{-\!\!\!\mapsto}} P : \mathcal C \proarrow \mathcal D| is a |\mathbf V|-enriched functor |P : \mathcal C\otimes\mathcal D^{op}\to\mathbf V| (or, often, the opposite convention is used |P : \mathcal C^{op}\otimes\mathcal D \to \mathbf V|). There are many problems with this definition in our context.

- Without symmetry, we have no definition of opposite category.
- Without symmetry, the tensor product of |\mathbf V|-enriched categories doesn’t make sense.
- |\mathbf V| is not itself a |\mathbf V|-enriched category, so it doesn’t make sense to talk about |\mathbf V|-enriched functors into it.
- Even if it was, we’d need some way of converting between arrows of |\mathbf V| as a category and arrows of |\mathbf V| as a |\mathbf V|-enriched category.
- The equation |P(g \circ f, h \circ k) = P(g, k) \circ P(f, h)| requires symmetry. (This is arguably 2 again.)

All of these problems are solved when |\mathbf V| is a symmetric monoidally closed category.

Alternatively, we can reformulate the notion of a |\V|-profunctor so that it works in our context and is equivalent to the usual one when it makes sense.^{5} To this end, at a low level a |\mathbf V|-enriched profunctor is a family of arrows

$$\begin{gather}P : \mathcal C(t, s)\otimes\mathcal D(s', t') \to [P(s, s'), P(t, t')]\end{gather}$$

which satisfies

$$\begin{gather}P(g \circ f, h \circ k)(p) = P(g, k)(P(f, h)(p))\end{gather}$$

in the internal language of a symmetric monoidally closed category among other laws. We can uncurry |P| to eliminate the need for closure, getting

$$\begin{gather}P : \mathcal C(t, s)\otimes \mathcal D(s', t')\otimes P(s, s') \to P(t, t')\end{gather}$$

satisfying

$$\begin{gather}P(g \circ f, h \circ k, p) = P(g, k, P(f, h, p))\end{gather}$$

We see that we’re always going to need to permute |f| and |h| past |k| unless we move the third argument to the second producing the nice

$$\begin{gather}P : \mathcal C(t, s)\otimes P(s, s') \otimes \mathcal D(s', t') \to P(t, t')\end{gather}$$

and the law

$$\begin{gather}P(g \circ f, p, h \circ k) = P(g, P(f, p, h), k)\end{gather}$$

which no longer requires symmetry. This is also where the order of the arguments of |\circ| drives the order of the arguments of |\V|-profunctors.

A **|\V|-profunctor**, |P : \mathcal C \proarrow \mathcal D|, is a model of the theory (containing subtheories for |\mathcal C| and |\mathcal D| etc. as in the |\V|-functor case) having:

- for each pair of index types |\O_x| and |\O’_{x’}|, a linear type |s : \O_x, t : \O’_{x’} \vdash \mathsf P_{x’x}(t, s)|, and
- for each quadruple of index types |\O_x|, |\O_y|, |\O’_{x’}|, and |\O’_{y’}|, a linear term |s : \O_x, s’ : \O’_{x’}, t : \O_y, t’ : \O’_{y’}; f : \A_{yx}(t, s), p : \mathsf P_{xx’}(s, s’), h : \A’_{x’y’}(s’, t’) \vdash \mathsf P_{yxx’y’}(f, p, h) : \mathsf P_{yy’}(t, t’)|

satisfying

$$\begin{align}
& s : \O_x, s' : \O'_{x'}; p : \mathsf P(s, s')
\vdash \mathsf P(\mathsf{id}, p, \mathsf{id}') = p : \mathsf P(s, s') \\ \\
& s : \O_x, s' : \O'_{x'}, u : \O_y, u' : \O'_{y'}, t : \O_z, t' : \O'_{z'}; \\
& g : \A(t, u), f : \A(u, s), p : \mathsf P(s, s'), h : \A'(s', u'), k : \A'(u', t') \\
\vdash\ & \mathsf P(g \circ f, p, h \circ' k) = \mathsf P(g, \mathsf P(f, p, h), k) : \mathsf P(t, t')
\end{align}$$

This can also be equivalently presented as a pair of a left and a right action satisfying bimodule laws. We’ll make the following definitions |\mathsf P_l(f, p) = \mathsf P(f, p, \mathsf {id})| and |\mathsf P_r(p, h) = \mathsf P(\mathsf{id}, p ,h)|.

A **|\V|-presheaf** on |\mathcal C| is a |\V|-profunctor |P : \mathbb I \proarrow \mathcal C|. Similarly, a **|\V|-copresheaf** on |\mathcal C| is a |\V|-profunctor |P : \mathcal C \proarrow \mathbb I|.

Of course, we have the fact that the term

$$\begin{gather}
s : \O_x, t : \O_y, s' : \O_z, t' : \O_w; h : \A(t, s), g : \A(s, s'), f : \A(s', t')
\vdash h \circ g \circ f : \A(t, t')
\end{gather}$$

witnesses the interpretation of |\A| as a |\V|-profunctor |\mathcal C \proarrow \mathcal C| for any |\V|-category, |\mathcal C|, which we’ll call the **hom |\V|-profunctor**. More generally, given a |\V|-profunctor |P : \mathcal C \proarrow \mathcal D|, and |\V|-functors |F : \mathcal C’ \to \mathcal C| and |F’ : \mathcal D’ \to \mathcal D|, we have the |\V|-profunctor |P(F, F’) : \mathcal C’ \proarrow \mathcal D’| defined as

$$\begin{gather}
s : \O_x, s' : \O'_{x'}, t : \O_y, t' : \O'_{y'}; f : \A(t, s), p : \mathsf P(\mathsf F(s), \mathsf F'(s')), f' : \A'(s', t')
\vdash \mathsf P(\mathsf F(f), p, \mathsf F'(f')) : \mathsf P(\mathsf F(t), \mathsf F'(t'))
\end{gather}$$

In particular, we have the **representable |\V|-profunctors** when |P| is the hom |\V|-profunctor and either |F| or |F’| is the identity |\V|-functor, e.g. |\mathcal C(Id, F)| or |\mathcal C(F, Id)|.

There’s a natural notion of morphism of |\V|-profunctors which we could derive either via passing the notion of natural transformation of the bifunctorial view through the same reformulations as above, or by generalizing the notion of a bimodule homomorphism. This would produce a notion like: a |\V|-natural transformation from |\alpha : P \to Q| is a |\alpha : P(t, s) \to Q(t, s)| satisfying |\alpha(P(f, p, h)) = Q(f, \alpha(p), h)|. While there’s nothing wrong with this definition, it doesn’t quite meet our needs. One way to see this is that it would be nice to have a bicategory whose |0|-cells were |\V|-categories, |1|-cells |\V|-profunctors, and |2|-cells |\V|-natural transformations as above. The problem there isn’t the |\V|-natural transformations but the |1|-cells. In particular, we don’t have composition of |\V|-profunctors. In the analogy with bimodules, we don’t have tensor products so we can’t reduce multilinear maps to linear maps; therefore, linear maps don’t suffice, and we really want a notion of multilinear maps.

So, instead of a bicategory what we’ll have is a virtual bicategory (or, more generally, a virtual double category). A virtual bicategory is to a bicategory what a multicategory is to a monoidal category, i.e. multicategories are “virtual monoidal categories”. The only difference between a virtual bicategory and a multicategory is that instead of our multimorphisms having arbitrary lists of objects as their sources, our “objects” (|1|-cells) themselves have sources and targets (|0|-cells) and our multimorphisms (|2|-cells) have *composable sequences* of |1|-cells as their sources.

A **|\V|-multimorphism** from a composable sequence of |\V|-profunctors |P_1, \dots, P_n| to the |\V|-profunctor |Q| is a model of the theory consisting of the various necessary subtheories and:

- a linear term, |s_0 : \O_{x_0}^0, \dots, s_n : \O_{x_n}^n; p_1 : \mathsf P_{x_0x_1}^1(s_0, s_1), \dots, p_n : \mathsf P_{x_{n-1}x_n}^n(s_{n-1}, s_n) \vdash \tau_{x_0\cdots x_n}(p_1, \dots, p_n) : \mathsf Q_{x_0x_n}(s_0, s_n)|

satisfying

$$\begin{align}
& t, s_0 : \O^0, \dots, s_n : \O^n;
f : \A^0(t, s_0), p_1 : \mathsf P^1(s_0, s_1), \dots, p_n : \mathsf P^n(s_{n-1}, s_n) \\
\vdash\ & \tau(\mathsf P_l^0(f, p_1), \dots, p_n) = \mathsf Q_l(f, \tau(p_1, \dots, p_n)) : \mathsf Q(t, s_n) \\ \\
& s_0 : \O^0, \dots, s_n, s : \O^n;
p_1 : \mathsf P^1(s_0, s_1), \dots, p_n : \mathsf P^n(s_{n-1}, s_n), f : \A^n(s_n, s) \\
\vdash\ & \tau(p_1, \dots, \mathsf P_r^n(p_n, f)) = \mathsf Q_r(\tau(p_1, \dots, p_n), f) : \mathsf Q(s_0, s) \\ \\
& s_0 : \O^0, \dots, s_n : \O^n; \\
& p_1 : \mathsf P^1(s_0, s_1), \dots, p_i : \mathsf P^i(s_{i-1}, s_i), f : \A^i(s_i, s_{i+1}),
p_{i+1} : \mathsf P^{i+1}(s_i, s_{i+1}), \dots, p_n : \mathsf P^n(s_{n-1}, s_n) \\
\vdash\ & \tau(p_1, \dots, \mathsf P_r^i(p_i, f), p_{i+1}, \dots, p_n) = \tau(p_1, \dots, p_i, \mathsf P_l^{i+1}(f, p_{i+1}) \dots, p_n) : \mathsf Q(s_0, s_n)
\end{align}$$

except for the |n=0| case in which case the only law is

$$\begin{gather}
t, s : \O^0; f : \A^0(t, s) \vdash \mathsf Q_l(f, \tau()) = \mathsf Q_r(\tau(), f) : \mathsf Q(t, s)
\end{gather}$$

The laws involving the action of |\mathsf Q| are called **external equivariance**, while the remaining law is called **internal equivariance**. We’ll write |\V\mathbf{Prof}(P_1, \dots, P_n; Q)| for the set of |\V|-multimorphisms from the composable sequence of |\V|-profunctors |P_1, \dots, P_n| to the |\V|-profunctor |Q|.

As with multilinear maps, we can characterize composition via a universal property. Write |Q_1\diamond\cdots\diamond Q_n| for the **composite |\V|-profunctor** (when it exists) of the composable sequence |Q_1, \dots, Q_n|. We then have for any pair of composable sequences |R_1, \dots, R_m| and |S_1, \dots, S_k| which compose with |Q_1, \dots, Q_n|,

$$\begin{gather}
\V\mathbf{Prof}(R_1,\dots, R_m, Q_1 \diamond \cdots \diamond Q_n, S_1, \dots, S_k; -)
\cong \V\mathbf{Prof}(R_1,\dots, R_m, Q_1, \dots, Q_n, S_1, \dots, S_k; -)
\end{gather}$$

where the forward direction is induced by precomposition with a |\V|-multimorphism |Q_1, \dots, Q_n \to Q_1 \diamond \cdots \diamond Q_n|. A |\V|-multimorphism with this property is called **opcartesian**. The |n=0| case is particularly important and, for a |\V|-category |\mathcal C|, produces the **unit |\V|-profunctor**, |U_\mathcal C : \mathcal C \proarrow \mathcal C| as the composite of the empty sequence. When we have all composites, |\V\mathbf{Prof}| becomes an actual bicategory rather than a virtual bicategory. |\V\mathbf{Prof}| always has all units, namely the hom |\V|-profunctors. Much like we can define the tensor product of modules by quotienting the tensor product of their underlying abelian groups by internal equivariance, we will find that we can make composites when we have enough (well-behaved) colimits^{6}.

Related to composites, we can talk about left/right closure of |\V\mathbf{Prof}|. In this case we have the natural isomorphisms:

$$\begin{gather}
\V\mathbf{Prof}(Q_1,\dots, Q_n, R; S) \cong \V\mathbf{Prof}(Q_1, \dots, Q_n; R \triangleright S) \\
\V\mathbf{Prof}(R, Q_1, \dots, Q_n; S) \cong \V\mathbf{Prof}(Q_1, \dots, Q_n;S \triangleleft R)
\end{gather}$$

Like composites, this merely characterizes these constructs; they need not exist in general. These will be important when we talk about Yoneda and (co)limits in |\V|-categories.

A |\V|-natural transformation |\alpha : F \to G : \mathcal C \to \mathcal D| is the same as |\alpha\in\V\mathbf{Prof}(;\mathcal D(G, F))|.

Just as an example, let’s prove a basic fact about categories for arbitrary |\V|-categories. This will use an informal style.

The fact will be that full and faithful functors reflect isomorphisms. Let’s go through the typical proof for the ordinary category case.

Suppose we have an natural transformation |\varphi : \mathcal D(FA, FB) \to \mathcal C(A, B)| natural in |A| and |B| such that |\varphi| is an inverse to |F|, i.e. the action of the functor |F| on arrows. If |Ff \circ Fg = id| and |Fg \circ Ff = id|, then by the naturality of |\varphi|, |\varphi(id) = \varphi(Ff \circ id \circ Fg) = f \circ \varphi(id) \circ g| and similarly with |f| and |g| switched. We now just need to show that |\varphi(id) = id| but |id = F(id)|, so |\varphi(id) = \varphi(F(id)) = id|. |\square|

Now in the internal language. We’ll start with the theory of a |\V|-functor, so we have |\O|, |\O’|, |\A|, |\A’|, and |\mathsf F|. While the previous paragraph talks about a natural transformation, we can readily see that it’s really a multimorphism. In our case, it is a |\V|-multimorphism |\varphi| from |\A’(\mathsf F, \mathsf F)| to |\A|. Before we do that though, we need to show that |\mathsf F| itself is a |\V|-multimorphism. This corresponds to the naturality of the action on arrows of |F| which we took for granted in the previous paragraph. This is quickly verified: the external equivariance equations are just the functor law for composites. The additional data we have is two linear terms |\mathsf f| and |\mathsf g| such that |\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g) = \mathsf{id}| and |\mathsf F(\mathsf g) \circ \mathsf F(\mathsf f) = \mathsf{id}|. Also, |\varphi(\mathsf F(h)) = h|. The result follows through almost identically to the previous paragraph. |\varphi(\mathsf{id}) = \varphi(\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g)) = \varphi(\mathsf F(\mathsf f) \circ \mathsf{id} \circ \mathsf F(\mathsf g))|, we apply external equivariance twice to get |\mathsf f \circ \varphi(\mathsf{id}) \circ \mathsf g|. The functor law for |\mathsf{id}| gives |\varphi(\mathsf{id}) = \varphi(\mathsf F(\mathsf{id})) = \mathsf{id}|. A quick glance verifies that all these equations use their free variables linearly as required. |\square|

As a warning, in the above |\mathsf f| and |\mathsf g| are not free variables but constants, i.e. primitive linear terms. Thus there is no issue with an equation like |\mathsf F(\mathsf f) \circ \mathsf F(\mathsf g) = \mathsf{id}| as both sides have no free variables.

This is a very basic result but, again, the payoff here is how boring and similar to the usual case this is. For contrast, the definition of an internal profunctor is given here. This definition is easier to connect to our notion of |\V|-presheaf, specifically a |\mathcal Self(\mathbf S)|-presheaf, than it is to the usual |\mathbf{Set}|-valued functor definition. While not hard, it would take me a bit of time to even formulate the above proposition, and a proof in terms of the explicit definitions would be hard to recognize as just the ordinary proof.

For fun, let’s figure out what the |\mathcal Const(\mathbf{Ab})| case of this result says explicitly. A |\mathcal Const(\mathbf{Ab})|-category is a ring, a |\mathcal Const(\mathbf{Ab})|-functor is a ring homomorphism, and a |\mathcal Const(\mathbf{Ab})|-profunctor is a bimodule. Let |R| and |S| be rings and |f : R \to S| be a ring homomorphism. An isomorphism in |R| viewed as a |\mathcal Const(\mathbf{Ab})|-category is just an invertible element. Every ring, |R|, is an |R|-|R|-bimodule. Given any |S|-|S|-bimodule |P|, we have an |R|-|R|-bimodule |f^*(P)| via restriction of scalars, i.e. |f^*(P)| has the same elements as |P| and for |p \in f^*(P)|, |rpr’ = f(r)pf(r’)|. In particular, |f| gives rise to a bimodule homomorphism, i.e. a linear function, |f : R \to f^*(S)| which corresponds to its action on arrows from the perspective of |f| as a |\mathcal Const(\mathbf{Ab})|-functor. If this linear transformation has an inverse, then the above result states that when |f(r)| is invertible so is |r|. So to restate this all in purely ring theoretic terms, given a ring homomorphism |f : R \to S| and an abelian group homomorphism |\varphi : S \to R| satisfying |\varphi(f(rst)) = r\varphi(f(s))t| and |\varphi(f(r)) = r|, then if |f(r)| is invertible so is |r|.

Indexed categories are equivalent to

*cloven*fibrations and, if you have the Axiom of Choice, all fibrations can be cloven. Indexed categories can be viewed as*presentations*of fibrations.↩︎This suggests that we could define a small |\V|-category |\mathcal C \otimes \mathcal D| where |\mathcal C| and |\mathcal D| are small |\V|-categories. Start formulating a definition of such a |\V|-category. You will get stuck. Where? Why? This implies that the (ordinary, or better, 2-)category of small |\V|-categories does not have a monoidal product with |\mathbb I| as unit in general.↩︎

With a good understanding of what a class is, it’s clear that it doesn’t even make sense to have a proper class be an object. In frameworks with an explicit notion of "class", this is often manifested by saying that a class that is an element of another class is a set (and thus not a proper class).↩︎

This suggests that it might be interesting to consider categories that are (co)complete with respect to this monoid notion of “small”. I don’t think I’ve ever seen a study of such categories. (Co)limits of monoids are not trivial.↩︎

This is one of the main things I like about working in weak foundations. It forces you to come up with better definitions that make it clear what is and is not important and eliminates coincidences. Of course, it also produces definitions and theorems that are inherently more general too.↩︎

This connection isn’t much of a surprise as the tensor product of modules is exactly the (small) |\mathcal Const(\mathbf{Ab})| case of this.↩︎

This is part 2 in a series. See the previous part about internal languages for (non-indexed) monoidal categories. The main application I have in mind – enriching in indexed monoidal categories – is covered in the next post.

As Jean Bénabou pointed out in Fibered Categories and the Foundations of Naive Category Theory (PDF) notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. One of the more noticeable places early on is in the definition of a natural transformation as a family of arrows. However, even in the definition of category, identities and compositions are families of functions, or, in the enriched case, arrows of |\mathbf V|. From a foundational perspective, one place where this gets really in-your-face is when trying to formalize the notion of (co)completeness. It is straightforward to make a first-order theory of a finitely complete category, e.g. this one. For arbitrary products and thus limits, we need to talk about families of objects. To formalize the usual meaning of this in a first-order theory would require attaching an entire first-order theory of sets, e.g. **ZFC**, to our notion of complete category. If your goals are of a foundational nature like Bénabou’s were, then this is unsatisfactory. Instead, we can abstract out what we need of the notion of “family”. The result turns out to be equivalent to the notion of a fibration.

My motivations here are not foundational but leaving the notion of “family” entirely meta-theoretical means not being able to talk about it except in the semantics. Bénabou’s comment suggests that at the semantic level we want not just a monoidal category, but a fibration of monoidal categories^{1}. At the syntactic level, it suggests that there should be a built-in notion of “family” in our language. We accomplish both of these goals by formulating the internal language of an indexed monoidal category.

As a benefit, we can generalize to other notions of “family” than set-indexed families. We’ll clearly be able to formulate the notion of an enriched category. It’s also clear that we’ll be able to formulate the notion of an indexed category. Of course, we’ll also be able to formulate the notion of a category that is both enriched and indexed which includes the important special case of an internal category. We can also consider cases with trivial indexing which, in the unenriched case, will give us monoids, and in the |\mathbf{Ab}|-enriched case will give us rings.

Following Shulman’s Enriched indexed categories, let |\mathbf{S}| be a category with a cartesian monoidal structure, i.e. finite products. Then an |\mathbf{S}|-indexed monoidal category is simply a pseudofunctor |\newcommand{\V}{\mathcal V}\V : \mathbf{S}^{op} \to \mathbf{MonCat}|. A pseudofunctor is like a functor except that the functor laws only hold up to isomorphism, e.g. |\V(id)\cong id|. |\mathbf{MonCat}| is the |2|-category of monoidal categories, strong monoidal functors^{2}, and monoidal natural transformations. We’ll write |\V(X)| as |\V^X| and |\V(f)| as |f^*|. We’ll never have multiple relevant indexed monoidal categories so this notation will never be ambiguous. We’ll call the categories |\V^X| **fiber categories** and the functors |f^*| **reindexing functors**. The cartesian monoidal structure on |\mathbf S| becomes relevant when we want to equip the total category, |\int\V|, (computed via the Grothendieck construction in the usual way) with a monoidal structure. In particular, the tensor product of |A \in \V^X| and |B \in \V^Y| is an object |A\otimes B \in \V^{X\times Y}| calculated as |\pi_1^*(A) \otimes_{X\times Y} \pi_2^*(B)| where |\otimes_{X\times Y}| is the monoidal tensor in |\V^{X\times Y}|. The unit, |I|, is the unit |I_1 \in \V^1|.

The two main examples are: |\mathcal Fam(\mathbf V)| where |\mathbf V| is a (non-indexed) monoidal category and |\mathcal Self(\mathbf S)| where |\mathbf S| is a category with finite limits. |\mathcal Fam(\mathbf V)| is a |\mathbf{Set}|-indexed monoidal category with |\mathcal Fam(\mathbf V)^X| defined as the set of |X|-indexed families of objects of |\mathbf V|, families of arrows between them, and an index-wise monoidal product. We can identify |\mathcal Fam(\mathbf V)^X| with the functor category |[DX, \mathbf V]| where |D : \mathbf{Set} \to \mathbf{cat}| takes a set |X| to a small discrete category. Enriching in indexed monoidal category |\mathcal Fam(\mathbf V)| will be equivalent to enriching in the non-indexed monoidal category |\mathbf V|, i.e. the usual notion of enrichment in a monoidal category. |\mathcal Self(\mathbf S)| is an |\mathbf S|-indexed monoidal category and |\mathcal Self(\mathbf S)^X| is the slice category |\mathbf S/X| with its cartesian monoidal structure. |f^*| is the pullback functor. |\mathcal Self(\mathbf S)|-enriched categories are categories internal to |\mathbf S|. A third example we’ll find interesting is |\mathcal Const(\mathbf V)| for a (non-indexed) monoidal category, |\mathbf V|, which is a |\mathbf 1|-indexed monoidal category, which corresponds to an object of |\mathbf{MonCat}|, namely |\mathbf V|.

This builds on the internal language of a monoidal category described in the previous post. We’ll again have **linear types** and **linear terms** which will be interpreted into objects and arrows in the fiber categories. To indicate the dependence on the indexing, we’ll use two contexts: |\Gamma| will be an **index context** containing **index types** and **index variables**, which will be interpreted into objects and arrows of |\mathbf S|, while |\Delta|, the **linear context**, will contain linear types and linear variables as before except now linear types will be able to depend on **index terms**. So we’ll have judgements that look like:

$$\begin{gather}
\Gamma \vdash A \quad \text{and} \quad \Gamma; \Delta \vdash E : B
\end{gather}$$

The former indicates that |A| is a linear type indexed by the index variables of |\Gamma|. The latter states that |E| is a linear term of linear type |B| in the linear context |\Delta| indexed by the index variables of |\Gamma|. We’ll also have judgements for index types and index terms:

$$\begin{gather}
\vdash X : \square \quad \text{and} \quad \Gamma \vdash E : Y
\end{gather}$$

The former asserts that |X| is an index type. The latter asserts that |E| is an index term of index type |Y| in the index context |\Gamma|.

Since each fiber category is monoidal, we’ll have all the rules from before just with an extra |\Gamma| hanging around. Since our indexing category, |\mathbf S|, is also monoidal, we’ll also have copies of these rules at the level of indexes. However, since |\mathbf S| is *cartesian* monoidal, we’ll also have the structural rules of weakening, exchange, and contraction for index terms and types. To emphasize the cartesian monoidal structure of indexes, I’ll use the more traditional Cartesian product and tuple notation: |\times| and |(E_1, \dots, E_n)|. This notation allows a bit more uniformity as the |n=0| case can be notated by |()|.

The only really new rule is the rule that allows us to move linear types and terms from one index context to another, i.e. the rule that would correspond to applying a reindexing functor. I call this rule Reindex and, like Cut, it will be witnessed by substitution. Like Cut, it will also be a rule which we can eliminate. At the semantic level, this elimination corresponds to the fact that to understand the interpretation of any particular (linear) term, we can first reindex *everything*, i.e. all the interpretations of all subterms, into the same fiber category and then we can work entirely within that one fiber category. The Reindex rule is:

$$\begin{gather}
\dfrac{\Gamma \vdash E : X \quad \Gamma', x : X; a_1 : A_1, \dots, a_n : A_n \vdash E' : B}{\Gamma',\Gamma; a_1 : A_1[E/x], \dots, a_n : A_n[E/x] \vdash E'[E/x] : B[E/x]}\text{Reindex}
\end{gather}$$

By representing reindexing by syntactic substitution, we’re requiring the semantics of (linear) type and term formation operations to be respected by reindexing functors. This is exactly the right thing to do as the appropriate notion of, say, indexed coproducts, which would correspond to sum types, is coproducts in each fiber category which are preserved by reindexing functors.

Below I provide a listing of rules and equations.

None of this section is necessary for anything else.

This notion of (linear) types and terms being indexed by other types and terms is reminiscent of parametric types or dependent types. The machinery of indexed/fibered categories is also commonly used in the categorical semantics of parameterized and dependent types. However, there are important differences between those cases and our case.

In the case of parameterized types, we have types and terms that depend on other types. In this case, we have kinds, which are “types of types”, which classify types which in turn classify terms. If we try to set up an analogy to our situation, index types would correspond to kinds and index terms would correspond to types. The most natural thing to continue would be to have linear terms correspond to terms, but we start to see the problem. Linear terms are classified by linear types, but linear types are *not* index terms. They don’t even induce index terms. In the categorical semantics of parameterized types, this identification of types with (certain) expressions classified by kinds is handled by the notion of a generic object. A generic object corresponds to the kind |\mathsf{Type}| (what Haskell calls `*`

). The assumption of a generic object is a rather strong assumption and one that none of our example indexed monoidal categories support in general.

A similar issue occurs when we try to make an analogy to dependent types. The defining feature of a dependent type system is that types can depend on terms. The problem with such a potential analogy is that linear types and terms do not induce index types and terms. A nice way to model the semantics of dependent types is the notion of a comprehension category. This, however, is additional structure beyond what we are given by an indexed monoidal category. However, comprehension categories will implicitly come up later when we talk about adding |\mathbf S|-indexed (co)products. These comprehension categories will share the same index category as our indexed monoidal categories, namely |\mathbf S|, but will have different total categories. Essentially, a comprehension category shows how objects (and arrows) of a total category can be represented in the index category. We can then talk about having (co)products in a different total category with same index category with respect to those objects picked out by the comprehension category. We get dependent types in the case where the total categories are the same. (More precisely, the fibrations are the same.) Sure enough, we will see that when |\mathcal Self(\mathbf S)| has |\mathbf S|-indexed products, then |\mathbf S| is, indeed, a model of a dependent type theory. In particular, it is locally cartesian closed.

$$\begin{gather}
\dfrac{\vdash X : \square}{x : X \vdash x : X}\text{IxAx} \qquad
\dfrac{\Gamma\vdash E : X \quad \Gamma', x : X \vdash E': Y}{\Gamma',\Gamma \vdash E'[E/x] : Y}\text{IxCut}
\\ \\
\dfrac{\vdash Y : \square \quad \Gamma\vdash E : X}{\Gamma, y : Y \vdash E : X}\text{Weakening},\ y\text{ fresh} \qquad
\dfrac{\Gamma, x : X, y : Y, \Gamma' \vdash E : Z}{\Gamma, y : Y, x : X, \Gamma' \vdash E : Z}\text{Exchange} \qquad
\dfrac{\Gamma, x : X, y : Y \vdash E : Z}{\Gamma, x : X \vdash E[x/y] : Z}\text{Contraction}
\\ \\
\dfrac{\mathsf X : \mathsf{IxType}}{\vdash \mathsf X : \square}\text{PrimIxType} \qquad
\dfrac{\vdash X_1 : \square \quad \cdots \quad \vdash X_n : \square}{\vdash (X_1, \dots, X_n) : \square}{\times_n}\text{F}
\\ \\
\dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \mathsf F : (X_1, \dots, X_n) \to Y}{\Gamma \vdash \mathsf F(E_1, \dots, E_n) : Y}\text{PrimIxTerm}
\\ \\
\dfrac{\Gamma_1 \vdash E_1 : X_1 \quad \cdots \quad \Gamma_n \vdash E_n : X_n}{\Gamma_1,\dots,\Gamma_n \vdash (E_1, \dots, E_n) : (X_1, \dots, X_n)}{\times_n}\text{I} \qquad
\dfrac{\Gamma \vdash E : (X_1, \dots, X_n) \quad x_1 : X_1, \dots, x_n : X_n, \Gamma' \vdash E' : Y}{\Gamma, \Gamma' \vdash \mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E' : Y}{\times_n}\text{E}
\\ \\
\dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \mathsf A : (X_1, \dots, X_n) \to \mathsf{Type}}{\Gamma \vdash \mathsf A(E_1, \dots, E_n)}\text{PrimType}
\\ \\
\dfrac{\Gamma \vdash A}{\Gamma; a : A \vdash a : A}\text{Ax} \qquad
\dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n \quad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E: B}{\Gamma; \Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash E[E_1/a_1, \dots, E_n/a_n] : B}\text{Cut}
\\ \\
\dfrac{\Gamma \vdash E : X \quad \Gamma', x : X; a_1 : A_1, \dots, a_n : A_n \vdash E' : B}{\Gamma',\Gamma; a_1 : A_1[E/x], \dots, a_n : A_n[E/x] \vdash E'[E/x] : B[E/x]}\text{Reindex}
\\ \\
\dfrac{}{\Gamma\vdash I}I\text{F} \qquad
\dfrac{\Gamma\vdash A_1 \quad \cdots \quad \Gamma \vdash A_n}{\Gamma \vdash A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{F}, n \geq 1
\\ \\
\dfrac{\Gamma \vdash E_1 : X_1 \quad \cdots \quad \Gamma \vdash E_n : X_n \quad \Gamma; \Delta_1 \vdash E_1' : A_1 \quad \cdots \quad \Gamma; \Delta_m \vdash E_m' : A_m \quad \mathsf f : (x_1 : X_1, \dots, x_n : X_n; A_1, \dots, A_m) \to B}{\Gamma; \Delta_1, \dots, \Delta_m \vdash \mathsf f(E_1, \dots, E_n; E_1', \dots, E_m') : B}\text{PrimTerm}
\\ \\
\dfrac{}{\Gamma; \vdash * : I}I\text{I} \qquad
\dfrac{\Gamma; \Delta \vdash E : I \quad \Gamma; \Delta_l, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ *\ \mathsf{in}\ E' : B}I\text{E}
\\ \\
\dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n}{\Gamma; \Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I}
\\ \\
\dfrac{\Gamma; \Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ (a_1 \otimes \cdots \otimes a_n)\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1
\end{gather}$$

$$\begin{gather}
\dfrac{\Gamma_1 \vdash E_1 : X_1 \quad \cdots \quad \Gamma_n \vdash E_n : X_n \qquad x_1 : X_1, \dots, x_n : X_n, \Gamma \vdash E : Y}{\Gamma_1, \dots, \Gamma_n, \Gamma \vdash (\mathsf{match}\ (E_1, \dots, E_n)\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E) = E[E_1/x_1, \dots, E_n/x_n] : Y}{\times_n}\beta
\\ \\
\dfrac{\Gamma \vdash E : (X_1, \dots, X_n) \qquad \Gamma, x : (X_1, \dots, X_n) \vdash E' : B}{\Gamma \vdash E'[E/x] = \mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E'[(x_1, \dots, x_n)/x] : B}{\times_n}\eta
\\ \\
\dfrac{\Gamma \vdash E_1 : (X_1, \dots, X_n) \qquad x_1 : X_1, \dots, x_n : X_n \vdash E_2 : Y \quad y : Y \vdash E_3 : Z}{\Gamma \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E_3[E_2/y]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E_2)/y] : Z}{\times_n}\text{CC}
\\ \\
\dfrac{\Gamma;\vdash E : B}{\Gamma;\vdash (\mathsf{match}\ *\ \mathsf{as}\ *\ \mathsf{in}\ E) = E : B}{*}\beta \qquad
\dfrac{\Gamma; \Delta \vdash E : I \qquad \Gamma; \Delta_l, a : I, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ *\ \mathsf{in}\ E'[{*}/a]) : B}{*}\eta
\\ \\
\dfrac{\Gamma; \Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Gamma; \Delta_n \vdash E_n : A_n \qquad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n, \Delta_r : A_n \vdash E : B}{\Gamma; \Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes_n}\beta
\\ \\
\dfrac{\Gamma; \Delta \vdash E : A_1 \otimes \cdots \otimes A_n \qquad \Gamma; \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash E'[E/a] = \mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a] : B}{\otimes_n}\eta
\\ \\
\dfrac{\Gamma; \Delta \vdash E_1 : I \qquad \Gamma; \Delta_l, \Delta_r \vdash E_2 : B \qquad \Gamma; b : B \vdash E_3 : C}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ *\ \mathsf{in}\ E_3[E_2/b]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ *\ \mathsf{in}\ E_2)/b] : C}{*}\text{CC}
\\ \\
\dfrac{\Gamma; \Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \qquad \Gamma; \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B \qquad \Gamma; b : B \vdash E_3 : C}{\Gamma; \Delta_l, \Delta, \Delta_r \vdash (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_3[E_2/b]) = E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \dots \otimes a_n\ \mathsf{in}\ E_2)/b] : C}{\otimes_n}\text{CC}
\end{gather}$$

|\mathsf X : \mathsf{IxType}| means |\mathsf X| is a primitive index type in the signature. |\mathsf A : (X_1, \dots, X_n) \to \mathsf{Type}| means that |\mathsf A| is a primitive linear type in the signature. |\mathsf F : (X_1, \dots, X_n) \to Y| and |\mathsf f : (x_1 : X_1, \dots, x_n : X_n; A_1, \dots, A_m) \to B| mean that |\mathsf F| and |\mathsf f| are assigned these types in the signature. In the latter case, it is assumed that |x_1 : X_1, \dots, x_n : X_n \vdash A_i| for |i = 1, \dots, m| and |x_1 : X_1, \dots, x_n : X_n \vdash B|. Alternatively, these assumptions could be added as additional hypotheses to the PrimTerm rule. Generally, every |x_i| will be used in some |A_j| or in |B|, though this isn’t technically required.

As before, I did not write the usual laws for equality (reflexivity and indiscernability of identicals) but they also should be included.

See the discussion in the previous part about the commuting conversion (|\text{CC}|) rules.

A theory in this language is free to introduce additional index types, operations on indexes, linear types, and linear operations.

Fix an |\mathbf S|-indexed monoidal category |\V|. Write |\newcommand{\den}[1]{[\![#1]\!]}\den{-}| for the (overloaded) interpretation function. Its value on primitive operations is left as a parameter.

Associators for the semantic |\times| and |\otimes| will be omitted below.

$$\begin{align}
\vdash X : \square \implies & \den{X} \in \mathsf{Ob}(\mathbf S) \\ \\
\den{\Gamma} = & \prod_{i=1}^n \den{X_i}\text{ where } \Gamma = x_1 : X_1, \dots, x_n : X_n \\
\den{(X_1, \dots, X_n)} = & \prod_{i=1}^n \den{X_i}
\end{align}$$

$$\begin{align}
\Gamma \vdash E : X \implies & \den{E} \in \mathbf{S}(\den{\Gamma}, \den{X}) \\ \\
\den{x_i} =\, & \pi_i \text{ where } x_1 : X_1, \dots, x_n : X_n \vdash x_i : X_i \\
\den{(E_1, \dots, E_n)} =\, & \den{E_1} \times \cdots \times \den{E_n} \\
\den{\mathsf{match}\ E\ \mathsf{as}\ (x_1, \dots, x_n)\ \mathsf{in}\ E'} =\, & \den{E'} \circ (\den{E} \times id_{\den{\Gamma'}}) \text{ where }
\Gamma' \vdash E' : Y \\
\den{\mathsf F(E_1, \dots, E_n)} =\, & \den{\mathsf F} \circ (\den{E_1} \times \cdots \times \den{E_n}) \\
& \quad \text{ where }\mathsf F\text{ is an appropriately typed index operation}
\end{align}$$

IxAx is witnessed by identity, and IxCut by composition in |\mathbf S|. Weakening is witnessed by projection. Exchange and Contraction are witnessed by expressions that can be built from projections and tupling. This is very standard.

$$\begin{align}
\Gamma \vdash A \implies & \den{A} \in \mathsf{Ob}(\V^{\den{\Gamma}}) \\ \\
\den{\Delta} =\, & \den{A_1}\otimes_{\den{\Gamma}}\cdots\otimes_{\den{\Gamma}}\den{A_n} \text{ where } \Delta = a_1 : A_1, \dots, a_n : A_n \\
\den{I} =\, & I_{\den{\Gamma}}\text{ where } \Gamma \vdash I \\
\den{A_1 \otimes \cdots \otimes A_n} =\, & \den{A_1}\otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{A_n}\text{ where }
\Gamma \vdash A_i \\
\den{\mathsf A(E_1, \dots, E_n)} =\, & \langle \den{E_1}, \dots, \den{E_n}\rangle^*(\den{\mathsf A}) \\
& \quad \text{ where }\mathsf A\text{ is an appropriately typed linear type operation}
\end{align}$$

$$\begin{align}
\Gamma; \Delta \vdash E : A \implies & \den{E} \in \V^{\den{\Gamma}}(\den{\Delta}, \den{A}) \\ \\
\den{a} =\, & id_{\den{A}} \text{ where } a : A \\
\den{*} =\, & id_{I_{\den{\Gamma}}} \text{ where } \Gamma;\vdash * : I \\
\den{E_1 \otimes \cdots \otimes E_n} =\, & \den{E_1} \otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{E_n} \text{ where }
\Gamma; \Delta_i \vdash E_i : A_i \\
\den{\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'} =\, &
\den{E'} \circ (id_{\den{\Delta_l}} \otimes_{\den{\Gamma}} (\lambda_{\den{\Delta_r}} \circ (\den{E} \otimes_{\den{\Gamma}} id_{\den{\Delta_r}}))) \\
\den{\mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E'} =\, &
\den{E'} \circ (id_{\den{\Delta_l}} \otimes_{\den{\Gamma}} \den{E} \otimes_{\den{\Gamma}} id_{\den{\Delta_r}}) \\
\den{\mathsf f(E_1, \dots, E_n; E_1', \dots, E_n')} =\, & \langle \den{E_1}, \dots, \den{E_n}\rangle^*(\den{\mathsf f})
\circ (\den{E_1'} \otimes_{\den{\Gamma}} \cdots \otimes_{\den{\Gamma}} \den{E_n'}) \\
& \quad \text{ where }\mathsf f\text{ is an appropriately typed linear operation}
\end{align}$$

As with the index derivations, Ax is witnessed by the identity, in this case in |\V^{\den{\Gamma}}|.

|\den{E[E_1/a_1,,E_n/a_n]} = \den{E} \circ (\den{E_1}\otimes\cdots\otimes\den{E_n})| witnesses Cut.

Roughly speaking, Reindex is witnessed by |\den{E}^*(\den{E’})|. If we were content to restrict ourselves to semantics in |\mathbf S|-indexed monoidal categories witnessed by functors, as opposed to pseudofunctors, into *strict* monoidal categories, then this would suffice. For an arbitrary |\mathbf S|-indexed monoidal category, we can’t be sure that the naive interpretation of |A[E/x][E’/y]|, i.e. |\den{E’}^*(\den{E}^*(\den{A}))|, which we’d get from two applications of the Reindex rule, is the same as the interpretation of |A[E[E’/y]/x]|, i.e. |\den{E \circ E’}^*(\den{A})|, which we’d get from IxCut followed by Reindex. On the other hand, |A[E/x][E’/y] = A[E[E’/y]/x]| is simply true syntactically by the definition of substitution (which I have not provided but is the obvious, usual thing). There are similar issues for (meta-)equations like |I[E/x] = I| and |(A_1 \otimes A_2)[E/x] = A_1[E/x] \otimes A_2[E/x]|.

The solution is that we essentially use a normal form where we eliminate the uses of Reindex. These normal form derivations will be reached by rewrites such as:

$$\begin{gather}
\dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \qquad \dfrac{\dfrac{\mathcal D_1}{\Gamma, x : X; \Delta_1 \vdash E_1 : A_1} \quad
\cdots \quad \dfrac{\mathcal D_n}{\Gamma, x : X; \Delta_n \vdash E_n : A_n}}
{\Gamma, x : X; \Delta_1, \dots, \Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}}
{\Gamma, \Gamma'; \Delta_1[E/x], \dots, \Delta_n[E/x] \vdash E_1[E/x] \otimes \cdots \otimes E_n[E/x] : A_1[E/x] \otimes \cdots \otimes A_n[E/x]} \\
\Downarrow \\
\dfrac{\dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \quad \dfrac{\mathcal D_1}{\Gamma, x : X; \Delta_1 \vdash E_1 : A_1}}
{\Gamma, \Gamma'; \Delta_1[E/x] \vdash E_1[E/x] : A_1[E/x]} \quad
\cdots \quad
\dfrac{\dfrac{\mathcal D}{\Gamma' \vdash E : X} \quad \dfrac{\mathcal D_n}{\Gamma, x : X; \Delta_n \vdash E_n : A_n}}
{\Gamma, \Gamma'; \Delta_n[E/x] \vdash E_n[E/x] : A_n[E/x]}}
{\Gamma, \Gamma'; \Delta_1[E/x], \dots, \Delta_n[E/x] \vdash E_1[E/x] \otimes \cdots \otimes E_n[E/x] : A_1[E/x] \otimes \cdots \otimes A_n[E/x]}
\end{gather}$$

Semantically, this is witnessed by the strong monoidal structure, i.e. |\den{E}^*(\den{E_1} \otimes \cdots \otimes \den{E_n}) \cong \den{E}^*(\den{E_1}) \otimes \cdots \otimes \den{E}^*(\den{E_n})|. We need such rewrites for all (linear) rules that can immediately precede Reindex in a derivation. For |I\text{I}|, |I\text{E}|, |\otimes_n\text{E}|, and, as we’ve just seen, |\otimes_n\text{I}|, these rewrites are witnessed by |\den{E}^*| being a strong monoidal functor. The rewrites for |\text{Ax}| and |\text{Cut}| are witnessed by functorality of |\den{E}^*| and also strong monoidality for Cut. Finally, two adjacent uses of Reindex become an IxCut and a Reindex and are witnessed by the pseudofunctoriality of |(\_)^*|. (While we’re normalizing, we may as well eliminate Cut and IxCut as well.)

As the previous post alludes, monoidal structure is more than we need. If we pursue the generalizations described there in this indexed context, we eventually end up at augmented virtual double categories or virtual equipment.↩︎

The terminology here is a mess. Leinster calls strong monoidal functors “weak”. “Strong” also refers to tensorial strength, and it’s quite possible to have a “strong lax monoidal functor”. (In fact, this is what applicative functors are usually described as, though a strong lax closed functor would be a more direct connection.) Or the functors we’re talking about which are not-strong strong monoidal functors…↩︎

This is the first post in a series of posts on doing enriched indexed category theory and using the notion of an internal language to make this look relatively mundane. The internal language aspects are useful for other purposes too, as will be illustrated in this post, for example. This is related to the post Category Theory, Syntactically. In particular, it can be considered half-way between the unary theories and the finite product theories described there.

First in this series – this post – covers the internal language of a monoidal category. This is fairly straightforward, but it already provides some use. For example, the category of endofunctors on a category is a strict monoidal category, and so we can take a different perspective on natural transformations. This will also motivate the notions of a (virtual) bicategory and an actegory. Throughout this post, I’ll give a fairly worked example of turning some categorical content into rules of a type-/proof-theory.

The second post will add indexing to the notion of monoidal category and introduce the very powerful and useful notion of an indexed monoidal category.

The third post will formulate the notion of categories enriched in an indexed monoidal category and give the definitions which don’t require any additional assumptions.

The fourth post will introduce the notion and internal language for an indexed cosmos. Normally, when we do enriched category theory, we want the category into which we’re enriching to not be just a monoidal category but a cosmos. This provides many additional properties. An indexed cosmos is just the analogue of that for indexed monoidal categories.

The fifth post will then formulate categorical concepts for our enriched indexed categories that require some or all of these additional properties provided by an indexed cosmos.

At some point, there will be a post on virtual double categories as they (or, even better, augmented virtual double categories) are what will really be behind the notion of enriched indexed categories we’ll define. Basically, we’ll secretly be spelling out a specific instance of the |\mathsf{Mod}| construction.

Fix a monoidal category called |\newcommand{\V}{\mathbf V}\V|.

The internal language of a monoidal category is quite simple to describe^{1}. We’ll write terms in context. The term

$$\begin{align}
a_1 : A_1, \dots, a_n : A_n \vdash E : B\end{align}$$

will represent an arrow |A_1 \otimes \cdots \otimes A_n \to B| in |\V|. The |n = 0| case will be represented by omitting the context and will correspond to an arrow from the unit, |I|. However, there’s a catch. The term |E| must use all the variables |a_1, \dots, a_n| *exactly* once and in the order that they are listed in the context. (The |\mathsf{match}| construct will make this more complicated. Ultimately, a one-dimensional syntax isn’t that well suited to this situation.) For example,

$$\begin{align}a_1 : A_1, a_2: A_2 \vdash \mathsf f(a_2, a_1) : B, \quad a : A_1 \vdash \mathsf g(a, a) : B, \quad \text{and} \quad a : A \vdash \mathsf b : B\end{align}$$

are all *in*valid. We’ll call the context the **linear context**, consisting of **linear variables** with **linear types**, which we’ll usually represent with the metavariable |\Delta|. The naming comes from the connections to ordered linear logic.

Substituting for the linear variables, written |E[E_1/a_1,\dots,E_n/a_n]|, corresponds to the composition |E \circ (E_1 \otimes \cdots \otimes E_n)|. You can work out what associativity and unit laws of composition would look like. It should be noted, though, that this is a meta-theorem. Substitution is defined in the typical, syntactic way, and we’d need to prove that for every term the interpretation of the result of substituting into that term is equal to the composition of the interpretations.

If we replaced arrows |A_1 \otimes \cdots \otimes A_n \to B| in a monoidal category with multiarrows |(A_1, \dots, A_n) \to B| in a multicategory^{2}, then we’d be done. However, monoidal categories correspond to *representable* multicategories. If we write, |\mathcal C(A_1, \dots, A_n; B)| for the set of multiarrows |(A_1, \dots, A_n) \to B| in the multicatgory |\mathcal C|, then |\mathcal C| being representable means we have

$$\begin{align}
\mathcal C(A_1, \dots, A_m, B_1, \dots, B_n, C_1, \dots, C_p; D)
\cong \mathcal C(A_1, \dots, A_m, B_1 \otimes \cdots \otimes B_n, C_1, \dots, C_p; D)
\end{align}$$

natural in |D| and multinatural in the |A_i| and |C_i|. This implies that every |n|-ary arrow is equivalent to a unary arrow.

Since it’s useful to know, I’ll go into some detail on how we derive natural-deduction-style rules from this categorical data. First, we note that the above natural isomorphism (at least when |m=0| and |p=0|) has the form of a universal property using representability and, specifically, is a mapping-out property. That is, we are saying that the functor |\mathcal C(B_1, \dots, B_n; \_)| is represented by |B_1 \otimes \cdots \otimes B_n|. Let’s call the left to right direction of the above natural isomorphism |\varphi|. While this universal property can, of course, be represented by the natural isomorphism as above, it can also be equivalently represented by a universal element. Namely, one particularly notable choice for |D| is |B_1 \otimes \cdots \otimes B_n| itself, at which point we can consider |\eta = \varphi^{-1}(id_{B_1 \otimes \cdots \otimes B_n}) : (B_1, \dots, B_n) \to B_1 \otimes \cdots \otimes B_n|. By using naturality of |\varphi^{-1}|, we can easily show that |\varphi^{-1}(f) = f \circ \eta|. To witness the fact that |\varphi^{-1}(\varphi(f)) = f|, we need |\varphi(f) \circ \eta = f|. We’ve now shown that a natural transformation |\varphi| as above and an element (in this case a multiarrow) |\eta| which satisfy |\varphi(\eta) = id| and |\varphi(f) \circ \eta = f| is equivalent to the natural isomorphism above.

To start translating this to rules, we look at |\eta| first. A direct translation would be to say we have the rule:

$$\begin{align}
\dfrac{}{a_1 : A_1, \dots, a_n : A_n \vdash \eta : A_1 \otimes \cdots \otimes A_n}
\end{align}$$

This is unnatural because it treats |\eta| like a primitive open term. This also means that to use |\eta|, we’d need to use the Cut rule (which corresponds to substitution/composition) which would stymie Cut elimination. The solution is to mix a use of Cut into the rule itself producing the term |\eta[E_1/a_1, \dots, E_n/a_n]| which I’ll write more perspicuously as |E_1 \otimes \cdots \otimes E_n|. This gives rise to the rule:

$$\begin{align}
\dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n}{\Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I}
\end{align}$$

Of course, we could restrict to just the |n=0| and |n=2| cases if we wanted. I’ll write the |n=0| case of |a_1\otimes\cdots\otimes a_n| as |*| in both the term and pattern cases. As the label for the rule, |\otimes_n I|, suggests, this is an introduction rule for |\otimes|.

The rule corresponding to |\varphi| takes less massaging. We’ll do the same trick of incorporating a Cut (Where?), but this makes a fairly minor difference in this case. The rule we get is:

$$\begin{align}
\dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1
\end{align}$$

Again, as the label suggests, this is an elimination rule for |\otimes|.

We then need equalities for the two equations and a third equality for naturality of |\varphi|.

$$\begin{gather}
\dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E : B}{\Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes}\beta
\\ \\
\dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a]) : B}{\otimes}\eta
\end{gather}$$

The first equation corresponds to an introduction immediately followed by an elimination which is the form of a |\beta|-rule. The second is an elimination followed by an introduction and gives rise to an |\eta|-rule.

Since we are considering a mapping-out property, the naturality equations gives rise to what is called a commuting conversion:

$$\begin{align}
\dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B \quad b : B \vdash E_3 : C}{\Delta_l, \Delta, \Delta_r \vdash E_3[(\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \dots \otimes a_n\ \mathsf{in}\ E_2)/b] = (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_3[E_2/b] : C}{\otimes}\text{CC}
\end{align}$$

This rule is pretty bad from the perspective of structural proof theory. If I give you terms of the forms of the left and right hand sides of the equation, it can be quite difficult and potentially ambiguous to figure out what terms |E_2| and |E_3| should be. While this particular example can’t happen in our case, imagine if |E_3| did not use the variable |b|. This technically isn’t a problem for building a derivation or verifying it as you can just require that when someone invokes this rule they must *specify* what all the meta-variables are. Still, this causes difficulties for proof search and normalization and proofs of meta-theorems.

The solution is straightforward enough. We simply instantiate |E_3| with a concrete term. The not-so-straightforward part is knowing which terms we need. In this case, the main rule we need (and only one if we interpret the above as covering the |n = 0| case) is the following:

$$\begin{align}
\dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_m \quad \Delta_l, a_1 : A_1, \dots, a_m : A_m, \Delta_r \vdash E_2 : B_1 \otimes \cdots \otimes B_n \quad \Delta_l', b_1 : B_1, \dots, b_n : B_n, \Delta_r' \vdash E_3 : C}{\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_m\ \mathsf{in}\ E_2)\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_n\ \mathsf{in}\ E_3) \\ \qquad \qquad \qquad \quad \, = (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) : C}{\otimes}{\otimes}\text{CC}
\end{align}$$

You can see that this rule presents no difficulty in picking out the subterms that should correspond to the meta-variables. Fortunately, we don’t need a rule for every possible top-level term the |E_3| from the first rule could be, let alone the infinite number of possible instantiations of |E_3|. Unfortunately, we *do* need one for each elimination rule, and so the number of commuting conversions grows roughly quadratically with the number of connectives.

The motivation for all these rules – the |\beta| and |\eta| rules as well as the commuting conversions – is ideally to have a well-defined normal form for terms that we can systematically reach. In particular, we want two terms to have the same normal form if and only if they are semantically equivalent. If we fail to have normal forms, we’d still at least want two terms to be in the same equivalence class induced by the equations if and only if they are semantically equivalent. Often, we only consider normal forms modulo the equivalence induced by commuting conversions and then endeavor to ensure that this equivalence is easily decidable.

If we were to consider a mapping-in property, e.g. |\mathcal C(\_, B) \times \mathcal C(\_, C) \cong \mathcal C(\_, B \times C)| for the categorical product, the story would be very similar except with introduction and elimination switched. We’d also find that we don’t need a commuting conversion rule, though we would need to expand any existing commuting conversions with the new eliminator. You can work through it and try to find where the naturality equation went.

See the appendix for a full and compact listing of the rules and an explicit formulation of what it means to interpret this language into a monoidal category.

Before we go on to examples of monoidal categories, let’s consider an example of a theory that we can formulate in our internal language. The main example is the theory of monoids. We assume a type |\mathsf M| and operations |\mathsf e : () \to \mathsf M| and |\mathsf m : (\mathsf M, \mathsf M) \to \mathsf M|. To this we add the equations, |\mathsf m(\mathsf e(), a) = a = \mathsf m(a, \mathsf e())| and |\mathsf m(\mathsf m(a_1, a_2), a_3) = \mathsf m(a_1, \mathsf m(a_2, a_3))|. As we can readily verify, all these equations use the free variables in order and exactly once on each side of the equations. You can contrast this to the axioms of a group and see that those axioms aren’t valid in our internal language. A model of the theory of a monoid in a monoidal category is known as a monoid object. Another theory we could formulate in our internal language is the theory of a monoid action.

For mathematicians, the archetypal example of a monoidal category is the category of vector spaces over a field |k|, e.g. the real numbers. The monoidal product is the tensor product of vector spaces with unit |k|. The multiarrows of the associated multicategory are multilinear maps. The tensor product is symmetric which corresponds to adding the structural rule of Exchange to our list of rules. In practice, this means while we’re still required to use each variable in the context of a term exactly once, we are no longer required to use them in order. A model of the theory of monoids in this monoidal category, i.e. a monoid object in this monoidal category, is exactly an associative, unital |k|-algebra. Closely related, a ring is exactly a monoid object in the monoidal category of abelian groups.

Another major class of monoidal categories is cartesian monoidal categories where the monoidal product is the categorical product. The category of vector spaces has categorical products so it forms a monoidal category with that as well. Therefore part of the data of a monoidal category is a specific choice of monoidal product as there can easily be many inequivalent monoidal products. In terms of our internal language, a cartesian monoidal product corresponds to adding all the structural rules: Exchange, Weakening, and Contraction. This means that we are free to use variables however we like, i.e. we can use them in any order, ignore them, or use them multiple times. Usually, categorical products are presented in terms of a mapping-in property as I mentioned in the previous section. For a type theory, this leads to having tupling and projections. A good exercise would be to look up (or formulate) the rules for product types and show how the rules we’ve provided, in addition to the structural rules, allows us to define projections and show that the relevant equations are satisfied. A monoid object in |\mathbf{Set}| with respect to its cartesian monoidal product is exactly what we typically mean by a monoid.

A final example is the category of endofunctors on a given category which becomes a strict, non-symmetric monoidal category with composition as the monoidal product and the identity functor as the unit. Our internal language then provides a rather different perspective on natural transformations. A natural transformation |\tau : F \circ G \to H| is now viewed as a binary operation |\tau : (F, G) \to H|. To be clear, this is still *interpreted* as a family of (*unary*) arrows |\tau_A : F(G(A)) \to H(A)|. The action on arrows (which are natural transformations in this case) of the monoidal product is horizontal composition of natural transformations. The famous example is, of course, the natural transformations |\mu : T \circ T \to T| and |\eta : Id \to T| become the operations |\mu : (T, T) \to T| and |\eta : () \to T| and the monad laws are exactly the monoid laws. That is, a monad is a monoid object in the category of endofunctors equipped with this monoidal product.

The category of endofunctors example is pretty nice, but it is weird to limit to just endofunctors. We can consider natural transformations from an arbitrary composable sequence of functors whose composite has the same source and target objects as the target functor. The source of this restriction is that our objects are (endo-)functors and the monoidal product needs to work on any pair of objects in any order. The solution to this is to use the fact that a monoidal category is exactly a one-object bicategory.

We can thus readily generalize to the internal language of a bicategory. As before, it was helpful to use the notion of a multicategory, at least in passing. The analogue of a multicategory in this context is a virtual bicategory. That is, a multicategory is to a monoidal category as a virtual bicategory is to a bicategory. Basically, a virtual bicategory is like a multicategory except that each object now has a specified source and target and instead of allowing arbitrary sequences as sources for multiarrows, we only allow *composable* sequences. Here, a composable sequence is a sequence of objects such that the target of one object in the sequence is the source of the next. The analogue of a representable multicategory is the existence of composites in our virtual bicategory. We can say a bicategory is a virtual bicategory that has all composites.

For our internal language, the only real change we need to make is to keep track of and enforce the composability constraint. One way of doing this is to modify our type formation judgement to |\vdash_S^T A| asserting that |A| is a linear type with source |S| and target |T|. Our typing judgement is similarly decorated producing |\Delta \vdash_S^T E : B|. |\Delta| is again of the form |a_1 : A_1, \dots, a_n : A_n|, but now there is the constraint that the source of |A_n| is |S|, the target of |A_1| is |T|, and the target of |A_i| is the source of |A_{i+1}| for |i < n|. This leads to rules like

$$\begin{align}
\dfrac{\Delta_1 \vdash_{T_1}^{T_0} E_1 : A_1 \quad \cdots \quad \Delta_n \vdash_{T_n}^{T_{n-1}} E_n : A_n}{\Delta_1,\dots,\Delta_n \vdash_{T_n}^{T_0} E_1 * \cdots * E_n : A_1 * \cdots * A_n}
\end{align}$$

but nothing needs to change at the term level (though I did rename |\otimes| to |*| as that’s less misleading). The main (strict) bicategory would be |\mathbf{Cat}| where we’d interpret the linear types as functors and the linear terms as natural transformations.

Another direction for generalization is motivated by T-algebras where |T| is a monad. A |T|-algebra is an arrow |\alpha : TA \to A|. If we think of |\alpha| as a binary operation, |\alpha : (T, A) \to A|, similarly to how we viewed |\mu|, the |T|-algebra laws would look like |\alpha(\eta(), a) = a| and |\alpha(\mu(x, y), a) = \alpha(x, \alpha(y, a))|. These look exactly like the laws of a monoid action. The problem with this idea is that |T| and |A| are different kinds of objects; they live in different categories. One solution to this is to consider the internal language of an actegory.

An actegory is a monoidal category, |\mathcal C|, that acts on another category, |\mathcal D|. The quickest way of describing this is to say it is a strong monoidal functor from |\mathcal C \to [\mathcal D, \mathcal D]| where the (endo-)functor category |[\mathcal D, \mathcal D]| is equipped with composition as its monoidal product. We can uncurry this functor into a bifunctor |({-})\cdot({=}) : \mathcal C \times \mathcal D \to \mathcal D| satisfying |I\cdot D \cong D| and |(C \otimes C’)\cdot D \cong C \cdot (C’ \cdot D)|. For our |T|-algebras, |\mathcal C| would be |[\mathcal D, \mathcal D]|, and the monoidal functor would just be the identity.

To make the internal language, we’d start by including all the rules for a monoidal category to handle the structure on |\mathcal C|. Next, we’d add a judgement |\Delta / d : D \vdash E : D’| where |\Delta = c_1 : C_1, \dots, c_n : C_n|. This would have the same restriction that all variables, |c_1, \dots, c_n| and |d|, would need to be used exactly once and in the order they were written. The idea is that this would be interpreted as an arrow in |\mathcal D| from |(C_1 \otimes \cdots \otimes C_n)\cdot D \to D’|.

We’d have the rules (among others):

$$\begin{gather}
\dfrac{}{/ d : D \vdash d : D} \\ \\
\dfrac{\Delta \vdash E : C \quad \Delta' / d : D \vdash E' : D'}{\Delta, \Delta' / d : D \vdash E \cdot E' : C \cdot D'} \\ \\
\dfrac{\Delta / d : D \vdash E : C \cdot D' \quad c : C / d' : D' \vdash E' : D''}{\Delta / d : D \vdash \mathsf{match}\ E\ \mathsf{as}\ c \cdot d'\ \mathsf{in}\ E' : D''}
\end{gather}$$

Presumably, you could formulate a notion of “virtual actegory” where the arrows consist of a list of objects from a multicategory |\mathcal C| and a final object from a category |\mathcal D| as their source and an object of |\mathcal D| as their target. You could imagine going further (or alternately) for an analogue of a (virtual) bicategory which would, again, amount to using composable sequences. (The name “biactegory” is already taken.)

Regardless, the above framework allows us to have |t : T / a : A \vdash \alpha(t, a) : A|, and we can then express our desired equations for a |T|-algebra in the form of the laws of a monoid action. One place where this notation comes in handy is in the connections between |T|-algebras and absolute colimits.

$$\begin{gather}
\dfrac{\vdash A}{a : A \vdash a : A}\text{Ax} \qquad
\dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E: B}{\Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash E[E_1/a_1,\dots,E_n/a_n] : B}\text{Cut}
\\ \\
\dfrac{}{\vdash I}I\text{F} \qquad
\dfrac{\vdash A_1 \quad \cdots \quad \vdash A_n}{\vdash A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{F}, n \geq 1 \qquad
\dfrac{\mathsf A : \mathsf{Type}}{\vdash \mathsf A}\text{PrimType}
\\ \\
\dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \mathsf f : (A_1, \dots, A_n) \to B}{\Delta_1, \dots, \Delta_n \vdash \mathsf f(E_1, \dots, E_n) : B}\text{PrimTerm}
\\ \\
\dfrac{}{\vdash * : I}I\text{I} \qquad
\dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n}{\Delta_1,\dots,\Delta_n \vdash E_1 \otimes \cdots \otimes E_n : A_1 \otimes \cdots \otimes A_n}{\otimes_n}\text{I}
\\ \\
\dfrac{\Delta \vdash E : I \quad \Delta_l, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E' : B}I\text{E} \qquad
\dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash \mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E' : B}{\otimes_n}\text{E},n \geq 1
\end{gather}$$

$$\begin{gather}
\dfrac{\Delta \vdash E : B}{\Delta \vdash (\mathsf{match}\ {*}\ \mathsf{as}\ {*}\ \mathsf{in}\ E) = E : B}{*}\beta \qquad
\dfrac{\Delta \vdash E : I \quad \Delta_l, a : I, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'[*/a]) : B}{*}\eta
\\ \\
\dfrac{\Delta_1 \vdash E_1 : A_1 \quad \cdots \quad \Delta_n \vdash E_n : A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E : B}{\Delta_l, \Delta_1, \dots, \Delta_n, \Delta_r \vdash (\mathsf{match}\ E_1\otimes\cdots\otimes E_n\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E) = E[E_1/a_1, \dots, E_n/a_n] : B}{\otimes_n}\beta
\\ \\
\dfrac{\Delta \vdash E : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a : A_1 \otimes \cdots \otimes A_n, \Delta_r \vdash E' : B}{\Delta_l, \Delta, \Delta_r \vdash E'[E/a] = (\mathsf{match}\ E\ \mathsf{as}\ a_1\otimes\cdots\otimes a_n\ \mathsf{in}\ E'[(a_1\otimes\cdots\otimes a_n)/a]) : B}{\otimes_n}\eta
\\ \\
\dfrac{\Delta \vdash E_1 : I \quad \Delta_l, \Delta_r \vdash E_2 : I \quad \Delta_l', \Delta_r' \vdash E_3 : C}{\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ E_2)\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) = (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) : C}{*}{*}\text{CC}
\\ \\
\dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : I\quad \Delta_l', \Delta_r' \vdash E_3 : C}{\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_2)\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) \\ \qquad \qquad \qquad \quad \, = (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ {*}\ \mathsf{in}\ E_3) : C}{\otimes_n}{*}\text{CC}
\\ \\
\dfrac{\Delta \vdash E_1 : I \quad \Delta_l, \Delta_r \vdash E_2 : B_1 \otimes \cdots \otimes B_m \quad \Delta_l', b_1 : B_1, \dots, b_m : B_m, \Delta_r' \vdash E_3 : C}{\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ E_2)\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) \\ \qquad \qquad \qquad \quad \, = (\mathsf{match}\ E_1\ \mathsf{as}\ {*}\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) : C}{*}{\otimes_m}\text{CC}
\\ \\
\dfrac{\Delta \vdash E_1 : A_1 \otimes \cdots \otimes A_n \quad \Delta_l, a_1 : A_1, \dots, a_n : A_n, \Delta_r \vdash E_2 : B_1 \otimes \cdots \otimes B_m \quad \Delta_l', b_1 : B_1, \dots, b_m : B_m, \Delta_r' \vdash E_3 : C}{\Delta_l', \Delta_l, \Delta, \Delta_r, \Delta_r' \vdash (\mathsf{match}\ (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E_2)\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) \\ \qquad \qquad \qquad \quad \, = (\mathsf{match}\ E_1\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ \mathsf{match}\ E_2\ \mathsf{as}\ b_1 \otimes \cdots \otimes b_m\ \mathsf{in}\ E_3) : C}{\otimes_n}{\otimes_m}\text{CC}
\end{gather}$$

|\mathsf A : \mathsf{Type}| means that |\mathsf A| is a primitive type in the signature. |\mathsf f : (A_1, \dots, A_n) \to B| means that |\mathsf f| is assigned this type in the signature.

I did not write them, but the usual laws for equality (reflexivity and indiscernability of identicals) should be included.

A theory in this language is free to introduce additional linear types and linear operations.

Write |\newcommand{\den}[1]{[\![#1]\!]}\den{-}| for the (overloaded) interpretation function. Its value on primitive operations is left as a parameter.

Associators for the semantic |\otimes| will be omitted below. We can arbitrarily assume a particular association of monoidal products and then the relevant associators are completely determined by the input and output types. There are multiple possible expressions for those associators, but the coherence conditions of monoidal categories guarantee that they are equal.

$$\begin{align}
\vdash A \implies & \den{A} \in \mathsf{Ob}(\V) \\ \\
\den{\Delta} =\, & \den{A_1}\otimes \cdots \otimes \den{A_n} \text{ where } \Delta = a_1 : A_1, \dots, a_n : A_n \\
\den{I} =\, & I \\
\den{A_1 \otimes \cdots \otimes A_n} =\, & \den{A_1}\otimes \cdots \otimes \den{A_n} \\
\end{align}$$

$$\begin{align}
\Delta \vdash E : A \implies & \den{E} \in \V(\den{\Delta}, \den{A}) \\ \\
\den{a} =\, & id_{\den{A}} \text{ where } a : A \\
\den{*} =\, & id_I \\
\den{E_1 \otimes \cdots \otimes E_n} =\, & \den{E_1} \otimes \cdots \otimes \den{E_n} \text{ where }\Delta_i \vdash E_i : A_i \\
\den{\mathsf{match}\ E\ \mathsf{as}\ {*}\ \mathsf{in}\ E'} =\, &
\den{E'} \circ (id_{\den{\Delta_l}} \otimes (\lambda_{\den{\Delta_r}} \circ (\den{E} \otimes id_{\den{\Delta_r}}))) \\
\den{\mathsf{match}\ E\ \mathsf{as}\ a_1 \otimes \cdots \otimes a_n\ \mathsf{in}\ E'} =\, &
\den{E'} \circ (id_{\den{\Delta_l}} \otimes \den{E} \otimes id_{\den{\Delta_r}}) \\
\den{\mathsf f(E_1, \dots, E_n)} =\, & \den{\mathsf f} \circ (\den{E_1} \otimes \cdots \otimes \den{E_n})
\text{ where }\mathsf f\text{ is an appropriately typed linear operation}
\end{align}$$

where |\lambda_B : I \otimes B \cong B| is the left unitor.

When I was young and dumb and first learning category theory, I got it into my head that arguments involving sets were not “categorical”. This is not completely crazy as the idea of category theory being an alternate “foundation” and categorical critiques of set theoretic reasoning are easy to find. As such, I tended to neglect techniques that significantly leveraged |\mathbf{Set}|, and, in particular, representability. Instead, I’d prefer arguments using universal arrows as those translate naturally and directly to 2-categories.

This was a mistake. I have long since completely reversed my position on this for both practical and theoretical reasons. Practically, representability and related techniques provide very concise definitions which lead to concise proofs which I find relatively easy to formulate and easy to verify. This is especially true when combined with the (co)end calculus. It’s also the case that for a lot of math you simply don’t need any potential generality you might gain by, e.g. being able to use an arbitrary 2-category. Theoretically, I’ve gained a better understanding of where and how category theory is (or is not) “foundational”, and a better understanding of what about set theory categorists were critiquing. Category theory as a whole does *not* provide an alternate foundation for mathematics as that term is usually understood by mathematicians. A branch of category theory, topos theory, does, but a topos is fairly intentionally designed to give a somewhat |\mathbf{Set}|-like experience. Similarly, many approaches to higher category theory still include a |\mathbf{Set}|-like level.

This is, of course, not to suggest ideas like universal arrows *aren’t* important or can’t lead to elegant proofs.

Below is a particular example of attacking a problem from the perspective of representability. I use this example more because it is a neat proof that I hadn’t seen before. There are plenty of simpler compelling examples, such as proving that right(/left) adjoints are (co)continuous, and I regularly use representability in proofs I presented on, e.g. the Math StackExchange.

An elementary topos, |\mathcal E|, can be described as a category with finite limits and power objects. Having power objects means having a functor |\mathsf P : \mathcal E^{op} \to \mathcal E| such that |\mathcal E(A,\mathsf PB) \cong \mathsf{Sub}(A \times B)| natural in |A| and |B| where |\mathsf{Sub}| is the (contravariant) functor that takes an object to its set of subobjects. The action of |\mathsf{Sub}(f)| for an arrow |f : A \to B| is a function |m \mapsto f^*(m)| where |m| is a (representative) monomorphism and |f^*(m)| is the pullback of |f| along |m| which is a monomorphism by basic facts about pullbacks. In diagrammatic form:

$$\require{AMScd}
\begin{CD}
f^{-1}(B') @>f^\ast(m)>> A \\
@VVV @VVfV \\
B' @>>m> B
\end{CD}$$

This is a characterization of |\mathsf P| via representability. We are saying that |\mathsf PB| represents the functor |\mathsf{Sub}(- \times B)| parameterized in |B|.

A well-known and basic fact about elementary toposes is that they are cartesian closed. (Indeed, finite limits + cartesian closure + a subobject classifier is a common alternative definition.) Cartesian closure can be characterized as |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)| which characterizes the exponent, |B^A|, via representability. Namely, that |B^A| represents the functor |\mathcal E(- \times A, B)| parameterized in |A|. Proving that elementary toposes are cartesian closed is not too difficult, but it is a bit fiddly. This is the example that I’m going to use.

All the proofs I reference rely on the following basic facts about an elementary topos.

We have the monomorphism |\top : 1 \to \mathsf P1| induced by the identity arrow |\mathsf P1 \to \mathsf P1|.

We need the lemma that |\mathcal E(A \times B,PC) \cong \mathcal E(A,\mathsf P(B \times C))|. **Proof**:

$$\begin{align}
\mathcal E(A \times B,\mathsf PC) \cong \mathsf{Sub}((A \times B) \times C) \cong \mathsf{Sub}(A \times (B \times C)) \cong \mathcal E(A,\mathsf P(B \times C))\ \square
\end{align}$$

Since the arrow |\langle id_A, f\rangle : A \to A \times B| is a monomorphism for any arrow |f : A \to B|, the map |f \mapsto \langle id, f \rangle| is a map from |\mathcal E(-, B)| to |\mathsf{Sub}(- \times B)|. Using |\mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|, we get a map |\mathcal E(-, B) \to \mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|. By Yoneda, i.e. by evaluating it at |id|, we get the singleton map: |\{\}_B : B \to \mathsf PB|. If we can show that |\{\}| is a monomorphism, then, since |\mathsf PA \cong \mathsf P(A \times 1)|, we’ll get an arrow |\sigma : \mathsf PA \to \mathsf P1| such that

$$\begin{CD}
A @>\{\}_A>> \mathsf PA \\
@VVV @VV\sigma_AV \\
1 @>>\top> \mathsf P1
\end{CD}$$

is a pullback.

|\{\}_A| is a monomorphism because any |f, g : X \to A| gets mapped by the above to |\langle id_X, f\rangle| and |\langle id_X, g\rangle| which represent the same subobject when |\{\} \circ f = \{\} \circ g|. Therefore, there’s an isomorphism |j : X \cong X| such that |\langle id_X, f\rangle \circ j = \langle j, f \circ j\rangle = \langle id_X, g\rangle| but this means |j = id_X| and thus |f = g|.

To restate the problem: given the above setup, we want to show that the elementary topos |\mathcal E| is cartesian closed.

Toposes, Triples, and Theories by Barr and Wells actually provides *two* proofs of this statement: Theorem 4.1 of Chapter 5. The first proof is in exactly the representability-style approach I’m advocating, but it relies on earlier results about how a topos relates to its slice categories. The second proof is more concrete and direct, but it also involves |\mathsf P\mathsf P\mathsf P\mathsf P B|…

Sheaves in Geometry and Logic by Mac Lane and Moerdijk also has this result as Theorem 1 of section IV.2 “The Construction of Exponentials”. The proof starts on page 167 and finishes on 169. The idea is to take the set theoretic construction of functions via their graphs and interpret that into topos concepts. This proof involves a decent amount of equational reasoning (either via diagrams or via generalized elements).

Contrast these to Dan Doel’s proof^{1} using representability, which proceeds as follows. (Any mistakes are mine.)

Start with the pullback induced by the singleton map.

$$\begin{CD}
B @>\{\}_B>> \mathsf PB \\
@VVV @VV\sigma_BV \\
1 @>>\top> \mathsf P1
\end{CD}$$

Apply the functor |\mathcal E(= \times A,-)| which preserves the fact that it is a pullback via continuity.

$$\begin{CD}
\mathcal{E}(-\times A,B) @>>> \mathcal{E}(- \times A,\mathsf PB) \\
@VVV @VVV \\
\mathcal{E}(- \times A,1) @>>> \mathcal{E}(- \times A,\mathsf P1)
\end{CD}$$

Note:

- |\mathcal E(- \times A,1) \cong 1 \cong \mathcal E(-,1)| (by continuity)
- |\mathcal E(- \times A,\mathsf PB) \cong \mathcal E(-,\mathsf P(A \times B))| (by definition of power objects)
- |\mathcal E(- \times A,\mathsf P1) \cong \mathcal E(-,\mathsf PA)| (because |A \times 1 \cong A|)

This means that the above pullback is also the pullback of

$$\begin{CD}
\mathcal E(-\times A,B) @>>> \mathcal E(-,\mathsf P(A\times B)) \\
@VVV @VVV \\
\mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA)
\end{CD}$$

Since |\mathcal E| has all finite limits, it has the following pullback

$$\begin{CD}
X @>>> \mathsf P(A \times B) \\
@VVV @VVV \\
1 @>>> \mathsf PA
\end{CD}$$

where the bottom and right arrows are induced by the corresponding arrows of the previous diagram by Yoneda. Applying |\mathcal E(=,-)| to this diagram gives another pullback diagram by continuity

$$\begin{CD}
\mathcal E(-,X) @>>> \mathcal E(-,\mathsf P(A\times B)) \\
@VVV @VVV \\
\mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA)
\end{CD}$$

which is to say |\mathcal E(- \times A, B) \cong \mathcal E(-, X)| because pullbacks are unique up to isomorphism, so |X| satisfies the universal property of |B^A|, namely |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)|.

Sent to me almost exactly three years ago.↩︎

Recursive helping is a technique for implementing lock-free concurrent data structures and algorithms. I’m going to illustrate this in the case of implementing a multi-variable compare-and-swap (MCAS) in terms of a single variable compare-and-swap. Basically everything I’m going to talk about comes from Keir Fraser’s PhD Thesis, Practical Lock-Freedom (2004) which I **strongly** recommend. Fraser’s thesis goes much further than this, e.g. fine-grained lock-free implementations of software transactional memory (STM)^{1}. Fraser went on to contribute to the initial implementations of STM in Haskell, though his thesis uses C++.

First, some prerequisites.

I imagine most developers when they hear the term “lock-free” take it to mean a concurrent algorithm implemented without using locks. It, however, has a technical definition. Assuming a concurrent application is functionally correct, e.g. it does the right thing if it terminates no matter how things are scheduled, we still have three liveness problems in decreasing order of severity:

**Deadlock**- the application getting stuck in state where no subprocesses can be scheduled**Livelock**- the application fails to make progress despite subprocesses being scheduled, e.g. endlessly retrying**Starvation**- some subprocesses never make progress even though the application as a whole makes progress

In parallel, we have three properties corresponding to programs that cannot exhibit the above behaviors:

**Obstruction-free**- no deadlock, you’ll get this if you don’t use locks**Lock-free**- obstruction-free and no livelock**Wait-free**- lock-free and no starvation

Wait-freedom is the most desirable property but was difficult to achieve with reasonable efficiency. However, relatively recent techniques (2014) may do for wait-free algorithms what Fraser’s thesis did for lock-free algorithms, namely reduce a research problem to an exercise. These techniques, however, still start from a lock-free algorithm. Obstruction-freedom is usually what you get from concurrency control mechanisms that abort and retry in the case of conflicts. To achieve lock-freedom, we need to avoid losing and redoing work, or at least doing so repeatedly indefinitely.

See Fraser’s thesis for more formal definitions.

I’ll use “**lockless**” to mean an algorithm implemented without using locks.

The usual primitive used to implement and describe lockless algorithms is compare-and-swap, often just called `cas`

. There are other possibilities, but `cas`

is universal, relatively simple, and widely implemented, e.g. as the `cmpxchg`

operation for the x86 architecture. The following is a specification of a `cas`

operation in Haskell with the additional note that this intended to be performed atomically (which it would not be in Haskell).

```
cas :: (Eq a) => IORef a -> a -> a -> IO a
cas ref old new = do
curr <- readIORef ref
if curr == old then do
writeIORef ref new
return curr
else do
return curr
```

The specification of multiple compare-and-swap is the straightforward extension to the above to several variables.

```
mcasSpec :: (Eq a) => [(IORef a, a, a)] -> IO Bool
mcasSpec entries = do
eqs <- forM entries $ \(ref, old, _) -> do
curr <- readIORef ref
return (curr == old)
if and eqs then do -- if all equal
forM_ entries $ \(ref, _, new) -> do
writeIORef ref new
return True
else do
return False
```

The above is, again, intended to be executed atomically. It will be convenient to allow a bit more flexibility in the type producing the type:

where we have

```
-- Abstract.
newtype MCASRef a = MCASRef { unMCASRef :: IORef (Either (T a) a) } deriving ( Eq )
newMCASRef :: a -> IO (MCASRef a)
newMCASRef v = MCASRef <$> newIORef (Right v)
readMCASRef :: MCASRef a -> IO a
-- Will be implemented below.
```

The idea here is that, in addition to values of type `a`

, we can also store values of type `T a`

into the pointers for internal use, and we can unambiguously distinguish them from values of type `a`

. `T a`

can be any type constructor you like.

In the code below, I will assume `IORef`

s have an `Ord`

instance, i.e. that they can be sorted. This is *not* true, but an approach as in ioref-stable could be used to accomplish this. Alternatively, `Ptr`

s to `StablePtr`

s could be used.

We won’t worry about memory consistency concerns here. That is, we’ll assume sequential consistency where all CPU cores see all updates immediately.

I recommend stopping here and thinking about how you would implement `mcas`

in terms of `cas`

while, of course, achieving the desired atomicity. The solution I’ll present – the one from Fraser’s thesis – is moderately involved, so if the approach you come up with is very simple, then you’ve probably made a mistake. Unsurprisingly, the solution I’ll present makes use of the additional flexibility in the type and the ability to sort `IORef`

s. I’m not claiming it is impossible to accomplish this without these though. For example, you could apply a universal construction which witnesses the universality of `cas`

.

Lockless algorithms typically proceed by attempting an operation and detecting conflicts, e.g. as in multi-version concurrency control. This requires storing enough information to tell that a conflicting operation has occurred/is occurring. Once a conflict is detected, the simplest solution is typically to abort and retry hoping that there isn’t a conflict the next time. This clearly leads to the possibility of livelock.

Instead of aborting, the later invocation of the operation could instead help the earlier one to complete, thereby getting it out of its way. This ensures that the first invocation will always, eventually complete giving us lock-freedom. However, this doesn’t guarantee that once the second invocation finishes helping the first invocation that a third invocation won’t jump in before the second invocation gets a chance to start on its own work. In this case, the second invocation will help the third invocation to complete before attempting to start itself. A process can end up spending all its time helping other processes while never getting its own work done, leading to starvation.

To perform recursive helping, we need an invocation to store enough information so that subsequent, overlapping invocations are able to assist. To accomplish this, we’ll store a (pointer to a) “descriptor” containing the parameters of the invocation being helped and potentially additional information^{2}. This is what we’ll use for the `T`

type constructor.

The general approach will be: at the beginning of the operation we will attempt to “install” a descriptor in the first field we touch utilizing `cas`

. There are then three possible outcomes. If we fail and find a value, then the operation has failed. If we fail and find an existing descriptor, then we (potentially recursively) help that descriptor. If we succeed, then we have successfully “acquired” the field and we “help” ourselves. We can have many processes all trying to help the same invocation at the same time, so it is still important that multiple identical help calls don’t interfere with each other. Just because we’re helping an invocation of an operation doesn’t mean that that the original process isn’t still executing.

Since we’ll be replacing pointers to values with pointers to descriptors, reading the value becomes non-trivial. In particular, if when we read a pointer we get a descriptor, we’ll need to help the invocation described to completion. We will need to keep doing this until we successfully read a value.

An operation we’ll use in the implementations of `mcas`

is a conditional compare-and-swap (CCAS) where we only perform the swap if, additionally, an additional variable is set to a given value. It has the following specification.

```
-- Specification. Implementations should perform this atomically.
ccasSpec :: (Eq a, Eq c) => IORef a -> a -> a -> IORef c -> c -> IO ()
ccasSpec ref old new condRef check = do
curr <- readIORef ref
cond <- readIORef condRef
if cond == check && curr == old then do
writeIORef ref new
else do
return ()
```

We’ll need to show that this can be implemented in terms of `cas`

, or rather a version with modifications similar to those mentioned for `mcas`

. This will be a simple instance of the recursive helping approach that will be applied in the implementation of `mcas`

.

```
type CCASDescriptor a c = IORef (CCASRef a c, a, a, IORef c, c)
newtype CCASRef a c = CCASRef { unCCASRef :: IORef (Either (CCASDescriptor a c) a) } deriving ( Eq, Ord )
```

We begin with the types. As described above, a `CCASRef`

is just an `IORef`

that holds either a value or a descriptor, and the descriptor is just an `IORef`

pointing at a tuple holding the arguments to `ccas`

. We won’t actually modify this latter `IORef`

and instead are just using it for its object identity. It could be replaced with a read-only `IVar`

or a `Unique`

could be allocated and used as an identifier instead. In a lower-level language, this `IORef`

corresponds to having a pointer to the descriptor.

```
newCCASRef :: a -> IO (CCASRef a c)
newCCASRef v = CCASRef <$> newIORef (Right v)
readCCASRef :: (Eq a, Eq c) => CCASRef a c -> IO a
readCCASRef ref = do
x <- readIORef (unCCASRef ref)
case x of
Right v -> return v
Left d -> do
ccasHelp d
readCCASRef ref
-- Not atomic. This CAS can fail even when it would be impossible if `ccas` was truly atomic.
-- Example: ccas a reference to the same value but where the condRef is False. The ccas fails
-- and thus should behave as a no-op, but if a `casCCASRef` occurs during the course of the ccas,
-- the `casCCASRef` can fail even though it should succeed in all cases.
casCCASRef :: (Eq a) => CCASRef a c -> a -> a -> IO Bool
casCCASRef (CCASRef ref) old new = do
curr <- cas ref (Right old) (Right new)
return (curr == Right old)
tryReadCCASRef :: CCASRef a c -> IO (Maybe a)
tryReadCCASRef ref = do
x <- readIORef (unCCASRef ref)
return (case x of Left _ -> Nothing; Right v -> Just v)
```

To get them out of the way, the following functions implement the reference-like aspects of a `CCASRef`

. The descriptor is an internal implementation detail. The interface is meant to look like a normal reference to a value of type `a`

. The main notes are:

- since the
`CCASRef`

may not contain a value when we read, we loop helping to complete the`ccas`

until it does, `casCCASRef`

is a slightly simplified`cas`

used in`mcas`

but should be not be considered part of the interface, and`tryReadCCASRef`

is used in the implementation of`mcas`

, but you quite possibly wouldn’t provide it otherwise.

```
ccas :: (Eq a, Eq c) => CCASRef a c -> a -> a -> IORef c -> c -> IO ()
ccas ref old new condRef check = do
d <- newIORef (ref, old, new, condRef, check)
v <- cas (unCCASRef ref) (Right old) (Left d)
go d v
where go d (Left d') = do -- descriptor already there
ccasHelp d'
v <- cas (unCCASRef ref) (Right old) (Left d)
go d v
go d (Right curr) | curr == old = ccasHelp d -- we succeeded
| otherwise = return () -- we failed
ccasHelp :: (Eq a, Eq c) => CCASDescriptor a c -> IO ()
ccasHelp d = do
(CCASRef ref, old, new, condRef, check) <- readIORef d
cond <- readIORef condRef
_ <- cas ref (Left d) (Right $! if cond == check then new else old)
return ()
```

Here we illustrate the (not so recursive) helping pattern. `ccas`

allocates a descriptor and then attempts to “acquire” the reference. There are three possibilities.

- We find a descriptor already there, in which case we help it and then try to acquire the reference again.
- The CAS succeeds and thus we successfully “acquire” the reference. We then “help ourselves”.
- The CAS fails with an unexpected (non-descriptor) value. Thus, the CCAS fails and we do nothing.

Helping, implemented by `ccasHelp`

, just performs the logic of CCAS. If we’ve gotten to `ccasHelp`

, we know the invocation described by the descriptor did, in fact, find the expected value there. By installing our descriptor, we’ve effectively “locked out” any other calls to `ccas`

until we complete. We can thus check the `condRef`

at our leisure. As long as our descriptor is still in the `CCASRef`

, which we check via a `cas`

, we know that there have been no intervening operations, including other processes completing this `ccas`

. `ccasHelp`

is idempotent in the sense that running it multiple times, even in parallel, with the same descriptor is the same as running it once. This is due to the fact that we only (successfully) CAS in the descriptor once, so we can only CAS it out at most once.

The setup for MCAS is much the same as CCAS. The main additional complexity comes from the fact that we need to simultaneously “acquire” multiple references. This is handled by a two-phase approach. In the first phase, we attempt to “acquire” each reference. We proceed to the second phase once we’ve either seen that the MCAS is going to fail, or we have successfully “acquired” each reference. In the second phase, we either reset all the “acquired” references to their old values if the MCAS failed or to their new values if it succeeded. The MCAS will be considered to have occurred atomically at the point we record this decision, via a CAS, i.e. between the two phases.

```
data MCASStatus = UNDECIDED | FAILED | SUCCESSFUL deriving ( Eq )
data MCASDescriptor' a = MCASDescriptor [(MCASRef a, a, a)] (IORef MCASStatus)
type MCASDescriptor a = IORef (MCASDescriptor' a)
newtype MCASRef a = MCASRef { unMCASRef :: CCASRef (Either (MCASDescriptor a) a) MCASStatus }
deriving ( Eq, Ord )
```

As with CCAS, an `MCASRef`

is a reference, in this case a `CCASRef`

, that either holds a value or a descriptor. The descriptor holds the arguments of `mcas`

, as with `ccas`

, but it additionally holds a status reference. This status reference will be used as the condition reference of the CCAS. In particular, as we’ll see, we will only perform `ccas`

’s when the status is `UNDECIDED`

.

```
newMCASRef :: a -> IO (MCASRef a)
newMCASRef v = MCASRef <$> newCCASRef (Right v)
readMCASRef :: (Eq a) => MCASRef a -> IO a
readMCASRef ref = do
x <- readCCASRef (unMCASRef ref)
case x of
Right v -> return v
Left d -> do
mcasHelp d
readMCASRef ref
```

There’s nothing to say about the reference interface functions. They are essentially identical to the CCAS ones for the same reasons only with `CCASRef`

s instead of `IORef`

s.

```
mcas :: (Eq a) => [(MCASRef a, a, a)] -> IO Bool
mcas entries = do
status <- newIORef UNDECIDED
d <- newIORef (MCASDescriptor (sortOn (\(ref, _, _) -> ref) entries) status)
mcasHelp d
```

The `mcas`

function is fairly straightforward. It allocates a status reference and a descriptor and delegates most of the work to `mcasHelp`

. The main but critical subtlety is the sort. This is critical to ensuring termination.

```
mcasHelp :: (Eq a) => MCASDescriptor a -> IO Bool
mcasHelp d = do
MCASDescriptor entries statusRef <- readIORef d
let phase1 [] = do
_ <- cas statusRef UNDECIDED SUCCESSFUL
phase2
phase1 ((MCASRef ref, old, new):es) = tryAcquire ref old new es
tryAcquire ref old new es = do
_ <- ccas ref (Right old) (Left d) statusRef UNDECIDED
v <- tryReadCCASRef ref
case v of
Just (Left d') | d == d' -> phase1 es -- successful acquisition
| otherwise -> do -- help someone else
mcasHelp d'
tryAcquire ref old new es
Just (Right curr) | curr == old -> do
status <- readIORef statusRef
if status == UNDECIDED then do
tryAcquire ref old new es -- failed to acquire but could still succeed
else do
phase2
_ -> do -- failed MCAS
_ <- cas statusRef UNDECIDED FAILED
phase2
phase2 = do
status <- readIORef statusRef
let succeeded = status == SUCCESSFUL
forM_ entries $ \(MCASRef ref, old, new) -> do
casCCASRef ref (Left d) (Right (if succeeded then new else old))
return succeeded
phase1 entries
```

`phase1`

attempts to “acquire” each `MCASRef`

by using `tryAcquire`

which will move `phase1`

to the next entry each time it succeeds. Therefore, if `phase1`

reaches the end of the list and there was no interference, we will have successfully “acquired” all references. We record this with a CAS against `statusRef`

. If this CAS succeeds, then the MCAS will be considered successful and conceptually to have occurred at this point. If the CAS fails, then some other process has already completed this MCAS, possibly in success or failure. We then move to `phase2`

.

`tryAcquire`

can also detect that the MCAS should fail. In this case, we immediately attempt to record this fact via a CAS into `statusRef`

. As with the successful case, this CAS succeeding marks the conceptual instant that the MCAS completes. As before, we then move on to `phase2`

.

We never enter `phase2`

without `statusRef`

being set to either `SUCCESSFUL`

or `FAILED`

. `phase2`

is completely straightforward. We simply set each “acquired” reference to either the new or old value depending on whether the MCAS succeeded or not. The `casCCASRef`

will fail if either we never got around to “acquiring” a particular reference (in the case of MCAS failure), or if a reference was written to since it was “acquired”. Since such writes conceptually occurred after the MCAS completed, we do not want to overwrite them.

During `tryAcquire`

, there are a few cases that lead to retrying. First, if we find that a reference has already been “acquired” by some other MCAS operation, we recursively help it. Here, the sorting of the references is important to ensure that any MCAS operation we help will never try to help us back. It’s easy to see that without the sorting, if two MCAS operations on the same two references each acquired one of the references, the (concurrent) recursive calls to `mcasHelp`

would become infinite loops. With a total ordering on references, each recursive call to `mcasHelp`

will be at a greater reference and thus must eventually terminate. The other case for `tryAcquire`

, is that the expected value is written after the `ccas`

but before the `tryReadCCASRef`

. In this case, we try again unless the status has already been decided. It might seem like this is just an optimization, and that we could instead treat this as the MCAS failing. However, the intervening write may have written the value that was there before the `ccas`

, meaning that there was never a point at which the MCAS could have failed.

References are only “acquired” at the `ccas`

in `phase1`

. Once the status has been decided, no references may be “acquired” any longer. Since it’s impossible to enter `phase2`

without deciding the status, once one process enters `phase2`

, no processes are capable of “acquiring” references. This makes `phase2`

idempotent and, indeed, each CAS in `phase2`

is independently idempotent. Overlapping executions of `phase1`

are fine essentially because each `ccas`

is idempotent and the `statusRef`

can change at most once.

Let’s see how an `mcas`

interacts with other operations from the perspective of atomicity. If we attempt to read a reference via `readMCASRef`

which is included in the list of references of an ongoing `mcas`

, there are two possibilities. Either that reference has not yet been “acquired” by the `mcas`

, in which case the read will occur conceptually before the MCAS, or it has been “acquired” in which case the read will help the MCAS to completion and then try again after the MCAS. The story is similar for overlapping `mcas`

’s. The `mcas`

which “acquires” the least reference in their intersection will conceptually complete first, either because it literally finishes before the second `mcas`

notices or because the second `mcas`

will help it to completion. Writes are only slightly different.

It is important to note that these operations are NOT atomic with respect to some other reasonable operations. Most notably, they are not atomic with respect to blind writes. It is easy to construct a scenario where two blind writes happen in sequence but the first appears to happen after the `mcas`

and the second before. Except for initialization, I don’t believe there are any blind writes to references involved in a `mcas`

in Fraser’s thesis. Fraser’s thesis does, however, contain `cas`

operations directly against these references. These are also not atomic for exactly the same reason `casCCASRef`

isn’t. That said, Fraser’s uses of `cas`

against `MCASRef`

s are safe, because in each case they just retry until success.

While I’ve gone into a good amount of detail here, I’ve mainly wanted to illustrate the concept of recursive helping. It’s a key concept for lock-free and wait-free algorithm designs, but it also may be a useful idea to have in mind when designing concurrent code even if you aren’t explicitly trying to achieve a lock-free guarantee.

MCAS allows you to perform transactions involving multiple updates atomically. STM additionally allows you to perform transactions involving multiple reads and updates atomically.↩︎

While I haven’t attempted it to see if it works out, it seems like you could make a generic “recursive helping” framework by storing an

`IO`

action instead. The “descriptors” have the flavor of defunctionalized continuations.↩︎

When one talks about “indexed (co)products” in an indexed category, it is often described as follows:

Let |\mathcal C| be an **|\mathbf S|-indexed category**, i.e. a pseudofunctor |\mathbf S^{op} \to \mathbf{Cat}| where |\mathbf S| is an ordinary category. Write |\mathcal C^I| for |\mathcal C(I)| and |f^* : \mathcal C^J \to \mathcal C^I| for |\mathcal C(f)| where |f : I \to J|. The functors |f^*| will be called **reindexing functors**. |\mathcal C| has **|\mathbf S|-indexed coproducts** whenever

- each reindexing functor |f^*| has a left adjoint |\Sigma_f|, and
- the Beck-Chevalley condition holds, i.e. whenever

$$\require{AMScd}\begin{CD} I @>h>> J \\ @VkVV @VVfV \\ K @>>g> L \end{CD}$$

is a pullback square in |\mathbf S|, then the canonical morphism |\Sigma_k \circ h^* \to g^* \circ \Sigma_f| is an isomorphism.

The first condition is reasonable, especially motivated with some examples, but the second condition is more mysterious. It’s clear that you’d need *something* more than simply a family of adjunctions, but it’s not clear how you could calculate the particular condition quoted. That’s the goal of this article. I will not cover what the Beck-Chevalley condition is intuitively saying. I cover that in this Stack Exchange answer from a logical perspective, though there are definitely other possible perspectives as well.

Some questions are:

- Where does the Beck-Chevalley condition come from?
- What is this “canonical morphism”?
- Why do we care about pullback squares in particular?

The concepts we’re interested in will typically be characterized by universal properties, so we’ll want an indexed notion of adjunction. We can get that by instantiating the general definition of an adjunction in any bicategory if we can make a bicategory of indexed categories. This is pretty easy since indexed categories are already described as pseudofunctors which immediately suggests a natural notion of indexed functor would be a pseudonatural transformation.

Explicitly, given indexed categories |\mathcal C, \mathcal D : \mathbf S^{op} \to \mathbf{Cat}|, an **indexed functor** |F : \mathcal C \to \mathcal D| consists of a functor |F^I : \mathcal C^I \to \mathcal D^I| for each object |I| of |\mathbf S| and a natural isomorphism |F^f : \mathcal D(f) \circ F^J \cong F^I \circ \mathcal C(f)| for each |f : I \to J| in |\mathbf S|.

An indexed natural transformation corresponds to a modification which is the name for the 3-cells between the 2-cells in the 3-category of 2-categories. For us, this works out to be the following: for each object |I| of |\mathbf S|, we have a natural transformation |\alpha^I : F^I \to G^I| such that for each |f : I \to J| the following diagram commutes

$$\begin{CD}
\mathcal D(f) \circ F^J @>id_{\mathcal D(f)}*\alpha^J>> \mathcal D(f) \circ G^J \\
@V\cong VV @VV\cong V \\
F^I \circ \mathcal C(f) @>>\alpha^I*id_{\mathcal C(f)}> G^I \circ \mathcal C(f)
\end{CD}$$

where the isomorphisms are the isomorphisms from the pseudonaturality of |F| and |G|.

Indexed adjunctions can now be defined via the unit and counit definition which works in any bicategory. In particular, since indexed functors consist of families of functors and indexed natural transformations consist of families of natural transformations, both indexed by the objects of |\mathbf S|, part of the data of an indexed adjunction is a family of adjunctions.

Let’s work out what the additional data is. First, to establish notation, we have indexed functor |F : \mathcal D\to \mathcal C| and |U : \mathcal C \to \mathcal D| such that |F \dashv U| in an indexed sense. That means we have |\eta : Id \to U \circ F| and |\varepsilon : F \circ U \to Id| as indexed natural transformations. The first pieces of additional data, then, are the fact that |F| and |U| are indexed functors, so we have natural isomorphisms |F^f : \mathcal C(f)\circ F^J \to F^I\circ \mathcal D(f)| and |U^f : \mathcal C(f) \circ U^J \to U^I \circ \mathcal D(f)| for each |f : I \to J| in |\mathbf S|. The next pieces of additional data, or rather constraints, are the coherence conditions on |\eta| and |\varepsilon|. These work out to

$$\begin{gather}
U^I(F^f)^{-1} \circ \eta_{\mathcal D(f)}^I = U_{F^J}^f \circ \mathcal D(f)\eta^J \qquad\text{and}\qquad
\varepsilon_{\mathcal C(f)}^I \circ F^I U^f = \mathcal C(f)\varepsilon^J \circ (F_{U^J}^f)^{-1}
\end{gather}$$

This doesn’t look too much like the example in the introduction, but maybe some of this additional data is redundant. If we didn’t already know where we end up, one hint would be that |(F^f)^{-1} : F^I \circ \mathcal C(f) \to \mathcal D(f) \circ F^J| and |U^f : \mathcal D(f) \circ U^J \to U^I \circ \mathcal C(f)| look like mates. Indeed, it would be quite nice if they were as mates uniquely determine each other and this would make the reindexing give rise to a morphism of adjunctions. Unsurprisingly, this is the case.

To recall, generally, given adjunctions |F \dashv U : \mathcal C \to \mathcal D| and |F’ \dashv U’ : \mathcal C’ \to \mathcal D’|, a **morphism of adjunctions** from the former to the latter is a pair of functors |K : \mathcal C \to \mathcal C’| and |L : \mathcal D \to \mathcal D’|, and a natural transformation |\lambda : F’ \circ L \to K \circ F| or, equivalently, a natural transformation |\mu : L \circ U \to U’ \circ K|. You can show that there is a bijection |[\mathcal D,\mathcal C’](F’\circ L, K \circ F) \cong [\mathcal C, \mathcal D’](L \circ U, U’ \circ K)|. Concretely, |\mu = U’K\varepsilon \circ U’\lambda_U \circ \eta’_{LU}| provides the mapping in one direction. The mapping in the other direction is similar, and we can prove it is a bijection using the triangle equalities. |\lambda| and |\mu| are referred to as **mates** of each other.

In our case, |K| and |L| will be reindexing functors |\mathcal C(f)| and |\mathcal D(f)| respectively for some |f : I \to J|. We need to show that the family of adjunctions and the coherence conditions on |\eta| and |\varepsilon| force |(F^f)^{-1}| and |U^f| to be mates. The proof is as follows:

$$\begin{align}
& U^I \mathcal C(f) \varepsilon^J \circ U^I(F_{U^J}^f)^{-1} \circ \eta_{\mathcal D(f)U^J}^I & \qquad \{\text{coherence of }\eta \} \\
= \quad & U^I \mathcal C(f) \varepsilon^J \circ U_{F^JU^J}^f \circ \mathcal D(f)\eta_{U^J}^J & \qquad \{\text{naturality of }U^f \} \\
= \quad & U^f \circ \mathcal D(f)U^J\varepsilon^J \circ \mathcal D(f)\eta_{U^J}^J & \qquad \{\text{functoriality of }\mathcal D(f) \} \\
= \quad & U^f \circ \mathcal D(f)(U^J\varepsilon^J \circ \eta_{U^J}^J) & \qquad \{\text{triangle equality} \} \\
= \quad & U^f &
\end{align}$$

The next natural question is: if we know |(F^f)^{-1}| and |U^f| are mates, do we still need the coherence conditions on |\eta| and |\varepsilon|? The answer is “no”.

$$\begin{align}
& U_{F^J}^f \circ \mathcal D(f)\eta^J & \qquad \{\text{mate of }U^f \} \\
= \quad & U^I \mathcal C(f) \varepsilon_{F^J}^J \circ U^I(F_{F^J}^f)^{-1} \circ \eta^I_{\mathcal D(f)U^I} \circ \mathcal D(f)\eta^J & \{\text{naturality of }\eta^I \} \\
= \quad & U^I \mathcal C(f) \varepsilon_{F^J}^J \circ U^I(F_{F^J}^f)^{-1} \circ U^I F^I D(f)\eta^J \circ \eta_{\mathcal D(f)}^I & \{\text{naturality of }U^I(F^f)^{-1} \} \\
= \quad & U^I \mathcal C(f) \varepsilon_{F^J}^J \circ U^I\mathcal C(f)F^J\eta^J \circ U^I (F^f)^{-1} \circ \eta_{\mathcal D(f)}^I & \{\text{functoriality of }U^I\mathcal C(f) \} \\
= \quad & U^I \mathcal C(f)(\varepsilon_{F^J}^J \circ F^J\eta^J) \circ U^I(F^f)^{-1} \circ \eta_{\mathcal D(f)}^I & \{\text{triangle equality} \} \\
= \quad & U^I (F^f)^{-1} \circ \eta_{\mathcal D(f)}^I &
\end{align}$$

Similarly for the other coherence condition.

We’ve shown that if |U| is an indexed functor it has a left adjoint exactly when each |U^I| has a left adjoint, |F^I|, ** and** for each |f : I \to J|, the mate of |U^f| with respect to those adjoints, which will be |(F^f)^{-1}|, is invertible. This latter condition is the Beck-Chevalley condition. As you can quickly verify, an invertible natural transformation doesn’t imply that it’s mate is invertible. Indeed, if |F| and |F’| are left adjoints and |\lambda : F’\circ L \to K \circ F| is invertible, then |\lambda^{-1} : K \circ F \to F’ \circ L| is not of the right form to have a mate (unless |F| and |F’| are also right adjoints and, particularly, an adjoint equivalence if we want to get an inverse to the mate of |\lambda|).

We’ve answered questions 1 and 2 from above, but 3 is still open, and we’ve generated a new question: what is the *indexed* functor whose left adjoint we’re finding? The family of reindexing functors isn’t indexed by objects of |\mathbf S| but, most obviously, by *arrows* of |\mathbf S|. To answer these questions, we’ll consider a more general notion of indexed (co)products.

A **comprehension category** is a functor |\mathcal P : \mathcal E \to \mathbf S^{\to}| (where |\mathbf S^{\to}| is the arrow category) such that |p = \mathsf{cod} \circ \mathcal P| is a (Grothendieck) fibration and |\mathcal P| takes (|p|-)cartesian arrows of |\mathcal E| to pullback squares in |\mathbf S^{\to}|. It won’t be necessary to know what a fibration is, as we’ll need only a few simple examples, but fibrations provide a different, and in many ways better, perspective^{1} on indexed categories and being able to move between the perspectives is valuable.

A comprehension category can also be presented as a natural transformation |\mathcal P : \{{-}\} \to p| where |\{{-}\}| is just another name for |\mathsf{dom} \circ \mathcal P|. This natural transformation induces an indexed functor |\langle\mathcal P\rangle : \mathcal C \circ p \to \mathcal C \circ \{{-}\}| where |\mathcal C| is an |\mathbf S|-indexed category. We have **|\mathcal P|-(co)products** when there is an indexed (left) right adjoint to this indexed functor.

One of the most important fibrations is the codomain fibration |\mathsf{cod} : \mathbf S^{\to} \to \mathbf S| which corresponds to |Id| as a comprehension category. However, |\mathsf{cod}| is only a fibration when |\mathbf S| has all pullbacks. In particular, the cartesian morphisms of |\mathbf S^{\to}| are the pullback squares. However, we can define the notion of cartesian morphism with respect to any functor; we only need |\mathbf S| to have pullbacks for |\mathsf{cod}| to be a fibration because a fibration requires you to have *enough* cartesian morphisms. However, given *any* functor |p : \mathcal E \to \mathbf S|, we have a subcategory |\mathsf{Cart}(p) \hookrightarrow \mathcal E| which consists of just the cartesian morphisms of |\mathcal E|. The composition |\mathsf{Cart}(p)\hookrightarrow \mathcal E \to \mathbf S| is always a fibration.

Thus, if we consider the category |\mathsf{Cart}(\mathsf{cod})|, this will consist of whatever pullback squares exist in |\mathbf S|. The inclusion |\mathsf{Cart}(\mathsf{cod}) \hookrightarrow \mathbf S^{\to}| gives us a comprehension category. Write |\vert\mathsf{cod}\vert| for that comprehension category. The definition in the introduction is now seen to be equivalent to having |\vert\mathsf{cod}\vert|-coproducts. That is, the indexed functor |\langle\vert\mathsf{cod}\vert\rangle| having an indexed left adjoint. The Beck-Chevalley condition is what is necessary to show that a family of left (or right) adjoints to (the components of) an indexed functor combine together into an *indexed* functor.

Indexed categories are, in some sense, a

*presentation*of fibrations which are the more intrinsic notion. This means it is better to work out concepts with respect to fibrations and then see what this means for indexed categories rather than the other way around or even using the “natural” suggestions. This is why indexed categories are pseudofunctors rather than either strict or lax functors. For our purposes, we have an equivalence of 2-categories between the 2-category of |\mathbf S|-indexed categories and the 2-category of fibrations over |\mathbf S|.↩︎

Below I’ll be focused primarily on absolute colimits as those are the most commonly used examples. They are an important part of the theory of monadicity. The trick to many questions about absolute colimits and related concepts is to see what it means for |\newcommand{\Set}{\mathbf{Set}}\newcommand{\Hom}{\mathsf{Hom}} \newcommand{onto}{\twoheadrightarrow}\Hom| functors to preserve them.

To start, we can show that certain colimits *cannot* be absolute, at least for |\Set|-enriched category theory. In particular, initial objects and coproducts are never absolute. Using the trick above, this is easily proven.

\[\Hom(0,0)\cong 1 \not\cong 0\]

\[\Set(\Hom(X+Y,Z),1)\cong 1 \not\cong 2\cong\Set(\Hom(X,Z),1)+\Set(\Hom(Y,Z),1)\]

What do absolute epimorphisms look like? We know that there *are* absolute epimorphisms because a split epimorphism is defined by a certain diagram commuting. Are there other absolute epimorphisms? To find out, we apply our trick.

Let |r:X\onto Y| be our epimorphism. Then we have the surjection \[\Hom(Y,r):\Hom(Y,X)\onto\Hom(Y,Y)\] but this means that for every arrow |f:Y\to Y|, there’s an arrow |s:Y\to X| such that |f = r \circ s|. As you can no doubt guess, we want to choose |f=id_Y|, and we then have that |r| is a split epimorphism. Therefore split epimorphisms are the only examples of absolute epimorphisms.

Now let’s consider the coequalizer case. Let |f,g:X\to Y| and |e:Y\onto C| be their coequalizer which we’ll assume is absolute. Before we pull out our trick, we can immediately use the previous result to show that |e| has a section, i.e. an arrow |s : C\rightarrowtail Y| such that |id_C=e\circ s|. Moving on, we use the trick to get the diagram: \[\Hom(Y,X)\rightrightarrows\Hom(Y,Y)\onto\Hom(Y,C)\]

Next, we use the explicit construction of the coequalizer in |\Set| which |\Hom(Y,C)| is supposed to be canonically isomorphic to. That is, the coequalizer of |\Hom(Y,f)| and |\Hom(Y,g)| is |\Hom(Y,Y)| quotiented by the equivalence relation generated by the relation which identifies |h,k:Y\to Y| when |\exists j:Y\to X.h = f\circ j \land k = g\circ j|. Let |[h]| represent the equivalence class of |h|. The claim that |\Hom(Y,C)| is (with |\Hom(Y,e)|) a coequalizer of the above arrows implies that |e\circ h = \bar e([h])| and |[h]=\bar e^{-1}(e\circ h)| with |\bar e| and |\bar e^{-1}| forming an isomorphism. Of course, our next move is to choose |h=id_Y| giving |e=\bar e([id_Y])|. However, |e=e\circ s\circ e = \bar e([s\circ e])| so we get |[id_Y]=[s\circ e]| because |\bar e| is invertible.

If we call the earlier relation |\sim| and write |\sim^*| for its reflexive, symmetric, transitive closure, then |[id_Y] = \{h:Y\to Y\mid id_Y\sim^* h\}|. Therefore |id_Y \sim^* s\circ e|. Now let’s make a simplifying assumption and assume further that |id_Y \sim s\circ e|, i.e. that |id_Y| is *directly* related to |s\circ e| by |\sim|. By definition of |\sim| this means there is a |t : Y\to X| such that |id_Y = f\circ t| and |s\circ e = g\circ t|. A given triple of |f|, |g|, and |e| such that |e\circ f = e\circ g| and equipped with a |s : C\to Y| and |t : Y\to X| satisfying the previous two equations along with |q\circ s = id_C| is called a **split coequalizer**. This data is specified diagrammatically and so is preserved by all functors, thus split coequalizers are absolute. All that we need to show is that this data is enough, on its own, to show that |e| is a coequalizer.

Given any |q : Y\to Z| such that |q\circ f = q\circ g|, we need to show that there exists a unique arrow |C\to Z| which |q| factors through. The obvious candidate is |q\circ s| leading to us needing to verify that |q=q\circ s\circ e|. We calculate as follows: \[ \begin{align} q \circ s \circ e & = q \circ g \circ t \\ & = q \circ f \circ t \\ & = q \end{align}\] Uniqueness then quickly follows since if |q = k\circ e| then |q\circ s = k\circ e \circ s = k|. |\square|

There’s actually an even simpler example where |s\circ e = id_Y| which corresponds to the trivial case where |f=g|.

Split coequalizers show that (non-trivial) absolute coequalizers can exist, but they don’t exhaust all the possibilities. The obvious cause of this is the simplifying assumption we made where we said |id_Y\sim s\circ e| rather than |id_Y\sim^* s\circ e|. In the general case, the equivalence will be witnessed by a sequence of arrows |t_i : Y\to X| such that we have either |s\circ e = g\circ t_0| or |s \circ e = f\circ t_0|, then |f\circ t_0 = g\circ t_1| or |g\circ t_0 = f\circ t_1| respectively, and so on until we get to |f\circ t_n = id_Y| or |g\circ t_n = id_Y|. As a diagram, this is a fan of diamonds of the form |f\circ t_i = g\circ t_{i+1}| or |g\circ t_i = f\circ t_{i+1}| with a diamond with side |s\circ e| on one end of the fan and a triangle with |id_Y| on the other. All this data is diagrammatic so it is preserved by all functors making the coequalizer absolute. That it *is* a coequalizer uses the same proof as for split coequalizers except that we have a series of equalities to show that |q\circ s \circ e = q|, namely all those pasted together diamonds. There is no conceptual difficulty here; it’s just awkward to notate.

The absolute coequalizer case captures the spirit of the general case, but you can see a description here. I’m not going to work through it, but you could as an exercise. Less tediously, you could work through absolute pushouts. If |P| is the pushout of |Y \leftarrow X \to Z|, then the functors to consider are |\Hom(P,-)| and |\Hom(Y,-)+\Hom(Z,-)|. For each, the pushout in |\Set| can be turned into a coequalizer of a coproduct. For the first functor, as before, this gives us an inverse image of |id_P| which will either be an arrow |P\to Y| or an arrow |P\to Z| which will play the role of |s|. The other functor produces a coequalizer of |\Hom(Y,Y)+\Hom(Z,Y)+\Hom(Y,Z)+\Hom(Z,Z)|. The generating relation of the equivalence relation states that there exists either an arrow |Y\to X| or an arrow |Z\to X| such that the appropriate equation holds. The story plays out much as before except that we have a sequence of arrows from two different hom-sets.

]]>When sets are first introduced to students, usually examples are used with finite, explicitly presented sets. For example, #{1,2,3} uu {2,3,4} = {1,2,3,4}#. This is the beginning of the idea that a set is a “collection” of things. Later, when infinite sets are introduced, the idea that sets are “collections” of things is still commonly used as the intuitive basis for the definitions. While I personally find this a bit of a philosophical bait-and-switch, my main issue with it is that I don’t think it does a good job reflecting how we work with sets day-to-day nor for more in-depth set-theoretic investigations. Instead, I recommend thinking about infinite sets as defined by properties and #x in X# for some infinite set #X# means checking whether it satisfies the property defining #X#, *not* rummaging through the infinite elements that make up #X# and seeing if #x# is amongst them. This perspective closely reflects how we prove things about infinite sets. It makes it much clearer that the job is to find logical relationships between the properties that define sets.

Of course, this view can also be applied to finite sets and *should* be applied to them. For a constructivist, the notion of “finite set” splits into multiple inequivalent notions, and it is quite easy to show that there is a subset of #{0,1}# which is not “finite” with respect to strong notions of “finite” that are commonly used by constructivists. Today, though, I’ll stick with classical logic and classical set theory. In particular, I’m going to talk about the Internal Set Theory of Edward Nelson, or mostly the small fragment he used in Radically Elementary Probability Theory. In the first chapter of an unfinished book on Internal Set Theory, he states the following:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

While it may be a bit strong to say that Internal Set Theory leads to some question about what “finite” means, I think it makes a good case for questioning what “finite set” means. These concerns are similar to the relativity of the notion of “(un)countable set”.

Minimal Internal Set Theory (**minIST**) starts by taking all the axiom( schema)s of **ZFC** as axioms. So right from the get-go we can do “all of math” in exactly the same way we used to. **minIST** is a conservative extension of **ZFC**, so a formula of **minIST** can be translated to a formula of **ZFC** that is provable if and only if the original formula was provable in **minIST**.

Before going into what makes **minIST** different from **ZFC**, let’s talk about what “finite set” means in **ZFC** (and thus **minIST**). There are multiple equivalent characterizations of a set being “finite”. One common definition is a set that has no bijection with a proper subset of itself. A more positive and more useful equivalent definition is a set is **finite** if and only if it has a bijection with #{k in bbbN | k < n}# for some #n in bbbN#. This latter definition will be handier for our purposes here.

The only new primitive concept **minIST** adds is the concept of a “standard” natural number. That is, a new predicate symbol is added to the language of **ZFC**, `#sf "St"#`

, and #n in bbbN# is **standard** if and only if `#sf "St"(n)#`

holds. For convenience, we’ll define the following shorthand: `#forall^{:sf "St":}n.P(n)#`

for `#forall n.sf "St"(n) => P(n)#`

. This is very similar to the shorthand: #forall x in X.P(x)# which stands for #forall x.x in X => P(x)#. In fact, if we had a set of standard natural numbers then we could use this more common shorthand, but it will turn out that the existence of a set of standard natural numbers is contradictory.

The four additional axiom( schema)s of **minIST** are the following:

`#sf "St"(0)#`

, i.e. #0# is standard.`#forall n.sf "St"(n) => sf "St"(n+1)#`

, i.e. if #n# is standard, then #n+1# is standard.`#P(0) ^^ (forall^{:sf "St":}n.P(n) => P(n+1)) => forall^{:sf "St":} n.P(n)#`

for every formula #P#, i.e. we can do induction over standard natural numbers to prove things about all standard natural numbers.`#exists n in bbbN.not sf "St"(n)#`

, i.e. there exists a natural number that is not standard.

That’s it. There is a bit of a subtlety though. The Axiom Schema of Separation (aka the Axiom Schema of Specification or Comprehension) is an axiom schema of **ZFC**. It is the axiom schema that justifies the notion of set comprehension, i.e. defining a subset of #X# via #{x in X | P(x)}#. This axiom schema is indexed by the formulas *of ZFC*. When we incorporated

`#sf "St"#`

are called `#{n in bbbN | sf "St"(n)}#`

. This doesn’t mean there isn’t some other way of making the set of standard natural numbers. That possibility is excluded by the following proof.Suppose there is some set #S# such that `#n in S <=> sf "St"(n)#`

. Using axioms 1 and 2 we can prove that #0 in S# and if #n in S# then #n+1 in S#. The *internal* form of induction, like all internal theorems, still holds. Internal induction is just axiom 3 but with #forall# instead of `#forall^{:sf "St":}#`

. Using internal induction, we can immediately prove that #forall n in bbbN.n in S#, but this means `#forall n in bbbN.sf "St"(n)#`

which directly contradicts axiom 4, thus there is no such #S#. #square#

This negative result is actually quite productive. For example, we can’t have a set #N# of nonstandard natural numbers otherwise we could define #S# as #bbbN \\ N#. This means that if we can prove some *internal* property holds for all nonstandard naturals, then there must exist a standard natural number for which it holds as well. Properties like this are called **overspill** because an internal property that holds for the standard natural numbers spills over into the nonstandard natural numbers. Here’s a small example. First, a real number #x# is **infinitesimal** if and only if there exists a nonstandard natural #nu# such that `#|x| <= 1/nu#`

. (Note, we can prove that every nonstandard natural is larger than every standard natural.) Two real numbers #x# and #y# are **nearly equal**, written |x \simeq y|, if and only if #x - y# is infinitesimal. A function #f : bbbR -> bbbR# is **(nearly) continuous at #x#** if and only if for all #y in bbbR#, if |x \simeq y| then |f(x) \simeq f(y)|. Now if #f# is continuous at #x#, then for every standard natural #n# and #y in bbbR#, `#|x - y| <= delta => |f(x) - f(y)| <= 1/n#`

holds for all infinitesimal #delta# by continuity. Thus by overspill it holds for some non-infinitesimal #delta#, i.e. for some #delta > 1/m# for some standard #m#. #square#

In Radically Elementary Probability Theory, Nelson reformulates much of probability theory and stochastic process theory by restricting to the finite case. Traditionally, probability theory for finite sample spaces is (relatively) trivial. But remember, “finite” means bijective with #{k in bbbN | k < n}# for some #n in bbbN# and that #n# can be nonstandard in **minIST**. The notion of “continuous” defined before translates to the usual notion when the **minIST** formula is translated to **ZFC**. Similarly, subsets of infinitesimal probability become subsets of measure zero. Sums of infinitesimals over finite sets of nonstandard cardinality become integrals. Little to nothing is lost, instead it is just presented in a different language, a language that often lets you say complex things simply.

Again, I want to reiterate that *everything true in* **ZFC** *is true in* **minIST**. One of the issues with Robinson-style nonstandard analysis is that the hyperreals are not an Archimedean field. An ordered field #F# is Archimedean if for all #x in F# such that #x > 0# there is an #n in bbbN# such that #nx > 1#. The hyperreals serve as a model of the (**min**)**IST** reals. *Within* **minIST**, the Archimedean property holds. Even if #x in bbbR# is infinitesimal, there is still a nonstandard #n in bbbN# such that #nx > 1#. But this statement translates to the statement in the meta-language that there is a hypernatural #n# such that for every positive hyperreal #x#, #nx > 1# which does hold. However, if we use the *meta-language’s* notion of “natural number”, this statement is false.

Full Internal Set Theory (**IST**) only has three axiom schemas in addition to the axioms of **ZFC**. Like **minIST**, there is `#sf "St"#`

but now we talk about standard sets in general, not just standard natural numbers. The three axiom schemas are as follows:

`#forall^{:sf "St":}t_1 cdots forall^{:sf "St":}t_n.(forall^{:sf "St":} x.P(x, t_1, ..., t_n)) <=> forall x.P(x, t_1, ..., t_n)#`

for internal #P# with no (other) free variables.`#(forall^{:sf "StFin":}X.exists y.forall x in X.P(x, y)) <=> exists y.forall^{:sf "St"} x.P(x, y)#`

for internal #P# which may have other free variables.`#forall^{:sf "St":}X.exists^{:sf "St"} Y.forall^{:sf "St":}z.(z in Y <=> z in X ^^ P(z))#`

for*any*formula #P# which may have other free variables.

These are called the Transfer Principle (T), the Idealization Principle (I), and the Standardization Principle (S) respectively. `#forall^{sf "StFin"}X.P(X)#`

means for all #X# which are both standard and finite. The Transfer Principle essentially states that an internal property is true for all sets if and only if it is true for all standard sets. The Transfer Principle allows us to prove that any internal property that is satisfied uniquely is satisfied by a standard set. This means everything we can define in ZFC is standard, e.g. the set of reals, the set of naturals, the set of functions from reals to naturals, any particular natural, #pi#, the Riemann sphere.

The Idealization Principle is what allows nonstandard sets to exist. For example, let #P(x, y) -= y in bbbN ^^ (x in bbbN => x < y)#. The Idealization Principle then states that if for any finite set of natural numbers, we can find a natural number greater than all of them, then there is a natural number greater than any standard natural number. Since we clearly can find a natural number greater than any natural number in some given finite set of natural numbers, the premise holds and thus we have a natural number greater than any standard one and thus necessarily nonstandard. In fact, a similar argument can be used to show that *all* infinite sets have nonstandard elements. A related argument shows that a standard, finite set is exactly a set all of whose elements are standard.

The Standardization Principle isn’t needed to derive **minIST** nor is it needed for the following result so I won’t discuss it further. Another result derivable from Idealization is: there is a finite set that contains every standard set. To prove it, simply instantiate the Idealization Principle with `#P(x,y) -= x in y ^^ y\ "is finite"#`

where “is finite” stands for the formalization of any of the definitions of “finite” given before. It is in the discussion after proving this that Nelson makes the statement quoted in the introduction. We can prove some results about this set. First, it can’t be standard itself. Here are three different reasons why: 1) if it were standard, then it would include itself violating the Axiom of Foundation/Regularity; 2) if it were standard, then by Transfer we could prove that it contains *all* sets again violating the Axiom of Foundation; 3) if it were standard, then by the result mentioned in the previous paragraph, it would contain *only* standard elements, but then we could intersect it with #bbbN# to get the set of standard naturals which does not exist. Like the fact that there is no smallest nonstandard natural, there is no smallest finite set containing all standard sets.

**IST** also illustrates another concept. In set theory we often talk about classes. My experience with this concept was a series of poor explanations: a class was “the extent of a predicate”, “the ‘collection’ of entities that satisfy a predicate” and a proper class was a “collection” that was “too big” to be a set like the class of all sets. I muddled on with vague descriptions like this for quite some time before I finally figured out what they meant. Harking back to my first paragraph, a **class** (in **ZFC**^{1}) is just a (unary) predicate, i.e. a formula (of **ZFC**) with one free variable, or at least it can be represented by such^{2}. A formula #P(x)# is a **proper class** if there is no set #X# such that #forall x.P(x) <=> x in X#. The “class of all sets” is simply the constantly true predicate. This is a proper class because a set #U# such that #forall x.x in U# leads to a contradiction, e.g. Russell’s paradox by warranting #{x in U | x notin x}# via the Axiom of Separation. It may have already dawned on you that `#sf "St"(x)#`

is a class in (**min**)**IST** and, in fact, a *proper* class. All of those times when I said “there is no set of standard naturals/nonstandard naturals/infinitesimals/standard sets”, I could equally well have said “there is a proper class of standard naturals/nonstandard naturals/infinitesimals/standard sets”. The result of the previous paragraph means the (proper) class of standard sets – far from being “too big” to be a set – is a subclass of (the class induced by) a finite set! Classes were never about “size”^{3}.

To reiterate, **IST** is a *conservative extension* of **ZFC**. Focusing first on the “conservative” part, when we translate the theorem that there is a finite set containing all standard sets to **ZFC**, it becomes the tautologous statement that every finite set is a subset of some finite set. This starts to reveal what is happening in **IST**. However, focusing on the “extension” part, *nothing in* **ZFC** *refutes any of* **IST**. To take a Platonist perspective, these nonstandard sets could “always have been there” and **IST** just finally lets us talk about them. If you were a Platonist but didn’t want to accept these nonstandard sets, the conservativity result would still allow you to use **IST** as “just a shorthand” for formulas in **ZFC**. Either way, it is clear that the notion of “finite set” in **ZFC** has a lot of wiggle room.

Some set theories have an explicit notion of “class”.↩︎

The point is that any logically equivalent formula would do, i.e. classes are equivalence classes of formulas under logical equivalence. One could arguably go further and say that the two formulas represent the same class even if we can’t

*prove*that they are logically equivalent as long as we (somehow) know they are “true” for exactly the same things. This extensional view of predicates is what the “extent” stuff is about.↩︎On the other hand, the Axiom Schema of Specification states that any subclass of a set is a set in ZFC, so for ZFC no proper class can be contained in a set. In fact, many non-traditional systems can be understood as only allowing certain subclasses of “sets” to be “sets”. For example, in constructive recursive mathematics, we may require a “set” to be recursively enumerable, but we can certainly have subclasses of recursively enumerable sets that are not recursively enumerable. See also this discussion on the Axiom of Choice and “size”.↩︎