# Hedonistic Learning

Learning for the fun of it

## Introduction

Complex-step differentiation is a simple and effective technique for numerically differentiating a(n analytic) function. Discussing it is a neat combination of complex analysis, numerical analysis, and ring theory. We’ll see that it is very closely connected to forward-mode automatic differentiation (FAD). For better or worse, while widely applicable, the scenarios where complex-step differentiation is the best solution are a bit rare. To apply complex-step differentiation, you need a version of your desired function that operates on complex numbers. If you have that, then you can apply complex-step differentiation immediately. Otherwise, you need to adapt the function to complex arguments. This can be done essentially automatically using the same techniques as automatic differentiation, but at that point you might as well use automatic differentiation. Adapting the code to complex numbers or AD takes about the same amount of effort, however, the AD version will be more efficient, more accurate, and easier to use.

Nevertheless, this serves as a simple example to illustrate several theoretical and practical ideas.

## Introduction

This is part 3 in a series. See the previous part about internal languages for indexed monoidal categories upon which this part heavily depends.

In category theory, the hom-sets between two objects can often be equipped with some extra structure which is respected by identities and composition. For example, the set of group homomorphisms between two abelian groups is itself an abelian group by defining the operations pointwise. Similarly, the set of monotonic functions between two partially ordered sets (posets) is a poset again by defining the ordering pointwise. Linear functions between vector spaces form a vector space. The set of functors between small categories is a small category. Of course, the structure on the hom-sets can be different than the objects. Trivially, with the earlier examples a vector space is an abelian group, so we could say that linear functions form an abelian group instead of a vector space. Likewise groups are monoids. Less trivially, the set of relations between two sets is a partially ordered set via inclusion. There are many cases where instead of hom-sets we have hom-objects that aren’t naturally thought of as sets. For example, we can have hom-objects be non-negative (extended) real numbers from which the category laws become the laws of a generalized metric space. We can identify posets with categories who hom-objects are elements of a two element set or, even better, a two element poset with one element less than or equal to the other.

This general process is called enriching a category in some other category which is almost always called $\newcommand{\V}{\mathcal V}\V$ in the generic case. We then talk about having $\V$-categories and $\V$-functors, etc. In a specific case, it will be something like $\mathbf{Ab}$-categories for an $\mathbf{Ab}$-enriched category, where $\mathbf{Ab}$ is the category of abelian groups. Unsurprisingly, not just any category will do for $\V$. However, it turns out very little structure is needed to define a notion of $\V$-category, $\V$-functor, $\V$-natural transformation, and $\V$-profunctor. The usual “baseline” is that $\V$ is a monoidal category. As mentioned in the previous post, paraphrasing Bénabou, notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. It is useful for our purposes to make this structure explicit. For very little cost, this will also provide a vastly more general notion that will readily capture enriched categories, indexed categories, and categories that are simultaneously indexed and enriched, of which internal categories are an example. The tool for this is a (Grothendieck) fibration aka a fibered category or the mostly equivalent concept of an indexed category.1

To that end, instead of just a monoidal category, we’ll be using indexed monoidal categories. Typically, to get an experience as much like ordinary category theory as possible, additional structure is assumed on $\V$. In particular, it is assumed to be an (indexed) cosmos which means that it is an indexed symmetric monoidally closed category with indexed coproducts preserved by $\otimes$ and indexed products and fiberwise finite limits and colimits (preserved by the indexed structure). This is quite a lot more structure which I’ll introduce in later parts. In this part, I’ll make no assumptions beyond having an indexed monoidal category.

## Introduction

This is part 2 in a series. See the previous part about internal languages for (non-indexed) monoidal categories. The main application I have in mind – enriching in indexed monoidal categories – is covered in the next post.

As Jean Bénabou pointed out in Fibered Categories and the Foundations of Naive Category Theory (PDF) notions of “families of objects/arrows” are ubiquitous and fundamental in category theory. One of the more noticeable places early on is in the definition of a natural transformation as a family of arrows. However, even in the definition of category, identities and compositions are families of functions, or, in the enriched case, arrows of $\mathbf V$. From a foundational perspective, one place where this gets really in-your-face is when trying to formalize the notion of (co)completeness. It is straightforward to make a first-order theory of a finitely complete category, e.g. this one. For arbitrary products and thus limits, we need to talk about families of objects. To formalize the usual meaning of this in a first-order theory would require attaching an entire first-order theory of sets, e.g. ZFC, to our notion of complete category. If your goals are of a foundational nature like Bénabou’s were, then this is unsatisfactory. Instead, we can abstract out what we need of the notion of “family”. The result turns out to be equivalent to the notion of a fibration.

My motivations here are not foundational but leaving the notion of “family” entirely meta-theoretical means not being able to talk about it except in the semantics. Bénabou’s comment suggests that at the semantic level we want not just a monoidal category, but a fibration of monoidal categories1. At the syntactic level, it suggests that there should be a built-in notion of “family” in our language. We accomplish both of these goals by formulating the internal language of an indexed monoidal category.

As a benefit, we can generalize to other notions of “family” than set-indexed families. We’ll clearly be able to formulate the notion of an enriched category. It’s also clear that we’ll be able to formulate the notion of an indexed category. Of course, we’ll also be able to formulate the notion of a category that is both enriched and indexed which includes the important special case of an internal category. We can also consider cases with trivial indexing which, in the unenriched case, will give us monoids, and in the $\mathbf{Ab}$-enriched case will give us rings.

## Introduction

This is the first post in a series of posts on doing enriched indexed category theory and using the notion of an internal language to make this look relatively mundane. The internal language aspects are useful for other purposes too, as will be illustrated in this post, for example. This is related to the post Category Theory, Syntactically. In particular, it can be considered half-way between the unary theories and the finite product theories described there.

First in this series – this post – covers the internal language of a monoidal category. This is fairly straightforward, but it already provides some use. For example, the category of endofunctors on a category is a strict monoidal category, and so we can take a different perspective on natural transformations. This will also motivate the notions of a (virtual) bicategory and an actegory. Throughout this post, I’ll give a fairly worked example of turning some categorical content into rules of a type-/proof-theory.

The second post will add indexing to the notion of monoidal category and introduce the very powerful and useful notion of an indexed monoidal category.

The third post will formulate the notion of categories enriched in an indexed monoidal category and give the definitions which don’t require any additional assumptions.

The fourth post will introduce the notion and internal language for an indexed cosmos. Normally, when we do enriched category theory, we want the category into which we’re enriching to not be just a monoidal category but a cosmos. This provides many additional properties. An indexed cosmos is just the analogue of that for indexed monoidal categories.

The fifth post will then formulate categorical concepts for our enriched indexed categories that require some or all of these additional properties provided by an indexed cosmos.

At some point, there will be a post on virtual double categories as they (or, even better, augmented virtual double categories) are what will really be behind the notion of enriched indexed categories we’ll define. Basically, we’ll secretly be spelling out a specific instance of the $\mathsf{Mod}$ construction.

## Example Representability Argument

### Introduction

When I was young and dumb and first learning category theory, I got it into my head that arguments involving sets were not “categorical”. This is not completely crazy as the idea of category theory being an alternate “foundation” and categorical critiques of set theoretic reasoning are easy to find. As such, I tended to neglect techniques that significantly leveraged $\mathbf{Set}$, and, in particular, representability. Instead, I’d prefer arguments using universal arrows as those translate naturally and directly to 2-categories.

This was a mistake. I have long since completely reversed my position on this for both practical and theoretical reasons. Practically, representability and related techniques provide very concise definitions which lead to concise proofs which I find relatively easy to formulate and easy to verify. This is especially true when combined with the (co)end calculus. It’s also the case that for a lot of math you simply don’t need any potential generality you might gain by, e.g. being able to use an arbitrary 2-category. Theoretically, I’ve gained a better understanding of where and how category theory is (or is not) “foundational”, and a better understanding of what about set theory categorists were critiquing. Category theory as a whole does not provide an alternate foundation for mathematics as that term is usually understood by mathematicians. A branch of category theory, topos theory, does, but a topos is fairly intentionally designed to give a somewhat $\mathbf{Set}$-like experience. Similarly, many approaches to higher category theory still include a $\mathbf{Set}$-like level.

This is, of course, not to suggest ideas like universal arrows aren’t important or can’t lead to elegant proofs.

Below is a particular example of attacking a problem from the perspective of representability. I use this example more because it is a neat proof that I hadn’t seen before. There are plenty of simpler compelling examples, such as proving that right(/left) adjoints are (co)continuous, and I regularly use representability in proofs I presented on, e.g. the Math StackExchange.

### The Problem

An elementary topos, $\mathcal E$, can be described as a category with finite limits and power objects. Having power objects means having a functor $\mathsf P : \mathcal E^{op} \to \mathcal E$ such that $\mathcal E(A,\mathsf PB) \cong \mathsf{Sub}(A \times B)$ natural in $A$ and $B$ where $\mathsf{Sub}$ is the (contravariant) functor that takes an object to its set of subobjects. The action of $\mathsf{Sub}(f)$ for an arrow $f : A \to B$ is a function $m \mapsto f^*(m)$ where $m$ is a (representative) monomorphism and $f^*(m)$ is the pullback of $f$ along $m$ which is a monomorphism by basic facts about pullbacks. In diagrammatic form:
$$\require{AMScd} \begin{CD} f^{-1}(B') @>f^\ast(m)>> A \\ @VVV @VVfV \\ B' @>>m> B \end{CD}$$

This is a characterization of $\mathsf P$ via representability. We are saying that $\mathsf PB$ represents the functor $\mathsf{Sub}(- \times B)$ parameterized in $B$.

A well-known and basic fact about elementary toposes is that they are cartesian closed. (Indeed, finite limits + cartesian closure + a subobject classifier is a common alternative definition.) Cartesian closure can be characterized as $\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)$ which characterizes the exponent, $B^A$, via representability. Namely, that $B^A$ represents the functor $\mathcal E(- \times A, B)$ parameterized in $A$. Proving that elementary toposes are cartesian closed is not too difficult, but it is a bit fiddly. This is the example that I’m going to use.

### Common Setup

All the proofs I reference rely on the following basic facts about an elementary topos.

We have the monomorphism $\top : 1 \to \mathsf P1$ induced by the identity arrow $\mathsf P1 \to \mathsf P1$.

We need the lemma that $\mathcal E(A \times B,PC) \cong \mathcal E(A,\mathsf P(B \times C))$. Proof:
\begin{align} \mathcal E(A \times B,\mathsf PC) \cong \mathsf{Sub}((A \times B) \times C) \cong \mathsf{Sub}(A \times (B \times C)) \cong \mathcal E(A,\mathsf P(B \times C))\ \square \end{align}

Since the arrow $\langle id_A, f\rangle : A \to A \times B$ is a monomorphism for any arrow $f : A \to B$, the map $f \mapsto \langle id, f \rangle$ is a map from $\mathcal E(-, B)$ to $\mathsf{Sub}(- \times B)$. Using $\mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)$, we get a map $\mathcal E(-, B) \to \mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)$. By Yoneda, i.e. by evaluating it at $id$, we get the singleton map: $\{\}_B : B \to \mathsf PB$. If we can show that $\{\}$ is a monomorphism, then, since $\mathsf PA \cong \mathsf P(A \times 1)$, we’ll get an arrow $\sigma : \mathsf PA \to \mathsf P1$ such that

$$\begin{CD} A @>\{\}_A>> \mathsf PA \\ @VVV @VV\sigma_AV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

is a pullback.

$\{\}_A$ is a monomorphism because any $f, g : X \to A$ gets mapped by the above to $\langle id_X, f\rangle$ and $\langle id_X, g\rangle$ which represent the same subobject when $\{\} \circ f = \{\} \circ g$. Therefore, there’s an isomorphism $j : X \cong X$ such that $\langle id_X, f\rangle \circ j = \langle j, f \circ j\rangle = \langle id_X, g\rangle$ but this means $j = id_X$ and thus $f = g$.

### Other Proofs

To restate the problem: given the above setup, we want to show that the elementary topos $\mathcal E$ is cartesian closed.

Toposes, Triples, and Theories by Barr and Wells actually provides two proofs of this statement: Theorem 4.1 of Chapter 5. The first proof is in exactly the representability-style approach I’m advocating, but it relies on earlier results about how a topos relates to its slice categories. The second proof is more concrete and direct, but it also involves $\mathsf P\mathsf P\mathsf P\mathsf P B$…

Sheaves in Geometry and Logic by Mac Lane and Moerdijk also has this result as Theorem 1 of section IV.2 “The Construction of Exponentials”. The proof starts on page 167 and finishes on 169. The idea is to take the set theoretic construction of functions via their graphs and interpret that into topos concepts. This proof involves a decent amount of equational reasoning (either via diagrams or via generalized elements).

### Dan’s Proof via Representability

Contrast these to Dan Doel’s proof1 using representability, which proceeds as follows. (Any mistakes are mine.)

$$\begin{CD} B @>\{\}_B>> \mathsf PB \\ @VVV @VV\sigma_BV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

Apply the functor $\mathcal E(= \times A,-)$ which preserves the fact that it is a pullback via continuity.
$$\begin{CD} \mathcal{E}(-\times A,B) @>>> \mathcal{E}(- \times A,\mathsf PB) \\ @VVV @VVV \\ \mathcal{E}(- \times A,1) @>>> \mathcal{E}(- \times A,\mathsf P1) \end{CD}$$

Note:

• $\mathcal E(- \times A,1) \cong 1 \cong \mathcal E(-,1)$ (by continuity)
• $\mathcal E(- \times A,\mathsf PB) \cong \mathcal E(-,\mathsf P(A \times B))$ (by definition of power objects)
• $\mathcal E(- \times A,\mathsf P1) \cong \mathcal E(-,\mathsf PA)$ (because $A \times 1 \cong A$)

This means that the above pullback is also the pullback of
$$\begin{CD} \mathcal E(-\times A,B) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

Since $\mathcal E$ has all finite limits, it has the following pullback

$$\begin{CD} X @>>> \mathsf P(A \times B) \\ @VVV @VVV \\ 1 @>>> \mathsf PA \end{CD}$$

where the bottom and right arrows are induced by the corresponding arrows of the previous diagram by Yoneda. Applying $\mathcal E(=,-)$ to this diagram gives another pullback diagram by continuity

$$\begin{CD} \mathcal E(-,X) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

which is to say $\mathcal E(- \times A, B) \cong \mathcal E(-, X)$ because pullbacks are unique up to isomorphism, so $X$ satisfies the universal property of $B^A$, namely $\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)$.

1. Sent to me almost exactly three years ago.↩︎

## Introduction

Recursive helping is a technique for implementing lock-free concurrent data structures and algorithms. I’m going to illustrate this in the case of implementing a multi-variable compare-and-swap (MCAS) in terms of a single variable compare-and-swap. Basically everything I’m going to talk about comes from Keir Fraser’s PhD Thesis, Practical Lock-Freedom (2004) which I strongly recommend. Fraser’s thesis goes much further than this, e.g. fine-grained lock-free implementations of software transactional memory (STM)1. Fraser went on to contribute to the initial implementations of STM in Haskell, though his thesis uses C++.

## Beck-Chevalley

This is a fairly technical article. This article will most likely not have any significance for you if you haven’t heard of the Beck-Chevalley condition before.

## Introduction

When one talks about “indexed (co)products” in an indexed category, it is often described as follows:

Let $\mathcal C$ be an $\mathbf S$-indexed category, i.e. a pseudofunctor $\mathbf S^{op} \to \mathbf{Cat}$ where $\mathbf S$ is an ordinary category. Write $\mathcal C^I$ for $\mathcal C(I)$ and $f^* : \mathcal C^J \to \mathcal C^I$ for $\mathcal C(f)$ where $f : I \to J$. The functors $f^*$ will be called reindexing functors. $\mathcal C$ has $\mathbf S$-indexed coproducts whenever

1. each reindexing functor $f^*$ has a left adjoint $\Sigma_f$, and
2. the Beck-Chevalley condition holds, i.e. whenever
$$\require{AMScd}\begin{CD} I @>h>> J \\ @VkVV @VVfV \\ K @>>g> L \end{CD}$$
is a pullback square in $\mathbf S$, then the canonical morphism $\Sigma_k \circ h^* \to g^* \circ \Sigma_f$ is an isomorphism.

The first condition is reasonable, especially motivated with some examples, but the second condition is more mysterious. It’s clear that you’d need something more than simply a family of adjunctions, but it’s not clear how you could calculate the particular condition quoted. That’s the goal of this article. I will not cover what the Beck-Chevalley condition is intuitively saying. I cover that in this Stack Exchange answer from a logical perspective, though there are definitely other possible perspectives as well.

Some questions are:

1. Where does the Beck-Chevalley condition come from?
2. What is this “canonical morphism”?
3. Why do we care about pullback squares in particular?

## Absolute Colimits

In category theory a concept is called absolute if it is preserved by all functors. Identity arrows and composition are absolute by the definition of functor. Less trivially, isomorphisms are absolute. In general, anything that is described by a diagram commuting is absolute as that diagram will be preserved by any functor. This is generally the case, but if I tell you something is an absolute epimorphism, it’s not clear what diagram is causing that; the notion of epimorphism itself doesn’t reduce to the commutativity of a particular diagram.


## Finite

### Introduction

When sets are first introduced to students, usually examples are used with finite, explicitly presented sets. For example, {1,2,3} uu {2,3,4} = {1,2,3,4}. This is the beginning of the idea that a set is a “collection” of things. Later, when infinite sets are introduced, the idea that sets are “collections” of things is still commonly used as the intuitive basis for the definitions. While I personally find this a bit of a philosophical bait-and-switch, my main issue with it is that I don’t think it does a good job reflecting how we work with sets day-to-day nor for more in-depth set-theoretic investigations. Instead, I recommend thinking about infinite sets as defined by properties and x in X for some infinite set X means checking whether it satisfies the property defining X, not rummaging through the infinite elements that make up X and seeing if x is amongst them. This perspective closely reflects how we prove things about infinite sets. It makes it much clearer that the job is to find logical relationships between the properties that define sets.

Of course, this view can also be applied to finite sets and should be applied to them. For a constructivist, the notion of “finite set” splits into multiple inequivalent notions, and it is quite easy to show that there is a subset of {0,1} which is not “finite” with respect to strong notions of “finite” that are commonly used by constructivists. Today, though, I’ll stick with classical logic and classical set theory. In particular, I’m going to talk about the Internal Set Theory of Edward Nelson, or mostly the small fragment he used in Radically Elementary Probability Theory. In the first chapter of an unfinished book on Internal Set Theory, he states the following:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

While it may be a bit strong to say that Internal Set Theory leads to some question about what “finite” means, I think it makes a good case for questioning what “finite set” means. These concerns are similar to the relativity of the notion of “(un)countable set”.

## The Pedagogy of Logic: A Rant

### Introduction

Over the years I’ve seen a lot of confusion about formal logic online. Usually this is from students (but I’ve also seen it with experienced mathematicians) on platforms like the Math or CS StackExchange. Now there is clearly some selection bias where the people asking the questions are the people who are confused, but while the questions about the confusions are common, the confusions are often evident even in questions about other aspects, the confusions are in the (upvoted!) answers, and when you look at the study material they are using it is often completely unsurprising that they are confused. Again, these confusions are also not limited to just “students”, though I’ll use that word in a broad sense below. Finally, on at least one of the points below, the confusion seems to be entirely on the instructors’ side.

An indication that this is true more broadly is this quote from the nLab:

The distinction between object language and metalanguage exists even in classical mathematics, but it seems that most classical mathematicians are not used to remembering it, although it is not entirely clear why this should be so. One possibly relevant observation is that even if P is a statement which is neither provable nor disprovable (like the continuum hypothesis), in classical mathematics it is still provable that “P is either true or false.” Moreover, classical model theory often restricts itself to two-valued models in which the only truth values are “true” and “false,” although classical logic still holds in Boolean-valued models, and in such a case the truth value of P may be neither “true” nor “false,” although the truth value of “P or not P” is still “true.” Certainly when talking about classical truths which fail constructively, such as excluded middle, it is important to remember that “fails to be provable” is different from “is provably false.”

To give an indication of the problem, here is my strong impression of what would happen if I gave a student who had just passed a full year introduction to formal logic the following exercise: “Give me a formal proof of (neg P => neg Q) => (Q => P)”. I suspect most would draw up a truth table. When I responded, “that’s not a formal proof,” they would be confused. If you are confused by that response, then this is definitely for you. I’m sure they would completely agree with me that if I asked for the inverse matrix of some matrix M, showing that the determinant of M is non-zero does not constitute an answer to that question. The parallelism of these scenarios would likely be lost on them though. While I think courses that focus on a syntactic approach to logic will produce students that are much more likely to give an appropriate answer to my exercise, that doesn’t mean they lack the confusions, just that this exercise isn’t a good discriminator for them. For example, if they don’t see the parallelism of the two scenarios I described, or worse, have no idea what truth tables have to do with anything, then they have some gap.

As a disclaimer, I am not an educator nor even an academic in any professional capacity.

As a summary, the points I will touch on are:

If anyone reading this is aware of introductory textbooks or other resources (ideally freely available, but I’ll take a reasonably priced textbook too) that avoid most or all of the major issues I list and otherwise do a good job, I would be very interested. My own education on these topics has been piecemeal and eclectic and also strongly dependent on my background in programming. This leaves me with no reasonable recommendations. Please leave a comment or email me, if you have such a recommendation.