Hedonistic Learning

Learning for the fun of it


This is the first post in a series of posts on doing enriched indexed category theory and using the notion of an internal language to make this look relatively mundane. The internal language aspects are useful for other purposes too, as will be illustrated in this post, for example. This is related to the post Category Theory, Syntactically. In particular, it can be considered half-way between the unary theories and the finite product theories described there.

First in this series – this post – covers the internal language of a monoidal category. This is fairly straightforward, but it already provides some use. For example, the category of endofunctors on a category is a strict monoidal category, and so we can take a different perspective on natural transformations. This will also motivate the notions of a (virtual) bicategory and an actegory. Throughout this post, I’ll give a fairly worked example of turning some categorical content into rules of a type-/proof-theory.

The second post will add indexing to the notion of monoidal category and introduce the very powerful and useful notion of an indexed monoidal category.

The third post will formulate the notion of categories enriched in an indexed monoidal category and give the definitions which don’t require any additional assumptions.

The fourth post will introduce the notion and internal language for an indexed cosmos. Normally, when we do enriched category theory, we want the category into which we’re enriching to not be just a monoidal category but a cosmos. This provides many additional properties. An indexed cosmos is just the analogue of that for indexed monoidal categories.

The fifth post will then formulate categorical concepts for our enriched indexed categories that require some or all of these additional properties provided by an indexed cosmos.

At some point, there will be a post on virtual double categories as they (or, even better, augmented virtual double categories) are what will really be behind the notion of enriched indexed categories we’ll define. Basically, we’ll secretly be spelling out a specific instance of the |\mathsf{Mod}| construction.



When I was young and dumb and first learning category theory, I got it into my head that arguments involving sets were not “categorical”. This is not completely crazy as the idea of category theory being an alternate “foundation” and categorical critiques of set theoretic reasoning are easy to find. As such, I tended to neglect techniques that significantly leveraged |\mathbf{Set}|, and, in particular, representability. Instead, I’d prefer arguments using universal arrows as those translate naturally and directly to 2-categories.

This was a mistake. I have long since completely reversed my position on this for both practical and theoretical reasons. Practically, representability and related techniques provide very concise definitions which lead to concise proofs which I find relatively easy to formulate and easy to verify. This is especially true when combined with the (co)end calculus. It’s also the case that for a lot of math you simply don’t need any potential generality you might gain by, e.g. being able to use an arbitrary 2-category. Theoretically, I’ve gained a better understanding of where and how category theory is (or is not) “foundational”, and a better understanding of what about set theory categorists were critiquing. Category theory as a whole does not provide an alternate foundation for mathematics as that term is usually understood by mathematicians. A branch of category theory, topos theory, does, but a topos is fairly intentionally designed to give a somewhat |\mathbf{Set}|-like experience. Similarly, many approaches to higher category theory still include a |\mathbf{Set}|-like level.

This is, of course, not to suggest ideas like universal arrows aren’t important or can’t lead to elegant proofs.

Below is a particular example of attacking a problem from the perspective of representability. I use this example more because it is a neat proof that I hadn’t seen before. There are plenty of simpler compelling examples, such as proving that right(/left) adjoints are (co)continuous, and I regularly use representability in proofs I presented on, e.g. the Math StackExchange.

The Problem

An elementary topos, |\mathcal E|, can be described as a category with finite limits and power objects. Having power objects means having a functor |\mathsf P : \mathcal E^{op} \to \mathcal E| such that |\mathcal E(A,\mathsf PB) \cong \mathsf{Sub}(A \times B)| natural in |A| and |B| where |\mathsf{Sub}| is the (contravariant) functor that takes an object to its set of subobjects. The action of |\mathsf{Sub}(f)| for an arrow |f : A \to B| is a function |m \mapsto f^*(m)| where |m| is a (representative) monomorphism and |f^*(m)| is the pullback of |f| along |m| which is a monomorphism by basic facts about pullbacks. In diagrammatic form:
$$\require{AMScd} \begin{CD} f^{-1}(B') @>f^\ast(m)>> A \\ @VVV @VVfV \\ B' @>>m> B \end{CD}$$

This is a characterization of |\mathsf P| via representability. We are saying that |\mathsf PB| represents the functor |\mathsf{Sub}(- \times B)| parameterized in |B|.

A well-known and basic fact about elementary toposes is that they are cartesian closed. (Indeed, finite limits + cartesian closure + a subobject classifier is a common alternative definition.) Cartesian closure can be characterized as |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)| which characterizes the exponent, |B^A|, via representability. Namely, that |B^A| represents the functor |\mathcal E(- \times A, B)| parameterized in |A|. Proving that elementary toposes are cartesian closed is not too difficult, but it is a bit fiddly. This is the example that I’m going to use.

Common Setup

All the proofs I reference rely on the following basic facts about an elementary topos.

We have the monomorphism |\top : 1 \to \mathsf P1| induced by the identity arrow |\mathsf P1 \to \mathsf P1|.

We need the lemma that |\mathcal E(A \times B,PC) \cong \mathcal E(A,\mathsf P(B \times C))|. Proof:
$$\begin{align} \mathcal E(A \times B,\mathsf PC) \cong \mathsf{Sub}((A \times B) \times C) \cong \mathsf{Sub}(A \times (B \times C)) \cong \mathcal E(A,\mathsf P(B \times C))\ \square \end{align}$$

Since the arrow |\langle id_A, f\rangle : A \to A \times B| is a monomorphism for any arrow |f : A \to B|, the map |f \mapsto \langle id, f \rangle| is a map from |\mathcal E(-, B)| to |\mathsf{Sub}(- \times B)|. Using |\mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|, we get a map |\mathcal E(-, B) \to \mathsf{Sub}(- \times B) \cong \mathcal E(-,\mathsf PB)|. By Yoneda, i.e. by evaluating it at |id|, we get the singleton map: |\{\}_B : B \to \mathsf PB|. If we can show that |\{\}| is a monomorphism, then, since |\mathsf PA \cong \mathsf P(A \times 1)|, we’ll get an arrow |\sigma : \mathsf PA \to \mathsf P1| such that

$$\begin{CD} A @>\{\}_A>> \mathsf PA \\ @VVV @VV\sigma_AV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

is a pullback.

|\{\}_A| is a monomorphism because any |f, g : X \to A| gets mapped by the above to |\langle id_X, f\rangle| and |\langle id_X, g\rangle| which represent the same subobject when |\{\} \circ f = \{\} \circ g|. Therefore, there’s an isomorphism |j : X \cong X| such that |\langle id_X, f\rangle \circ j = \langle j, f \circ j\rangle = \langle id_X, g\rangle| but this means |j = id_X| and thus |f = g|.

Other Proofs

To restate the problem: given the above setup, we want to show that the elementary topos |\mathcal E| is cartesian closed.

Toposes, Triples, and Theories by Barr and Wells actually provides two proofs of this statement: Theorem 4.1 of Chapter 5. The first proof is in exactly the representability-style approach I’m advocating, but it relies on earlier results about how a topos relates to its slice categories. The second proof is more concrete and direct, but it also involves |\mathsf P\mathsf P\mathsf P\mathsf P B|…

Sheaves in Geometry and Logic by Mac Lane and Moerdijk also has this result as Theorem 1 of section IV.2 “The Construction of Exponentials”. The proof starts on page 167 and finishes on 169. The idea is to take the set theoretic construction of functions via their graphs and interpret that into topos concepts. This proof involves a decent amount of equational reasoning (either via diagrams or via generalized elements).

Dan’s Proof via Representability

Contrast these to Dan Doel’s proof1 using representability, which proceeds as follows. (Any mistakes are mine.)

Start with the pullback induced by the singleton map.

$$\begin{CD} B @>\{\}_B>> \mathsf PB \\ @VVV @VV\sigma_BV \\ 1 @>>\top> \mathsf P1 \end{CD}$$

Apply the functor |\mathcal E(= \times A,-)| which preserves the fact that it is a pullback via continuity.
$$\begin{CD} \mathcal{E}(-\times A,B) @>>> \mathcal{E}(- \times A,\mathsf PB) \\ @VVV @VVV \\ \mathcal{E}(- \times A,1) @>>> \mathcal{E}(- \times A,\mathsf P1) \end{CD}$$


  • |\mathcal E(- \times A,1) \cong 1 \cong \mathcal E(-,1)| (by continuity)
  • |\mathcal E(- \times A,\mathsf PB) \cong \mathcal E(-,\mathsf P(A \times B))| (by definition of power objects)
  • |\mathcal E(- \times A,\mathsf P1) \cong \mathcal E(-,\mathsf PA)| (because |A \times 1 \cong A|)

This means that the above pullback is also the pullback of
$$\begin{CD} \mathcal E(-\times A,B) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

Since |\mathcal E| has all finite limits, it has the following pullback

$$\begin{CD} X @>>> \mathsf P(A \times B) \\ @VVV @VVV \\ 1 @>>> \mathsf PA \end{CD}$$

where the bottom and right arrows are induced by the corresponding arrows of the previous diagram by Yoneda. Applying |\mathcal E(=,-)| to this diagram gives another pullback diagram by continuity

$$\begin{CD} \mathcal E(-,X) @>>> \mathcal E(-,\mathsf P(A\times B)) \\ @VVV @VVV \\ \mathcal E(-,1) @>>> \mathcal E(-,\mathsf PA) \end{CD}$$

which is to say |\mathcal E(- \times A, B) \cong \mathcal E(-, X)| because pullbacks are unique up to isomorphism, so |X| satisfies the universal property of |B^A|, namely |\mathcal E(- \times A, B) \cong \mathcal E(-,B^A)|.

  1. Sent to me almost exactly three years ago.↩︎


Recursive helping is a technique for implementing lock-free concurrent data structures and algorithms. I’m going to illustrate this in the case of implementing a multi-variable compare-and-swap (MCAS) in terms of a single variable compare-and-swap. Basically everything I’m going to talk about comes from Keir Fraser’s PhD Thesis, Practical Lock-Freedom (2004) which I strongly recommend. Fraser’s thesis goes much further than this, e.g. fine-grained lock-free implementations of software transactional memory (STM)1. Fraser went on to contribute to the initial implementations of STM in Haskell, though his thesis uses C++.


This is a fairly technical article. This article will most likely not have any significance for you if you haven’t heard of the Beck-Chevalley condition before.


When one talks about “indexed (co)products” in an indexed category, it is often described as follows:

Let |\mathcal C| be an |\mathbf S|-indexed category, i.e. a pseudofunctor |\mathbf S^{op} \to \mathbf{Cat}| where |\mathbf S| is an ordinary category. Write |\mathcal C^I| for |\mathcal C(I)| and |f^* : \mathcal C^J \to \mathcal C^I| for |\mathcal C(f)| where |f : I \to J|. The functors |f^*| will be called reindexing functors. |\mathcal C| has |\mathbf S|-indexed coproducts whenever

  1. each reindexing functor |f^*| has a left adjoint |\Sigma_f|, and
  2. the Beck-Chevalley condition holds, i.e. whenever
    $$\require{AMScd}\begin{CD} I @>h>> J \\ @VkVV @VVfV \\ K @>>g> L \end{CD}$$
    is a pullback square in |\mathbf S|, then the canonical morphism |\Sigma_k \circ h^* \to g^* \circ \Sigma_f| is an isomorphism.

The first condition is reasonable, especially motivated with some examples, but the second condition is more mysterious. It’s clear that you’d need something more than simply a family of adjunctions, but it’s not clear how you could calculate the particular condition quoted. That’s the goal of this article. I will not cover what the Beck-Chevalley condition is intuitively saying. I cover that in this Stack Exchange answer from a logical perspective, though there are definitely other possible perspectives as well.

Some questions are:

  1. Where does the Beck-Chevalley condition come from?
  2. What is this “canonical morphism”?
  3. Why do we care about pullback squares in particular?

In category theory a concept is called absolute if it is preserved by all functors. Identity arrows and composition are absolute by the definition of functor. Less trivially, isomorphisms are absolute. In general, anything that is described by a diagram commuting is absolute as that diagram will be preserved by any functor. This is generally the case, but if I tell you something is an absolute epimorphism, it’s not clear what diagram is causing that; the notion of epimorphism itself doesn’t reduce to the commutativity of a particular diagram.

Below I’ll be focused primarily on absolute colimits as those are the most commonly used examples. They are an important part of the theory of monadicity. The trick to many questions about absolute colimits and related concepts is to see what it means for |\newcommand{\Set}{\mathbf{Set}}\newcommand{\Hom}{\mathsf{Hom}} \newcommand{onto}{\twoheadrightarrow}\Hom| functors to preserve them.



When sets are first introduced to students, usually examples are used with finite, explicitly presented sets. For example, #{1,2,3} uu {2,3,4} = {1,2,3,4}#. This is the beginning of the idea that a set is a “collection” of things. Later, when infinite sets are introduced, the idea that sets are “collections” of things is still commonly used as the intuitive basis for the definitions. While I personally find this a bit of a philosophical bait-and-switch, my main issue with it is that I don’t think it does a good job reflecting how we work with sets day-to-day nor for more in-depth set-theoretic investigations. Instead, I recommend thinking about infinite sets as defined by properties and #x in X# for some infinite set #X# means checking whether it satisfies the property defining #X#, not rummaging through the infinite elements that make up #X# and seeing if #x# is amongst them. This perspective closely reflects how we prove things about infinite sets. It makes it much clearer that the job is to find logical relationships between the properties that define sets.

Of course, this view can also be applied to finite sets and should be applied to them. For a constructivist, the notion of “finite set” splits into multiple inequivalent notions, and it is quite easy to show that there is a subset of #{0,1}# which is not “finite” with respect to strong notions of “finite” that are commonly used by constructivists. Today, though, I’ll stick with classical logic and classical set theory. In particular, I’m going to talk about the Internal Set Theory of Edward Nelson, or mostly the small fragment he used in Radically Elementary Probability Theory. In the first chapter of an unfinished book on Internal Set Theory, he states the following:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

While it may be a bit strong to say that Internal Set Theory leads to some question about what “finite” means, I think it makes a good case for questioning what “finite set” means. These concerns are similar to the relativity of the notion of “(un)countable set”.



Over the years I’ve seen a lot of confusion about formal logic online. Usually this is from students (but I’ve also seen it with experienced mathematicians) on platforms like the Math or CS StackExchange. Now there is clearly some selection bias where the people asking the questions are the people who are confused, but while the questions about the confusions are common, the confusions are often evident even in questions about other aspects, the confusions are in the (upvoted!) answers, and when you look at the study material they are using it is often completely unsurprising that they are confused. Again, these confusions are also not limited to just “students”, though I’ll use that word in a broad sense below. Finally, on at least one of the points below, the confusion seems to be entirely on the instructors’ side.

An indication that this is true more broadly is this quote from the nLab:

The distinction between object language and metalanguage exists even in classical mathematics, but it seems that most classical mathematicians are not used to remembering it, although it is not entirely clear why this should be so. One possibly relevant observation is that even if P is a statement which is neither provable nor disprovable (like the continuum hypothesis), in classical mathematics it is still provable that “P is either true or false.” Moreover, classical model theory often restricts itself to two-valued models in which the only truth values are “true” and “false,” although classical logic still holds in Boolean-valued models, and in such a case the truth value of P may be neither “true” nor “false,” although the truth value of “P or not P” is still “true.” Certainly when talking about classical truths which fail constructively, such as excluded middle, it is important to remember that “fails to be provable” is different from “is provably false.”

To give an indication of the problem, here is my strong impression of what would happen if I gave a student who had just passed a full year introduction to formal logic the following exercise: “Give me a formal proof of #(neg P => neg Q) => (Q => P)#”. I suspect most would draw up a truth table. When I responded, “that’s not a formal proof,” they would be confused. If you are confused by that response, then this is definitely for you. I’m sure they would completely agree with me that if I asked for the inverse matrix of some matrix #M#, showing that the determinant of #M# is non-zero does not constitute an answer to that question. The parallelism of these scenarios would likely be lost on them though. While I think courses that focus on a syntactic approach to logic will produce students that are much more likely to give an appropriate answer to my exercise, that doesn’t mean they lack the confusions, just that this exercise isn’t a good discriminator for them. For example, if they don’t see the parallelism of the two scenarios I described, or worse, have no idea what truth tables have to do with anything, then they have some gap.

As a disclaimer, I am not an educator nor even an academic in any professional capacity.

As a summary, the points I will touch on are:

If anyone reading this is aware of introductory textbooks or other resources (ideally freely available, but I’ll take a reasonably priced textbook too) that avoid most or all of the major issues I list and otherwise do a good job, I would be very interested. My own education on these topics has been piecemeal and eclectic and also strongly dependent on my background in programming. This leaves me with no reasonable recommendations. Please leave a comment or email me, if you have such a recommendation.


I couldn’t find an online version of Djinn, so I ran it through GHCJS with some tweaks, added a basic interface and hosted it here. It runs in your browser, so go crazy. If you want changes to the default settings or environment, feel free to suggest them. Right now everything is the default settings of the Djinn tool except that multiple results is enabled.

(or: How Model Theory Got Scooped by Category Theory)


This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.

Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.



I’ve been watching the Spring 2012 lectures for MIT 6.851 Advanced Data Structures with Prof. Erik Demaine. In lecture 12, “Fusion Trees”, it mentions a constant time algorithm for finding the index of the first most significant 1 bit in a word, i.e. the binary logarithm. Assuming word operations are constant time, i.e. in the Word RAM model, the below algorithm takes 27 word operations (not counting copying). When I compiled it with GHC 8.0.1 -O2 the core of the algorithm was 44 straight-line instructions. The theoretically interesting thing is, other than changing the constants, the same algorithm works for any word size that’s an even power of 2. Odd powers of two need a slight tweak. This is demonstrated for Word64, Word32, and Word16. It should be possible to do this for any arbitrary word size w.

The clz instruction can be used to implement this function, but this is a potential simulation if that or a similar instruction wasn’t available. It’s probably not the fastest way. Similarly, find first set and count trailing zeros can be implemented in terms of this operation.


Below is the complete code. You can also download it here.

{-# LANGUAGE BangPatterns #-}
import Data.Word
import Data.Bits

-- Returns 0-based bit index of most significant bit that is 1. Assumes input is non-zero.
-- That is, 2^indexOfMostSignificant1 x <= x < 2^(indexOfMostSignificant1 x + 1)
-- From Erik Demaine's presentation in Spring 2012 lectures of MIT 6.851, particularly "Lecture 12: Fusion Trees".
-- Takes 26 (source-level) straight-line word operations.
indexOfMostSignificant1 :: Word64 -> Word64
indexOfMostSignificant1 w = idxMsbyte .|. idxMsbit
        -- top bits of each byte
        !wtbs = w .&. 0x8080808080808080
        -- all but top bits of each byte producing 8 7-bit chunks
        !wbbs = w .&. 0x7F7F7F7F7F7F7F7F              

        -- parallel compare of each 7-bit chunk to 0, top bit set in result if 7-bit chunk was not 0
        !pc = parallelCompare 0x8080808080808080 wbbs

        -- top bit of each byte set if the byte has any bits set in w
        !ne = wtbs .|. pc                             

        -- a summary of which bytes (except the first) are non-zero as a 7-bit bitfield, i.e. top bits collected into bottom byte
        !summary = sketch ne `unsafeShiftR` 1

        -- parallel compare summary to powers of two
        !cmpp2 = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * summary)
        -- index of most significant non-zero byte * 8
        !idxMsbyte = sumTopBits8 cmpp2                

        -- most significant 7-bits of most significant non-zero byte
        !msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1

        -- parallel compare msbyte to powers of two
        !cmpp2' = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * msbyte)

        -- index of most significant non-zero bit in msbyte
        !idxMsbit = sumTopBits cmpp2' 

        -- Maps top bits of each byte into lower byte assuming all other bits are 0.
        -- 0x2040810204081 = sum [2^j | j <- map (\i -> 49 - 7*i) [0..7]]
        -- In general if w = 2^(2*k+p) and p = 0 or 1 the formula is:
        -- sum [2^j | j <- map (\i -> w-(2^k-1) - 2^(k+p) - (2^(k+p) - 1)*i) [0..2^k-1]]
        -- Followed by shifting right by w - 2^k
        sketch w = (w * 0x2040810204081) `unsafeShiftR` 56

        parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8080808080808080
        sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56
        sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x0808080808080808) `unsafeShiftR` 56

indexOfMostSignificant1_w32 :: Word32 -> Word32
indexOfMostSignificant1_w32 w = idxMsbyte .|. idxMsbit
    where !wtbs = w .&. 0x80808080
          !wbbs = w .&. 0x7F7F7F7F
          !pc = parallelCompare 0x80808080 wbbs
          !ne = wtbs .|. pc
          !summary = sketch ne `unsafeShiftR` 1
          !cmpp2 = parallelCompare 0xFF838180 (0x01010101 * summary)
          !idxMsbyte = sumTopBits8 cmpp2
          !msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
          !cmpp2' = parallelCompare 0x87838180 (0x01010101 * msbyte)

          -- extra step when w is not an even power of two
          !cmpp2'' = parallelCompare 0xFFBF9F8F (0x01010101 * msbyte)
          !idxMsbit = sumTopBits cmpp2' + sumTopBits cmpp2''

          sketch w = (w * 0x204081) `unsafeShiftR` 28
          parallelCompare w1 w2 = complement (w1 - w2) .&. 0x80808080
          sumTopBits w = ((w `unsafeShiftR` 7) * 0x01010101) `unsafeShiftR` 24
          sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x08080808) `unsafeShiftR` 24

indexOfMostSignificant1_w16 :: Word16 -> Word16
indexOfMostSignificant1_w16 w = idxMsnibble .|. idxMsbit
    where !wtbs = w .&. 0x8888
          !wbbs = w .&. 0x7777
          !pc = parallelCompare 0x8888 wbbs
          !ne = wtbs .|. pc
          !summary = sketch ne `unsafeShiftR` 1
          !cmpp2 = parallelCompare 0xFB98 (0x1111 * summary)
          !idxMsnibble = sumTopBits4 cmpp2
          !msnibble = ((w `unsafeShiftR` (fromIntegral idxMsnibble)) .&. 0xF) `unsafeShiftR` 1
          !cmpp2' = parallelCompare 0xFB98 (0x1111 * msnibble)
          !idxMsbit = sumTopBits cmpp2'

          sketch w = (w * 0x249) `unsafeShiftR` 12
          parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8888
          sumTopBits w = ((w `unsafeShiftR` 3) * 0x1111) `unsafeShiftR` 12
          sumTopBits4 w = ((w `unsafeShiftR` 3) * 0x4444) `unsafeShiftR` 12