Hedonistic Learning

Learning for the fun of it

This is a fairly technical article. This article will most likely not have any significance for you if you haven’t heard of the Beck-Chevalley condition before.


When one talks about “indexed (co)products” in an indexed category, it is often described as follows:

Let |\mathcal C| be an |\mathbf S|-indexed category, i.e. a pseudofunctor |\mathbf S^{op} \to \mathbf{Cat}| where |\mathbf S| is an ordinary category. Write |\mathcal C^I| for |\mathcal C(I)| and |f^* : \mathcal C^J \to \mathcal C^I| for |\mathcal C(f)| where |f : I \to J|. The functors |f^*| will be called reindexing functors. |\mathcal C| has |\mathbf S|-indexed coproducts whenever

  1. each reindexing functor |f^*| has a left adjoint |\Sigma_f|, and
  2. the Beck-Chevalley condition holds, i.e. whenever
    $$\require{AMScd}\begin{CD} I @>h>> J \\ @VkVV @VVfV \\ K @>>g> L \end{CD}$$
    is a pullback square in |\mathbf S|, then the canonical morphism |\Sigma_k \circ h^* \to g^* \circ \Sigma_f| is an isomorphism.

The first condition is reasonable, especially motivated with some examples, but the second condition is more mysterious. It’s clear that you’d need something more than simply a family of adjunctions, but it’s not clear how you could calculate the particular condition quoted. That’s the goal of this article. I will not cover what the Beck-Chevalley condition is intuitively saying. I cover that in this Stack Exchange answer from a logical perspective, though there are definitely other possible perspectives as well.

Some questions are:

  1. Where does the Beck-Chevalley condition come from?
  2. What is this “canonical morphism”?
  3. Why do we care about pullback squares in particular?

In category theory a concept is called absolute if it is preserved by all functors. Identity arrows and composition are absolute by the definition of functor. Less trivially, isomorphisms are absolute. In general, anything that is described by a diagram commuting is absolute as that diagram will be preserved by any functor. This is generally the case, but if I tell you something is an absolute epimorphism, it’s not clear what diagram is causing that; the notion of epimorphism itself doesn’t reduce to the commutativity of a particular diagram.

Below I’ll be focused primarily on absolute colimits as those are the most commonly used examples. They are an important part of the theory of monadicity. The trick to many questions about absolute colimits and related concepts is to see what it means for |\newcommand{\Set}{\mathbf{Set}}\newcommand{\Hom}{\mathsf{Hom}} \newcommand{onto}{\twoheadrightarrow}\Hom| functors to preserve them.



When sets are first introduced to students, usually examples are used with finite, explicitly presented sets. For example, #{1,2,3} uu {2,3,4} = {1,2,3,4}#. This is the beginning of the idea that a set is a “collection” of things. Later, when infinite sets are introduced, the idea that sets are “collections” of things is still commonly used as the intuitive basis for the definitions. While I personally find this a bit of a philosophical bait-and-switch, my main issue with it is that I don’t think it does a good job reflecting how we work with sets day-to-day nor for more in-depth set-theoretic investigations. Instead, I recommend thinking about infinite sets as defined by properties and #x in X# for some infinite set #X# means checking whether it satisfies the property defining #X#, not rummaging through the infinite elements that make up #X# and seeing if #x# is amongst them. This perspective closely reflects how we prove things about infinite sets. It makes it much clearer that the job is to find logical relationships between the properties that define sets.

Of course, this view can also be applied to finite sets and should be applied to them. For a constructivist, the notion of “finite set” splits into multiple inequivalent notions, and it is quite easy to show that there is a subset of #{0,1}# which is not “finite” with respect to strong notions of “finite” that are commonly used by constructivists. Today, though, I’ll stick with classical logic and classical set theory. In particular, I’m going to talk about the Internal Set Theory of Edward Nelson, or mostly the small fragment he used in Radically Elementary Probability Theory. In the first chapter of an unfinished book on Internal Set Theory, he states the following:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

While it may be a bit strong to say that Internal Set Theory leads to some question about what “finite” means, I think it makes a good case for questioning what “finite set” means. These concerns are similar to the relativity of the notion of “(un)countable set”.



Over the years I’ve seen a lot of confusion about formal logic online. Usually this is from students (but I’ve also seen it with experienced mathematicians) on platforms like the Math or CS StackExchange. Now there is clearly some selection bias where the people asking the questions are the people who are confused, but while the questions about the confusions are common, the confusions are often evident even in questions about other aspects, the confusions are in the (upvoted!) answers, and when you look at the study material they are using it is often completely unsurprising that they are confused. Again, these confusions are also not limited to just “students”, though I’ll use that word in a broad sense below. Finally, on at least one of the points below, the confusion seems to be entirely on the instructors’ side.

An indication that this is true more broadly is this quote from the nLab:

The distinction between object language and metalanguage exists even in classical mathematics, but it seems that most classical mathematicians are not used to remembering it, although it is not entirely clear why this should be so. One possibly relevant observation is that even if P is a statement which is neither provable nor disprovable (like the continuum hypothesis), in classical mathematics it is still provable that “P is either true or false.” Moreover, classical model theory often restricts itself to two-valued models in which the only truth values are “true” and “false,” although classical logic still holds in Boolean-valued models, and in such a case the truth value of P may be neither “true” nor “false,” although the truth value of “P or not P” is still “true.” Certainly when talking about classical truths which fail constructively, such as excluded middle, it is important to remember that “fails to be provable” is different from “is provably false.”

To give an indication of the problem, here is my strong impression of what would happen if I gave a student who had just passed a full year introduction to formal logic the following exercise: “Give me a formal proof of #(neg P => neg Q) => (Q => P)#”. I suspect most would draw up a truth table. When I responded, “that’s not a formal proof,” they would be confused. If you are confused by that response, then this is definitely for you. I’m sure they would completely agree with me that if I asked for the inverse matrix of some matrix #M#, showing that the determinant of #M# is non-zero does not constitute an answer to that question. The parallelism of these scenarios would likely be lost on them though. While I think courses that focus on a syntactic approach to logic will produce students that are much more likely to give an appropriate answer to my exercise, that doesn’t mean they lack the confusions, just that this exercise isn’t a good discriminator for them. For example, if they don’t see the parallelism of the two scenarios I described, or worse, have no idea what truth tables have to do with anything, then they have some gap.

As a disclaimer, I am not an educator nor even an academic in any professional capacity.

As a summary, the points I will touch on are:

If anyone reading this is aware of introductory textbooks or other resources (ideally freely available, but I’ll take a reasonably priced textbook too) that avoid most or all of the major issues I list and otherwise do a good job, I would be very interested. My own education on these topics has been piecemeal and eclectic and also strongly dependent on my background in programming. This leaves me with no reasonable recommendations. Please leave a comment or email me, if you have such a recommendation.


I couldn’t find an online version of Djinn, so I ran it through GHCJS with some tweaks, added a basic interface and hosted it here. It runs in your browser, so go crazy. If you want changes to the default settings or environment, feel free to suggest them. Right now everything is the default settings of the Djinn tool except that multiple results is enabled.

(or: How Model Theory Got Scooped by Category Theory)


This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.

Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.



I’ve been watching the Spring 2012 lectures for MIT 6.851 Advanced Data Structures with Prof. Erik Demaine. In lecture 12, “Fusion Trees”, it mentions a constant time algorithm for finding the index of the first most significant 1 bit in a word, i.e. the binary logarithm. Assuming word operations are constant time, i.e. in the Word RAM model, the below algorithm takes 27 word operations (not counting copying). When I compiled it with GHC 8.0.1 -O2 the core of the algorithm was 44 straight-line instructions. The theoretically interesting thing is, other than changing the constants, the same algorithm works for any word size that’s an even power of 2. Odd powers of two need a slight tweak. This is demonstrated for Word64, Word32, and Word16. It should be possible to do this for any arbitrary word size w.

The clz instruction can be used to implement this function, but this is a potential simulation if that or a similar instruction wasn’t available. It’s probably not the fastest way. Similarly, find first set and count trailing zeros can be implemented in terms of this operation.


Below is the complete code. You can also download it here.

{-# LANGUAGE BangPatterns #-}
import Data.Word
import Data.Bits

-- Returns 0-based bit index of most significant bit that is 1. Assumes input is non-zero.
-- That is, 2^indexOfMostSignificant1 x <= x < 2^(indexOfMostSignificant1 x + 1)
-- From Erik Demaine's presentation in Spring 2012 lectures of MIT 6.851, particularly "Lecture 12: Fusion Trees".
-- Takes 26 (source-level) straight-line word operations.
indexOfMostSignificant1 :: Word64 -> Word64
indexOfMostSignificant1 w = idxMsbyte .|. idxMsbit
        -- top bits of each byte
        !wtbs = w .&. 0x8080808080808080
        -- all but top bits of each byte producing 8 7-bit chunks
        !wbbs = w .&. 0x7F7F7F7F7F7F7F7F              

        -- parallel compare of each 7-bit chunk to 0, top bit set in result if 7-bit chunk was not 0
        !pc = parallelCompare 0x8080808080808080 wbbs

        -- top bit of each byte set if the byte has any bits set in w
        !ne = wtbs .|. pc                             

        -- a summary of which bytes (except the first) are non-zero as a 7-bit bitfield, i.e. top bits collected into bottom byte
        !summary = sketch ne `unsafeShiftR` 1

        -- parallel compare summary to powers of two
        !cmpp2 = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * summary)
        -- index of most significant non-zero byte * 8
        !idxMsbyte = sumTopBits8 cmpp2                

        -- most significant 7-bits of most significant non-zero byte
        !msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1

        -- parallel compare msbyte to powers of two
        !cmpp2' = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * msbyte)

        -- index of most significant non-zero bit in msbyte
        !idxMsbit = sumTopBits cmpp2' 

        -- Maps top bits of each byte into lower byte assuming all other bits are 0.
        -- 0x2040810204081 = sum [2^j | j <- map (\i -> 49 - 7*i) [0..7]]
        -- In general if w = 2^(2*k+p) and p = 0 or 1 the formula is:
        -- sum [2^j | j <- map (\i -> w-(2^k-1) - 2^(k+p) - (2^(k+p) - 1)*i) [0..2^k-1]]
        -- Followed by shifting right by w - 2^k
        sketch w = (w * 0x2040810204081) `unsafeShiftR` 56

        parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8080808080808080
        sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56
        sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x0808080808080808) `unsafeShiftR` 56

indexOfMostSignificant1_w32 :: Word32 -> Word32
indexOfMostSignificant1_w32 w = idxMsbyte .|. idxMsbit
    where !wtbs = w .&. 0x80808080
          !wbbs = w .&. 0x7F7F7F7F
          !pc = parallelCompare 0x80808080 wbbs
          !ne = wtbs .|. pc
          !summary = sketch ne `unsafeShiftR` 1
          !cmpp2 = parallelCompare 0xFF838180 (0x01010101 * summary)
          !idxMsbyte = sumTopBits8 cmpp2
          !msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
          !cmpp2' = parallelCompare 0x87838180 (0x01010101 * msbyte)

          -- extra step when w is not an even power of two
          !cmpp2'' = parallelCompare 0xFFBF9F8F (0x01010101 * msbyte)
          !idxMsbit = sumTopBits cmpp2' + sumTopBits cmpp2''

          sketch w = (w * 0x204081) `unsafeShiftR` 28
          parallelCompare w1 w2 = complement (w1 - w2) .&. 0x80808080
          sumTopBits w = ((w `unsafeShiftR` 7) * 0x01010101) `unsafeShiftR` 24
          sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x08080808) `unsafeShiftR` 24

indexOfMostSignificant1_w16 :: Word16 -> Word16
indexOfMostSignificant1_w16 w = idxMsnibble .|. idxMsbit
    where !wtbs = w .&. 0x8888
          !wbbs = w .&. 0x7777
          !pc = parallelCompare 0x8888 wbbs
          !ne = wtbs .|. pc
          !summary = sketch ne `unsafeShiftR` 1
          !cmpp2 = parallelCompare 0xFB98 (0x1111 * summary)
          !idxMsnibble = sumTopBits4 cmpp2
          !msnibble = ((w `unsafeShiftR` (fromIntegral idxMsnibble)) .&. 0xF) `unsafeShiftR` 1
          !cmpp2' = parallelCompare 0xFB98 (0x1111 * msnibble)
          !idxMsbit = sumTopBits cmpp2'

          sketch w = (w * 0x249) `unsafeShiftR` 12
          parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8888
          sumTopBits w = ((w `unsafeShiftR` 3) * 0x1111) `unsafeShiftR` 12
          sumTopBits4 w = ((w `unsafeShiftR` 3) * 0x4444) `unsafeShiftR` 12


Programmers in typed languages with higher order functions and algebraic data types are already comfortable with most of the basic constructions of set/type theory. In categorical terms, those programmers are familiar with finite products and coproducts and (monoidal/cartesian) closed structure. The main omissions are subset types (equalizers/pullbacks) and quotient types (coequalizers/pushouts) which would round out limits and colimits. Not having a good grasp on either of these constructions dramatically shrinks the world of mathematics that is understandable, but while subset types are fairly straightforward, quotient types are quite a bit less intuitive.

See this StackOverflow answer by me which provides a more software engineering perspective.

Subset Types

In my opinion, most programmers can more or less immediately understand the notion of a subset type at an intuitive level.
A subset type is just a type combined with a predicate on that type that specifies which values of the type we want. For example, we may have something like { n:Nat | n /= 0 } meaning the type of naturals not equal to #0#. We may use this in the type of the division function for the denominator. Consuming a value of a subset type is easy, a natural not equal to #0# is still just a natural, and we can treat it as such. The difficult part is producing a value of a subset type. To do this, we must, of course, produce a value of the underlying type — Nat in our example — but then we must further convince the type checker that the predicate holds (e.g. that the value does not equal #0#). Most languages provide no mechanism to prove potentially arbitrary facts about code, and this is why they do not support subset types. Dependently typed languages do provide such mechanisms and thus either have or can encode subset types. Outside of dependently typed languages the typical solution is to use an abstract data type and use a runtime check when values of that abstract data type are created.

Quotient Types

The dual of subset types are quotient types. My impression is that this construction is the most difficult basic construction for people to understand. Further, programmers aren’t much better off, because they have little to which to connect the idea. Before I give a definition, I want to provide the example with which most people are familiar: modular (or clock) arithmetic. A typical way this is first presented is as a system where the numbers “wrap-around”. For example, in arithmetic mod #3#, we count #0#, #1#, #2#, and then wrap back around to #0#. Programmers are well aware that it’s not necessary to guarantee that an input to addition, subtraction, or multiplication mod #3# is either #0#, #1#, or #2#. Instead, the operation can be done and the mod function can be applied at the end. This will give the same result as applying the mod function to each argument at the beginning. For example, #4+7 = 11# and #11 mod 3 = 2#, and #4 mod 3 = 1# and #7 mod 3 = 1# and #1+1 = 2 = 11 mod 3#.

For mathematicians, the type of integers mod #n# is represented by the quotient type #ZZ//n ZZ#. The idea is that the values of #ZZ // n ZZ# are integers except that we agree that any two integers #a# and #b# are treated as equal if #a - b = kn# for some integer #k#. For #ZZ // 3 ZZ#, #… -6 = -3 = 0 = 3 = 6 = …# and #… = -5 = -2 = 1 = 4 = 7 = …# and #… = -4 = -1 = 2 = 5 = 8 = …#.

Equivalence Relations

To start to formalize this, we need the notion of an equivalence relation. An equivalence relation is a binary relation #(~~)# which is reflexive (#x ~~ x# for all #x#), symmetric (if #x ~~ y# then #y ~~ x#), and transitive (if #x ~~ y# and #y ~~ z# then #x ~~ z#). We can check that “#a ~~ b# iff there exists an integer #k# such that #a-b = kn#” defines an equivalence relation on the integers for any given #n#. For reflexivity we have #a - a = 0n#. For symmetry we have if #a - b = kn# then #b - a = -kn#. Finally, for transitivity we have if #a - b = k_1 n# and #b - c = k_2 n# then #a - c = (k_1 + k_2)n# which we get by adding the preceding two equations.

Any relation can be extended to an equivalence relation. This is called the reflexive-, symmetric-, transitive-closure of the relation. For an arbitrary binary relation #R# we can define the equivalence relation #(~~_R)# via "#a ~~_R b# iff #a = b# or #R(a, b)# or #b ~~_R a# or #a ~~_R c and c ~~_R b# for some #c#". To be precise, #~~_R# is the smallest relation satisfying those constraints. In Datalog syntax, this looks like:

eq_r(A, A).
eq_r(A, B) :- r(A, B).
eq_r(A, B) :- eq_r(B, A).
eq_r(A, B) :- eq_r(A, C), eq_r(C, B).

Quotient Types: the Type Theory view

If #T# is a type, and #(~~)# is an equivalence relation, we use #T // ~~# as the notation for the quotient type, which we read as “#T# quotiented by the equivalence relation #(~~)#”. We call #T# the underlying type of the quotient type. We then say #a = b# at type #T // ~~# iff #a ~~ b#. Dual to subset types, to produce a value of a quotient type is easy. Any value of the underlying type is a value of the quotient type. (In type theory, this produces the perhaps surprising result that #ZZ# is a subtype of #ZZ // n ZZ#.) As expected, consuming a value of a quotient type is more complicated. To explain this, we need to explain what a function #f : T // ~~ -> X# is for some type #X#. A function #f : T // ~~ -> X# is a function #g : T -> X# which satisfies #g(a) = g(b)# for all #a# and #b# such that #a ~~ b#. We call #f# (or #g#, they are often conflated) well-defined if #g# satisfies this condition. In other words, any well-defined function that consumes a quotient type isn’t allowed to produce an output that distinguishes between equivalent inputs. A better way to understand this is that quotient types allow us to change what the notion of equality is for a type. From this perspective, a function being well-defined just means that it is a function. Taking equal inputs to equal outputs is one of the defining characteristics of a function.

Sometimes we can finesse needing to check the side condition. Any function #h : T -> B# gives rise to an equivalence relation on #T# via #a ~~ b# iff #h(a) = h(b)#. In this case, any function #g : B -> X# gives rise to a function #f : T // ~~ -> X# via #f = g @ h#. In particular, when #B = T# we are guaranteed to have a suitable #g# for any function #f : T // ~~ -> X#. In this case, we can implement quotient types in a manner quite similar subset types, namely we make an abstract type and we normalize with the #h# function as we either produce or consume values of the abstract type. A common example of this is rational numbers. We can reduce a rational number to lowest terms either when it’s produced or when the numerator or denominator get accessed, so that we don’t accidentally write functions which distinguish between #1/2# and #2/4#. For modular arithmetic, the mod by #n# function is a suitable #h#.

Quotient Types: the Set Theory view

In set theory such an #h# function can always be made by mapping the elements of #T# to the equivalence classes that contain them, i.e. #a# gets mapped to #{b | a ~~ b}# which is called the equivalence class of #a#. In fact, in set theory, #T // ~~# is usually defined to be the set of equivalence classes of #(~~)#. So, for the example of #ZZ // 3 ZZ#, in set theory, it is a set of exactly three elements: the elements are #{ 3n+k | n in ZZ}# for #k = 0, 1, 2#. Equivalence classes are also called partitions and are said to partition the underlying set. Elements of these equivalence classes are called representatives of the equivalence class. Often a notation like #[a]# is used for the equivalence class of #a#.

More Examples

Here is a quick run-through of some significant applications of quotient types. I’ll give the underlying type and the equivalence relation and what the quotient type produces. I’ll leave it as an exercise to verify that the equivalence relations really are equivalence relations, i.e. reflexive, symmetric, and transitive. I’ll start with more basic examples. You should work through them to be sure you understand how they work.


Integers can be presented as pairs of naturals #(n, m)# with the idea being that the pair represents “#n - m#”. Of course, #1 - 2# should be the same as #2 - 3#. This is expressed as #(n_1, m_1) ~~ (n_2, m_2)# iff #n_1 + m_2 = n_2 + m_1#. Note how this definition only relies on operations on natural numbers. You can explore how to define addition, subtraction, multiplication, and other operations on this representation in a well-defined manner.


Rationals can be presented very similarly to integers, only with multiplication instead of addition. We also have pairs #(n, d)#, usually written #n/d#, in this case of an integer #n# and a non-zero natural #d#. The equivalence relation is #(n_1, d_1) ~~ (n_2, d_2)# iff #n_1 d_2 = n_2 d_1#.

(Topological) Circles

We can extend the integers mod #n# to the continuous case. Consider the real numbers with the equivalence relation #r ~~ s# iff #r - s = k# for some integer #k#. You could call this the reals mod #1#. Topologically, this is a circle. If you walk along it far enough, you end up back at a point equivalent to where you started. Occasionally this is written as #RR//ZZ#.


Doing the previous example in 2D gives a torus. Specifically, we have pairs of real numbers and the equivalence relation #(x_1, y_1) ~~ (x_2, y_2)# iff #x_1 - x_2 = k# and #y_1 - y_2 = l# for some integers #k# and #l#. Quite a bit of topology relies on similar constructions as will be expanded upon on the section on gluing.

Unordered pairs

Here’s an example that’s a bit closer to programming. Consider the following equivalence relation on arbitrary pairs: #(a_1, b_1) ~~ (a_2, b_2)# iff #a_1 = a_2 and b_1 = b_2# or #a_1 = b_2 and b_1 = a_2#. This just says that a pair is equivalent to either itself, or a swapped version of itself. It’s interesting to consider what a well-defined function is on this type.1

Gluing / Pushouts

Returning to topology and doing a bit more involved construction, we arrive at gluing or pushouts. In topology, we often want to take two topological spaces and glue them together in some specified way. For example, we may want to take two discs and glue their boundaries together. This gives a sphere. We can combine two spaces into one with the disjoint sum (or coproduct, i.e. Haskell’s Either type.) This produces a space that contains both the input spaces, but they don’t interact in any way. You can visualize them as sitting next to each other but not touching. We now want to say that certain pairs of points, one from each of the spaces, are really the same point. That is, we want to quotient by an equivalence relation that would identify those points. We need some mechanism to specify which points we want to identify. One way to accomplish this is to have a pair of functions, #f : C -> A# and #g : C -> B#, where #A# and #B# are the spaces we want to glue together. We can then define a relation #R# on the disjoint sum via #R(a, b)# iff there’s a #c : C# such that #a = tt "inl"(f(c)) and b = tt "inr"(g(c))#. This is not an equivalence relation, but we can extend it to one. The quotient we get is then the gluing of #A# and #B# specified by #C# (or really by #f# and #g#). For our example of two discs, #f# and #g# are the same function, namely the inclusion of the boundary of the disc into the disc. We can also glue a space to itself. Just drop the disjoint sum part. Indeed, the circle and torus are examples.

Polynomial ring ideals

We write #RR[X]# for the type of polynomials with one indeterminate #X# with real coefficients. For two indeterminates, we write #RR[X, Y]#. Values of these types are just polynomials such as #X^2 + 1# or #X^2 + Y^2#. We can consider quotienting these types by equivalence relations generated from identifications like #X^2 + 1 ~~ 0# or #X^2 - Y ~~ 0#, but we want more than just the reflexive-, symmetric-, transitive-closure. We want this equivalence relation to also respect the operations we have on polynomials, in particular, addition and multiplication. More precisely, we want if #a ~~ b# and #c ~~ d# then #ac ~~ bd# and similarly for addition. An equivalence relation that respects all operations is called a congruence. The standard notation for the quotient of #RR[X, Y]# by a congruence generated by both of the previous identifications is #RR[X, Y]//(X^2 + 1, X^2 - Y)#. Now if #X^2 + 1 = 0# in #RR[X, Y]//(X^2 + 1, X^2 - Y)#, then for any polynomial #P(X, Y)#, we have #P(X, Y)(X^2 + 1) = 0# because #0# times anything is #0#. Similarly, for any polynomial #Q(X, Y)#, #Q(X, Y)(X^2 - Y) = 0#. Of course, #0 + 0 = 0#, so it must be the case that #P(X, Y)(X^2 + 1) + Q(X, Y)(X^2 - Y) = 0# for all polynomials #P# and #Q#. In fact, we can show that all elements in the equivalence class of #0# are of this form. You’ve now motivated the concrete definition of a ring ideal and given it’s significance. An ideal is an equivalence class of #0# with respect to some congruence. Let’s work out what #RR[X, Y]//(X^2 + 1, X^2 - Y)# looks like concretely. First, since #X^2 - Y = 0#, we have #Y = X^2# and so we see that values of #RR[X, Y]//(X^2 + 1, X^2 - Y)# will be polynomials in only one indeterminate because we can replace all #Y#s with #X^2#s. Since #X^2 = -1#, we can see that all those polynomials will be linear (i.e. of degree 1) because we can just keep replacing #X^2#s with #-1#s, i.e. #X^(n+2) = X^n X^2 = -X^n#. The end result is that an arbitrary polynomial in #RR[X, Y]//(X^2 + 1, X^2 - Y)# looks like #a + bX# for real numbers #a# and #b# and we have #X^2 = -1#. In other words, #RR[X, Y]//(X^2 + 1, X^2 - Y)# is isomorphic to the complex numbers, #CC#.

As a reasonably simple exercise, given a polynomial #P(X) : RR[X]#, what does it get mapped to when embedded into #RR[X]//(X - 3)#, i.e. what is #[P(X)] : RR[X]//(X - 3)#?2

Free algebras modulo an equational theory

Moving much closer to programming, we have a rather broad and important example that a mathematician might describe as free algebras modulo an equational theory. This example covers several of the preceding examples. In programmer-speak, a free algebra is just a type of abstract syntax trees for some language. We’ll call a specific abstract syntax tree a term. An equational theory is just a collection of pairs of terms with the idea being that we’d like these terms to be considered equal. To be a bit more precise, we will actually allow terms to contain (meta)variables. An example equation for an expression language might be Add(#x#,#x#) = Mul(2,#x#). We call a term with no variables a ground term. We say a ground term matches another term if there is a consistent substitution for the variables that makes the latter term syntactically equal to the ground term. E.g. Add(3, 3) matches Add(#x#,#x#) via the substitution #x |->#3. Now, the equations of our equational theory gives rise to a relation on ground terms #R(t_1, t_2)# iff there exists an equation #l = r# such that #t_1# matches #l# and #t_2# matches #r#. This relation can be extended to an equivalence relation on ground terms, and we can then quotient by that equivalence relation.

Let’s consider a worked example. We can consider the theory of monoids. We have two operations (types of AST nodes): Mul(#x#,#y#) and 1. We have the following three equations: Mul(1,#x#) =#x#, Mul(#x#, 1) =#x#, and Mul(Mul(#x#,#y#),#z#) = Mul(#x#, Mul(#y#,#z#)). We additionally have a bunch of constants subject to no equations. In this case, it turns out we can define a normalization function, what I called #h# far above, and that the quotient type is isomorphic to lists of constants. Now, we can extend this theory to the theory of groups by adding a new operation, Inv(#x#), and new equations: Inv(Inv(#x#)) =#x#, Inv(Mul(#x#,#y#)) = Mul(Inv(#y#), Inv(#x#)), and Mul(Inv(#x#),#x#) = 1. If we ignore the last of these equations, you can show that we can normalize to a form that is isomorphic to a list of a disjoint sum of the constants, i.e. [Either Const Const] in Haskell if Const were the type of the constant terms. Quotienting this type by the equivalence relation extended with that final equality corresponds to adding the rule that a Left c cancels out Right c in the list whenever they are adjacent.

This overall example is a fairly profound one. Almost all of abstract algebra can be viewed as an instance of this or a closely related variation. When you hear about things defined in terms of “generators and relators”, it is an example of this sort. Indeed, those “relators” are used to define a relation that will be extended to an equivalence relation. Being defined in this way is arguably what it means for something to be “algebraic”.


The Introduction to Type Theory section of the NuPRL book provides a more comprehensive and somewhat more formal presentation of these and related concepts. While the quotient type view of quotients is conceptually different from the standard set theoretic presentation, it is much more amenable to computation as the #ZZ // n ZZ# example begins to illustrate.

  1. It’s a commutative function.↩︎

  2. It gets mapped to it’s value at #3#, i.e. #P(3)#.↩︎

I don’t believe classical logic is false; I just believe that it is not true.


Knockout is a nice JavaScript library for making values that automatically update when any of their “dependencies” update. Those dependencies can form an arbitrary directed acyclic graph. Many people seem to think of it as “yet another” templating library, but the core idea which is useful far beyond “templating” is the notion of observable values. One nice aspect is that it is a library and not a framework so you can use it as little or as much as you want and you can integrate it with other libraries and frameworks.

At any rate, this article is more geared toward those who have already decided on using Knockout or a library (in any language) offering similar capabilities. I strongly suspect the issues and solutions I’ll discuss apply to all similar sorts of libraries. While I’ll focus on one particular example, the ideas behind it apply generally. This example, admittedly, is one that almost anyone will implement, and in my experience will do it incorrectly the first time and won’t realize the problem until later.

The Problem

When doing any front-end work, before long there will be a requirement to support “multi-select” of something. Of course, you want the standard select/deselect all functionality and for it to work correctly, and of course you want to do something with the items you’ve selected. Here’s a very simple example:

Number selected:

Here, the number selected is an overly simple example of using the selected items. More realistically, the selected items will trigger other items to show up and/or trigger AJAX requests to update the data or populate other data. The HTML for this example is completely straightforward.

<div id="#badExample">
    Number selected: <span data-bind="text: $data.numberSelected()"></span>
        <tr><th><input type="checkbox" data-bind="checked: $data.allSelected"/></th><th>Item</th></tr>
        <!-- ko foreach: { data: $data.items(), as: '$item' } -->
        <tr><td><input type="checkbox" data-bind="checked: $data.selected"/></td><td data-bind="text: 'Item number: '+$data.body"></td></tr>
        <!-- /ko -->
        <tr><td><button data-bind="click: function() { $data.add(); }">Add</button></td></tr>

The way nearly everyone (including me) first thinks to implement this is by adding a selected observable to each item and then having allSelected depend on all of the selected observables. Since we also want to write to allSelected to change the state of the selected observables we use a writable computed observable. This computed observable will loop through all the items and check to see if they are all set to determine it’s state. When it is updated, it will loop through all the selected observables and set them to the appropriate state. Here’s the full code listing.

var badViewModel = {
    counter: 0,
    items: ko.observableArray()

badViewModel.allSelected = ko.computed({
    read: function() {
        var items = badViewModel.items();
        var allSelected = true;
        for(var i = 0; i < items.length; i++) { // Need to make sure we depend on each item, so don't break out of loop early
            allSelected = allSelected && items[i].selected();
        return allSelected;
    write: function(newValue) {
        var items = badViewModel.items();
        for(var i = 0; i < items.length; i++) {

badViewModel.numberSelected = ko.computed(function() {
    var count = 0;
    var items = badViewModel.items();
    for(var i = 0; i < items.length; i++) {
        if(items[i].selected()) count++;
    return count;

badViewModel.add = function() {
        body: badViewModel.counter++,
        selected: ko.observable(false)

ko.applyBindings(badViewModel, document.getElementById('#badExample'));

This should be relatively straightforward, and it works, so what’s the problem? The problem can be seen in numberSelected (and it also comes up with allSelected which I’ll get to momentarily). numberSelected depends on each selected observable and so it will be fired each time each one updates. That means if you have 100 items, and you use the select all checkbox, numberSelected will be called 100 times. For this example, that doesn’t really matter. For a more realistic example than numberSelected, this may mean rendering one, then two, then three, … then 100 HTML fragments or making 100 AJAX requests. In fact, this same behavior is present in allSelected. When it is written, as it’s writing to the selected observables, it is also triggering itself.

So the problem is updating allSelected or numberSelected can’t be done all at once, or to use database terminology, it can’t be updated atomically. One possible solution in newer versions of Knockout is to use deferredUpdates or, what I did back in the much earlier versions of Knockout, abuse the rate limiting features. The problem with this solution is that it makes updates asynchronous. If you’ve written your code to not care whether it was called synchronously or asynchronously, then this will work fine. If you haven’t, doing this throws you into a world of shared state concurrency and race conditions. In this case, this solution is far worse than the disease.

The Solution

So, what’s the alternative? We want to update all selected items atomically; we can atomically update a single observable; so we’ll put all selected items into a single observable. Now an item determines if it is selected by checking whether it is in the collection of selected items. More abstractly, we make our observables more coarse-grained, and we have a bunch of small computed observables depend on a large observable instead of a large computed observable depending on a bunch of small observables as we had in the previous code. Here’s an example using the exact same HTML and presenting the same overt behavior.

Number selected:

And here’s the code behind this second example:

var goodViewModel = {
    counter: 0,
    selectedItems: ko.observableArray(),
    items: ko.observableArray()

goodViewModel.allSelected = ko.computed({
    read: function() {
        return goodViewModel.items().length === goodViewModel.selectedItems().length;
    write: function(newValue) {
        if(newValue) {
            goodViewModel.selectedItems(goodViewModel.items().slice(0)); // Need a copy!
        } else {

goodViewModel.numberSelected = ko.computed(function() {
    return goodViewModel.selectedItems().length;

goodViewModel.add = function() {
    var item = { body: goodViewModel.counter++ }
    item.selected = ko.computed({
        read: function() {
            return goodViewModel.selectedItems.indexOf(item) > -1;
        write: function(newValue) {
            if(newValue) {
            } else {

ko.applyBindings(goodViewModel, document.getElementById('#goodExample'));

One thing to note is that setting allSelected and numberSelected are now both simple operations. A write to an observable triggers a constant number of writes to other observables. In fact, there are only two (non-computed) observables. On the other hand, reading the selected observable is more expensive. Toggling all items has quadratic complexity. In fact, it had quadratic complexity before due to the feedback. However, unlike the previous code, this also has quadratic complexity when any individual item is toggled. Unlike the previous code, though, this is simply due to a poor choice of data structure. Equipping each item with an “ID” field and using an object as a hash map would reduce the complexity to linear. In practice, for this sort of scenario, it tends not to make a big difference. Also, Knockout won’t trigger dependents if the value doesn’t change, so there’s no risk of the extra work propagating into still more extra work. Nevertheless, while I endorse this solution for this particular problem, in general making finer grained observables can help limit the scope of changes so unnecessary work isn’t done.

Still, the real concern and benefit of this latter approach isn’t the asymptotic complexity of the operations, but the atomicity of the operations. In the second solution, every update is atomic. There are no intermediate states on the way to a final state. This means that dependents, represented by numberSelected but which are realistically much more complicated, don’t get triggered excessively and don’t need to “compensate” for unintended intermediate values.


We could take the coarse-graining to its logical conclusion and have the view model for an application be a single observable holding an object representing the entire view model (and containing no observables of its own). Taking this approach actually does have a lot of benefits, albeit there is little reason to use Knockout at that point. Instead this starts to lead to things like Facebook’s Flux pattern and the pattern perhaps most clearly articulated by Cycle JS.