Learning for the fun of it

When sets are first introduced to students, usually examples are used with finite, explicitly presented sets. For example, #{1,2,3} uu {2,3,4} = {1,2,3,4}#. This is the beginning of the idea that a set is a “collection” of things. Later, when infinite sets are introduced, the idea that sets are “collections” of things is still commonly used as the intuitive basis for the definitions. While I personally find this a bit of a philosophical bait-and-switch, my main issue with it is that I don’t think it does a good job reflecting how we work with sets day-to-day nor for more in-depth set-theoretic investigations. Instead, I recommend thinking about infinite sets as defined by properties and #x in X# for some infinite set #X# means checking whether it satisfies the property defining #X#, *not* rummaging through the infinite elements that make up #X# and seeing if #x# is amongst them. This perspective closely reflects how we prove things about infinite sets. It makes it much clearer that the job is to find logical relationships between the properties that define sets.

Of course, this view can also be applied to finite sets and *should* be applied to them. For a constructivist, the notion of “finite set” splits into multiple inequivalent notions, and it is quite easy to show that there is a subset of #{0,1}# which is not “finite” with respect to strong notions of “finite” that are commonly used by constructivists. Today, though, I’ll stick with classical logic and classical set theory. In particular, I’m going to talk about the Internal Set Theory of Edward Nelson, or mostly the small fragment he used in Radically Elementary Probability Theory. In the first chapter of an unfinished book on Internal Set Theory, he states the following:

Perhaps it is fair to say that “finite” does not mean what we have always thought it to mean. What have we always thought it to mean? I used to think that I knew what I had always thought it to mean, but I no longer think so.

While it may be a bit strong to say that Internal Set Theory leads to some question about what “finite” means, I think it makes a good case for questioning what “finite set” means. These concerns are similar to the relativity of the notion of “(un)countable set”.

Over the years I’ve seen a lot of confusion about formal logic online. Usually this is from students (but I’ve also seen it with experienced mathematicians) on platforms like the Math or CS StackExchange. Now there is clearly some selection bias where the people asking the questions are the people who are confused, but while the questions *about* the confusions are common, the confusions are often evident even in questions about other aspects, the confusions are *in the (upvoted!) answers*, and when you look at the study material they are using it is often completely unsurprising that they are confused. Again, these confusions are also not limited to just “students”, though I’ll use that word in a broad sense below. Finally, on at least one of the points below, the confusion seems to be entirely on the instructors’ side.

To give an indication of the problem, here is my strong impression of what would happen if I gave a student who had just passed a full year introduction to formal logic the following exercise: “Give me a formal proof of #(neg P => neg Q) => (Q => P)#”. I suspect most would draw up a truth table. When I responded, “that’s not a formal proof,” they would be confused. If you are confused by that response, then this is definitely for you. I’m sure they would completely agree with me that if I asked for the inverse matrix of some matrix #M#, showing that the determinant of #M# is non-zero does not constitute an answer to that question. The parallelism of these scenarios would likely be lost on them though. While I think courses that focus on a syntactic approach to logic will produce students that are much more likely to give an appropriate answer to my exercise, that doesn’t mean they lack the confusions, just that this exercise isn’t a good discriminator for them. For example, if they don’t see the parallelism of the two scenarios I described, or worse, have no idea what truth tables have to do with anything, then they have some gap.

As a disclaimer, I am not an educator nor even an academic in any professional capacity.

As a summary, the points I will touch on are:

- Not
*continually*emphasizing the distinction between syntax and semantics. - Unnecessary philosophizing.
- The complete absence of non-classical logics.
- Silly linguistics exercises.
- The complete conflation of negation introduction with double negation elimination.
- Overuse of indirect proof.
- Some smaller issues that are not so bad and some missed opportunities.

If anyone reading this is aware of *introductory* textbooks or other resources (ideally freely available, but I’ll take a reasonably priced textbook too) that avoid most or all of the major issues I list and otherwise do a good job, I would be very interested. My own education on these topics has been piecemeal and eclectic and also strongly dependent on my background in programming. This leaves me with no reasonable recommendations. Please leave a comment or email me, if you have such a recommendation.

I couldn’t find an online version of Djinn, so I ran it through GHCJS with some tweaks, added a basic interface and hosted it here. It runs in your browser, so go crazy. If you want changes to the default settings or environment, feel free to suggest them. Right now everything is the default settings of the Djinn tool except that multiple results is enabled.

This will be a *very* non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be *****SPOILER ALERT***** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This *is* category theory. It is simply presented in a different manner.

Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.

I’ve been watching the Spring 2012 lectures for MIT 6.851 Advanced Data Structures with Prof. Erik Demaine. In lecture 12, “Fusion Trees”, it mentions a constant time algorithm for finding the index of the first most significant 1 bit in a word, i.e. the binary logarithm. Assuming word operations are constant time, i.e. in the Word RAM model, the below algorithm takes 27 word operations (not counting copying). When I compiled it with GHC 8.0.1 -O2 the core of the algorithm was 44 straight-line instructions. The theoretically interesting thing is, other than changing the constants, the same algorithm works for any word size that’s an even power of 2. Odd powers of two need a slight tweak. This is demonstrated for `Word64`

, `Word32`

, and `Word16`

. It should be possible to do this for any arbitrary word size `w`

.

The `clz`

instruction can be used to implement this function, but this is a potential simulation if that or a similar instruction wasn’t available. It’s probably not the fastest way. Similarly, find first set and count trailing zeros can be implemented in terms of this operation.

Below is the complete code. You can also download it here.

```
{-# LANGUAGE BangPatterns #-}
import Data.Word
import Data.Bits
-- Returns 0-based bit index of most significant bit that is 1. Assumes input is non-zero.
-- That is, 2^indexOfMostSignificant1 x <= x < 2^(indexOfMostSignificant1 x + 1)
-- From Erik Demaine's presentation in Spring 2012 lectures of MIT 6.851, particularly "Lecture 12: Fusion Trees".
-- Takes 26 (source-level) straight-line word operations.
indexOfMostSignificant1 :: Word64 -> Word64
indexOfMostSignificant1 w = idxMsbyte .|. idxMsbit
where
-- top bits of each byte
!wtbs = w .&. 0x8080808080808080
-- all but top bits of each byte producing 8 7-bit chunks
!wbbs = w .&. 0x7F7F7F7F7F7F7F7F
-- parallel compare of each 7-bit chunk to 0, top bit set in result if 7-bit chunk was not 0
!pc = parallelCompare 0x8080808080808080 wbbs
-- top bit of each byte set if the byte has any bits set in w
!ne = wtbs .|. pc
-- a summary of which bytes (except the first) are non-zero as a 7-bit bitfield, i.e. top bits collected into bottom byte
!summary = sketch ne `unsafeShiftR` 1
-- parallel compare summary to powers of two
!cmpp2 = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * summary)
-- index of most significant non-zero byte * 8
!idxMsbyte = sumTopBits8 cmpp2
-- most significant 7-bits of most significant non-zero byte
!msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
-- parallel compare msbyte to powers of two
!cmpp2' = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * msbyte)
-- index of most significant non-zero bit in msbyte
!idxMsbit = sumTopBits cmpp2'
-- Maps top bits of each byte into lower byte assuming all other bits are 0.
-- 0x2040810204081 = sum [2^j | j <- map (\i -> 49 - 7*i) [0..7]]
-- In general if w = 2^(2*k+p) and p = 0 or 1 the formula is:
-- sum [2^j | j <- map (\i -> w-(2^k-1) - 2^(k+p) - (2^(k+p) - 1)*i) [0..2^k-1]]
-- Followed by shifting right by w - 2^k
sketch w = (w * 0x2040810204081) `unsafeShiftR` 56
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8080808080808080
sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56
sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x0808080808080808) `unsafeShiftR` 56
indexOfMostSignificant1_w32 :: Word32 -> Word32
indexOfMostSignificant1_w32 w = idxMsbyte .|. idxMsbit
where !wtbs = w .&. 0x80808080
!wbbs = w .&. 0x7F7F7F7F
!pc = parallelCompare 0x80808080 wbbs
!ne = wtbs .|. pc
!summary = sketch ne `unsafeShiftR` 1
!cmpp2 = parallelCompare 0xFF838180 (0x01010101 * summary)
!idxMsbyte = sumTopBits8 cmpp2
!msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
!cmpp2' = parallelCompare 0x87838180 (0x01010101 * msbyte)
-- extra step when w is not an even power of two
!cmpp2'' = parallelCompare 0xFFBF9F8F (0x01010101 * msbyte)
!idxMsbit = sumTopBits cmpp2' + sumTopBits cmpp2''
sketch w = (w * 0x204081) `unsafeShiftR` 28
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x80808080
sumTopBits w = ((w `unsafeShiftR` 7) * 0x01010101) `unsafeShiftR` 24
sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x08080808) `unsafeShiftR` 24
indexOfMostSignificant1_w16 :: Word16 -> Word16
indexOfMostSignificant1_w16 w = idxMsnibble .|. idxMsbit
where !wtbs = w .&. 0x8888
!wbbs = w .&. 0x7777
!pc = parallelCompare 0x8888 wbbs
!ne = wtbs .|. pc
!summary = sketch ne `unsafeShiftR` 1
!cmpp2 = parallelCompare 0xFB98 (0x1111 * summary)
!idxMsnibble = sumTopBits4 cmpp2
!msnibble = ((w `unsafeShiftR` (fromIntegral idxMsnibble)) .&. 0xF) `unsafeShiftR` 1
!cmpp2' = parallelCompare 0xFB98 (0x1111 * msnibble)
!idxMsbit = sumTopBits cmpp2'
sketch w = (w * 0x249) `unsafeShiftR` 12
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8888
sumTopBits w = ((w `unsafeShiftR` 3) * 0x1111) `unsafeShiftR` 12
sumTopBits4 w = ((w `unsafeShiftR` 3) * 0x4444) `unsafeShiftR` 12
```

Programmers in typed languages with higher order functions and algebraic data types are already comfortable with most of the basic constructions of set/type theory. In categorical terms, those programmers are familiar with finite products and coproducts and (monoidal/cartesian) closed structure. The main omissions are subset types (equalizers/pullbacks) and quotient types (coequalizers/pushouts) which would round out limits and colimits. Not having a good grasp on either of these constructions dramatically shrinks the world of mathematics that is understandable, but while subset types are fairly straightforward, quotient types are quite a bit less intuitive.

In my opinion, most programmers can more or less immediately understand the notion of a subset type at an intuitive level.

A **subset type** is just a type combined with a predicate on that type that specifies which values of the type we want. For example, we may have something like `{ n:Nat | n /= 0 }`

meaning the type of naturals not equal to #0#. We may use this in the type of the division function for the denominator. Consuming a value of a subset type is easy, a natural not equal to #0# is still just a natural, and we can treat it as such. The difficult part is producing a value of a subset type. To do this, we must, of course, produce a value of the underlying type — `Nat`

in our example — but then we must further convince the type checker that the predicate holds (e.g. that the value does not equal #0#). Most languages provide no mechanism to prove potentially arbitrary facts about code, and this is why they do not support subset types. Dependently typed languages do provide such mechanisms and thus either have or can encode subset types. Outside of dependently typed languages the typical solution is to use an abstract data type and use a runtime check when values of that abstract data type are created.

The dual of subset types are quotient types. My impression is that this construction is the most difficult basic construction for people to understand. Further, programmers aren’t much better off, because they have little to which to connect the idea. Before I give a definition, I want to provide the example with which most people are familiar: modular (or clock) arithmetic. A typical way this is first presented is as a system where the numbers “wrap-around”. For example, in arithmetic mod #3#, we count #0#, #1#, #2#, and then wrap back around to #0#. Programmers are well aware that it’s not necessary to guarantee that an input to addition, subtraction, or multiplication mod #3# is either #0#, #1#, or #2#. Instead, the operation can be done and the `mod`

function can be applied at the end. This will give the same result as applying the `mod`

function to each argument at the beginning. For example, #4+7 = 11# and #11 mod 3 = 2#, and #4 mod 3 = 1# and #7 mod 3 = 1# and #1+1 = 2 = 11 mod 3#.

For mathematicians, the type of integers mod #n# is represented by the quotient type #ZZ//n ZZ#. The idea is that the values of #ZZ // n ZZ# are integers except that we agree that any two integers #a# and #b# are treated as equal if #a - b = kn# for some integer #k#. For #ZZ // 3 ZZ#, #… -6 = -3 = 0 = 3 = 6 = …# and #… = -5 = -2 = 1 = 4 = 7 = …# and #… = -4 = -1 = 2 = 5 = 8 = …#.

To start to formalize this, we need the notion of an equivalence relation. An **equivalence relation** is a binary relation `#(~~)#`

which is **reflexive** (#x ~~ x# for all #x#), **symmetric** (if `#x ~~ y#`

then `#y ~~ x#`

), and **transitive** (if `#x ~~ y#`

and `#y ~~ z#`

then `#x ~~ z#`

). We can check that “#a ~~ b# iff there exists an integer #k# such that #a-b = kn#” defines an equivalence relation on the integers for any given #n#. For reflexivity we have #a - a = 0n#. For symmetry we have if #a - b = kn# then #b - a = -kn#. Finally, for transitivity we have if #a - b = k_1 n# and #b - c = k_2 n# then #a - c = (k_1 + k_2)n# which we get by adding the preceding two equations.

Any relation can be extended to an equivalence relation. This is called the reflexive-, symmetric-, transitive-closure of the relation. For an arbitrary binary relation #R# we can define the equivalence relation #(~~_R)# via “#a ~~_R b# iff #a = b# or #R(a, b)# or #b ~~_R a# or #a ~~_R c and c ~~_R b# for some #c#“. To be precise, #~~_R# is the smallest relation satisfying those constraints. In Datalog syntax, this looks like:

```
eq_r(A, A).
eq_r(A, B) :- r(A, B).
eq_r(A, B) :- eq_r(B, A).
eq_r(A, B) :- eq_r(A, C), eq_r(C, B).
```

If #T# is a type, and `#(~~)#`

is an equivalence relation, we use #T // ~~# as the notation for the **quotient type**, which we read as “#T# quotiented by the equivalence relation `#(~~)#`

”. We call #T# the **underlying type** of the quotient type. We then say #a = b# at type #T // ~~# iff #a ~~ b#. Dual to subset types, to produce a value of a quotient type is easy. Any value of the underlying type is a value of the quotient type. (In type theory, this produces the perhaps surprising result that #ZZ# is a *subtype* of #ZZ // n ZZ#.) As expected, consuming a value of a quotient type is more complicated. To explain this, we need to explain what a function #f : T // ~~ -> X# is for some type #X#. A function #f : T // ~~ -> X# is a function #g : T -> X# which satisfies #g(a) = g(b)# for all #a# and #b# such that #a ~~ b#. We call #f# (or #g#, they are often conflated) **well-defined** if #g# satisfies this condition. In other words, any well-defined function that consumes a quotient type isn’t allowed to produce an output that distinguishes between equivalent inputs. A better way to understand this is that quotient types allow us to change what the notion of equality is for a type. From this perspective, a function being well-defined just means that it is a function. Taking equal inputs to equal outputs is one of the defining characteristics of a function.

Sometimes we can finesse needing to check the side condition. Any function #h : T -> B# gives rise to an equivalence relation on #T# via #a ~~ b# iff #h(a) = h(b)#. In this case, any function #g : B -> X# gives rise to a function #f : T // ~~ -> X# via #f = g @ h#. In particular, when #B = T# we are guaranteed to have a suitable #g# for any function #f : T // ~~ -> X#. In this case, we can implement quotient types in a manner quite similar subset types, namely we make an abstract type and we normalize with the #h# function as we either produce or consume values of the abstract type. A common example of this is rational numbers. We can reduce a rational number to lowest terms either when it’s produced or when the numerator or denominator get accessed, so that we don’t accidentally write functions which distinguish between #1/2# and #2/4#. For modular arithmetic, the mod by #n# function is a suitable #h#.

In set theory such an #h# function can always be made by mapping the elements of #T# to the equivalence classes that contain them, i.e. #a# gets mapped to #{b | a ~~ b}# which is called the **equivalence class** of #a#. In fact, in set theory, #T // ~~# is usually defined to *be* the set of equivalence classes of `#(~~)#`

. So, for the example of #ZZ // 3 ZZ#, in set theory, it is a set of exactly three elements: the elements are #{ 3n+k | n in ZZ}# for #k = 0, 1, 2#. Equivalence classes are also called **partitions** and are said to partition the underlying set. Elements of these equivalence classes are called **representatives** of the equivalence class. Often a notation like #[a]# is used for the equivalence class of #a#.

Here is a quick run-through of some significant applications of quotient types. I’ll give the underlying type and the equivalence relation and what the quotient type produces. I’ll leave it as an exercise to verify that the equivalence relations really are equivalence relations, i.e. reflexive, symmetric, and transitive. I’ll start with more basic examples. You should work through them to be sure you understand how they work.

Integers can be presented as pairs of naturals #(n, m)# with the idea being that the pair represents “#n - m#”. Of course, #1 - 2# should be the same as #2 - 3#. This is expressed as #(n_1, m_1) ~~ (n_2, m_2)# iff #n_1 + m_2 = n_2 + m_1#. Note how this definition only relies on operations on natural numbers. You can explore how to define addition, subtraction, multiplication, and other operations on this representation in a well-defined manner.

Rationals can be presented very similarly to integers, only with multiplication instead of addition. We also have pairs #(n, d)#, usually written #n/d#, in this case of an integer #n# and a non-zero natural #d#. The equivalence relation is #(n_1, d_1) ~~ (n_2, d_2)# iff #n_1 d_2 = n_2 d_1#.

We can extend the integers mod #n# to the continuous case. Consider the real numbers with the equivalence relation #r ~~ s# iff #r - s = k# for some integer #k#. You could call this the reals mod #1#. Topologically, this is a circle. If you walk along it far enough, you end up back at a point equivalent to where you started. Occasionally this is written as #RR//ZZ#.

Doing the previous example in 2D gives a torus. Specifically, we have pairs of real numbers and the equivalence relation #(x_1, y_1) ~~ (x_2, y_2)# iff #x_1 - x_2 = k# and #y_1 - y_2 = l# for some integers #k# and #l#. Quite a bit of topology relies on similar constructions as will be expanded upon on the section on gluing.

Here’s an example that’s a bit closer to programming. Consider the following equivalence relation on arbitrary pairs: #(a_1, b_1) ~~ (a_2, b_2)# iff #a_1 = a_2 and b_1 = b_2# or #a_1 = b_2 and b_1 = a_2#. This just says that a pair is equivalent to either itself, or a swapped version of itself. It’s interesting to consider what a well-defined function is on this type.^{1}

Returning to topology and doing a bit more involved construction, we arrive at gluing or pushouts. In topology, we often want to take two topological spaces and glue them together in some specified way. For example, we may want to take two discs and glue their boundaries together. This gives a sphere. We can combine two spaces into one with the disjoint sum (or coproduct, i.e. Haskell’s `Either`

type.) This produces a space that contains both the input spaces, but they don’t interact in any way. You can visualize them as sitting next to each other but not touching. We now want to say that certain pairs of points, one from each of the spaces, are really the same point. That is, we want to quotient by an equivalence relation that would identify those points. We need some mechanism to specify which points we want to identify. One way to accomplish this is to have a pair of functions, #f : C -> A# and #g : C -> B#, where #A# and #B# are the spaces we want to glue together. We can then define a relation #R# on the disjoint sum via #R(a, b)# iff there’s a #c : C# such that `#a = tt "inl"(f(c)) and b = tt "inr"(g(c))#`

. This is not an equivalence relation, but we can extend it to one. The quotient we get is then the gluing of #A# and #B# specified by #C# (or really by #f# and #g#). For our example of two discs, #f# and #g# are the same function, namely the inclusion of the boundary of the disc into the disc. We can also glue a space to itself. Just drop the disjoint sum part. Indeed, the circle and torus are examples.

We write #RR[X]# for the type of polynomials with one indeterminate #X# with real coefficients. For two indeterminates, we write #RR[X, Y]#. Values of these types are just polynomials such as #X^2 + 1# or #X^2 + Y^2#. We can consider quotienting these types by equivalence relations generated from identifications like #X^2 + 1 ~~ 0# or #X^2 - Y ~~ 0#, but we want more than just the reflexive-, symmetric-, transitive-closure. We want this equivalence relation to also respect the operations we have on polynomials, in particular, addition and multiplication. More precisely, we want if #a ~~ b# and #c ~~ d# then #ac ~~ bd# and similarly for addition. An equivalence relation that respects all operations is called a **congruence**. The standard notation for the quotient of #RR[X, Y]# by a congruence generated by both of the previous identifications is #RR[X, Y]//(X^2 + 1, X^2 - Y)#. Now if #X^2 + 1 = 0# in #RR[X, Y]//(X^2 + 1, X^2 - Y)#, then for *any* polynomial #P(X, Y)#, we have #P(X, Y)(X^2 + 1) = 0# because #0# times anything is #0#. Similarly, for any polynomial #Q(X, Y)#, #Q(X, Y)(X^2 - Y) = 0#. Of course, #0 + 0 = 0#, so it must be the case that #P(X, Y)(X^2 + 1) + Q(X, Y)(X^2 - Y) = 0# for all polynomials #P# and #Q#. In fact, we can show that all elements in the equivalence class of #0# are of this form. You’ve now motivated the concrete definition of a ring ideal and given it’s significance. An **ideal** is an equivalence class of #0# with respect to some congruence. Let’s work out what #RR[X, Y]//(X^2 + 1, X^2 - Y)# looks like concretely. First, since #X^2 - Y = 0#, we have #Y = X^2# and so we see that values of #RR[X, Y]//(X^2 + 1, X^2 - Y)# will be polynomials in only one indeterminate because we can replace all #Y#s with #X^2#s. Since #X^2 = -1#, we can see that all those polynomials will be linear (i.e. of degree 1) because we can just keep replacing #X^2#s with #-1#s, i.e. #X^(n+2) = X^n X^2 = -X^n#. The end result is that an arbitrary polynomial in #RR[X, Y]//(X^2 + 1, X^2 - Y)# looks like #a + bX# for real numbers #a# and #b# and we have #X^2 = -1#. In other words, #RR[X, Y]//(X^2 + 1, X^2 - Y)# is isomorphic to the complex numbers, #CC#.

As a reasonably simple exercise, given a polynomial #P(X) : RR[X]#, what does it get mapped to when embedded into #RR[X]//(X - 3)#, i.e. what is #[P(X)] : RR[X]//(X - 3)#?^{2}

Moving much closer to programming, we have a rather broad and important example that a mathematician might describe as free algebras modulo an equational theory. This example covers several of the preceding examples. In programmer-speak, a free algebra is just a type of abstract syntax trees for some language. We’ll call a specific abstract syntax tree a **term**. An equational theory is just a collection of pairs of terms with the idea being that we’d like these terms to be considered equal. To be a bit more precise, we will actually allow terms to contain (meta)variables. An example equation for an expression language might be `Add(`

#x#`,`

#x#`) = Mul(2,`

#x#`)`

. We call a term with no variables a **ground term**. We say a ground term **matches** another term if there is a consistent substitution for the variables that makes the latter term syntactically equal to the ground term. E.g. `Add(3, 3)`

matches `Add(`

#x#`,`

#x#`)`

via the substitution #x |->#`3`

. Now, the equations of our equational theory gives rise to a relation on ground terms #R(t_1, t_2)# iff there exists an equation #l = r# such that #t_1# matches #l# and #t_2# matches #r#. This relation can be extended to an equivalence relation on ground terms, and we can then quotient by that equivalence relation.

Let’s consider a worked example. We can consider the theory of monoids. We have two operations (types of AST nodes): `Mul(`

#x#`,`

#y#`)`

and `1`

. We have the following three equations: `Mul(1,`

#x#`) =`

#x#, `Mul(`

#x#`, 1) =`

#x#, and `Mul(Mul(`

#x#`,`

#y#`),`

#z#`) = Mul(`

#x#`, Mul(`

#y#`,`

#z#`))`

. We additionally have a bunch of constants subject to no equations. In this case, it turns out we can define a normalization function, what I called #h# far above, and that the quotient type is isomorphic to lists of constants. Now, we can extend this theory to the theory of groups by adding a new operation, `Inv(`

#x#`)`

, and new equations: `Inv(Inv(`

#x#`)) =`

#x#, `Inv(Mul(`

#x#`,`

#y#`)) = Mul(Inv(`

#y#`), Inv(`

#x#`))`

, and `Mul(Inv(`

#x#`),`

#x#`) = 1`

. If we ignore the last of these equations, you can show that we can normalize to a form that is isomorphic to a list of a disjoint sum of the constants, i.e. `[Either Const Const]`

in Haskell if `Const`

were the type of the constant terms. Quotienting this type by the equivalence relation extended with that final equality corresponds to adding the rule that a `Left c`

cancels out `Right c`

in the list whenever they are adjacent.

This overall example is a fairly profound one. Almost all of abstract algebra can be viewed as an instance of this or a closely related variation. When you hear about things defined in terms of “generators and relators”, it is an example of this sort. Indeed, those “relators” are used to define a relation that will be extended to an equivalence relation. Being defined in this way is arguably what it *means* for something to be “algebraic”.

The Introduction to Type Theory section of the NuPRL book provides a more comprehensive and somewhat more formal presentation of these and related concepts. While the quotient *type* view of quotients is conceptually different from the standard set theoretic presentation, it is much more amenable to computation as the #ZZ // n ZZ# example begins to illustrate.

I don’t believe classical logic is false; I just believe that it is not true.

Knockout is a nice JavaScript library for making values that automatically update when any of their “dependencies” update. Those dependencies can form an arbitrary directed acyclic graph. Many people seem to think of it as “yet another” templating library, but the core idea which is useful far beyond “templating” is the notion of observable values. One nice aspect is that it is a library and not a framework so you can use it as little or as much as you want and you can integrate it with other libraries and frameworks.

At any rate, this article is more geared toward those who have already decided on using Knockout or a library (in any language) offering similar capabilities. I strongly suspect the issues and solutions I’ll discuss apply to all similar sorts of libraries. While I’ll focus on one particular example, the ideas behind it apply generally. This example, admittedly, is one that almost anyone will implement, and in my experience will do it incorrectly the first time and won’t realize the problem until later.

When doing any front-end work, before long there will be a requirement to support “multi-select” of something. Of course, you want the standard select/deselect all functionality and for it to work correctly, and of course you want to do something with the items you’ve selected. Here’s a very simple example:

Number selected:

Item | |
---|---|

Here, the number selected is an overly simple example of using the selected items. More realistically, the selected items will trigger other items to show up and/or trigger AJAX requests to update the data or populate other data. The HTML for this example is completely straightforward.

```
<div id="#badExample">
Number selected: <span data-bind="text: $data.numberSelected()"></span>
<table>
<tr><th><input type="checkbox" data-bind="checked: $data.allSelected"/></th><th>Item</th></tr>
<!-- ko foreach: { data: $data.items(), as: '$item' } -->
<tr><td><input type="checkbox" data-bind="checked: $data.selected"/></td><td data-bind="text: 'Item number: '+$data.body"></td></tr>
<!-- /ko -->
<tr><td><button data-bind="click: function() { $data.add(); }">Add</button></td></tr>
</table>
</div>
```

The way nearly everyone (including me) first thinks to implement this is by adding a `selected`

observable to each item and then having `allSelected`

depend on all of the `selected`

observables. Since we also want to write to `allSelected`

to change the state of the `selected`

observables we use a writable computed observable. This computed observable will loop through all the items and check to see if they are all set to determine it’s state. When it is updated, it will loop through all the `selected`

observables and set them to the appropriate state. Here’s the full code listing.

```
var badViewModel = {
counter: 0,
items: ko.observableArray()
};
badViewModel.allSelected = ko.computed({
read: function() {
var items = badViewModel.items();
var allSelected = true;
for(var i = 0; i < items.length; i++) { // Need to make sure we depend on each item, so don't break out of loop early
allSelected = allSelected && items[i].selected();
}
return allSelected;
},
write: function(newValue) {
var items = badViewModel.items();
for(var i = 0; i < items.length; i++) {
items[i].selected(newValue);
}
}
});
badViewModel.numberSelected = ko.computed(function() {
var count = 0;
var items = badViewModel.items();
for(var i = 0; i < items.length; i++) {
if(items[i].selected()) count++;
}
return count;
});
badViewModel.add = function() {
badViewModel.items.push({
body: badViewModel.counter++,
selected: ko.observable(false)
});
};
ko.applyBindings(badViewModel, document.getElementById('#badExample'));
```

This should be relatively straightforward, and it works, so what’s the problem? The problem can be seen in `numberSelected`

(and it also comes up with `allSelected`

which I’ll get to momentarily). `numberSelected`

depends on *each* `selected`

observable and so it will be fired *each* time *each* one updates. That means if you have 100 items, and you use the select all checkbox, `numberSelected`

will be called 100 times. For this example, that doesn’t really matter. For a more realistic example than `numberSelected`

, this may mean rendering one, then two, then three, … then 100 HTML fragments or making 100 AJAX requests. In fact, this same behavior is present in `allSelected`

. When it is written, as it’s writing to the `selected`

observables, it is also triggering *itself*.

So the problem is updating `allSelected`

or `numberSelected`

can’t be done all at once, or to use database terminology, it can’t be updated atomically. One possible solution in newer versions of Knockout is to use `deferredUpdates`

or, what I did back in the much earlier versions of Knockout, abuse the rate limiting features. The problem with this solution is that it makes updates asynchronous. If you’ve written your code to not care whether it was called synchronously or asynchronously, then this will work fine. If you haven’t, doing this throws you into a world of shared state concurrency and race conditions. In this case, this solution is far worse than the disease.

So, what’s the alternative? We want to update all selected items atomically; we can atomically update a single observable; so we’ll put all selected items into a single observable. Now an item determines if it is selected by checking whether it is in the collection of selected items. More abstractly, we make our observables more coarse-grained, and we have a bunch of small computed observables depend on a large observable instead of a large computed observable depending on a bunch of small observables as we had in the previous code. Here’s an example using the exact same HTML and presenting the same overt behavior.

Number selected:

Item | |
---|---|

And here’s the code behind this second example:

```
var goodViewModel = {
counter: 0,
selectedItems: ko.observableArray(),
items: ko.observableArray()
};
goodViewModel.allSelected = ko.computed({
read: function() {
return goodViewModel.items().length === goodViewModel.selectedItems().length;
},
write: function(newValue) {
if(newValue) {
goodViewModel.selectedItems(goodViewModel.items().slice(0)); // Need a copy!
} else {
goodViewModel.selectedItems.removeAll();
}
}
});
goodViewModel.numberSelected = ko.computed(function() {
return goodViewModel.selectedItems().length;
});
goodViewModel.add = function() {
var item = { body: goodViewModel.counter++ }
item.selected = ko.computed({
read: function() {
return goodViewModel.selectedItems.indexOf(item) > -1;
},
write: function(newValue) {
if(newValue) {
goodViewModel.selectedItems.push(item);
} else {
goodViewModel.selectedItems.remove(item);
}
}
});
goodViewModel.items.push(item);
};
ko.applyBindings(goodViewModel, document.getElementById('#goodExample'));
```

One thing to note is that setting `allSelected`

and `numberSelected`

are now both simple operations. A write to an observable triggers a constant number of writes to other observables. In fact, there are only two (non-computed) observables. On the other hand, *reading* the `selected`

observable is more expensive. Toggling all items has quadratic complexity. In fact, it had quadratic complexity before due to the feedback. However, unlike the previous code, this *also* has quadratic complexity when any *individual* item is toggled. Unlike the previous code, though, this is simply due to a poor choice of data structure. Equipping each item with an “ID” field and using an object as a hash map would reduce the complexity to linear. In practice, for this sort of scenario, it tends not to make a big difference. Also, Knockout won’t trigger dependents if the value doesn’t change, so there’s no risk of the extra work propagating into still more extra work. Nevertheless, while I endorse this solution for this particular problem, in general making *finer* grained observables can help limit the scope of changes so unnecessary work isn’t done.

Still, the real concern and benefit of this latter approach isn’t the asymptotic complexity of the operations, but the *atomicity* of the operations. In the second solution, every update is atomic. There are no intermediate states on the way to a final state. This means that dependents, represented by `numberSelected`

but which are realistically much more complicated, don’t get triggered excessively and don’t need to “compensate” for unintended intermediate values.

We could take the coarse-graining to its logical conclusion and have the view model for an application be a single observable holding an object representing the entire view model (and containing no observables of its own). Taking this approach actually does have a lot of benefits, albeit there is little reason to use Knockout at that point. Instead this starts to lead to things like Facebook’s Flux pattern and the pattern perhaps most clearly articulated by Cycle JS.

For many people interested in type systems and type theory, their first encounter with the literature presents them with this:

`#frac(Gamma,x:tau_1 |--_Sigma e : tau_2)(Gamma |--_Sigma (lambda x:tau_1.e) : tau_1 -> tau_2) ->I#`

`#frac(Gamma |--_Sigma f : tau_1 -> tau_2 \qquad Gamma |--_Sigma x : tau_1)(Gamma |--_Sigma f x : tau_2) ->E#`

Since this notation is ubiquitous, authors (reasonably) expect readers to already be familiar with it and thus provide no explanation. Because the notation is ubiquitous, the beginner looking for alternate resources will not escape it. All they will find is that the notation is everywhere but exists in myriad minor variations which may or may not indicate significant differences. At this point the options are: 1) to muddle on and hope understanding the notation isn’t too important, 2) look for introductory resources which typically take the form of $50+ 500+ page textbooks, or 3) give up.

The goal of this article is to explain the notation part-by-part in common realizations, and to cover the main idea behind the notation which is the idea of an inductively defined relation. To eliminate ambiguity and make hand-waving impossible, I’ll ground the explanations in code, in particular, in Agda. That means for each example of the informal notation, there will be how it would be realized in Agda.^{1} It will become clear that I’m am not (just) using Agda as a formal notation to talk about these concepts, but that Agda’s^{2} data type mechanism directly captures them^{3}. The significance of this is that programmers are already familiar with many of the ideas behind the informal notation, and the notation is just obscuring this familiarity. Admittedly, Agda is itself pretty intimidating. I hope most of this article is accessible to those with familiarity with algebraic data types as they appear in Haskell, ML, Rust, or Swift with little to no need to look up details about Agda. Nevertheless, Agda at least has the benefit, when compared to the informal notation, of having a clear place to go to learn more, an unambiguous meaning, and tools that allow playing around with the ideas.

This is a simple mathematical thought experiment from Richard Hamming to demonstrate how poor our intuition for high dimensional spaces is. All that is needed is some basic, middle school level geometry and algebra.

Consider a 2x2 square centered at the origin. In each quadrant place circles as big as possible so that they fit in the square and don’t overlap. They’ll clearly have radius 1/2. See the image below. The question now is what’s the radius of the largest circle centered at the origin that doesn’t overlap the other circles.

It’s clear from symmetry that the inner circle is going to touch all the other circles at the same time, and it is clear that it is going to touch along the line from the origin to the center of one of the outer circles. So the radius of the inner circle, #r#, is just the distance from the origin to the center of one of the outer circles minus the radius of the outer circle, namely 1/2. As an equation:

`#r = sqrt(1/2^2 + 1/2^2) - 1/2 = sqrt(2)/2 - 1/2 ~~ 0.207106781#`

Now if we go to three dimensions we’ll have eight circles instead of four, but everything else is the same except the distances will now be `#sqrt(1/2^2 + 1/2^2 + 1/2^2)#`

. It’s clear that the only difference for varying dimensions is that in dimension #n# we’ll have #n# #1/2^2# terms under the square root sign. So the general solution is easily shown to be:

`#r = sqrt(n)/2 - 1/2#`

You should be weirded out now. If you aren’t, here’s a hint: what happens when #n = 10#? Here’s another hint: what happens as #n# approaches #oo#?

3blue1brown has a video describing this example and presenting one way of regaining intuition about it.