# Hedonistic Learning

Learning for the fun of it

I couldn’t find an online version of Djinn, so I ran it through GHCJS with some tweaks, added a basic interface and hosted it here. It runs in your browser, so go crazy. If you want changes to the default settings or environment, feel free to suggest them. Right now everything is the default settings of the Djinn tool except that multiple results is enabled.

## Overview

This will be a very non-traditional introduction to the ideas behind category theory. It will essentially be a slice through model theory (presented in a more programmer-friendly manner) with an unusual organization. Of course the end result will be ***SPOILER ALERT*** it was category theory all along. A secret decoder ring will be provided at the end. This approach is inspired by the notion of an internal logic/language and by Vaughn Pratt’s paper The Yoneda Lemma Without Category Theory.

I want to be very clear, though. This is not meant to be an analogy or an example or a guide for intuition. This is category theory. It is simply presented in a different manner.

Dan Doel pointed me at some draft lecture notes by Mike Shulman that seem very much in the spirit of this blog post (albeit aimed at an audience familiar with category theory): Categorical Logic from a Categorical Point of View. A theory in my sense corresponds to a category presentation (Definition 1.7.1) as defined by these lecture notes. Its oft-mentioned Appendix A will also look very familiar.

## Introduction

I’ve been watching the Spring 2012 lectures for MIT 6.851 Advanced Data Structures with Prof. Erik Demaine. In lecture 12, “Fusion Trees”, it mentions a constant time algorithm for finding the index of the first most significant 1 bit in a word, i.e. the binary logarithm. Assuming word operations are constant time, i.e. in the Word RAM model, the below algorithm takes 27 word operations (not counting copying). When I compiled it with GHC 8.0.1 -O2 the core of the algorithm was 44 straight-line instructions. The theoretically interesting thing is, other than changing the constants, the same algorithm works for any word size that’s an even power of 2. Odd powers of two need a slight tweak. This is demonstrated for `Word64`, `Word32`, and `Word16`. It should be possible to do this for any arbitrary word size `w`.

The `clz` instruction can be used to implement this function, but this is a potential simulation if that or a similar instruction wasn’t available. It’s probably not the fastest way. Similarly, find first set and count trailing zeros can be implemented in terms of this operation.

## Code

``````{- LANGUAGE BangPatterns -}
import Data.Word
import Data.Bits

-- Returns 0-based bit index of most significant bit that is 1. Assumes input is non-zero.
-- That is, 2^indexOfMostSignificant1 x <= x < 2^(indexOfMostSignificant1 x + 1)
-- From Erik Demaine's presentation in Spring 2012 lectures of MIT 6.851, particularly "Lecture 12: Fusion Trees".
-- Takes 26 (source-level) straight-line word operations.
indexOfMostSignificant1 :: Word64 -> Word64
indexOfMostSignificant1 w = idxMsbyte .$. idxMsbit where -- top bits of each byte !wtbs = w .&. 0x8080808080808080 -- all but top bits of each byte producing 8 7-bit chunks !wbbs = w .&. 0x7F7F7F7F7F7F7F7F -- parallel compare of each 7-bit chunk to 0, top bit set in result if 7-bit chunk was not 0 !pc = parallelCompare 0x8080808080808080 wbbs -- top bit of each byte set if the byte has any bits set in w !ne = wtbs .$. pc

-- a summary of which bytes (except the first) are non-zero as a 7-bit bitfield, i.e. top bits collected into bottom byte
!summary = sketch ne `unsafeShiftR` 1

-- parallel compare summary to powers of two
!cmpp2 = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * summary)

-- index of most significant non-zero byte * 8
!idxMsbyte = sumTopBits8 cmpp2

-- most significant 7-bits of most significant non-zero byte
!msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1

-- parallel compare msbyte to powers of two
!cmpp2' = parallelCompare 0xFFBF9F8F87838180 (0x0101010101010101 * msbyte)

-- index of most significant non-zero bit in msbyte
!idxMsbit = sumTopBits cmpp2'

-- Maps top bits of each byte into lower byte assuming all other bits are 0.
-- 0x2040810204081 = sum [2^j $j <- map (\i -> 49 - 7*i) [0..7]] -- In general if w = 2^(2*k+p) and p = 0 or 1 the formula is: -- sum [2^j$ j <- map (\i -> w-(2^k-1) - 2^(k+p) - (2^(k+p) - 1)*i) [0..2^k-1]]
-- Followed by shifting right by w - 2^k
sketch w = (w * 0x2040810204081) `unsafeShiftR` 56

parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8080808080808080
sumTopBits w = ((w `unsafeShiftR` 7) * 0x0101010101010101) `unsafeShiftR` 56
sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x0808080808080808) `unsafeShiftR` 56

indexOfMostSignificant1_w32 :: Word32 -> Word32
indexOfMostSignificant1_w32 w = idxMsbyte .$. idxMsbit where !wtbs = w .&. 0x80808080 !wbbs = w .&. 0x7F7F7F7F !pc = parallelCompare 0x80808080 wbbs !ne = wtbs .$. pc
!summary = sketch ne `unsafeShiftR` 1
!cmpp2 = parallelCompare 0xFF838180 (0x01010101 * summary)
!idxMsbyte = sumTopBits8 cmpp2
!msbyte = ((w `unsafeShiftR` (fromIntegral idxMsbyte)) .&. 0xFF) `unsafeShiftR` 1
!cmpp2' = parallelCompare 0x87838180 (0x01010101 * msbyte)

-- extra step when w is not an even power of two
!cmpp2'' = parallelCompare 0xFFBF9F8F (0x01010101 * msbyte)
!idxMsbit = sumTopBits cmpp2' + sumTopBits cmpp2''

sketch w = (w * 0x204081) `unsafeShiftR` 28
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x80808080
sumTopBits w = ((w `unsafeShiftR` 7) * 0x01010101) `unsafeShiftR` 24
sumTopBits8 w = ((w `unsafeShiftR` 7) * 0x08080808) `unsafeShiftR` 24

indexOfMostSignificant1_w16 :: Word16 -> Word16
indexOfMostSignificant1_w16 w = idxMsnibble .$. idxMsbit where !wtbs = w .&. 0x8888 !wbbs = w .&. 0x7777 !pc = parallelCompare 0x8888 wbbs !ne = wtbs .$. pc
!summary = sketch ne `unsafeShiftR` 1
!cmpp2 = parallelCompare 0xFB98 (0x1111 * summary)
!idxMsnibble = sumTopBits4 cmpp2
!msnibble = ((w `unsafeShiftR` (fromIntegral idxMsnibble)) .&. 0xF) `unsafeShiftR` 1
!cmpp2' = parallelCompare 0xFB98 (0x1111 * msnibble)
!idxMsbit = sumTopBits cmpp2'

sketch w = (w * 0x249) `unsafeShiftR` 12
parallelCompare w1 w2 = complement (w1 - w2) .&. 0x8888
sumTopBits w = ((w `unsafeShiftR` 3) * 0x1111) `unsafeShiftR` 12
sumTopBits4 w = ((w `unsafeShiftR` 3) * 0x4444) `unsafeShiftR` 12``````

## Introduction

Programmers in typed languages with higher order functions and algebraic data types are already comfortable with most of the basic constructions of set/type theory. In categorical terms, those programmers are familiar with finite products and coproducts and (monoidal/cartesian) closed structure. The main omissions are subset types (equalizers/pullbacks) and quotient types (coequalizers/pushouts) which would round out limits and colimits. Not having a good grasp on either of these constructions dramatically shrinks the world of mathematics that is understandable, but while subset types are fairly straightforward, quotient types are quite a bit less intuitive.

## Subset Types

In my opinion, most programmers can more or less immediately understand the notion of a subset type at an intuitive level.
A subset type is just a type combined with a predicate on that type that specifies which values of the type we want. For example, we may have something like `{ n:Nat $n /= 0 }` meaning the type of naturals not equal to 0. We may use this in the type of the division function for the denominator. Consuming a value of a subset type is easy, a natural not equal to 0 is still just a natural, and we can treat it as such. The difficult part is producing a value of a subset type. To do this, we must, of course, produce a value of the underlying type — `Nat` in our example — but then we must further convince the type checker that the predicate holds (e.g. that the value does not equal 0). Most languages provide no mechanism to prove potentially arbitrary facts about code, and this is why they do not support subset types. Dependently typed languages do provide such mechanisms and thus either have or can encode subset types. Outside of dependently typed languages the typical solution is to use an abstract data type and use a runtime check when values of that abstract data type are created. ## Quotient Types The dual of subset types are quotient types. My impression is that this construction is the most difficult basic construction for people to understand. Further, programmers aren’t much better off, because they have little to which to connect the idea. Before I give a definition, I want to provide the example with which most people are familiar: modular (or clock) arithmetic. A typical way this is first presented is as a system where the numbers “wrap-around”. For example, in arithmetic mod 3, we count 0, 1, 2, and then wrap back around to 0. Programmers are well aware that it’s not necessary to guarantee that an input to addition, subtraction, or multiplication mod 3 is either 0, 1, or 2. Instead, the operation can be done and the `mod` function can be applied at the end. This will give the same result as applying the `mod` function to each argument at the beginning. For example, 4+7 = 11 and 11 mod 3 = 2, and 4 mod 3 = 1 and 7 mod 3 = 1 and 1+1 = 2 = 11 mod 3. For mathematicians, the type of integers mod n is represented by the quotient type ZZ//n ZZ. The idea is that the values of ZZ // n ZZ are integers except that we agree that any two integers a and b are treated as equal if a - b = kn for some integer k. For ZZ // 3 ZZ, … -6 = -3 = 0 = 3 = 6 = … and … = -5 = -2 = 1 = 4 = 7 = … and … = -4 = -1 = 2 = 5 = 8 = …. ## Equivalence Relations To start to formalize this, we need the notion of an equivalence relation. An equivalence relation is a binary relation `(~~)` which is reflexive (x ~~ x for all x), symmetric (if `x ~~ y` then `y ~~ x`), and transitive (if `x ~~ y` and `y ~~ z` then `x ~~ z`). We can check that “a ~~ b iff there exists an integer k such that a-b = kn” defines an equivalence relation on the integers for any given n. For reflexivity we have a - a = 0n. For symmetry we have if a - b = kn then b - a = -kn. Finally, for transitivity we have if a - b = k_1 n and b - c = k_2 n then a - c = (k_1 + k_2)n which we get by adding the preceding two equations. Any relation can be extended to an equivalence relation. This is called the reflexive-, symmetric-, transitive-closure of the relation. For an arbitrary binary relation R we can define the equivalence relation (~~_R) via “a ~~_R b iff a = b or R(a, b) or b ~~_R a or a ~~_R c and c ~~_R b for some c“. To be precise, ~~_R is the smallest relation satisfying those constraints. In Datalog syntax, this looks like: ``````eq_r(A, A). eq_r(A, B) :- r(A, B). eq_r(A, B) :- eq_r(B, A). eq_r(A, B) :- eq_r(A, C), eq_r(C, B).`````` ## Quotient Types: the Type Theory view If T is a type, and `(~~)` is an equivalence relation, we use T // ~~ as the notation for the quotient type, which we read as “T quotiented by the equivalence relation `(~~)`”. We call T the underlying type of the quotient type. We then say a = b at type T // ~~ iff a ~~ b. Dual to subset types, to produce a value of a quotient type is easy. Any value of the underlying type is a value of the quotient type. (In type theory, this produces the perhaps surprising result that ZZ is a subtype of ZZ // n ZZ.) As expected, consuming a value of a quotient type is more complicated. To explain this, we need to explain what a function f : T // ~~ -> X is for some type X. A function f : T // ~~ -> X is a function g : T -> X which satisfies g(a) = g(b) for all a and b such that a ~~ b. We call f (or g, they are often conflated) well-defined if g satisfies this condition. In other words, any well-defined function that consumes a quotient type isn’t allowed to produce an output that distinguishes between equivalent inputs. A better way to understand this is that quotient types allow us to change what the notion of equality is for a type. From this perspective, a function being well-defined just means that it is a function. Taking equal inputs to equal outputs is one of the defining characteristics of a function. Sometimes we can finesse needing to check the side condition. Any function h : T -> B gives rise to an equivalence relation on T via a ~~ b iff h(a) = h(b). In this case, any function g : B -> X gives rise to a function f : T // ~~ -> X via f = g @ h. In particular, when B = T we are guaranteed to have a suitable g for any function f : T // ~~ -> X. In this case, we can implement quotient types in a manner quite similar subset types, namely we make an abstract type and we normalize with the h function as we either produce or consume values of the abstract type. A common example of this is rational numbers. We can reduce a rational number to lowest terms either when it’s produced or when the numerator or denominator get accessed, so that we don’t accidentally write functions which distinguish between 1/2 and 2/4. For modular arithmetic, the mod by n function is a suitable h. ## Quotient Types: the Set Theory view In set theory such an h function can always be made by mapping the elements of T to the equivalence classes that contain them, i.e. a gets mapped to {b$ a ~~ b} which is called the equivalence class of a. In fact, in set theory, T // ~~ is usually defined to be the set of equivalence classes of `(~~)`. So, for the example of ZZ // 3 ZZ, in set theory, it is a set of exactly three elements: the elements are { 3n+k $n in ZZ} for k = 0, 1, 2. Equivalence classes are also called partitions and are said to partition the underlying set. Elements of these equivalence classes are called representatives of the equivalence class. Often a notation like [a] is used for the equivalence class of a. ## More Examples Here is a quick run-through of some significant applications of quotient types. I’ll give the underlying type and the equivalence relation and what the quotient type produces. I’ll leave it as an exercise to verify that the equivalence relations really are equivalence relations, i.e. reflexive, symmetric, and transitive. I’ll start with more basic examples. You should work through them to be sure you understand how they work. ### Integers Integers can be presented as pairs of naturals (n, m) with the idea being that the pair represents “n - m”. Of course, 1 - 2 should be the same as 2 - 3. This is expressed as (n_1, m_1) ~~ (n_2, m_2) iff n_1 + m_2 = n_2 + m_1. Note how this definition only relies on operations on natural numbers. You can explore how to define addition, subtraction, multiplication, and other operations on this representation in a well-defined manner. ### Rationals Rationals can be presented very similarly to integers, only with multiplication instead of addition. We also have pairs (n, d), usually written n/d, in this case of an integer n and a non-zero natural d. The equivalence relation is (n_1, d_1) ~~ (n_2, d_2) iff n_1 d_2 = n_2 d_1. ### (Topological) Circles We can extend the integers mod n to the continuous case. Consider the real numbers with the equivalence relation r ~~ s iff r - s = k for some integer k. You could call this the reals mod 1. Topologically, this is a circle. If you walk along it far enough, you end up back at a point equivalent to where you started. Occasionally this is written as RR//ZZ. ### Torii Doing the previous example in 2D gives a torus. Specifically, we have pairs of real numbers and the equivalence relation (x_1, y_1) ~~ (x_2, y_2) iff x_1 - x_2 = k and y_1 - y_2 = l for some integers k and l. Quite a bit of topology relies on similar constructions as will be expanded upon on the section on gluing. ### Unordered pairs Here’s an example that’s a bit closer to programming. Consider the following equivalence relation on arbitrary pairs: (a_1, b_1) ~~ (a_2, b_2) iff a_1 = a_2 and b_1 = b_2 or a_1 = b_2 and b_1 = a_2. This just says that a pair is equivalent to either itself, or a swapped version of itself. It’s interesting to consider what a well-defined function is on this type.1 ### Gluing / Pushouts Returning to topology and doing a bit more involved construction, we arrive at gluing or pushouts. In topology, we often want to take two topological spaces and glue them together in some specified way. For example, we may want to take two discs and glue their boundaries together. This gives a sphere. We can combine two spaces into one with the disjoint sum (or coproduct, i.e. Haskell’s `Either` type.) This produces a space that contains both the input spaces, but they don’t interact in any way. You can visualize them as sitting next to each other but not touching. We now want to say that certain pairs of points, one from each of the spaces, are really the same point. That is, we want to quotient by an equivalence relation that would identify those points. We need some mechanism to specify which points we want to identify. One way to accomplish this is to have a pair of functions, f : C -> A and g : C -> B, where A and B are the spaces we want to glue together. We can then define a relation R on the disjoint sum via R(a, b) iff there’s a c : C such that `a = tt "inl"(f(c)) and b = tt "inr"(g(c))`. This is not an equivalence relation, but we can extend it to one. The quotient we get is then the gluing of A and B specified by C (or really by f and g). For our example of two discs, f and g are the same function, namely the inclusion of the boundary of the disc into the disc. We can also glue a space to itself. Just drop the disjoint sum part. Indeed, the circle and torus are examples. ### Polynomial ring ideals We write RR[X] for the type of polynomials with one indeterminate X with real coefficients. For two indeterminates, we write RR[X, Y]. Values of these types are just polynomials such as X^2 + 1 or X^2 + Y^2. We can consider quotienting these types by equivalence relations generated from identifications like X^2 + 1 ~~ 0 or X^2 - Y ~~ 0, but we want more than just the reflexive-, symmetric-, transitive-closure. We want this equivalence relation to also respect the operations we have on polynomials, in particular, addition and multiplication. More precisely, we want if a ~~ b and c ~~ d then ac ~~ bd and similarly for addition. An equivalence relation that respects all operations is called a congruence. The standard notation for the quotient of RR[X, Y] by a congruence generated by both of the previous identifications is RR[X, Y]//(X^2 + 1, X^2 - Y). Now if X^2 + 1 = 0 in RR[X, Y]//(X^2 + 1, X^2 - Y), then for any polynomial P(X, Y), we have P(X, Y)(X^2 + 1) = 0 because 0 times anything is 0. Similarly, for any polynomial Q(X, Y), Q(X, Y)(X^2 - Y) = 0. Of course, 0 + 0 = 0, so it must be the case that P(X, Y)(X^2 + 1) + Q(X, Y)(X^2 - Y) = 0 for all polynomials P and Q. In fact, we can show that all elements in the equivalence class of 0 are of this form. You’ve now motivated the concrete definition of a ring ideal and given it’s significance. An ideal is an equivalence class of 0 with respect to some congruence. Let’s work out what RR[X, Y]//(X^2 + 1, X^2 - Y) looks like concretely. First, since X^2 - Y = 0, we have Y = X^2 and so we see that values of RR[X, Y]//(X^2 + 1, X^2 - Y) will be polynomials in only one indeterminate because we can replace all Ys with X^2s. Since X^2 = -1, we can see that all those polynomials will be linear (i.e. of degree 1) because we can just keep replacing X^2s with -1s, i.e. X^(n+2) = X^n X^2 = -X^n. The end result is that an arbitrary polynomial in RR[X, Y]//(X^2 + 1, X^2 - Y) looks like a + bX for real numbers a and b and we have X^2 = -1. In other words, RR[X, Y]//(X^2 + 1, X^2 - Y) is isomorphic to the complex numbers, CC. As a reasonably simple exercise, given a polynomial P(X) : RR[X], what does it get mapped to when embedded into RR[X]//(X - 3), i.e. what is [P(X)] : RR[X]//(X - 3)?2 ### Free algebras modulo an equational theory Moving much closer to programming, we have a rather broad and important example that a mathematician might describe as free algebras modulo an equational theory. This example covers several of the preceding examples. In programmer-speak, a free algebra is just a type of abstract syntax trees for some language. We’ll call a specific abstract syntax tree a term. An equational theory is just a collection of pairs of terms with the idea being that we’d like these terms to be considered equal. To be a bit more precise, we will actually allow terms to contain (meta)variables. An example equation for an expression language might be `Add(`x`,`x`) = Mul(2,`x`)`. We call a term with no variables a ground term. We say a ground term matches another term if there is a consistent substitution for the variables that makes the latter term syntactically equal to the ground term. E.g. `Add(3, 3)` matches `Add(`x`,`x`)` via the substitution x$->`3`. Now, the equations of our equational theory gives rise to a relation on ground terms R(t_1, t_2) iff there exists an equation l = r such that t_1 matches l and t_2 matches r. This relation can be extended to an equivalence relation on ground terms, and we can then quotient by that equivalence relation.

Let’s consider a worked example. We can consider the theory of monoids. We have two operations (types of AST nodes): `Mul(`x`,`y`)` and `1`. We have the following three equations: `Mul(1,`x`) =`x, `Mul(`x`, 1) =`x, and `Mul(Mul(`x`,`y`),`z`) = Mul(`x`, Mul(`y`,`z`))`. We additionally have a bunch of constants subject to no equations. In this case, it turns out we can define a normalization function, what I called h far above, and that the quotient type is isomorphic to lists of constants. Now, we can extend this theory to the theory of groups by adding a new operation, `Inv(`x`)`, and new equations: `Inv(Inv(`x`)) =`x, `Inv(Mul(`x`,`y`)) = Mul(Inv(`y`), Inv(`x`))`, and `Mul(Inv(`x`),`x`) = 1`. If we ignore the last of these equations, you can show that we can normalize to a form that is isomorphic to a list of a disjoint sum of the constants, i.e. `[Either Const Const]` in Haskell if `Const` were the type of the constant terms. Quotienting this type by the equivalence relation extended with that final equality corresponds to adding the rule that a `Left c` cancels out `Right c` in the list whenever they are adjacent.

This overall example is a fairly profound one. Almost all of abstract algebra can be viewed as an instance of this or a closely related variation. When you hear about things defined in terms of “generators and relators”, it is an example of this sort. Indeed, those “relators” are used to define a relation that will be extended to an equivalence relation. Being defined in this way is arguably what it means for something to be “algebraic”.

## Postscript

The Introduction to Type Theory section of the NuPRL book provides a more comprehensive and somewhat more formal presentation of these and related concepts. While the quotient type view of quotients is conceptually different from the standard set theoretic presentation, it is much more amenable to computation as the ZZ // n ZZ example begins to illustrate.

1. It’s a commutative function.

2. It gets mapped to it’s value at 3, i.e. P(3).

## Constructivist Motto

I don’t believe classical logic is false; I just believe that it is not true.

## The Mistake Everyone Makes with KnockoutJS

### Introduction

Knockout is a nice JavaScript library for making values that automatically update when any of their “dependencies” update. Those dependencies can form an arbitrary directed acyclic graph. Many people seem to think of it as “yet another” templating library, but the core idea which is useful far beyond “templating” is the notion of observable values. One nice aspect is that it is a library and not a framework so you can use it as little or as much as you want and you can integrate it with other libraries and frameworks.

At any rate, this article is more geared toward those who have already decided on using Knockout or a library (in any language) offering similar capabilities. I strongly suspect the issues and solutions I’ll discuss apply to all similar sorts of libraries. While I’ll focus on one particular example, the ideas behind it apply generally. This example, admittedly, is one that almost anyone will implement, and in my experience will do it incorrectly the first time and won’t realize the problem until later.

### The Problem

When doing any front-end work, before long there will be a requirement to support “multi-select” of something. Of course, you want the standard select/deselect all functionality and for it to work correctly, and of course you want to do something with the items you’ve selected. Here’s a very simple example:

Number selected:
Item

Here, the number selected is an overly simple example of using the selected items. More realistically, the selected items will trigger other items to show up and/or trigger AJAX requests to update the data or populate other data. The HTML for this example is completely straightforward.

``````<div id="badExample">
Number selected: <span data-bind="text: $data.numberSelected()"></span> <table> <tr><th><input type="checkbox" data-bind="checked:$data.allSelected"/></th><th>Item</th></tr>
<!-- ko foreach: { data: $data.items(), as: '$item' } -->
<tr><td><input type="checkbox" data-bind="checked: $data.selected"/></td><td data-bind="text: 'Item number: '+$data.body"></td></tr>
<!-- /ko -->

## High dimensional thought experiment from Hamming

This is a simple mathematical thought experiment from Richard Hamming to demonstrate how poor our intuition for high dimensional spaces is. All that is needed is some basic, middle school level geometry and algebra.

Consider a 2x2 square centered at the origin. In each quadrant place circles as big as possible so that they fit in the square and don’t overlap. They’ll clearly have radius 1/2. See the image below. The question now is what’s the radius of the largest circle centered at the origin that doesn’t overlap the other circles.

It’s clear from symmetry that the inner circle is going to touch all the other circles at the same time, and it is clear that it is going to touch along the line from the origin to the center of one of the outer circles. So the radius of the inner circle, r, is just the distance from the origin to the center of one of the outer circles minus the radius of the outer circle, namely 1/2. As an equation:

`r = sqrt(1/2^2 + 1/2^2) - 1/2 = sqrt(2)/2 - 1/2 ~~ 0.207106781`

Now if we go to three dimensions we’ll have eight circles instead of four, but everything else is the same except the distances will now be `sqrt(1/2^2 + 1/2^2 + 1/2^2)`. It’s clear that the only difference for varying dimensions is that in dimension n we’ll have n 1/2^2 terms under the square root sign. So the general solution is easily shown to be:

`r = sqrt(n)/2 - 1/2`

You should be weirded out now. If you aren’t, here’s a hint: what happens when n = 10? Here’s another hint: what happens as n approaches oo?

3blue1brown has a video describing this example and presenting one way of regaining intuition about it.

## Behavioral Reflection

The ultimate goal of behavioral reflection (aka procedural reflection and no doubt other things) is to make a language where programs within the language are able to completely redefine the language as it executes. This is arguably the pinnacle of expressive power. This also means, though, local reasoning about code is utterly impossible, essentially by design.

Smalltalk is probably the language closest to this that is widely used. The Common Lisp MOP (Meta-Object Protocol) is also inspired by research in this vein. The ability to mutate classes and handle calls to missing methods as present in, e.g. Ruby, are also examples of very limited forms of behavioral reflection. (Though, for the latter, often the term “structural reflection” is used instead, reserving “behavioral reflection” for things beyond mere structural reflection.)

### Very Brief History

The seminal reference for behavioral reflection is Brian Smith’s 1982 thesis on 3-LISP. This was followed by the languages Brown, Blond, Refci, and Black in chronological order. This is not a comprehensive list, and the last time I was in to this was about a decade ago. (Sure enough, there is some new work on Black and some very new citations of Black.)

### Core Idea

The core idea is simply to expose the state of the (potentially conceptual) interpreter and allow it to be manipulated.

Here’s a small example of why people find category theory interesting. Consider the category, call it ccC, consisting of two objects, which we’ll imaginatively name 0 and 1, and two non-identity arrows s,t : 0 -> 1. The category of graphs (specifically directed multigraphs with loops) is the category of presheaves over ccC, which is to say, it’s the category of contravariant functors from ccC to `bb "Set"`. I’ll spell out what that means momentarily, but first think about what has been done. We’ve provided a complete definition of not only graphs but the entire category of graphs and graph homomorphisms given you’re familiar with very basic category theory1. This is somewhat magical.
Spelling things out, G is a contravariant functor from ccC to `bb "Set"`. This means that G takes each object of ccC to a set, i.e. G(0) = V and G(1) = E for arbitrary sets V and E, but which will represent the vertices and edges of our graph. In addition, G takes the arrows in ccC to a (set) function, but, because it’s contravariant, it flips the arrows around. So we have, for example, G(s) : G(1) -> G(0) or G(s) : E -> V, and similarly for t. G(s) is a function that takes an edge and gives its source vertex, and G(t) gives the target. The arrows in the category of graphs are just natural transformations between the functors. In this case, that means if we have two graphs G and H, a graph homomorphism f : G -> H is a pair of functions f_0 : G(0) -> H(0) and f_1 : G(1) -> H(1) that satisfy H(s) @ f_1 = f_0 @ G(s) and H(t) @ f_1 = f_0 @ G(t). With a bit of thought you can see that all this says is that we must map the source and target of an edge e to the source and target of the edge f_1(e)#. You may also have noticed that any pair of sets and pair of functions between them is a graph. In other words, graphs are very simple things. This takes out a lot of the magic, though it is nice that we get the right notion of graph homomorphism for free. Let me turn the magic up to 11.