Here’s a small example of why people find category theory interesting. Consider the
category, call it |\mathcal C|, consisting of two objects, which we’ll imaginatively name |0| and |1|, and two
non-identity arrows |s,t : 0 \to 1|. The category of graphs (specifically directed multigraphs with loops)
is the category of presheaves over |\mathcal C|, which is to say, it’s the category of contravariant functors
from |\mathcal C| to |\mathbf{Set}|. I’ll spell out what that means momentarily, but first think
about what has been done. We’ve provided a complete definition of not only graphs but the entire category
of graphs and graph homomorphisms given you’re familiar with very basic category theory^{1}. This
is somewhat magical.

Spelling things out, |G| is a contravariant functor from |\mathcal C| to |\mathbf{Set}|. This means that
|G| takes each object of |\mathcal C| to a set, i.e. |G(0) = V| and |G(1) = E| for arbitrary sets |V| and |E|, but
which will represent the vertices and edges of our graph. In addition, |G| takes the arrows in |\mathcal C| to a
(set) function, but, because it’s contravariant, it flips the arrows around. So we have, for example,
|G(s) : G(1) \to G(0)| or |G(s) : E \to V|, and similarly for |t|. |G(s)| is a function that takes an edge
and gives its source vertex, and |G(t)| gives the target. The arrows in the category of graphs are just
natural transformations between the functors. In this case, that means if we have two graphs |G| and |H|,
a graph homomorphism |f : G \to H| is a pair of functions |f_0 : G(0) \to H(0)| and |f_1 : G(1) \to H(1)| that
satisfy |H(s) \circ f_1 = f_0 \circ G(s)| and |H(t) \circ f_1 = f_0 \circ G(t)|. With a bit of thought you can see that all
this says is that we must map the source and target of an edge |e| to the source and target of the edge
|f_1(e)|. You may also have noticed that *any* pair of sets and pair of functions between them is a graph.
In other words, graphs are very simple things. This takes out a lot of the magic, though it is nice
that we get the right notion of graph homomorphism for free. Let me turn the magic up to 11.

Presheaves are extremely important in category theory, so categorists know a lot about them. It turns out
that they have a lot of structure (mostly because |\mathbf{Set}| has a lot of structure). In
particular, in the categorical jargon, if |\mathcal C| is small (and finite is definitely small), then the
category of presheaves over |\mathcal C| is a complete and cocomplete elementary topos with a natural number object.
For those of you learning category theory who’ve gotten beyond the basics, proving this is a *very* good and
important exercise. In programmer-speak, that says we can do dependently typed lambda calculus with algebraic
data and codata types in that category. To be clear, this is not saying we have a lambda calculus with a
graph type. It’s saying we have a lambda calculus where *all* our types have graph structure.

But set that aside for now. This isn’t an article about graphs so let’s start generalizing away from them.
We’ll start with a baby step. You can probably guess how we’d go about making edge labeled graphs. We’d
add another object to |\mathcal C|, say |2|, and an arrow from |2| to |1|, call it |l_e| — remember that the
arrows are “backwards” because presheaves are contravariant. You may start seeing a pattern already. One
way to start formalizing that pattern is to say for every object of |\mathcal C| we have an abstract data type
with a field for each arrow into that object. In our example, |1| has three arrows into it, namely |s|, |t|,
and |l_e|. We can view an element |e \in F(1)| for a presheaf |F| as having fields `e.s`

, `e.t`

and
`e.l_e`

. Since |\mathbf{Set}| has (non-constructive, non-computable) decidable equality for
everything, we can tell |e| from |e’| even if `e.s = e'.s`

and `e.t = e'.t`

and `e.l_e = e'.l_e`

. This
is different from a normal abstract data type (in a purely functional language) where an abstract data type
with three fields would be isomorphic to a 3-tuple. The extra ability to differentiate them could be modeled
by having a fourth field that returned something that could only be compared for equality, kind of like
Data.Unique (ignoring the `Ord`

instance…) It may look like:

```
data Vertex -- abstract
vertexId :: Vertex -> Unique
data Label -- abstract
labelId :: Label -> Unique
data Edge -- abstract
edgeId :: Edge -> Unique
src :: Edge -> Vertex
tgt :: Edge -> Vertex
label :: Edge -> Label
```

Of course, if these are literally all the functions we have for these types (plus some constants otherwise we can’t make anything), then these abstract types might as well be records. Anything else they have is unobservable and thus superfluous.

```
data Vertex = Vertex { vertexId :: Unique }
data EdgeLabel = EdgeLabel { edgeLabelId :: Unique }
data Edge = Edge {
edgeId :: Unique,
src :: Vertex,
tgt :: Vertex,
edgeLabel :: EdgeLabel
}
```

This starts to suggest that we can just turn each object in our category |\mathcal C| into a record with an ID field. Each arrow in to that object becomes an additional field. This almost works. We’ll come back to this, but let’s take another baby step.

Something a bit more interesting happens if we want to label the vertices. We proceed as before: add another
object, call it |3|, and another arrow |l_v : 3 \to 0|. This time, though, we’re not finished. |\mathcal C| is
supposed to be a category, so we can compose arrows and thus we have two composites but no arrow for the
composites to be, namely: |s \circ l_v| and |t \circ l_v|. The abstract data type intuition suggests that
whenever we run into this situation, we should add a distinct arrow for each composite. For the purpose
of labeling vertices, this is the right way to go: we want to think of each vertex as having a vertex
label field with no constraints. There is, however, another choice. We could add *one* new arrow and have
both composites be equal to it. What that would say is that for every edge, the source and the target
vertices must have the same label. It’s easy to see that that would lead to every vertex in a connected
component having the same label. In code, the former choice would look like:

```
data VertexLabel = VertexLabel { vertexLabelId :: Unique }
data Vertex = Vertex {
vertexId :: Unique,
vertexLabel :: VertexLabel
}
data EdgeLabel = EdgeLabel { edgeLabelId :: Unique }
data Edge = Edge {
edgeId :: Unique,
src :: Vertex,
tgt :: Vertex,
edgeLabel :: EdgeLabel
-- With our earlier "rule" to add a field for each arrow
-- we should also have:
-- srcVertexLabel :: VertexLabel
-- tgtVertexLabel :: VertexLabel
-- but, by definition, these are:
-- vertexLabel . src and
-- vertexLabel . tgt respectively.
-- So we don't explicitly add a field for composite arrows.
}
```

The latter choice would look the same, except there would be an extra constraint that we can’t
capture in Haskell, namely `vertexLabel . src == vertexLabel . tgt`

.

If we stick to the abstract data type intuition and we have a cycle in the arrows in |\mathcal C|, then the
requirement to add a distinct arrow for each composite means we need to add a countably infinite number
of arrows, so |\mathcal C| is no longer finite. It is still “small” though, so that’s no problem. We could
say we have a finite *graph* of the non-composite arrows and possibly some equations like
|s \circ l_v = t \circ l_v| and we generate a category from that graph subject to those equations by adding in
all composites as necessary. We will use the arrows in this graph to decide on the fields for our
abstract data types, rather than the arrows in the category so we don’t end up with an infinity of fields
in our data types. Even when there aren’t an infinite number of arrows in our category, this is still
useful since we aren’t going to add a field to our edges to hold each vertex’s label, since we can just
get the vertex and then get the label.

Some of you have probably thought of another more appropriate intuition. A presheaf is a database.
The category which our presheaf is over, what we’ve been calling |\mathcal C| ^{2} is sort of like a schema. You don’t specify types, but you do
specify foreign key relationships plus some additional integrity constraints that don’t fit in to the
typical ones supported by databases.

For our earlier example:

```
CREATE VertexLabel (Id uniqueidentifier PRIMARY KEY);
CREATE Vertex (
Id uniqueidentifier PRIMARY KEY,
Label uniqueidentifier REFERENCES VertexLabel(Id)
);CREATE EdgeLabel (Id uniqueidentifier PRIMARY KEY);
CREATE Edge (
Id uniqueidentifier PRIMARY KEY,
REFERENCES VertexLabel(Id),
Src uniqueidentifier REFERENCES VertexLabel(Id),
Tgt uniqueidentifier Label uniqueidentifier REFERENCES EdgeLabel(Id)
-- Nothing stops us from having additional columns here and elsewhere
-- corresponding to the fact that our types were really abstract data
-- types in the Haskell, and to the fact that we can choose an arbitrary
-- set as long as it has at least this much structure. They won't be
-- preserved by "homomorphisms" though.
);
```

To be clear, each presheaf corresponds to a database *with data*. Inserting a row into a table is a
homomorphism of presheaves, but so is adding a (not preserved by homomorphism) column. The schema
above corresponds to the |\mathcal C| part of the presheaf.

If we had made that second choice before where we had only *one* arrow back that would lead to
the following integrity constraint:

```
NOT EXISTS(SELECT *
FROM Edge e
JOIN Vertex src ON e.Src = src.Id
JOIN Vertex tgt ON e.Tgt = tgt.Id
WHERE src.Label <> tgt.Label)
```

It turns out this intuition is completely general. You can actually think of all of presheaf theory
as a (slightly unusual) database theory. More objects just means more tables. More arrows means more
foreign keys and potentially additional integrity constraints. It’s not exactly relational though. In
fact, in some ways it’s a bit closer to SQL^{3} or object databases. You
can turn this around too going back to the point at the beginning. Whether or not we can think of *all*
presheaves like databases or all databases like presheaves, we can certainly think of *this* example, and
many like it, as presheaves. This means for many “databases”, we can talk about doing dependently typed
programming where our types *are* databases of the given shape. This also dramatically shifts the focus
in database theory from databases to database migrations, i.e. homomorphisms of databases.

David Spivak is the one who has done the most on this, so I’ll refer you to his work. He also has a better story for schemas that are much closer to normal schemas. I’ll end as I began:

Here’s a small example of why people find category theory interesting.